## Sathishkumar V E, Myeongbae Lee, Jonghyun Lim, Yubin Kim, Changsun Shin, Jangwoo Park and Yongyun Cho

| Load Type | June-August | March-May, September-October | November-February |
|---|---|---|---|
| Light Load | 23:00-09:00 | 23:00-09:00 | 23:00-09:00 |
| Medium Load | 09:00-10:00, 12:00-01:00, 17:00-23:00 | 09:00-10:00, 12:00-01:00, 17:00-23:00 | 09:00-10:00, 12:00-07:00, 20:00-22:00 |
| Maximum Load | 10:00-12:00, 01:00-17:00 | 10:00-12:00, 01:00-17:00 | 10:00-12:00, 17:00-20:00, 22:00-23:00 |

Since the steel plant operates in an open space with no heating or cooling facilities, the temperature variables have no impact on energy consumption. An overview of the full dataset is shown in Table 2.

Table 2.

| Data Variables | Type | Measurement |
|---|---|---|
| Industry Energy Consumption | Continuous | kWh |
| Hour of the Day | Continuous | Hour |
| Lagging Current Reactive Power | Continuous | kVarh |
| Leading Current Reactive Power | Continuous | kVarh |
| Lagging Current Power Factor | Continuous | % |
| Leading Current Power Factor | Continuous | % |
| CO2 | Continuous | ppm |
| Week Status | Categorical | Weekend (0) or Weekday (1) |
| Day of Week | Categorical | Sunday, Monday, ..., Saturday |
| Load Type | Categorical | Light Load, Medium Load, Maximum Load |

Additional features are created from the date/time variable: the number of seconds from midnight (NSM), the weekend or weekday status, and the day of the week. Fig. 1 displays the energy consumption profile over the interval and shows high variability.
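These date/time features can be derived in a few lines. The following is a minimal sketch assuming the records sit in a pandas DataFrame with a `date` timestamp column; the column names and the sample timestamps are illustrative, not taken from the paper's data.

```python
import pandas as pd

# Illustrative hourly timestamps; the real dataset spans one full year.
df = pd.DataFrame({"date": pd.date_range("2018-01-01", periods=8, freq="h")})
ts = df["date"]

# NSM: number of seconds elapsed since midnight for each record.
df["NSM"] = ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
# Week status coded as in Table 2: weekend (0) or weekday (1).
df["week_status"] = (ts.dt.dayofweek < 5).astype(int)
# Day of the week as a categorical name (Monday, Tuesday, ...).
df["day_of_week"] = ts.dt.day_name()
```

The same three derived columns, plus the load-type label, give the ten variables used for modelling.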

To identify time trends, an hourly heat map covering five consecutive weeks is produced and shown in Fig. 2. It shows that the steel industry's energy consumption has a strong time component: usage is lower on weekends than on weekdays, rises from 8 a.m., and remains high until 8 p.m.

The performance of the regression models is evaluated using several assessment metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). RMSE is the standard deviation of the differences between the observed and predicted values; it makes large errors easy to identify and allows the variability of the model response to be assessed. RMSE is scale-dependent, so it is expressed in the same unit as the measurements. RMSE is determined using Equation (1):

[TeX:] $$R M S E=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(Y_{i}-\widehat{Y}_{i}\right)^{2}}$$ (1)

Mean absolute error (MAE) is used to evaluate how close the predictions are to the observed values. MAE is a scale-dependent metric that represents the prediction error without letting positive and negative errors cancel each other out. MAE is calculated using Equation (2):

[TeX:] $$M A E=\frac{1}{n} \sum_{i=1}^{n}\left|Y_{i}-\widehat{Y}_{i}\right|$$ (2)

The mean absolute percentage error (MAPE) is the mean of the forecast errors expressed in absolute percentage terms, where the error is defined as the actual or observed value minus the forecast value. For MAPE, the percentage errors are summed regardless of sign. Because it reports the error as a percentage, this measure is easy to interpret; moreover, because absolute percentage errors are used, positive and negative errors cannot cancel each other out. MAPE therefore has managerial appeal and is widely used in forecasting: the smaller the MAPE, the better the forecast. MAPE is calculated using Equation (3):

[TeX:] $$M A P E=\frac{100}{n} \sum_{i=1}^{n}\left|\frac{Y_{i}-\widehat{Y}_{i}}{Y_{i}}\right|$$ (3)

Here, [TeX:] $$Y_{i}$$ is the actual measurement value, [TeX:] $$\widehat{Y}_{i}$$ is the value predicted, [TeX:] $$\bar{y}$$ is the sample average, and [TeX:] $$n$$ is the sample size.
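As a sketch, the three metrics can be implemented directly from their definitions; NumPy is assumed and the function names are illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error, Equation (1)."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def mae(y, y_hat):
    """Mean absolute error, Equation (2)."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def mape(y, y_hat):
    """Mean absolute percentage error, Equation (3), in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)

# Toy example: errors of -10, +10, -30.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
# RMSE ≈ 19.15, MAE ≈ 16.67, MAPE ≈ 8.33%
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

Note that MAPE is undefined when an observed value is zero, which is not an issue for strictly positive energy consumption readings.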

The entire one-year dataset is divided into training and testing sets: 75% of the data is used for training and 25% for testing. The counts are shown in Table 3.

Table 3.

| Dataset | Number of Observations |
|---|---|
| Training | 6572 (10 variables) |
| Testing | 2188 (10 variables) |
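A 75/25 partition of index positions can be sketched as follows; the paper does not specify how the split was drawn, so the fixed seed and the index-shuffling approach here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility
n_obs = 8760                     # hourly records for one year (illustrative)
indices = rng.permutation(n_obs)

split = int(n_obs * 0.75)        # 75% for training, 25% for testing
train_idx, test_idx = indices[:split], indices[split:]
print(len(train_idx), len(test_idx))
```

With 8,760 records this yields 6,570/2,190 rows; the slightly different counts in Table 3 (6,572/2,188) indicate that the paper's exact partitioning differed from this simple rounding.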

It is essential to find the optimal tuning parameters of each regression algorithm in order to reduce error values while designing a model. LR has no tuning parameters, so grid search is not performed for it. The outcomes of the grid search for RF, SVM, and GBM are presented in Figs. 3, 4, and 5, respectively.

Grid search evaluates parameter settings by placing all configurable values on a grid within the parameter space [22]. Each axis of the grid is an algorithm parameter, and each point in the grid is a particular combination of parameters; the objective must be evaluated at every point. In this paper, k-fold cross-validation (CV), one of the most common validation methods, is used during hyperparameter tuning to reduce bias in data selection. Although there is no strict rule for determining the value of K, values of K = 5 or 10 are very common in data mining.
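One round of such a k-fold split can be sketched as follows; this is pure NumPy, and the fold-generation helper is illustrative rather than the R implementation used in the paper.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Split n sample indices into k shuffled, near-equal folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(100, k=10)

# In each CV round, one fold validates while the other k-1 folds train;
# the k validation scores are then averaged per grid point.
for i, val_fold in enumerate(folds):
    train_folds = np.concatenate([f for j, f in enumerate(folds) if j != i])
    assert len(val_fold) + len(train_folds) == 100
```

Grid search then repeats this loop for every parameter combination and keeps the combination with the best averaged score.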

As Rodriguez et al. [23] state, when the number of folds is either five or ten, the bias of the accuracy estimate is smaller. Accordingly, following Kohavi [24] and Wong [25], the number of folds K was set to ten, balancing measurement time against bias. Ten rounds of training and validation were therefore performed on different partitions, and the results were aggregated to reflect the output of LR, SVM, GBM, and RF on the training set. In this study, all data processing was done using R software [26].

LR has no tuning parameters. The SVM model has two hyperparameters to be fine-tuned: as indicated in Fig. 3, the optimal sigma and cost values for the SVM with an RBF kernel are 0.1 and 25, respectively. GBM is a tree-based model with two hyperparameters, the number of trees and the maximum tree depth; the optimal number of trees is 5300 and the maximum depth is 6, as shown in Fig. 4. RF, an ensemble-based model, also has two parameters, namely mtry and the number of trees. In Fig. 5, the RMSE value for RF stays constant after 400 trees, and the optimal number of randomly chosen predictors (mtry) is 10.

Table 4 shows the performance results of each model in terms of RMSE, MAE, and MAPE. From the error values on the testing set in Table 4, it is evident that the RF and GBM models produce lower RMSE, MAE, and MAPE than the other models, and that LR performs worst. Of the four models, the developed RF has the lowest error values and is considered the best model in this research; the performance of GBM on the test set is nearly as good as that of RF.

This paper explores the potential of data mining approaches for predicting energy consumption. The study concludes that RF predicts energy consumption best, with GBM performing nearly as well, so RF and GBM are the most suitable models for predicting steel industry energy consumption. An accurate long-term forecast of energy usage is one of the most critical problems for energy management and optimization in the steel industry, and the exploratory data analysis reveals thought-provoking results. This work aims to establish the best-performing algorithm for predicting hourly energy consumption in the steel industry. The findings indicate that the RF model improves the RMSE, MAE, and MAPE of predictions compared with the other regression models considered in this research.

https://orcid.org/0000-0002-8271-2022

e-mail : sathishkumar@scnu.ac.kr

He is currently pursuing a PhD in the Department of Information and Communication Engineering, Sunchon National University. He received his Bachelor of Technology in Information Technology from Madras Institute of Technology and Master of Engineering in Biometrics and Cyber Security from PSG College of Technology. His current research interests include Big Data Analytics, Data Mining, Cryptography and Vertical Farming.

https://orcid.org/0000-0002-7160-2637

e-mail : lmb@scnu.ac.kr

He completed his Bachelor's degree in Computer Engineering and received his Master's degree in Computer Science in South Korea. He is currently pursuing a Doctorate degree in Information and Communication Engineering. His areas of interest include Advanced Agriculture Technology, IT Convergence, Cloud and Ubiquitous Computing.

https://orcid.org/0000-0001-6832-4077

e-mail : sshb56@s.scnu.ac.kr

He completed his Bachelor's degree in Information and Communication Engineering in Korea and is currently pursuing a Master's degree in Information and Communication Engineering. His areas of interest include Advanced Agriculture Technology, System Software and Ubiquitous Computing.

https://orcid.org/0000-0002-7744-5599

e-mail : ceobin@elsys.kr

He received his Bachelor and MS degrees, and is currently pursuing a PhD in the Department of Computer Science, Sunchon National University. Currently, he is a managing director of ELSYS Co., Ltd. His current research interests include Big Solar Energy System, Geothermal Energy System, IoT and Agriculture/ICT Convergence.

https://orcid.org/0000-0002-5494-4395

e-mail : csshin@scnu.ac.kr

He received the PhD degree in Computer Engineering at Wonkwang University. Currently, he is a Professor in the Dept. of Information & Communication Engineering, Sunchon National University. His research interests include Distributed Computing, Machine Learning, IoT and Agriculture/ICT Convergence.

https://orcid.org/0000-0001-8201-8949

e-mail : jwpark@scnu.ac.kr

He received the BS, MS and PhD degrees in Electronic Engineering from Hanyang University, Seoul, Korea in 1987, 1989 and 1993, respectively. In 1995, he joined the faculty of Sunchon National University, where he is currently a Professor in the Department of Information & Communication Engineering. His research focuses on Localization, SoC and system design, and RFID/USN technologies.

https://orcid.org/0000-0002-4855-4163

e-mail : yycho@scnu.ac.kr

He received the PhD degree in Computer Engineering from Soongsil University. Currently, he is an assistant professor in the Department of Information and Communication Engineering, Sunchon National University. His research interests include System Software, Embedded Software and Ubiquitous Computing.

- 1 Ç. Oluklulu, *A Research on the Photovoltaic Modules That Are Being Used Actively in Utilizing Solar Energy, Sizing of the Modules and Architectural Using Means of the Modules*, Master's Thesis, Gazi University, Ankara, Turkey, 2001.
- 2 V. I. Ugursal, "Energy Consumption, Associated Questions and some Answers," *Applied Energy*, vol. 130, pp. 783-792, 2014.
- 3 Rinkesh, "What is the Energy Crisis," https://www.conserve-energy-future.com/causesand-solutions-to-the-global-energy-crisis.php (accessed on 25 Jan. 2019).
- 4 D. Streimikiene, "Residential Energy Consumption Trends, Main Drivers and Policies in Lithuania," *Renewable and Sustainable Energy Reviews*, vol. 35, pp. 285-293, 2014.
- 5 J. Zuo and Z. Y. Zhao, "Green Building Research - Current Status and Future Agenda: A Review," *Renewable and Sustainable Energy Reviews*, vol. 30, pp. 271-281, 2014.
- 6 Seung-Moon Lee, "Mid-term Korea Energy Demand Outlook," *Korea Energy Economics Institute*, May 2014.
- 7 L. Ekonomou, "Greek long-term energy consumption prediction using artificial neural networks," *Energy*, vol. 35, no. 2, pp. 512-517, 2010.
- 8 G. Munz, S. Li, and G. Carle, "Traffic Anomaly Detection Using k-means Clustering," in *Proceedings of the GI/ITG Workshop MMBnet*, Hamburg, Germany, Sep. 2007, pp. 13-14.
- 9 K. Kandananond, "Forecasting Electricity Demand in Thailand with an Artificial Neural Network Approach," *Energies*, vol. 4, no. 12, pp. 1246-1257, 2011.
- 10 C. De Cauwer, J. Van Mierlo, and T. Coosemans, "Energy Consumption Prediction for Electric Vehicles based on Real-world Data," *Energies*, vol. 8, no. 8, pp. 8573-8593, 2015.
- 11 B. Dong, C. Cao, and S. E. Lee, "Applying Support Vector Machines to Predict Building Energy Consumption in Tropical Region," *Energy and Buildings*, vol. 37, no. 5, pp. 545-553, 2005.
- 12 P. A. Gonzalez and J. M. Zamarreno, "Prediction of Hourly Energy Consumption in Buildings Based on a Feedback Artificial Neural Network," *Energy and Buildings*, vol. 37, no. 6, pp. 595-601, 2005.
- 13 B. B. Ekici and U. T. Aksoy, "Prediction of Building Energy Consumption by Using Artificial Neural Networks," *Advances in Engineering Software*, vol. 40, no. 5, pp. 356-362, 2009. doi: 10.1016/j.advengsoft.2008.05.003.
- 14 Q. Li, P. Ren, and Q. Meng, "Prediction Model of Annual Energy Consumption of Residential Buildings," in *Proceedings of the 2010 International Conference on Advances in Energy Engineering*, Beijing, China, 2010, pp. 223-226.
- 15 L. Xuemei, D. Yuyan, D. Lixing, and J. Liangzhong, "Building Cooling Load Forecasting Using Fuzzy Support Vector Machine and Fuzzy C-mean Clustering," in *Proceedings of the 2010 International Conference on Computer and Communication Technologies in Agriculture Engineering*, Chengdu, 2010, pp. 438-441.
- 16 Y. Ma, J. Q. Yu, C. Y. Yang, and L. Wang, "Study on Power Energy Consumption Model for Large-scale Public Building," in *Proceedings of the 2010 2nd International Workshop on Intelligent Systems and Applications*, Wuhan, 2010, pp. 1-4.
- 17 J. Zhao, Z. Han, W. Pedrycz, and W. Wang, "Granular model of long-term prediction for energy system in steel industry," *IEEE Transactions on Cybernetics*, vol. 46, no. 2, pp. 388-400, 2015. doi: 10.1109/TCYB.2015.2445918.
- 18 Y. Zhang, X. Zhang, and L. Tang, "Energy consumption prediction in ironmaking process using hybrid algorithm of SVM and PSO," in *International Symposium on Neural Networks*, pp. 594-600, 2012.
- 19 B. Zeng, M. Zhou, and J. Zhang, "Forecasting the energy consumption of China's manufacturing using a homologous grey prediction model," *Sustainability*, vol. 9, no. 11, pp. 1-16, 2017.
- 20 P. Sen, M. Roy, and P. Pal, "Application of ARIMA for forecasting energy consumption and GHG emission: A case study of an Indian pig iron manufacturing organization," *Energy*, vol. 116, no. 1, pp. 1031-1038, 2016.
- 21 J. Reimann, "Methodology and model for predicting energy consumption in manufacturing at multiple scales," *Procedia Manufacturing*, vol. 21, pp. 694-701, 2018.
- 22 J. Zhou, E. Li, H. Wei, C. Li, Q. Qiao, and D. J. Armaghani, "Random forests and cubist algorithms for predicting shear strengths of rockfill materials," *Applied Sciences*, vol. 9, no. 8, pp. 1-16, 2019.
- 23 J. D. Rodriguez, A. Perez, and J. A. Lozano, "Sensitivity analysis of k-fold cross validation in prediction error estimation," *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 32, no. 3, pp. 569-575, 2009. doi: 10.1109/TPAMI.2009.187.
- 24 R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in *IJCAI*, vol. 14, no. 2, pp. 1137-1145, 1995.
- 25 T. T. Wong, "Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation," *Pattern Recognition*, vol. 48, no. 9, pp. 2839-2846, 2015. doi: 10.1016/j.patcog.2015.03.009.
- 26 R Core Team, "R: A language and environment for statistical computing," 2013.

S. V. E, M. Lee, J. Lim, Y. Kim, C. Shin, J. Park and Y. Cho, "An Energy Consumption Prediction Model for Smart Factory Using Data Mining Algorithms," KIPS Transactions on Software and Data Engineering, vol. 9, no. 5, pp. 153-160, 2020. DOI: https://doi.org/10.3745/KTSDE.2020.9.5.153.
