Advanced Data-Driven Prediction Models for BOF Endpoint Detection

Stefan Klanke¹, Mike Löpke¹, Norbert Uebber¹, Hans-Jürgen Odenthal¹, Joris Van Poucke², Andy Van Yperen-De Deyne²

¹ SMS group GmbH, Eduard-Schloemann-Straße 4, D-40237 Düsseldorf, Germany
Tel.: +49 211-881-6521, Fax: +49 211-881-4997
E-mail: [email protected]
Web page: http://www.sms-group.com

² ArcelorMittal, John Kennedylaan 51, B-9042 Gent, Belgium
Tel.: +32 9 347 4817, Fax: +32 9 347 4992
E-mail: [email protected]
Web page: http://www.arcelormittal.com/gent

Keywords: BOF, endpoint prediction, data-driven model, machine learning

With increasing computing power, data storage capacity, advanced algorithms and innovative sensor technologies, it is now possible to approach the BOF process from another point of view: machine learning and Data-driven Prediction Models (DdPM). These approaches process large amounts of data to predict the BOF process conditions, e.g. temperature, carbon and phosphorus content of the melt at the end-of-blow (EOB). In a cooperative effort between SMS group and ArcelorMittal Gent, a detailed study based on approx. 10,000 BOF heats has been carried out. The target values of the investigation were the melt temperature T_EOB and the carbon content [%C]_EOB. In an offline analysis, different strategies for preprocessing and validation were employed in combination with several supervised learning approaches (e.g. Bayesian regression, support vector machines (SVM), deep neural networks (DNN)) as well as different learning schemes (e.g. sliding learning). The data-driven methods can either predict the target values T_EOB and [%C]_EOB directly or predict deviations from the already existing metallurgical model in order to improve the prediction accuracy. The metallurgical model is based on known physical and chemical correlations, i.e. mass balance, energy balance, and empirical/statistical equations. The study showed that the DdPM provides higher prediction accuracy than the conventional metallurgical model. One aim of using offline DdPM approaches is to gain a better understanding of influencing factors such as scrap type, lance pattern and bottom stirring rate, to detect drifts or shifts in the process, and to improve the metallurgical model.
As a next step, the DdPM approach is to be incorporated into the online BOF process control in order to improve the accuracy of reaching the desired melt temperature. The paper summarizes fundamental R&D work of the partners, focuses on the applied mathematical models and shows the potential of the DdPM approach.

INTRODUCTION

Today the steel industry is facing major challenges and competitive pressure. The steel market is becoming increasingly dynamic, and the need for more flexibility therefore arises. Processes must be improved continuously, with a focus on higher efficiency, better quality and stronger competitiveness. In a cooperative effort between SMS group and ArcelorMittal Gent, a Data-driven Prediction Model (DdPM) based on machine learning techniques was used to predict the BOF conditions at the end-of-blow [1, 2]. Machine learning algorithms can learn from process data and predict process steps on the basis of new data without being explicitly programmed for this task. The results of this approach were promising.

AISTech 2017 Proceedings. © 2017 by AIST.


TARGET AND OBJECTIVES

ArcelorMittal Gent produces flat carbon steel for automotive and industrial applications. The steel shop has an annual production capacity of 5.3 million tons of slabs. For this, ArcelorMittal Gent uses two LD-TBM converters (Fig. 1) with a capacity of 300 tons each, two standard ladle metallurgy stations, two RH degassers, one ladle furnace and two casters. Neither converter has a sublance, but both are equipped with converter gas recovery. In 2016, about 18,000 heats were produced.

Fig. 1: Hot metal charging at the 300-ton converter at ArcelorMittal Gent

The main target of the cooperation project was to improve the prediction accuracy of the steel quantities T, [%C], Celox and [%P], in that order, at the end-of-blow by using machine learning and data-driven modeling approaches in combination with the classical models in operation at ArcelorMittal Gent.

L2 MODEL ENVIRONMENT AT ARCELORMITTAL GENT

ArcelorMittal Gent operates an L2 model for the BOF converters which covers the essential metallurgical and thermodynamic process mechanisms and is used both for process design and for end-of-blow prediction. The model has a differentiated parameterization that is optimized by means of recalculations after the end of the process. All relevant process variables are stored in a database system, so that a comprehensive basis for adjustments exists.

DATA-DRIVEN AND MACHINE LEARNING APPROACHES FOR THE BOF PROCESS

While established methods of process modeling are based on metallurgical and thermodynamic rules for idealized systems, data-driven or statistical models are based on partly hidden relationships that are systematically determined by applying suitable algorithms to a dataset. This has the advantage that not only process variables that appear in the metallurgical equations can be used, but also data that influence the metallurgical process without being present in those equations, e.g. lining age or lance age; such data may still improve the prediction accuracy. Due to the data-driven nature of the methodology, the model can easily be adapted to the current process conditions, e.g. to consider the current state of the lining wear or the lance level, by training it with current data. In contrast, traditional metallurgical models are harder to expand or modify; a lack of trained staff or time will often delay or prevent the necessary adjustments.
Innovation processes will be shortened by adopting a semi- or fully automated model. This flexibility allows widespread use of machine learning and data mining techniques on all the systems involved in the steel production route.

COOPERATION PROJECT DDPM

The cooperation project between SMS group and ArcelorMittal Gent has been divided into two parts. In a first step, historical data from ArcelorMittal Gent have been used to predict target variables by data-driven methods. SMS group used these historical data for an offline simulation that was aimed at


1) finding suitable mechanisms for preprocessing the data,
2) deriving and selecting features from the data that could be fed into machine learning algorithms as input data, and
3) picking the appropriate machine learning methods for the prediction.

In a second step, an online system for learning and applying models is to be implemented and integrated into the model infrastructure at ArcelorMittal Gent. At the time of writing, this is ongoing work; the results presented in this paper are therefore based on the offline simulation on historical data.

IMPLEMENTATION OF OFFLINE SIMULATIONS

The offline simulation is based on BOF process data from converters 2 and 3, covering the period 1/2015 – 2/2016. The data can be viewed as a flat table in which each row corresponds to a single heat of either converter. The data for each heat include basic information such as the converter number, the timestamp of the start of blow, the measured temperature of the melt (forming the target value), the amounts and types of scrap, the amount and analysis of raw iron ([%C], [%Si], [%P], ...), and so on. In total, each heat is described by approx. 800 process characteristics. These characteristics have been classified into two groups, depending on whether they are available before and during the blowing process or only after end-of-blow, such as the total consumption of oxygen.

Preparation of data

The raw data may contain erroneous values due to defective sensors or incorrect data processing. Such data have to be identified and replaced automatically by reliable values. While this step can sometimes be handled automatically using simple rules ("discard heats where process value X lies outside two standard deviations around the mean value of X"), better results are achieved if such filter rules are specified (or at least reviewed) by a process specialist. In a joint discussion, rules were determined for dealing with missing data, e.g.
replacing a missing chemical analysis by an alternative analysis that occurred earlier in the process, or by a short-term average of previous data. As a final step in data preparation, all data were normalized to zero mean and unit standard deviation for each data column.

Selection of features

One of the most important steps before training a model is to specify which features shall be taken into account as input data of the model. There are two main aspects to keep in mind. First, selecting too many features increases the risk of over-fitting, i.e. the model becomes overly complex and adapts to the smallest particularities of the training data. Such a model will usually not generalize well, meaning that its performance on new data will be worse. One of the most general and popular methods to counteract over-fitting is cross-validation. As an alternative, some machine learning algorithms, such as Bayesian linear regression [3] or Gaussian processes [4], have such complexity control built in. The second aspect of feature selection is to avoid including data that are not trustworthy or, even worse, not available. Although recent sensor technology has become more complex and costly, some sensors for measuring specific metallurgical quantities are increasingly susceptible to faults and tend to fail. As in the data preparation stage, feature selection can be carried out partly manually or automatically. The manual approach is guided by the process knowledge of a specialist, who makes initial decisions about which features seem most likely to have an influence or which (measured) features are too costly to use. Automatic feature selection does not use any information about the process and is normally based on a predefined maximal set of features. The most common methods are "backward elimination" (BE) and "forward selection" (FS).
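The filtering and normalization steps of the data-preparation stage can be sketched as follows. This is a minimal sketch: the two-standard-deviation rule and the column layout are illustrative, not the plant's actual filter rules.

```python
import numpy as np

def prepare_heats(X, n_sigma=2.0):
    """Discard heats (rows) where any process value lies more than
    n_sigma standard deviations from its column mean, then normalize the
    remaining columns to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    keep = np.all(np.abs(X - mu) <= n_sigma * sd, axis=1)
    Xf = X[keep]
    # Re-estimate the statistics on the filtered data before normalizing.
    mu_f, sd_f = Xf.mean(axis=0), Xf.std(axis=0)
    sd_f[sd_f == 0] = 1.0  # guard against constant columns
    return (Xf - mu_f) / sd_f, keep
```

In practice the normalization statistics would be stored so that new heats can be scaled consistently at prediction time.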
BE starts with all features and successively removes one feature per step; FS starts with a single feature and adds one feature per step. Both procedures automatically evaluate the quality of the resulting models and stop when the quality (usually measured by cross-validation) no longer improves [5]. If information about the process is not available, automatic feature selection is an excellent way to evaluate the data. A major disadvantage, however, is the long computing time in the case of a high number of features.

Selection of algorithm

A large number of machine learning algorithms are available in the literature and in commercial software packages. The following algorithms have been applied in the project:

- Linear regression in different variants, i.e. including polynomial features, ridge regression, Bayesian techniques, etc.
- Support vector regression [6]
- Gaussian processes [4]
- Neural networks, i.e. using multiple hidden layers and dropout regularization [7]
- Decision trees [8] and random forests [9]
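The forward-selection procedure described earlier can be sketched as a greedy loop around any of the above learners. This sketch uses ordinary least squares with a cross-validated RMS score purely for illustration; a real run would wrap the actual learner.

```python
import numpy as np

def cv_rmse(X, y, folds=4):
    """K-fold cross-validated RMS error of an ordinary least-squares fit."""
    idx = np.arange(len(y))
    errs = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        Xt = np.column_stack([np.ones(len(train)), X[train]])
        w, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        Xs = np.column_stack([np.ones(len(test)), X[test]])
        errs.append((Xs @ w - y[test]) ** 2)
    return float(np.sqrt(np.concatenate(errs).mean()))

def forward_selection(X, y):
    """Greedily add the feature that most reduces the cross-validated
    error; stop as soon as no remaining feature improves it."""
    selected, remaining, best = [], list(range(X.shape[1])), np.inf
    while remaining:
        score, j = min((cv_rmse(X[:, selected + [j]], y), j)
                       for j in remaining)
        if score >= best:
            break
        best, selected = score, selected + [j]
        remaining.remove(j)
    return selected, best
```

Backward elimination is the mirror image: start from the full feature set and drop the feature whose removal most improves the score.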

For each algorithm, a specific set of parameters has to be optimized to fit the given data and feature set, depending on the algorithm and the size of the data. The training time varies widely, especially when feature selection based on BE or FS is run in an outer loop. After some experiments, it was decided to focus efforts on Bayesian linear models with Automatic Relevance Determination (ARD) [3]. The reasons for this decision are:


- The dataset already contains a vast number of features, such as computation results of (non-linear) sub-models in the existing L2 system. The main task of the learning system is therefore more about combining existing features than generating new non-linear dependencies.
- Some mechanism for automatic retraining of the models will clearly be necessary, because the underlying BOF process varies in time (e.g. modernizations, exchange of parts) and because the sub-models in the L2 system are also retrained regularly (e.g. to follow the underlying change of the BOF process or to incorporate improvements). It is therefore important to apply algorithms that can be trained and validated robustly with as little human intervention as possible.
- The offline simulation should be as realistic as possible, such that its results can be taken online easily. This implies a realistic scheme for data usage and allowed processing time.
- Complex prediction models need more training data to avoid over-fitting, which conflicts with the changing nature of the BOF process, which produces only about 9,000 heats (= data rows) per year. During this time, the change in the process may be too big to be handled by the machine learning algorithm.
- Early experiments have confirmed that the simpler linear models perform just as well as, or even better than, the non-linear models, which need much more computation power and training time.
- Bayesian regression with ARD has complexity control and feature selection built in, which means that with relatively little effort it can pick the set of features that is optimal for the given training base each time a model is retrained.
- Bayesian regression models yield not only a point prediction (melt temperature) for the most likely outcome but also a predictive variance. This can be used to judge how certain the model is about its prediction, and can also favorably be used to combine predictions from multiple candidate models.
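A minimal sketch of Bayesian linear regression with ARD, using the standard evidence-based (MacKay) updates in plain NumPy, is shown below. The paper does not disclose the actual implementation or hyperparameters; the iteration counts and caps here are illustrative.

```python
import numpy as np

def ard_regression(X, y, n_iter=100, alpha_cap=1e6):
    """Bayesian linear regression with automatic relevance determination.

    Each weight w_i gets its own prior precision alpha_i; the evidence
    updates drive alpha_i of irrelevant features towards alpha_cap,
    effectively pruning them. Returns the posterior mean and covariance
    of the weights plus the estimated noise precision beta."""
    n, d = X.shape
    alpha, beta = np.ones(d), 1.0
    for _ in range(n_iter):
        Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
        mu = beta * Sigma @ X.T @ y
        gamma = 1.0 - alpha * np.diag(Sigma)      # effective dof per weight
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-6, alpha_cap)
        resid = y - X @ mu
        beta = max(n - gamma.sum(), 1e-3) / (resid @ resid + 1e-12)
    return mu, Sigma, beta

def predict(Xs, mu, Sigma, beta):
    """Point prediction plus predictive variance (noise + parameter terms)."""
    mean = Xs @ mu
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Xs, Sigma, Xs)
    return mean, var
```

The predictive variance returned by `predict` is exactly the quantity used later for judging model confidence and for model averaging.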

Training of models

To judge the quality of trained models, the prepared data must be separated into a training set and a test set. The size of the training set depends on the chosen algorithm and the number of features. At the same time, the size of the training set is closely related to the length of the time span during which the training data were produced: bigger training sets imply longer time spans, which may imply a broader variation of changes and drifts in the BOF process. In principle, the test data can be chosen arbitrarily, although they must not intersect the training data. For a realistic test, however, the test data should be newer than the training data. Experiments have been carried out to investigate the effect of an increasing gap between the training and the test data. Typically, the prediction error grows as this gap increases, i.e. as the training data become older. For this reason, experiments with a sliding and a growing learning basis have been carried out. The simulation assumed that the data for the heats become available sequentially, and the models were retrained every K heats (K = 20 in the experiments). A growing model is trained on all heats observed so far, whereas a sliding model is trained on the most recent N heats (N = 1000, 2500, or 3000). As a base model, Bayesian linear regression with automatic relevance determination was selected. The model performance was then evaluated by calculating the root-mean-square (RMS) error of the temperature deviations, ΔT_RMS (Eq. 1), over 100 successive test heats. Note that for each of these sets of test heats, 100/K = 5 different models have been evaluated.

\Delta T_{\mathrm{RMS}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_{\mathrm{real},i} - T_{\mathrm{pred},i}\right)^{2}}    (Eq. 1)
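The sliding/growing retraining scheme and the relative RMS evaluation of Eq. 1 can be sketched like this. `fit` and `predict` stand for any learner; the least-squares learner in the test below is illustrative, not the ARD model actually used.

```python
import numpy as np

def simulate_retraining(X, y, fit, predict, K=20, window=None):
    """Retrain a model every K heats, either on all heats observed so far
    (window=None, the 'growing' variant) or on the most recent `window`
    heats (the 'sliding' variant), and predict the next K heats with the
    currently trained model."""
    preds = np.full(len(y), np.nan)
    start = window if window is not None else K
    for t in range(start, len(y), K):
        lo = 0 if window is None else t - window
        model = fit(X[lo:t], y[lo:t])
        preds[t:t + K] = predict(model, X[t:t + K])
    return preds

def rel_rms(y, preds, baseline_rms):
    """RMS error in the spirit of Eq. 1, expressed as a percentage of a
    baseline RMS error (100% = baseline quality)."""
    m = ~np.isnan(preds)
    rms = np.sqrt(np.mean((y[m] - preds[m]) ** 2))
    return 100.0 * rms / baseline_rms
```

On data with a slow drift, a sliding window tracks the current process state, while a growing model averages over old and new regimes; this is exactly the ranking behavior discussed for Figure 2.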

For comparison, the same RMS error has been calculated for the recalculated temperature T_recalc, which is available in the L2 system and forms the baseline for the offline simulation. The results for converter 2 are shown in Figure 2. The data basis was about 5,800 heats in total. The calculated RMS errors are given as relative values, where 100% corresponds to the RMS temperature error between real and recalculated temperatures over all 5,800 heats. The predictions start after 1,000 heats for the "sliding 1000 heats" and the "growing" model; the other sliding models start later, when enough training data have been collected. It is apparent that the models rank differently over time. For example, the "sliding 1000 heats" model yields better predictions than the "growing" model for heats 1800 – 1899, whereas most of the time the growing model performs best. This can indicate a shift or drift in the underlying BOF process in the heats between 0 and 900, which the "sliding 1000 heats" model becomes indifferent to once those data become too old for its training set. After about 2,500 heats, the DdPM prediction models compare favorably to the established model (T_recalc).


Fig. 2: Relative RMS error of four DdPM models and T_recalc, calculated for 100 heats each; 5800 heats of converter 2

Having multiple models available raises the question of which model to pick for predicting the temperature of the current heat. To address this, the predictive variance of the Bayesian linear regression models has been used. One possible strategy is to pick the model that yields the lowest predictive variance; the predictive variance measures how well the training data were fitted and how well the test data match the training data. Another strategy is to calculate a weighted average of the different predictions, often referred to as "model averaging" in the literature [10]. For the data from converter 2, the weighted average performs slightly better; the results are shown in Figure 3. After 500 heats, the combined model outperforms the model behind T_recalc on every test set. The data used have been restricted to an absolute difference of less than 50 K between T_real and T_recalc, but no similar filter has been applied to the predictions.
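The paper does not specify the weighting scheme used for the averaging; a common choice that is consistent with the available predictive variances is inverse-variance weighting, sketched here:

```python
import numpy as np

def combine_predictions(means, variances):
    """Weighted average of several model predictions, with weights
    proportional to the inverse predictive variance, so that more
    confident models contribute more. Also returns the variance of the
    combined prediction (1 / sum of inverse variances)."""
    means = np.asarray(means, dtype=float)
    inv_var = 1.0 / np.asarray(variances, dtype=float)
    weights = inv_var / inv_var.sum()
    return float(weights @ means), float(1.0 / inv_var.sum())
```

For example, two temperature predictions of 1650 (variance 1) and 1670 (variance 4) combine to 1654, i.e. much closer to the more confident model.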

Fig. 3: Relative RMS error of the weighted average predictions from the four previous models; 5800 heats of converter 2


RESULTS

ArcelorMittal Gent has provided large amounts of heat data for converters 2 and 3 for the period 01/2015 – 02/2016. These data served as the evaluation basis for the data-driven prediction calculations. The ΔT_RMS errors of the predicted temperature deviations (ΔT = T_real − T_pred) were used as a measure of prediction quality. To compare the methods, the reference value of the established model applications was set to 100%. Table 1 lists, as an example, the relative ΔT_RMS error for converter 2 obtained by the Data-driven Prediction Models on the historical data. A distinction is made between model applications with process data available until the start-of-blow (such as amount and analysis of hot metal) and applications using information available at end-of-blow (such as the amount of oxygen used in the process).

Table 1: Converter 2; overall statistics for all but the first 3000 heats (which are used for learning), i.e. the 2800 heats where all models yielded predictions

Mode                                                    | Basis                    | Rel. ΔT_RMS error (%) | Rel. ΔT_RMS error, DdPM model "weighted average" (%)
Latest model calculation (information at ignition, SOB) | ΔT = T_real − T_pred,SOB | 100                   | 90.8
Recalculation mode (information after end-of-blow, EOB) | ΔT = T_real − T_pred,EOB | 100                   | 90.5

In both cases, an improvement in prediction accuracy of approx. 9% could be achieved. Although these results are only an indication, the experience gained with this broad data base shows that data-driven methods for the prediction of converter variables are a highly competitive approach and can support and supplement established models.

OUTLOOK

It has been shown that the application of sliding learning and the inclusion of the latest heat information is very useful. Therefore, an online model is proposed that receives the necessary data as soon as possible, so that the model can be adapted in a timely manner. This will require validation of each of the models during online learning. The machine learning algorithms used already provide confidence intervals, which can be used to accept or reject the current models. Comparison with the results of the last model applications can further increase the confidence in the current output: if it is similar, the current prediction will be accepted; otherwise the model will be rejected. In the extreme case where all candidate models are rejected, the L2 system will be used as the fallback. Efficiency and flexibility are among the strengths of data-driven prediction models. One challenge of this approach, however, is its limited transparency: the relationships between targets and influencing variables that the learning algorithms uncover are often not clear and comprehensible, especially for complex non-linear models such as SVM and GP. Hence, well-conceived metallurgical and thermodynamic modeling certainly remains advantageous, as it allows extrapolation, which is important if working points of the past have to be changed, e.g. by using new additions or by aiming at other slag compositions. The advantage of the Data-driven Prediction Model is that, after a learning period, it can improve the existing model and adapt itself quickly, without human intervention.
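The accept/reject logic with an L2 fallback described above could be sketched as follows. The thresholds and the comparison against the last accepted prediction are illustrative assumptions, not values from the paper.

```python
def select_prediction(candidates, last_accepted, l2_prediction,
                      max_std=15.0, max_jump=25.0):
    """Pick the most confident DdPM candidate whose confidence interval
    is narrow enough (std <= max_std) and whose prediction does not
    deviate too much from the last accepted one (|jump| <= max_jump,
    in K). If every candidate is rejected, fall back to the L2 model.

    candidates: list of (prediction, predictive_std) tuples."""
    accepted = [(std, pred) for pred, std in candidates
                if std <= max_std and abs(pred - last_accepted) <= max_jump]
    if accepted:
        return min(accepted)[1]  # lowest predictive std among survivors
    return l2_prediction
```

The same per-model standard deviations could instead feed the weighted-average combination; the hard accept/reject shown here is the simpler of the two strategies.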
For the online application of data-driven modeling at ArcelorMittal Gent, a combination of both approaches is therefore intended. The intended concept also applies the data-driven prediction to the error of the established prediction model. Preliminary simulations indicate potential for further improvement. This approach is illustrated in Fig. 4.

Fig. 4: Concept of the integration of the DdPM software into the ArcelorMittal Gent L2 system
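Applying the data-driven prediction to the error of the established model amounts to learning the residual ΔT = T_real − T_recalc and adding it back to the L2 output. This is a sketch of the concept, not the plant implementation; `fit` and `predict` stand for any learner.

```python
import numpy as np

def fit_residual_model(X, T_real, T_recalc, fit):
    """Train the data-driven model on the error of the established L2
    prediction instead of on the temperature itself."""
    return fit(X, T_real - T_recalc)

def corrected_prediction(model, X, T_recalc, predict):
    """Final prediction = established model output + learned correction."""
    return T_recalc + predict(model, X)
```

The attraction of this setup is that the metallurgical model keeps providing the physically grounded baseline (and extrapolation ability), while the DdPM only has to learn the systematic part of its error.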


Currently, the required DdPM software is being implemented at ArcelorMittal Gent and will subsequently be tested. Based on the extensive offline simulations carried out in advance, an increase in the already very good prediction performance is expected. Furthermore, the prediction of carbon and phosphorus in the steel after end-of-blow will be investigated in future steps with similar approaches.

NOMENCLATURE

T_real      Measured steel temperature at end-of-blow
T_recalc    Steel temperature at end-of-blow predicted by the ArcelorMittal Gent knowledge-based model
T_pred,SOB  Steel temperature at end-of-blow predicted by the DdPM based on process information available at start-of-blow
T_pred,EOB  Steel temperature at end-of-blow predicted by the DdPM based on process information available at end-of-blow
K           Step size for model retraining
N           Number of heats used for learning
n           Running index

REFERENCES

1. N. Uebber; H. J. Odenthal; J. Schlüter; H. Blom; K. Morik: A novel data-driven prediction model for BOF endpoint, JSI Paris 2012, 30th Intern. Steel Industry Conf., 18.-19.12.2012, Paris (F), pp. 28-29.
2. H. J. Odenthal; N. Uebber; J. Schlüter; M. Löpke; K. Morik; H. Blom: Data-driven Prediction Model for the BOF process – Application of Data Mining at AG der Dillinger Hüttenwerke, Germany, MPT International, Vol. 5, 2014, pp. 30-36.
3. M. E. Tipping: Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research, 2001, pp. 211-244.
4. C. E. Rasmussen; C. Williams: Gaussian Processes for Machine Learning, MIT Press, 2006.
5. I. Guyon; A. Elisseeff: An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, 2003, pp. 1157-1182.
6. H. Drucker; C. J. C. Burges; L. Kaufman; A. Smola; V. Vapnik: Support vector regression machines, Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA, 1997.
7. Y. LeCun; Y. Bengio; G. Hinton: Deep learning, Nature 521, 2015, pp. 436-444.
8. L. Breiman; J. Friedman; C. J. Stone; R. A. Olshen: Classification and Regression Trees, Wadsworth Statistics/Probability, 1984.
9. L. Breiman: Random Forests, Machine Learning, Vol. 45, Issue 1, 2001.
10. T. Hastie; R. Tibshirani; J. Friedman: The Elements of Statistical Learning, 2nd ed., Springer, 2009.
