Temperature distribution prediction in control cooling process with recurrent neural network for variable-velocity hot rolling strips

Control cooling is an essential method for controlling the microstructure and mechanical properties of hot rolling strips. It is therefore vital to achieve high-precision temperature distribution prediction and control in the cooling process to ensure stable industrial production. In this paper, a traditional mechanism model based on the finite-difference method, combined with an online cycle velocity calculation strategy, was introduced as a baseline method for estimating the temperature distribution. However, because of its calculation time, variable-velocity rolling makes it difficult for this model to rapidly recalculate the temperature and modify the water distribution of all segments in the cooling zone. Herein, a temperature distribution prediction method based on a recurrent neural network was proposed that fully considers the dynamic characteristics of variable-velocity rolling. The prediction performance of models with different recurrent cells and numbers of timesteps was evaluated. The results indicated that the proposed model could realize temperature distribution prediction, and the model based on bi-LSTM with 48 timesteps had the highest coefficient of determination (0.976), the lowest root mean square error (8.03), and a mean absolute error of 5.7. Furthermore, compared with the baseline model, the proposed model had a lower computational cost, making it suitable for industrial application by providing real-time temperature distribution prediction.


Introduction
As an important fundamental raw material, hot rolling strips are widely used in industry, including the automobile, shipbuilding, and other fields. In hot rolling strip making, the control cooling process on the run-out table is vital for achieving excellent and stable steel product performance [1]. After finishing rolling, the strip is cooled by water from the start cooling temperature to the target temperature at a given cooling rate before being coiled, which refines the grain size and improves strength and toughness [2,3]. In the control cooling process, high-precision temperature distribution prediction and cooling path control are key factors for ensuring the mechanical properties [4].
In the cooling process, as the strip travels from the finishing mill to the down coiler, the water distribution must be adjusted according to the variable rolling velocity to meet the target coil temperature (CT). Water cooling of the strip involves a complicated heat exchange process affected by many factors, such as strip temperature, water flow, strip thickness, chemical composition, and strip velocity [5,6]. Once the strip enters the cooling zone, strip thickness, width, and composition vary little along the strip length, so the start cooling temperature and the velocity are the main factors influencing temperature control. Currently, to increase productivity and decrease the heat loss caused by delay of the transfer bar, finishing rolling generally adopts a variable-velocity rolling strategy, which causes the strip velocity to vary over a wide range [7]. Meanwhile, rolling velocity adjustment is also used to control the finishing mill delivery temperature (FDT, which here is also the start cooling temperature): the rolling acceleration/deceleration and velocity are changed dynamically to meet the target FDT, which also significantly influences coil temperature control accuracy [8]. Under variable-velocity rolling, the water distribution is expressed as the open/closed status of cooling headers, and header flow is the only means of adjustment to guarantee the target coil temperature. Therefore, precise water distribution calculation based on accurate velocity prediction is crucial for cooling path and temperature distribution control in the cooling process [9].
To build a high-accuracy temperature distribution prediction and control model, many researchers have carried out related studies. So far, three main kinds of models have been developed and applied. The index model is an important scheme for dealing with convective heat transfer; it treats convection between the cooling water and the strip surface as the main heat exchange in the cooling process [10,11]. However, because of its simplifications and its neglect of the temperature gradient in the thickness direction, the index model performs poorly, especially for heavy plates. The second kind is the statistical model, built on statistical regularities [12,13]. In statistical modeling, based on the physical mechanism of the cooling process, the main influencing parameters were identified and determined by statistical regression methods [14][15][16][17]. To improve control precision, the influencing parameters were further classified into categories by steel grade, thickness, or other boundary conditions, each category having its own set of model parameters. However, the temperature control accuracy of this method was limited by the number of categories. Some advanced methods, such as fuzzy control and model predictive control, have also been used to improve control accuracy [12,18,19], but they rely heavily on the statistical model or the index model. The third kind of model is built on the finite difference method (FDM) and heat transfer theory: the strip and time are divided into nodes and a finite number of steps, and the differential equations are solved by approximating derivatives. Thus, the spatial and temporal temperature distribution of the strip can be obtained [20,21]. In such models, surface nodes exchange heat by radiation and convection, while inner nodes exchange heat by conduction.
Because the FDM fully considers the temperature gradient in the thickness direction, control accuracy is clearly improved, especially for heavy-gauge strips. At present, the most widely used approach combines FDM and statistical methods: the former is used to build the temperature prediction model, and the latter to find optimal model parameters.
As mentioned above, strip velocity significantly influences the cooling process. To predict velocity precisely during cooling, the time-velocity-distance (TVD) method and real-time modification are commonly used. Li proposed a velocity control strategy for each strip element based on TVD [22], in which an online cyclic modification and calculation strategy was designed to minimize temperature fluctuation in variable-velocity rolling. However, the time consumed by modification and calculation for each segment exceeds the requirement of the online system. Li therefore proposed an improved method that realizes online monitoring and control of cooling temperature by modifying at setpoints, which decreases the computation time [20]. However, because of the irregular changes in hot rolling, more frequent temperature modification benefits temperature control. Thus, a new temperature prediction method is needed that can make predictions quickly and keep pace with more frequent velocity modification.
Although the temperature prediction methods and models introduced above are still accepted and widely used because of their practicality and stability, it is difficult for them to further improve the control level and satisfy the requirement mentioned above. Therefore, models based on machine learning have been developed and show promise for high control accuracy.
As one of the well-known instance-based learning methods, KNN has been widely applied in many fields, including the control cooling process. Zheng adopted a new approach to build a model for the accelerated cooling process using IIR-KNN [23]; based on the k selected similar cases, locally linear reconstruction is applied to determine the best output parameters for the current plate. Zhang proposed a variable scale grid model for temperature self-adaptive control, which builds a multi-dimensional space in which each dimension is determined by the effect of one factor on heat transfer [24]. Every plate can find a corresponding point in this space according to its own process conditions, which is used to correct the temperature calculation. Neural networks are another of the most popular algorithms in the steel industry, offering strong nonlinearity and adaptive information processing. Some researchers have successfully adopted ANNs to predict the heat transfer coefficient, temperature, or water flow [25][26][27]. Valentina introduced an ANN to find correlations between model parameters and process variables [28]. Xing developed a hybrid intelligent identification model combining RBF neural networks, CBR, and fuzzy logic reasoning, which can contribute greatly to improving coil temperature precision through accurate identification [29]. However, the above-mentioned studies ignored variable strip velocity and the temperature distribution in the cooling process, taking only constant velocity and the final cooling temperature into consideration. Consequently, these methods are not suitable for variable-velocity rolling, cannot predict the temperature distribution, and therefore cannot keep the fluctuation of mechanical properties within a narrow range.
In view of the above problems, and considering both the ability of recurrent networks to handle sequences and the dynamic characteristics of variable-velocity rolling, a novel temperature prediction model based on a recurrent neural network was proposed and constructed, which can accurately predict the temperature distribution under variable-velocity rolling. To improve precision and robustness, the Pauta criterion and the isolation forest algorithm were used to preprocess the industrial data collected from the control cooling process. Because of the particular nature of the water distribution feature, a dedicated feature processing method was introduced. The loss function was modified to improve the training efficiency and accuracy of the model. In addition, different recurrent cells and input sequence lengths were tested to determine the optimal recurrent neural network model.

Description of cooling process
The schematic diagram in Fig. 1 shows the control cooling process of hot rolling strips on the run-out table. After delivery from the finishing mill, the strip is sent onto the run-out table, cooled in the long cooling zone that follows, and finally coiled by the down coiler. The cooling zone consists of ultrafast cooling (UFC) and laminar cooling. The UFC system lies between the finishing mill and the laminar cooling; it contains 40 top headers and 40 bottom headers divided into 4 groups. The UFC header flow and pressure can be adjusted steplessly for different cooling abilities, within ranges of 50 ~ 300 m³/h and 0.3 ~ 1.0 MPa, respectively. Laminar cooling consists of 72 U-type top headers and 72 straight bottom headers divided into 16 groups; the first 14 groups form the main cooling section, and the last two groups are vernier cooling for feedback control. In the cooling process, the number of open cooling headers and the water flow of each header are taken as control variables to adjust the temperature distribution of the strip.

Thermal dynamic model
Because of the high rolling velocity and the uniform cooling capability of water convection, heat transfer in the thickness direction is much greater than in the width and rolling directions, so heat dissipation along the length and width directions can be ignored. The latent heat of phase transformation can be included as a correction to the specific heat. The heat transfer equation is then simply expressed as Eq. (1), and the boundary conditions on the top and bottom surfaces can be described in the form of Newton's law of convection, Eq. (2), applied equivalently to water cooling and air cooling. Here ρ is the density of the strip steel in kg/m³, $c_p$ is the specific heat in J/(kg K), λ is the thermal conductivity in W/(m K), T is the strip temperature in K, τ is the heat transfer time in s, x is the thickness coordinate, $T_m$ is the cooling medium temperature in K, and α is the heat transfer coefficient in W/(m² K).
The finite difference method was used to model internal heat transfer by conduction, and finite difference calculations track the temperature of each node as the piece moves from the FDT pyrometers to the CT pyrometers. As shown in Fig. 2, half of the strip thickness was modeled. With the boundary conditions, Eq. (1) for the nodes in the thickness direction can be expressed in explicit difference form as Eq. (3). For the air cooling zone boundary condition, the equivalent heat transfer coefficient is given by Eq. (5), where ε is the emissivity of the strip and K is the Stefan-Boltzmann constant in W/(m² K⁴).
For the water cooling zone boundary condition, the main factors affecting convective heat transfer are water flow and temperature, strip velocity, and thickness; in the UFC process, water pressure is also a key factor. The heat transfer coefficients of water cooling in the UFC system and laminar cooling are therefore calculated by Eqs. (6) and (7), respectively, where $C_w$ is the correction coefficient of strip width, $A_1$-$A_6$ are model coefficients, Q is the total water flow in m³/h, $T_w$ is the measured water temperature in K, P is the measured cooling water pressure in MPa, V is the measured strip velocity in m/s, $T_o$ is the reference cooling water temperature in K, $P_o$ is the reference water pressure in MPa, $V_o$ is the base velocity in m/s, B is the width of the cooling bank in m, L is the effective cooling length in m, and H is the strip thickness in m.
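Equations (1), (2), and (5) are cited in this section, but their bodies are not reproduced in this text. For orientation, a standard one-dimensional form consistent with the symbols defined above is sketched below. It is an assumed textbook form, not necessarily the authors' exact notation; the half thickness h and the mid-plane symmetry condition at x = 0 are introduced here to match the half-thickness model of Fig. 2.

```latex
% Eq. (1), assumed form: 1-D transient conduction through the thickness
\rho c_p \frac{\partial T}{\partial \tau} = \frac{\partial}{\partial x}\left(\lambda \frac{\partial T}{\partial x}\right), \qquad 0 < x < h
% Eq. (2), assumed form: Newton convection at the cooled surface x = h, symmetry at the mid-plane x = 0
-\lambda \left.\frac{\partial T}{\partial x}\right|_{x=h} = \alpha \left(T\big|_{x=h} - T_m\right), \qquad \left.\frac{\partial T}{\partial x}\right|_{x=0} = 0
% Eq. (5), assumed form: radiation written as an equivalent heat transfer coefficient
\alpha_{\mathrm{air}} = \varepsilon K \,\frac{T^4 - T_m^4}{T - T_m} = \varepsilon K \left(T^2 + T_m^2\right)\left(T + T_m\right)
```

The factorization in the last line follows from $T^4 - T_m^4 = (T - T_m)(T + T_m)(T^2 + T_m^2)$, which is why radiation can be folded into the same Newton-type boundary condition as water cooling.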
Based on the above equations, the through-thickness temperature field can be obtained once the parameters in the heat transfer coefficient equations are determined. As Eqs. (6) and (7) show, water flow, strip temperature, water temperature, and pressure are relatively easy to obtain; the difficulty lies in obtaining an accurate strip velocity during variable-velocity rolling.
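To make the explicit finite-difference idea concrete, the following is a minimal sketch, not the authors' Eq. (3). It assumes constant placeholder properties (rho, cp, lam are illustrative values, not taken from the paper), a uniform half-thickness grid with the mid-plane at node 0 and the cooled surface at the last node, and a given heat transfer coefficient per zone, because the water-cooling correlations of Eqs. (6) and (7) are not reproduced here. The helper names `explicit_fdm_step` and `cool_segment` are hypothetical.

```python
import numpy as np

STEFAN_BOLTZMANN = 5.670374419e-8  # K in Eq. (5), W/(m^2 K^4)


def air_cooling_alpha(T_s, T_m, eps=0.8):
    """Equivalent radiative coefficient eps*K*(T^2 + Tm^2)*(T + Tm); eps is an assumed value."""
    return eps * STEFAN_BOLTZMANN * (T_s**2 + T_m**2) * (T_s + T_m)


def explicit_fdm_step(T, alpha, T_m, dx, dt, rho=7800.0, cp=650.0, lam=30.0):
    """Advance half-thickness node temperatures (K) by one explicit time step.

    T[0] is the mid-plane (symmetry) node and T[-1] the cooled surface node.
    rho, cp, lam are placeholder constant properties, not values from the paper.
    """
    fo = lam * dt / (rho * cp * dx**2)  # grid Fourier number
    bi = alpha * dx / lam               # grid Biot number
    if fo * (1.0 + bi) > 0.5:
        raise ValueError("explicit scheme unstable; reduce dt or coarsen dx")
    T_new = T.copy()
    # Interior nodes: central difference of the conduction term
    T_new[1:-1] = T[1:-1] + fo * (T[2:] - 2.0 * T[1:-1] + T[:-2])
    # Mid-plane: zero-gradient symmetry, half control volume
    T_new[0] = T[0] + 2.0 * fo * (T[1] - T[0])
    # Surface: half control volume, conduction in and Newton convection out
    T_new[-1] = T[-1] + 2.0 * fo * ((T[-2] - T[-1]) - bi * (T[-1] - T_m))
    return T_new


def cool_segment(T0, zones, dx, dt):
    """Apply (alpha, T_m, duration_s) zones in order; alpha=None means air cooling."""
    T = np.asarray(T0, dtype=float).copy()
    for alpha, T_m, duration in zones:
        for _ in range(int(round(duration / dt))):
            a = air_cooling_alpha(T[-1], T_m) if alpha is None else alpha
            T = explicit_fdm_step(T, a, T_m, dx, dt)
    return T


# Usage: 10 mm strip (5 mm half thickness, 11 nodes), 2 s in air, then 5 s of water cooling
T_end = cool_segment(np.full(11, 1133.0), [(None, 303.0, 2.0), (2000.0, 303.0, 5.0)],
                     dx=0.0005, dt=0.01)
```

In a full model, each zone's duration would come from the zone length divided by the segment velocity, which is exactly why the velocity prediction discussed next matters. The stability check reflects the usual explicit-scheme limit for a convective boundary node, Fo(1 + Bi) ≤ 0.5.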

The online cycle velocity calculation strategy
In the rolling process, strip velocity, together with inter-stand cooling, is regulated so that the measured strip temperature meets the target FDT. Generally, for FDT control the strip goes through low-velocity threading, first acceleration, high acceleration, and tail-out deceleration, as shown by the typical time-velocity-distance curve (TVD curve) in Fig. 3. The TVD curve represents the relationship among rolling velocity, time, and rolling distance, from which the entrance and exit velocities of each segment in the different micro cooling zones can be obtained. Meanwhile, the rolling acceleration or deceleration is changed when the measured FDT exceeds the allowable tolerance. Because of variable-velocity rolling, the water distribution must be adjusted dynamically to guarantee the final coil temperature. Accurate strip velocity prediction is thus the basis for temperature prediction and water distribution control, and to achieve it an online cycle velocity calculation algorithm was introduced. To achieve high-accuracy temperature control along the whole strip length, the strip is divided into a number of fixed-length segments, each controlled separately. The velocity curve and water-cooling time of each segment passing through the run-out table must be predicted before the segment enters the cooling section. In each rolling stage, the change in strip velocity has certain characteristics. Accordingly, the entire strip length was divided into five parts based on the characteristics of the TVD curve: low-velocity threading, low acceleration before the strip is coiled, high acceleration to the maximum running speed, tail-out deceleration, and the remaining uncoiled length, named $S_1$ ~ $S_5$ for convenience. The online cycle velocity calculation algorithm is shown in Fig. 4.
First, the length $l_i$ and velocity $v_i$ of each part $S_i$ (i = 1, ..., 5) were calculated from the preset finishing threading velocity and acceleration/deceleration using the kinematics formula (Eq. (8)), which determines the overall TVD curve of the strip. Second, the running velocity of each segment in the cooling section was calculated from the overall TVD curve; here, the running velocity of segment i is described by its entrance and exit velocities $V_{i,j}$ and $V'_{i,j}$ at the jth micro cooling section. Third, the predicted velocities were used to calculate the temperature distribution on the run-out table, and the water distribution was adjusted to satisfy the target temperature. Whenever a new segment arrives at the FDT pyrometer, the velocity calculation algorithm is triggered, and the above three steps are repeated based on the real-time rolling velocity and acceleration/deceleration. After re-calculation, the velocities of all segments in the cooling zone are modified. Fig. 5 shows the calculated and actual velocity curves of 4.25 mm SPA-H and 13.75 mm Q355B strips, respectively. The results show that the velocity calculation algorithm can accurately predict the strip velocity.

Existing problems
With the above thermal dynamic model and online cycle velocity calculation strategy, the desired temperature control in the cooling process can be achieved by successively re-calculating the temperature distribution and modifying the water distribution. The more often the TVD curve is updated, the higher the accuracy of velocity and temperature prediction. However, temperature re-calculation and modification would then occupy excessive computational resources and could not meet the requirements of the online model. Although an improved method has been researched, it still works by reducing the number of velocity and temperature updates, which is detrimental to temperature control in the cooling process. Therefore, it is necessary to build a less computationally demanding model that can still predict the temperature under variable-velocity rolling, so as to enhance the capability of dynamic control.
Fig. 3 The typical time-velocity-distance curve

Fundamental of recurrent neural network
A recurrent neural network (RNN) is a type of neural network that takes sequence data as input and performs recursion along the evolution direction of the sequence; RNNs were developed to address the loss of earlier information in traditional feed-forward and feedback neural networks [30]. As shown in Fig. 6, given an input sequence $X = x_1, x_2, \ldots, x_t$, a standard RNN computes the hidden vector sequence $S = s_1, s_2, \ldots, s_t$ and the output vector sequence $Y = y_1, y_2, \ldots, y_t$ by iterating the following equations over the time steps:
$$y_t = g\left(V s_t\right) \tag{9}$$
$$s_t = f\left(U x_t + W s_{t-1}\right) \tag{10}$$
where V is the hidden-to-output weight matrix, U is the input-to-hidden weight matrix, and W is the hidden-to-hidden weight matrix between consecutive steps. g and f are activation functions, and f is usually an element-wise sigmoid-type function. As Fig. 6 shows, the hidden layer values depend not only on the current input $x_t$ but also on the previous hidden state; in other words, the RNN has memory of previous inputs. Because of this "memory," RNNs and their variants perform well in sequence tasks and are widely used in text recognition, speech recognition, real-time translation, and image identification [31][32][33].
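The recursion in Eqs. (9) and (10) can be written out directly. The following NumPy sketch is illustrative only: it uses tanh for f (a sigmoid-shaped function) and the identity for g, a common choice for regression outputs, and the name `rnn_forward` is hypothetical.

```python
import numpy as np


def rnn_forward(X, U, W, V, s0=None):
    """Elman-style RNN over a sequence X of shape (t, d_in).

    Implements s_t = f(U x_t + W s_{t-1}) and y_t = g(V s_t)
    with f = tanh and g = identity.
    """
    hidden = W.shape[0]
    s = np.zeros(hidden) if s0 is None else s0
    states, outputs = [], []
    for x_t in X:                     # iterate along the sequence
        s = np.tanh(U @ x_t + W @ s)  # hidden state carries memory of earlier inputs
        states.append(s)
        outputs.append(V @ s)
    return np.stack(states), np.stack(outputs)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 timesteps, 3 input features
U, W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(1, 4))
S, Y = rnn_forward(X, U, W, V)
```

Because W feeds each state into the next, perturbing the first input changes the last output; setting W to zero removes that memory and reduces the network to a per-step feed-forward map.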
Subsequently, the long short-term memory (LSTM) unit [34,35] and its simplified version, the gated recurrent unit (GRU) [36,37], were proposed to overcome vanishing gradients and capture long-term dependencies. As can be seen in Fig. 7, LSTM and GRU have more complex structures than the standard RNN. In a standard RNN, the repeating module has a very simple structure, a single tanh layer. In LSTM, the repeating module contains, besides a tanh layer, three gates: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. To capture long-term dependencies, at each time step the cell state $c_t$ is updated incrementally under the control of $f_t$ and $i_t$: $f_t$ applies a sigmoid activation to the input vector $x_t$ and the prior hidden state $h_{t-1}$ to decide how much memory is kept, while $i_t$ controls how much current information is remembered. Finally, the hidden state $h_t$ is calculated from $o_t$ and $c_t$ with tanh activation. GRU simplifies LSTM to greatly reduce the number of learnable parameters. As shown in Fig. 7c, unlike LSTM, $h_t$ is determined by an update gate $z_t$ and a reset gate $r_t$, and there is no separate cell state. The update gate controls how much information from the previous hidden state carries over to the current one, while the reset gate decides how much of the previous hidden state is forgotten [37].
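The gate logic described above can be sketched as single-step functions. This follows one common formulation (stacked gate weights, reset gate applied before the candidate's recurrent product); libraries differ in gate ordering and in which of $z_t$ or $1 - z_t$ multiplies the previous state, so the layout here is an assumption rather than the exact Keras internals used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step. Wx: (4H, D), Wh: (4H, H), b: (4H,), gate order [f, i, o, g]."""
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b
    f = sigmoid(z[:H])          # forget gate f_t
    i = sigmoid(z[H:2 * H])     # input gate i_t
    o = sigmoid(z[2 * H:3 * H]) # output gate o_t
    g = np.tanh(z[3 * H:])      # candidate cell content
    c = f * c_prev + i * g      # cell state c_t: partly kept, partly rewritten
    h = o * np.tanh(c)          # hidden state h_t
    return h, c


def gru_step(x, h_prev, Wx, Wh, b):
    """One GRU step. Wx: (3H, D), Wh: (3H, H), b: (3H,), order [z, r, candidate]."""
    H = h_prev.shape[0]
    zx = Wx @ x + b
    zr = zx[:2 * H] + Wh[:2 * H] @ h_prev
    z = sigmoid(zr[:H])  # update gate z_t
    r = sigmoid(zr[H:])  # reset gate r_t
    n = np.tanh(zx[2 * H:] + Wh[2 * H:] @ (r * h_prev))  # candidate hidden state
    return (1.0 - z) * h_prev + z * n  # no separate cell state
```

With all weights and biases at zero, every gate sits at 0.5 and the candidates vanish, so the LSTM halves its cell state and the GRU halves its hidden state; this is a quick way to check the wiring.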

Temperature distribution prediction based on recurrent neural networks
As mentioned in Sect. 3, to realize high-precision temperature control along the entire strip length, the strip is divided into a number of segments, each controlled separately. Correspondingly, the cooling zone is divided into several micro cooling zones of the same length as a segment. In the cooling process, a segment passes through all micro cooling zones in sequence from the FDT pyrometer to the CT pyrometer; as the temperature drop accumulates, the strip temperature finally reaches the target value. Strip temperature calculation across the micro cooling zones therefore has the character of sequence data, so an RNN can be adopted to predict the cooling temperature in the variable-velocity process based on existing models. Taking one temperature calculation in a micro cooling zone as one timestep, the proposed model is illustrated in Fig. 8. In addition to basic strip parameters such as thickness and width, which hardly change, the strip velocity and water distribution in each micro cooling zone are arranged as sequential input values, and the temperature distribution sequence is the output. Note that T, V, and F in Fig. 8 denote the temperature, velocity, and water distribution in each cooling zone.
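One way to arrange a segment into the (timesteps, features) layout that recurrent layers expect is sketched below. The exact feature set is an assumption: strip constants such as thickness and width (and possibly FDT) are repeated at every timestep and concatenated with the per-zone velocity and water distribution; `build_sequence` is a hypothetical helper.

```python
import numpy as np


def build_sequence(static_feats, velocities, water):
    """Stack one segment into a (timesteps, features) array.

    static_feats: (S,) strip constants, e.g. thickness, width (hypothetical choice)
    velocities:   (n,) strip speed in each micro cooling zone
    water:        (n, K) water-distribution features in each micro cooling zone
    """
    n = velocities.shape[0]
    static_rep = np.tile(static_feats, (n, 1))  # same value at every timestep
    return np.column_stack([static_rep, velocities, water])


# Usage: a 48-timestep segment with 2 static features and 3 water features per zone
seq = build_sequence(np.array([0.0118, 1.5]), np.linspace(8.0, 10.0, 48), np.zeros((48, 3)))
batch = np.stack([seq, seq])  # (samples, timesteps, features)
```

Repeating the static values is simple and works with any recurrent cell; an alternative design would feed them once through a separate dense branch.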
In a hot rolling strip line, the automation system records almost all process parameters in databases or production logs, mainly including real-time data and model setup data. Production data of 12,998 coils relevant to the control cooling process on the run-out table were collected over two months. First, because pyrometer measurements are unstable at the head and tail ends of each strip, data from these ends were removed. The remaining data were then randomly sampled and combined to obtain 99,000 experimental strip segment samples, which were used to develop the RNN-based temperature distribution prediction model. Nineteen key variables, including the temperature distribution, were selected for neural network modeling. Because intermediate temperatures in the cooling zone can hardly be measured, and only the initial temperature (FDT) and the final cooling temperature (CT) of the temperature distribution are measured, the other temperature distribution values were padded with values re-calculated by the traditional model from the measured temperature data. The dataset is described in Table 1 and Fig. 9; the fractal dimension visualization diagrams in Fig. 9 show the distributions of significant variables and output data. The large amount of data is expected to give the developed model strong robustness. Note that the total flow is used here for anomaly detection.
Water distribution in each cooling zone represents the open/closed status of the cooling headers and the header water flow. Because of their different cooling mechanisms, UFC spray headers and laminar spray headers have different cooling efficiencies even at the same water flow, and the water flow of a UFC spray header can be adjusted over a wide range. To distinguish UFC headers, laminar main cooling headers, and vernier headers, a cooling efficiency E was introduced for each type of spray header. Taking one opened top header as an example, the heat transfer coefficient equations, Eqs. (6) ~ (7), can be transformed into Eq. (10). After the cooling efficiency of each type of spray header is determined, the water distribution in a micro cooling zone can be represented as shown in Fig. 10. In addition, because the number of headers differs among micro cooling zones, the water distribution sequence is padded with 0, the same value used for closed spray headers.
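The encoding idea, efficiency-weighted flow for open headers with zeros for closed headers and padding, can be sketched as follows. The efficiency values in `EFFICIENCY` are hypothetical placeholders (the paper determines E per header type via its Eq. (10), which is not reproduced here), and using the product E × flow as the per-header feature is an assumption.

```python
import numpy as np

# Hypothetical efficiency weights per header type, not values from the paper.
EFFICIENCY = {"ufc": 1.0, "laminar": 0.6, "vernier": 0.5}


def encode_zone(headers, max_headers):
    """Fixed-length water-distribution vector for one micro cooling zone.

    headers: list of (header_type, is_open, flow_m3h) for the headers in the zone.
    Open headers contribute E * flow; closed headers and padding slots stay 0.
    """
    vec = np.zeros(max_headers)
    for k, (kind, is_open, flow) in enumerate(headers):
        if is_open:
            vec[k] = EFFICIENCY[kind] * flow
    return vec


zone = encode_zone([("ufc", True, 100.0), ("laminar", False, 80.0), ("laminar", True, 50.0)], 4)
```

Because closed headers and missing headers both map to 0, zones with different header counts end up with vectors of the same length and can be stacked into one sequence.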

Industrial data processing
In general, original data collected from a hot rolling line contain abnormalities, noise, and outliers, which may influence modeling and lead to misleading predictions, so these data must be removed. First, records with missing values were removed directly. Then, noise was eliminated by the Pauta criterion, implemented as given in Eqs. (11)-(13) [38]; data whose absolute deviation from the mean exceeds the limit of three standard deviations are regarded as noise and removed. Eighteen key variables were treated in this way. In Eqs. (11)-(13), μ is the average of $x_i$ and σ is the standard deviation of the samples. The result of abnormal detection with the Pauta criterion is shown in Fig. 11. Because the Pauta criterion is effective only for normal or approximately normal distributions, the isolation forest was adopted to mine further anomalies. The isolation forest algorithm, also called iForest, was proposed by Liu, Ting, and Zhou [38]. Isolation forest is an efficient anomaly detection approach that assumes most points in the dataset are not anomalous and that anomalies are few and very different from the rest of the data. Owing to its linear time complexity and low memory requirement, isolation forest has been used in many scenarios. This paper applies iForest to eliminate outliers in the 19 features shown in Table 1. Two hyperparameters must be optimized for iForest: the number of isolation trees t and the sub-sampling size ψ. Herein, a grid search was used to select a suitable value for each parameter; the result is shown in Fig. 12. As Fig. 12 shows, the proportion of anomalies fluctuates widely for small t and stabilizes when t ≥ 100. In general, training time increases linearly with the number of isolation trees, and iForest converges well at t = 100, so more trees bring no benefit [38].
On the other hand, as the sub-sampling size ψ increases, the proportion gradually decreases; when ψ exceeds 256, the proportion is stable in the range 0.14 ~ 0.16. In summary, t = 100 and ψ = 256 were selected as the algorithm parameters. Given the generally low signal-to-noise ratio of the data, after the parameters were determined, the top 2% of the dataset ranked by anomaly score was removed as abnormal data.
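The two cleaning stages described above can be sketched as a single-pass 3σ (Pauta) filter followed by an iForest ranking that drops the most anomalous 2%. This is a minimal illustration using scikit-learn's `IsolationForest` (`n_estimators` corresponds to t and `max_samples` to ψ); the helper names are hypothetical, and applying the 3σ rule to every column at once is a simplification of the paper's per-variable treatment of 18 variables.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def pauta_mask(X):
    """Keep rows where every variable satisfies |x - mu| <= 3*sigma (single pass)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    return np.all(np.abs(X - mu) <= 3.0 * sigma, axis=1)


def iforest_mask(X, n_trees=100, psi=256, drop_frac=0.02, seed=0):
    """Keep rows outside the most anomalous drop_frac share, ranked by iForest score."""
    forest = IsolationForest(n_estimators=n_trees, max_samples=psi, random_state=seed)
    forest.fit(X)
    scores = forest.score_samples(X)  # lower score = more anomalous
    return scores > np.quantile(scores, drop_frac)


def clean(X):
    """Pauta filter first, then iForest ranking on the surviving rows."""
    X1 = X[pauta_mask(X)]
    return X1[iforest_mask(X1)]
```

A grid search like the one in Fig. 12 would simply loop `iforest_mask` (or the anomaly proportion it implies) over candidate values of `n_trees` and `psi`.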

Data normalization and division
Different parameters often have different dimensions and units (as shown in Table 1). To prevent variables with large magnitudes from dominating ("eating") those with small magnitudes, the input data must be normalized. This paper uses zero-mean normalization:
$$x_i^* = \frac{x_i - \mu}{\sigma}$$
where $x_i^*$, $x_i$, μ, and σ are the normalized data, the original data, the mean of all the data, and the standard deviation, respectively. After normalization, the dataset is split into training, validation, and testing sets in the proportions 80%, 10%, and 10%, respectively.
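A minimal sketch of the zero-mean normalization and the random 80/10/10 split, in the order described above; `zscore` and `split_80_10_10` are hypothetical helper names, and columns with zero variance would need special handling before dividing by σ.

```python
import numpy as np


def zscore(X):
    """Column-wise zero-mean, unit-variance scaling; returns the statistics too."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma


def split_80_10_10(n, seed=0):
    """Random index split into training, validation, and testing sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


rng = np.random.default_rng(1)
X = rng.normal(loc=[5.0, 100.0], scale=[2.0, 30.0], size=(1000, 2))
Xn, mu, sigma = zscore(X)
train_idx, val_idx, test_idx = split_80_10_10(len(Xn))
```

Keeping `mu` and `sigma` matters in deployment, because new online data must be scaled with the same statistics; computing them on the training split alone is a common variant that avoids leaking test information.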
The learnable parameters of the proposed model are trained with the root mean square propagation (RMSProp) algorithm to minimize the error between predicted and actual values. The mean square error loss function is defined as
$$L = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$
where $\hat{y}_i$ and $y_i$ are the predicted and actual temperature distributions in the cooling zone, respectively, and N is the number of samples in the dataset. Because only the start cooling temperature (FDT) and the final CT in the temperature distribution sequence can be measured, the other values are padded with re-calculated values from the traditional model. If the prediction error of the whole temperature distribution were used directly as the criterion for gradient descent, every prediction error in the sequence would be weighted equally, whereas the measured actual values deserve special emphasis. Thus, an additional weight was added, giving the improved loss function of Eq. (16). The weight of each error is derived from its distance to the measured actual temperatures, as in Eq. (17): the smaller the distance between a position in the temperature distribution and a measured temperature, the greater the weight of the corresponding loss. In Eq. (17), j is the serial number in the sequence, and n is the length of the input sequence.
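Eqs. (16) and (17) are not reproduced in this text, so the following is only an illustration of the weighting principle, not the authors' formula. It assumes measured temperatures at the first (FDT) and last (CT) positions of the sequence and gives each position a weight that decreases with its index distance to the nearer measured end; the weights are normalized to mean 1 so the loss stays on the same scale as plain MSE. `distance_weights` and `weighted_mse` are hypothetical names.

```python
import numpy as np


def distance_weights(n):
    """Hypothetical weights: largest at the measured ends (j = 1: FDT, j = n: CT)."""
    j = np.arange(1, n + 1)
    dist = np.minimum(j - 1, n - j)  # steps to the nearest measured temperature
    w = 1.0 / (1.0 + dist)
    return w / w.mean()  # keep the average weight at 1


def weighted_mse(y_pred, y_true):
    """y_pred, y_true: (N, n) temperature sequences; position-weighted squared error."""
    w = distance_weights(y_true.shape[1])
    return float(np.mean(w * (y_pred - y_true) ** 2))


print(distance_weights(5))  # weights for a 5-step sequence
```

The same expression can be written with tensor operations and passed to a deep learning framework as a custom loss; any monotonically decreasing function of the distance would satisfy the principle stated in the text.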

Prediction of temperature distribution
In this section, to build an optimal model, several experiments comparing recurrent cell types, sequence lengths, and hyperparameter settings were carried out and are briefly described. First, different types of recurrent cells were compared, i.e., standard RNN, LSTM, and GRU; as introduced in Sect. 4.2, recurrent cells differ significantly in how they extract temporal information via hidden states. Second, the influence of the input sequence length, named timesteps, was examined; the literature shows that vanishing gradients, which decay as the sequence length increases, cause information from earlier inputs to be lost [38]. The performance of the proposed method is evaluated by prediction accuracy and computation time. Prediction accuracy is measured by three metrics, root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination ($R^2$):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
where $\hat{y}_i$ and $y_i$ are the predicted and actual temperature distributions in the cooling zone, respectively, $\bar{y}$ is the mean of $y_i$, and N is the number of samples in the dataset. In addition, to evaluate whether the method is competitive for dynamic control of the temperature distribution in the control cooling process, its computation time is compared with that of the traditional model. The models were implemented in Python with the Keras framework, and training and testing were performed on a system with an Intel(R) Core(TM) i5-6300HQ CPU and 16 GB of RAM.
Fig. 12 The proportion of anomalies with the number of isolation trees and sub-sampling size
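The three accuracy metrics defined in this section translate directly into NumPy; the sketch below (hypothetical helper names) works on flattened arrays of predicted and actual temperatures.

```python
import numpy as np


def rmse(y_pred, y_true):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mae(y_pred, y_true):
    return float(np.mean(np.abs(y_pred - y_true)))


def r2(y_pred, y_true):
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
print(rmse(y_pred, y_true), mae(y_pred, y_true), r2(y_pred, y_true))  # 0.5 0.25 0.8
```

In the worked example, a single 1-degree miss out of four values gives RMSE = sqrt(1/4) = 0.5, MAE = 1/4 = 0.25, and R² = 1 - 1/5 = 0.8, since the total sum of squares around the mean 2.5 is 5.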
The prediction accuracy results are given in Table 2. As Table 2 shows, models based on LSTM and GRU slightly outperformed those with the standard RNN. Models with bidirectional units show a similar pattern; because a bidirectional unit captures sequential dependencies in both the forward and backward directions [39], it generally enhances the prediction accuracy of each model. The length of the input sequence (timesteps) also affects the prediction accuracy of each recurrent unit. Because velocity prediction is more accurate with more timesteps, the models with 48 timesteps performed best among both unidirectional and bidirectional models, even though long sequences can hinder the capture of extended temporal dependencies [40]. As mentioned earlier, more timesteps mean more velocity modification calculations and thus more accurate velocity prediction, which also improves the performance of the proposed model. In conclusion, the bi-LSTM model yielded better values for every evaluation criterion than the other units, and the model based on bi-LSTM with 48 timesteps showed the best prediction performance, with the lowest RMSE (8.03), the lowest MAE (5.7), and the highest $R^2$ (0.976). To further evaluate the proposed model, several segments of an 11.8 mm 380CL strip with different rolling velocities were selected as examples, and their temperature distribution predictions are shown in Fig. 13. Compared with the baseline model, the proposed model with 48 timesteps yielded desirable predictions, demonstrating its ability to predict the temperature distribution. Moreover, the final CT prediction performance of the bi-LSTM models is shown in Fig. 14.
Fig. 14 clearly shows that the temperature prediction error decreased as the input length increased. The prediction accuracy within ± 20 ℃ was 94.9%, 95.6%, and 97.6% for 12, 24, and 48 timesteps, respectively, comparable to or higher than that of the baseline model, whose best accuracy within ± 20 ℃ in application is 95%. Furthermore, the model based on bi-LSTM with 48 timesteps achieved 95% prediction accuracy within ± 15 ℃, indicating that more frequent velocity modification benefits temperature prediction and control.
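The "prediction accuracy within ± 20 ℃" figures above are hit rates: the share of coil temperature predictions whose absolute error does not exceed the tolerance. A minimal sketch (the helper name `hit_rate` is hypothetical):

```python
import numpy as np


def hit_rate(y_pred, y_true, tol=20.0):
    """Share of predictions with |error| <= tol (same unit as the temperatures)."""
    return float(np.mean(np.abs(y_pred - y_true) <= tol))


errors = np.array([0.0, 10.0, 25.0, -15.0])  # example prediction errors in degrees C
print(hit_rate(errors, np.zeros(4)))          # 0.75: only the 25-degree miss is outside
```

Unlike RMSE, this metric ignores how far the misses are outside the band, which is why the paper reports it alongside RMSE and MAE.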

Computation time
We aim to develop not only a high-precision prediction model but also a real-time model that can respond promptly to variable-velocity rolling. The model must therefore handle the variable-velocity dynamic process and satisfy the requirements of online temperature monitoring and modification. Herein, the computation time of the proposed deep neural network model was compared with that of the traditional mechanism model. Table 3 shows the comparison results for each model with different timesteps. The computational cost of the proposed model is considerably lower than that of the baseline mechanism model: the baseline requires 1.2, 1.7, and 2.5 s for 12, 24, and 48 timesteps, respectively, to re-calculate the temperature distribution of all segments in the cooling process, whereas the proposed bi-LSTM model takes less than 0.001 s in every case.
To summarize, the proposed model outperformed the mechanism model in computation time, which makes it possible to establish real-time dynamic control in an actual control cooling process.

Conclusions and future work
In this paper, a temperature distribution prediction method based on a recurrent neural network was proposed for variable-velocity hot rolling strips. The isolation forest algorithm was used to eliminate outliers, and different recurrent cells and input sequence lengths were tested to determine the optimal model. The key results were as follows. A traditional mechanism model of temperature distribution prediction based on the finite-difference method was introduced as the baseline model. Moreover, to obtain precise velocity values, an online cycle velocity calculation strategy was proposed, and the results showed that it could realize high-precision velocity prediction.