## 2.1 LSSVM regression prediction model of billet surface temperature

The support vector machine (SVM) theory proposed by C. Cortes [13] in 1995 is a machine learning model based on Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle. It seeks the best compromise between the learning accuracy on the training samples and the ability to generalize to arbitrary samples. SVM has strong generalization ability, can solve nonlinear problems, and avoids local minima when trained on small sample sets.

In 1999, J.A.K. Suykens [14] proposed the least squares support vector machine (LSSVM). Building on the original SVM, LSSVM uses a squared two-norm error term and replaces the inequality constraints with equality constraints, so that the convex quadratic programming problem becomes a system of linear equations, which improves solution efficiency.

The mathematical description of LSSVM is as follows. Suppose there is a training data set of \(N\) samples, \(D=\left\{ \left( I_i,Y_i \right) \mid i=1,2,\ldots,N \right\}\), where \(I_i\) is the input value and \(Y_i\) is the output value. The LSSVM regression model can be expressed as:

$$Y(I)=\omega^{T}\varphi(I)+b \tag{1}$$

where \(\omega\) is the weight vector, \(\varphi(\cdot)\) is a nonlinear mapping function that maps the input \(I\) to a higher-dimensional space, and \(b\) is the bias (offset). In the prediction model of this paper, each input \(I\) is a 12-dimensional vector whose components are, respectively, the molten steel temperature in the tundish, the inlet temperature of the crystallizer, the outlet temperature of the crystallizer, the water flow rate of the crystallizer, the casting (billet withdrawal) speed, the temperature of the secondary cooling water, the water pressure at the valve ports of sections #0 to #2, and the valve openings of sections #0 to #2.

To tolerate fitting errors at individual sample points, an error variable \(e_i\) is introduced for each sample, and the squared \(L_2\)-norm of the error variables is added to the objective function. The LSSVM optimization problem can then be written as:

$$\begin{cases} \min\limits_{\omega,b,e} J(\omega,e)=\frac{1}{2}\omega^{T}\omega+\frac{\gamma}{2}\sum\limits_{i=1}^{N}e_i^{2} \\ \text{s.t. } Y_i\left(\omega^{T}I_i+b\right)=1-e_i,\quad i=1,\ldots,N \end{cases} \tag{2}$$

where \(\gamma\) is the penalty factor, which balances the regularization term \(\frac{1}{2}\omega^{T}\omega\) against the error term built from the error variables \(e_i\).

Lagrange multipliers are introduced to solve the optimization problem:

$$L(\omega,b,e;\alpha)=J(\omega,e)-\sum\limits_{i=1}^{N}\alpha_i\left[Y_i\left(\omega^{T}I_i+b\right)-1+e_i\right] \tag{3}$$

where \(\alpha_i\) is the Lagrange multiplier corresponding to \(I_i\).

According to the Karush-Kuhn-Tucker (KKT) conditions, the partial derivatives with respect to each variable are set to zero, from which \(\alpha_i\) and \(b\) are solved:

$$\begin{cases} \dfrac{\partial L}{\partial \omega}=0 \;\rightarrow\; \omega=\sum\limits_{i=1}^{N}\alpha_i Y_i I_i \\ \dfrac{\partial L}{\partial b}=0 \;\rightarrow\; \sum\limits_{i=1}^{N}\alpha_i Y_i=0 \\ \dfrac{\partial L}{\partial e_i}=0 \;\rightarrow\; \alpha_i=\gamma e_i,\quad i=1,2,\ldots,N \\ \dfrac{\partial L}{\partial \alpha_i}=0 \;\rightarrow\; Y_i\left(\omega^{T}I_i+b\right)-1+e_i=0,\quad i=1,2,\ldots,N \end{cases} \tag{4}$$

For a new sample \(I\), the output of the LSSVM nonlinear regression model is:

$$Y(I)=\sum\limits_{i=1}^{N}\alpha_i K(I,I_i)+b \tag{5}$$

where \(K(\cdot,\cdot)\) is the kernel function, and \(K_{ij}=K(I_i,I_j)\) are the entries of the kernel matrix. The radial basis function (RBF) kernel is adaptable and widely used, so it is chosen as the kernel function of this model: \(K(I_i,I_j)=\exp\left(-\frac{\|I_i-I_j\|^2}{2\sigma^2}\right)\), where \(\sigma\) is the kernel parameter.
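For illustration, the following Python sketch trains an LSSVM regressor with the RBF kernel and evaluates Eq. (5). It assumes the standard LSSVM regression formulation with equality constraints \(Y_i=\omega^{T}\varphi(I_i)+b+e_i\), whose KKT conditions reduce to the linear system \(\begin{bmatrix}0 & \mathbf{1}^{T}\\ \mathbf{1} & \Omega+\gamma^{-1}\mathbf{E}\end{bmatrix}\begin{bmatrix}b\\ \alpha\end{bmatrix}=\begin{bmatrix}0\\ Y\end{bmatrix}\), with \(\Omega_{ij}=K(I_i,I_j)\) and \(\mathbf{E}\) the identity matrix. The function names are hypothetical, and a small Gaussian elimination stands in for a linear-algebra library so that the example has no dependencies.

```python
import math


def rbf_kernel(a, b, sigma):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(A[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (A[r][n] - s) / A[r][r]
    return z


def lssvm_fit(X, Y, gamma, sigma):
    """Return (alpha, b) by solving the bordered LSSVM regression system."""
    N = len(X)
    M = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N):
        M[0][i + 1] = 1.0  # row enforcing sum(alpha) = 0
        M[i + 1][0] = 1.0  # bias column
        for j in range(N):
            M[i + 1][j + 1] = rbf_kernel(X[i], X[j], sigma)
        M[i + 1][i + 1] += 1.0 / gamma  # error penalty adds 1/gamma on the diagonal
    z = solve_linear(M, [0.0] + list(Y))
    return z[1:], z[0]


def lssvm_predict(x, X, alpha, b, sigma):
    """Eq. (5): Y(I) = sum_i alpha_i * K(I, I_i) + b."""
    return sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b
```

For the model in this paper, each element of `X` would be a 12-element list holding the inputs listed above, and `gamma` and `sigma` would be supplied by the optimizer.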

In LSSVM regression modeling, the prediction accuracy depends on the penalty factor \(\gamma\) and the kernel parameter \(\sigma\). The penalty factor \(\gamma\) balances fitting accuracy against model complexity: the larger \(\gamma\) is, the smaller the training error, but the more complex the decision function becomes and the more likely overfitting is. The kernel parameter \(\sigma\) determines how finely the samples are distinguished: the smaller \(\sigma\) is, the more complex the curve fitted in the low-dimensional space and the finer the partition, which again makes overfitting more likely. Therefore, this paper adopts a sparrow search algorithm improved by Logistic chaotic mapping and the golden sine strategy to perform a global search for appropriate values of \(\gamma\) and \(\sigma\).
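The effect of \(\sigma\) can be seen directly from the RBF kernel: for two samples a fixed distance apart, a smaller \(\sigma\) gives a much smaller kernel value, so each training sample influences only a narrow neighborhood and the fitted curve can bend more sharply. The minimal sketch below prints kernel values for three values of \(\sigma\); the distances and \(\sigma\) values are arbitrary illustrative choices.

```python
import math


def rbf(distance, sigma):
    """RBF kernel value for two samples at Euclidean distance `distance`."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


# Smaller sigma -> faster decay with distance -> more local, more flexible fit.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, [round(rbf(d, sigma), 3) for d in (0.5, 1.0, 2.0)])
```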

## 2.2 Hybrid improved ILGSSA algorithm

The sparrow search algorithm (SSA) is a swarm intelligence optimization algorithm inspired by the foraging and anti-predation behavior of sparrows [15]. For foraging, the sparrow population is divided into a finder population and a follower population; during foraging, randomly selected individuals also act as the guard population. The finders are responsible for locating food areas and search directions, and the followers forage by following the finders. Any sparrow can become a finder, but the ratio of finders to followers in the population remains constant. When the alarm value exceeds the safety value, the sparrows abandon their current positions and fly to a safe area.

In this paper, SSA is used to optimize the penalty factor and kernel parameter of LSSVM, addressing the low prediction accuracy caused by poorly chosen parameters.

Assume a population of \(n\) sparrows; the positions of the sparrows in the \(m\)-dimensional solution space are expressed as:

$$X=\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,m} \end{bmatrix} \tag{6}$$

where \(m\) is the number of parameters to be optimized in the billet surface temperature prediction model. In this model, \(m=2\), corresponding to \(\gamma\) and \(\sigma\).

The fitness of the sparrows can be expressed as:

$$F_X=\left[f_1,f_2,\ldots,f_n\right]^{T} \tag{7}$$

where \(f_i\) is the fitness of the \(i\)-th sparrow, defined as the sum of the mean squared error of the model on the training set and that on the test set (i.e., the total error).
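A minimal sketch of this fitness, assuming each candidate \((\gamma,\sigma)\) has already been used to train the LSSVM and produce predictions on the training and test sets; the function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_error_fitness(train_pred, train_true, test_pred, test_true):
    """Fitness f_i = MSE(training set) + MSE(test set); smaller is better."""
    return mse(train_pred, train_true) + mse(test_pred, test_true)
```

For example, `total_error_fitness([1.0, 2.0], [1.0, 2.5], [3.0], [2.0])` adds a training MSE of 0.125 and a test MSE of 1.0.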

In SSA, the initial solutions are generated randomly, which can make them aggregate and leave the solution space unevenly covered. Because the Logistic chaotic map has good randomness, it is adopted to generate the initial solutions [16–21]. In practice, however, the points of the Logistic map tend to cluster in the upper half of the interval and are sparse in the lower half, as shown in Fig. 1(a). To distribute the map points more uniformly and enhance the ergodicity of the chaotic map, we propose an improved Logistic chaotic map, given by:

$$x_{i,j+2}=\mu x_{i,j+1}\left(1-x_{i,j+1}\right)+(4-\mu)\,x_{i,j}\left(1-x_{i,j}\right) \tag{8}$$

where \(x_{i,j}\in(0,1)\) is the position of the \(i\)-th individual in the \(j\)-th dimension of the initial sparrow population; \(x_{i,1}\) and \(x_{i,2}\) are random numbers uniformly distributed on (0,1); and \(\mu\) is the chaos coefficient. The closer \(\mu\) is to 4, the more uniformly the system is distributed on (0,1). In this paper, \(\mu=3.99\).
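The improved map of Eq. (8) and a chaotic initialization can be sketched as follows. The sketch iterates Eq. (8) along the dimension index \(j\), as the indexing in Eq. (8) suggests, and linearly scales each chaotic value to search bounds for \(\gamma\) and \(\sigma\). The scaling step, the example bounds, and the function names are illustrative assumptions rather than details given in this paper.

```python
import random


def improved_logistic_sequence(x1, x2, length, mu=3.99):
    """Iterate Eq. (8): x_{j+2} = mu*x_{j+1}(1-x_{j+1}) + (4-mu)*x_j(1-x_j)."""
    seq = [x1, x2]
    while len(seq) < length:
        a, b = seq[-2], seq[-1]
        seq.append(mu * b * (1 - b) + (4 - mu) * a * (1 - a))
    return seq[:length]


def chaotic_population(n, m, lower, upper, mu=3.99, seed=None):
    """n x m initial positions; row i is a chaotic sequence scaled to [lower_j, upper_j]."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        # the two seeds are uniform random numbers inside (0, 1), as in the text
        x1 = rng.uniform(1e-12, 1.0 - 1e-12)
        x2 = rng.uniform(1e-12, 1.0 - 1e-12)
        seq = improved_logistic_sequence(x1, x2, m, mu)
        # hypothetical linear scaling from (0, 1) to the search bounds
        pop.append([lo + s * (hi - lo) for s, lo, hi in zip(seq, lower, upper)])
    return pop
```

Usage with hypothetical bounds: `chaotic_population(20, 2, lower=[0.1, 0.01], upper=[1000.0, 100.0], seed=1)` returns 20 candidate \((\gamma,\sigma)\) pairs.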

To compare the effect of the improvement clearly and intuitively, the number of iterations is set to 2000. The distribution of the map points after the improvement is shown in Fig. 1(b), and the histograms of the point distributions before and after the improvement are shown in Fig. 2.

As the comparison in Fig. 2 shows, for the Logistic map before the improvement, the larger the chaotic value, the more densely the map points cluster, and the number of points peaks at the largest chaotic values. In contrast, the points of the improved Logistic chaotic map are distributed more uniformly, reflecting higher ergodicity. Therefore, initializing the sparrow population with the improved Logistic chaotic map improves population diversity.
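A histogram comparison like this one can be generated by iterating both maps 2000 times and counting how many points fall into each of ten equal-width bins on [0, 1]. In the sketch below, the original map is taken to be \(x_{k+1}=\mu x_k(1-x_k)\) with the same \(\mu=3.99\); the random seeds and the number of bins are arbitrary choices, and the script only prints the counts rather than asserting which map is more uniform.

```python
import random


def logistic_sequence(x0, length, mu=3.99):
    """Original Logistic map x_{k+1} = mu * x_k * (1 - x_k)."""
    seq = [x0]
    while len(seq) < length:
        x = seq[-1]
        seq.append(mu * x * (1 - x))
    return seq


def improved_logistic_sequence(x1, x2, length, mu=3.99):
    """Improved map of Eq. (8), seeded with two values in (0, 1)."""
    seq = [x1, x2]
    while len(seq) < length:
        a, b = seq[-2], seq[-1]
        seq.append(mu * b * (1 - b) + (4 - mu) * a * (1 - a))
    return seq[:length]


def histogram(values, bins=10):
    """Count values in [0, 1] per equal-width bin; 1.0 goes into the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


rng = random.Random(0)
orig = histogram(logistic_sequence(rng.random(), 2000))
impr = histogram(improved_logistic_sequence(rng.random(), rng.random(), 2000))
print("original:", orig)
print("improved:", impr)
```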

The golden sine algorithm [22–25] simulates scanning the unit circle with the sine function, which enables it to locate high-quality regions of the search space. Accordingly, in this paper the position update rule for the finders of the sparrow population is defined as:

$$\left\{\begin{array}{l} x_{i,j}^{t+1}=\begin{cases} x_{i,j}^{t}\cdot\left|\sin(r_1)\right|-r_2\sin(r_1)\cdot\left|c_1x_{best}^{t}-c_2x_{i,j}^{t}\right|, & \text{if } R_2<ST \\ x_{i,j}^{t}+Q, & \text{otherwise} \end{cases} \\ c_1=-\pi(1-\tau)+\pi\tau \\ c_2=-\pi\tau+\pi(1-\tau) \\ \tau=\dfrac{\sqrt{5}-1}{2} \end{array}\right. \tag{9}$$

where \(x_{i,j}^{t}\) is the position of the \(i\)-th individual in the \(j\)-th dimension of the sparrow population in generation \(t\); \(x_{best}^{t}\) is the global optimal position; \(Q\) is a random number that obeys the standard normal distribution; \(r_1\) and \(r_2\) are random numbers uniformly distributed on \([0,2\pi]\) and \([0,\pi]\), respectively; \(c_1\) and \(c_2\) are the golden-section partition coefficients; \(\tau\) is the golden ratio coefficient; \(R_2\in[0,1]\) is the warning value, a uniformly distributed random number; and \(ST\in[0.5,1]\) is the safety value.

When \(R_2<ST\), the warning value is below the safety value, so the finders are safe and search widely around their current positions. When \(R_2\geq ST\), the warning value has reached the safety value, so the finders leave their current positions and move randomly by a step drawn from the standard normal distribution.
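The finder update of Eq. (9) for a single sparrow can be sketched as below. Several details are assumptions not fixed by the text: \(r_1\) and \(r_2\) are redrawn for every coordinate, \(Q\) is drawn once per sparrow, and the default `st=0.8` is an arbitrary value in the stated range \([0.5,1]\). The name `update_finder` is hypothetical.

```python
import math
import random

TAU = (math.sqrt(5) - 1) / 2                   # golden ratio coefficient tau
C1 = -math.pi * (1 - TAU) + math.pi * TAU      # partition coefficient c1
C2 = -math.pi * TAU + math.pi * (1 - TAU)      # partition coefficient c2


def update_finder(x, x_best, st=0.8, rng=random):
    """Golden-sine finder update of Eq. (9) for one sparrow (list of m coordinates)."""
    warning = rng.random()                     # R2 ~ U[0, 1)
    if warning < st:                           # safe: golden-sine search guided by x_best
        new = []
        for xj, bj in zip(x, x_best):
            r1 = rng.uniform(0, 2 * math.pi)
            r2 = rng.uniform(0, math.pi)
            new.append(xj * abs(math.sin(r1)) - r2 * math.sin(r1) * abs(C1 * bj - C2 * xj))
        return new
    q = rng.gauss(0.0, 1.0)                    # danger: random normal step Q
    return [xj + q for xj in x]
```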

The position update rule for the followers is defined as:

$$x_{i,j}^{t+1}=\begin{cases} Q\cdot\left|\sin(r_1)\right|-r_2\sin(r_1)\cdot\left|c_1x_{worst}^{t}-c_2x_{i,j}^{t}\right|, & \text{if } i>\dfrac{n}{2} \\ x_{p}^{t}+\left|x_{i,j}^{t}-x_{p}^{t}\right|\cdot A^{+}\cdot L, & \text{otherwise} \end{cases} \tag{10}$$

where \(x_{worst}^{t}\) is the global worst position; \(x_{p}^{t}\) is the current best position of the finder population; \(A\) is a \(1\times m\) matrix whose elements are randomly assigned 1 or -1, with \(A^{+}=A^{T}\left(AA^{T}\right)^{-1}\); and \(L\) is a \(1\times m\) matrix whose elements are all 1.

Here the individuals are indexed in order of fitness. When \(i>\frac{n}{2}\), the \(i\)-th follower has poor fitness and not enough food, so it flies to another region to forage. When \(i\leq\frac{n}{2}\), the follower forages near the best position found by the finders.
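The follower update of Eq. (10) can be sketched similarly. Because \(A\) is a \(1\times m\) row of \(\pm1\) entries, \(AA^{T}=m\) and \(A^{+}=A^{T}/m\), so \(\left|x_i^{t}-x_p^{t}\right|\cdot A^{+}\) is a scalar; assuming, as in the standard SSA, that \(L\) is a \(1\times m\) row of ones, every coordinate then moves by the same amount. The sketch treats the position as a row vector for that product, draws \(Q\) once per sparrow and \(r_1,r_2\) per coordinate, and uses the hypothetical name `update_follower`; these are assumptions.

```python
import math
import random

TAU = (math.sqrt(5) - 1) / 2
C1 = -math.pi * (1 - TAU) + math.pi * TAU
C2 = -math.pi * TAU + math.pi * (1 - TAU)


def update_follower(i, n, x, x_worst, x_p, rng=random):
    """Eq. (10) for the sparrow of 1-based fitness rank i in a population of n."""
    m = len(x)
    if i > n / 2:
        # poorly ranked follower: golden-sine move relative to the global worst position
        q = rng.gauss(0.0, 1.0)
        out = []
        for xj, wj in zip(x, x_worst):
            r1 = rng.uniform(0, 2 * math.pi)
            r2 = rng.uniform(0, math.pi)
            out.append(q * abs(math.sin(r1)) - r2 * math.sin(r1) * abs(C1 * wj - C2 * xj))
        return out
    # forage near the finders' best position x_p
    A = [rng.choice((1.0, -1.0)) for _ in range(m)]  # 1 x m row of +/-1
    # A A^T = m, so |x - x_p| . A^+ = sum(|x_j - p_j| * A_j) / m is a scalar
    step = sum(abs(xj - pj) * aj for xj, pj, aj in zip(x, x_p, A)) / m
    return [pj + step for pj in x_p]                 # multiplying by L (ones) broadcasts it
```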

From the population, 10–20% of the individuals are randomly selected as guards, which are responsible for monitoring the surrounding environment and issuing early warnings. Their positions are updated as:

$$x_{i,j}^{t+1}=\begin{cases} x_{best}^{t}+\beta\cdot\left|x_{i,j}^{t}-x_{best}^{t}\right|, & \text{if } f_i>f_g \\ x_{i,j}^{t}+K\cdot\dfrac{\left|x_{i,j}^{t}-x_{worst}^{t}\right|}{\left(f_i-f_w\right)+\varepsilon}, & \text{if } f_i=f_g \end{cases} \tag{11}$$

where \(\beta\) is a step-size control parameter that obeys the standard normal distribution; \(K\) is a random number uniformly distributed on \([-1,1]\); \(f_i\) is the fitness of the current sparrow; \(f_w\) and \(f_g\) are the current global worst and best fitness, respectively; and \(\varepsilon\) is a very small constant that prevents division by zero.

If \(f_i>f_g\), the sparrow is exposed to danger and moves toward the safe (global best) position; if \(f_i=f_g\), the sparrow is already at the safe position and moves around it.
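The guard update of Eq. (11) for one sparrow can be sketched as follows, assuming fitness is minimized (so \(f_g\) is the smallest fitness and \(f_i\geq f_g\) always holds, which makes the second branch the "otherwise" case). The value of \(\varepsilon\) and the function name `update_guard` are illustrative.

```python
import random


def update_guard(x, f_i, x_best, f_g, x_worst, f_w, eps=1e-50, rng=random):
    """Eq. (11) for one guard sparrow; fitness is minimized, so f_g is the best."""
    if f_i > f_g:
        # exposed sparrow: jump toward the best position with a normal step size
        beta = rng.gauss(0.0, 1.0)
        return [bj + beta * abs(xj - bj) for xj, bj in zip(x, x_best)]
    # sparrow at the best position: move around it, scaled by the distance to the worst
    k = rng.uniform(-1.0, 1.0)                       # K ~ U[-1, 1]
    return [xj + k * abs(xj - wj) / ((f_i - f_w) + eps) for xj, wj in zip(x, x_worst)]
```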

When the target (minimum) fitness or the maximum number of iterations is reached, the population stops updating.

## 2.3 ILGSSA-LSSVM algorithm flow

In this paper, the fitness (i.e., the total error) is used to evaluate the global optimization results. When the target fitness or the maximum number of iterations is reached, the population stops updating, and the output optimal position gives the optimal values of \(\gamma\) and \(\sigma\) for the LSSVM surface temperature prediction model. The flow of the ILGSSA-LSSVM algorithm, shown in Fig. 3, consists of the following six steps:

1) Initialize the sparrow population with the improved Logistic chaotic map.

2) Calculate and sort the fitness of all individuals, and record the best and worst fitness.

3) Update the positions of the finder, follower, and guard populations.

4) Check whether the target fitness or the maximum number of iterations has been reached. If not, return to Step 2).

5) Assign the optimal individual position to \(\gamma\) and \(\sigma\) of the LSSVM.

6) Predict the billet surface temperature with the LSSVM regression model.
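The six steps can be strung together as in the compact sketch below, which optimizes a two-dimensional position \((\gamma,\sigma)\). Several details are assumptions rather than settings from this paper: the population size, the finder ratio `pd`, the guard ratio `sd` (taken at the low end of the 10–20% stated above), `st`, the search bounds, clipping positions to those bounds, and the objective `toy_total_error`, which is a hypothetical stand-in for training the LSSVM and computing the total error. Step 6 would then train the LSSVM with the returned parameters.

```python
import math
import random

TAU = (math.sqrt(5) - 1) / 2                      # golden ratio coefficient
C1 = -math.pi * (1 - TAU) + math.pi * TAU
C2 = -math.pi * TAU + math.pi * (1 - TAU)

def golden_sine(x, ref, rng, base=None):
    """Per-coordinate golden-sine move; base=None uses x itself (Eq. 9), else a scalar Q (Eq. 10)."""
    out = []
    for xj, rj in zip(x, ref):
        r1, r2 = rng.uniform(0, 2 * math.pi), rng.uniform(0, math.pi)
        b = xj if base is None else base
        out.append(b * abs(math.sin(r1)) - r2 * math.sin(r1) * abs(C1 * rj - C2 * xj))
    return out

def ilgssa(objective, lower, upper, n=20, iters=20, pd=0.2, sd=0.1, st=0.8,
           mu=3.99, target=None, seed=0):
    rng = random.Random(seed)
    m = len(lower)
    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]
    # Step 1: improved Logistic chaotic initialization (Eq. 8), scaled to the bounds
    pop = []
    for _ in range(n):
        seq = [rng.uniform(1e-6, 1 - 1e-6), rng.uniform(1e-6, 1 - 1e-6)]
        while len(seq) < m:
            a, b = seq[-2], seq[-1]
            seq.append(mu * b * (1 - b) + (4 - mu) * a * (1 - a))
        pop.append([lo + s * (hi - lo) for s, lo, hi in zip(seq, lower, upper)])
    fit = [objective(x) for x in pop]
    best_f = min(fit)
    best_x = pop[fit.index(best_f)]
    history, n_find = [], max(1, int(pd * n))
    for _ in range(iters):
        # Step 2: sort by fitness, best first (fitness = total error, minimized)
        order = sorted(range(n), key=lambda k: fit[k])
        pop, fit = [pop[k] for k in order], [fit[k] for k in order]
        x_worst = pop[-1]
        # Step 3a: finders (Eq. 9)
        for i in range(n_find):
            if rng.random() < st:
                pop[i] = clip(golden_sine(pop[i], best_x, rng))
            else:
                q = rng.gauss(0, 1)
                pop[i] = clip([v + q for v in pop[i]])
            fit[i] = objective(pop[i])
        x_p = pop[min(range(n_find), key=lambda k: fit[k])]
        # Step 3b: followers (Eq. 10); the 1-based rank is i + 1
        for i in range(n_find, n):
            if i + 1 > n / 2:
                pop[i] = clip(golden_sine(pop[i], x_worst, rng, base=rng.gauss(0, 1)))
            else:
                A = [rng.choice((1.0, -1.0)) for _ in range(m)]
                step = sum(abs(v - p) * a for v, p, a in zip(pop[i], x_p, A)) / m
                pop[i] = clip([p + step for p in x_p])
            fit[i] = objective(pop[i])
        for i in range(n):
            if fit[i] < best_f:
                best_f, best_x = fit[i], pop[i]
        # Step 3c: guards (Eq. 11)
        f_w = max(fit)
        x_worst = pop[fit.index(f_w)]
        for i in rng.sample(range(n), max(1, int(sd * n))):
            if fit[i] > best_f:
                beta = rng.gauss(0, 1)
                pop[i] = clip([b + beta * abs(v - b) for v, b in zip(pop[i], best_x)])
            else:
                k = rng.uniform(-1, 1)
                pop[i] = clip([v + k * abs(v - w) / ((fit[i] - f_w) + 1e-50)
                               for v, w in zip(pop[i], x_worst)])
            fit[i] = objective(pop[i])
            if fit[i] < best_f:
                best_f, best_x = fit[i], pop[i]
        history.append(best_f)
        # Step 4: stop early if a target fitness was given and has been reached
        if target is not None and best_f <= target:
            break
    # Step 5: best_x holds the optimized (gamma, sigma)
    return best_x, best_f, history

def toy_total_error(p):
    """Hypothetical stand-in for the LSSVM total error; minimum at gamma=100, sigma=1.5."""
    gamma, sigma = p
    return (math.log10(gamma) - 2) ** 2 + (sigma - 1.5) ** 2

best, f, hist = ilgssa(toy_total_error, lower=[0.1, 0.01], upper=[1000.0, 10.0])
print(best, f, len(hist))
```

Tracking `best_x` separately from the sorted population keeps the best-so-far fitness non-increasing across generations, which is the quantity a fitness curve of this kind plots.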

Fig. 4 compares the fitness curves of the ILGSSA-LSSVM and SSA-LSSVM models over 20 iterations. The fitness of ILGSSA stabilizes and converges to 0.01273 at the 12th generation, while that of SSA converges to 0.01467 at the 18th generation. Compared with the SSA-LSSVM model, the ILGSSA-LSSVM model thus converges faster and reaches a smaller fitness.