Sales Forecasting Model Based on BP Neural Network Optimized by Improved Immune Genetic Algorithm

A new sales forecasting model based on an Improved Immune Genetic Algorithm (IIGA), IIGA that optimizes the BPNN (IIGA-BP) has been proposed. The IIGA presents a new way of population initialization, a regulatory mechanism of antibody concentration, and a design method of adaptive crossover operator and mutation operator, which effectively improved the convergence ability and optimization anility of IIGA. And IIGA can optimize the BPNN’s initial weights and threshold and improve the randomness of network parameters as well as the drawbacks that lead to output instability of BPNN and easiness into local minimum value. It taking the past records of sales figures of a certain steel enterprise as an example, utilizing the Matlab to construct the BP neural network, Immune Genetic Algorithm that optimizes the BPNN (IGA-BP), IGA-BP neural network, and IIGA-BP neural network prediction models for simulation and comparative analysis. The experiment demonstrates that IIGA-BP neural network prediction model possessing a higher prediction accuracy and more stable prediction effects.


I. INTRODUCTION
With the development of enterprise market oriented process, the competitive pressure among enterprises and sales market's instability are growing increasingly. The way to reasonably and accurately predict the market demand has become of great importance for the modern enterprise to control production capacity, to avoid effective operational risks and to achieve precision marketing 1 . Therefore, sales prediction is an integral part in modern enterprise management, in which the exactitude of prediction is directly related to the success of the company. In order to accurately predict the sales in the future, scholars home and abroad have once again thrown themselves into the study of sales forecasting model.
In recent years, scholars home and abroad have conducted a lot of researches for sales forecasting model, and have proposed several methods like neural network, extreme learning machine, machine learning, and time series model [2][3][4] . Neural network are widely used in the construction of sales forecasting model for its powerful adaptive learning and preferable nonlinear data fitting 5 . Besides, BPNN as the representative has gradually become the focus of scholars for its complete theoretical system. Such as: In view of sales of make to stock enterprise, Wu et al. 6 build a sales model based on BPNN by making use of its strong learning ability and powerful information storage. Qin et al. 7 , put forward BPNN sales forecasting model by selecting the network types based on the establishment of the network parameter optimization mechanism. By using the logsig activation function, Gao et al. 8 made the BPNN based prediction model in terms of the elements that may affect the data collection and reducing dimension processing. Bibaudalves et al. 9 , orienting the data's seasonal quality and value proposed a BPNN prediction model with average monthly best structure. However, it has some drawbacks like it's easy to fall into local minimum value and very sensitive to the initial network weights, which are hard to overcome. In this regard, scholars have proposed many effective methods for optimizing BPNN. Such as: Luo et al. 10 used the genetic algorithm to determine the BPNN's network structure and weights search space. Then they integrated the advantages of two algorithms to construct a neural network prediction model with preferred performance. Zhang et al. 11 , proposed an improved bacterial chemotaxis optimization and integrated it into the BPNN to develop a prediction model that used to predict a variety of stock index. Sheng et al. 12 , took the self calibration of synchronous time series as the reference and took advantages of genetic algorithm optimizing network parameters. They constructed the neutral network sales prediction model that combined the BPNN, time series forecasting model, and genetic algorithm together. According to the inner rules and development trend of the stock market, Liu 13 made the index prediction of it come to be true by the improved BPNN. Cao et al. 14 , proposed a vehicle sales forecasting model based on grey neural network with the combination of the advantage that the grey model deals with the small simple data and finding regularity of weak correlation data. Ji et al. 15 , by using Adaptive particle swarm optimization (APSO) to update weights and thresholds of BPNN raised the agricultural machinery proposed prediction model based on data APSO-BP algorithm. Zhang et al. 16 , optimized network parameters by using particle swarm optimization and reduced the dimension of the input neurons with Johnson algorithm to propose BPNN prediction model based on data mining. Wang et al. 17 , came up with an improved BPNN by using cuckoo search algorithm (CS-BP ), which the CS-BP model compared with traditional BPNN and multivariable linear regression has a good performance in forecasting lightning. Zha et al. 18 , using immune algorithm to optimize BPNN's initial weights and thresholds, proposed BPNN prediction method based on Immune Genetic Algorithm (IGA). In order to establish the accurate inventory prediction model, Cao et al. 19 adopted the principal component analysis to analyze the index data, used bayesian regularization algorithm to optimize BPNN, which is effective to avoid the overfitting phenomenon during the model training process. Li et al. 20 , with the aim to improve the accuracy and speed of power loaded prediction, proposed a algorithm that mixed the particle swarm optimization and BPNN (PSO-BP), and used the PSO-BP algorithm to predict the power load data. According to the importance of financial markets and the timing characteristics of financial indicators, Lu 21 improved the weights back propagation model and proposed timing weights back propagation model.
The above literature shows that BPNN has achieved good results in sales forecasting. However, because traditional BPNN uses the principle of gradient descent to adjust network parameters, it has the disadvantages of slow convergence speed and easiness to fall into local minimum value, and the random selection of the algorithm's initial weight and threshold made neurons lack the ability to adjust. When initializing the weights of the neural network, if a random initial is used, two situations are likely to occur 22-26 : (1) The result of initialized weights are the same. Then the symmetrical structure will be destroyed. In the forward propagation of the network, the output value of the neural network node of each layer is the same, so when it transfers backward, the partial derivative of the weight is also the same, which will make the ownership values of nodes of each layer to the previous ones are all the same, but the weights layers by layers will not be the same. In addition, initializing the weights to the same value will make the network convergence very slow.
(2) All weights are initialized to 0. Then the weights cannot be updated. This is because in the forward propagation of the network, the output values of all nodes are the same, and when the output value of each node in the backward propagation seeks the partial derivative of the loss function, result must be 0. Then it is impossible to use gradient descent method to update the weights, nor will training of natural neural network not be able to proceed.
Based on this, this paper proposes an effective neural network sales forecasting model that integrates IIGA and traditional BPNN model. The premature convergence problem of unique population initialization method, antibody concentration adjustment mechanism and crossover operator, mutation operator in the IIGA are utilized to initialize the weights and thresholds of each connection layer of BPNN, and improve the randomness of network parameters that cause the BP neutral network the shortcomings of unstable output and easiness to fall into the local extremes. Therefore, the convergence speed of the algorithm and the prediction accuracy of the model will be improved.

A. BP NEURAL NETWORK
BPNN is a multiple layer feed forward neural network with one or several implied layers , which its strong nonlinear data fitting ability is the mathematical model to solve the uncertain problem. It is the most widely used neutral network by now through its using the back propagation algorithm based on the principle of gradient descent for supervising learning to get pattern recognition and approximation of functions 27 . The most typical three layer neural network structure can realize the highly nonlinear mapping of any function's distribution parallel information processing from input to output. Through the forward transmission of data information and the back propagation of error, the weights and thresholds are adjusted according to certain rules to correct the network error until the output error or training times meet the predetermined conditions, as shown in Fig. 1.

B. IMPROVED IMMUNE GENETIC ALGORITHM TO OPTIMIZE BP NEURAL NETWORK PRINCIPLE
Being a new improved and refined algorithm, IGA has the immune function, which considers both the search speed and ability by adding the genetic operator into the immune operator whereas it also avoids the degeneracy phenomenon and premature convergence of genetic algorithm. It encodes the parameters that need to be optimized to form a series population. Through immune genetic evolution operation, it evolves the population to a better search space, so that the individuals in the population reach the optimal 28 . The IGA has outstanding global search ability and population diversity maintenance mechanism which just make up for the deficiency of BPNN. However, the standard IGA still has the problems of slow convergence speed and local minimum value in dealing with complex optimization problems. Therefore, the IIGA is considered to optimize the BP network.
In this paper, the IIGA is used to optimize BPNN from the following three aspects: (1) Determine the network topology structure.
(2) Optimize the initial weights and thresholds of BPNN. Guided by the fitness function of the optimal antibody in the population, the IIGA is used to search for the optimal solution and a group of network parameters with the highest global fitness are obtained as the initial weights and thresholds.
(3)Do a simulation prediction. The obtained initial weights and thresholds are used in the BPNN again, to find the optimal weights and thresholds during the training period, and then the optimal prediction output is realized.

C. PROCESS OF BP NEURAL NETWORK OPTIMIZED BY IMPROVED IMMUNE GENETIC ALGORITHM
The following is the flow that used the IIGA to optimize the BPNN algorithm, as shown in Fig. 2.
The specific steps are as follows: (1)Determine the network topology structure.
(2)Antigen recognition. The target function of antigen's corresponding problem: the mean square error of the network prediction is the least.
where m is the amount of the samples, n is the total number of output nods; ij Q and ij Y are the expect output results and predictable results of the neutral network respectively which the i antibody corresponds.
(3)Initial antibody production. In order to avoid the disadvantages of randomization of initial weights and thresholds and low computational efficiency, the convergence speed of the algorithm is accelerated whereas ensuring the diversity of population. In this paper, the Nguyen Widrow method 29 is used to generate 70% of the initial population, and the remaining 30% is randomly generated by multiple evolution. In each generation of antibody group, only some antibodies with higher fitness were retained, and the antibody with weak fitness was replaced in the next evolution process, which was a cycle until the randomly generated population size was reached.
the Nguyen Widrow method:   normr J is the matrix that the J matrix is developed into 1.
(4)Antibody coding. In this paper, real coding is used to encode the initial connection weights and thresholds of neural networks, and the random distribution of gene code strings is obtained. Each gene code string corresponds to a set of weight threshold.
The length of the antibody code is: S n m m l m l       (4) where n , m , and, l are nodes of input layer, hidden layer and output layer respectively.
(5)Affinity calculation. ①The affinity between antibody and antigen. The affinity between the antibody and the antigen is the criterion for determining the quality of the antibody. The reciprocal of the mean square error of the training data is used as the original affinity function of the IGA, that is the fitness function.
The fitness function is as follows: where  is the small positive quantities that is to avoid zero denominator;   Ex is the objective function of the network. In order to prevent the initial stage of evolution, special antibodies in the initial antibody population mislead the population's development direction. This paper uses a linear adjustment strategy to adjust the fitness function 30 . ' min where max f and min f are the largest and smallest fitness values in the contemporary population respectively; f is the current fitness values of the antibody; ' f is the revised antibody fitness values;  is the adjustment coefficients.
②Affinity between antibody and antibody. This article uses euclidean distance as an important indicator to measure the similarity between antibodies. The Euclidean Distance between antibody The affinity function is as follows: where   , 0,1 vr S  reflects the degree of similarity between antibodies v and antibodies r .
The antibody concentration function is as follows: where v C reflects the proportion of similar antibodies in the response population; N is the total number of antibodies; ,  and  are the adjustment coefficient.  is a little less than 1,so 0.9 to 1 is usually taken, and  is a little more than 1, so 1 to 1.1 is usually taken. In this way, the calculation of antibody concentration considers both the same antibody and similar antibodies.
In order to avoid simple selection operations, antibodies with high fitness are gradually accumulated, resulting in excessively high antibody concentration. According to the principle of mutual inhibition of antibodies due to different concentrations in the immune mechanism, the introduction of concentration regulators adjusts the probability of antibodies being selected s P For any antibody (10) where  and  is the variable parameter between 0 to 1; i C is the concentration of antibody;   i fv is the fitness value of the antibody i v ;

 
Maxf v is the maximum fitness value in the population. This not only retains antibodies with high adaptability, but also inhibits antibodies with high concentrations, maintains the diversity of the population, and improves the convergence ability of the algorithm.
(7)Immune manipulation. ① Clonal selection. The clone selection is performed according to the strategy of combining elite retention by the combined way of maintaining the best solution found over time before selection and roulette . In this way, every time the memory bank is updated, the antibodies with higher fitness are stored first, and then the selection probability is calculated remanent refering to "(10)," and the excellent antibodies in the population are stored in the memory bank.
② Adaptive crossover and mutation.
In order to make the algorithm's crossover and mutation probabilities to be adjusted by itself in terms of the degree of antibody fitness in the population and evolutionary algebra , the crossover and mutation probabilities need to be parameterized. Crossover where ' f is the maximum fitness of the antibody to be crossed ; max f is the maximum fitness of the current population; avg f is the average fitness of the current population; min f is the minimum fitness of the current population; f is the fitness of the antibody to be mutated; t is the current evolution algebra of the algorithm; T is the total evolution algebra of the algorithm; c P and m P are the initial value of the crossover and mutation probability.
(8)Determination of initial weights and thresholds of network.
It means to take the fitness of the optimal antibody in the population as the guide cycle to perform operations such as evaluation, selection, crossover, and mutation, etc. until the set evolutionary algebra is reached and select a group of network parameters with the highest adaptability as initial weights and thresholds.
(9)After obtaining the initial weights and thresholds, use the BP algorithm to learn and continuously adjust until the established network error is met or the maximum evolution algebra is reached, and finally the optimal weights and thresholds are obtained, and the prediction output is achieved.

A. IDENTIFICATION OF INFLUENCING FACTORS OF STEEL SALES
The sales status of steel is affected by various complex factors. Taking the historical sales data of a steel company as an example, select from the many factors that affect steel sales the Gross domestic product  

B. GREY RELATION ANALYSIS
Grey relational analysis is a method to judge the correlation between factors in the system by weighing the similarity of the sequence curves of various factors in the same coordinate system. It has no special requirements for the sample size and the regularity among samples, and has the characteristics of small amount of calculation and high precision. It is a new method to study the small sample and poor information analysis and prediction 31 . In this paper, the grey factor correlation analysis method is used to identify the main influencing factors of steel sales volume, and find out the correlation degree between the input factors and the output volume, so as to extract the main control factors affecting the output index. The steps of grey relation analysis are as follows: (1)Determine reference sequence and comparison sequence.
According to research needs, take the steel sales volume from 1994 to 2018 in Table 1  (2)Nondimensionalization of data. In order to avoid large prediction error caused by too large order of magnitude difference, dimensionless processing is needed before data calculation. This paper adopts the Mean conversion method to disabuse the dimension to clear up the dimension differences of data. And the calculating equation is: (13) (3)Calculate the relationship coefficient between each factor and steel sales volume.
; the resolution coefficient  is generally taken 0 to 1.
(4)Calculate the correlation degree between each factor and steel sales volume.
The above process is based on the Matlab 2018a platform to write a calculation program, the resolution coefficient  is set to 0.5, then the gray correlation analysis results are shown in Table 2 According to the calculated relational degree results in Table 2, the threshold is equal to 0.7, and the factors greater than 0.7 are used as the selected indicators. The order of gray correlation is: Total output value of the national construction industry  

C. PRINCIPAL COMPONENT ANALYSIS
It can be seen from Table 1 that the original data is multidimensional data. Due to the coupling between the data, not all the data has a high contribution rate to the sales of steel. If the indicators selected using grey relation analysis are directly used as the input nodes of the prediction model, due to the complexity and redundancy of the experimental data, the performance of the network and the prediction accuracy of the model will inevitably be affected 32 . For this reason, this paper chooses the principal component analysis method to perform dimensionality reduction processing on the indicators selected by the gray correlation analysis, and determines the input indicators to reduce the coupling relationship between the variables and retain the data that is valuable for the prediction results.
The steps of principal component analysis are as follows (1)Establish a correlation matrix R .
Assuming that the input original data set is X and each sample corresponds to P influencing factors, i.e.   12 ,, P X X X X  . Then the Correlation Matrix R is: where ij r is the correlation coefficient between factor i x and factor j x , , 1, 2, , i j p  , n is the total number of samples; x is the average of x . Generally speaking, when the cumulative contribution rate is reached 75% to 95% ,the corresponding previous m principal component can contain most of the information in the original data set. Variance contribution rate and cumulative variance contribution rate are: (4)Calculate the principal component score.
The above process is based on the Matlab 2018a platform to write a calculation program, and the princomp function is used to obtain the contribution rate of each indicator factor, as shown in Table 3 It can be seen from Table 3 that the first three principal components already contain more than 95% of the information of all influencing factors. Therefore, the first three principal components are selected to replace the factors selected by the original gray correlation analysis as the final input indicators. The score value of each principal component, as shown in Table 4.

A. EXPERIMENTAL ENVIRONMENT AND PARAMETER SETTING
In order to fully verify the performance of the algorithm, a steel sales data set of a steel company is used as an example for verification. Taking  To determine the number of hidden layer nodes, empirical formulas are usually used to calculate the range of nodes, and then BPNN trial calculations are performed to determine the exact number of nodes. q m n a    (22) where q , m , and n are the number of nodes in the hidden layer, the input layer, and the output layer; a is generally taken 1 to 10 as the adjustment factor. By continuously increasing the number of hidden layer nodes and taking the minimum network output error as the standard, the optimal number of hidden layer nodes is obtained. After repeated adjustments, when the number of hidden layer nodes is 10, the output error is the smallest, and the network topology is 8-10-1.
The experimental parameters are shown in Table 5

B. NETWORK PERFORMANCE ANALYSIS
The evolution curve of fitness function of IGA model and IIGA model, as shown in Fig. 3. It can be seen from Fig. 3 that after 54 generations of the IIGA model evolved, the fitness function value of the optimization result gradually stabilized, about 2.2961, indicating that every individual is in Near the optimal solution. IGA model fitness function value gradually stabilized at around 1.7294 after the 76th generation, and the curve has oscillation phenomenon in the later stage of evolution. It shows that the IIGA model is better than the classic IGA model in optimizing network parameters.
When the target error is 10 -4 , the three network model training error curves of BP, IGA-BP, and IIGA-BP are shown in Fig. 4.
It can be seen from Fig. 4 that the IIGA-BP algorithm has the fastest convergence speed and the steepest curve in the early stage of training. After getting rid of the local optimal solution around the 5th generation, it quickly enters a stable state. The design of adaptive adjustment strategy of crossover operator and mutation operator in this article is related to the design. Moreover, the IIGA-BP neural network reaches the target error after 23 evolution, the IGA-BP neural network reaches the target error after 65 evolution, and the BPNN has a training error after 100 evolution. Converges to 6.253×10 -3 and falls into the local optimum and cannot escape. It can be seen that the BPNN optimized by the IIGA can increase the convergence speed of the network and reduce the possibility of the BPNN falling into a local minimum value which has a strong superiority.

C. PREDICTION ACCURACY ANALYSIS OF ALGORITHM MODEL
In order to further compare the performance differences of the three models and verify the accuracy and applicability of the prediction model, 7 sets of data with years 1995, 1996, 1997, 2002, 2004, 2009, 2010 in Table 4 as verification samples are randomly selected. and the remaining 18 sets of data are as training samples. The BP, IGA-BP, and IIGA-BP neural network models are conducted with performance simulation experiments, and run them independently for 20 times to obtain the simulation results shown in Figure 5 and Figure 6. The predicted values of the three network models are compared with the true values, as shown in Table 7.  From Fig. 5 and Fig. 6, we can see that the BP neural network model and IGA-BP neural network model predict the dispersion and the error fluctuation range are relatively large; the IIGA-BP neural network model predicts the dispersion is small, the error is basically maintained Between [-0.25, 0.25], the prediction effect is more stable, and the prediction accuracy is greatly improved. In order to display the prediction accuracy of the three models more intuitively, the Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Nash-Sutcliffe efficiency coefficient (NSE) are used to further evaluate the error indicators of the three network models.
where i y and i y  are the expected output value and the predicted output value respectively; y is the average value of the measured value; N is the number of predicted samples.
The prediction accuracy comparison of the three models is shown in Table 8 Neural It can be seen from Table 7 that the MAPE of the IIGA-BP neural network model is 0.1372, which is 23.82% higher than the prediction accuracy of the BP neural network model, which is also better than the prediction accuracy of the IGA-BP neural network model. Compared with the other two models, the RMSE value of the IIGA-BP network is the smallest, the error of the model is small, and the generalization performance is strong. The NSE of the IIGA-BP neural network is 0.9909. Compared with the other two models, this model has better nonlinear data fitting ability and higher prediction stability.

V. CONCLUSION
This article focuses on the shortcomings of BPNN in sales forecasting. An IIGA is proposed to optimize the sales forecast model of BPNN. The IIGA proposes a new way of population initialization, antibody concentration, adjustment mechanism, crossover operator, mutation operator, adaptive strategy design method, which guarantees the diversity of the population and improves the convergence ability and global search ability of the algorithm. The IIGA is used to expand the weights search space of the BP network, obtain the optimal weights and thresholds, and improve the shortcomings of the BP network that it is easy to fall into the local minimum value. Finally, through the construction of IIGA neural network steel sales forecasting model and horizontal evaluation with BP and IGA-BP neural network prediction models, experiments show that the BPNN optimized by the IIGA has strong nonlinear data fitting And generalization performance, whereas improving model prediction accuracy and convergence speed, it also simplifies the network structure, reduces the evolutionary algebra, the stability of the prediction output is better. However, due to the lack of network training data and imperfect network training, the model prediction accuracy is not too high. For in depth research, the training data set can be further expanded, the learning effect of the model can be strengthened, and the prediction accuracy can be improved.