Prediction of Postoperative Survival Level of Esophageal Cancer Patients Based on Kaplan-Meier (K-M) Survival Analysis and Gray Wolf Optimization (GWO)-BP Model

Background: Esophageal squamous cell carcinoma (ESCC) is a global health problem. The 5-year survival rate of patients after surgery is low, and their postoperative health is directly threatened. Methods: Kaplan-Meier (K-M) survival analysis is used to screen the blood indexes of patients with ESCC. The gray wolf optimization (GWO) algorithm is introduced to optimize the weights and thresholds of a back-propagation (BP) neural network, and a prediction model based on K-M-GWO-BP is established. Results: The postoperative survival level of patients is predicted from the factors that influence postoperative survival. K-M survival analysis is used to analyze the relevant risk factors, redundant variables are eliminated, and the overall structure of the neural network is simplified. The initial weights of the BP neural network are optimized by GWO. Conclusions: The BP, PSO-BP, GA-BP, SSA-BP, GWO-BP, K-M-BP, K-M-PSO-BP, K-M-GA-BP, K-M-SSA-BP and K-M-GWO-BP neural network models are compared, and the K-M-GWO-BP model achieves the best prediction accuracy.


Introduction
Esophageal cancer is one of the most common malignant tumors of the digestive system worldwide. Global cancer statistics (GLOBOCAN 2018) showed that there were 572,000 new cases of esophageal cancer worldwide and that 509,000 deaths from esophageal cancer were expected in 2018 [1]. China ranks first in the world in the number of new cases, accounting for about 50% of the global incidence [2,3], and is one of the countries with the highest incidence of esophageal cancer [4,5]. In China, squamous cell carcinoma is the main pathological type of esophageal cancer (EC), accounting for more than 90% of cases. Surgery is the first choice for patients with resectable EC. With the progress of medical and health technology and the development of the minimally invasive concept and the enhanced recovery after surgery (ERAS) concept, the long-term prognosis of patients has improved significantly [6,7]. However, because of the complexity of EC surgery, the number of postoperative complications and the high recurrence rate after surgical resection, the 5-year survival rate is only about 40% [8,9].
In fact, fewer than 20% of all patients with ESCC survive more than 5 years after surgery [12]. To address the low accuracy of survival prediction for cancer patients, a computer-aided classification method for lung cancer prediction based on an evolutionary system has been proposed [13]. Another study demonstrated that neural network models optimized by a probabilistic genetic algorithm and combined with the t-SNE dimensionality reduction algorithm achieved accurate prediction of patient survival [14]. GPU-based training of a BP neural network was tested on breast cancer data and showed a significant increase in training speed [16]. BP neural network models [17,18], genetic algorithm models [19,20], support vector machine models [21], decision tree methods [22] and time series methods [23] are the prediction methods in common use at present. However, the BP neural network has defects such as convergence to local optima, a lack of physical interpretability, strong dependence on training data and slow convergence, which hinder its application in practical engineering [17,19]. The genetic algorithm (GA) is characterized by strong macro search and global optimization capabilities [20], which can alleviate the local minimum problem of the network and improve its performance. Therefore, GA has been widely used to optimize BP neural networks [19]. Because many interacting factors are present in the blood of esophageal cancer patients, it is difficult to determine which influencing factors are most strongly correlated with the prediction target of the model, and optimizing the neural network during modeling is time-consuming and difficult.
However, because of the poor correlation between model input and output variables, and the high redundancy and coupling among variables, the poor prediction accuracy of such models has not been well resolved [24]. Some researchers have adopted grey wolf optimization (GWO) to improve the global search capability of the BP neural network, which largely avoids trapping in local optima [25-27]. The GWO studied in this paper is a swarm intelligence optimization algorithm proposed in 2014 [28]. Its basic idea comes from the cooperative predation mechanism of the gray wolf population. Compared with earlier algorithms, GWO is relatively simple to set up and only needs the guidance of individual gray wolves to hunt for the optimal solution. GWO is simple to implement and converges quickly, and it has shown excellent results on standard test functions. Research has also shown that GWO outperforms other intelligent optimization algorithms, such as particle swarm optimization (PSO) and GA, in some application fields [19]. The objectives of this work are summarized as follows. 1) A K-M-BP neural network model is proposed. The purpose of the model is to reduce the dimension of the data and improve the accuracy of the BP neural network prediction model. K-M analysis is used to screen the blood factors that are highly correlated with the survival level of patients, which simplifies the network structure. The BP neural network is then applied to predict the survival level of patients with esophageal cancer. A case study and experimental results demonstrate that the K-M-BP neural network model is more effective than the BP neural network model in predicting the survival level of patients. 2) Based on the proposed framework, a K-M-GWO-BP model is proposed by adopting GWO as the optimizer for evolving the BP network. GWO is used to optimize the trained BP neural network model to improve the prediction accuracy.
The GWO algorithm underlying the proposed GWO-BP is tested on a set of benchmark functions to verify its effectiveness. The prediction accuracy and applicability of the BP, GWO-BP and K-M-GWO-BP prediction models are compared to explore a new way of predicting survival level. The experimental results show that the proposed K-M-GWO-BP neural network model is superior to several recent BP neural network models in terms of calculation speed and prediction accuracy.
In the rest of this article, the sources of the data are described in Section II. The K-M analysis, GWO and the proposed GWO-BP model are then presented in Section III. The experimental results are detailed in Sections IV and V. Finally, conclusions are drawn and future work is outlined in Section VI.

Objects and analysis
2.1 Patient sample collection
A total of 331 patients with ESCC were treated at the Affiliated Hospital of Zhengzhou University from January 2007 to December 2018, including 210 males (63.44%) and 121 females (36.56%). The patients were aged 38 to 80 years, with a mean age of 60.61 years.
Blood indicators are regarded as important factors in the clinical manifestations of cancer patients. Many studies have reported the prognostic and clinicopathological significance of the neutrophil to lymphocyte ratio (NLR), platelet to lymphocyte ratio (PLR) and lymphocyte to monocyte ratio (LMR) in patients with ESCC. NLR, PLR and LMR may serve as prognostic markers in patients with ESCC [29]. The peripheral blood cell count ratio has been suggested for evaluating the clinical response and prognosis of patients with non-surgical ESCC [30]. Serum TT has been confirmed as a potentially important factor in the prognosis of ESCC patients [31]. Preoperative serum FIB has been validated as a predictor of survival in ESCC, especially for patients with early pathological TNM stage (I-II) and N0 disease [32]. Zhang's study found that a nomogram combined with the C-reactive protein (CRP)/ALB ratio could be used as a predictive model for the efficacy and survival outcome of thoracic ESCC treated with chemoradiotherapy (CRT) or radiotherapy (RT) alone [33].

Composite model for predicting survival level of patients with EC
3.1 K-M survival analysis
K-M survival analysis is a method for analyzing both the outcome and the course of an event. It considers not only whether the event occurs but also the time until it occurs; survival analysis is therefore also called time-to-event analysis. Survival analysis is widely used in medicine to study the survival time of patients with cancer and other diseases [34,35]. To analyze the factors influencing the survival time of EC patients, the blood indicators of the patients are used as input and the survival time is used as output. The statistical software SPSS 20.0 is used for the K-M survival analysis. The accuracy of variable selection is determined by the strength of the correlation.
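As an illustration of the product-limit idea behind K-M analysis, the following minimal sketch estimates a survival curve from follow-up times and event indicators. It is not the SPSS procedure used in this study; the function name `kaplan_meier` and the toy cohort are hypothetical and serve only to show how censored observations leave the risk set without lowering the curve.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times  : follow-up time of each patient (for example, in months)
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # deaths and censored patients both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk   # product-limit step
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort, not data from the study.
toy_times = [6, 12, 12, 20, 30, 30, 45, 60]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t={t:>3}  S(t)={s:.3f}")
```

In practice one curve would be estimated for each level of a blood indicator (for example, above and below a cutoff), and the curves would then be compared.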

GWO algorithm
GWO is an intelligent optimization algorithm proposed in 2014 that imitates the social hierarchy and predation behavior of grey wolves. As shown in Figure 1, the goal of optimization is achieved by simulating the hunting process of wolves. The wolf pack is composed of 5-12 wolves, which are divided into 4 grades according to their fitness values. The wolf in the first layer of the pyramid is the leader wolf, denoted α, which has decision-making power on all major issues of the whole pack. The wolf in the second layer, denoted β, helps the leader wolf to make decisions. The wolf in the third layer, denoted δ, is responsible for sentinel, reconnaissance and other tasks. The wolves at the bottom, denoted ω, are under the command of the first three levels. In the process of predation, the α, β and δ wolves constantly change their positions to pursue the prey, and the remaining ω wolves follow them; the optimal solution corresponds to the location of the prey. Because the location of the gray wolf is uncertain, the distance between each wolf and its prey is expressed mathematically, where r2 is a random number in [0,1], the convergence factor is a function of the iteration count, and the user-defined maximum number of iterations is denoted max. Corresponding mathematical descriptions are given for the three leading wolves, and the position of the next-generation wolf is defined by equation (4).
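The hunting process described above can be sketched in a few lines of code. The sketch assumes the standard GWO update from the 2014 formulation [28]: for each leader, A = 2q·r1 - q and C = 2·r2 with r1, r2 random in [0,1], the convergence factor q decreases linearly from 2 to 0, D = |C·X_leader - X|, and the new position is the mean of the three leader-guided candidates X_leader - A·D. The function name `gwo_minimize`, the symbol q, the bounds handling and the sphere test function are illustrative assumptions, not the authors' Matlab implementation.

```python
import random


def gwo_minimize(fitness, dim, lower, upper, n_wolves=30, max_iter=200, seed=0):
    """Minimise `fitness` over [lower, upper]^dim with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best three (score, position) pairs found so far: alpha, beta, delta
    for t in range(max_iter):
        scored = sorted(((fitness(w), w[:]) for w in wolves), key=lambda p: p[0])
        leaders = sorted(leaders + scored[:3], key=lambda p: p[0])[:3]
        q = 2.0 - 2.0 * t / max_iter          # convergence factor, 2 -> 0
        for w in wolves:
            for j in range(dim):
                guided = 0.0
                for _, leader in leaders:
                    A = 2.0 * q * rng.random() - q
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])   # distance to this leader
                    guided += leader[j] - A * D     # candidate guided by this leader
                # next-generation position: mean of the three candidates, kept in bounds
                w[j] = min(upper, max(lower, guided / 3.0))
    final = sorted(((fitness(w), w[:]) for w in wolves), key=lambda p: p[0])
    leaders = sorted(leaders + final[:3], key=lambda p: p[0])[:3]
    best_score, best_position = leaders[0]
    return best_position, best_score


def sphere(x):
    """Simple unimodal test function with minimum 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = gwo_minimize(sphere, dim=5, lower=-10.0, upper=10.0)
print(best_value)
```

Keeping the three best positions found so far, rather than only those of the current generation, is a design choice of this sketch; it guarantees that the returned solution never gets worse between iterations.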

BP neural network algorithm
(1) Determine the input layer, hidden layer and output layer. The numbers of nodes in the input layer, hidden layer and output layer of the network are denoted by l, m and n, respectively. The number of hidden layer nodes is given by formula (5), in which the constant is a number in the range of 1-10. The structure of the network is shown in Figure 2. The initial weights between the input layer and the hidden layer are denoted by w_ij, and those between the hidden layer and the output layer by v_jk. The thresholds of the hidden layer are represented by a = [a1, a2, …, am], and the thresholds of the output layer by b = [b1, b2, …, bn].
Figure 2. Structure diagram of neural network
(2) Calculate the hidden layer output. In formula (6), h_j is the output of the jth neuron in the hidden layer, x_i is the input of the ith neuron in the input layer, and a_j is the jth threshold of the hidden layer.
(3) Calculate the output of the output layer. In formula (7), b_k is the kth threshold of the output layer and y_k is the output of the kth node of the output layer.
(4) Update the connection layer weights. The objective function is defined in equation (8), where A is the number of training samples and n is the number of output nodes. It compares the expected output of sample s with the output of the kth output node under the action of sample s.
(6) Judge whether the algorithm has reached the maximum number of iterations. If the maximum number of iterations has not been reached, return to step (2). If it has been reached, the network training ends.
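The training steps above can be illustrated with a small single-hidden-layer network. The sketch makes several assumptions that the text does not fix: a sigmoid activation in the hidden layer, a linear output layer, thresholds that are subtracted from the weighted sums, and half the squared error as the per-sample objective. The names `forward` and `train_step`, the learning rate and the toy samples are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, a, v, b):
    """h_j = f(sum_i w[i][j] * x[i] - a[j]);  y_k = sum_j h[j] * v[j][k] - b[k]."""
    h = [sigmoid(sum(w[i][j] * x[i] for i in range(len(x))) - a[j]) for j in range(len(a))]
    y = [sum(h[j] * v[j][k] for j in range(len(h))) - b[k] for k in range(len(b))]
    return h, y


def train_step(x, d, w, a, v, b, lr=0.1):
    """One gradient-descent update on a single sample.

    Returns 0.5 * squared error measured before the update.
    """
    h, y = forward(x, w, a, v, b)
    err = [d[k] - y[k] for k in range(len(y))]
    # error signal propagated back to hidden node j (uses v before it is updated)
    back = [h[j] * (1.0 - h[j]) * sum(v[j][k] * err[k] for k in range(len(err)))
            for j in range(len(h))]
    for j in range(len(h)):
        for k in range(len(err)):
            v[j][k] += lr * h[j] * err[k]
    for k in range(len(err)):
        b[k] -= lr * err[k]
    for i in range(len(x)):
        for j in range(len(h)):
            w[i][j] += lr * back[j] * x[i]
    for j in range(len(h)):
        a[j] -= lr * back[j]
    return 0.5 * sum(e * e for e in err)


# Hypothetical toy problem: l = 2 inputs, m = 3 hidden nodes, n = 1 output.
rng = random.Random(0)
l, m, n = 2, 3, 1
w = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(l)]
a = [0.0] * m
v = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(m)]
b = [0.0] * n
samples = [([0.2, 0.7], [0.5]), ([0.9, 0.1], [0.1])]
history = []
for epoch in range(200):                      # fixed iteration limit, as in step (6)
    history.append(sum(train_step(x, d, w, a, v, b) for x, d in samples))
print(history[0], history[-1])
```

Because the starting weights are random, plain gradient descent of this kind can settle in different local minima from run to run, which is the weakness that a global optimizer for the initial weights is meant to address.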

GWO-BP neural network algorithm
The BP neural network converges slowly and easily falls into local minima. Therefore, the GWO algorithm is used to enhance its global search ability. As shown in Figure 3, the position of each gray wolf is taken as the weights and thresholds of the BP neural network, and the gray wolf algorithm is iterated many times. The location of the prey is continuously estimated and updated by the gray wolves, so that the thresholds and weights of the BP neural network are constantly updated toward the global optimum. The steps are as follows:
Figure 3. Flow chart of GWO-BP neural network
Step 1: selecting appropriate training samples. The variables selected by K-M survival analysis are used as training input samples.
Step 2: the establishment of BP neural network model. The number of input layer nodes is l, the number of output layer nodes is n, and the number of hidden layer neuron nodes is m, as given by formula (5), where a is an arbitrary constant from 1 to 10. After many experiments, it is concluded that the convergence speed and fitting accuracy of the neural network model are most suitable when a is 5, as shown in the table.
Step 3: initialization of GWO optimization algorithm. The optimal positions Xα, Xβ and Xδ are initialized.
Step 4: calculating individual fitness value. The weights and thresholds of the BP neural network are the quantities optimized by the GWO algorithm. The sum of the errors over the output nodes of the BP neural network is used as the fitness function of the GWO algorithm to evaluate each individual position, and the position with the current optimal fitness value is obtained.
Step 5: updating the parameters r1, r2 and q in GWO. According to equations (1) and (2), the position of each wolf is updated, and a new BP neural network is constructed and trained. According to equation (12), the fitness function value of each wolf is calculated, and the new α, β and δ are determined again.
Step 6: determining the number of iterations. When the number of iterations reaches the upper limit, the GWO optimization algorithm is finished, and the optimal initial weights and thresholds of the BP neural network are obtained. If the number of iterations has not reached the upper limit, return to Step 5.
Step 7: output of prediction results. BP neural network is trained and evaluated according to the weights and thresholds optimized by GWO optimization algorithm, and finally the prediction results are obtained.
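The link between a wolf position and a BP network in Steps 3 to 5 can be sketched as follows. Each position is treated as a flat vector that is split into the input-hidden weights, hidden thresholds, hidden-output weights and output thresholds, and the fitness is the summed absolute output error over the training samples. The ordering of the vector, the sigmoid activation, the absolute-error form of the fitness and all function names are assumptions made for illustration; the layer sizes (5, 11, 1) follow the values reported later in the paper.

```python
import math


def decode(position, l, m, n):
    """Split a flat wolf position into w (l x m), a (m), v (m x n) and b (n)."""
    if len(position) != l * m + m + m * n + n:
        raise ValueError("position length does not match the network size")
    it = iter(position)
    w = [[next(it) for _ in range(m)] for _ in range(l)]
    a = [next(it) for _ in range(m)]
    v = [[next(it) for _ in range(n)] for _ in range(m)]
    b = [next(it) for _ in range(n)]
    return w, a, v, b


def predict(x, w, a, v, b):
    """Forward pass with sigmoid hidden nodes and linear output nodes."""
    h = [1.0 / (1.0 + math.exp(-(sum(w[i][j] * x[i] for i in range(len(x))) - a[j])))
         for j in range(len(a))]
    return [sum(h[j] * v[j][k] for j in range(len(h))) - b[k] for k in range(len(b))]


def make_fitness(samples, l, m, n):
    """Build the function a GWO routine would minimise over wolf positions."""
    def fitness(position):
        w, a, v, b = decode(position, l, m, n)
        return sum(abs(dk - yk)
                   for x, d in samples
                   for dk, yk in zip(d, predict(x, w, a, v, b)))
    return fitness


l, m, n = 5, 11, 1
dim = l * m + m + m * n + n          # length of each wolf position
# Hypothetical normalised samples, not data from the study.
toy_samples = [([0.1, 0.4, 0.3, 0.8, 0.5], [0.3]),
               ([0.9, 0.2, 0.6, 0.1, 0.7], [0.6])]
fitness = make_fitness(toy_samples, l, m, n)
print(dim, fitness([0.0] * dim))
```

The position with the lowest fitness would then be decoded once more and used as the initial weights and thresholds for ordinary BP training in Step 7.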
In the process of building the network, Matlab simulation software is used to update the individual positions in the GWO optimization algorithm until the number of iterations reaches the set value. Figure 4 shows the optimal fitness value of the GWO optimization algorithm as the number of iterations increases up to 500. The optimal initial weights and thresholds of the BP neural network are obtained by the GWO optimization algorithm. For comparison, the optimal fitness value is also obtained with a limit of 300 iterations. The runs with 300 and 500 iterations are compared in terms of computation speed and optimal value, and 500 is selected as the number of iterations.
In the first step of prediction modeling, the relevant data are obtained, as listed in Table 1. The input and output data used in modeling are preprocessed to obtain an accurate and applicable sample set. In view of the nonlinear complexity of the patient's blood system, K-M survival analysis is used to screen the input variables. The purpose of screening is to retain the variables related to survival level and to delete irrelevant ones. When the significance of the chi-square value is less than 0.05, the variable and survival are significantly correlated. The degree of freedom refers to the number of variables whose values are not restricted when a statistic is calculated. Significance refers to the risk of rejecting the null hypothesis when the null hypothesis is true, also known as the probability level or significance level.
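The chi-square screening criterion can be illustrated with a two-group log-rank test, the test commonly reported alongside K-M curves. This is a minimal sketch under stated assumptions rather than the SPSS output used in the study: each blood indicator is assumed to split patients into two groups, the test has one degree of freedom, and the p-value uses the identity P(χ² > x) = erfc(√(x/2)) that holds for one degree of freedom. The function name and the toy groups are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)          # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)          # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical groups (for example, low and high values of one blood indicator).
low_times, low_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
high_times, high_events = [20, 25, 30, 35, 40], [1, 1, 1, 1, 1]
chi_square, p_value = logrank_test(low_times, low_events, high_times, high_events)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}, keep variable: {p_value < 0.05}")
```

A variable whose groups give p < 0.05 would be kept as a network input, and the others would be dropped.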

Performance test of GWO algorithm
To verify the validity and generality of GWO, 23 benchmark functions are selected to test the GWO algorithm. Among them, F1 and F2 are unimodal test functions, and F9, F11 and F13 are multimodal test functions, as shown in Table 2. The salp swarm algorithm (SSA), differential evolution (DE), particle swarm optimization (PSO), ant lion optimization (ALO), the dragonfly algorithm (DA) and GWO are selected for a comparative study. To make the comparison fair, the parameters of the five comparison algorithms are set as follows. The population size is set to 30 and the maximum number of iterations is set to 500. In SSA, c1 is between 0 and 2, and c2 and c3 are random numbers between 0 and 1. In DE, the scale factor is set to 0.5 and the crossover constant is set to 0.2. In PSO, the maximum inertia weight is set to 0.9 and the minimum to 0.4; the learning factors are set to ca = 2.5 and cb = 0.5, and the maximum velocity is limited to 1. ALO and DA use the same dimension as GWO. The convergence accuracy and convergence rate of the algorithms are evaluated.
Under the same population size and maximum number of iterations, 30 independent runs are conducted on the 23 functions with SSA, DE, PSO, ALO, DA and GWO. The mean and standard deviation of the results are reported in Table 3. For both unimodal and multimodal functions, GWO is superior to the other algorithms in convergence accuracy and stability, which indicates that GWO has good global convergence performance. The optimal fitness curves of the benchmark functions are given in Figures 5, 6 and 7. Compared with the other five algorithms, GWO has the advantages of fast computation and low fitness values.
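The evaluation protocol, repeated independent runs summarized by mean and standard deviation, can be sketched as below. Table 2 is not reproduced here, so the two test functions are assumptions: the sphere function is a typical unimodal benchmark and the Rastrigin function a typical multimodal one. The optimizer is a plain random search used only as a stand-in so that the example is self-contained; it is not GWO or any of the compared algorithms, and all names are hypothetical.

```python
import math
import random
import statistics


def sphere(x):
    """Unimodal: single minimum of 0 at the origin."""
    return sum(v * v for v in x)


def rastrigin(x):
    """Multimodal: many local minima, global minimum of 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def random_search(f, dim, lower, upper, evaluations, rng):
    """Stand-in optimizer (NOT GWO): best value among uniformly random points."""
    return min(f([rng.uniform(lower, upper) for _ in range(dim)])
               for _ in range(evaluations))


def summarize(optimizer, f, dim, lower, upper, runs=30, evaluations=500, seed=0):
    """Run the optimizer `runs` times with different seeds; return (mean, std)."""
    results = [optimizer(f, dim, lower, upper, evaluations, random.Random(seed + r))
               for r in range(runs)]
    return statistics.mean(results), statistics.stdev(results)


mean_sphere, std_sphere = summarize(random_search, sphere, 5, -100.0, 100.0)
mean_rast, std_rast = summarize(random_search, rastrigin, 5, -5.12, 5.12)
print(f"sphere:    mean={mean_sphere:.3f}  std={std_sphere:.3f}")
print(f"rastrigin: mean={mean_rast:.3f}  std={std_rast:.3f}")
```

A lower mean indicates better convergence accuracy and a lower standard deviation indicates better stability across runs.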

Prediction of survival level of patients with EC by K-M-GWO-BP
In the construction of the BP network model, the number of input layer nodes is determined by the number of influencing variables, and the number of output layer nodes is determined by the number of predicted quantities. The numbers of input and output nodes are 5 and 1, respectively. There is no unified way to determine the number of hidden layer nodes, but it plays an important role in the accuracy of the prediction model. The number of hidden layer nodes is selected by comparing the training errors obtained with different numbers of nodes. Numbers of hidden layer nodes from 3 to 13 are used for BP network training, and the results of 10 experiments are shown in Table 4. When the number of hidden layer nodes is 11, the training error is 0.0173, which is the best training result.
To reflect the performance of the K-M-GWO-BP prediction model comprehensively, the prediction results are evaluated by three indexes: the mean absolute error, the variance of the absolute error and the mean relative error. The mean absolute error and mean relative error reflect the agreement between the predicted and true values of the test data; the smaller these values are, the higher the prediction accuracy of the model. The variance of the absolute error reflects the fluctuation of the error; the smaller this value is, the more stable the prediction result. The predicted results for the normalized data are given in Table 5. The mean absolute error of K-M-GWO-BP is 3.4156, and its mean relative error of 0.3277 is smaller than those of BP, PSO-BP, GA-BP, SSA-BP, GWO-BP, K-M-BP, K-M-PSO-BP, K-M-SSA-BP and K-M-GA-BP, indicating higher prediction accuracy and a better fit. The mean absolute error of PSO-BP is 7.3707 and its mean relative error is 0.8831, which are higher than those of the BP prediction model. The absolute error variance of BP is smaller than that of the K-M-GWO-BP prediction model.
The absolute error of the models without K-M analysis is given in Figure 8. The absolute error of the K-M-GWO-BP model is the smallest. Taken together, the results show that the K-M-GWO-BP prediction model has better training accuracy and prediction performance.
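The three evaluation indexes can be computed as in the following minimal sketch. The population variance is used for the absolute error; whether the study used the population or sample variance is not stated, so this choice is an assumption. The function name and the toy values are hypothetical and are not taken from Table 5.

```python
import statistics


def error_summary(actual, predicted):
    """Mean absolute error, variance of absolute error and mean relative error."""
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    rel_err = [e / abs(a) for e, a in zip(abs_err, actual)]   # requires nonzero actual values
    return {
        "mean_abs_error": statistics.fmean(abs_err),
        "abs_error_variance": statistics.pvariance(abs_err),
        "mean_rel_error": statistics.fmean(rel_err),
    }


# Hypothetical survival times (months) and model predictions.
toy_actual = [10.0, 20.0, 40.0]
toy_predicted = [12.0, 18.0, 43.0]
summary = error_summary(toy_actual, toy_predicted)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```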

Discussion
In this paper, a comprehensive model for predicting the survival level of patients with esophageal squamous cell carcinoma is proposed, based on K-M survival analysis and a back-propagation neural network optimized by the gray wolf algorithm. In view of the strong coupling and nonlinear characteristics of patient blood sample data, the sample data are screened by K-M survival analysis to reduce the impact of data correlation on modeling accuracy. On the basis of the screened sample data, the corresponding BP neural network model is constructed. The gray wolf algorithm, which has global optimization ability, is used to optimize the parameters of the error back-propagation neural network; this avoids the blindness of manual parameter selection and improves the prediction accuracy of the model. In this paper, 17 factors are identified. Because there are many influencing factors, the correlation among them is large. Reducing the number of screened blood factors for ESCC is our next goal, in order to improve the accuracy of survival prediction.