## 2.1 Classification Optimization Algorithm Based on Neural Network

Classification is an important pattern recognition and data mining technique. Its purpose is to find a model or function that describes and distinguishes data classes or concepts according to the characteristics of a data set. Classification involves two key steps: (1) a classifier is trained on a training data set of known categories to describe the predetermined set of data classes or concepts; (2) the resulting model or function is used to classify unknown objects.

The implementation of classification generally has the following three processes:

(1) Data preprocessing: the purpose is to optimize the data set according to the requirements of classification, so as to ensure the accuracy and effectiveness of the classification.

(2) Constructing the classifier, which is divided into a training stage and a testing stage. In general classification, the data set is divided into two parts: one part is used for classifier training and the other for classifier testing. In the training stage, certain rules or algorithms are used to learn from the attribute data in the training data set. Because the category attribute value of each training sample is known in advance, this process is also known as supervised learning. In the testing stage, the test data set is used to evaluate the classification accuracy of the classifier: the test data are input into the classifier, and the predicted classes are matched against the known targets.

(3) After the accuracy evaluation, if the classifier meets the user's needs, it can be used to classify other data.
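The train/test workflow described above can be sketched with a toy, hypothetical nearest-centroid classifier; the classifier choice and the data are purely illustrative and not taken from the text:

```python
def train(samples):
    # Training stage: learn one centroid (mean) per known class label.
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(model, x):
    # Prediction: the class whose centroid lies closest to x.
    return min(model, key=lambda label: abs(model[label] - x))

def accuracy(model, test_samples):
    # Testing stage: match predicted classes against known targets.
    hits = sum(1 for x, label in test_samples if classify(model, x) == label)
    return hits / len(test_samples)

train_set = [(1.0, "low"), (1.2, "low"), (9.0, "high"), (8.8, "high")]
test_set = [(1.1, "low"), (9.1, "high")]
model = train(train_set)
print(accuracy(model, test_set))  # 1.0
```

The split into `train`, `classify`, and `accuracy` mirrors the three processes: any real classifier could be substituted behind the same interface.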

There are many methods for constructing a classifier; widely used ones include decision trees, Bayesian algorithms, association rules, support vector machines, and neural networks.

The decision tree classification algorithm is a greedy, instance-based inductive learning algorithm, represented by the ID3 and C5.0 algorithms. The tree is constructed in a top-down way, building classification information from a group of unordered, irregular instances. Each node of the tree is a recursively selected test attribute; once an attribute appears at a node, it does not appear again in that node's descendants. In the process of constructing the decision tree, the tree must be pruned according to certain rules.

The advantages of the decision tree classification algorithm are: first, its construction process is relatively simple, the computation is small, and the construction rules are easy to understand; secondly, the decision tree can be constructed quickly without requiring domain knowledge and can process non-numerical data. Its disadvantage is poor scalability: when new instances are added, the original decision tree cannot be reused, so a new decision tree must be rebuilt. Decision trees are also sensitive to noise, so when the data quality is poor, the resulting tree is not ideal.
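As an illustration of how such a tree selects test attributes, here is a minimal information-gain computation in the spirit of ID3; the entropy criterion is the standard one, while the toy data are invented:

```python
import math

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(rows, attr_index, labels):
    # Gain from splitting on one attribute, as in ID3-style tree induction:
    # entropy before the split minus the weighted entropy of the subsets.
    total = len(rows)
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [labels[i] for i, r in enumerate(rows) if r[attr_index] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Toy data: attribute 0 perfectly separates the classes, attribute 1 does not.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, 0, labels))  # 1.0: perfect split
print(information_gain(rows, 1, labels))  # 0.0: uninformative split
```

The recursive selection of test attributes mentioned above corresponds to repeatedly choosing the attribute with the highest gain.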

Bayesian algorithms achieve a good classification effect only when the attributes of the samples to be classified are mutually independent; otherwise, classification accuracy suffers greatly and the computational overhead grows. Moreover, Bayesian algorithms perform well only on large-sample data, so they are not suitable for small-sample data.
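The independence assumption can be made concrete with a minimal naive-Bayes sketch: the per-attribute likelihoods are simply multiplied, which is only valid under that assumption. The data and names below are illustrative:

```python
from collections import defaultdict

def fit_naive_bayes(samples):
    # Estimate P(class) and P(feature_value | class) from raw counts.
    class_counts = defaultdict(int)
    cond_counts = defaultdict(int)  # (class, position, value) -> count
    for features, label in samples:
        class_counts[label] += 1
        for i, v in enumerate(features):
            cond_counts[(label, i, v)] += 1
    return class_counts, cond_counts, len(samples)

def predict(model, features):
    class_counts, cond_counts, n = model
    best, best_score = None, -1.0
    for label, count in class_counts.items():
        # Independence assumption: multiply the per-attribute likelihoods.
        score = count / n
        for i, v in enumerate(features):
            score *= cond_counts[(label, i, v)] / count
        if score > best_score:
            best, best_score = label, score
    return best

samples = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
           (("rainy", "mild"), "yes"), (("rainy", "hot"), "yes")]
model = fit_naive_bayes(samples)
print(predict(model, ("rainy", "mild")))  # "yes"
```

When attributes are correlated, the product of likelihoods double-counts evidence, which is the accuracy problem described above.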

Apriori is the representative association rule classification algorithm, a classification algorithm based on association rule discovery. Apriori divides the discovery of association rules into two steps: the first step retrieves, through iteration, all frequent itemsets in the transaction database, that is, the itemsets whose support is not lower than a user-set threshold. The second step uses the frequent itemsets to construct rules that satisfy the user's minimum confidence.
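The first step, iterative discovery of frequent itemsets, can be sketched as follows; the transaction data and threshold are invented for illustration:

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    # Iteratively find all itemsets whose support (fraction of transactions
    # containing them) is not lower than the user-set threshold.
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        counted = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                counted[c] = support
        frequent.update(counted)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(counted)
        candidates = list({a | b for a, b in combinations(keys, 2)
                           if len(a | b) == k + 1})
        k += 1
    return frequent

transactions = [frozenset(t) for t in
                [{"bread", "milk"}, {"bread", "butter"},
                 {"bread", "milk", "butter"}, {"milk"}]]
freq = apriori_frequent(transactions, min_support=0.5)
print(sorted(tuple(sorted(s)) for s in freq))
```

The loop embodies the Apriori principle: only itemsets built from already-frequent itemsets are considered as candidates at the next size.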

The key to an artificial neural network's simulation of the human brain's mode of thinking lies in the following two points:

(1) Information is stored on the network through the distribution of excitation patterns on neurons.

(2) Information processing is completed by the dynamic process of simultaneous interaction between neurons.

Artificial neural networks have the following characteristics:

(1) Non-locality. The network consists of a large number of neurons, each of which has a relatively simple structure and function; the function of the whole system depends on the interactions between neurons.

(2) Nonlinearity. The neurons in the network simulate the characteristics of biological neurons, with the two states of activation and inhibition, which gives the network its nonlinear character.

(3) Non-convexity. In some cases a function may have multiple extrema, which correspond to multiple stable states of the system, and each stable state may represent the ability of the system to evolve in a certain direction. The non-convexity of a neural network thus means that it may have multiple stable equilibrium states and can evolve in multiple directions, giving the network evolutionary diversity.

(4) Non-stationarity. During information processing, the network itself changes as a nonlinear dynamic system; this is the self-organizing, self-learning, and self-adaptive ability of the neural network.

## 2.2 Neural Tree Network Model Optimization Algorithm

The optimization of the neural tree network model can be divided into two parts: topology optimization and parameter optimization. The former can be carried out by a tree-coded evolutionary algorithm with a special instruction set, and the latter by a parameter search algorithm. Therefore, to improve the efficiency of neural tree network model optimization, the two optimization processes must be balanced in the learning algorithm. The specific steps of the algorithm are as follows:

Step 1: initialize the relevant parameters. Set the members of the function set and terminal set, and create the initial evolutionary population according to the related parameters.

Step 2: use the selected topology optimization algorithm to optimize the topology of the neural tree network model.

Step 3: if the topological structure of the model meets the requirements, go to Step 4; otherwise return to Step 2 and continue to optimize the topology of the model.

Step 4: use the selected parameter search algorithm to optimize the relevant parameters of the optimal model found in Step 3.

Step 5: if the stop condition is met, go to Step 6; otherwise return to Step 4 and continue to optimize the parameters of the model.

Step 6: if an optimal neural tree network model satisfying the conditions has been found, the algorithm stops and outputs the relevant information; otherwise return to Step 2.
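The six steps can be outlined as a generic skeleton. The toy instantiation below (numeric "models", a made-up fitness, a crude local search) is purely illustrative, since the text does not fix concrete topology or parameter-search algorithms:

```python
import random

def optimize_neural_tree(init_population, evolve_topology, search_parameters,
                         fitness, target, max_rounds=10):
    # Skeleton of Steps 1-6: alternate topology evolution (Steps 2-3)
    # with parameter search on the current best model (Steps 4-5),
    # repeating from Step 2 until a model meets the target (Step 6).
    population = init_population()                    # Step 1
    best = None
    for _ in range(max_rounds):
        population = evolve_topology(population)      # Steps 2-3
        best = max(population, key=fitness)
        best = search_parameters(best)                # Steps 4-5
        if fitness(best) >= target:                   # Step 6
            break
    return best

# Toy instantiation: "models" are plain numbers and the optimum is 5.
random.seed(0)
best = optimize_neural_tree(
    init_population=lambda: [random.uniform(0, 10) for _ in range(8)],
    evolve_topology=lambda pop: [x + random.gauss(0, 1) for x in pop],
    search_parameters=lambda x: x + 0.9 * (5 - x),  # crude local parameter search
    fitness=lambda x: -abs(x - 5),
    target=-0.5,
)
print(round(best, 2))
```

In a real implementation, the population members would be neural trees, `evolve_topology` the tree-coded evolutionary algorithm, and `search_parameters` the chosen parameter search.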

The parameter optimization of neural tree network model generally uses the common optimization search algorithm, while the topology optimization of neural tree network model uses the evolutionary algorithm based on tree coding.

The function set F and terminal set T of the neural tree network model take the form F = {+2, +3, …, +N} and T = {x1, x2, …, xn}, where +i denotes a function (non-terminal) node with i child nodes and xj denotes a terminal (input) node.

The depth d of the neural tree network model satisfies d ≤ Dmax, where Dmax is the maximum depth allowed by the neural tree network model for the practical problem.

The terminal-node repetition problem in the neural tree network model refers to the situation where, during population creation or evolution, some individuals contain one or more identical terminal nodes among the child nodes of several function nodes, as shown in Figure 1. The thick lines in the figure indicate branches whose terminal nodes are the same. When the terminal set T is small, the problem becomes more prominent.

In fact, multiple repeated terminal nodes under the same function node can be merged into a single terminal node by accumulating their weights. The specific operation is shown in Figure 2.

The repetition of a large number of terminal nodes wastes the storage space of neural tree network individuals and reduces the evolution speed and accuracy of the whole population. Therefore, the selection of terminal nodes under the same function node must be controlled so as to avoid repeated terminal nodes.
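The weight-accumulation merge of Figure 2 can be sketched as follows, assuming (hypothetically) that the children of a function node are stored as (terminal, weight) pairs:

```python
def merge_repeated_terminals(children):
    # children: (terminal_name, weight) pairs under one function node.
    # Repeated terminals are combined into a single entry whose weight is
    # the accumulated sum, preserving first-appearance order.
    merged, order = {}, []
    for name, w in children:
        if name not in merged:
            order.append(name)
            merged[name] = 0.0
        merged[name] += w
    return [(name, merged[name]) for name in order]

children = [("x1", 0.5), ("x2", 0.25), ("x1", 0.25), ("x1", 0.25)]
print(merge_repeated_terminals(children))  # [('x1', 1.0), ('x2', 0.25)]
```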

A neural network model can be divided into many small units, each of which includes a neuron and its connected adjustable weights. Such small units are called linear threshold units and are represented by the M-P model.

Suppose that in a linear threshold unit the input is an n-dimensional variable, the output is a 1-dimensional variable, and the activation function is a threshold function, as shown in Figure 3.

Expressed as a mathematical formula: the output is y = 1 if w1x1 + w2x2 + … + wnxn ≥ θ, and y = 0 otherwise, where wi is the weight of the i-th input and θ is the threshold.

It can be shown that a linear threshold unit can realize any Boolean product term or Boolean sum term, and that a feedforward network composed of linear threshold units can realize any Boolean function.
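For instance, a single linear threshold unit realizes the Boolean AND and OR terms with suitable weights and thresholds (a minimal sketch):

```python
def threshold_unit(weights, theta):
    # Linear threshold (M-P) unit: outputs 1 when the weighted input sum
    # reaches the threshold theta, otherwise 0.
    def unit(inputs):
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s >= theta else 0
    return unit

AND = threshold_unit([1, 1], theta=2)  # fires only when both inputs are 1
OR = threshold_unit([1, 1], theta=1)   # fires when at least one input is 1
for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND((a, b)), OR((a, b)))
```

Composing such units in a feedforward network (e.g. an OR of ANDs) yields any Boolean function in disjunctive normal form.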

The perceptron, proposed by F. Rosenblatt, is a kind of neural network that imitates biological characteristics. It is composed of a middle layer and an output layer; the weights from the input part to the middle layer are fixed, while the weights from the middle layer to the output layer are adjustable. Because the perceptron has only one adjustable layer (the output layer), it can only handle linearly separable problems, not more complex nonlinearly separable ones. To handle nonlinearly separable problems, a multilayer feedforward network is needed.
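The adjustment of the output-layer weights can be sketched with the classical perceptron learning rule on a linearly separable problem; the learning rate and epoch count below are illustrative choices:

```python
def train_perceptron(samples, lr=0.1, epochs=20):
    # Perceptron learning rule on the adjustable weights:
    # w <- w + lr * (target - prediction) * x, with a bias term.
    n = len(samples[0][0])
    w = [0.0] * (n + 1)  # last entry is the bias weight
    for _ in range(epochs):
        for x, target in samples:
            xs = list(x) + [1.0]
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else 0
            err = target - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, xs)]
    return w

def predict(w, x):
    xs = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else 0

# Linearly separable toy problem: logical OR.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(samples)
print([predict(w, x) for x, _ in samples])  # [0, 1, 1, 1]
```

On a nonlinearly separable problem such as XOR, this rule never converges, which is exactly the limitation noted above.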

## 2.3 Neural Network and Intelligent Cluster Optimization Algorithm

Intelligent algorithms are also called "soft computing". They imitate phenomena in nature, abstracting and simulating natural processes to solve problems. With the development of computer and information technology, the types and scale of the problems to be solved keep expanding. Intelligent algorithms can handle large-scale complex systems well and have the advantages of generality, stability, and ease of parallel implementation. Although intelligent algorithms have not been around for long, they have attracted wide attention and been applied to many complex problems.

Classic intelligent algorithms include the simulated annealing algorithm, the genetic algorithm, and swarm intelligence algorithms; the neural network is also one of the main schools of intelligent algorithms. Combining other intelligent algorithms with neural networks effectively assists the neural network modeling process, optimizes the network structure and parameters, and alleviates the network's local-minimum and slow-convergence problems, finally achieving more accurate results.

In today's society of developed information and exploding data, data sets grow exponentially in scale. Faced with large-scale data, traditional neural network modeling methods are often powerless, because the internal algorithms of neural networks are relatively complex: on large data sets the computational cost is huge and a great deal of memory is occupied, making it difficult for ordinary computers to undertake the computation. Using neural networks to process large-scale data sets has therefore become an urgent problem, which has led to the combination of neural networks with cloud computing. As a new distributed computing platform, cloud computing can maximize the use of computing resources and provide users with reliable, customized services.

Research on swarm intelligence algorithms began in the last century. Their basic idea is to construct stochastic optimization algorithms by simulating the behavior of natural organisms. The ant colony optimization (ACO) algorithm is a random search algorithm: based on the collective foraging behavior of real ant colonies in nature, it simulates the cooperative process by which an ant colony obtains the optimal solution. Solution paths are constructed by several ants, and the quality of the result is improved by exchanging pheromones. The main characteristics of ACO are positive feedback and implicit parallelism. The positive feedback mechanism allows the optimal solution to be found quickly, while implicit parallelism, realized by exchanging pheromones among multiple individuals, keeps the algorithm from falling into local optima and from converging prematurely to a small subset of the solution space. This is conducive to searching the solution space further and finding the optimal solution.
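A minimal ACO sketch on a tiny symmetric traveling salesman problem illustrates both mechanisms: pheromone deposits provide positive feedback, and evaporation keeps old trails from dominating. All parameter values (ant count, evaporation rate rho, deposit constant q) are illustrative assumptions:

```python
import random

def ant_colony_tsp(dist, n_ants=8, n_iters=30, rho=0.5, q=1.0, seed=1):
    # Minimal ant colony optimisation for a symmetric TSP: ants build tours
    # guided by pheromone, then pheromone evaporates and is re-deposited,
    # with shorter tours depositing more (positive feedback).
    random.seed(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour, unvisited = [0], set(range(1, n))
            while unvisited:
                cur = tour[-1]
                cands = list(unvisited)
                # Choose the next city proportionally to pheromone / distance.
                weights = [tau[cur][j] / dist[cur][j] for j in cands]
                nxt = random.choices(cands, weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate, then deposit pheromone on the edges of each tour.
        tau = [[t * (1 - rho) for t in row] for row in tau]
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Four cities on a unit square; the optimal tour follows the perimeter (length 4).
dist = [[0, 1, 1.414, 1],
        [1, 0, 1, 1.414],
        [1.414, 1, 0, 1],
        [1, 1.414, 1, 0]]
tour, length = ant_colony_tsp(dist)
print(tour, length)
```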

Fuzzy logic systems and neural networks belong to different scientific categories. The fuzzy logic system belongs to system theory: it can describe and handle the fuzziness in human language and thinking, and it imitates human intelligent behavior from a macro perspective. From the viewpoint of knowledge expression, a fuzzy logic system can express human experiential knowledge; from the viewpoint of knowledge storage, it stores knowledge in a rule base. Fuzzy systems and neural networks are both model-free nonlinear dynamic systems with the ability to store knowledge and process in parallel. Their differences lie in knowledge expression and the associated reasoning methods, and these differences are complementary: the neural network provides the fuzzy logic system with a connection structure and learning ability, while the fuzzy logic system provides the neural network with a structural framework possessing high-level fuzzy thinking and reasoning ability.

## 2.4 Common Variable Filtering Methods

There are many methods for variable selection, among which principal component analysis, factor analysis and discriminant analysis are common methods.

Using the idea of dimension reduction, principal component analysis transforms multiple indexes into a few comprehensive indexes. Through a linear transformation, it converts a given set of correlated variables into another set of uncorrelated variables, arranged in decreasing order of variance; each new variable is a function of the original variables. Then, according to a threshold set by the user, the new variables whose contribution exceeds the threshold are selected. Principal component analysis expresses each principal component as a linear combination of the variables and explains the total variance of the variables.
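A PCA sketch for two-dimensional data, solving the 2x2 covariance eigenproblem in closed form; the data are invented, and a real application would use a linear-algebra library rather than this closed-form shortcut:

```python
import math

def principal_components_2d(data):
    # PCA for 2-D data: centre the data, form the covariance matrix, and
    # solve its 2x2 eigenproblem in closed form. Eigenvalues come out in
    # decreasing order of explained variance, as described above.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    cyy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    tr, det = cxx + cyy, cxx * cyy - cxy ** 2
    root = math.sqrt(tr ** 2 / 4 - det)
    l1, l2 = tr / 2 + root, tr / 2 - root  # eigenvalues, l1 >= l2
    # Eigenvector of the larger eigenvalue = first principal component.
    v = (cxy, l1 - cxx) if abs(cxy) > 1e-12 else (1.0, 0.0)
    norm = math.hypot(v[0], v[1])
    return (l1, l2), (v[0] / norm, v[1] / norm)

# Points spread along the line y = x: the first component points along it.
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8)]
(l1, l2), pc1 = principal_components_2d(data)
print(round(l1, 3), round(l2, 3), (round(pc1[0], 3), round(pc1[1], 3)))
```

Here l1 dominates l2, so a single component carries almost all of the variance: the dimension-reduction idea in miniature.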

The basic purpose of factor analysis is to use a few factors to describe the relationships among many indicators or variables. In this method, closely related variables are grouped into the same class, and each class of variables constitutes a factor; a few factors thus reflect most of the information in the original data. Unlike principal component analysis, factor analysis expresses the variables as linear combinations of the factors, focuses on the covariance between variables, and must assume that the factors are uncorrelated. In factor analysis, the number of factors is set by the user.

Discriminant analysis establishes a suitable discriminant function according to different criteria and uses it to perform classification; common approaches include Fisher's method, the Bayes method, and the distance method.

Research on genetic programming algorithms shows that the crossover operation is the main way of generating the new population. Therefore, this paper introduces a "building block library" to store the sub-modules of neural tree network individuals during evolution.

The crossover operator of the genetic programming algorithm is realized by exchanging subtrees of two parents. Unlike the standard genetic programming algorithm, the crossover operator of this algorithm produces only one offspring and uses the fitness of the subtrees in the parents to guide the crossover operation.

The steps are as follows:

(1) Select the subtree a with the maximum fitness value from the parent Pi.

(2) Select the subtree b with the minimum fitness value from the parent Qi.

(3) Replace subtree a in the parent Pi with subtree b to produce the offspring Oi.

(4) All sub-individuals are considered potential building blocks.

Figure 4 shows the operation of the crossover operator with fitness marks: the left side is the parent, the right side is the offspring, and the offspring contains 3 potential building blocks.
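The crossover steps can be sketched on trees represented as nested tuples, using subtree size as a stand-in fitness measure; that choice is hypothetical, since the text does not specify how subtree fitness is computed:

```python
def all_subtrees(tree):
    # Enumerate every proper subtree (sub-individual) of a nested-tuple tree.
    for child in tree[1:]:
        yield child
        if isinstance(child, tuple):
            yield from all_subtrees(child)

def replace_first(tree, old, new):
    # Replace the first occurrence of subtree `old` in `tree` with `new`.
    if tree == old:
        return new
    if not isinstance(tree, tuple):
        return tree
    out, done = [tree[0]], False
    for child in tree[1:]:
        replaced = child if done else replace_first(child, old, new)
        done = done or replaced != child
        out.append(replaced)
    return tuple(out)

def crossover(parent_p, parent_q, fitness):
    a = max(all_subtrees(parent_p), key=fitness)  # Step 1: best subtree of P
    b = min(all_subtrees(parent_q), key=fitness)  # Step 2: worst subtree of Q
    return replace_first(parent_p, a, b)          # Step 3: single offspring O

def size(tree):
    # Node count, used here as a stand-in for subtree fitness.
    return 1 + sum(size(c) for c in tree[1:]) if isinstance(tree, tuple) else 1

P = ("+", ("*", "x1", "x2"), "x3")
Q = ("+", "x4", ("*", "x5", "x6"))
offspring = crossover(P, Q, fitness=size)
print(offspring)  # ('+', 'x4', 'x3')
```

Every subtree encountered along the way could be recorded in the building block library described above.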

For neural networks, topology design is an important issue. If the structure is too complex, the learning process becomes complicated and overfitting occurs, which reduces the practical usability of the network. In addition, once the structure of the network is fixed, the adjustable weight parameters are limited, so the searchable solution space is also limited. When the structure is designed without the necessary experience, the parameter search space is very large, and different structures give rise to many permutations and combinations of parameters, so the structural design of a neural network is a very important and complex problem. When no such experience is available, the simplest network consistent with the sample input variables can be selected. To sum up, designing an artificial neural network in a practical project reduces to two optimization problems: learning the network weight parameters and reasonably designing the network topology.