Database task processing optimization based on performance evaluation and machine learning algorithm

Machine learning is one of the key components of artificial intelligence and has been applied in many artificial intelligence systems, including computer vision, radio network, medical diagnosis, and intelligent robot algorithms. In this work, a machine learning module is first used to estimate computational consumption, the main performance modules are optimized and improved, and system data under database optimization are collected; an optimized database structure is then selected for computation and the results are analyzed. Finally, in the design of the database optimization system, the separation of the database storage engine is studied, and a database optimization form under the data processing structure is proposed. In terms of performance, functionality, and reliability, this helps reduce the losses caused by transmitting big data. For the data-intensive move-down computation process, candidate data processing schemes are evaluated, and sampling comparisons at the computing terminal are used to select an executable scheme. The results show that the proposed approach improves the query efficiency of the optimized database system.


Introduction
Machine learning is an important research direction of artificial intelligence. Early work on learning and computing ability helped the field of artificial intelligence develop rapidly [1].
By now, big data has affected people's life, research, work, and other aspects. The earliest research on machine learning algorithms assumed data with certain characteristics, such as a single data structure, relatively clear concepts, a corresponding data mode, and a relatively stable static structure [2].
When data have these characteristics, existing machine learning theories and algorithms can process them intelligently. In the era of big data, however, data structures are often heterogeneous [3]. Their diverse concepts, varied data patterns, and variable, dynamic scale pose great challenges to the application of traditional machine learning algorithms. Facing this challenge, many technology companies, such as Microsoft, Amazon, and Google, have established research stations to study machine learning algorithm technology and to mine the enormous business value hidden in big data [4]. Future machine learning research will therefore be combined ever more closely with industry and will help drive the development of the information technology industry [5].
Data processing and storage capabilities have not kept pace with the growth of big data. Large-scale data migration and movement degrade a system's computing power, functionality, and reliability, and the rising cost of data movement complicates the traditional processing mode, gradually turning computing centers into data centers [6]. This in turn drives progress in database optimization structures. The core of database system optimization is to reduce data movement between computing nodes and storage points by moving processing down to the data, improving the overall operating efficiency of the system, although storage devices and their processing capabilities impose limitations [7]. The earliest move-down schemes handled only simple data, but growing data volumes and complex data-intensive environments make current data processing optimization schemes incomplete [8]. Fully exploiting storage-side computation within an optimized database system can address these problems [9].
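The move-down idea can be illustrated with a toy sketch (the row counts and the filter predicate are invented for illustration): evaluating a filter at the storage side shrinks the data that must cross the interconnect, which is the core of near-data processing.

```python
# Toy illustration of "moving computation down" to the storage side.
rows = [{"id": i, "value": i % 100} for i in range(100_000)]

# Traditional path: ship every row to the compute node, then filter there.
shipped_traditional = len(rows)
result = [r for r in rows if r["value"] > 95]

# Near-data path: the storage engine evaluates the predicate first,
# so only matching rows cross the interconnect.
shipped_neardata = len([r for r in rows if r["value"] > 95])

print(shipped_traditional, shipped_neardata)  # 100000 vs 4000 rows moved
```

Here a 96% selective predicate cuts transferred rows by 25x, which is why selective operator move-down reduces transmission cost in the schemes discussed below.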

Related Work
A series of methods has been proposed in the literature to optimize the execution speed of neural networks on CPUs, replacing floating-point data with lower-precision representations to good effect. One study uses k-means to quantize network parameters, losing only 1% accuracy on ImageNet while obtaining a 16-24x compression ratio [10]. Given network redundancy, pruning can also produce good results: one data-independent pruning method removes redundant neurons, and HashNet compresses the network 8-16x with almost no loss of generalization. Another line of work uses Huffman trees to compress network parameters in a complete framework that combines pruning, quantization, and Huffman coding [11], reaching a high compression rate at comparable accuracy and a 3x speedup on CPU. A general near-data processing framework has also been proposed that requires no change to existing hardware, can optimize different data processing applications, and is highly versatile [12]. Biscuit implements C++ function libraries on the storage device and the host, which run in the host-side program and the SSD-side module respectively; users call these two libraries through flow programming to implement custom data filtering tasks. On the basis of the Biscuit framework, a high-performance DBMS based on near-data processing has been designed and implemented [13]: YourSQL, developed from MariaDB, offloads some data operations of the database engine to the SSD's internal processor. Work published on Smart SSDs upgrades the device with new interfaces to improve internal and external bandwidth and transfers more complex database operations, such as scan and join, to the SSD [14]. Compared with a traditional SSD that transfers data directly to the memory of the host computing node, this framework completes the scan operation with a scan controller designed inside the SSD, exploiting the large internal bandwidth to quickly generate database tables and return them to memory for the next operation [15]. Theoretical derivations and simulation experiments show that under the Smart SSD framework, the performance of scan, join, and the combination of the two is 7x, 5x, and 47x that of traditional devices, while the SSD's energy consumption rises by only 1%, giving an energy efficiency roughly 45 times that of the traditional model. Both Smart SSD and YourSQL select appropriate data-intensive operators to move down to the storage device for computation, an important line of database optimization research under the near-data processing framework. In addition, non-relational (NoSQL) databases are among the fastest-growing directions in the current database industry [16]. They use many different physical storage structures, such as key-value stores, column stores, document stores, and graph databases, and they too need a near-data processing framework to move data-intensive operations down. Optimizing data-intensive database operations is an important route to the practical application of near-data processing. An adaptive data partition scheme has been proposed to improve the query efficiency of Spark SQL, but it still provides no table index [17]. Some researchers instead build fine-grained indexes on tables: temporal indexes for temporal big data, together with an extended Spark SQL parser, make temporal queries faster than native Spark SQL; a two-level index scheme for massive HBase data based on Solr, and other secondary indexes for HBase, likewise show that query efficiency on structured data can be improved by establishing secondary indexes.

Basic concepts
Among machine learning algorithms, clustering is an unsupervised learning method: it trains on data sets without labels and is often used alongside classification and decision-making algorithms, with predictions based on an optimized form of the data. Clustering and classification share the same goal, but different kinds of target data sets require different calculation forms. Some techniques, such as gradient descent and back propagation, are shared across regression, classification, and clustering models. Several typical machine learning algorithms are described below.

(1) Regression algorithm
Regression algorithms build models that predict continuous variables. A regression model is trained on labeled data, with outputs computed from inputs, so it is a supervised learning algorithm. The most common form is linear regression, which fits the data set with a straight line, converting the relationship between the variables into a linear one that fits with high probability. Linear regression is easy to understand and, being simple, tends to avoid the defects caused by overfitting; gradient descent can also be used to fit the linear model. When the relationship between the explanatory and response variables really is linear, the method is highly applicable, and it shifts the focus from statistical calculation to data analysis and processing. In many practical applications, however, a purely linear method is not recommended, because it oversimplifies the problem and may lead to prediction errors.
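As a minimal illustration of the linear regression described above, the following sketch fits a line to synthetic data by ordinary least squares; the data, noise level, and true coefficients are invented for the example.

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative values).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=200)

# Design matrix [x, 1] so least squares recovers slope and intercept together.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(slope, 1), round(intercept, 1))  # close to the true 3.0 and 2.0
```

With only 200 noisy samples the fitted coefficients land near the true values, which is the "high probability of fitting" the text refers to when the underlying relationship really is linear.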
(2) Clustering algorithm
Clustering, also called unsupervised classification, partitions a set of data objects into groups so that data within a group are very similar while data in different groups differ. Its main use is to find patterns, feature points, and the number of samples in a group. In machine learning practice, clustering is applied to prediction through steps such as segmentation, aggregation, and marketing analysis, and it is widely used in several fields of real life.
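A minimal sketch of the clustering idea, using plain Lloyd's k-means on two invented 2-D groups; the group centers, sample counts, and the choice k=2 are illustrative assumptions, not from the paper.

```python
import numpy as np

def kmeans(points, k, iters=30, seed=0):
    """Plain Lloyd's k-means; a sketch, not a tuned library implementation."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its closest center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its cluster (keep old center
        # if a cluster happens to be empty).
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
# Two synthetic groups: points near (0, 0) and points near (5, 5).
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centers = kmeans(pts, k=2)
print(len(set(labels.tolist())))  # 2 clusters recovered
```

The recovered labels separate the two groups exactly because within-group points are far more similar to each other than to points in the other group, which is the defining property of clustering described above.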
(3) Gradient descent
Gradient descent is an iterative optimization method that aims to minimize a cost function. The slope, or gradient, is computed as a derivative, and in each iteration the coefficient is reduced by the product of the gradient derivative and the learning rate, driving the value toward a local minimum. Gradient descent has drawbacks: if the learning rate is too small, it may converge very slowly, or never, while trying to locate the local minimum exactly, and the learning rate also affects which minimum is reached and how quickly. Adjusting the learning rate as the error begins to decrease is therefore often the best practice.
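The update rule described above, coefficient minus learning rate times derivative, can be sketched on a simple quadratic cost; the cost function J(w) = (w - 4)^2 and the learning rate 0.1 are illustrative choices.

```python
def gradient(w):
    # Derivative of the cost J(w) = (w - 4)^2.
    return 2.0 * (w - 4.0)

w = 0.0            # initial coefficient
learning_rate = 0.1
for _ in range(100):
    # Each iteration subtracts learning_rate * dJ/dw, as described above.
    w -= learning_rate * gradient(w)

print(round(w, 3))  # converges near the minimizer w = 4
```

On this cost the iterate obeys w ← 0.8w + 0.8, so it contracts toward 4; a learning rate above 1.0 would make the factor exceed 1 in magnitude and the iterates would diverge, illustrating the sensitivity to learning rate noted above.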

Mathematical model
In recent years, machine learning algorithms have greatly increased both computation and storage demands, which makes it more difficult to model memory and storage devices. For a machine learning platform, the optimization model can be deployed across multiple devices, and compression techniques can mitigate large storage requirements. Model mobility rests on the adjustment algorithms of machine learning and is not directly comparable to traditional computing models, but optimizing the model is itself the best solution to migration.
As shown in formula (1), when q tends to positive infinity, the practical value of the loss function is also determined by q. The abstract data consist of n x p mutually independent, non-interfering vectors, as shown in formula (2), where n is the number of samples and x_i is the feature vector of the i-th sample. The labels are n mutually independent values, as shown in formula (3).

The loss function under the optimized machine learning method is given in formula (4), where n again denotes the data size. In the predictive (teacher-student) machine learning model, when the teacher model and the student model are fixed at the same time, the resulting loss function of the student model is given in formula (5).

The input data under the machine learning algorithm can prevent the model from being applied directly to loss function optimization, so another part of the loss function must be optimized under the original objective. The cross-entropy loss between the output value and the target value, defined in formula (6) as the negative sum of the target probabilities times the logarithm of the predicted probabilities, is used to predict classification data. Applying adversarial calculation to predict the loss function of the student model has many advantages. The improved optimization of the predicted machine learning model must consider the loss function in its special form: when the purpose, design concept, and other aspects differ, the resulting loss function in most cases has no optimal solution, which hinders the application of the machine learning algorithm.
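A hedged sketch of the cross-entropy of formula (6) and a combined student loss in the teacher-student setting; the logits, the equal 0.5/0.5 weighting of the two terms, and the exact combination are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a prediction:
    -sum(t * log(p)), the standard form referenced by formula (6)."""
    return -np.sum(target * np.log(predicted + 1e-12))

# Hypothetical logits: the teacher is confident, the student less so.
teacher_probs = softmax(np.array([4.0, 1.0, 0.5]))
student_probs = softmax(np.array([2.0, 1.5, 0.5]))
one_hot = np.array([1.0, 0.0, 0.0])  # true class label

# A student loss combining the hard-label term with a teacher-matching
# term; the 0.5/0.5 weights are an illustrative assumption.
loss = 0.5 * cross_entropy(one_hot, student_probs) \
     + 0.5 * cross_entropy(teacher_probs, student_probs)
print(loss > 0)
```

The teacher-matching term pulls the student's output distribution toward the teacher's, which is what allows a fixed teacher to shape the student's loss as described in formula (5).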

Simulation results
The network model structure is designed for the machine learning algorithm model. To fit the characteristics of the student model and the classroom setting, a staged sample structure similar to the teacher model is used. The specific structure is shown in Fig. 1.

As Fig. 1 shows, under the combination of the student and teacher models, the several network structures designed and optimized together better describe the characteristics of the machine learning algorithm model. Table 1 compares the predictions of the optimized machine learning model with traditional data schemes under different teacher models; Teacher denotes the teacher model, Student the student model, and KD the student model predicted under the machine learning algorithm. Table 1 shows that, compared with the optimization data of the traditional machine learning model, the adversarial algorithm improves the accuracy of the machine learning algorithm.

Analysis of database optimization requirements
Database optimization requires the state information of each database node, such as CPU utilization, the load on the database system, and internal storage, with the performance module transmitting this data in real time. The most important part of database optimization is receiving dynamic data, which involves several concerns: the form of real-time data transmission; the dynamic sending address and carrier of the data; and the transformation curve of the real-time data. The database optimization process is shown in Fig. 2.

As Fig. 2 shows, the process of database optimization mainly covers the overall usage of the data set, the space usage of each terminal, the load of each server node, and storage and CPU usage. Load, storage, and CPU usage are real-time dynamic data, and the overall usage of the data set can also be transmitted in real time.

Task processing model
For the problem between the server terminal and the edge nodes, a service terminal coordination mechanism is proposed. When the server terminal considers an edge node untrustworthy, it completes only the task data it can handle itself, according to its own task load, computing capacity, and storage status, and the remaining tasks are assigned to the remaining edge nodes. The advantages of this approach are that the server terminal and edge node states are simple to manage, the computational complexity is low, decisions can be made with limited numerical information, and the decision result is guaranteed.
The task scheduling model between the server terminal and the edge nodes is formulated as follows. From the objective formula it can be seen that the task scheduling of each server terminal is independent: terminals do not affect each other when scheduling tasks between the server terminal and the edge nodes, so the scheduling problem can be arranged per server terminal. Building on traditional task scheduling results, a heuristic computing method is proposed that effectively solves the scheduling of server terminal and edge node tasks, based on an optimized dynamic programming formulation.
The first step in the dynamic programming solution is to determine the state transition equation, given in formula (12). The variable k is selected to ensure that the value of success_dp[k] before completion is maximized on the basis of the previous success_dp[k], i.e., formula (13) is satisfied. The modeling structure for edge node task scheduling is given in formula (14) and its constraints. The formulas show that a task counts as successful only when its completion time is less than its deadline; likewise, if the data transmission time plus service time under non-edge computing is less than the deadline, the task can be judged complete. If no edge node connected to the task's server terminal can meet the deadline, the task is migrated to the database system for optimization, and the time at which the migrated task from the server terminal reaches the database system, cloudtrans_i, is taken as the migration cost.
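The success_dp recursion is not given in full above, so the following is a hypothetical reconstruction under the assumption that the state "success_dp[k]" means the maximum number of tasks completable within a capacity budget k, which makes it a 0/1-knapsack-style dynamic program; the task costs and capacity are invented.

```python
def max_successful_tasks(costs, capacity):
    """Hypothetical sketch of the success_dp recursion: choose a subset of
    tasks, each consuming costs[i] units of a terminal's capacity, so as to
    maximize how many complete. dp[k] = best task count using capacity k."""
    dp = [0] * (capacity + 1)
    for c in costs:
        # Iterate capacity downward so each task is used at most once.
        for k in range(capacity, c - 1, -1):
            dp[k] = max(dp[k], dp[k - c] + 1)
    return dp[capacity]

# Tasks needing 3, 4, 5, and 8 units on a terminal with 12 units available:
print(max_successful_tasks([3, 4, 5, 8], 12))  # → 3 (the 3+4+5 subset fits)
```

Under this reading, the state transition in formula (12) corresponds to the inner `max`, and the selection rule in formula (13) corresponds to picking the k that maximizes dp before the round completes.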

Optimization analysis
First, the size and distribution of the table data are known. Therefore, the data are sampled and the size of the database system is calculated to obtain the growth multiple of the data. To ensure accuracy and rationality, the sampling is repeated many times and the final average is taken; when the amount of data is large enough, the sampling error can be ignored. The actual storage size m of each node is given by formula (21), where M' is the size of the data stored in the node database, P is the number of data items as the stored size approaches M', and f is the actual size of the file in MB. Because f is larger than the node's actual available storage size m, the optimization must be performed in batches, and the total number of batches t is given by formula (22).

The formula shows that in each of the t rounds of database system optimization, the maximum amount of data imported is m, equal to the node's storage capacity. Reading proceeds on K nodes, each with y transfer threads, so all the data can be transmitted in t rounds. In addition, to maintain the data volume f across nodes, load balancing and related operations should be performed on the database.
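The batch count of formula (22) can be sketched under the assumption that each round imports at most one node-capacity of data per node; the exact formula is not fully legible above, so this is an illustrative reading with invented sizes, not a verbatim reconstruction.

```python
import math

def batch_rounds(total_size_mb, node_capacity_mb, nodes):
    """Number of optimization rounds t needed to import a file of
    total_size_mb, assuming each round moves at most node_capacity_mb
    per node across `nodes` nodes (an illustrative reading of (22))."""
    per_round = node_capacity_mb * nodes  # data imported in one round
    return math.ceil(total_size_mb / per_round)

# Hypothetical sizes: a 10 TB file, 8 nodes, 256 GB usable per node.
t = batch_rounds(total_size_mb=10 * 1024 * 1024,
                 node_capacity_mb=256 * 1024,
                 nodes=8)
print(t)  # → 5 rounds
```

The ceiling matters: any remainder smaller than a full round's capacity still costs one extra round, which is why load balancing across the K nodes helps keep every round fully utilized.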
The generated database optimization test data set is examined, comparing the traditional database system with indiscriminate operator move-down and with selective operator move-down.

Table 2 shows the comparison of optimized data under the same data scale and the same query operation. From Table 2 it can be concluded that, in both move-down modes, the query time is shorter than with the plain storage engine, because move-down reduces operator migration calls and data transmission. In terms of the optimized storage mode and its combination with the database, applying the system power consumption function reduces the system's power consumption, and combining the two processing modes reduces it further. In terms of stability, both modes improve the query efficiency of the system by reducing data transmission. For optimization of the traditional database, the server should be buffered into running mode to make data transmission more stable.

Test results
The performance indexes of the database optimization forms are analyzed and compared according to the optimization process, and the comprehensive index values under each algorithm are obtained. The experimental results are shown in Fig. 3.
The database optimization indexes in Fig. 3 show that the default database optimization algorithm is strongly affected by the distribution of the data. When the data skew is large and the distribution unbalanced, the database load becomes too heavy and the index rises; databases under lighter load show a low index.

The batch database optimization algorithm does not change with the data distribution: the optimization indexes are balanced across databases, so their performance is consistent. The comprehensive index analysis is shown in Fig. 4.

As Fig. 4 shows, the default database optimization index is generally high and the batch database optimization index is generally low. Among the databases, database 3 shows the most significant optimization effect.

Conclusion
So far, the industrial application of machine learning algorithms lags far behind the research field. Besides the professional knowledge and skill required, one reason is the difficulty of optimizing machine learning algorithms: the resource conditions needed for training often cannot be provided. Server systems, mobile network equipment, and IoT equipment all serve as optimization platforms for machine learning algorithms, so the demand model requires a database task optimization form that gives the equipment the necessary computing capacity. To break the bottleneck of large-scale data systematization in a large-scale data environment, near-data processing stores data in the new optimized database storage system, which effectively reduces the waiting time during data transmission and improves the system's computational efficiency. Using near-data processing for optimization, this work takes the database system as the overall research direction and optimization forms such as the database's storage engine and query engine as the research objects. The research optimizes the database's computing performance, reduces data transmission time, and improves overall system efficiency.

Figure 1 Design
Figure 2 Database optimization process
Figure 3
Figure 4 Comparison

Table 1 Comparison effect of prediction under the adversarial algorithm in the classification problem
Table 2 Near-data processing mode of operator move-down