Using AI technology on customized manufacturing product labels for decision making

When the manufacturing industry adopts information technology, it must manage a large number of parameters that change frequently, and importing and maintaining them correctly has always been a major challenge. A single wrong setting brings losses, ranging from defective products that require rework or scrapping to, at worst, production line stoppages, reduced factory productivity, and delayed shipments. To address this problem, this study uses the data in the approval forms for customized labels at an electronics manufacturer and applies artificial intelligence models to uncover the hidden rules behind a large number of customized labels through data processing and model building. Model and parameter experiments are used to improve the effectiveness of the models, and because the data has time characteristics but an uneven distribution, a cyclic (loop) testing method is adopted to increase the diversity of the test set. Integrating the results of each stage, an auxiliary decision-making system is established: when a user's setting is inconsistent with the predicted result, a warning is displayed, which speeds up the operation process, narrows the scope of confirmation, and ultimately reduces the error rate, thereby reducing scrap and production line stoppages and improving factory productivity. In the statistics, the accuracy rate of new recruits was only 80%, while the accuracy rate of the artificial intelligence model reaches 95%, and the number of stoppages is reduced from four times per month to one. At full capacity, this decision-support system can reduce loss costs.

Reduced productivity affects quality and delivery, which makes customers unhappy; unhappy customers reduce orders and revenue, which makes shareholders and the board of directors unhappy; an unhappy board reduces bonuses, which makes corporate executives unhappy; unhappy executives cut bonuses, chase performance, and raise requirements, which makes the bottom of the company unhappy; and unhappy, negative attitudes at the bottom of the company lead to a further decline in factory productivity. To avoid falling into this vicious circle and making oneself unhappy, improving the problem becomes an imperative goal. In this context, the manufacturing industry needs a better solution [8][9][10].
This study uses artificial intelligence models to predict setting values in a changing environment, in order to reduce the setting error rate and maintain stable factory productivity. On the technical side: when facing a single customer and a single product, if the specifications can be clarified, the final answer should be limited to a few possibilities based on the conditions, and when the answer falls outside these possible ranges the probability of error is extremely high. However, it is difficult for any individual to be familiar with all the rules through experience alone, and it is difficult to pass the experience of senior staff on to new recruits through education and training. Enforcing the rules through programming is theoretically feasible but requires huge resources: the information department of a typical enterprise has limited resources, and it is difficult to implement every specification with limited time and manpower, not to mention the time cost of clarifying specifications, meetings, and communication. As a member of the information department, the author is very familiar with these resource constraints. Faced with projects that are never finished, every specification change is a torment. Program structures piled layer upon layer for the sake of being simple and fast also leave the enterprise with considerable technical debt; without quality assurance, one never knows when the potential problems will emerge [11][12][13][14].
The data in this research comes from a factory in a science park. As companies become informatized and intelligent, their awareness of information security has gradually increased and their data protection measures differ from the past, so it is difficult for outsiders to obtain data at will. The data in this study is based on factory labels: after a factory is informatized and automated, all specifications and all stages of raw materials, semi-finished products, finished products, packaging, and so on are assigned numbers and labeled for identification. Common label contents include serial number, part number, model number, date, quantity, and other text and barcodes. The date range of the data is from the fourth quarter of 2018 to the second quarter of 2020: the data was established as the company gradually moved the existing paper sign-off process into an information-based transfer process in recent years, but the variable fields were revised at one point, so the data before the revision was discarded and only the data after the revision is used. In addition, due to time constraints, this study only uses data before May 1, 2020 for the experiments [15][16].
This study only uses data that has been validated: although adding the data still in the sign-off pipeline would enlarge the data set, such data contains more unexpected errors and blanks, so only the data approved by each unit is used. There are more than 35,000 complete and effective documents confirmed to have been applied to the production line. Historical version data is retained: the most common reason for a new version is a customer specification change. To avoid mistakes, factories formulate SOPs, but these only cover process steps; some rules involving professional knowledge and experience cannot be completely written into the documents, and this part is the weakest point and where users most frequently make mistakes. Since changes and errors are the norm in the factory industry, after evaluation these data were retained rather than excluded, and time features were added so that the data can be identified and weighted through these features, thereby obtaining correct results.
For a variable parameter appearing for the first time, the user can only confirm it with multiple departments by phone or letter; this part cannot rely on experience and rules. In the collected data, all that can be seen are past, confirmed answers, which were actually unknown at the time. Therefore, in data processing, the first appearance of each variable parameter is marked as unknown. Fig. 1 shows the flow of the research data approval form.
1. The customer (User) provides the reference image file, style, printed content, and other information for the label.
2. The Product Control Coordinator (PCC) obtains the information through letters and phone calls and fills in the application form.
3. The label room staff set the variable parameters corresponding to each item on the label.
4. The label room staff sometimes need to ask the information staff (CIM) to assist in setting new variables.
5. The label room staff create proofing label files based on the information in the sign-off form.
6. After the approval is completed and takes effect, the label file is used to print labels during production on the production line.
The label room handles about 5 business groups, about 60 customers per business group, and about 20 products per customer. There are about 10 specifications per product, about 5 labels per specification, and about 4 versions per label, so roughly 5 × 60 × 20 × 10 × 5 × 4 ≈ 1.2 million variable parameters need to be set. A machine generally has a dedicated person responsible for maintaining its settings, but the label room is a small team facing the labels of every product, which makes the task more difficult. Step 3 above is where errors occur most easily: old variables cause errors through confusion, and new variables cause errors through poor communication. The actual sign-off form used in this study is shown in Table 1. The customer provides the label style and documentation, which the product control coordinator (PCC) fills into the corresponding fields of the form, including the reference drawing and the object content on the left side of the lower half of the form; the label room staff judge whether new variables are needed, ask the information personnel to assist in setting them up and writing the numbering logic, set the variable name fields on the right, and design and upload the proofing file; after approval by the staff of each site, the form is returned to the label room staff to determine when the new version takes effect. However, when there are special rules, it is difficult for the product control coordinator to describe them clearly in the limited fields, and if the label room personnel misunderstand, an error results; likewise, when the label room personnel do the setting, it is difficult to clarify and test whether old variables can be reused and whether the numbering logic provided by the information personnel is wrong. If this part is not rigorous enough, errors may occur [17][18].

Literature Review
There are three common applications of artificial intelligence in the manufacturing industry: scheduling optimization, numerical monitoring, and image recognition. Scheduling optimization mainly aims to balance the load of the production line, shorten the production cycle, and reduce costs; numerical monitoring aims to reduce defective products, control yield, and reduce costs; image recognition is often used at optical inspection stations, likewise to reduce defective products, control yield, and reduce costs.
Much of the literature on artificial intelligence models in manufacturing studies how to control yield, but most AI-related research focuses on the production environment, predicting the production yield status by monitoring measurement values, photos, and so on, and seldom focuses on forecasting parameters. As shown in Fig. 5, although production yield is indeed very important, production parameters and photos are data that can only be obtained at the production stage; even if problems are found through an artificial intelligence model, most cases still require a stoppage for adjustment, resulting in lost productivity. If errors can be found earlier, when the parameters are being set, the number of stoppages can be reduced.

Application of Artificial Intelligence in Manufacturing
As shown in Table 1, relevant research on similar issues in the manufacturing industry includes the following. In 2020, Nils Thielen and five co-authors from Germany used KNN, random forest, and neural network models to verify the results of automatic optical inspection (AOI) during surface mount technology (SMT) production, reducing the manpower this verification requires and saving labor costs [16]. In 2020, to address the high cost of setting up inspection stations, Hou Yuzhe tried to replace defect recognition systems built with traditional computer vision with a deep learning model; particularly for the recognition of small objects, he proposed a two-stage object detection algorithm to reduce over-fitting, and finally used this automatic defect detection system to assist manual inspection, reducing the missed detection rate and labor costs [17]. In 2016, Chen Weihan tried to balance the workload of the production line, reduce the number of workstations to lower enterprise costs, and maximize the work efficiency of the production line, using genetic algorithms, immune algorithms, and particle swarm algorithms; he proposed a new coding method, and the comparison showed that the particle swarm algorithm solves faster than the other two while the immune algorithm yields better solution quality [18]. In 2012, to address deviations in the control parameters of semiconductor production machines, which caused process deviations that lowered wafer yield and even led to scrap, Zhao Peiyao used a neural network combined with a fault detection and classification system to establish an early-warning mechanism to ensure process yield and production capacity [19].

Manufacturing
Manufacturing can be described as processing raw materials and semi-finished products on assembly lines into semi-finished or final products that can be used or sold downstream in the industrial chain. Common production line processes include production, assembly, testing, and packaging. There are two common process types: the semi-finished product process and the integrated manufacturing process for complete products. The semi-finished product process provides downstream manufacturers with standardized semi-finished products from which they make more complex products; the integrated manufacturing process takes upstream semi-finished products and customizes them into final products. One advantage of the manufacturing industry is that it can purchase raw materials at scale before manufacturing, especially by sharing raw materials and parts across different products, thereby reducing costs and improving production efficiency. The electronics manufacturing industry mostly uses surface mount technology (SMT) to solder electronic parts onto a printed circuit board (PCB) and process them into the finished products customers require. In SMT, solder paste is printed on the PCB and the specific electronic parts are placed at their corresponding positions; when the board passes through the high-temperature reflow oven, the melted solder paste covers the solder feet of the parts, and after cooling, the electronic parts are soldered onto the circuit board [17].

Industry 4.0
Industry 4.0 was first proposed by the German government in 2011 and was later regarded as the fourth industrial revolution. Its goal is no longer technological development alone but integration: making existing technologies, sales, and the product experience intelligent, and building a highly automated, intelligent, modular, integrated sensing and control system that can automatically eliminate production obstacles. Most manufacturing factories have not fully implemented informatization and automation; on such an unstable foundation the effect of rashly promoting Industry 4.0 is extremely limited, and most attempts end as mere slogans. Most of the more successful cases are planned and built from scratch by corporate leadership with sufficient budget and resources, which also avoids obstacles caused by existing processes and personnel habits. Some companies understand their own shortcomings more clearly and first implement "Industry 3.5".

Artificial Intelligence

The intelligence expressed by artificially manufactured machines can be called artificial intelligence, which is usually realized by computer programs and algorithms that systematically learn from data and use the learned knowledge to make predictions, thereby establishing expert systems, decision aids, recognition systems, and so on. The concept was proposed in the middle of the 20th century. From 1943 to 1956, scientists began to explore the possibility of building an artificial brain, and the Turing test convinced many that machine thinking was possible; this was the birth of artificial intelligence. From 1956 to 1974 many algorithms were born and researchers were quite optimistic; this was the golden age. The optimism of 1974 to 1980 raised expectations excessively, but results fell short, and the field hit its first low tide. From 1980 to 1987 expert systems were launched; their simple design made them easy to build and modify, they proved practical, and artificial intelligence revived. From 1987 to 1993, improvements in the efficiency of machines produced by Apple and IBM caused market demand for dedicated artificial intelligence hardware to fall, leading to the second low tide. From 1993 to 2011 some initial goals were finally realized and artificial intelligence was successfully applied in the technology industry. From 2011 to the present, with the rapid development of hardware technology, storage space and computing power have increased significantly; slogans such as machine learning, data mining, and big data have taken turns leading the trend, and artificial intelligence has entered a period of vigorous development. [19]

KNN
KNN, the K-nearest neighbor method, determines the classification of a target by the classifications of its K nearest neighbors. Explained in terms of spatial distribution, the concept is that the more similar two data points are, the closer together they lie. It is one of the easiest machine learning algorithms to understand and implement, and gives reasonable baseline results. The input of the model can be classification or regression data, and the output is the category of the target. K is the number of nearest neighbors: if it is set too small, classification accuracy drops; if it is set too large, noise may increase and affect the results. The model can weight neighbors differently according to distance to optimize the results. When new data arrives, it can be added directly without retraining, and the method is not sensitive to outliers. Its disadvantages are that every classification requires recomputation over the stored data, which demands a large amount of memory, and that it is sensitive to the local structure of the data, so predictions are prone to bias when the data distribution is unbalanced. It is commonly used in text classification, pattern recognition, cluster analysis, multi-class classification, and so on. [9]

SVM

The support vector machine is based on a binary linear classifier: it divides data scattered on a plane into two categories by finding a separating line that maximizes the distance between the line and the boundaries of the two categories, and it can be extended to high-dimensional space, where the separating hyperplane can be pictured as the interface of an oil-water mixture. The "support vectors" are the boundary points that support the classification boundary. The input of the model can be classification or regression data, and the output is the target category. For non-linear classification, the kernel trick can be used to replace the dot product with a non-linear kernel function. This model has strong generalization ability (adaptability to new samples) and is fairly easy to interpret; it does not rely on all the data but uses part of the data to determine the hyperplane. Its disadvantages are slower performance and sensitivity to missing data; performance degrades when the feature dimension is too high, there is no universal solution for nonlinear problems, results are sensitive to the kernel function and its parameters, and the kernel function itself is hard to explain. It is commonly used in text classification, image recognition, handwriting recognition, and so on. [10]
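A minimal sketch of the two classifiers just described, assuming scikit-learn; the data, feature counts, and parameter values are illustrative stand-ins, not the fields or settings of this study.

```python
# Illustrative KNN and SVM classifiers (scikit-learn assumed installed).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a one-hot-encoded feature matrix and target classes.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# KNN: K trades off noise against accuracy; distance weighting gives
# closer neighbours more influence, as described above.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_tr, y_tr)

# SVM: the RBF kernel replaces the dot product (the kernel trick).
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

print("KNN accuracy:", knn.score(X_te, y_te))
print("SVM accuracy:", svm.score(X_te, y_te))
```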

Decision Tree
A tree model is established in which each node represents a feature and each branch represents a possible attribute of that feature. Following the tree structure, the path is determined by each feature attribute, and the leaf node finally reached is the answer. The decision tree model can build classification trees, regression trees, or trees that handle both classification and regression. A decision tree basically has only a single output; for multiple outputs, multiple trees can be built for the different targets. When building a decision tree, the order in which features are used to create nodes must be determined, using the amount of variation as the splitting criterion. There are two common approaches: using entropy to calculate information gain, subtracting the disorder of the information after the split from the disorder before the split; and using the Gini coefficient to calculate impurity. If a decision tree grows out of control and builds too many branches, it easily over-fits, giving every data point its own path, so growth must be restricted or the tree pruned. This model is easy to understand and implement, has a high degree of interpretability, does not need much data or much data pre-processing, and can process numerical and categorical data at the same time. Its disadvantages are that it over-fits easily, small changes in the data can generate completely different trees and unstable results, growth tends to favor features with more values when the data is unbalanced (leading to poor performance), and it neglects correlations between the attributes in the data. [11]
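A minimal sketch of the two split criteria mentioned above (information gain via entropy, and Gini impurity), together with the growth restrictions that curb over-fitting; scikit-learn and the synthetic data are assumptions, not the study's setup.

```python
# Illustrative decision trees contrasting the entropy and Gini criteria.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for criterion in ("entropy", "gini"):
    # max_depth / min_samples_leaf act as the growth restrictions
    # ("pruning") discussed in the text.
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=6,
                                  min_samples_leaf=5, random_state=0).fit(X, y)
    print(criterion, "training accuracy:", tree.score(X, y))
```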

Random Forest
A random forest is composed of multiple decision trees. Because a single decision tree over-fits easily, it has low bias but high variance; random forest uses random sampling and random selection of features to build multiple decision trees to solve this problem, and its output is determined by a vote over the answers of all the decision trees. The input of the model can be classification or regression data, and the output is the target category. The random sampling is done with replacement, so the same sample may be selected multiple times or not at all; the feature selection randomly picks a subset of all features for each tree, so that the trees differ more from one another. This model can evaluate the importance of features and trains quickly.
It can process high-dimensional data, maintain accuracy when data is missing, and balance errors when the data is unbalanced. Its disadvantages are poor interpretability, the inability to predict beyond the range of the training data when handling regression problems, and the fact that it may still over-fit noisy data. It is often used in various classification and regression problems, in outlier detection, and in unsupervised classification problems. [12]

GBDT

GBDT is the abbreviation of Gradient Boosting Decision Tree. The general concept of boosting is to optimize gradually through a series of learners, increasing the weight of the parts that were previously classified poorly in order to strengthen learning; gradient boosting builds each new model in the direction of gradient descent of the previous model's loss function, which in effect amplifies the weight of the mistaken parts, striving to make the loss function smaller and the overall performance better. The decision trees in this model are regression trees. Compared with random forests, whose decision trees can be built in parallel, GBDT can only build them serially, waiting for the previous tree to finish before knowing the optimization direction, trading time for results. This model can resist over-fitting, does not require much data pre-processing, handles complex features with many non-linear transformations, and can process linear or non-linear data. Its disadvantages are computational complexity, the inability to parallelize (making it time-consuming), and unsuitability for sparse high-dimensional data. [13]
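A minimal sketch contrasting the two ensembles: bagged trees built independently (random forest) versus trees built serially along the negative gradient of the loss (GBDT). scikit-learn and the synthetic data are assumptions, not the study's configuration.

```python
# Illustrative random forest vs. gradient-boosted trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Random forest: bootstrap sampling plus a random feature subset per tree.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=0).fit(X_tr, y_tr)

# GBDT: each tree fits the negative gradient of the previous ensemble's
# loss, so the trees must be built one after another.
gbdt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                  random_state=0).fit(X_tr, y_tr)

print("RF:", rf.score(X_te, y_te), "GBDT:", gbdt.score(X_te, y_te))
```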

XGBoost
XGBoost is the abbreviation of eXtreme Gradient Boosting. It improves and optimizes the GBDT model in the loss function, regularization, split-point search algorithm, sparsity-aware algorithm, parallelization, and so on. Like GBDT, XGBoost operates on regression trees, using a series of weak classifiers with the negative gradient as the learning target. Its objective function can control the number of leaf nodes and the leaf scores to prevent over-fitting. In the branching strategy, the score after splitting must exceed the score before splitting, and to limit overgrowth a split is performed only when the gain is greater than a threshold. The score of each leaf node is multiplied by a shrinkage weight so that no single tree has too much influence, again preventing over-fitting. The concept of randomly sampling some of the features is introduced to reduce both the chance of over-fitting and the amount of calculation. When dealing with missing values, the direction can be determined by the information gain, or a default direction can be specified to speed up the calculation. This model prevents over-fitting well, nodes at the same level can be processed in parallel, and it can handle sparse data. Its disadvantage is that when a node splits it must traverse the data set, and besides the feature values it must also store the index of the gradient statistics corresponding to each sample, roughly doubling memory usage. [14]
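A minimal sketch of the XGBoost knobs described above (split-gain threshold, leaf-score shrinkage, regularization, feature subsampling), assuming the xgboost package; the values are illustrative, not the study's configuration.

```python
# Illustrative XGBoost classifier (xgboost package assumed installed).
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,      # shrinkage weight on each tree's leaf scores
    gamma=1.0,              # minimum gain required before a node splits
    reg_lambda=1.0,         # L2 regularization on leaf scores
    colsample_bytree=0.8,   # randomly sample features per tree
    random_state=0,
).fit(X_tr, y_tr)

print("XGBoost accuracy:", model.score(X_te, y_te))
```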

Neural Networks
A neural network simulates the function and structure of biological neural networks through algorithms. Each neuron is composed of its inputs, an activation function, and an output, and the entire network consists of an input layer, hidden layers, and an output layer. The input layer simulates the numerous neurons that receive vast amounts of non-linear input information; the hidden layers simulate the synaptic connections of neurons and are responsible for transmitting information to the corresponding positions (more hidden units can capture more complex non-linear relationships, but too many can lead to over-fitting); the output layer simulates information processed through the neural connections. The model learns by optimizing the weights of the neurons in each layer and of the connections between neurons. The input of the model can be classification or regression data, but categorical data must first be converted into numerical values through one-hot encoding. In the network, each neuron is connected to all the neurons in the next layer, while neurons in the same layer are not connected to each other. The number of layers and the number of neurons per layer are determined by the complexity of the problem and can be controlled during the parameter tuning stage. Usually, with the same number of neurons, a deep neural network performs better than a shallow one. Deep learning research has also extended to more advanced architectures such as recurrent neural networks and convolutional neural networks.
This model has self-learning ability: it can learn the rules and patterns behind the data, and because the learned knowledge is scattered throughout the network it has a certain fault tolerance, so a small amount of damage will not harm the whole. It can also adjust itself based on learned results combined with newly provided data. Its disadvantages are slow learning and poor explainability; when the network is deeper and more complex, the huge number of parameters requires a larger amount of data for training, otherwise it over-fits easily. Because neural networks have infinite possible configurations it is difficult to find the best solution, and in the tuning process better parameters can only be obtained through many attempts. It is commonly used in speech recognition, image recognition, and recommender systems. [15]
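A minimal sketch of a fully connected network of the kind described: one-hot style numeric inputs, a few hidden layers, and early stopping against over-fitting. scikit-learn's MLPClassifier is used here for brevity; the study's actual framework is not stated in this section.

```python
# Illustrative multilayer perceptron on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=100, n_classes=5,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(128, 64),  # depth/width are tunable
                    activation="relu",
                    max_iter=200,                  # "generations" (epochs)
                    early_stopping=True,           # guard against over-fitting
                    random_state=0).fit(X_tr, y_tr)

print("NN accuracy:", mlp.score(X_te, y_te))
```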

Research Methods
The data used in this study comes from the customized label sign-off forms of an electronics manufacturer. The fields filled in by the applicant are the input data of the artificial intelligence model, and the variable parameters actually set in the label room are the model's output data.

Research structure
As shown in Fig. 2, the research steps are divided into three blocks: data processing, model building, and optimization experiments. The purpose of the first block, data processing, is to prepare the input data required for model building, including data collection, sorting, and screening in section 3.2; data pre-processing so that the models can operate normally and correctly in section 3.3; and feature construction and selection to clarify the importance of features in section 3.4. The purpose of the second block, model building, is to find a model better suited to the data type of this research: since most of the data fields are categorical, seven models known to perform well on categorical data are selected and compared in section 3.5. The third block, the optimization experiments, builds on the results of the previous block: after a better model is found, further optimization is attempted, divided into three kinds of experiments: feature processing, loop testing, and parameter experiments. Finally, the intermediate result of each step can be fixed as a parameter for the other steps, allowing one to go back and optimize them.

Data collection, sorting and screening
The data of this research is stored in the company's internal database. Following the approval form process, three types of data tables are designed for storage. The first is the approval form data, which records the contents filled in by each unit during transmission and can be modified according to the site permissions of the approval process. The second is the historical version data that has been validated after approval is completed; it is provided for confirmation by the customer and the inspection unit and is also the basis for the next revision. The third is the data actually used by the production line after it takes effect, which is extremely sensitive: once modified, the production line is directly affected. The data used in this study falls into the second category, the validated historical version data. Data still in the sign-off pipeline has many blank fields that have not been filled in and may contain errors that have not been finalized; on the other hand, the data actually used by the production line contains only the latest version, is not continuous, and has no past data, so the data volume is too small to uncover the setting rules. Historical version data, recorded whenever a version takes effect on the production line, both ensures accuracy and can be traced back into the past. By training an artificial intelligence model on this data, it is expected that the model can learn the rules behind each customized label across time and version evolution. As shown in Table 2, the data is stored independently in five factories under the same structure, 39,579 records in total. Different factories may face different customers, form fillers, and setup personnel, and the proportions of data vary greatly, by a factor of roughly 3 to 26, so we finally decided to use only the data of the largest plant. As shown in Table 3, the data mainly comes from the Hsinchu factory: 4,501 signed forms, but because only the second type of form is used, the number of valid forms is actually 3,787, about 84% of the total. These forms come from 2,877 different part numbers, of which 2,588 went through the sign-off process; 1,199 application forms are revisions of part numbers, and some part numbers have been revised several times.

Table 3. Quantity of form data

Form data applied for: 4,501
Form data after sign-off: 3,787
Part numbers in applied forms: 2,877
Part numbers in signed-off forms: 2,588

As shown in Table 4, each form contains one or more label types. There are 24 label types in total, of which only 23 are actually applied to the production line. In practice, six types are most commonly used:
1. CB_SN, the semi-finished product customer label, mostly a two-dimensional code, used to record the serial number of the semi-finished stage;
2. SN, the production line input serial number label, mostly a one-dimensional code, recording the production serial number within the factory;
3. FCC, the product label on the host, mostly a one-dimensional code, recording the product serial number;
4. BOX, the label on the color box, mostly a one-dimensional code, recording the color box serial number;
5. CARTON, the label on the outer box, mostly a composite barcode, including the outer box serial number, product specification, quantity, and a list of the contained serial numbers;
6. PALLET, the pallet label, mostly a composite barcode, including the pallet serial number, product specification, quantity, and a list of the contained serial numbers.
The other types are usually an extra label at the same position used to display additional content or special specifications; because their data volume is small or too special, this research skips them and takes the six common types as the main research data. As shown in Table 6, the fields that may be set incorrectly by the setting personnel are the variable parameter fields, which this study takes as the final prediction targets; the intersection of the new variables across all label types contains 279 variables, and across the six common types, 264. As shown in Table 7, among the 583 variable types, the number of variable types and the amount of data per type are uneven. Sorted by number of variable types: FCC > CARTON > BOX > SN > PALLET > CB_SN; sorted by amount of data: CARTON > BOX > FCC > PALLET > CB_SN > SN.
The more variable types there are, the more complicated the rules behind them and the higher the error rate may be; the less data there is, the harder it is to accumulate experience and learn the rules, and the error rate also rises. Considering these two factors together, the ratio of the amount of available data divided by the number of variable types indicates how much data is available for training for each answer (see the sketch below). Sorted by this ratio: CB_SN > CARTON > BOX > PALLET > the sum of the six types > All > SN > FCC. CB_SN can thus be expected to have the most learning material and be the easiest to learn, while FCC has the least and is the hardest. If users clearly understand the rules of several types and never confuse those answers, it is equivalent to mastering those types; for the overall accuracy rate, this study therefore uses the summed proportions of several commonly used variables as one of the control groups and compares it with the trained artificial intelligence models. Refer to Tables 24 to 29 in the attachment for the frequency of use of each type of variable. Table 9 explains the original data fields; for example, one field records the last change date of the form, which is usually the effective date, and the Version field records the number of changes of the same part number and the same label style.
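A minimal sketch of the examples-per-answer heuristic above, dividing each type's data volume by its number of variable types; the counts below are illustrative placeholders, not the study's figures.

```python
# Illustrative "data volume / number of variable types" ranking.
counts = {          # label type -> (rows available, distinct variable types)
    "CB_SN":  (3000, 20),
    "CARTON": (9000, 120),
    "FCC":    (5000, 150),
}
for label, (rows, n_types) in sorted(counts.items(),
                                     key=lambda kv: kv[1][0] / kv[1][1],
                                     reverse=True):
    print(f"{label}: {rows / n_types:.1f} examples per variable type")
```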

Data pre-processing
In the process of informatization it is inevitable that some data is incomplete. The data table structure may not have been designed perfectly at the beginning, so some data was not collected in the initial stage; personnel may have operated carelessly, entering data incorrectly; improperly maintained data may have been accidentally changed during other modifications, causing anomalies; and system instability can also damage data.
The incomplete-data situations in this study and their handling are as follows. When there are missing values or missing data, the common treatments are to discard the record or to fill in the value; because the amount of data in this study is not large, available data is discarded as little as possible. Data may also contain obscure records and noise, which are usually discarded or repaired; since the data in this research has been through revisions, the old-version data is filtered out directly during screening and only the new version is used. There is no mandatory way to fill in the serial number name (SN Name) and serial number title (SN Title) fields: as long as the users of each station can understand them, there is no restriction on symbols or capitalization, so depending on the user's habits the same object may be written in several ways. In this study such data was cleaned by a program, leaving only letters and numbers and converting the letters to uppercase, so that fuzzy, approximately equal values are merged into consistent data. Some fields need to be normalized; the only value-related field in this study is the date, which is converted into seconds by a program and then normalized.
In some models, categorical fields must be converted by one-hot encoding before use. Most of the data fields in this study are categorical, and one categorical field is converted into multiple 0/1 fields by one-hot encoding, which are then passed to the model as input for learning. Text fields are usually processed by keyword extraction or vector conversion; however, the text fields in this study are supplementary explanations with low usage and uneven information content, so they are not used. Some fields may contain valuable hidden information that can be extracted and extended by combining professional knowledge and experience; in this study the part number implies information that cannot be seen directly on the surface, as its first two characters represent the production stage of the factory for that part number, and extracting them into a new field may help the model. A sketch of these steps follows.
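A minimal sketch of the pre-processing steps just described, assuming pandas; the frame is illustrative, SN_NAME follows a field named in this paper, and MODIFY_DATE and PART_NO are hypothetical stand-ins for the date and part number fields.

```python
# Illustrative cleaning, normalization, and encoding pipeline.
import re
import pandas as pd

df = pd.DataFrame({
    "SN_NAME": ["sn-01", "SN 01", "box#2"],
    "MODIFY_DATE": ["2019-03-01", "2019-07-15", "2020-04-30"],
    "PART_NO": ["AB1234", "AB5678", "CD0001"],
})

# Fuzzy normalization: keep letters/digits only, then upper-case, so
# "sn-01" and "SN 01" collapse to the same value.
df["SN_NAME"] = (df["SN_NAME"]
                 .map(lambda s: re.sub(r"[^0-9A-Za-z]", "", s).upper()))

# Dates -> seconds, then min-max normalized to [0, 1].
secs = pd.to_datetime(df["MODIFY_DATE"]).astype("int64") // 10**9
df["DATE_NORM"] = (secs - secs.min()) / (secs.max() - secs.min())

# Hidden information: the first two characters of the part number encode
# the production stage, extracted as a new feature.
df["STAGE"] = df["PART_NO"].str[:2]

# One-hot encode the categorical fields for models that need it.
X = pd.get_dummies(df[["SN_NAME", "STAGE"]])
print(X.join(df["DATE_NORM"]))
```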

Feature construction and selection
The original data contains multiple features, but not all of them are necessary. Too many features may interfere with the learning of the model and even risk a dimensional explosion, while removing unnecessary features reduces the model's computation; in practice, interpretability is also needed to convince the boss or the customer. All these reasons show the necessity of analyzing feature importance. After the importance of the data features is analyzed, follow-up studies can use it to verify the correctness of the model; if a prediction is inaccurate, this basis for judgment makes it quicker to clarify the problem and improve the model. For important features, besides optimizing the existing model, later data collection can also encourage users to fill in the important fields properly. At this stage, this research tries to clarify the importance of each feature, find the key features among the different fields, and confirm whether each feature helps the model predict the correct answer as expected. In addition to preliminary screening based on experience and professional knowledge, this study uses the random forest model to explore feature importance: the random forest judges the importance of a feature by calculating its contribution in every tree and taking the average, and drawing the result as a graph shows the importance of each feature at a glance. A sketch follows.
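A minimal sketch of the importance analysis above: a random forest is fitted on the one-hot matrix and the importances of the derived columns are summed back onto their source field by prefix, the same aggregation later used in the results chapter. Data and column names are illustrative.

```python
# Illustrative feature-importance aggregation over one-hot columns.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({"SN_NAME": list("ABABCA"), "STAGE": list("XXYYXY")})
y = [0, 1, 0, 1, 1, 0]

X = pd.get_dummies(df)            # e.g. SN_NAME_A, SN_NAME_B, STAGE_X, ...
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

importance = pd.Series(rf.feature_importances_, index=X.columns)
# Sum each derived column back to its source field by prefix.
by_field = importance.groupby(lambda col: col.rsplit("_", 1)[0]).sum()
print(by_field.sort_values(ascending=False))
```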

Common model modeling and comparison
The data type of this research is quite special, with fields that are mainly categorical. When it is not certain which model suits this data type best, several common models that perform well in classification are tried and compared: each is trained on the pre-processed data and its effectiveness is evaluated. Table 10 summarizes the advantages and disadvantages of the models.

Table 10. Advantages and disadvantages of each model

KNN. Advantages: easy to understand and implement; new data can be added without retraining; not sensitive to outliers. Disadvantages: each classification requires recalculation, so memory demand is large; when the data distribution is unbalanced, predictions are easily biased.

SVM. Advantages: strong generalization ability and easy to understand; part of the data suffices to determine the hyperplane; works well on high-dimensional data. Disadvantages: slow and sensitive to missing data; no universal solution for nonlinear problems; the kernel function is hard to interpret.

Decision tree. Advantages: easy to understand and implement, with a high degree of interpretability; little data pre-processing needed; can process numerical and categorical information at the same time. Disadvantages: over-fits easily; results are unstable under data changes; performs poorly when the data is unbalanced; ignores correlations between attributes.

Random forest. Advantages: not prone to over-fitting; fast training; can assess feature importance with high accuracy; can deal with missing and unbalanced data. Disadvantages: poor interpretability; cannot predict beyond the data range in regression problems; may still over-fit noisy data.

GBDT. Advantages: resists over-fitting; does not require complex feature processing; many non-linear transformations; can process linear or non-linear data. Disadvantages: computationally complex; cannot be parallelized, so it is time-consuming; not suitable for sparse high-dimensional data.

XGBoost. Advantages: prevents over-fitting; nodes at the same level can be processed in parallel; can handle sparse data. Disadvantages: when a node splits the data set must be traversed, and about twice as much memory is used.

NN. Advantages: flexible, with fault tolerance and self-adjustment ability. Disadvantages: not explanatory and computationally intensive; a lot of data is needed, and tuning requires much trial and error.

Model optimization

Based on the results of the previous modeling, after a better model is found, further optimization is attempted to obtain better results. This block is divided into three parts: feature processing, loop experiments, and parameter experiments.

Feature processing experiment
More complete data pre-processing can improve the model's learning, mainly through the processing of non-required fields and fuzzy data. All the models are retrained on the data before and after processing, and the benefit is evaluated and compared. Complementary value method: SN_NAME is required but may be an alias, while SN_TITLE is not required and may be empty; the empty SN_TITLE entries are filled with the data in the SN_NAME field (see the sketch below). Fuzzy processing method: there is no mandatory, standardized way to fill in these two fields; as long as users of each station can understand them, there is no restriction on symbols or capitalization, so the same object may be filled in several ways depending on the user's habits. Therefore, after the symbols in the two fields are removed, only letters, numbers, and Chinese are left, and the letters are converted to uppercase, so that fuzzy, approximately equal data is merged into consistent data.
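A minimal sketch of the complementary value method, assuming pandas; the rows are illustrative.

```python
# Illustrative back-fill of empty SN_TITLE from SN_NAME.
import pandas as pd

df = pd.DataFrame({
    "SN_NAME":  ["serial-no", "carton sn", "box sn"],
    "SN_TITLE": ["S/N", None, ""],
})

# Treat empty strings like missing values, then fill from SN_NAME.
empty = df["SN_TITLE"].isna() | (df["SN_TITLE"] == "")
df.loc[empty, "SN_TITLE"] = df.loc[empty, "SN_NAME"]
print(df)
```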

Cycle experiment
After testing the neural network, the results were found to be too optimistic and inconsistent with actual application. Analysis showed that the data has time characteristics: if the training set and the test set are selected at random, the training set may contain data from the future of the test data, and seeing the answer before verification may cause the model to get out of control, over-fit, and fail to learn correctly. An experiment was therefore designed around the test set segmentation method to clarify the actual situation. During the analysis it was also found that the part numbers are unevenly distributed over the records; because of the time characteristics the test data cannot be selected at random, and in this case the diversity of the test set is low. As shown in Fig. 3, to increase the diversity of the test set, partial data is used section by section through loops, so that more diverse data has the opportunity to act as test data.
As shown in Fig. 4, to increase diversity further, different methods of using the partial data are designed so that more data has the opportunity to act as test data. To avoid the models simply memorizing answers because each partial data set is too small, this experiment uses all label types together.
Method one uses all data only once, whether for training or testing; method two puts all data into the training set once; method three uses all data in the test set once; method four uses all data in the test set once, then takes the used data as the next training set and a certain percentage of the new data as the next test set. Some data at the beginning and end may be skipped. A sketch of this segmentation follows.
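A minimal sketch of the loop test in the spirit of methods two and four: the records stay in time order, are cut into segments, and each later segment takes a turn as the test set while everything before it trains. The segment count and data are illustrative assumptions; the study's exact four variants are described only at the level above.

```python
# Illustrative walk-forward loop test on time-ordered records.
import numpy as np

data = np.arange(120)          # stand-in for 120 records sorted by time
n_segments = 6                 # segment count is an illustrative choice
segments = np.array_split(data, n_segments)

for i in range(1, n_segments):
    # Everything before segment i has already "happened" and may train;
    # segment i plays the test set, so no future answers leak backwards.
    train = np.concatenate(segments[:i])
    test = segments[i]
    print(f"fold {i}: train on {len(train)} rows, test on {len(test)} rows")
```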

Parameter experiment
The model's parameters need to be tuned to the data type to find a better parameter set and a more ideal performance. Based on the results of the modeling comparison, this part targets the neural network.
First, for the generation (epoch) parameter, the learning curve of the data is observed to find the number of generations suited to the data type. Then the test set proportion is adjusted and the change in accuracy observed to determine a better proportion. Then the activation function is adjusted, taking the time factor into account in addition to accuracy. To prevent over-fitting, the change in accuracy is observed to find a better pruning ratio. To ensure that the neural network model has enough neuron connections to learn the data rules completely, experiments confirm the best network depth. Finally, the parameters obtained at each stage are integrated to verify whether the neural network model improves as expected. A sketch of such a sweep follows.
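A minimal sketch of such a parameter sweep, assuming a Keras-style model in which the "generations" are epochs and the "pruning ratio" is read as a dropout rate; the framework, dimensions, and search values are assumptions, not the study's published configuration.

```python
# Illustrative hyperparameter sweep (TensorFlow/Keras assumed installed).
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 100)           # stand-in for one-hot encoded input
y = np.random.randint(0, 8, size=1000)  # stand-in for variable-name classes

def build(depth, activation, dropout):
    # depth = number of hidden layers; dropout plays the role of the
    # "pruning ratio"; activation is the "excitation function".
    inputs = keras.Input(shape=(100,))
    x = inputs
    for _ in range(depth):
        x = keras.layers.Dense(128, activation=activation)(x)
        x = keras.layers.Dropout(dropout)(x)
    outputs = keras.layers.Dense(8, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Sweep network depth with the other parameters fixed at the values the
# experiments below settle on (35 generations, dropout 0.25); epochs,
# test proportion, and activation can be swept the same way.
for depth in (1, 2, 3):
    model = build(depth, activation="relu", dropout=0.25)
    hist = model.fit(X, y, epochs=35, validation_split=0.2, verbose=0)
    print(depth, "hidden layers, best val acc:",
          max(hist.history["val_accuracy"]))
```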

Results and Analysis
Following the research methods and steps of the previous chapter, the research and experimental results are presented and analyzed below. Although the results of some experiments are not significant, as long as each link can be optimized, together they eventually bring a great improvement.

Feature construction and selection
As shown in Fig. 5, when the random forest model is used to directly draw the top ten features, the features on the weight map do not match expectations: they are not the original data field names, and their importance proportions are very low. Most of the research data are categorical fields that were converted into multiple fields by one-hot encoding, and observed in this state the importance of the original data fields cannot be seen.
As shown in Fig. 6, the importance of each original feature can be seen by summing the importance values of the derived fields back onto the original field. The figure shows that SN_NAME has the highest importance, higher than SN_TITLE. In fact SN_TITLE is the word displayed on the label, but in the settings SN_TITLE may be empty, which lowers its importance, and even after the complementary-value processing the effect is limited. Although SN_NAME may be an alias, because of the user's setting habits the same object correlates more strongly with the target variable. ACTION_TYPE has the lowest importance: analysis found that the label content of a new part number may be copied directly from a similar product and modified, so new versions and new entries cannot be clearly distinguished, which is of little help to the prediction model. The importance of PN_RANK is also lower than expected: analysis found that the label style is mainly affected by the process stage, and the variable content has no obvious rule restriction.

Common model modeling and comparison
The control group of this experiment uses statistics from practice: each time a form is entered, the changed content averages 20%, so if the entire form were sent on without modification the accuracy rate would be 80%; in fact, new part numbers also need to be considered, so this value would only be lower. As shown in Table 11, the horizontal axis of the table lists the types: All means all types, and Others is the remaining data after the six common types are excluded. The vertical axis lists the control group and the models.
Looking down the vertical axis, regardless of the model the accuracy of CB_SN is very high. The target field of CB_SN has the fewest types, but besides the number of target types, attention should also be paid to the data volumes in Table 6: the higher the data ratio and the more data available for training, the higher the accuracy rate, although this is not absolute because the sparseness between the data volume and the target field must also be considered. The CARTON data ratio ranks 2nd and the BOX data ratio 3rd, yet these two types perform worse in SVM and decision tree than PALLET, which ranks 4th. Looking across the horizontal axis, SVM and decision trees are generally poor and GBDT's performance is not satisfactory, while KNN, random forest, XGBoost, and NN exceed the control group, reach the reference standard, and perform well, each with its own advantages and disadvantages.

Evaluation of the effectiveness of data pre-processing
The previous step of clarifying feature importance showed that SN_TITLE and SN_NAME are the more important fields, and these two fields have room for further optimization. After value complementation and fuzzy-data processing, each artificial intelligence model is retrained and its accuracy observed.
Using the complementary value method and the fuzzy processing method described in the feature processing experiment of the previous chapter (filling empty SN_TITLE entries from SN_NAME, and stripping the two fields down to letters, numbers, and Chinese with the letters converted to uppercase), the models were retrained. As shown in Table 12, the horizontal axis of the table lists the types, the vertical axis the control group and the models, and the data cells are the change in accuracy rate. SVM and decision tree improve significantly, with accuracy increasing by up to 21.8%; for KNN, random forest, GBDT, XGBoost, and NN the impact is less significant. Some models may already have learned to the upper limit of the existing data and features, leaving little room for optimization, and some, such as random forest, can already handle missing values and fuzzy data on their own.

Evaluation of the effectiveness of loop testing
The experiments in this section try to optimize only the neural network model, which had the best overall performance; they explain the purpose of the loop test and analyze and discuss the experimental results.

Comparison of effectiveness evaluation of test set segmentation methods
As explained in the research methods, randomly splitting data with time characteristics lets the training set see future answers, producing over-optimistic results inconsistent with actual application; this experiment therefore compares test set segmentation methods to clarify the actual situation. As shown in Table 13, the horizontal axis of the table lists the types and the vertical axis the test set segmentation methods. The accuracy of every type under the random split is clearly much higher than under the time-based split: random-split accuracies are all greater than 90%, with CB_SN even reaching 98%, while time-split accuracies fall between 84% and 95%.

Comparison of effectiveness evaluation of the number of cycles
In the control group, 80% of the data is used for training and only 20% for verification. Because the data in this study is special, with mostly categorical fields and inconsistent intervals between the records of each part number, the data distribution is uneven, and the time characteristics mean the test data cannot be selected at random, so the diversity of the test set is low. This experiment was designed to solve this problem: to increase the diversity of the test set, partial data is used section by section through a loop, so that more data has the opportunity to act as test data. As shown in Table 14, the horizontal axis of the table lists the types and the vertical axis the number of cycles. The total test set volume in this experiment equals that of the control group, which shows that merely dispersing the test set and increasing its diversity can indeed improve the results. Only the SN type scores lower than the control group, which may be related to the fact that most of its data are special cases. The table shows no significant relationship between the best result and the number of cycles; it is speculated that the result relates more to the diversity of the test set distribution.

Comparison of effectiveness evaluation of data-usage methods
To increase diversity, different methods of using the partial data were designed so that more data has the opportunity to act as test data; to prevent the model from memorizing answers because the partial data sets are too small, this experiment uses all types together. The control group of this experiment is the basic neural network model and method one of the previous experiment.
The four methods are those defined in the research methods: method one uses all data only once for either training or testing; method two puts all data into the training set once; method three uses all data in the test set once; method four also uses all data in the test set once, takes the data used each time as the next training set, and takes a certain percentage of the new data as the test set, possibly skipping some data at the beginning and end. As shown in Table 15, the horizontal axis of the table lists the data amount per training run for methods one to three, the control group, and the methods, and the vertical axis lists the proportion of data used in each cycle of training. The table shows that methods two and three outperform the control group, but on the numbers alone neither shows a clear advantage over the other, and because the research data of this experiment is insufficient, no further verification is possible. Method four is generally lower than the control group; analysis suggests the model develops problems because every training run is trained back to the first data. Observing the learning curves, method two's curve is relatively normal while method three's is suspected of over-fitting, so method two is selected, and a data amount of 1/6 is the better parameter for this experiment.

Evaluation of the effectiveness of model optimization
The experiments in this chapter aim to optimize the neural network model, which had the best overall performance; the experimental results are sorted out, discussed, and compared below.

Comparison of effectiveness evaluation of adjusting the generation count
This experiment focuses on the generation (epoch) parameter and observes the changes in the learning curve.
As shown in Fig. 7, Fig. 7a is the learning-curve loss graph and Fig. 7b is the learning-curve accuracy graph. Fig. 7a shows that after its initial reversal the loss curve rises slowly all the way and does not fall again, even after generation 60. Fig. 7b shows that the accuracy curve fluctuates constantly with no obvious overall trend; it is highest at generation 35 and becomes lower after generation 60. Fig. 7 also shows a gap between the training learning curve and the test learning curve, and that convergence is reached quickly. If there are too many generations, the neural network model may overfit the answers.
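One standard way to avoid choosing the generation count by hand is early stopping on the validation curve; the following Keras sketch is an illustration consistent with Fig. 7, not the study's actual training code, and it assumes the model and split arrays already exist.

```python
import tensorflow as tf

# Stop training once the validation curve stops improving, instead of
# fixing a large generation (epoch) count that risks overfitting.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # Fig. 7a: validation loss reverses and rises
    patience=10,                # tolerate the fluctuations seen in Fig. 7b
    restore_best_weights=True,  # keep the weights from the best generation
)
# Assuming `model`, X_train, y_train, X_test, y_test exist from earlier steps:
# model.fit(X_train, y_train, validation_data=(X_test, y_test),
#           epochs=60, callbacks=[early_stop])
```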

Comparison of effectiveness evaluation of adjusting the test set proportion
This experiment adjusts the test set ratio and observes the changes in accuracy. At first glance, the smaller the test ratio, the higher the accuracy. After analysis, however, it is clear that the high accuracy at a small test ratio is only an illusion: the smaller the test ratio, the lower the diversity of the test data, and the harder it is to verify the correctness of the results. Conversely, if the test ratio is too high, the diversity of the training data becomes too low and accuracy drops sharply. In the end, 0.2 is the better test ratio.
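The sweep can be reproduced schematically as below; the random-forest model and the synthetic stand-in data are illustrative assumptions, and shuffle=False respects the time order noted earlier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 50))  # synthetic stand-in for encoded form fields
y = rng.integers(0, 5, size=1000)        # synthetic stand-in for label classes

# Sweep the test ratio; shuffle=False keeps the time order, so the most
# recent records always form the test set.
for ratio in (0.05, 0.1, 0.2, 0.3, 0.4):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=ratio, shuffle=False)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"test ratio {ratio:.2f}: accuracy {acc:.3f}")
```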

Comparison of effectiveness evaluation of adjusting the excitation function and pruning ratio
This experiment adjusts the excitation (activation) function and the pruning ratio, and observes the changes in accuracy. To avoid overfitting caused by too many neurons in the neural network, pruning is needed to prevent the model from memorizing the answers. Across the different types, the difference between the best and worst results is only about 1%, with no obvious advantage or disadvantage; the neural network model in this study does not overfit even after cyclic training, so not much can be optimized through this parameter. Overall, a pruning ratio of 0.25 gives the better result.
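The "pruning ratio" here reads like a dropout rate; under that assumption, a hidden block combining the chosen relu excitation function (per the conclusion) with a 0.25 pruning ratio might be sketched as follows.

```python
from tensorflow.keras import layers

def hidden_block(units, rate=0.25):
    """One hidden block: relu as the excitation function, plus dropout as
    the assumed meaning of the paper's 'pruning ratio'. Randomly dropping
    25% of activations each step discourages memorizing answers."""
    return [layers.Dense(units, activation="relu"),
            layers.Dropout(rate)]
```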

Comparison of effectiveness evaluation of adjusting the network depth
After transformation, the data in this study has an input-layer dimension of about 1,000 and an output-layer dimension of about 256. This experiment adjusts the number of hidden layers and observes the changes in accuracy. Because the amount of data in this study is limited, a network that is too deep would have too many parameters and overfit, so this experiment tested at most three hidden layers. The difference between the results at each network depth is small: about 0.5% under Rolling(1,9), about 0.5% under Rolling(2,6), and about 0.5% under Rolling(3,7).
In principle, the more complex the problem to be solved, the deeper the neural network should be, but too much depth can cause overfitting. As the table shows, across the different types the gap between the best and worst results is only about 0.5%, with no obvious advantage or disadvantage. The problem in this study is not particularly complicated, and the neural network learns well on the basic architecture even after cyclic training, so not much can be optimized through this parameter. Overall, two hidden layers give the better result.
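Putting the tuned settings together, a sketch of the depth comparison could look as follows; the hidden-layer width of 128 is an illustrative assumption, as the paper does not state it.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(n_hidden, input_dim=1000, n_classes=256):
    """Build the basic classifier with a configurable number of hidden
    layers, using the tuned settings (relu, pruning ratio 0.25)."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(input_dim,))])
    for _ in range(n_hidden):
        model.add(layers.Dense(128, activation="relu"))  # width is illustrative
        model.add(layers.Dropout(0.25))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Compare depths 1-3 as in the experiment; with limited data the
# differences stay within roughly 0.5%, so 2 hidden layers are kept.
# for depth in (1, 2, 3):
#     model = build_model(depth)
#     model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)
```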

Conclusion
In recent years, there has been a lot of research on artificial intelligence in the manufacturing industry, but most of it focuses on real-time monitoring of production lines and rarely on production parameters. This research collects the contents filled into the approval forms of customized labels in the manufacturing industry to predict production line label settings. Models such as random forests and neural networks were used for training and prediction, and practical statistics served as the control group for evaluating the effectiveness of the artificial intelligence models. Finally, an auxiliary decision-making system for production line labels was established to enhance the operating experience, improve work efficiency, reduce the error rate, and maintain factory productivity.
In the past, setting personnel could only base their settings on their own experience and knowledge.
However, factors such as insufficient experience and knowledge, incorrect understanding, insufficient caution, and poor communication can lead to setting errors, causing abnormalities during production, reducing factory productivity, and bringing losses to the enterprise. Existing conventional corporate practices have had limited results. To solve this problem, this research uses artificial intelligence models to learn the relationship between the contents of the customized-label approval form and the variable parameter settings. The experimental results confirm that most artificial intelligence models outperform the control group, with random forests and neural networks performing best among the AI methods. The auxiliary decision-making system built from the better-trained model can, as expected, reduce the error rate in practice and maintain factory productivity. The results also show that among the data features, the label alias is the most important and has the greatest impact, while features such as the form action and the item-number production stage are not very helpful. To avoid dimensional explosion, features of lower importance can be removed so that unnecessary features do not waste the model's computing resources.
From the experimental results of this research, the better parameter combination is: loop training with method two, using 1/6 of the data each time; a generation parameter of 30; a test ratio of 0.2; relu as the excitation function; a pruning ratio of 0.25; and two hidden layers. As for the contributions of this paper: before tuning, the accuracy is about 85%; after tuning, it rises to 89.4%; combined with cyclic testing, it can be increased to 95%. According to statistics, the accuracy of new recruits is only about 80%, while the best artificial intelligence model reaches 95%. The number of production line stoppages is reduced from 4 times per month to 1 time per month. Under full capacity, this auxiliary decision-making system can reduce loss costs. The artificial intelligence module is based on the factory approval form and parameter settings; many systems in the company involve verification and parameter setting, and if this module can be extended to other application scenarios in the factory, it will bring overall optimization and improvement.
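For reference, the better parameter combination above can be summarized as a small configuration; the key names are illustrative, not the study's actual code.

```python
# Better parameter combination found in this research (names illustrative).
BEST_PARAMS = {
    "loop_method": 2,         # method two: every chunk enters training once
    "chunk_fraction": 1 / 6,  # data used per training cycle
    "generations": 30,        # training epochs
    "test_ratio": 0.2,
    "activation": "relu",     # excitation function
    "dropout": 0.25,          # "pruning ratio"
    "hidden_layers": 2,
}
```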
Future work: based on the research process and experimental results of this study, the following suggestions are put forward for future research.
1. Data volume integration: The amount of data in this study is limited. If data from various factories can be integrated in future research, the data will be more abundant and the artificial intelligence model will be less prone to overfitting.
2. Hidden data fields: When collecting data for this research, the description field had not been filled in according to a proper standard, so it was not used. If the filling standard can be promoted, additional valuable implicit information can be obtained from this field in future research to assist artificial intelligence model training and learning.

Declarations
Ethical approval
Ethical approval was not required for this study.

Funding details
No funding was received.

[Figure: Schematic diagram of cyclic training]