An adaptive genetic algorithm-based background elimination model for English text

In this paper, an adaptive genetic algorithm is used to study English text background elimination in depth, and a corresponding model is designed. The curves obtained after the initial character editing are transformed using the adaptive genetic algorithm, which removes the influence of multiple inflection points in the curve images on feature extraction. Then, using the minimum deviation method, the error values between the input characters and the sample set in the spatial coordinate system are calculated, and the deviation values of the angles and straight lines are used to match each character to the sample with the smallest deviation, that is, the highest degree of matching. To enhance identification accuracy, a genetic algorithm is used to iterate the feature sets of angles and line segments, and the optimal features are ultimately generated via crossover over successive generations. The character library is then divided into equal groups and used as input for trials, and the resulting feature sets are placed in a position matrix and compared one by one with the samples in the database. It is found that the improved stroke-structure feature extraction algorithm based on a genetic algorithm improves recognition accuracy and accomplishes the recognition task better than the compared methods. Finally, by analyzing the limitations and characteristics of the traditional particle swarm optimization (PSO) algorithm and the differential evolution (DE) algorithm, and by exploiting the advantages and applicability of each, a new differential evolution particle swarm algorithm with better and more stable performance is proposed. The algorithm is based on the PSO algorithm: when the population update of PSO stagnates and the search space is limited, the crossover and mutation operations of the DE algorithm are used to perturb the population, increase its diversity, and improve the global optimization ability of the algorithm. The algorithm is tested on a common text mining dataset to verify its effectiveness and feasibility.


Tang Xiaohui (xiaohui428@163.com)

Introduction
With the rapid growth of Internet users and the popularity of mobile terminal devices, people can easily and quickly publish information on various platforms and devices, so data in many forms and fields grow exponentially and the total amount of data increases dramatically. Data resources are becoming more valuable, and they have become an essential strategic resource in terms of productivity and cost (Ahlawat & Rishi 2019). Extracting hidden, regular information from massive data has also become an important way to improve enterprise productivity and enhance user experience. In the past, enterprises kept scattered, single-domain data on expensive storage hardware, and obtaining knowledge through manual search and organization was time-consuming, inefficient, and error-prone. Due to the limitations of data organization and traditional data analysis tools, people's use of data was limited to simple operations such as queries and mathematical statistics, and data could not be analyzed at a deeper level to meet the quantitative analysis needs of specific targets. With the dramatic reduction in storage hardware cost and the rapid increase in processor computing power, data acquisition, preservation, and calculation are no longer obstacles, and the demand for higher-level, regular information from massive data has driven the development of data analysis technology (Azmi & Kusumaningrum 2019) (Muhammad Talha et al. 2020). When someone impersonates another person's identity to sign, the victim usually authenticates the forged signature to obtain evidence only after suffering a loss, and it is difficult to authenticate in advance to avoid the loss. 
Offline signature authentication algorithms allow computers to automatically authenticate many handwritten signature images, thus reducing or even replacing the work of human handwriting experts and improving the efficiency of signature authentication. Acts such as signature impersonation may be avoided in this manner, resulting in increased security in people's everyday lives. Complex environmental information, varied text content, and the diversity of image shooting angles lead to inaccurate localization in most text detection methods, and even mature OCR technology cannot achieve satisfactory detection results (Wang et al. 2019). In summary, given the difficulties faced in extracting text from natural scenes and the broad application prospects and economic value of this technology, this paper conducts an in-depth study of natural scene text detection technology. Researchers have proposed many solutions to the above problems in recent years. Earlier, structural feature extraction and statistical feature extraction were also proposed to extract multiple feature vectors and achieve the best recognition effect. However, because font styles and writing habits differ, it is impossible to unify writing styles. Therefore, the difficulty of current research is still how to refine feature extraction, maximize the same set for matching, and improve recognition accuracy (Ratre 2020). ANOVA and correlation analysis were used in the analysis of the factors influencing the difficulty of English reading comprehension to eliminate the factors that did not have a significant influence on difficulty; in calculating the weight of each factor influencing difficulty, the experts' judgments of the importance of each factor were organized and calculated. Correlation analysis was conducted between the pre- and posttest data on English reading comprehension ability and the subjects' English test scores (Talha 2021).
Therefore, this study further verifies the applicability of item response theory in English reading; in addition, current research on methods for defining English reading comprehension difficulty is scarce, and this study fills this gap and provides a reference for subsequent researchers. At the level of practical significance, based on the developed adaptive testing system, English reading comprehension ability was tested in junior high school English classrooms, and the specific testing process provides an implementation path with reference value for adaptive testing of English reading comprehension ability. In estimating the difficulty parameters of the items, this study invited experts working in English research to judge the importance of the factors influencing reading comprehension difficulty, and the experts determined the relative importance of each factor based on their professional knowledge and work experience. Finally, at the end of the adaptive test, data on the subjects' satisfaction with the test, the convenience of the operation process, the friendliness of the operation interface, and the subjects' suggestions for improving the test system were collected by questionnaire (Talha et al. 2021).

Current status of research
Connected domain-based methods aggregate regions into connected regions based on the similar features that text exhibits, then filter non-text regions using trained classifiers or heuristic rules, and finally fuse adjacent individual texts to form text lines using a bottom-up strategy (Xu et al. 2020). Connected domain-based methods can be further divided into edge-based detection methods and text-level detection methods. The edge-based text detection method first extracts information such as the edges and corner points of text in an image, then obtains connected regions as candidate regions by smoothing and morphological operations, and finally verifies the candidate regions using heuristic rules or classifiers to filter out non-text connected domains. This method can quickly extract the text in the image, but for text with shadows or uneven illumination, the detection effect is poor because the edges or corner points of the text cannot be detected accurately. Puri et al. proposed a classification-based text detection algorithm for natural scenes based on sparse representation with discriminative dictionaries (Puri & Singh 2020). The algorithm first uses a wavelet transform to detect image edges, then scans the detected edges with a sliding window to form patches, then uses a simple classification process with two learned discriminative dictionaries to obtain text candidate regions, and finally uses an adaptive run-length smoothing algorithm and contour projection analysis to fine-tune the candidate regions into stable text regions (Elharrouss et al. 2020).
K-means is a classical division-based unsupervised clustering algorithm with the advantages of simplicity, efficiency, and ease of implementation and has been applied in many fields (Virk & Maini 2020). The algorithm starts with random initialization of a set of cluster centers and then repeats the steps of dividing the dataset using the cluster centers and updating the cluster centers within the divided set of clusters until the cluster centers converge (Ziani et al. 2019). Abdel-Kader divides the population into two subpopulations based on the size of individual fitness values and then updates the PSO algorithm and genetic algorithm in each of the two subpopulations. Pandey proposed a dynamic differential evolutionary algorithm for clustering that uses the median value of random samples to initialize the cluster centers and divides subpopulations for updating and then merging (Pandey et al. 2021).
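The K-means loop described above (random initialization, assignment, center update, repeat until convergence) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in any of the cited works; the function name `kmeans`, the squared Euclidean distance, and the rule of keeping the center of an empty cluster unchanged are assumptions made for the example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance, an assumption of this sketch).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Because the result depends on the initial centers, hybrid schemes such as those discussed here replace the random initialization with centers found by an evolutionary search.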
Since a single evolutionary algorithm cannot perform well in stability, convergence speed, and search ability at the same time, using a hybrid algorithm for clustering becomes a feasible solution (Lizarraga et al. 2020). Li et al. improved the K-means algorithm with a genetic algorithm that adjusts the search direction of the population by changing the mutation rate at each iteration and performing mutation operations, which keeps the clustering algorithm from falling into a local optimum (Li et al. 2021). Song first clustered with a genetic algorithm, which has better global optimization capability, then used samples randomly selected from each resulting cluster as the initial cluster centers of the individuals, and finally clustered with a hybrid GQPSO algorithm based on quantum particle swarm optimization (Feng & Sun 2019) (Talha et al. 2021).
In a comprehensive view, the current theoretical studies of adaptive test question banks by scholars are relatively rich, giving the implementation and optimization schemes of attribute classification, content balance, and item generation in the process of question bank construction, but there are still more problems in the practical studies of adaptive test question banks, such as quality control of question banks, exposure control of questions, and automatic item generation based on the cultural background of subjects (Kumar & Jaiswal 2019). In addition, there is not much literature on adaptive testing of English reading comprehension, and the parameter values of each item are estimated by statistical analysis of many sample data when constructing an English reading comprehension question bank, and few studies have calculated the item parameters directly for the reading material and the questions themselves. Based on this, this study will use a new method to estimate difficulty parameters when constructing English reading comprehension question banks and provide new ideas for similar difficulty parameter estimation in the future.
Adaptive genetic algorithm for English text context elimination analysis

Improved adaptive genetic algorithm
The idea of the genetic algorithm (GA) comes from biological evolution, in which evolved individuals partially inherit the good characteristics of their parents. A practical optimization problem can therefore be transformed into a genetic problem by direct or indirect encoding, and the optimal solution is sought based on fitness with respect to the target. Multi-generation evolution is the key step by which a genetic algorithm finds the optimal solution, but the influencing factors of each evolutionary generation do not adapt to the results of recent evolution, so the quality of the solution obtained when the termination condition is met is relatively poor, and it is difficult to obtain the global optimal solution in complex multi-objective optimization problems (Sen et al. 2019). Scholars have proposed the adaptive genetic algorithm (AGA) to solve this problem. It addresses the parameter setting problem by making the crossover probability and mutation probability formulas adaptive, so that crossover and mutation change with the overall fitness and the number of generations and find better solutions. The adaptive genetic algorithm searches the solution space with adaptive crossover and mutation operations, which refine the local search while the global search proceeds, and so combines the advantages of global search and local refinement. Furthermore, since the adaptive genetic algorithm can conduct genetic operations on population groups, several stages may be completed in parallel, reducing system load and increasing layout efficiency (Ma et al. 2019). Due to the parallel nature of FPGAs and the advantages of heuristic layout, the adaptive genetic algorithm is well suited to FPGA resource layout problems and has been widely used in FPGA design-related research in recent years.
In adaptive genetic algorithms, the feasible solution of a multi-objective optimization problem is transformed in some way for algorithmic manipulation, and this transformation method is called coding. This transformation is to convert the solution from its solution space to the encoding space to obtain the mapping relationship between the solution space and the encoding space, as shown in Fig. 1.
Adaptive genetic algorithms use fitness to represent how good a chromosome is and as the indicator for selecting individuals to evolve, so a reasonable and effective fitness function needs to be designed. Equation (1) is a common fitness function that can be used to solve the maximum (minimum) optimization problem (Chouhan et al. 2019). However, this expression may yield negative results, which cannot be used with roulette-based selection algorithms in practical applications, and when the values of certain functions differ greatly, the average value will not represent the overall fitness of the population, and the population will be locally optimal and locally worst, which eventually affects the optimization effect.

$$\mathrm{Fitness}(f(x)) = \begin{cases} f(x), & \text{maximization problem} \\ -f(x), & \text{minimization problem} \end{cases} \tag{1}$$

To solve the above problem, Eqs. (2) and (3) can be used to solve the maximum (minimum) optimization problem, where $c_{\min}$ ($c_{\max}$) is a predetermined number that is appropriately small (large) and is generally taken as the estimated minimum (maximum) value of the objective function.

$$\mathrm{Fitness}(f(x)) = \begin{cases} f(x) - c_{\min}, & f(x) > c_{\min} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{Fitness}(f(x)) = \begin{cases} c_{\max} - f(x), & f(x) < c_{\max} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

The fitness function is still not sensitive enough, since $c_{\min}$ ($c_{\max}$) is estimated in advance and is imprecise, which makes the algorithm performance unstable; this may be improved using the adaptive parameter modification method.

The results of the algorithm are assessed with metrics developed for evaluating FPGA resource layout algorithms; the two frequently used evaluation metrics are as follows. Resource utilization (RE) assesses the appropriateness of a task placement as the ratio of the resources needed for the task to the resources provided by the layout; the ideal single-task resource utilization is 1. However, in single-task placement the layout algorithm must reserve enough resource space for wiring when allocating resources to the task, so it generally allocates more than the minimum resources the task needs. The resource utilization of a particular layout can be expressed as a weighted average of the single-task utilizations, with a larger RE indicating that the resource management algorithm can find space for the task group that is closer to the needs of the task group itself. The expression for the calculation of RE is as follows.
where $p$ is the total number of tasks, $M$ is the minimum number of resources required for a single task, $R$ is the actual number of allocated resources, and $A_N$ is the normalized value of reserved resources such as wiring, taken as the average layout reservation space obtained when the algorithm processes 100 resource layout tasks (Abasi et al. 2021). The Biggest Resource Area (BRA) is the largest rectangular area in which the FPGA can still allocate resources contiguously under a specific algorithm; for example, under the quadtree-based algorithm and the closely aligned genetic algorithm, the Biggest Resource Area is the unallocated resource area bounded by alpha. This area does not include the space fragments surrounded by multiple tasks. Combining the above two layout optimization metrics, the multi-objective fitness function F is designed, and its expression is shown as follows.
where $c_1$ and $c_2$ are the scaling factors used to ensure that the two resource flexibility objectives, resource utilization and maximum available margin, are normalized to the same order of magnitude, and $w_r$ and $w_b$ are system parameters that weight the resource utilization and maximum available margin objectives when performing FPGA resource layout optimization. The adaptive adjustment of the mutation probability Pm uses the mutation of two numbers generated by a random function, such as the mutation of the time and teacher code in a class schedule. The performance of the genetic algorithm is affected by crossover and mutation, and the adaptive mutation probability is not a fixed value but varies with the crossover probability (Sanches et al. 2021). These two operations, adaptive crossover and adaptive mutation, are coordinated with each other to preserve the global search ability of the genetic algorithm so that it can obtain the global optimal solution. After the above operations, a conflict handling mechanism between populations has been established, but the formation of new conflicts cannot be excluded, so conflict detection and elimination must be carried out again.
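A weighted sum is one common way to combine two normalized objectives. Assuming that form (an assumption made for illustration, since other combinations are possible), and writing RE for resource utilization and BRA for the Biggest Resource Area, the fitness function F described by these definitions could be written as:

```latex
F = w_r \, c_1 \, RE + w_b \, c_2 \, BRA
```

Under this assumption, the scaling factors bring the two terms to the same order of magnitude, and the weights then decide how strongly the search favors tight packing over a large contiguous free area.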
After all the above processes have been executed, one iteration of evolution is considered complete. This follows the evolutionary principle of "survival of the fittest": in each iteration, the fitness values of the surviving children are slightly larger than those of their parents. If the fitness value of the children shows no significant increase or no change at the end of an iteration, the approximate global optimal solution of the problem is taken to have appeared. Figure 2 shows the optimized design flow chart of the improved adaptive genetic algorithm.
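The stopping rule described above, ending the search once the best fitness stops improving, can be sketched with a small generational loop. This is an illustrative toy rather than the algorithm of Fig. 2: the bit-string encoding, one-point crossover, single-bit mutation, elitism, and the names `evolve`, `patience`, and `mutation_rate` are all assumptions made for the example.

```python
import random


def evolve(fitness, n_bits=5, pop_size=6, patience=10, max_gen=200,
           mutation_rate=0.2, seed=0):
    """Toy generational GA that stops when the best fitness stagnates."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    stalled = 0
    for _ in range(max_gen):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            cut = rng.randint(1, n_bits - 1)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:      # flip one random bit
                child[rng.randrange(n_bits)] ^= 1
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stalled = candidate, 0
        else:
            stalled += 1
            if stalled >= patience:  # no improvement for `patience` generations
                break
    return best


# Example objective (hypothetical): maximize the number of 1 bits.
best = evolve(sum)
```

A stagnation counter is used instead of a single-generation check because one generation without improvement is common even far from the optimum.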
When the fitness is already high, the handle-attribute crossover operation designed in this paper can change chromosome properties locally to further improve the fitness. Based on this, the probability of the handle-attribute crossover operation is adaptively adjusted in this paper.
The adaptive genetic algorithm in this method places greater emphasis on mutation and employs reverse-order mutation, which randomly selects two distinct locations in the chromosome to be mutated and rearranges the genes there in reverse order, so that the chromosome contains the tasks in a changed order. When fitness is low, the handle-attribute-based crossover operation has a very limited effect on improving fitness, while reverse-order mutation, by changing the order of tasks in the chromosome, enhances the search ability of the algorithm and improves fitness more. The mutation probability based on the adaptive algorithm can be calculated as follows.
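The mutation step can be sketched in Python as follows. Two points here are assumptions made for illustration and are not taken from the paper: reverse-order mutation is read as reversing the segment between two random positions (the common inversion operator), and `adaptive_mutation_rate` is a hypothetical linear schedule in which below-average individuals mutate at a high rate and better individuals at a lower one.

```python
import random


def inversion_mutation(chromosome, rng=random):
    """Reverse the genes between two distinct random positions (inclusive)."""
    i, j = sorted(rng.sample(range(len(chromosome)), 2))
    return chromosome[:i] + chromosome[i:j + 1][::-1] + chromosome[j + 1:]


def adaptive_mutation_rate(fitness, f_avg, f_max, p_high=0.1, p_low=0.01):
    """Hypothetical schedule: p_high below average fitness, shrinking
    linearly to p_low for the best individual in the population."""
    if fitness < f_avg or f_max == f_avg:
        return p_high
    return p_high - (p_high - p_low) * (fitness - f_avg) / (f_max - f_avg)


tasks = [1, 2, 3, 4, 5]
mutated = inversion_mutation(tasks, random.Random(1))
```

The operator only reorders genes, so a chromosome that encodes a task sequence still contains every task exactly once after mutation.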
The word frequency can indicate the degree of contribution of the lexical item to the information expression of this document, and the higher the word frequency, the greater its association with the text. By ranking the word frequency of keywords and then removing feature lexical items with lower frequency, the vector dimension of the text can be reduced to some extent and the computational workload can be reduced. However, in some cases, the less frequent lexical items are more distinguishable and representative, so word frequency is only an important factor in evaluating feature weights.
The frequency of the lexical item in a document, as well as the distribution of the lexical item throughout the corpus, should be taken into account when calculating lexical item weights. A lexical item is more representative only if it appears more frequently in a certain document and less frequently in other documents. To fully consider the degree of contribution of both word frequency and inverse document frequency to the expression of text topics, the TF-IDF, the product of word frequency and inverse document frequency, is commonly used as the calculation of word item weights.
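The TF-IDF weighting described above can be sketched in a few lines. The example assumes the plain variant, with term frequency as count divided by document length and inverse document frequency as log(N / df) without smoothing; the function name `tf_idf` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # tf = count / document length; idf = log(N / df), unsmoothed.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["text", "mining", "text"], ["image", "mining"]]
weights = tf_idf(docs)
```

In this toy corpus "mining" occurs in every document, so its weight is zero, while "text" is frequent in one document and absent from the other and therefore receives a positive weight.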
At the same time, because text lengths are inconsistent and the numbers of dimensions with nonzero weights differ widely, Euclidean distance and Minkowski distance often cannot correctly represent the similarity between two texts. Since the text length is normalized in the cosine distance formula, the cosine distance is used to measure the similarity between document vectors, which better suits document vectors with high dimensionality and small feature values.
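The effect of length normalization can be seen in a small sketch that compares the two measures on a short document vector and a ten times longer one with the same term proportions. The vectors are made-up illustrative values, and returning 0.0 for a zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention of this sketch for empty documents
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


short_doc = [1.0, 2.0, 0.0]
long_doc = [10.0, 20.0, 0.0]   # same proportions, ten times the length
similarity = cosine_similarity(short_doc, long_doc)
distance = euclidean_distance(short_doc, long_doc)
```

The cosine similarity is 1 up to rounding, because only the direction of the vectors matters, while the Euclidean distance is large purely because of the difference in length.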

Experimental design for English text background elimination
In this paper, this method is used for skew correction of signature images. The method achieves tilt correction of the signature image by rotating the minimum inertia axis of the signature curve to coincide with the horizontal direction. After passing the background elimination, the grayscale of the signature stroke in the signature image is unchanged while the background grayscale is reset to 255. Therefore, the signature stroke curve can be obtained by counting the pixels in the image whose grayscale is not equal to 255 and represented by Eq. (9).
Finally, the angle between the direction of the minimum inertia axis of the signature and the horizontal coordinate axis is the calculated rotation angle (Biswas & Islam 2021). The signature image is skew corrected by cropping the image of the signature stroke portion and rotating all of the pixels in the signature by this angle around the signature's center of mass. Three background elimination methods are designed to remove the impact of the background on text re-identification: a pixel-level background elimination method, a feature-level unsupervised background elimination method, and a feature-level supervised background elimination method.
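The rotation angle used for skew correction can be sketched with second-order central moments. The closed form below, theta = 0.5 * atan2(2 * mu11, mu20 - mu02), is the standard expression for the orientation of the minimum inertia axis and is assumed here to correspond to the procedure described above; the function name `skew_angle` and the nested-list image format are illustrative.

```python
import math


def skew_angle(gray, background=255):
    """Angle in radians between the minimum inertia axis of the stroke
    pixels and the x axis. Stroke pixels are those != background."""
    pts = [(x, y)
           for y, row in enumerate(gray)
           for x, value in enumerate(row)
           if value != background]
    if not pts:
        raise ValueError("image contains no stroke pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n   # center of mass
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    # Image rows grow downward, so a positive angle is clockwise on screen.
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


diagonal = [[0, 255, 255],
            [255, 0, 255],
            [255, 255, 0]]
angle = skew_angle(diagonal)
```

Rotating every stroke pixel by the negative of this angle about the center of mass aligns the axis with the horizontal direction.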
The text image segmentation mask is created using the fusion segmentation technique described in Sect. 2.2, and the mask and text image are then fused at the pixel and feature levels, respectively, using the MPF (multi-pooling fusion) network as the backbone network. The feature-level unsupervised background elimination method fuses the text segmentation mask with the feature map generated by the network model: the segmentation mask retains the feature data of the text foreground and sets the background part of the feature data to 0, which eliminates the background information of the text. The network structure is shown in Fig. 3. First, the text image is passed through the text segmentation network to obtain the binary segmentation mask, in which the foreground is 1 and the background is 0. The new feature map is then mapped to the feature vector used for computing text similarity. The feature-level unsupervised background elimination method can reduce the reliance on the accuracy of text mask segmentation. Unlike the unsupervised method, the feature-level supervised background elimination approach makes the network automatically learn to weaken background information (Bibi et al. 2020). The method adds a network branch and calculates a feature activation loss function; the feature activation loss is then combined with the text classification loss as a multitask learning loss to supervise the network model in extracting useful text foreground features.
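The feature-level unsupervised fusion described above amounts to an element-wise product of each feature channel with the binary mask. The sketch below uses nested lists instead of a tensor library and assumes the mask has already been resized to the spatial size of the feature map; the function name `mask_features` and the toy values are illustrative.

```python
def mask_features(feature_map, mask):
    """Zero out background positions in every channel.

    feature_map: [channels][height][width] nested lists of numbers.
    mask: [height][width] nested lists with 1 = foreground, 0 = background
          (assumed to match the feature map's spatial size).
    """
    return [
        [[value * m for value, m in zip(f_row, m_row)]
         for f_row, m_row in zip(channel, mask)]
        for channel in feature_map
    ]


features = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
mask = [[1, 0],
        [0, 1]]
masked = mask_features(features, mask)
```

The supervised variant replaces this hard masking with a loss term, so the network learns to suppress background activations without needing the mask at inference time.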
The experimental results show that the quadtree-based layout algorithm suffers from redundant resource occupation because it maps resources according to nodes, and its resource utilization rate is only 25.6% in the multitask layout problem with different resource demands. The two-dimensional random layout algorithm reaches a resource utilization rate of only 43.2%, which is relatively low because the task layout is too loose and the average reserved resources are larger. The GA-based layout algorithm and the AGA-based layout algorithm obtain a tighter task layout thanks to heuristic layout, and their resource utilization is significantly improved, reaching 61.5% and 69.1%, respectively, which effectively reduces the redundant occupation of resources.
The two-dimensional random layout algorithm is limited in its maximum available blank area because tasks are placed at randomly selected blank locations and the resource space is partitioned by each task; the quadtree-based resource layout algorithm leaves only a small portion of blank area in the deepest remaining nodes. The GA-based resource layout algorithm and the AGA-based resource layout algorithm, on the other hand, have obvious advantages. With heuristic layout, a tighter layout is obtained, and the maximum available blank area reaches 30.8% and 40.6% of the resources, respectively, which increases the number of tasks the FPGA layout can accommodate, as shown in Fig. 4.
The P values of word frequency, mean number of syllables, mean number of letters, real-word familiarity, and narrative are less than 0.001, indicating that these five factors have a highly significant impact on the difficulty of reading texts; the P values of word count, lexical diversity, mean sentence length, and syntactic complexity are less than 0.01, indicating that these four factors have a significant but smaller impact on the difficulty of reading texts (Dutta et al. 2019). Syntactic simplicity, syntactic pattern density, connectedness, and temporality all had P values higher than 0.05, suggesting that these four variables did not have a significant impact on reading difficulty. Since lexical diversity, syntactic complexity, and syntactic pattern density are each measured by multiple indicators, the secondary indicators of these three factors are also analyzed by ANOVA to test their effect on reading comprehension difficulty. The P values of the mean number of words before main verbs and the mean number of modifiers of noun phrases, both indicators of syntactic complexity, are less than 0.01, which means that they have a significant effect on the difficulty of the reading text; among the indicators of syntactic pattern density, only prepositional phrase density had a P value less than 0.05, and the overall effect of syntactic pattern density on the difficulty of the reading text was not significant.

Algorithm performance test results
Test reliability refers to the consistency or stability of the results of a test for the same group of people at different times. In other words, a good measurement instrument must first ensure that the results of repeated measurements remain stable and reliable before its validity can be judged; otherwise, the instrument is meaningless. The test system in this study is guided by item response theory (IRT) and uses an adaptive personalized question selection strategy, so the subjects' responses vary from test to test; because IRT estimates of the subjects' ability parameters are invariant, the correlation between a subject's results on two adaptive tests conducted within an appropriate time interval can be regarded as the reliability level of the system. After the two adaptive tests, each subject's two results were downloaded and tallied individually. The mean, standard deviation, and maximum and minimum values of the two sets of ability values were first computed, as shown in Fig. 5. The ability estimates of the system range from -3 to 3. As can be seen from Fig. 5, the mean ability estimates of the two adaptive tests are 0.589 and 0.694, respectively, which are relatively close, indicating that the subjects' reading comprehension ability levels are relatively consistent within a short period and that the adaptive testing system is stable in its estimation of the subjects' English reading comprehension ability. In addition, the difference between the maximum and minimum values is pronounced, which indicates that the system can effectively distinguish ability levels. As can also be seen from Fig. 5, the correlation coefficient between the ability values of the two adaptive tests is 0.899, and the p value is less than 0.01, which indicates that the system estimates the English reading comprehension ability of junior high school students with good reliability and reflects the real proficiency level of the subjects well.
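The test-retest reliability coefficient discussed above is a correlation between two sets of ability estimates for the same subjects. Assuming the Pearson product-moment coefficient is meant (the text does not name the coefficient), it can be computed as in the sketch below; the ability values are made-up illustrative numbers, not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical ability estimates on the -3 to 3 scale for five subjects.
first_test = [-1.2, 0.3, 0.8, 1.5, 2.1]
second_test = [-0.9, 0.1, 1.0, 1.4, 2.4]
reliability = pearson_r(first_test, second_test)
```

A coefficient close to 1 means that subjects keep their relative ability ranking across the two adaptive tests, even though each test presents different items.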
Through experiments, the hyperparameters of DenseNet are selected in this paper, and a DenseNet network structure with better performance is obtained. To select a more suitable feature extraction network for offline signature authentication, classical convolutional neural network structures are used as the feature extraction networks of a Siamese network and tested on the GPDS dataset to compare the performance of different convolutional neural network structures in offline signature authentication; the experimental results are shown in Fig. 6.
As shown in Fig. 6, the DenseNet structure not only outperforms the other convolutional neural network structures (10.93% equal error rate) but also has a smaller model size (4.2 MB). The VGG network structure provides comparable performance to the DenseNet structure; however, it has more than three times as many parameters, and so many parameters easily bring the risk of overfitting. Therefore, in this paper, the DenseNet structure is selected as the feature extraction network of the Siamese network-based offline signature authentication algorithm, and the resulting Siamese network is used for offline signature authentication. Based on the preceding experimental comparison, this paper uses the improved Siamese network for similarity measurement in offline signature authentication. The signature image is divided into seven signature regions, and these signature region subgraphs are substituted for the complete signature image to train the improved Siamese network. In this paper, experiments are conducted on the GPDS, CEDAR, and ChnSig datasets: each signature region subgraph is fed into the improved Siamese network individually for similarity measurement, and signature authentication is performed on this basis to evaluate the performance of the algorithm on different signature regions.
While cosine similarity performs well in areas such as text analysis, it performs poorly in signature authentication. This is because cosine similarity is insensitive to the length of the feature vectors and considers only the angle between them. The difference between a genuine signature and a skilled forgery is small, so the angle between their feature vectors varies very little, as evidenced by the fact that the cosine similarity threshold is very close to 1 in the experiments. Although the Euclidean distance performs better than cosine similarity, it cannot take into account the distribution of the features in the high-dimensional feature space and therefore falls short of metric learning. Moreover, the threshold of the Euclidean distance ranges over the entire non-negative domain, which makes it difficult for the algorithm to adjust the threshold according to the risk. When the Siamese network uses metric learning to compare features, it achieves better performance on all datasets. As a result, metric learning is used in this work to measure the similarity of signature features before performing offline signature authentication.
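The insensitivity of cosine similarity to vector length can be seen in a small numerical sketch. The vectors below are invented for illustration and are not signature features from the experiments.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; independent of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def euclidean_distance(u, v):
    """Straight-line distance between u and v; unbounded above."""
    return math.dist(u, v)


reference = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]  # same direction as reference, twice the length

# Cosine similarity cannot tell the two vectors apart,
# while the Euclidean distance between them is clearly nonzero.
print(cosine_similarity(reference, scaled))
print(euclidean_distance(reference, scaled))
```

Two feature vectors that differ only in magnitude are treated as identical by cosine similarity, which is one reason its decision threshold is pushed toward 1 when genuine and forged signatures point in nearly the same direction.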

Analysis of experimental results
In this section, building on the improved algorithm, the candidate region extraction stage makes better use of the color information of the color image and extracts stable extremal regions using the improved MSER algorithm on the R, G, and B channels, respectively. The performance of the improved MSER algorithm is compared with that of the classical MSER algorithm in extracting extremal regions on grayscale images. To compare the performance differences of the three methods of the improved MSER algorithm for extracting extremal regions on grayscale images, the experiments are conducted on the ICDAR2013 test set using equal MSER thresholds, and the specific experimental results are shown in Fig. 7. In the experiments of this paper, the size of the feature population is set to 5, and each individual is encoded as a five-digit binary chromosome for individual selection. The selection probability of each feature is calculated with the proportional fitness function, and the chromosomes to be retained are selected according to this probability. The individuals retained by the fitness function form the first generation of offspring. As further generations are produced, each generation is expected to yield a better solution than the previous one. The process ends with a solution that later offspring can no longer improve on, which is taken as the optimal solution.
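The fitness-proportional selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the population size of 5 and the five-digit binary chromosomes come from the text, whereas `toy_fitness`, the random seed, and the function names are hypothetical.

```python
import random


def selection_probabilities(fitness_values):
    """Proportional fitness: each probability is fitness / total fitness."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]


def roulette_select(population, fitness_values, k, rng):
    """Draw k chromosomes with replacement, weighted by selection probability."""
    probs = selection_probabilities(fitness_values)
    return rng.choices(population, weights=probs, k=k)


def toy_fitness(chromosome):
    """Hypothetical fitness: number of 1 bits, plus one so no weight is zero."""
    return chromosome.count("1") + 1


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Five individuals, each a five-digit binary chromosome.
population = ["".join(rng.choice("01") for _ in range(5)) for _ in range(5)]
fitness = [toy_fitness(c) for c in population]
survivors = roulette_select(population, fitness, k=5, rng=rng)
print(population, survivors)
```

Sampling with replacement means a fit chromosome may appear several times among the survivors, which is the usual behavior of roulette-wheel selection.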
Comparing the convergence curves as a whole, the DE algorithm, which has better diversity, has an advantage in high-dimensional text clustering and converges quickly to a good approximate optimal solution in the early stage of population update. Comparing the convergence curves of the IDEPSO and DEPSO algorithms, the IDEPSO algorithm eventually converges to a better stable F value and converges faster than DEPSO, which means that it needs fewer iterations and less running time to reach a better solution. This suggests that adaptively adjusting the number of cluster centers is beneficial. Comparing the convergence curves of all the algorithms, the IDEPSO algorithm makes full use of the features of both the PSO and DE algorithms to obtain the best fitness value while maintaining a fast convergence rate, and it shows strong optimization capability and high efficiency.
The average convergence curves of F for each algorithm on the four datasets are given in Fig. 8. In the first 30 iterations, the convergence curves of the PSO and GQPSO algorithms on the four datasets are very steep: these algorithms converge the fastest but settle very early at a low stable fitness value F. The curve of the GAI-PSO algorithm is relatively flat, and its F value increases steadily. The convergence curves of the DE, DEPSO, and IDEPSO algorithms are close to each other and also steep, but they converge to higher F values, indicating that these algorithms can achieve better fitness values F in the early stage of population evolution. The IDEPSO algorithm converges almost completely to the highest fitness value F, except on the DS1 dataset. After 60 iterations, GAI-PSO starts to converge and stabilize, while the other algorithms have converged completely.
The parameter settings of the PSO and DE algorithms are then tested on these datasets, and the parameter combinations that achieve the best text clustering results are obtained. Finally, the IDEPSO algorithm proposed in this paper is tested on the test datasets together with the PSO algorithm, the DE algorithm, and other improved PSO variants, and the performance of these algorithms is compared in terms of internal evaluation metrics, convergence curves, external evaluation metrics, and stability to verify the effectiveness and feasibility of the proposed IDEPSO algorithm.

Conclusion
To address the crossover and mutation problem, a new adaptive genetic operator is developed based on where the individual fitness lies between the average and maximum fitness, as well as on the degree of concentration and dispersion of the population fitness during evolution. The operator nonlinearly and adaptively regulates the process of genetic evolution and takes half of the maximum fitness. It avoids premature convergence of the algorithm, ensures population diversity, and gives full play to the local search advantages of the crossover and mutation operators. This paper first introduces the background and significance of offline character recognition. The present state of handwritten character recognition research is then addressed, followed by the fundamental concepts and methods of offline Chinese character recognition. An improved feature extraction algorithm is proposed and tested experimentally. Finally, the improved character feature extraction algorithm is combined with a genetic algorithm to further improve recognition accuracy. The final experimental results are compared with those of the traditional feature extraction algorithm, and the corresponding conclusions are drawn from the results. The core of the improved algorithm is to place the features obtained from the characters into an undirected graph, transform the character curves into line segments, compare each line feature obtained from the characters with the sample set, and then use the minimum deviation method to complete the recognition. The whole process belongs to the category of structural features. The recognition speed is fast, the recognition accuracy is high, and the method is easy to operate.
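One way to picture a nonlinear adaptive operator of this kind is a probability that falls smoothly from an upper bound to a lower bound as an individual's fitness moves from the population average to the population maximum. The sketch below uses a cosine-shaped curve for that purpose. The cosine form, the function name, and the bounds `p_low` and `p_high` are assumptions chosen for illustration and are not taken from this paper.

```python
import math


def adaptive_probability(f, f_avg, f_max, p_low, p_high):
    """Cosine-shaped adaptive crossover or mutation probability (assumed form).

    Individuals below average fitness, or any individual in a fully
    concentrated population (f_max == f_avg), get the upper bound p_high,
    which encourages exploration. Between f_avg and f_max the probability
    decreases nonlinearly to p_low, which protects the best individuals.
    """
    if f < f_avg or f_max == f_avg:
        return p_high
    ratio = (f - f_avg) / (f_max - f_avg)  # 0 at the average, 1 at the maximum
    return p_low + (p_high - p_low) * 0.5 * (1 + math.cos(math.pi * ratio))


# Hypothetical bounds for a crossover probability.
for fitness in (2.0, 5.0, 7.5, 10.0):
    print(fitness, adaptive_probability(fitness, f_avg=5.0, f_max=10.0,
                                        p_low=0.5, p_high=0.9))
```

The same function can be applied with a different pair of bounds to obtain a mutation probability. Returning the upper bound when all individuals share the same fitness is a design choice intended to restore diversity once the population has collapsed.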
Authors' contribution TX designed the model, collected the dataset, performed the analysis, validated the results, and wrote and reviewed the manuscript.
Funding This work is supported by the Project of Shandong Province Higher Educational Science and Technology Program: A Corpus-based Study on the Translation of Gaomi Dialect in English Versions of Mo Yan's Novels (No. J18RA238).