Offline handwritten Tai Le character recognition using wavelet deep convolution features and ensemble deep variationally sparse Gaussian processes

Handwriting recognition is an important application of pattern recognition. Because handwritten Tai Le contains many similar characters and the proportion of characters with similar shapes is high, this paper proposes an offline handwritten Tai Le recognition method that combines a wavelet deep convolution (WDC) feature with an ensemble deep variational sparse Gaussian process (EDVSGP). A Tai Le recognition sample database is constructed, 2- and 3-level wavelet decomposition prior features are extracted, and the wavelet prior features are then transformed into WDC features. To avoid the dimensional disaster caused by feature fusion, a dimensionality reduction method combining LDA and PCA is used to reduce the dimensionality of the fused features without reducing the accuracy rate. Moreover, to better recognize handwritten Tai Le characters, 6 deep variational sparse Gaussian process (DVSGP) models are integrated into an EDVSGP classification model. On the handwritten Tai Le character database, the proposed method is superior to existing methods, with an accuracy, recall and F1-score of 94.15%, 94.15%, and 94.15%, respectively. The universality of the method is verified on a dataset of handwritten Chinese characters with similar shapes (HCFC2021.4) and on the Devanagari dataset.


Introduction
The use of optical character recognition (OCR) technology for printed Chinese character recognition began in the last century. Since the 1960s, traditional OCR technology has performed very well on modern printed characters, but handwritten character recognition remains challenging (Zhang et al. 2018). Drawing on feature domain knowledge, the traditional normalization-cooperated direction-decomposed feature map (directMap) was combined with a convolutional neural network (CNN) to significantly improve Chinese character recognition accuracy (Zhang et al. 2017). In the field of recognition networks, multilingual text recognition networks (MultReNets), a new multitext recognition framework, have been used to recognize French, Urdu, etc., achieving good results (Chen et al. 2020). Typical recognition methods mostly use manually extracted features; for example, Bhattacharya et al. proposed a new substroke-wise relative feature for Indian character recognition, which achieved good results on the Bangla database (Bhattacharya et al. 2018). A good feature also requires a classifier with excellent performance. The success of deep learning largely relies on expensive computing resources; therefore, reducing resource consumption while maintaining the same recognition accuracy has become a direction of future research (Sandler et al. 2018; Howard et al. 2019; Wang et al. 2020b, a). Tai Le is a script widely used by the Dai people, who live mainly in southwestern China; the Dai population is close to 2 million. At present, research on offline Tai Le recognition is still lacking, and there is no public offline handwritten Tai Le database. Tai Le is very different from Chinese and other scripts and has its own characteristics, as shown in Fig. 1. There are three main differences. (1) The similarity between classes of Tai Le characters is greater than that of Chinese or English characters.
These highly similar characters pose a significant challenge to recognition. (2) Although Tai Le has only 35 characters (5 tonal characters, 11 vowel characters, and 19 consonant characters), similar characters account for more than 70% of the categories. (3) Because individuals write Tai Le in different styles and with some randomness, the within-class variation of each category becomes large, and characters are prone to deformation. The high interclass similarity and the deformation caused by random writing pose great challenges to the recognition of Tai Le characters. Therefore, offline handwritten Tai Le character recognition cannot simply imitate recognition methods for other scripts; considerable manpower and material resources are required for dedicated research.
Considering the special characteristics of handwritten Tai Le characters, a wavelet deep convolution (WDC) feature combining the wavelet decomposition (WD) feature and the deep convolution (DC) feature is proposed. The WD method decomposes the original image into multiple subimages through different levels of decomposition. Combining different levels of WD features captures more Tai Le information and enhances the degree of difference between character categories (Melnyk et al. 2020; Sdiri et al. 2019). Additionally, the excellent performance of deep convolutional neural networks in feature extraction is used to further extract WDC features. This approach not only retains the strength of wavelet decomposition features in capturing fine details but also exploits the CNN's strength in feature extraction.
Classifiers based on CNNs have achieved good recognition results on text samples; however, achieving good results with deep convolutional neural networks requires a large number of training samples. For a handwritten Tai Le database with a small sample size, CNNs fail to obtain good recognition results. The deep Gaussian process (DGP) is a nonparametric probabilistic model whose training is based on the marginal likelihood; it considers not only data fitting but also model complexity. Compared with CNNs, it reduces the burden of parameter tuning and is not in danger of overfitting. More importantly, the DGP retains good training performance even on a small sample dataset (Damianou et al. 2013). Compared with the traditional DGP model, the DVSGP model used in this paper not only retains the above advantages but also avoids the prohibitive computational and storage costs that the traditional Gaussian process incurs when the input data are large (Titsias et al. 2009). Applying the DVSGP model to WDC features can effectively describe the character details of handwritten Tai Le characters. However, although a single DVSGP model achieves good results in text recognition, the final recognition rate cannot be increased by continuously increasing the depth of the DGP. Therefore, an ensemble deep variationally sparse Gaussian process (EDVSGP) is established by ensembling DVSGP models.
The WDC feature and the EDVSGP model proposed in this paper overcome the above problems of handwritten Tai Le character recognition. The main contributions of this research are as follows: (1) At present, there is no offline handwritten Tai Le character database. Therefore, we construct an offline handwritten Tai Le character dataset SDH2021.4 with 35 categories and use it as a test benchmark.
(2) For the problem of a large number of similar characters and high interclass similarity in handwritten Tai Le, the WDC feature and the EDVSGP classification model are proposed.

The remainder of this article is organized as follows. Section 2 briefly reviews related work in text recognition and applications of the DVSGP. Section 3 introduces the preprocessing of handwritten Tai Le characters. Section 4 describes the constructed WDC feature. Section 5 introduces the EDVSGP-based recognition model for handwritten Tai Le characters. Section 6 describes the constructed handwritten Tai Le character dataset and presents the experimental results of the proposed method on it. Finally, Sect. 7 presents concluding remarks.

Related works
With the development of pattern recognition, new methods have been proposed to constantly improve text recognition accuracy. These methods mainly focus on practical problems from two aspects: feature extraction and classifier models. The following focuses on these two aspects.
In terms of feature extraction, Narang et al. used SIFT features and Gabor filter features to extract a priori features from Devanagari handwritten text, used an SVM to complete classification, and achieved good results (Narang et al. 2020). Traditional convolution features can capture end-to-end data features but tend to ignore the spatial order attributes of an image. Ying et al. proposed a new NCN network that uses a convolution structure to extract image features and a bidirectional recurrent module to further capture the spatial features of the image; their experiments prove the effectiveness of this model (Raj and Abirami 2020). Facing frequent recognition errors in online signature verification systems, Diaz et al. proposed a novel feature space to efficiently describe online signatures and achieved good results (Diaz et al. 2019).
Improving text recognition accuracy also focuses on improving the classifiers. Xu et al. designed a new network, LightweightNet, to address the high computational and memory costs of deep neural networks; it is faster and uses less memory than traditional classifiers based on manual features and achieved good results (Xu et al. 2019). Facing high consumption of computing resources, Han et al. proposed a new network model (GhostNet) that reduces computational cost by applying a series of linear transformations while maintaining high recognition accuracy (Han et al. 2020). Tan et al. proposed a new mixed depthwise convolutional kernel (MixNet) that noticeably improved recognition accuracy (Tan et al. 2019a). They also proposed a new efficient network (EfficientNet) that improved top-1 accuracy on ImageNet while increasing computation speed (Tan et al. 2019b).
Although deep learning has achieved good results in many areas, it requires large amounts of data. The DGP is a Gaussian process with a hierarchical structure that retains all the characteristics of the Gaussian process. More importantly, a DGP can be well trained on a small sample space with no danger of overfitting. Facing the problem that traditional classifiers are prone to overfitting on volcano-seismic events, López-Pérez et al. applied the DGP and achieved good results (Nguyen et al. 2020).
Although good results on handwritten characters have been achieved in academic circles, recognition research on handwritten Tai Le characters still needs further development. There are only 35 types of offline handwritten Tai Le characters; however, similar characters account for more than 70% of the categories, and the gap between categories is extremely small. In addition, due to the randomness of writing, characters are likely to be deformed, further increasing the recognition difficulty and making the recognition of handwritten Tai Le characters more challenging. To solve this problem, a WDC method is proposed to obtain prior features. The WD feature obtains multiple wavelet subimages by multilevel decomposition of the original images, which increases the degree of difference between different character images. The deep convolution feature can further capture high-level semantic information and character contour information as the depth increases (Yu et al. 2020). This method addresses the large number of similar characters in offline handwritten Tai Le. The DGP has strong adaptability, is difficult to overfit, and requires relatively few calculations; it has also been proven to be an excellent recognition framework. Therefore, the DVSGP model is used as the base model of the classifier to further build the EDVSGP, finally realizing the recognition of handwritten Tai Le characters.

Preprocessing
When collecting the original images of handwritten Tai Le, noise in the images has a great impact on recognition. In this article, a median filter is used to eliminate this noise (Yang et al. 2020). Figure 2a shows handwritten Tai Le character images after denoising. Figure 2b shows the geometric transformation of the Tai Le character images. The aspect ratio of the original characters is preserved through standardization. After coordinate standardization, each character is placed at the center of the standard coordinate system while the shape of the text remains unchanged.
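The denoising step can be sketched as a plain-Python 3×3 median filter (a minimal illustration; the paper does not specify the filter window size, so 3×3 is an assumption):

```python
from statistics import median

def median_filter_3x3(img):
    """Apply a 3x3 median filter to a 2D grayscale image (list of lists).

    Border pixels are left unchanged for simplicity.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighborhood = [img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(neighborhood)
    return out

# A lone bright speck (salt noise) in a dark region is removed.
noisy = [[0, 0, 0],
         [0, 255, 0],
         [0, 0, 0]]
print(median_filter_3x3(noisy)[1][1])  # -> 0
```

The median is robust to isolated outlier pixels, which is why it suppresses salt-and-pepper noise while preserving stroke edges better than a mean filter would.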

WD feature extraction
Compared with feature extraction for other scripts, feature extraction for offline handwritten Tai Le characters focuses more on the fine details of the characters. The wavelet decomposition feature decomposes the Tai Le images by filtering in the horizontal, vertical, and diagonal directions, yielding multiple detail subimages, as shown in Fig. 3. Figure 3a and b shows the two-level and three-level wavelet decomposition diagrams of the character '' ''. Different levels of detail subimages can enhance the differences in character detail information. We therefore combine multilevel wavelet decomposition features, using both the 2-level and 3-level wavelet decompositions of the handwritten Tai Le characters; combining features from different levels yields better prior features. The AlexNet, VGG-16 and VGG-19 (Simonyan and Zisserman 2015), ResNet-18 and ResNet-50 (He et al. 2015), and Inception v3 networks are chosen as candidate models for further extraction of the WDC feature. The specific experimental results are shown in Sect. 6. Through experimental comparison, the AlexNet network and the ResNet-50 network are selected as the models for further extracting WDC features. The wavelet decomposition prior features extracted from the handwritten Tai Le character samples are input into the AlexNet and ResNet-50 networks for training, and deep convolutional features are extracted from the layer before the softmax classification layer. The AlexNet network then yields 128-dimensional features, and the ResNet-50 network yields 2048-dimensional features. The AlexNet and ResNet-50 convolutional features are concatenated, giving a 2176-dimensional convolution feature. For the handwritten Tai Le character database, different networks obtain different prior features because of their different convolutions and parameters.
However, a better-quality prior feature can be obtained by complementing different prior features. Figure 4 shows the structure of the prior feature extraction model for the character convolution of some handwritten Tai Le characters. The convolution layers extract features from the input matrix through convolution operations on the input nodes, and stacking multiple convolution layers yields higher-dimensional, more abstract image features. The fully connected layers take the output feature maps of the convolution and max-pooling layers as inputs. To prevent overfitting during pretraining, dropout is also introduced. To integrate deep convolutional features of different styles, the number of neurons in the AlexNet network is adjusted to 128, while the number of neurons in the ResNet-50 network is left at 2,048. Finally, 128-dimensional AlexNet features and 2048-dimensional ResNet-50 features are obtained, and the WDC feature is formed by combining them.
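The wavelet decomposition described above can be illustrated with one level of a 2-D Haar transform in NumPy (a minimal sketch using the averaging form of the Haar filters; the paper does not name the wavelet basis, so Haar is an assumption):

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar wavelet decomposition (averaging variant).

    Returns the approximation (LL) subimage and the horizontal (LH),
    vertical (HL), and diagonal (HH) detail subimages, each half the
    input size. Assumes even height and width.
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages (low-pass)
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences (high-pass)
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

img = np.arange(16, dtype=float).reshape(4, 4)
LL, LH, HL, HH = haar_dwt2(img)
print(LL.shape)  # -> (2, 2)
```

A 2-level (or 3-level) decomposition is obtained by applying `haar_dwt2` again to the LL subimage, which is how the multilevel detail subimages in Fig. 3 arise.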

WDC feature extraction
When training the AlexNet and ResNet-50 networks, the number of epochs is set to 50, and the learning rate is set to 0.0001. After 50 training iterations, the pretrained CNN model is obtained, and the deep convolution feature is further extracted from the wavelet features of the handwritten Tai Le characters. The CNN is effective because it applies convolution and pooling layers at different levels to the handwritten Tai Le characters, while the wavelet decomposition feature provides low-dimensional, high-level input features. If a convolution or pooling layer were removed, the final recognition performance would be affected to some extent. The final softmax layer merely converts the output of the previous layer into probabilities to complete the classification. Therefore, we use the penultimate fully connected layer as the feature extraction layer to obtain the final WDC feature.
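The fusion of the two penultimate-layer outputs into the 2176-dimensional WDC feature is a simple concatenation, sketched below (the random arrays are hypothetical stand-ins; in the paper these activations come from the trained AlexNet and ResNet-50 networks):

```python
import numpy as np

# Hypothetical penultimate-layer activations for a batch of 4 samples.
rng = np.random.default_rng(0)
alexnet_feat = rng.standard_normal((4, 128))   # 128-d AlexNet features
resnet_feat = rng.standard_normal((4, 2048))   # 2048-d ResNet-50 features

# The WDC feature concatenates the two feature vectors per sample.
wdc_feat = np.concatenate([alexnet_feat, resnet_feat], axis=1)
print(wdc_feat.shape)  # -> (4, 2176)
```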
Stacking feature dimensions causes many difficulties during classification and can even cause a dimensional explosion, which seriously affects the recognition results. Therefore, dimensionality reduction is performed on the extracted features. Three dimensionality reduction methods, namely, PCA, LDA, and PCA + LDA, are tested. The specific experimental results are shown in Sect. 6. Based on comparisons of the different dimensionality reduction methods, we use a combination of PCA and LDA. First, PCA is used to reduce the dimensionality of the 2-3-level wavelet deep convolution feature, which reduces noise and mitigates the dimensional disaster. Then, LDA is used to further reduce the dimensionality. Dimensionality reduction not only reduces the difficulty of subsequent classification but also improves recognition accuracy and speeds up training.
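The first (PCA) stage of this pipeline can be sketched with an SVD-based projection (a minimal sketch; the target dimensionality of 64 is an illustrative assumption, and the subsequent supervised LDA stage is not shown):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (samples x features) onto its top principal components.

    This is the unsupervised first stage of the paper's PCA + LDA
    pipeline; LDA would then be applied to the reduced features.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores in reduced space

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2176))             # stand-in for fused WDC features
Z = pca_reduce(X, 64)
print(Z.shape)  # -> (100, 64)
```

Running PCA first also serves the noise-reduction role mentioned above, since the discarded low-variance directions carry mostly noise.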

DVSGP
The DGP was first proposed by Damianou and Lawrence (Damianou and Lawrence 2013). The DGP is a deep belief network based on Gaussian process mappings; it retains the probabilistic properties of the Gaussian process while overcoming its limitations. Considering the hierarchical process, the algorithmic complexity of the DGP is comparable to the training complexity of one layer of a convolutional neural network, so it can be extended to large datasets while reducing training time (Hebbal et al. 2018). The introduced inducing variables decompose the model into the form required to apply the variational inference principle. This allows variational inference on very large datasets, greatly increasing the size of the datasets to which Gaussian processes can be applied, so the method can be used for handwriting recognition (Shutin et al. 2011). The deep variational sparse Gaussian process (DVSGP) model is set up as follows. Given data input $X = \{x_n\}_{n=1}^{N}$ and output $Y = \{y_n\}_{n=1}^{N}$, suppose that a latent function is drawn from a Gaussian process with zero mean, covariance function $k(x, x')$, and hyperparameters $\theta$. The latent vector $f = \{f(x_n)\}_{n=1}^{N}$ contains the function values at the observation points and has the conditional distribution $p(f \mid X, \theta) = \mathcal{N}(f \mid 0, K_{ff})$, where $K_{ff}$ is the matrix of covariance function values over all pairs of points in $X$. The likelihood of the data depends on the latent function values through $p(y \mid f)$. Predicting at a test point $x_* \in X_*$ requires integrating over the posterior function and parameters. When the number of input points $N$ is large, computing the inverse of the covariance matrix for the log-likelihood and predictions requires many operations. To reduce this computation, a variational inducing-point framework is used.
An additional set of inducing inputs $Z$ is introduced, and the function's responses at these points are collected into the vector $u$. Under the variational posterior $q(u, \theta)$, the prediction at a new point after adding the inducing points is obtained by marginalizing $u$. The computational complexity of the Gaussian process model after inducing-point processing is reduced from $O(N^3)$ to $O(LNM^2)$, where $M \ll N$ is the number of inducing points and $L$ is the number of layers, and this variational inference framework trains the model by minimizing the Kullback-Leibler divergence between the approximate and true posteriors. Figure 5 shows a schematic diagram of the DVSGP model. The DVSGP also uses the variational sparse framework. To apply the variational inference principle to the Gaussian process, a set of global variables is needed (Hensman et al. 2015). First, an inducing variable $u$ is introduced. Then, a variational distribution $q(u)$ is defined, and a lower bound on $\log p(y \mid X)$ is obtained:
$$\log p(y \mid X) \ge \left\langle \mathcal{L}_1 + \log p(u) - \log q(u) \right\rangle_{q(u)} \triangleq \mathcal{L}_3.$$
According to Eq. (4), the optimal distribution is Gaussian; writing its parameters as $q(u) = \mathcal{N}(u \mid m, S)$, $\mathcal{L}_3$ becomes
$$\mathcal{L}_3 = \sum_{i=1}^{N} \left\{ \log \mathcal{N}\left(y_i \mid k_i^{\top} K_{mm}^{-1} m, \beta^{-1}\right) - \tfrac{1}{2} \beta \tilde{k}_{ii} - \tfrac{1}{2} \operatorname{tr}(S \Lambda_i) \right\} - \operatorname{KL}\left(q(u) \,\|\, p(u)\right),$$
where $k_i$ is the $i$th column vector of $K_{mn}$ and $\Lambda_i = \beta K_{mm}^{-1} k_i k_i^{\top} K_{mm}^{-1}$. The gradient of $\mathcal{L}_3$ with respect to the parameters of $q(u)$ is then computed and set to zero to find the optimal solution; the best global variables are derived from these formulas.
In the Gaussian process, the choice of kernel function is very important. Commonly used kernels include the radial basis function (RBF) kernel, the linear kernel, and the polynomial kernel. In this paper, the RBF kernel is selected as the kernel function of the DVSGP because it maps samples into a higher-dimensional space and can handle nonlinear relationships between the class labels (Mesquita et al. 2020). In addition, the RBF kernel has few parameters, and its numerical computation is not difficult. Therefore, the covariance function of the Gaussian process is expressed as
$$k(x, x') = \sigma^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right),$$
where $\ell$ is the hyperparameter that defines the characteristic length scale of the similarity between samples. Finally, according to the principle of variational inference, the evidence lower bound (ELBO) is maximized. In the multilayer setting of the DVSGP, a similar-form approximation is used, where $F_l$ is the output matrix of the $l$th Gaussian process and the input matrix of the $(l+1)$th Gaussian process, and the ELBO is expressed accordingly. In this paper, deep Gaussian models with 2-8 hidden layers are established based on the DVSGP. The width of the hidden layers decreases gradually from the input nodes to the output nodes, so each mapping completes a new feature extraction. To obtain more advanced and compact features, softmax is selected as the output function.
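The RBF covariance above can be sketched directly in NumPy (a minimal sketch; the unit variance and length scale are illustrative defaults):

```python
import numpy as np

def rbf_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """RBF covariance: k(x, x') = variance * exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-sq_dists / (2.0 * lengthscale ** 2))

X = np.array([[0.0], [1.0]])
K = rbf_kernel(X, X)
# The diagonal equals the variance; off-diagonal entries decay with
# squared distance, here exp(-0.5) for points one length scale apart.
print(K)
```

Shortening `lengthscale` makes the kernel treat samples as similar only when they are very close, which controls how finely the DVSGP distinguishes between character classes.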

Ensemble deep variationally sparse Gaussian processes
A single DVSGP can achieve fairly good results on handwriting recognition data, but to further improve recognition, the voting method is used to integrate DVSGP models. The voting method is a simple and convenient ensemble method: it aggregates the predictions of multiple base classifiers and takes the category with the most votes as the ensemble's prediction. Figure 6 shows the EDVSGP framework. The base classification models are the DVSGP models constructed above with 2-8 hidden layers. First, the 2-level and 3-level wavelet decomposition features are extracted from the original images and combined, and deep convolution features are then extracted from the wavelet decomposition features. The high-dimensional WDC features obtained after fusion place a significant burden on the DVSGP classifier and strongly affect the final results, so dimensionality reduction is performed after the convolution features are extracted. Finally, the reduced feature data are fed into the EDVSGP classifier for final classification.
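The voting step can be sketched with the standard library (the per-model predictions below are hypothetical; in the paper they come from the trained DVSGP base classifiers):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions by plurality voting.

    predictions: list of equal-length lists, one per base classifier,
    where predictions[m][i] is the class model m predicts for sample i.
    """
    n_samples = len(predictions[0])
    out = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        out.append(votes.most_common(1)[0][0])  # most-voted class wins
    return out

# Three hypothetical base classifiers voting on four samples.
preds = [[10, 12, 3, 5],
         [10, 12, 3, 7],
         [12, 12, 4, 7]]
print(majority_vote(preds))  # -> [10, 12, 3, 7]
```

The benefit comes from diversity: as long as the base models err on different samples, the plurality vote corrects many individual mistakes.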
Experience and experiments guide the selection of base models. First, the DVSGP models with 2-9 hidden layers constructed above are taken as candidate base models and tested on the handwritten Tai Le character data. The specific experimental data are shown in Sect. 6. Comparing the confusion matrices of the models with different numbers of hidden layers shows that different DVSGP models make different recognition errors. For example, the 2-layer DVSGP model identifies 8% of the ( ) of class 10 as the ( ) of class 12, whereas the 6-layer DVSGP model classifies only 2% of the 10th category into the 12th category. Therefore, models with different numbers of hidden layers can be integrated by voting to complement one another, thereby improving the recognition accuracy.
To determine the number of models to integrate, experiments are conducted on EDVSGPs with different numbers of base models. The above 8 DGP models with different numbers of hidden layers are combined in different numbers; the specific parameters are presented in Sect. 6. When 8 base classifiers are integrated, that is, when 8 sparse variational Gaussian processes participate in the common vote, the accuracy does not increase significantly, but the training time increases because of the deeper structure. Therefore, according to the experimental results, 7 base classifiers are selected to establish the EDVSGP classifier through the voting method.

Datasets
To verify the proposed method, the WDC features and the EDVSGP classification model were evaluated on three handwritten character datasets, including our handwritten Tai Le character dataset SDH2021.4, examples of which are shown in Fig. 7. SDH2021.4 is divided into 35 categories, and each category has 1265 images; thus, there are 44,275 image samples in total, each a 64 × 64 binary image. The handwritten character data of each category are balanced and diversified. The dataset is divided into a training set and a test set at a ratio of 7:3; in each category, the training set and the test set contain 886 and 379 images, respectively. To present our dataset more clearly, we uploaded it to the GitHub platform (https://github.com/melody-lyf/Ensemble-DGP/tree/main/SDH2021.4).
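The per-category 7:3 split can be sketched as follows (a minimal sketch; the shuffling seed and rounding convention are assumptions chosen so the counts match the paper's 886/379 split):

```python
import random

def split_category(samples, seed=0):
    """Shuffle one category's samples and split them 7:3.

    The test share is rounded down, so 1265 images yield 886 train
    and 379 test, as reported for SDH2021.4.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)       # deterministic shuffle
    n_test = len(items) * 3 // 10            # integer 30% share
    return items[n_test:], items[:n_test]    # train, test

train, test = split_category(range(1265))
print(len(train), len(test))  # -> 886 379
```

Splitting within each category, rather than over the pooled dataset, keeps both subsets class-balanced.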

WDC feature implementation details
The WDC feature is a combination of the 2-level and 3-level wavelet decomposition prior features and deep convolution features. To obtain the best convolution features, the fully connected layer immediately before the softmax classification layer of each deep network model is used as the extracted WDC feature, and a single DVSGP-3 model is then used for classification and comparison. In this article, the classic AlexNet, VGG-16, VGG-19, ResNet-18, ResNet-50, and Inception v3 networks are selected as candidate models for extracting the wavelet deep convolution features. Table 1 shows the experimental results of WDC features extracted by the different classic networks from the handwritten Tai Le WD prior features. Among the single-network features, the highest recognition rate is obtained by the WDC feature of the ResNet-18 network, which achieves an accuracy of 92.96%. Because different network structures capture different feature information, the WDC features formed by different networks are further combined into multifeature fusion WDC features. In this paper, the three features with the highest recognition rates (numbers 1, 2, and 3) are combined for experiments. As shown in Table 1, the combined feature numbered 7 obtains the best recognition result. The recognition rate of feature number 10, which combines all three networks, is only 0.08% higher than that of feature number 7, but the feature dimension expands sharply, which is an unnecessary workload. Therefore, the AlexNet and ResNet-50 networks are selected to form the WDC features of the handwritten Tai Le characters for further classification.

EDVSGP implementation details
The EDVSGP model integrates the DVSGP models with 2-8 layers through an absolute majority voting method, as described in Sect. 5. As the base classifier of the EDVSGP, the DVSGP exhibits greater model flexibility and less data dependence. Compared with other ensemble methods, it also reduces the consumption of computing resources during ensembling. In this paper, Adam is adopted to optimize the training process. With the number of iterations set to 30, the PSO optimization algorithm is used to optimize the learning rate; the final optimal learning rate is 0.01. The obtained optimal learning rate is then applied to the DVSGPs with different numbers of hidden layers to determine the optimal number of iterations. Table 2 lists the classification results of DGPs with different numbers of hidden layers on offline handwritten Tai Le characters under different numbers of iterations. The results of each base classifier begin to reach high values after 20 iterations.

Experimental results
To reduce the feature loss caused by dimensionality reduction, PCA and LDA are combined. Table 3 shows the specific recognition results. Compared with the results before dimensionality reduction, the best recognition accuracy is improved by 0.34%, the training time is reduced by 27.36%, and the recognition time for a single text image is reduced by 33.9%. The experiment fully verifies the effectiveness of the EDVSGP model on the WDC feature and the effectiveness of the dimensionality reduction method. To show the effectiveness of the WDC features and the EDVSGP model on the handwritten Tai Le character data more intuitively, a confusion matrix is drawn. As shown in Fig. 8, the WDC feature and the EDVSGP model achieve good results on the handwritten Tai Le character data, with a relatively small overall error ratio; the per-class error rates are all at most 11% (11%, 8%, 7%, 6%, 4%, 3%, 2%, and 1%). Among them, 11% of samples of the 22nd character class ( ) are misrecognized as the 13th class ( ); 8% of the 7th class ( ) are misrecognized as the 12th class ( ); 7% of the 28th class ( ) as the 4th class ( ); 6% of the 26th class ( ) as the 17th class ( ); 4% of the 11th class ( ) as the 7th class ( ); and 3% of the 8th class ( ) as the 28th class ( ). The main cause of recognition errors is the large number of similar characters among the handwritten Tai Le characters. Table 4 lists the specific classification errors. The effectiveness of the WDC feature and the EDVSGP model on the handwritten Tai Le character dataset is thus verified.
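The row-normalized confusion statistics quoted above can be computed as follows (a minimal sketch with toy labels; the real labels come from the EDVSGP predictions on the test set):

```python
from collections import defaultdict

def confusion_fractions(y_true, y_pred):
    """Row-normalized confusion counts.

    cm[t][p] is the fraction of samples of true class t predicted as p,
    i.e., the per-class percentages reported in the paper's Fig. 8.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
        totals[t] += 1
    return {t: {p: c / totals[t] for p, c in row.items()}
            for t, row in counts.items()}

# Toy example: class 22 is confused with class 13 in 1 of 4 samples.
y_true = [22, 22, 22, 22, 13]
y_pred = [22, 22, 22, 13, 13]
cm = confusion_fractions(y_true, y_pred)
print(cm[22][13])  # -> 0.25
```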

Comparative experimental results of different characteristics
To verify the effectiveness of the proposed WDC features, the WDC features of handwritten Tai Le characters and the original images are compared on traditional classifier models, as shown in Fig. 9. The recognition rates of the original handwritten Tai Le images on traditional classifiers are all below 60%, with the overall recognition rate remaining at approximately 50%. The proposed WDC feature achieves relatively good recognition results on the traditional classifiers. The results verify not only the effectiveness of the proposed WDC feature but also the efficiency of the EDVSGP model. To further verify the effectiveness of the proposed WDC features, grid pixel features, direction line element features, the 2-3-level mixed WD features, and the WDC features are extracted and tested on traditional classifiers, namely, the decision tree (DT), SGD, logistic regression (LR), and MLP classifiers, as well as the EDVSGP. The test results are shown in Fig. 10. Compared with the other features, the WDC feature obtains the best recognition results on the traditional classifiers, which also shows the effectiveness of the proposed EDVSGP model on each feature. Therefore, compared with other features and classifiers, the WDC feature and the EDVSGP model are more suitable as the feature and classification models for handwritten Tai Le characters, achieving better recognition results.

Comparison of different ensemble methods
To further verify the influence of the number of DVSGP models in the EDVSGP model on the recognition of handwritten Tai Le characters, several experiments are conducted. Table 5 lists the configurations with different numbers of DVSGP models, and the specific comparative experimental results are shown in Fig. 11. The experimental results further verify the effectiveness of the proposed EDVSGP model on handwritten Tai Le characters.

Comparison with other approaches
To further verify the effectiveness of the WDC feature and the EDVSGP model proposed in this paper, a comparison experiment with deep models is conducted, with the results listed in Table 6. Compared with these deep models, the combination of the proposed WDC feature and the EDVSGP model achieves the best recognition results. To verify the universality of the WDC feature and the EDVSGP model, 100 classes of similar Chinese characters are selected from the HWDB1.1 handwritten Chinese character dataset (Liu et al. 2011) to construct the HCFC2021.4 dataset. Figure 12 displays some of these similar Chinese characters. There are 298 images in each category, for a total of 29,800 images, and each sample image is normalized to a 64 × 64 binary image. The dataset is then divided into a training set and a test set at a ratio of 8:2, with each class balanced across the two sets. We uploaded this similar-Chinese-character dataset to GitHub (https://github.com/melody-lyf/Ensemble-DGP/tree/main/HCFC2021.4). Table 7 shows a comparison experiment between the original images of the constructed HCFC2021.4 and the proposed WDC features on deep models. Compared with the recognition results of the original images on the classic networks, the proposed WDC feature and EDVSGP model achieve the best recognition results.
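The class-balanced 8:2 split described above corresponds to a stratified split. The sketch below reproduces the HCFC2021.4 layout (100 classes × 298 images) with scikit-learn's `train_test_split`; the feature array is a single dummy column rather than the real 64 × 64 (4096-pixel) binary images, to keep the placeholder tiny.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# HCFC2021.4 layout: 100 classes, 298 images per class, 29,800 total.
n_classes, per_class = 100, 298
y = np.repeat(np.arange(n_classes), per_class)
# Real samples are 64x64 binary images (4096 pixels); one dummy
# column stands in for them here.
X = np.zeros((y.size, 1), dtype=np.uint8)

# stratify=y keeps the 8:2 ratio (approximately) within every class,
# i.e., the "balancing each type" step described in the text.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(len(X_tr), len(X_te))  # 23840 5960
```

Since 298 × 0.2 = 59.6 is not an integer, each class contributes 59 or 60 test images, summing to exactly 5960.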

Recognition on Devanagari
This paper verifies the applicability of the proposed 2- and 3-level wavelet deep convolution (WDC) feature and the EDVSGP model on the Devanagari character dataset, which is composed of 92,000 images in 46 categories. The dataset is divided into a training set and a test set at a ratio of 8.5:1.5. From the data in Table 8, the extracted WDC features give the best recognition results under every deep classification model, with a recognition rate of 98.63%, a good experimental result. The single-image test time of the proposed EDVSGP model is also much lower than that of other classical network models, as shown in Table 9: the average single-image recognition time of the classical networks is 3.693 ms, approximately 30 times that of this method. These experiments show the effectiveness of the method.
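The stated 30× ratio implies a per-image time for the proposed method of roughly 0.12 ms, as the quick check below shows (the exact per-image figure is not reported in this passage; this is only the value implied by the ratio).

```python
# Per-image time implied by the reported 3.693 ms classical-network
# average and the stated ~30x ratio (derived, not reported directly).
avg_classical_ms = 3.693
speedup = 30
proposed_ms = avg_classical_ms / speedup
print(round(proposed_ms, 3))  # 0.123
```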

Conclusions
In this paper, a handwritten Tai Le character recognition method is proposed that combines the wavelet deep convolution (WDC) feature extraction method with the ensemble deep variationally sparse Gaussian process (EDVSGP) model. First, WDC features are extracted from the original images of the constructed handwritten Tai Le character dataset; combining traditional wavelet decomposition prior feature extraction with deep convolution feature extraction yields clearer character details and addresses the large number of similar characters in the handwritten Tai Le character set. The extracted WDC features are then reduced in dimensionality with PCA and LDA and fed into the EDVSGP model. The DVSGP is highly adaptable to the data, has fewer training parameters, and has a lower computational cost when producing the final recognition results. As a result, a recognition accuracy of 94.15% was achieved, while the recognition time for a single image was much shorter than that of other network methods, thus reaching our goal. However, our dataset is still small-scale; in the future, we plan to expand the sample library and further improve the model to achieve better recognition performance.
Moreover, the proposed WDC method and EDVSGP model were verified on the HCFC2021.4 and Devanagari datasets. The experimental results showed that the proposed method outperforms other network models in recognizing similar characters. In the future, research on dataset expansion, feature dimensionality reduction and large-category character recognition will be conducted.

Data availability Enquiries about data availability should be directed to the authors.

Declarations
Conflict of interest The authors declare that they have no conflicts of interest.
Human and animal rights This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent All referred studies are highlighted in the literature review.