Multi-view support vector machines with sub-view learning

Multi-view learning improves the performance of existing learning tasks by exploiting complementary information between multiple feature sets. In recent research, multi-view learning models using privileged information have been proposed; representative models are PSVM-2V and MCPK. In these models, the views complement each other by acting as privileged information for one another; however, a single view itself also contains privileged information that can guide the classifier, and the existing frameworks do not consider it. In order to use this information to correct multi-view support vector machine classifiers, we propose a framework that generates a series of small-scale views based on information hidden in a single view, extending the original multi-view parallel structure to a hierarchical structure with a sub-view mechanism. In this paper, two sub-view learning structures, SL-PSVM-2V and SL-MCPK, are constructed. The two new models fully exploit the data features within each view and, like their predecessors, follow the principles of consistency and complementarity. We solve the new models with a standard quadratic programming solver. Across 55 groups of classification experiments, the proposed models improve classification accuracy by about 1.91% over the original models. SL-MCPK achieves an average rank of 1.3846 in the accuracy experiments, indicating good classification ability. In addition, computational time statistics and noisy data set experiments demonstrate the effectiveness of the proposed method from multiple perspectives.


Introduction
In recent years, multi-view learning has become an active research direction in machine learning and has been applied to learning problems in different fields, such as image classification (Han et al. 2018; Sun et al. 2019; Zhang et al. 2020), brain network analysis (Ahmed et al. 2017; Appice and Malerba 2016) and treatment research (Chao et al. 2019). The real world is full of multi-view data, so research on multi-view learning is becoming more and more popular.
Multi-view data are either collected directly in the real world or extracted by different feature extraction methods. Studies have shown that the various views are interrelated and complementary. Compared with single-view training or a direct concatenation of different view data, multi-view learning can mine more information.
The classification models studied in multi-view learning are mainly divided into three categories: co-training style algorithms, co-regularization style algorithms, and margin-consistency algorithms. In co-training style algorithms, the machine learning method trains alternately on different views; examples include the multi-view collaborative clustering algorithm of Appice and Malerba (2016) and the co-training approach of Kim et al. (2019) for document classification. Co-regularization style algorithms take the divergence between different views as a new regularization term or constraint in the learning objective function; typical methods are SVM-2K (Farquhar et al. 2005), multi-view LS-TSVMs (Xie 2018), multi-view LSSVMs (Houthuys et al. 2018), multi-view RSVMs (Li et al. 2018), etc.
Margin-consistency algorithms model the margin variables of multiple views to be as close as possible, so that the learning model can exploit the latent consistency of the classification results across views; representative algorithms include MVMED (Li et al. 2018), SMVMED (Sun and Chao 2013) and MED-2C (Chao and Sun 2016).
Most SVM-based multi-view models only consider the consensus principle while ignoring the complementarity principle. The consensus principle is to maximize the consistency between multiple views. The complementarity principle emphasizes that each view contains some knowledge that the other views do not have, and this complementary information is shared between the views to describe the data comprehensively. In summary, the principles of consistency and complementarity are an important basis for multi-view learning.
Learning using privileged information (LUPI) was proposed for accompanying or hidden information in the learning model (Vapnik et al. 2007). Learning with privileged information has been widely used in many tasks, such as text clustering (Sinoara et al. 2014) and image recognition (Guo et al. 2018; Yan et al. 2016). In classification tasks, this information can provide an effective supplementary strategy. Vapnik and Vashist first proposed an SVM-based model under LUPI: SVM+ (Vapnik et al. 2007). In order to reduce computation time, Niu constructed a new LUPI model using an improved L2 norm (Niu and Wu 2012). Lapin assigned different weights to samples with privileged information (Lapin et al. 2014). Versions of privileged information learning have been introduced for support vector machines with different tasks. Xu et al. (2019) proposed an autonomous learning model using privileged information. Liang and Cherkassky (2007) used group information as privileged information: the training data can be grouped according to a feature attribute, and a formal optimization formulation is derived. Che et al. (2021) proposed a twin support vector machine model for privileged information learning based on the LUPI paradigm. Other models have improved the SVM with privileged information to make the model more robust (Li et al. 2021), or improved the tolerance to noisy data by modifying the loss function of the multi-view model (?).
LUPI can be used for information complementarity and information interaction. Under the principles of consensus and complementarity, a series of classification models applying privileged information to multi-view learning have been proposed. The initial model is PSVM-2V (Tang et al. 2018b), followed by a version that handles more than two views: IPSVM-MV (Tang et al. 2018a). PSVM-2V and IPSVM-MV give full play to the complementary information between views, but the consistency constraint introduced through regularization makes solving the models complex and time-consuming in application. In more recent research, Tang proposed the MCPK model (a coupled privileged kernel method for multi-view learning). MCPK uses a coupling term in the original objective to minimize the combination of errors across views, thus ensuring consistency. At the same time, thanks to the coupling term, the complexity of model optimization is greatly reduced (Tang et al. 2019).
PSVM-2V and MCPK use privileged information as complementary constraints between views. However, each view also has its own privileged or structural information, which can guide the classifier to work better. Drawing inspiration from the use of group information as privileged information, we propose the concept of sub-view learning. Sub-views are generated from the original view and trained in the LUPI paradigm, forming multiple sub-view correction planes that correct multi-view learning. Based on existing multi-view learning methods, this paper expands the view framework and proposes a classification model based on sub-view learning.
The following is a brief description of our contributions:

• A new multi-view learning method is proposed using the sub-view mechanism. The new method is easy to implement and can effectively use the hidden features in multi-view data.

• Two new classification methods with the sub-view paradigm, SL-PSVM-2V and SL-MCPK, are constructed from the PSVM-2V and MCPK multi-view models, respectively, and a solution strategy based on quadratic programming is given.

• The new models are compared with multiple multi-view learning methods, their compliance with the consistency and complementarity principles is analyzed, and the transformation method between the models is laid out.

• The effectiveness of the new method and its classification performance on each data set are verified through multiple sets of experiments, the performance of each classification method on noisy data sets is compared, and nonparametric tests are carried out to verify the significance of the differences between the models.
The paper is organized as follows: Section 2 describes the main related work, including learning using privileged information principle, PSVM-2V and MCPK. Section 3 introduces two new classification models: SL-PSVM-2V and SL-MCPK. Section 4 analyzes and compares several algorithms and shows the transformation method. Section 5 shows the experimental results and analysis. Section 6 is the summary of this paper and the prospect of future research.

Related works
In this section, we describe the strategy of learning using privileged information and introduce two multi-view models, PSVM-2V and MCPK, as preliminaries for the subsequent sections.

Learning using privileged information
Privileged information can be used as an additional feature to help a specific classifier work better. Privileged information accompanies the training data and is available only during training, not at test time. It is ubiquitous and useful. Vapnik et al. (2007) developed the earliest LUPI model, SVM+, by incorporating privileged knowledge into the SVM. Subsequent work has provided a comprehensive theoretical explanation and algorithmic description of privileged information learning and demonstrated improved prediction performance. In SVM+, the privileged features are used not only to estimate the slack of the constraints, but also to establish an upper bound on the risk of the decision function. The purpose of LUPI is to use privileged information to learn a model, so as to further constrain the solution in the original space. The SVM model that realizes LUPI at the training stage can be expressed as follows:

$$\begin{aligned} \min_{w,b,w^*,d}\ & \frac{1}{2}\Vert w\Vert^2 + \frac{\gamma}{2}\Vert w^*\Vert^2 + C\sum_{i=1}^{l}\big(\langle w^*,\phi^*(x_i^*)\rangle + d\big) \\ \text{s.t.}\ & y_i\big(\langle w,\phi(x_i)\rangle + b\big) \ge 1 - \big(\langle w^*,\phi^*(x_i^*)\rangle + d\big), \\ & \langle w^*,\phi^*(x_i^*)\rangle + d \ge 0,\quad i = 1,\ldots,l, \end{aligned} \tag{1}$$

where w and w* are the weight vectors, b and d are the bias terms, C balances the loss, and γ balances the weight of privileged information in the privileged space. The optimal parameters w and b are used for classification prediction, where w is the result of correcting the standard classification plane through the correction plane. The final decision function is

$$f(x) = \mathrm{sign}\big(\langle w,\phi(x)\rangle + b\big). \tag{2}$$
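For concreteness, here is a minimal CVX sketch of the SVM+ primal in Eq. (1), assuming linear feature maps; XA (decision features), Xstar (privileged features) and y (labels) are hypothetical variable names, and this is an illustration rather than the authors' implementation:

```matlab
% Minimal CVX sketch of the SVM+ primal in Eq. (1), assuming linear
% feature maps. XA (l x dA): decision features; Xstar (l x dS):
% privileged features; y (l x 1): labels in {-1,+1}. Hypothetical data.
l  = size(XA, 1);  dA = size(XA, 2);  dS = size(Xstar, 2);
C  = 1;  gam = 1;                  % loss and privileged-space trade-offs
cvx_begin quiet
    variables w(dA) b wstar(dS) d
    xi = Xstar * wstar + d;        % correcting function plays the slack role
    minimize( 0.5 * sum_square(w) + 0.5 * gam * sum_square(wstar) ...
              + C * sum(xi) )
    subject to
        y .* (XA * w + b) >= 1 - xi;   % margin corrected by privileged info
        xi >= 0;                        % correcting function is nonnegative
cvx_end
f = @(x) sign(x * w + b);               % decision function of Eq. (2)
```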

PSVM-2V
PSVM-2V introduces the LUPI paradigm into multi-view learning. Its basic idea is that the two views serve as privileged information for each other. The learning structure uses a regularization term to bridge the gap between the two classifiers and correct the classification hyperplane.
Consider a multi-view classification problem with training data $T=\{(x_i^A, x_i^B, y_i)\}_{i=1}^{l}$, where the label $y_i \in \{-1,+1\}$ and the training samples are independently and identically distributed. In the data $x^A$ and $x^B$, a constant 1 is appended to the end of each feature vector so that the classifier can be expressed without an explicit bias term. PSVM-2V is formally built as Eq. (3), where $w_A$ and $w_B$ are the weight vectors of view A and view B, respectively. Motivated by the LUPI paradigm, PSVM-2V limits the nonnegative slack variables $\xi_i^A$ and $\xi_i^B$ of view A and view B through the unknown nonnegative correcting functions determined by view B and view A, respectively. Thus, the complementarity principle is realized.
An ε-insensitive similarity constraint enforces the consistency between the two views and uses the slack variable $\eta_i$ to measure the points that fail to meet the ε-similarity. $C_A$, $C_B$ and C are nonnegative penalty parameters, and γ is a nonnegative trade-off parameter that weighs the two views.
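The ε-insensitive consistency constraint just described takes the SVM-2K form; written out from the surrounding definitions (a sketch, since Eq. (3) itself is not reproduced here):

$$\big|\langle w_A,\phi_A(x_i^A)\rangle - \langle w_B,\phi_B(x_i^B)\rangle\big| \le \varepsilon + \eta_i,\qquad \eta_i \ge 0,\quad i = 1,\ldots,l.$$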

MCPK
MCPK is a simple and effective coupled privileged kernel method for multi-view learning (Tang et al. 2018a). To inherit the advantages of PSVM-2V directly, MCPK takes one view as the main information and the other view as privileged information; this pair of views, namely the main view and the privileged view, shares complementary information. Like PSVM-2V, MCPK models the complementary views by drawing on the idea of LUPI to achieve the complementarity principle. To realize the consensus principle, MCPK introduces a coupling term that builds a bridge between the two views and forces the combination of prediction errors of the two views to be small. Information from both views is thus merged into the model, and a high error variable of a sample in one view can be compensated by the corresponding low error variable in the other view. Formally, the MCPK classification optimization problem is built as Eq. (4), where $w_A$ and $w_B$ are the weight vectors of view A and view B, respectively, and the two views are weighed by the nonnegative trade-off parameter γ. As slack variables, $\xi_i^A$ and $\xi_i^B$ are constrained by the correcting functions determined by the two views. The coupling term $C\sum_{i=1}^{l}\xi_i^A\xi_i^B$ makes the product of the error variables of the two views as small as possible: when the classifiers constructed from the different views are more consistent, the errors from both views are small, resulting in a smaller coupling. Therefore, consistency is fully ensured. C is a nonnegative coupling parameter that controls the influence of the coupling term, and $C_A$ and $C_B$ are nonnegative penalty parameters.

Our proposed method
In this section, we first introduce the concept of sub-view partitioning, then describe in detail two novel models with the sub-view mechanism, SL-PSVM-2V and SL-MCPK, and derive their solution strategies.

Primal problem
In multi-view learning based on LUPI, the views sit in parallel and correct each other as privileged information. However, each view also contains information that helps to achieve classification. Such information may come from the practical meaning of a feature or from the feature distribution of the data; either way, it can be used as privileged information. We use this hidden information to divide the original view data into multiple groups. Since each subset is part of the view, we call it a sub-view. Sub-views can be divided by the privileged information of the original view itself (illustrated in Fig. 1), or by a dividing foundation based on other views (illustrated in Fig. 2). Sub-views can form a classification space different from the main view (the source view of the sub-view), providing more accurate guidance for classification. Sub-views formed by the mutual guidance of different views allow the views to share this privileged information and follow the complementarity principle more strictly. A partitioning sketch is given below.
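As a sketch of the Fig. 1 style of partitioning, suppose the hidden information is a grouping attribute stored in one column of a view; g and XA below are hypothetical names, not quantities from the paper:

```matlab
% Illustrative sketch of sub-view partitioning in the style of Fig. 1,
% assuming the hidden information is a grouping attribute stored in the
% g-th column of view A (both g and XA are hypothetical names).
g    = 1;                               % hypothetical grouping feature
vals = unique(XA(:, g));                % distinct attribute values
TA   = cell(numel(vals), 1);            % index sets T_A of the sub-views
for r = 1:numel(vals)
    TA{r} = find(XA(:, g) == vals(r));  % samples in the r-th sub-view
end
```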
Sub-view learning considers multi-view training with privileged information. The training data are as given above; the sub-view data indexes of view A and view B are stored in $T_A$ and $T_B$, respectively. A sub-view regularization term and constraints on the slack variables are added to transfer the sub-view information, and the kernel technique maps the sub-view data to different feature spaces. As the sub-view learning framework in Fig. 3 shows, the final classification hyperplane is composed not only of information from the two views, but is also corrected, through the LUPI paradigm, by the correction planes formed by the sub-views. Sub-views are used only in the classifier training stage and do not exist in the prediction stage. The two sub-view learning classification models are introduced in Sects. 3.2 and 3.3, respectively.
In order to explain the model clearly, the main symbols and their meanings are given in Table 1:

Table 1 Main notation
$w_A$, $w_B$: weight vectors for views A and B
$w^r_{Asub}$, $w^r_{Bsub}$: weight vectors for the r-th sub-view of view A and view B
$\phi_A(\cdot)$, $\phi_B(\cdot)$: mappings to high-dimensional feature spaces (view A and view B)
$\phi^r_{Asub}(\cdot)$, $\phi^r_{Bsub}(\cdot)$: mappings to high-dimensional feature spaces (r-th sub-view of view A and view B)
$T_A$, $T_B$: the sets of indexes storing the samples contained in the sub-views

Sub-views learning for PSVM-2V
Sub-view learning for PSVM-2V (SL-PSVM-2V) adds sub-view regularization terms under the framework of PSVM-2V and imposes new constraints on the slack variables of the original model using the privileged information learning strategy. The SL-PSVM-2V model is built as Eq. (5). In this model, $\Vert w_A\Vert^2$ and $\Vert w_B\Vert^2$ are the regularization terms of view A and view B, respectively; $\Vert w^r_{Asub}\Vert^2$ denotes the r-th sub-view regularization term of view A, and $\Vert w^r_{Bsub}\Vert^2$ the r-th sub-view regularization term of view B. $\xi^A_i$ and $\xi^B_i$ are nonnegative slack variables. γ is the balance parameter that weighs the two views, while $\gamma_A$ and $\gamma_B$ balance the view A data against its sub-views and the view B data against its sub-views, respectively.
In the constraints, $\phi^r_{Asub}(x^A_i)$ is the mapping transformation of the r-th sub-view data of view A, and $\phi^r_{Bsub}(x^B_i)$ is the mapping transformation of the r-th sub-view data of view B.
The constraints on $\xi^A_i$ and $\xi^B_i$ state that the slack variables are bounded by the correction hyperplanes formed by the sub-views, so the slack variables not only achieve mutual correction between the two views, but also, under sub-view learning, achieve correction of the original view by its sub-views. $C_A$ and $C_B$ are nonnegative penalty parameters.
The SL-PSVM-2V nonnegative slack variable $\eta_i$ controls the gap between the classifiers associated with the two views, ensuring the consistency principle. C is a nonnegative penalty parameter, and ε is an error parameter that allows violation of the consistency constraint. A sketch of the resulting objective structure is given below.
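From the description above, the objective of Eq. (5) combines the view and sub-view regularizers with the three slack penalties; the following is our hedged structural reading, not the exact published form:

$$\min\ \frac{1}{2}\Vert w_A\Vert^2 + \frac{\gamma}{2}\Vert w_B\Vert^2 + \frac{\gamma_A}{2}\sum_{r}\Vert w^r_{Asub}\Vert^2 + \frac{\gamma_B}{2}\sum_{r}\Vert w^r_{Bsub}\Vert^2 + C_A\sum_{i=1}^{l}\xi^A_i + C_B\sum_{i=1}^{l}\xi^B_i + C\sum_{i=1}^{l}\eta_i.$$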
In order to obtain the solution of Eq. (5), we perform dual optimization. The Lagrange function is given in Eq. (6), where the multipliers are nonnegative Lagrange multiplier vectors. According to the KKT (Karush-Kuhn-Tucker) conditions, we obtain the stationarity equations (7)-(17). Substituting Eqs. (7)-(17) into Eq. (6), the objective function of the dual problem can be transformed, and the dual problem can be reorganized as Eq. (18). This is a convex quadratic programming problem, which can be solved by standard quadratic programming methods, yielding the optimal dual parameters and hence the optimal $w^*_A$ and $w^*_B$. After obtaining $w^*_A$ and $w^*_B$, Eq. (21) and Eq. (22) predict the labels of a new sample $(x^A, x^B)$ from view A and view B, respectively, and the final multi-view predictor is constructed as the average prediction of the two views, Eq. (23). Taking a possible absence of views in the test phase into account: if one view is absent, Eq. (21) or Eq. (22) can be used alone to obtain results; otherwise the multi-view classification decision function Eq. (23) is used. For clarity, the process of SL-PSVM-2V is given in Algorithm 1.

Algorithm 1 QP Algorithm for SL-PSVM-2V
Require: Data set.
1: Select kernel functions for view A, view B and their sub-views, and initialize the kernel parameters.
2: Create the quadratic programming problem Eq. (18) and use cross-validation to determine the optimal parameters.
3: Solve the quadratic programming problem Eq. (18) and retain the optimal result.
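Step 3 of Algorithm 1 reduces to a standard QP. Below is a generic CVX template under the assumption that the dual Eq. (18) has been assembled into the usual form $\max_\alpha\ e^\top\alpha - \frac{1}{2}\alpha^\top H\alpha$ with box and linear constraints; H, ub, Aeq and beq are placeholders for the model-specific kernel combination and constraint data, not quantities from the paper:

```matlab
% Generic CVX template for a dual QP of the form assumed above.
% H        : (n x n) PSD matrix built from the kernel matrices (placeholder)
% ub       : upper bounds derived from the penalty parameters (placeholder)
% Aeq, beq : linear equality constraints of the dual (placeholder)
n = size(H, 1);
cvx_begin quiet
    variable a(n)
    maximize( sum(a) - 0.5 * quad_form(a, H) )
    subject to
        0 <= a <= ub;
        Aeq * a == beq;
cvx_end
```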

Sub-views learning for MCPK
Sub-view learning for MCPK (SL-MCPK) is the sub-view learning version of MCPK; the coupling term is retained in the framework to achieve the consensus and complementarity principles. The model is established as Eq. (24). Similar to SL-PSVM-2V, $\Vert w_A\Vert^2$ and $\Vert w_B\Vert^2$ are the regularization terms for view A and view B, $\Vert w^r_{Asub}\Vert^2$ represents the r-th sub-view regularization term of view A, and $\Vert w^r_{Bsub}\Vert^2$ represents the r-th sub-view regularization term of view B.
The parameters $\xi^A = [\xi^A_1, \ldots, \xi^A_l]$ and $\xi^B = [\xi^B_1, \ldots, \xi^B_l]$ are nonnegative slack variables. Under sub-view learning, the slack variables $\xi^A$ and $\xi^B$ not only achieve mutual correction between the two views but also realize the correction of the original view by its sub-views. The coupling term $C\sum_{i=1}^{l}\xi^A_i\xi^B_i$ therefore carries the sub-view information in addition to balancing the errors of the two views.
Parameters $C_A$ and $C_B$ are nonnegative penalty parameters. In the constraints, $\phi^r_{Asub}(x^A_i)$ is the mapping transformation of the r-th sub-view data of view A, and $\phi^r_{Bsub}(x^B_i)$ is the mapping transformation of the r-th sub-view data of view B.
The sub-view constraints state that the correction planes generated by the privileged information of the sub-views constrain the slack variables, thereby correcting the classification hyperplane.
Solving for the optimal $w^*_A$ and $w^*_B$ in the above problem constructs the multi-view classifier. The dual optimization of Eq. (24) is established: the Lagrangian function is given as Eq. (25), where the multipliers are nonnegative Lagrange multiplier vectors. According to the KKT conditions, we obtain Eqs. (26)-(36). Substituting these results into Eq. (25) yields the optimization form Eq. (37). In Eq. (37), the conditions on ξ enter as constraints, so the optimization problem can be transformed into Eq. (38). As in SL-PSVM-2V, convex quadratic programming is used: solving the quadratic programming problem Eq. (38) yields the optimal parameters and hence $w^*_A$ and $w^*_B$. The labels for new samples $(x^A, x^B)$ from view A and view B can then be predicted by the view-wise decision formulas, and the final multi-view prediction is their average. The process of SL-MCPK is given in Algorithm 2.

Algorithm 2 QP Algorithm for SL-MCPK
Require: Data set.
1: Select kernel functions for view A, view B and their sub-views, and initialize the kernel parameters.
2: Create the quadratic programming problem Eq. (38) and use cross-validation to determine the optimal parameters.
3: Solve the quadratic programming problem Eq. (38) and retain the optimal result.
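The view-averaged decision rule used at prediction time can be sketched in a few lines; KtestA/KtestB and alphaA/alphaB are hypothetical names for the test kernel matrices and the recovered dual coefficients, and the exact decision formulas of the paper may differ:

```matlab
% Hedged sketch of the two-view decision rule: the final label is the
% sign of the averaged view-wise decision values.
fA   = KtestA * alphaA;          % view-A decision values (placeholder form)
fB   = KtestB * alphaB;          % view-B decision values (placeholder form)
yhat = sign(0.5 * (fA + fB));    % average predictor over the two views
```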

Model comparison and transformation method
In this section, the classification models SL-PSVM-2V and SL-MCPK with sub-view structure are compared with three related multi-view classification models: PSVM-2V, MCPK and SVM-2K.
As an extension of the multi-view learning structure, the sub-view learning structure transforms the original multi-view model in the objective function and the constraint terms. The expansion of the structure does not affect the original model's compliance with the principles of consistency and complementarity. PSVM-2V integrates the LUPI learning framework and KCCA-style consistency constraints into a unified framework. MCPK can be regarded as an improved and more effective version of PSVM-2V and SVM-2K in multi-view learning; accordingly, SL-MCPK can be considered an evolved version of SL-PSVM-2V. As for the complementarity principle, SVM-2K ignores it; in contrast, PSVM-2V and SL-PSVM-2V implement it by connecting multiple views through privileged information, and MCPK and SL-MCPK implement it through the coupled main-view/privileged-view pairs.

Experiments
In this section, we present the experimental results of the new models. The experiments are carried out on multiple data sets, including 15 data sets constructed from Digits, 40 data sets constructed from Corel, and noisy data sets constructed from Ionosphere to explore noise sensitivity. The experimental environment is a computer with an i7-6500 CPU and 8 GB memory. The programs run in MATLAB 2018b on the Windows 10 operating system. All comparison algorithms are solved with the CVX convex optimization toolbox (Grant and Boyd 2014).
Corel consists of 599 classes with 97-100 images each, representing semantic topics such as elephants, roses, horses, etc. To be exact, category 238 contains 97 samples, categories 342 and 376 contain 99 samples, and the rest contain 100 samples. Each image has eight pre-extracted feature representations that can be treated as eight different views; three of these (Color Structure, Color Layout, Dominant Color) are selected as different views for the experiments (Eidenberger 2004). Ionosphere contains 351 samples (225 positive and 126 negative). A positive example is a radar return showing a certain structure in the ionosphere along the signal path; a negative example does not. Every method performs well on the noise-free Ionosphere data, so these data are suitable for exploring the anti-noise ability of the existing and new methods. Two types of experiments are carried out on the Ionosphere data set. First, Gaussian noise is generated feature-wise: the standard deviation $\sigma_i$ of the i-th feature in the training data is computed, and Gaussian noise scaled into the range $[0, \sigma_i]$, of size (l, 1), is generated for that feature. The noise is randomly added to the sample data as an increment at the proportions [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3], producing seven training sets with different proportions of noise. In addition, we randomly flip the label signs in the same proportions to compare the performance of the different models under label noise. A sketch of this protocol is given below.
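The following MATLAB sketch reproduces the noise protocol described above under our reading of it; Xtrain, ytrain and p are hypothetical names, and the exact scaling used by the authors may differ:

```matlab
% Sketch of the noise protocol described above (hypothetical names).
% Feature noise: per-feature sigma_i, noise added to a proportion p of
% the l training samples; label noise: flip a proportion p of labels.
p     = 0.15;                             % one of the tested proportions
l     = size(Xtrain, 1);
sigma = std(Xtrain, 0, 1);                % sigma_i of each feature
idx   = randperm(l, round(p * l));        % samples receiving noise
Xnoisy = Xtrain;
Xnoisy(idx, :) = Xnoisy(idx, :) + randn(numel(idx), size(Xtrain, 2)) .* sigma;
flip   = randperm(l, round(p * l));       % label-noise variant
ynoisy = ytrain;  ynoisy(flip) = -ynoisy(flip);
```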

SL-PSVM-2V and SL-MCPK are compared with the following benchmark methods.
SVM+_A and SVM+_B: the SVM+ method replaces the slack variables of the standard SVM with the nonnegative correcting function determined by privileged information. View B is used as privileged information in SVM+_A, and view A is used as privileged information in SVM+_B.

SVM-2K:
The SVM-2K method combines KCCA (Kernel Canonical Correlation Analysis) with two SVM models for two-view classification. It is the earliest multi-view SVM model and the basis of this series of classification methods.
PSVM-2V: PSVM-2V model uses privileged information to meet two principles of multi-view learning.
MCPK: the MCPK method uses a coupling term and the LUPI framework for two-view classification.

Standard of comparison
Average accuracy (Acc.) and standard deviation (Std.) over 10 repeated experiments are used to measure the performance of the different methods and to compute average rankings. Furthermore, the receiver operating characteristic (ROC) curve is used to demonstrate the performance improvement of the new method (Beck and Shultz 1986). The average CPU running time of the quadratic programs in CVX is used to compare the computational complexity of the different methods.

Performance on digits
The performance on the Digits data set is shown in Table 2. The sub-view partitioning in these 15 groups of experiments is based on features of the view itself. The sub-view learning versions SL-MCPK and SL-PSVM-2V are more competitive than the original MCPK and PSVM-2V. The SL-MCPK model has the highest average accuracy: over the 15 groups of experiments its average rank is 1.1538, with only one rank of 3, placing it first by a clear margin. SL-PSVM-2V likewise outperforms PSVM-2V in accuracy. The experimental results are shown as a line graph in Fig. 5, which makes the differences between the models easier to compare.

Performance on Corel
On the Corel data set, the division of sub-views is formed by the mutual guidance of the two views. The results are shown in Table 4. The average accuracy of SL-MCPK on this data set is 81.04%, with an average rank of 1.3846, a leading result among the methods. The second best is MCPK, with an average rank of 2.3077, because the coupling structure performs well in these experiments. SVM-2K does not take the complementarity of views into account, and SVM+ treats the views more simplistically, so their classification performance does not achieve the desired results. Relative to the original PSVM-2V and MCPK, the corresponding sub-view versions achieve good performance scores. For ease of comparison, the experimental results are displayed visually in Figs. 6 and 7.

ROC curve and AUC
A good classifier tries to minimize two types of errors: the false-positive rate and the false-negative rate. The two corresponding errors can be traded off by changing the decision threshold, yielding a ROC curve. To comprehensively compare the performance improvement of the sub-view learning models over the original models, we compare the ROC curves of PSVM-2V, SL-PSVM-2V, MCPK and SL-MCPK. The area under the ROC curve (AUC) measures the performance of a classifier: the higher the AUC, the better the classification performance. Fig. 8 shows the results of the first 8 groups of Digits experiments, and Figs. 9 and 10 show the ROC and AUC of the first 8 groups of the Corel multi-view experiments. The model parameters used for prediction are the best results obtained by cross-validation. The figures show that the models with the sub-view learning structure perform better than those without it.
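A short MATLAB sketch of how such ROC curves and AUC values can be computed from real-valued decision scores; ytest and scores are hypothetical names for the test labels and the pre-sign decision values:

```matlab
% Hedged sketch: ROC curve and AUC from real-valued decision scores
% (the outputs before taking the sign), via the Statistics Toolbox.
[fpr, tpr, ~, auc] = perfcurve(ytest, scores, +1);
plot(fpr, tpr);
xlabel('False positive rate'); ylabel('True positive rate');
title(sprintf('AUC = %.3f', auc));
```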

Average computer time
In this section, we report the average training time and average prediction time, solving all models with the CVX solver. The experimental data are constructed from Ionosphere: view A has dimension 54, view B has dimension 25, the training set length is 200, and the test set length is 40. All models use the RBF Gaussian kernel and run in the same solution environment for fairness. According to Fig. 11, since the coupling term replaces the consistency term, MCPK consumes less training time than the other SVM-based multi-view benchmark methods (the SVM+ methods use only a single view's data, so they are not included). MCPK is an effective learning model with low time consumption, and correspondingly SL-MCPK requires less solving time than SL-PSVM-2V. Because of the additional constraints and regularization terms, the sub-view method increases the scale of the quadratic program roughly linearly; nevertheless, the solving time of SL-MCPK is second only to MCPK and remains ahead of SVM-2K and PSVM-2V. In prediction, the sub-view structure is not used in the classification decision stage, so the prediction time is consistent with the other multi-view methods. A timing sketch is given below.
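The timing protocol can be sketched as follows; solve_model_qp is a hypothetical wrapper around the cvx_begin ... cvx_end block of whichever model is being timed, and the repetition count is an assumption:

```matlab
% Hedged sketch of the timing protocol: average wall-clock CVX solve
% time over repeated runs of the same quadratic program.
reps = 10;
t = zeros(reps, 1);
for rep = 1:reps
    tic;
    solve_model_qp();        % hypothetical: builds and solves Eq. (18)/(38)
    t(rep) = toc;
end
fprintf('mean training time: %.3f s\n', mean(t));
```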

Performance on noisy data
The experimental results on the noisy data are shown in Fig. 12. Whether the noise exists in the feature data or in the labels, the sub-view models SL-MCPK and SL-PSVM-2V are less sensitive to noise than the multi-view support vector machine models without the sub-view structure.
In the absence of noise, all models have good performance.
As the noise ratio increases, the accuracy of the sub-view learning versions decreases only slightly. It is worth noting that the PSVM-2V model has better anti-noise ability than the MCPK model; SL-PSVM-2V inherits this character as the sub-view learning version of PSVM-2V, while the sub-view learning strategy of SL-MCPK to some extent compensates for the noise sensitivity of MCPK.

Parametric analysis
The framework based on sub-view learning has more parameters than the plain multi-view models. To study the influence of these parameters, we vary one group of parameters at a time while fixing the others. The influence of the parameters on the SL-PSVM-2V and SL-MCPK models is explored on randomly chosen experimental groups of the Digits and Corel data sets.
From the results displayed in Figs. 13, 14 and 15, we can draw the following conclusions: (1) Model accuracy increases with $C_A$ and $C_B$; too small a value leads to poor classification performance. In addition, setting $C_A = C_B$ consistently achieves the highest average accuracy, which is why we set them equal in our experiments.
(2) $\gamma_A$ and $\gamma_B$, the trade-off parameters governing sub-view participation, achieve better results when $\gamma_A = \gamma_B = 1$. This also shows that the sub-views exert a benign corrective effect on the original view. These conclusions provide useful guidance for parameter selection in the sub-view learning framework. A sketch of the parameter study is given below.
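The parameter study can be sketched as a simple grid sweep; crossval_accuracy is a hypothetical helper returning the cross-validated accuracy for a given setting, and the grid itself is an assumption:

```matlab
% Hedged sketch of the parameter study: vary C_A = C_B over a grid while
% the other parameters stay fixed.
grid = 2.^(-5:2:5);                   % candidate values for C_A = C_B
acc  = zeros(size(grid));
for t = 1:numel(grid)
    acc(t) = crossval_accuracy(grid(t), grid(t));   % hypothetical helper
end
plot(log2(grid), acc);
xlabel('log_2 C_A = log_2 C_B'); ylabel('accuracy');
```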

Nonparametric statistical test
We use the nonparametric Wilcoxon signed-rank test to further study the performance differences between the proposed sub-view learning methods and the other methods. The test ranks the performance differences between two classifiers on each data set, ignoring the signs, and then compares the rank sums of the positive and negative differences (Wilcoxon 1992; Demšar 2006). Let $d_i$ be the difference between the performances (Acc.) of the two classifiers on the i-th of N data sets, $R^+$ the sum of ranks on which the first algorithm outperforms the second, and $R^-$ the sum of ranks for the opposite. Ranks with $d_i = 0$ are split evenly between the two sums.

Let T be the smaller of the sums, $T = \min(R^+, R^-)$. The p-value of the Wilcoxon test is obtained from the z statistic

$$z = \frac{T - \frac{1}{4}N(N+1)}{\sqrt{\frac{1}{24}N(N+1)(2N+1)}}$$

between the methods.
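In MATLAB, the same test is available directly; accA and accB below are hypothetical vectors of the paired per-dataset accuracies of two methods:

```matlab
% Hedged sketch: Wilcoxon signed-rank test on paired per-dataset
% accuracies of two methods (Statistics Toolbox signrank).
[pval, h, stats] = signrank(accA, accB);
fprintf('Wilcoxon p-value: %.4g (reject at 0.05: %d)\n', pval, h);
```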
If the test value is below the significance level α = 0.05, there is a significant difference between the proposed method and the baseline. Table 5 shows that the proposed methods improve significantly over the comparison methods, and of the two sub-view models, SL-MCPK performs better.
In this section, we use the Friedman test for further analysis. The null hypothesis is that all algorithms in the experiment perform the same. The Friedman statistic is calculated by Eq. (45):

$$\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4}\right],$$

where $R_j = \frac{1}{N}\sum_{i=1}^{N} r_i^j$ and $r_i^j$ is the rank of the j-th of k models on the i-th of N data sets. When both k and N are large enough, the test statistic follows a chi-square distribution with k − 1 degrees of freedom.
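The statistic is straightforward to compute from a rank matrix; r below is a hypothetical N × k matrix whose entry r(i,j) is the rank of model j on dataset i:

```matlab
% Sketch of the Friedman statistic of Eq. (45) from a rank matrix r
% (N datasets x k models); r is a hypothetical name.
[N, k] = size(r);
Rj     = mean(r, 1);                                 % average rank R_j
chi2F  = 12 * N / (k * (k + 1)) * (sum(Rj.^2) - k * (k + 1)^2 / 4);
pval   = 1 - chi2cdf(chi2F, k - 1);                  % chi-square, k-1 dof
```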
In the accuracy rankings of the 55 experiments on the Digits and Corel data sets, the Friedman statistic is $\chi^2_F = 107.4220$, and the p-value of the hypothesis test is 2.5804e−22. At α = 0.05, the p-value is far smaller than α, so the null hypothesis is rejected; in other words, the differences between these methods are significant.

Conclusion and future works
This paper presents a multi-view support vector machine strategy called sub-view learning. The method extends the model structure of multi-view support vector machines based on privileged information learning. The sub-views are divided by considering the privileged information within a view, and the models can be solved through dual quadratic programming problems. Through the given model transformation method, the specific models SL-PSVM-2V and SL-MCPK are the sub-view learning versions of PSVM-2V and MCPK, respectively. Experiments on 55 multi-view data sets verify that the new method performs better and can effectively use more comprehensive feature information to guide classifier construction. At the same time, the results on noisy data sets show that the new method has better anti-noise ability. Of course, the sub-view learning mechanism also brings some drawbacks: forming the correction planes through different mappings introduces more parameters than the traditional models, which increases the training burden. In future work, we plan to consider more bases for dividing sub-views and to apply multi-view learning with sub-views to regression tasks.

Funding The authors have not disclosed any funding.
Data availability Enquiries about data availability should be directed to the authors.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Human participants or animals This article does not contain any studies with human participants or animals performed by any of the authors.