Incomplete multi-view clustering based on low-rank representation with adaptive graph regularization

Incomplete multi-view clustering has attracted attention due to its ability to deal with clustering problems under incomplete information. However, most existing methods either ignore the local structure of the data or fail to consider the importance of different views. In addition, methods based on mean filling can easily introduce useless information when the data has a large missing rate. To address these issues, this paper proposes an incomplete multi-view clustering algorithm based on graph-regularized low-rank representation that requires no filling method. Specifically, we combine a distance regularization term with non-negativity constraints on the low-rank representation to directly learn, from the raw data, graphs that capture both the global and local structure of the data. Furthermore, we introduce a novel weighted fusion mechanism to learn a consistent representation across all views, which effectively prevents low-quality views from degrading the final fused consensus graph. Experimental results on six incomplete multi-view datasets demonstrate that our proposed method achieves the best performance compared with existing state-of-the-art methods.


Introduction
In today's era of information explosion, the amount of data continues to grow at a massive scale. Among such abundant data, finding the useful information has become a central concern. Clustering (Wang et al. 2020; De Amorim and Hennig 2015; Bickel and Scheffer 2004; Bettoumi et al. 2019), one of the most important and basic tools for multivariate data analysis, has been extensively applied in image processing, recommender systems, bioinformatics, and other research areas. Clustering analysis divides a collection into multiple clusters according to the relationships between data objects, maximizing intra-cluster similarity and inter-cluster dissimilarity (Wong 2015), so as to mine effective information such as the hidden structure inherent in the data. From a machine learning perspective, clustering is an unsupervised learning method that can process data without label information and extract useful information from it. In real life, sample labeling is time-consuming and laborious, so clustering has received extensive attention.
It is worth noting that a large proportion of existing clustering methods were proposed for single-view data, but in practical big-data analysis the data often come from different fields; such data are called multi-view data. Describing the data from a single view alone cannot achieve the expected effect, so multi-view clustering (Zou et al. 2018; Du et al. 2022; Kakade and Foster 2009) has attracted tremendous research interest. In the past few decades, many advanced multi-view clustering algorithms have been proposed, among which graph-based algorithms (Nie et al. 2018; Huang et al. 2022; Wang and Yang 2019; Nie et al. 2017; Xiao et al. 2022) and subspace learning-based algorithms (Xia et al. 2022; Wang et al. 2015; Zhang et al. 2018; Du et al. 2021; Tang et al. 2021) have gradually become the focus of attention due to their excellent performance. Graph-based methods aim to learn a consistent similarity matrix across the multi-view data and then obtain the final clustering result from that matrix via spectral clustering. For example, Lin et al. (2021) use a graph low-pass filter to achieve smooth data representations and then learn high-quality attribute graphs to mine the graph attribute information hidden in multi-view data. To prevent graph clustering performance from being affected by noise in the raw data, Chen et al. (2022) proposed alleviating noise by learning low-rank tensor graphs and applying Tucker decomposition and the l2,1-norm in the model. Although graph learning-based methods have made good progress, researchers found that projecting the original data into another space may yield more discriminative clustering features than the original space, so many subspace learning-based methods have been proposed.
Subspace learning-based algorithms target the different views of multi-view data and aim to learn a consensus representation from multiple subspaces or latent spaces. To this end, Zhang et al. (2017) proposed latent multi-view subspace clustering, which takes full advantage of the complementary information between views. Recent research has focused on high-level information in the partition space: Lv et al. (2021) proposed a subspace clustering method (PFSC) that clusters by fusing multi-view information in the partition space. Both classes of methods, however, are designed for complete multi-view data. In practical applications, device damage or other hardware problems can cause some instances in certain views of a dataset to be missing, yielding incomplete multi-view data. Because incomplete data lacks part of the information, the traditional multi-view methods mentioned above cannot make full use of the consistency and complementarity of the views. Therefore, to solve the clustering problem on incomplete data, incomplete multi-view clustering has been proposed in recent years.
To the best of our knowledge, the research community has proposed a range of approaches to deal with incomplete multi-view data (Wen et al. 2018; Liu et al. 2020, 2018; Shao et al. 2016; Li et al. 2021). For example, Li et al. (2014) developed partial multi-view clustering (PVC) via matrix factorization, which characterizes the raw data by exploiting the alignment information of samples between views. Along this line, Zhao et al. (2016) introduced a graph embedding method based on PVC and proposed IMG, which connects two views by learning a similarity matrix. However, both PVC and IMG can only process data with two views. To handle more views, Zhou et al. (2019) fill the missing samples with the mean features of the available samples and perform graph learning on the filled data. Liu et al. (2021a) proposed IMSR, which jointly performs data imputation and self-representation learning and is effective across different missing rates. Nevertheless, both methods introduce noisy information through filling, so their clustering results fall short of expectations. Since filling can lead to poor performance, Wen et al. (2020a) instead generate a partition for each view by making better use of the available data and finally fuse the generated partitions into a consensus matrix. Liu et al. (2021b) mined the cross-view relationships between data points by leveraging a consensus graph, avoiding the noise introduced by filling methods.
Although many methods address the incomplete clustering problem, several issues remain. Existing methods have the following limitations: (1) Existing algorithms focus on learning globally consistent representations but ignore local data structures, or exploit data structures efficiently but fail to learn globally consistent representations. (2) Filling methods introduce noise: imputing missing parts poorly not only fails to improve performance but also contaminates the originally complete data. (3) Many methods can only handle two-view incomplete data, not general incomplete multi-view data.
(4) Existing methods treat all views equally and fail to identify the importance of different views. To address these limitations, we propose a novel algorithm that obtains graphs reflecting the internal structure of the data by introducing a distance regularization term and integrating it into a low-rank representation, thereby capturing both the local and global structure of the data. The framework of our method is briefly outlined in Fig. 1. Our algorithm can adaptively learn such graphs from the data, obtaining a globally optimal graph for clustering. We then expand all the graphs via the index matrices, generate the indicator matrix of each view under their guidance, and fuse all the indicator matrices with a self-weighting mechanism to obtain the final consensus indicator matrix.
The main contributions of this work are summarized as follows:
1. To capture the global and local structure of the data simultaneously, a distance regularization term is introduced and combined with the low-rank representation. In this way, incomplete data can be better utilized and both the global and local structure of the data are modeled at the same time, yielding higher-quality graphs and better clustering results.
2. To prevent bad views from degrading the quality of the final fused consensus graph, we propose a novel weighting mechanism that adaptively learns appropriate weights for different views. This helps explore compact representations of incomplete data, reduces the impact of bad views, and further improves clustering performance.
3. Extensive experimental results on six different classes of incomplete multi-view datasets demonstrate that our proposed method significantly outperforms existing methods.
The rest of this article is organized as follows. Section 2 describes work related to the algorithm in this paper. Section 3 presents the IMC-LRAGR model and details its optimization procedure. Section 4 reports the experiments and analyzes the results. Section 5 concludes the paper.

Related work
In this section, we briefly review two research topics related to this paper: graph-based clustering methods and incomplete multi-view clustering methods.

Graph-based clustering methods
This paper briefly reviews two graph-based clustering methods relevant to this work: SSC (Elhamifar and Vidal 2013) and LRR (Liu et al. 2010). SSC learns a graph with adaptive neighborhoods for arbitrary data: it constructs the graph by adaptively and flexibly selecting a small number of samples through sparsity constraints rather than Euclidean distance. LRR imposes a low-rank constraint on the representation matrix and jointly learns a representation graph that accurately reveals the internal subspace structure. SSC builds on the basic idea of self-representation learning: each datum can be represented as a linear combination of the other data. The self-representation model is:

min_Z ||Z||_0  s.t.  X = XZ,  diag(Z) = 0,

where X = [x_1, x_2, ..., x_n] is the original data matrix and Z is the coefficient matrix encoding the subspace structure of the data. Relaxing the above formula with the l1 norm and adding a regularization term yields:

min_Z ||Z||_1 + λ||X − XZ||_F^2  s.t.  diag(Z) = 0,

where Z has a block-diagonal structure and ||·||_1 denotes the l1 norm of the matrix, which ensures the sparsity of Z. However, SSC only considers a sparse representation of each individual datum and cannot capture the global data structure. To obtain the global structure of the original data, researchers turned to an important matrix property: low rank. A low-rank matrix can be represented by fewer basis vectors. Liu et al. (2010) proposed a clustering method based on the low-rank representation (LRR) that exploits this property. The naive optimization function of the low-rank representation can be expressed as:

min_Z rank(Z)  s.t.  X = XZ.

This problem cannot be solved easily because the rank function is non-convex. However, the nuclear norm convexly approximates the rank of a matrix, so existing methods generally use the nuclear norm instead.
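As an illustrative sketch (not the authors' code), the SSC-style sparse self-representation above can be approximated column by column with an l1-regularized regression; the function name and the `alpha` parameter below are our own illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

def ssc_coefficients(X, alpha=0.05):
    """Sparse self-representation: regress each sample x_i on the remaining
    samples, approximating min ||z_i||_1 s.t. x_i ~ X z_i with Z_ii = 0.
    X is d x n with samples as columns; returns the n x n coefficient matrix Z."""
    d, n = X.shape
    Z = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]  # enforce the Z_ii = 0 constraint
        reg = Lasso(alpha=alpha, max_iter=5000)
        reg.fit(X[:, idx], X[:, i])
        Z[idx, i] = reg.coef_
    return Z
```

Each column of Z then gives a sparse set of representative neighbors for one sample, from which a similarity graph can be built.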
There is always noise in real data, so a sparse matrix E is introduced to model the noise, and the optimization function of the low-rank representation becomes:

min_{Z,E} ||Z||_* + λ||E||_{2,1}  s.t.  X = XZ + E,

where ||·||_* is the nuclear norm and E can be constrained by different norms depending on the noise assumption. Its form is similar to SSC, but without the constraint Z_ii = 0. After obtaining the coefficient matrix Z with the LRR or SSC algorithm, the similarity matrix W is constructed as W = (|Z| + |Z^T|)/2, and the spectral clustering method (Shi and Malik 2000) is finally applied to W to obtain the clustering result.
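The final step shared by SSC and LRR can be sketched as follows (our illustration, assuming scikit-learn is available, not the original implementation): symmetrize the learned coefficient matrix into the affinity W = (|Z| + |Z^T|)/2 and run spectral clustering on it:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_representation(Z, n_clusters):
    """Build the affinity W = (|Z| + |Z^T|) / 2 from a coefficient matrix Z
    and partition the samples with spectral clustering on W."""
    W = (np.abs(Z) + np.abs(Z).T) / 2
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed',
                            random_state=0)
    return sc.fit_predict(W)
```

With a block-diagonal Z (samples from the same subspace representing each other), the recovered affinity is also block-diagonal and spectral clustering separates the blocks.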

Incomplete multi-view clustering
In recent years, clustering methods based on graph learning and subspace learning have been widely studied among the complete multi-view clustering algorithms. In daily life, however, accidents or failures may occur at any time, so the data often lack some information, which prevents algorithms designed for complete datasets from working properly. To cope with this situation, many clustering methods for incomplete data have been developed in recent years (Liu et al. 2021b; Wang et al. 2021a; Yang et al. 2022; Wang et al. 2021b). These methods fall into two broad categories: incomplete multi-view clustering based on matrix factorization and incomplete multi-view clustering based on graphs.
Matrix factorization-based incomplete multi-view clustering algorithms aim to directly obtain a low-dimensional consistent representation of all views using matrix factorization techniques. PVC uses NMF (Lee and Seung 1999) to learn a latent subspace in which similar instances are assigned to the correct clusters, but it can only handle data with two views. For more than two views, researchers have proposed clustering methods that fill in the missing data and use matrix factorization to directly obtain a consistent representation of the incomplete samples. For example, UEAF (Wen et al. 2019) reconstructs the original data matrix through an error term and then uses the recovered data for consistent representation learning. Zong et al. (2021) used mapped instances and clusters to connect multiple views and, guided by NMF, found the index vectors of all instances, achieving good performance. Although these methods do not require samples to be present in all views, they still lose some information, which affects clustering performance.
Graph-based incomplete multi-view clustering methods can better describe the relationships between data points and explore the original geometric structure of the data more effectively than matrix factorization-based methods. Because they are graph-based, the construction of the graphs greatly affects the clustering performance of the whole algorithm. Since incomplete multi-view clustering suffers from missing original samples, we must first solve how to construct a complete graph containing the information of all samples. To address this issue, Zhao et al. (2016) proposed a clustering algorithm that adaptively learns a robust consensus graph from low-dimensional consistent representations, which does not require filling in missing instances. However, this method only handles the special incomplete situation in which some samples exist in all views and the remaining samples exist in only one view. Building on this, Niu et al. (2021) proposed a method that integrates the processing and clustering of incomplete multi-view data into the same objective function, which can effectively handle the more general incomplete situation.
In view of the problems of the above algorithms, our method, first, does not use a filling method to obtain complete data; instead, it uses the available information in the original data for representation learning, thus reducing the impact of noise introduced by filling. Second, our method can handle data with two or more views, so it applies to more general real-world situations. Finally, our method uses two regularization terms to obtain a globally consistent representation while accounting for local feature structure, and employs a self-weighting mechanism when fusing the indicator matrices of the views, assigning each matrix a different weight to reduce the adverse effects of unreliable views. Experimental comparisons with other state-of-the-art incomplete multi-view clustering methods demonstrate the effectiveness of our algorithm.

Proposed method
In this section, we first give the notational conventions used in this paper, summarized in Table 1. Then we propose the IMC-LRAGR model. Finally, we introduce an iterative alternating optimization method for IMC-LRAGR.

Low-rank representation learning with adaptive graph regularization for incomplete data
As discussed above, filling a missing part with the mean of the corresponding samples may introduce noise, especially when a large fraction of the data is missing. To address this issue, we propose a graph-based approach that directly uses the information of the available samples for representation learning. However, most existing graph-based learning methods cannot capture the inherent local structure of the data and thus cannot fully exploit the relationships among raw data instances. Based on this analysis, we introduce a distance constraint and a non-negativity constraint to guarantee both locality and sparsity.
To handle incomplete multi-view data, we first remove the incomplete instances from each view and define the resulting dataset as X^(v) = [x_1^(v), ..., x_{n_v}^(v)] ∈ R^{d_v×n_v}, where d_v and n_v are the numbers of features and available instances of the v-th view, respectively. To use both the local and global information of the raw samples for graph construction, we introduce a distance constraint into the LRR model and add a non-negativity constraint to avoid undesired solutions, giving the model:

min_{Z^(v)} ||Z^(v)||_* + λ_1 Σ_{i,j} ||x_i^(v) − x_j^(v)||_2^2 z_{ij}^(v)
s.t.  X^(v) = X^(v) Z^(v),  Z^(v) ≥ 0,  Z^(v)T 1 = 1,

where Z^(v) ∈ R^{n_v×n_v} is the representation graph to be learned, and each element z_{ij}^(v) is the coefficient of sample x_j^(v) in the joint representation of sample x_i^(v). After obtaining the representation graph, we can build the affinity matrix of the raw samples and perform spectral clustering on it to learn a low-dimensional representation for clustering:

min_P Tr(P^T L P)  s.t.  P^T P = I,

where L = D − W is the graph Laplacian and D, with D_{ii} = Σ_j w_{ij}, is a diagonal matrix. Since the incomplete instances have been removed, the dimensions of the learned representations Z^(v) are inconsistent across views. To solve this problem, we construct a complete representation graph using the index matrix G^(v), which records the positions of the available instances of view v among all n samples; Eq. (6) can thus easily be converted to operate on the lifted graph G^(v) Z^(v) G^(v)T. In real-world applications, the data may also be corrupted by different degrees of noise, so we introduce a reconstruction error term to model the noise:

min_{Z^(v),E^(v)} ||Z^(v)||_* + λ_1 Σ_{i,j} ||x_i^(v) − x_j^(v)||_2^2 z_{ij}^(v) + λ_2 ||E^(v)||_1
s.t.  X^(v) = X^(v) Z^(v) + E^(v),  Z^(v) ≥ 0,  Z^(v)T 1 = 1,

where E^(v) is the reconstruction error, and λ_1 and λ_2 are penalty parameters. The constraint Z^(v)T 1 = 1 for each view, where 1 is a column vector of all ones, ensures that no sample fails to contribute to the joint representation.
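The role of the index matrix can be sketched in a few lines (our illustration; `present_idx` is a hypothetical list of the positions, among all n samples, at which view v is observed): a view-specific graph Z^(v) learned on the n_v available instances is lifted to a full n x n graph with zeros at the missing positions:

```python
import numpy as np

def index_matrix(present_idx, n):
    """Build the index matrix G in {0,1}^{n x n_v}: column j selects the
    position (in the full sample list) of the j-th observed instance."""
    G = np.zeros((n, len(present_idx)))
    for j, i in enumerate(present_idx):
        G[i, j] = 1.0
    return G

def lift_graph(Z, present_idx, n):
    """Lift a view-specific graph Z (n_v x n_v) learned on the observed
    instances to a full n x n graph: G Z G^T, zero at missing positions."""
    G = index_matrix(present_idx, n)
    return G @ Z @ G.T
```

The lifted graphs of all views then share the same n x n dimensions and can be fused.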

Multi-view consensus representation learning
Multi-view clustering algorithms based on spectral clustering learn a clustering indicator matrix for each view and fuse them into an optimal clustering indicator matrix through the graph-regularization term (the third item in Eq. (6)). This term is equivalent to (1/2) Σ_{i=1}^n Σ_{j=1}^n ||P_{i,:} − P_{j,:}||_2^2 w_{ij}, so the regularization weight of the target clustering indicator matrix P is the sum of the similarity weights across the multiple graphs. However, incomplete data leads to missing similarity graphs, so applying this approach directly to multi-view data in the incomplete case may degrade performance. Therefore, we first calculate the clustering indicator matrix of each view, and then use the indicator matrices of all views to learn the consistent representation:

min_{P*} λ_3 Σ_v ω_v Ω(P^(v), P*)  s.t.  P*^T P* = I,

where λ_3 is a penalty parameter and P* is the consensus representation we ultimately require. Ω(P^(v), P*) is a co-regularization term introduced to measure the consistency between P* and the P^(v) of each view, defined over kernel matrices built from the rows of the indicator matrices. To keep the problem simple to solve, we use the linear kernel, under which minimizing the disagreement is equivalent, up to a constant, to maximizing Tr(P^(v) P^(v)T P* P*^T). Since most existing methods treat all views equally, the importance of different views is ignored. To address this issue, our method uses an adaptively weighted mechanism in which the weight ω_v characterizes the importance of view v. We simply adopt the inverse-distance weighting scheme:

ω_v = 1 / (2 sqrt(Ω(P^(v), P*))).

Therefore, (11) can finally be rewritten with these weights substituted in, and since the constant term can be omitted, our final objective function (14) combines the per-view low-rank representation model with the weighted consensus term. Our proposed method thus needs no data filling and directly uses the available information of the raw data to learn the low-rank matrices Z^(v). Then, based on the block-diagonal structured affinity graph, the weighting mechanism is adopted to learn the shared spectral clustering indicator matrix P* for all views.
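The inverse-distance weighting can be sketched as follows (a minimal illustration under the linear-kernel disagreement described above; the `eps` guard is our own addition to avoid division by zero):

```python
import numpy as np

def view_weight(P_v, P_star, eps=1e-12):
    """Inverse-distance weight for one view: a view whose spectral embedding
    P_v agrees with the consensus P* gets a larger weight. Disagreement is
    the linear-kernel distance ||P_v P_v^T - P* P*^T||_F."""
    d = np.linalg.norm(P_v @ P_v.T - P_star @ P_star.T, 'fro')
    return 1.0 / (2.0 * max(d, eps))
```

In the full algorithm these weights are recomputed each iteration, so unreliable views are gradually down-weighted as the consensus stabilizes.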

Optimization algorithm
It is difficult to compute problem (14) directly, so we adopt the alternating direction method of multipliers (ADMM) to compute a local optimal solution of the objective function. We form the augmented Lagrangian of the objective and introduce auxiliary variables that approximate Z^(v) to make problem (15) separable, then update each variable in turn with the others fixed.

Z^(v)-subproblem: Setting the derivative of the Lagrangian with respect to Z^(v) to zero yields a closed-form update for Z^(v).

J^(v)-subproblem: Fixing the other variables, the subproblem for J^(v) simplifies to a nuclear-norm proximal problem, which we solve with the singular value thresholding (SVT) shrinkage operator Θ: J^(v) is obtained by applying Θ_{1/μ} to the current estimate of Z^(v) plus its scaled Lagrange multiplier.

V^(v)- and ϕ^(v)-subproblem: Fixing the other variables, the distance-regularization part of the objective is handled through the auxiliary variable V^(v), where V^(v)_{i,:} and V^(v)_{j,:} denote the i-th and j-th row vectors of V^(v). Rewriting (21) as an equivalent minimization problem and setting its derivative with respect to V^(v) to zero gives the optimal V^(v) in closed form. We then set V^(v) = max(V^(v), 0) to keep all elements of V^(v) non-negative, and update ϕ^(v)_i accordingly.

E^(v)-subproblem: Fixing the other variables, the subproblem for E^(v) reduces to a sparsity-constrained problem whose closed-form solution is given by the element-wise shrinkage operator ϑ.
P^(v)-subproblem: Fixing the other variables, the subproblem for P^(v) simplifies to problem (27), which we solve by eigenvalue decomposition: P^(v) consists of the k eigenvectors corresponding to the k smallest eigenvalues of the associated view-specific matrix.

P*-subproblem: Fixing the other variables, the subproblem for P* simplifies to problem (28), which we also solve by eigenvalue decomposition: P* consists of the k eigenvectors corresponding to the k smallest eigenvalues of the weighted combination of the view matrices.

Multipliers and μ: The Lagrange multipliers and the penalty parameter μ are updated with the standard ADMM rules at the end of each iteration. We summarize the overall optimization process of Eq. (14) in Algorithm 1.
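The two shrinkage operators and the eigenvector step that dominate these updates can be sketched in NumPy (a generic illustration of SVT, element-wise soft thresholding, and smallest-eigenvector extraction, not the paper's exact code):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding Θ_tau: shrink singular values toward 0,
    the proximal operator of the nuclear norm (J-subproblem)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_shrink(A, tau):
    """Element-wise soft thresholding ϑ_tau, the proximal operator of the
    l1 norm, used for the sparse error term E."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def smallest_eigvecs(L, k):
    """First k eigenvectors (k smallest eigenvalues) of a symmetric matrix,
    as used in the P^(v) and P* subproblems."""
    vals, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return vecs[:, :k]
```

Each operator is applied once per ADMM iteration; svt costs one SVD, which is where the O(n_v^3) term in the complexity analysis comes from.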

Convergence analysis and complexity analysis
The objective function (14) of the proposed model is convex in each of the four variables Z^(v), E^(v), P^(v), and P*. Therefore, the iterative updates of these four variables satisfy the convergence condition of ADMM (Lin et al. 2011), which theoretically ensures that Algorithm 1 converges to a local minimum. The experimental results in Sect. 4 also verify the fast and stable convergence of our algorithm. The main computational cost of the algorithm comes from the eigenvalue decomposition, matrix inversion, and singular value decomposition operations, so most of the time is spent in steps 6, 9 and 10 of Algorithm 1. In each iteration, the complexities of these three steps are O(n_v^3), O(n^3), and O(n^3), respectively. The overall computational complexity of the algorithm over u iterations is about O(u(cn^3 + n^3 + n_v^3)), where n_v is the number of non-missing instances in the v-th view and c is the number of views.

Experiments and analysis of results
In this section, we conduct extensive experiments on 6 benchmark datasets to evaluate the performance of the proposed method. All methods are implemented with the same software and hardware: Windows 10, MATLAB 2021a, and an Intel(R) Core(TM) i7-10875H CPU. The specific parameter settings of our method are listed in Tables 3 and 4.

Comparison algorithms
We validate the performance of the proposed method by comparing IMC-LRAGR with seven state-of-the-art multi-view clustering methods:
1. BSV (Zhao et al. 2016) fills the missing data with the average value of the corresponding view's features, then runs the K-means algorithm on each view individually and reports the best clustering performance.
2. MIC (Shao et al. 2015) utilizes weighted non-negative matrix factorization and semi-non-negative matrix factorization for incomplete multi-view clustering.
3. DAIMC (Hu and Chen 2019) considers instance alignment information and uses L2,1 regularization to force the different basis matrices to be aligned simultaneously.
4. UEAF (Wen et al. 2019) uses graph learning to mine the local structure of the data, and considers both the reconstruction of the hidden information of missing views and the adaptive importance assessment of different views in one framework.
5. IMSCAGL (Wen et al. 2020a) combines graph learning and spectral clustering for multi-view spectral clustering with LRR-based methods.
6. GIMC-FLSD (Wen et al. 2020b) integrates individual representation learning and local structure preservation into one framework, avoiding the introduction of any additional penalty parameters.
7. IMSR (Liu et al. 2021a) jointly performs data imputation and self-representation learning, and uses self-representation subspace learning to solve the incomplete multi-view problem.

Evaluation metrics
For all algorithms, we measure clustering performance with three widely used evaluation metrics: Accuracy (ACC), Normalized Mutual Information (NMI), and Purity. For all three, larger values indicate better clustering performance. The metrics are defined as follows:
1. Accuracy (ACC). ACC compares the predicted labels with the true labels:

ACC = (1/n) Σ_{i=1}^n δ(u_i, map(v_i)),

where v_i is the cluster label of the i-th sample obtained by clustering, u_i is its true class label, map(·) is the best permutation mapping from cluster labels to class labels that ensures the statistics are computed correctly, and δ(c, d) is the indicator function, with δ(c, d) = 1 if c = d and 0 otherwise.
2. Normalized Mutual Information (NMI). NMI is defined as:

NMI = Σ_a Σ_b p(o_a ∩ c_b) log( p(o_a ∩ c_b) / (p(o_a) p(c_b)) ) / sqrt(H(O) H(C)),

where the numerator is the mutual information between the predicted partition O and the true partition C, the denominator is formed from their information entropies, and p(o_a), p(c_b), and p(o_a ∩ c_b) are the probabilities that a sample belongs to predicted cluster o_a, true class c_b, and both, respectively, with H(C) = −Σ_b p(c_b) log p(c_b).
3. Purity. Purity is defined as:

Purity = (1/u) Σ_c max_d u_c^d,

where u is the total number of samples in the clustering division, and u_c^d is the number of instances of the d-th input class assigned to cluster R_c.
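The ACC metric hinges on the best label mapping map(·); a common way to compute it (our sketch, using SciPy's Hungarian solver) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: find the best one-to-one mapping between predicted cluster labels
    and ground-truth classes (the map(.) in the ACC definition) via the
    Hungarian algorithm, then report the fraction of correct assignments."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                      # contingency table
    row, col = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[row, col].sum() / len(y_true)
```

Because cluster labels are arbitrary, a prediction that swaps the two cluster names still scores ACC = 1 under the optimal mapping.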

Datasets setting
In our experiments, we adopt two types of incomplete multi-view datasets to validate our algorithm: (1) The first incomplete case. We set up paired samples (samples present in all views) in the dataset, and for all other samples we removed one view to form incomplete multi-view data. For the 100Leaves, Mfeat, and ORL datasets, we randomly selected 30%, 50%, 70%, and 90% of the samples as paired samples, and each remaining sample had one of its views randomly deleted. (2) In the second incomplete case, we set no paired samples and perform random deletion over the entire dataset, so that any sample may have missing views. On the BBCsport, 3sources, and Webkb datasets, incomplete multi-view datasets are constructed with missing rates of 10%, 30%, and 50%.
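The second (unpaired) incomplete setting can be simulated with a presence mask, sketched below (our own illustration; the constraint that every sample keeps at least one view is an assumption we impose so that no sample vanishes entirely):

```python
import numpy as np

def make_incomplete(n_samples, n_views, missing_rate, rng=None):
    """Generate a binary presence matrix M (n_samples x n_views):
    M[i, v] = 1 if sample i is observed in view v. Missing (sample, view)
    slots are removed in random order until the target rate is reached,
    while every sample keeps at least one view."""
    rng = rng or np.random.default_rng(0)
    M = np.ones((n_samples, n_views), dtype=int)
    n_missing = int(round(missing_rate * n_samples * n_views))
    removed = 0
    for slot in rng.permutation(n_samples * n_views):
        if removed >= n_missing:
            break
        i, v = divmod(int(slot), n_views)
        if M[i].sum() > 1:  # never delete a sample's last view
            M[i, v] = 0
            removed += 1
    return M
```

The column indices of the ones in M[:, v] then play the role of the index matrix G^(v) for view v.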

Performance evaluation
For all compared algorithms, we use their default parameters or iteratively search a given parameter set to find the best parameters for each method, and report their best clustering results. For each dataset, all methods are performed 5 times on randomly formed incomplete cases, and their average results are reported for a fair comparison.

[Fig. 2: Objective function value versus iteration on the Mfeat, BBCsport, and ORL datasets at a 30% missing rate]
[Fig. 3: ACC versus (a) λ1 and λ2 with λ3 fixed, and (b) λ3 with λ1 and λ2 fixed, on the 3sources dataset with a 30% missing rate]
[Fig. 4: ACC versus (a) λ1 and λ2 with λ3 fixed, and (b) λ3 with λ1 and λ2 fixed, on the BBCsport dataset with a 30% missing rate]

The experimental results are shown in Tables 5, 6, 7 and 8, which list the results of the different methods on the two categories of incomplete multi-view datasets. From these experimental data, we can draw the following conclusions: (1) In most cases, our method outperforms IMSCAGL and IMSR on the different evaluation metrics. These two methods are the most closely related to ours, but IMSCAGL ignores the locality of the data and IMSR does not consider the importance of different views. Our method uses a distance regularization term to explore the local structure of the data and differentiates the importance of views by assigning each view its own weight, effectively avoiding the problems of the above methods and improving incomplete multi-view clustering performance. (2) Tables 5, 6, 7 and 8 also show that our method achieves the best results on all six benchmark datasets compared with the other algorithms.
For the 3sources dataset, compared with the second-ranked GIMC-FLSD method, our method improves ACC by about 8.54%, 8.27%, and 11.43% at the three missing rates, respectively, and its NMI is 4.94%, 9.29%, and 6.32% higher than the second-ranked method. These results demonstrate that, by introducing distance constraints and non-negativity constraints, our algorithm can learn graphs with clear cluster structures. Since our method effectively utilizes the global and local information of the data, it better captures the relationships between data points. (3) The compared methods can be roughly divided into two categories: graph learning-based methods (IMSCAGL, IMSR, ours) and matrix factorization-based methods (MIC, UEAF, GIMC-FLSD).
According to the experimental results in Tables 5, 6, 7 and 8, the graph-based methods perform much better than the matrix factorization-based methods in most cases across the datasets, especially on the ORL dataset. The main reason is that most graph-based methods do not use traditional mean filling; instead, they use the available instances to learn the data structure of each view and combine them to guide the common representation learning, which makes better use of the original data and yields better clustering results. (4) From Tables 5, 6, 7 and 8, and across the various performance metrics, BSV performs worst among all algorithms in most cases. This is because BSV does not exploit the complementary information between views and fills in the missing samples of a view with the sample mean, which tends to group all missing samples into one class and results in poor performance. In contrast, our algorithm makes full use of the complementarity between views and avoids filling altogether, greatly reducing the noise that filling introduces. The experimental results therefore show that filling missing instances with the corresponding average instances is not a good choice for solving incomplete multi-view clustering problems, and that better use of the available information in the original data achieves better results. (5) Tables 5, 6, 7 and 8 also show that the clustering performance (ACC, NMI, and PUR) of all methods gradually decreases as the missing rate increases, illustrating that the progressive loss of complementary information between views greatly affects algorithm performance. Finally, the standard deviations in the tables show that, compared with the other methods, our method obtains relatively low standard deviations on all evaluation metrics across the 6 datasets, which demonstrates that our method is robust to datasets with different missing rates.

Convergence analysis based on experiments
As shown in Fig. 2, we analyze the convergence of the proposed algorithm on the Mfeat, BBCsport, and ORL datasets. In each subplot, the x-axis is the number of iterations and the y-axis is the objective function value. The convergence curves show that the objective value of the proposed method drops rapidly and reaches a local optimum within the first few iterations. Fig. 2 thus shows that our method converges well, and the experimental results are consistent with the theoretical analysis in Sect. 3.4.
Experiments show that λ1 and λ2 have a strong influence on the performance of the algorithm, so we first fix λ3 = 0.1 and then search for the most suitable λ1 and λ2. As seen from Figs. 3b and 4b, a stable and good ACC is obtained when the parameters lie in suitable regions, indicating that our method is relatively insensitive to the choice of parameters within a reasonable range.

Visualization
We present visualizations of the best clustering results of all algorithms on the Mfeat dataset. From Fig. 5, we can observe that all of the incomplete multi-view clustering algorithms divide the Mfeat dataset into distinct clusters. However, compared with the other algorithms, the similar samples obtained by our method lie closer together.

Conclusion
In this paper, we propose a novel incomplete multi-view clustering method capable of capturing both the global and local structure of the data. Compared with other existing methods, our algorithm obtains compact and discriminative representations from incomplete data by introducing a distance regularization term into the model and using a weighted fusion mechanism, which improves the quality of the learned graph. Experimental results on two different classes of incomplete multi-view datasets demonstrate that our method outperforms existing methods. Like other traditional representation learning methods, however, the proposed method is computationally expensive on large-scale incomplete multi-view datasets, and its clustering performance improvement there is limited. In the future, we hope to extend the proposed method with deep learning to solve the clustering problem for large-scale data containing many missing samples.