Label Propagation Based on Bipartite Graph

Label propagation (LP) is a popular graph-based semi-supervised learning framework. Its effectiveness, however, is limited by the distribution of prior labels: if some classes contain no objects with prior labels, label propagation performs very poorly. To address this issue, we propose a label propagation based on bipartite graph (LPBBG) algorithm. Instead of the label constraints used in traditional label propagation, this approach learns a bipartite graph as exemplar constraints that reflect the relations between objects and exemplars, and uses it to guide the learning process. We provide a method for producing high-quality exemplars from two channels to represent the known classes (where some objects have prior labels) and the missing classes (where no objects have prior labels). Given the generated exemplars, exemplar constraints can be learned by using the relationships in the known classes to evaluate those in the missing classes. Our experimental results show that the LPBBG algorithm outperforms existing LP methods in overcoming the problem of missing labels in some classes.


Introduction
In real-life scenarios, there is a large amount of unlabeled data that requires manual labeling, which is time-consuming and labor-intensive [1]. To address this issue, researchers have turned their attention to Semi-Supervised Learning (SSL), which utilizes both labeled and unlabeled data to improve learning performance with just a few labeled objects [2]. Semi-supervised learning encompasses a variety of methods, including Generative Models [3], Self-Training [4,5], Co-Training [6,7], Transductive Support Vector Machines (TSVM) [8], Pseudo-Label [9], and Graph-Based Methods [10][11][12], among others.
However, existing work does not address the issue of missing labels in some classes of the labeled data, so the data can only be labeled according to the initial labels provided. In the digit recognition problem, for example, labels in the labeled data are sparse and some classes have no labeled data at all. We regard such a label distribution as defective: it causes the LP algorithm never to consider classifying objects into the missing classes, so a large amount of data is necessarily misclassified.
To address the challenge of missing labels in some classes, we propose a label propagation based on bipartite graph algorithm. Unlike traditional label constraints, which rely on a single object to represent a class, our algorithm uses exemplar constraints that reflect the relationships between objects and multiple representatives of each class [32]. We create an exemplar set by selecting objects that represent all classes, with each class being represented by a portion of the exemplar set.
Next, we explain the motivation for our algorithm. In the incomplete supervision setting, our intuition is to add label constraints for the missing classes to overcome the problem. Unfortunately, it is hard to directly obtain reliable label constraints for each missing class. We therefore construct a bipartite graph using exemplar constraints that allow each class to be represented by multiple exemplars. This approach reduces the risk of generating unreliable information for the missing classes. To achieve this, we develop an exemplar generation strategy that produces high-quality exemplars from two channels to represent the known and missing classes. Additionally, we devise a supervisory update strategy that leverages the relationships in the known classes to evaluate those in the missing classes.
This paper is structured as follows: In Sect. 2, we review related work in the field. Section 3 provides a background on the basic concepts and notation of label propagation. The proposed Label Propagation Based on Bipartite Graph (LPBBG) algorithm is introduced in Sect. 4. The experimental results are discussed in Sect. 5. Finally, in Sect. 6, we conclude the paper, and the Acknowledgements section acknowledges the contributions of others.

Related Work
In this section, we introduce current label propagation work from two aspects, i.e., the construction of the similarity matrix and the combination of LP with other methods.
The construction of the similarity matrix is an essential issue in label propagation that directly affects the performance of the algorithm. Many scholars have proposed different similarity matrix construction methods to improve the LP method. For example, Gaussian Field and Harmonic Function (GFHF) [11] and Learning with Local and Global Consistency (LLGC) [12] create a fully connected graph and use the Gaussian kernel function to measure similarity on the graph. Linear Neighbourhood Propagation (LNP) [15] adopts the linear neighborhood assumption, namely that each object can be represented by a linear combination of its neighbors. Sparsity Induced Similarity (SIS) [16] constructs a dictionary and uses the L1 norm to realize an adaptive neighborhood via linear programming. Global Linear Neighborhood Propagation (GLNP) [17] directly initializes a non-negative graph and learns a low-rank representation of it to capture both global and local structures. Dynamic Label Propagation (DLP) [18] exploits both the label and local information to iteratively enhance the graph. Two Phases Weighted Regularized Least Square minimization (TPWRLS) [19] uses a new two-phase scheme for graph construction based on the self-representation of the data. Learning-to-learn and learning-to-teach (TLLT) [13] propagates unlabeled nodes unevenly according to how difficult their reliability and discriminability are to evaluate. In [20], a hierarchical sparse representation method was proposed, which modified the traditional flat dictionary into a tree-like structure to improve dictionary-based sparse representation. Distinct from the homophily-based label propagation of previous studies, label propagation in heterogeneous graphs under the heterophily assumption was proposed for the first time in [33].
In some works, label propagation is combined with other methods to apply LP to a wider range of problems. In [34], Special Label Propagation (SLP) was used to produce soft-label information that enriches Pairwise Constraints (PC) ensembles with more supervised information and thus improves performance. The LP algorithm combined with multi-instance learning [24] can solve the unjustified similarity problem at the example level, where background similarity causes a spurious similarity of global appearance. The combination of the LP algorithm with a graph reduction tool [26] constructs a temporal neighborhood graph from a portion of the stream data, enabling the LP algorithm to process data streams. The authors of [27] integrate label propagation and graph convolutional networks into a unified framework to improve performance and reduce complexity. In [28], transductive label propagation and a deep neural network were integrated into a unified framework that generates pseudo labels and trains the deep neural network, with the learning process iterating between these two steps. The authors of [29][30][31] combine label propagation with multi-view learning to learn a graph that is consistent across different views.
However, the methods mentioned above do not take the label distribution into account and therefore cannot effectively solve the problem of missing labels for some classes in the labeled data.

Preliminary Notions
First, in this section, we define some notation on matrices. For any matrix $M$, we use $[M]_{ij}$ to represent the element in the $i$-th row and $j$-th column. The $i$-th row of the matrix $M$ is denoted by $[M]_{i\cdot}$, and the $j$-th column by $[M]_{\cdot j}$. Let $M(t)$ be the $t$-th iteration of $M$, and $M^{T}$ the transpose of $M$. The remaining symbol definitions are shown in Table 1. We then briefly review the LP algorithm.
Given a data matrix $X$, the weight matrix $W$ reflecting the similarity between the data objects is defined as
$$[W]_{ij} = \exp\left(-\frac{\|[X]_{i\cdot} - [X]_{j\cdot}\|^2}{2\sigma^2}\right) \;\; (i \neq j), \qquad [W]_{ii} = 0, \tag{1}$$
where $\sigma$ is a parameter of the Gaussian kernel function.
Let $Y$ be the initial label matrix. In matrix $Y$, $[Y]_{ij} = 1$ when object $[X]_{i\cdot}$ is labeled by class $\xi_j$ and $[Y]_{ij} = 0$ otherwise. Let $F$ be the membership matrix storing the classification results. The goal of label propagation is to minimize the following to obtain the optimal $F$:
$$\min_{F} \; \operatorname{tr}(F^{T} L F) + \mu \|F - Y\|_F^2, \tag{2}$$
where $\mu > 0$ is a trade-off parameter and $L = I - S$ is the normalized Laplacian matrix.
Here $S = D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$ is the normalized matrix of $W$, where $D$ is the diagonal degree matrix with $[D]_{ii} = \sum_j [W]_{ij}$. To avoid the high computational expense of inverting the matrix in the closed-form solution, an iterative method is often used to approximate it. The iterative formula of the membership matrix $F(t+1)$ is
$$F(t+1) = \alpha S F(t) + (1 - \alpha) Y, \tag{3}$$
where $F(1) = Y$ is the initial state of $F$ and $\alpha$ is a constant parameter.
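To make the iteration concrete, the following minimal NumPy sketch implements Eqs. (1) and (3); the function names and the small constant added before the square root (to avoid division by zero) are our own additions, not part of the original formulation.

import numpy as np

def gaussian_similarity(X, sigma=0.1):
    # Pairwise Gaussian-kernel weights (Eq. 1); self-similarities are zeroed.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def label_propagation(X, Y, sigma=0.1, alpha=0.99, t_max=20):
    # Iterative approximation of the closed-form LP solution (Eq. 3).
    W = gaussian_similarity(X, sigma)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    S = D_inv_sqrt @ W @ D_inv_sqrt   # S = D^{-1/2} W D^{-1/2}
    F = Y.astype(float).copy()        # F(1) = Y
    for _ in range(t_max):
        F = alpha * S @ F + (1 - alpha) * Y
    return F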

Label Propagation Based on Bipartite Graph
First, in this section, we illustrate the effect of missing labels in some classes and present our algorithmic framework. Next, we propose the new LP algorithm based on a bipartite graph to guide the classification process.

Effect of Label Missing
Label missing refers to a defective label distribution in the labeled data, where some classes have no labeled data in the first place. A faulty label distribution causes the LP algorithm to incorrectly classify data that does not belong to the initial labels. We use Fig. 1 and Tables 2 and 3 as examples to illustrate the effect of the label missing distribution.
In this example, there are four class labels, i.e., class 1, class 2, class 3 and class 4 (see Fig. 1). However, only classes 1 and 2 have labeled data, and the remaining classes do not. The ground-truth labels of the objects are given in Table 2, but an LP algorithm that does not take the missing labels into account incorrectly classifies the data belonging to class 3 and class 4 as belonging to class 1 and class 2 (see Table 3). This shortcoming motivated us to propose a new LP algorithm.

Algorithmic Framework
To get rid of the defective label distribution in the labeled data, we propose a new label propagation based on bipartite graph algorithm. In this algorithm, we define a bipartite graph matrix $P \in \mathbb{R}^{n \times r}$ to save the exemplar constraints, which are introduced in Sect. 4.3. In this paper, we would like a class to be represented by more than one exemplar. By generating enough exemplars to represent all known and missing classes, the constraints implicitly contain sufficient relationships between objects and classes. Even if there are no label constraints for the missing classes, once we obtain exemplars representing them, the relationships between objects and these exemplars can be used to reflect the relations between objects and the missing classes. Instead of using prior labels, the proposed algorithm propagates the relationships between objects and exemplars on the similarity matrix to solve the label missing problem and obtain better label features. The proposed algorithm consists of two steps that address two key issues, outlined below.
• How do we generate the key representing objects as exemplars to cover all classes?
• How do we learn the exemplar constraints reasonably?
Figure 2 shows the framework of our algorithm, where $H$ is an $n \times r$ label feature matrix that stores the results of propagating the exemplar constraints. Our algorithm consists of the following steps. First, we generate high-quality exemplars from two channels, one for the known classes and another for the missing classes. Second, we iteratively update the bipartite matrix $P$ by using the relationships in the known classes to evaluate those in the missing classes. Finally, we obtain the $n \times c$ soft label matrix $F$ by running the k-means algorithm on the optimal label feature matrix $H$. A high-level sketch of this pipeline follows.
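The following Python skeleton summarizes these three steps; every helper name (generate_exemplars, initialize_constraints, propagate_constraints, update_constraints) is a hypothetical placeholder for the procedures detailed in Sects. 4.4 and 4.5, not part of any released code.

from sklearn.cluster import KMeans

def lpbbg(X, Y, n_classes, r, q, t_max=20, o_max=5):
    # Step 1: two-channel exemplar production (Sect. 4.4).
    R = generate_exemplars(X, Y, r, q)
    P = initialize_constraints(X, R)             # self-relations, Eq. (8)
    # Step 2: alternately propagate and re-select exemplar constraints.
    for _ in range(o_max):
        H = propagate_constraints(X, P, t_max)   # label features H
        P = update_constraints(H, P)             # threshold by Eq. (11)
    # Step 3: cluster the optimal label features (hard assignments here;
    # the paper derives a soft membership matrix F).
    return KMeans(n_clusters=n_classes, n_init=10).fit_predict(H)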

Exemplar Constraints
In our previous paper [32], we introduced exemplar constraints, which for the first time use multiple objects as exemplars to represent a class. To overcome the label missing problem, we extend the exemplar constraints to represent not only the known classes but also the missing classes with selected exemplars. Compared to using label constraints, this approach has a lower risk of producing incorrect supervision information in incomplete supervision scenarios. Figure 3 illustrates the difference between label constraints (left) and exemplar constraints (right).
We select $r$ objects to construct the exemplar set $R$ reflecting all classes and form a bipartite graph whose vertex sets are $X$ and $R$. The relationship between $X$ and $R$ reflects the similarity between all objects and the selected exemplars, and we define an $n \times r$ matrix $P$ to save these relationships. In $P$, the entry $[P]_{ij}$ represents the similarity between $[X]_{i\cdot}$ and $R_j$, with higher values indicating greater similarity.

Production of Exemplars
We present a production strategy for generating exemplars in the incomplete supervision scenario. Our strategy considers two channels for producing exemplars: the known classes and the missing classes.

For the known classes, since there are objects with labels, we only need to select labeled data as exemplars for each class. However, in real applications, we found that an imbalanced label distribution can also seriously affect the performance of label propagation. Thus, we randomly choose the same number of exemplars for each class to avoid this negative effect, and we set this number $q$ to the smallest number of labeled objects in any of the known classes, so that no class lacks enough labels. In our experimental analysis, we also show that the performance of the algorithm is not reduced when $q$ is set to a small value. In this paper, we use $R_v$ to represent the exemplar set of the known classes, defined as
$$R_v = \bigcup_{l=1}^{v} \{\, q \text{ randomly selected labeled objects of class } \xi_l \,\}, \tag{4}$$
where $v$ is the number of known classes.

For the missing classes, we wish to select some exemplars representing the missing classes. If we sample these exemplars evenly enough from the dataset, we can obtain an exemplar set covering different classes. The number of exemplars in different classes may not be equal, and some may belong to the known classes, but each missing class can still be represented by some selected exemplars. We do not need to determine which exemplars represent the same class. Therefore, the exemplar constraints $P$ can also be seen as a representation of higher-dimensional label features.
While it is difficult to accurately separate the data representing the missing classes from the entire dataset, we can narrow the selection range by judging the propagation score, which increases the likelihood of selecting data that represent the missing classes. A propagation score vector indicates that an object has a higher score on its true class and a lower score on the other classes. In the example of Fig. 1, the membership vector of an object that truly belongs to class 3 has low scores on all the known classes. Following the analysis above, we weakly assume that an object may belong to a missing class if its propagation score on its predicted class is less than the average propagation score of all the data, since we cannot guarantee that objects with low propagation scores must belong to the missing classes. We use $X_m$ to represent the candidate dataset of the missing classes, defined as
$$X_m = \left\{ [X]_{i\cdot} \,\middle|\, \max_{j} \, [F(t)]_{ij} < \varepsilon_0 \right\}, \tag{5}$$
where $F(t) \in \mathbb{R}^{n \times v}$ is the first propagation result calculated by Eq. (3) and $\varepsilon_0$ is the threshold of the propagation score used to judge the data, taken as the average of the predicted-class scores over all objects. Although this step narrows the selection range, we still need to generate key exemplars to ensure that the exemplar set covers all the missing classes. Thus, we apply k-means clustering on $X_m$ to obtain $r - qv$ cluster centers, and we select the objects closest to these centers as exemplars. This process helps us select exemplars that represent different missing classes. We use $R_m$ to represent the exemplar set for the missing classes, and it is defined as follows.
$$R_m = \mathrm{kmeans}(X_m, \; r - qv). \tag{6}$$
The set of total exemplars $R$ can then be obtained by combining the two sets of exemplars generated from the two channels:
$$R = R_v \cup R_m. \tag{7}$$
Furthermore, the exemplar constraints can easily be initialized without error, making no use of any supervised information. In the matrix $P(1)$, which represents the initial state of $P$, it is only known that an object and itself are members of the same class. Thus we can use the self-relations of objects to initialize $P$:
$$[P(1)]_{ij} = \begin{cases} 1, & [X]_{i\cdot} = R_j, \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$
where $R_j$ is the $j$-th object in the exemplar set $R$. Since some exemplars in the exemplar set $R$ have labels, we can assume that exemplars with the same label belong to the same class, so their corresponding positions in the matrix $P$ should be 1. We can leverage this potential information to update the matrix $P$.
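As a sketch of the two-channel production, the following Python code draws $q$ labeled objects per known class and completes the set with the objects nearest the $r - qv$ k-means centers of the candidate set $X_m$; the function name, the labeling convention, and the exact tie-breaking are our own choices.

import numpy as np
from sklearn.cluster import KMeans

def generate_exemplars(X, labels, F, q, r, seed=None):
    # labels: length-n array with -1 marking unlabeled objects.
    rng = np.random.default_rng(seed)
    known = np.unique(labels[labels >= 0])
    v = len(known)
    # Channel 1 (R_v): q randomly chosen labeled objects per known class.
    R_v = np.concatenate([
        rng.choice(np.where(labels == c)[0], size=q, replace=False)
        for c in known
    ])
    # Channel 2: candidates whose predicted-class score falls below the
    # average predicted-class score (Eq. 5).
    top = F.max(axis=1)
    X_m = np.where(top < top.mean())[0]
    # k-means on the candidates; the objects closest to the r - qv
    # centers become the missing-class exemplars (R_m).
    km = KMeans(n_clusters=r - q * v, n_init=10).fit(X[X_m])
    d = ((X[X_m][:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(-1)
    R_m = X_m[d.argmin(axis=0)]
    return np.concatenate([R_v, R_m])   # indices of the exemplar set R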

Learning Exemplar Constraints
After obtaining the initial matrix $P$ described in the previous section, we propagate the exemplar constraints to obtain the label feature matrix. However, relying solely on self-supervision information is insufficient, and since all label constraints are discarded, it is difficult to determine the reliability of the relationships in the label feature matrix. Fortunately, there are a few potential constraints in the exemplar constraints $P$ that correspond to the known classes. Since different classes may have varying propagation performance, we suggest using the average propagation score over the known classes as a threshold to evaluate the constraints in the missing classes. We then construct a framework that iteratively learns the exemplar constraints $P$ by using the relationships in the known classes to evaluate those in the missing classes.
We illustrate the process of selecting reliable constraints for the missing classes using Fig. 4. The first $r$ objects above the $P$ matrix represent the exemplar data. Green squares represent the initial self-supervision information. The square matrix $I_{\xi_i}$ represents the membership among the exemplars of class $\xi_i$, and all values (green and blue squares) of this matrix should be 1 since these exemplars belong to the same class. We take the mean score over these blocks as the evaluation threshold. Similar to the label propagation algorithm [12], the updating formula of the label feature matrix $H$ is defined as
$$H(t+1) = \alpha S H(t) + (1 - \alpha) P, \tag{10}$$
where $H(1) = P$ is the initial state of $H$.
After we obtain the optimal label feature $H$, the threshold $\varepsilon$ can be calculated by the following equation,
$$\varepsilon = \frac{1}{v} \sum_{l=1}^{v} \frac{1}{q^2} \sum_{[H]_{ij} \in I_{\xi_l}} [H]_{ij}, \tag{11}$$
where $I_{\xi_l} \in \mathbb{R}^{q \times q}$ is the square matrix of the exemplars labeled by $\xi_l$ (see Fig. 4a). Depending on the threshold $\varepsilon$, we select the reliable relationships to learn the optimal bipartite matrix $P$: the entries of $P$ whose corresponding scores in $H$ exceed $\varepsilon$ are set to 1, while the remaining entries keep their initial values. We iteratively update $H$ and $P$ according to the above updating formulas until the optimal solution is obtained. In the final step, we obtain the $n \times c$ membership matrix $F$ by running the k-means algorithm on $H$.
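The alternating update can be sketched as follows; the mapping arrays, the function name, and the else-branch of the thresholding step (falling back to the initial self-relations) are our assumptions about details not fully specified above.

import numpy as np

def learn_constraints(S, P1, ex_rows, ex_labels, q, v,
                      alpha=0.99, t_max=20, o_max=5):
    # S: n x n normalized similarity; P1: n x r initial constraints.
    # ex_rows[j]: object index of exemplar j; ex_labels[j]: its class
    # (0..v-1 for known-class exemplars, -1 otherwise).
    P = P1.astype(float)
    for _ in range(o_max):
        H = P.copy()                           # H(1) = P
        for _ in range(t_max):                 # Eq. (10)
            H = alpha * S @ H + (1 - alpha) * P
        # Eq. (11): mean score inside the q x q known-class blocks.
        eps = np.mean([
            H[np.ix_(ex_rows[ex_labels == l],
                     np.where(ex_labels == l)[0])].mean()
            for l in range(v)
        ])
        # Keep relationships whose score exceeds the threshold.
        P = np.where(H >= eps, 1.0, P1.astype(float))
    return H, P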

Description of Algorithm
Algorithm 1 shows the overall process of our method. Matrix multiplication is the major time cost, and the numbers of label propagation iterations and overall alternating iterations are $t$ and $\tilde{o}$, respectively. As a result, the total complexity is $O(n^2 t \tilde{o})$, where $t$ and $\tilde{o}$ are generally set to very small values.

Experimental Setup
Dataset. The experiments are performed on six benchmark datasets, which are introduced in Table 4. PenDigits [35], Digits [36], USPS [37] and MNIST10k [38] are handwritten digit datasets. The USPS dataset contains 1100 samples for each of the digit classes 0 to 9, and all images have been normalized to 16 × 16 grey-scale images. Each image in the MNIST10k dataset consists of 28 × 28 pixels, with 1100 images in each class and 11000 images in total. In the COIL-20 [39] dataset, the size of each image is uniformly processed to 128 × 128, with 72 images in each class, for a total of 1440. The Digits dataset contains 5620 samples and 10 classes, and the size of each image is uniformly processed to 8 × 8. The Statlog (Landsat Satellite) dataset [40] was generated from data purchased from NASA by the Australian Centre for Remote Sensing, and it consists of 6 classes and 6435 images.
Baselines. We compare the Label Propagation Based on Bipartite Graph (LPBBG) algorithm with five classical LP algorithms: the Learning with Local and Global Consistency (LLGC) method [12], the Linear Neighbourhood Propagation (LNP) method [15], the Sparsity Induced Similarity (SIS) method [16], the Gaussian Field and Harmonic Function (GFHF) method [14], and the Dynamic Label Propagation (DLP) method [18]. We then introduce the setting of each algorithm. The number of propagation iterations $t$ is set to no more than 20 for all the algorithms considered. For LPBBG, the parameter $\sigma$ is set to 0.1 and $\alpha$ to 0.99; the number of alternating update iterations $\tilde{o}$ takes the value 5; the number of neighbors $k$ is set to 20; and the number $q$ of exemplars per known class is set to 10. For LLGC and GFHF, after normalizing the feature matrix, the parameter $\sigma$ is set to 0.1 and $\alpha$ to 0.99. For LNP, the number of k-nearest neighbors is set to 10 and the parameter $\alpha$ is set to 0.99 for all datasets. For the DLP method, the number of neighbors $k$ is set to 10, the parameter $\lambda$ is set to 0.1, and $\alpha$ is set to 0.05. For the SIS method, $\alpha$ is set to 0.99.
Evaluation procedure. All experiments were performed on the complete datasets. At the start of each experimental round, a set of known classes was randomly selected, and labeled data samples were randomly chosen from these classes to build the tested dataset. All algorithms were run on the same tested dataset in each round. Each algorithm was run for 20 rounds on each tested dataset, and the average and standard deviation of three indices were used to measure performance.

Evaluating Indices
To measure the effectiveness of the algorithm, we use three indices: the Accuracy measure (ACC) [41], the Adjusted Rand Index (ARI) [42] and the Normalized Mutual Information (NMI) [43]. ACC is defined as the proportion of correctly classified examples among all examples participating in the classification. The Rand Index (RI) assesses clustering results by computing the similarity between two clusterings, and ARI is a refinement of the Rand Index based on probability regularization. The similarity of two clustering results can also be measured using NMI. We use these different indices to evaluate the performance of the proposed algorithm comprehensively.
Given a dataset with $N$ objects, there are two partitions of these objects, i.e., $\Theta = \{\theta_1, \theta_2, \ldots, \theta_c\}$ (the classification results) and $\Phi = \{\varphi_1, \varphi_2, \ldots, \varphi_c\}$ (the ground-truth labels). Let $n_{ij} = |\theta_i \cap \varphi_j|$ be the number of common objects of groups $\theta_i$ and $\varphi_j$, $b_i = \sum_{j=1}^{c} n_{ij}$ and $d_j = \sum_{i=1}^{c} n_{ij}$. The accuracy measure is defined as
$$ACC = \frac{\sum_{i=1}^{N} \delta(\theta_i, \mathrm{map}(\varphi_i))}{N},$$
where $\theta_i$ denotes the predicted group of the $i$-th object, $\varphi_i$ its ground-truth group, and $\mathrm{map}(\cdot)$ the optimal correspondence between the two partitions; $\delta = 1$ when $\theta_i = \mathrm{map}(\varphi_i)$ and $\delta = 0$ otherwise.
The adjusted rand index is defined as
$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{b_i}{2} \sum_j \binom{d_j}{2}\right] / \binom{N}{2}}{\frac{1}{2}\left[\sum_i \binom{b_i}{2} + \sum_j \binom{d_j}{2}\right] - \left[\sum_i \binom{b_i}{2} \sum_j \binom{d_j}{2}\right] / \binom{N}{2}}.$$
The normalized mutual information is defined as
$$NMI = \frac{\sum_{i,j} n_{ij} \log \frac{N n_{ij}}{b_i d_j}}{\sqrt{\left(\sum_i b_i \log \frac{b_i}{N}\right)\left(\sum_j d_j \log \frac{d_j}{N}\right)}}.$$
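In practice, ARI and NMI are available directly in scikit-learn, while ACC requires the optimal $\mathrm{map}(\cdot)$, which is commonly computed with the Hungarian algorithm; the sketch below assumes labels encoded as integers $0, \ldots, k-1$.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def accuracy(y_true, y_pred):
    # Build the confusion counts n_ij, then find the cluster-to-class
    # map(.) that maximizes the number of matched objects.
    k = int(max(y_true.max(), y_pred.max())) + 1
    C = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[p, t] += 1
    rows, cols = linear_sum_assignment(-C)    # maximize matches
    return C[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])         # permuted but perfect
print(accuracy(y_true, y_pred))                       # 1.0
print(adjusted_rand_score(y_true, y_pred))            # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0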

Experimental Results
We evaluated our approach against the five algorithms on six datasets, with a fixed label rate of 5% across all datasets, and compared the performance of these methods under varying numbers of known classes. Figure 5 shows the changes in the mean classification results, and the mean clustering results of these methods are given in Tables 5 and 6, respectively. The line charts in Fig. 5 display the change in classification accuracy across the six datasets. The horizontal axis represents the number of known classes, which is set to different sizes according to the total number of classes in each dataset. Compared to the other LP algorithms, our algorithm achieves excellent classification accuracy on different datasets and for varying numbers of known classes. Even in the extreme scenario where there is only one known class in the initial labeled data, our algorithm produces satisfactory results; with only one kind of label, the previous LP algorithms meaninglessly assign all the data to the same class. In the Statlog dataset, the classes with fewer data cannot have a decisive influence due to the significant imbalance in the number of data per class, and when the number of known classes exceeds four, our algorithm's classification accuracy is lower than that of the previous algorithms.
Tables 5 and 6 present the clustering performance of the algorithms on the tested datasets with different missing label distributions; the best experimental results are annotated in bold. The previous LP algorithms perform unsatisfactorily on all datasets when the number of known classes is inadequate to reveal the data distribution, whereas our algorithm achieves satisfactory results by effectively using the known classes to evaluate the missing classes. The results demonstrate that our proposed algorithm outperforms LLGC, GFHF, LNP, DLP, and SIS on the tested datasets.
Furthermore, from the classification and clustering results, we can see that the performance of our method is stable for varying numbers of known classes.This is because we utilize the average propagation performance of the known classes as the evaluation criteria in our method.As a result, our algorithm provides greater advantages over previous LP algorithms when the number of known classes in the label matrix is smaller.
In our approach, a special parameter $q$ controls the number of exemplars per known class, and we conducted experiments to investigate its impact on the algorithm. In Fig. 6, we present the mean results obtained by running our method 20 times on different datasets with a label rate of 5%. The figure shows that the performance of our approach remains stable as $q$ increases from 5 to 17, indicating its insensitivity to this parameter. However, if the value of $q$ is set too small or too large, the performance of our method may degrade. Therefore, we need to set an appropriate value for $q$ based on the total number of classes in the dataset.

Conclusions
In this paper, we propose a Label Propagation Based on Bipartite Graph (LPBBG) approach to address the issue of datasets that include classes without prior labels. Our algorithm constructs a bipartite graph based on exemplar constraints and designs a two-channel exemplar production strategy to ensure that all classes are covered by the exemplars. We also develop a supervisory update strategy that learns the exemplar constraints by using the known classes to supervise the missing classes. Our experiments evaluate the algorithm's performance from various perspectives, and the results demonstrate that the proposed approach can effectively overcome the label missing problem.
This study primarily focused on label propagation with a low-quality label distribution. In the future, we aim to investigate the impact of noisy labels on label propagation. Additionally, we plan to develop a new, highly robust label propagation algorithm that uses ensemble learning techniques to mitigate the effects of noisy labels.

Fig. 1 Class 1 and Class 2 have the labeled objects

Fig. 4 Illustration of the supervision process

Fig. 5 The changes of classification accuracy with different numbers of known labels

Fig. 6 Sensitivity to parameter q

Algorithm 1 (recoverable steps)
1: Calculate the similarity matrix $W$ by Eq. (1)
2: Construct the normalized Laplacian matrix $S$ by $S = D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$
3: Calculate the first propagation result $F(t)$ and construct the candidate set of the missing classes $X_m$ by Eq. (5)
4: Construct the exemplar set $R$

Table 1 Definition of main symbols

Table 2 The ground-truth labels

Table 3 The classification results of the LP methods

Table 5 Performance comparison with different numbers of visible classes (NMI)

Table 6 Performance comparison with different numbers of visible classes (ARI)