Unsupervised multi-source domain adaptation with graph convolution network and multi-alignment in mixed latent space

This paper proposes an unsupervised multi-source domain adaptation algorithm with graph convolution network and multi-alignment in a mixed latent space, which leverages domain labels, data structure, and category labels in a unified network and improves domain-invariant semantic representation through several innovations. Specifically, a novel data structure alignment is proposed to exploit the inherent properties of different domains, used alongside the existing domain alignment and classification result alignment. Through this design, category consistency is considered in both latent spaces, and the domain and structure discrepancies between the different source domains and the target domain are reduced. Moreover, we also use category alignment based on both CNN and GCN features to optimize the category decision boundary. Experimental results show that the proposed method brings substantial improvement, especially for adaptation tasks with large shifts in data distribution.


Introduction
Deep learning algorithms assume that the features of the training set and test set obey the same distribution, which means models trained on large-scale annotated training data cannot be directly reused on other domains. Domain adaptation addresses this problem by establishing knowledge transfer from a labeled source domain to an unlabeled target domain [1]. Single-source Domain Adaptation (SDA) has been researched extensively, but in many practical scenarios the available annotated data come from multiple domains whose distributions differ from the target and even from each other. In this case, SDA methods can be applied by simply combining the multiple source domains into a single source domain, which is called source-combined domain adaptation. Source-combined domain adaptation has made progress because the training data are enriched. However, data bias still exists between each source domain and the target domain, as well as between different source domains. Forced transfer from domains with low correlation suppresses the performance of the target model, resulting in the problem of "negative transfer". To avoid these problems, Multi-source Domain Adaptation (MDA) has been studied. Recent mainstream MDA methods either use different feature transformations to achieve feature alignment in latent spaces [2,3], or build intermediate domains through a generator or encoder [4].
Graph Convolutional Adversarial Network (GCAN) for Unsupervised Domain Adaptation (UDA) [5] verifies that data structure, domain labels, and category labels play an important role in connecting a labeled source domain and an unlabeled target domain. Although this was discovered on SDA tasks, these three types of information may also contribute to the analysis of different source domains. Another inspiring observation is that research processing non-Euclidean data (e.g., for relational reasoning or knowledge graphs) exploits structure information to some extent, which indicates its potential importance. Data structure generally reflects the inherent attributes of a dataset, such as the data distribution and geometric structure.
However, current MDA studies have not utilized structure information from the perspective of latent space transformation. Recently, some methods adopt domain alignment [3,6], where the domain label represents the global features of different domains. This strategy helps the algorithm discriminate the global distributions of the source and target domains. Category labels, especially pseudo-labels of target data, are also used to strengthen semantic alignment and optimize the classification decision boundary. In conclusion, adopting these three types of information has been verified effective in reducing domain shift from many aspects. In this paper, we propose an end-to-end unsupervised Multi-source Domain Adaptation algorithm with Graph convolution network and Multi-alignment in mixed latent space (MDA-GM). This algorithm leverages domain labels, category labels, and data structure information, and seeks to minimize distribution discrepancy. Except for the classification result alignment conducted during the classification stage, the other three alignments are implemented in the feature extraction stage. We map each pair of source and target domain data into a domain-specific latent feature space. Two individual feature extraction modules are designed to construct the Convolutional Neural Network (CNN)-based and GCN-based latent feature spaces. Domain alignment and structure alignment are realized by aligning the mapped features in hybrid latent feature spaces composed of the CNN-based and GCN-based latent feature spaces. The category alignment is designed using the real labels of the source domains and the pseudo-labels of the target domain. With the classification result alignment, the classification bias between the classifiers on the same target data is reduced. Through these alignment strategies in multiple latent spaces, the proposed network achieves good domain adaptation performance.
The main contributions of this work are: (i) we establish an end-to-end unsupervised MDA framework with GCN and multi-alignment in a mixed latent feature space; (ii) we propose a data structure alignment scheme to ensure that data with the same structure are more closely clustered; (iii) an all-in-one feature communication step is set before feature alignment so that domain labels and data structure information can be mutually constrained; (iv) the proposed category alignment focuses on local features of categories and, together with category centroids, further optimizes the category decision boundary.

Methodology
In this section, we elaborate the principles of the four main parts of the method: CNN feature extraction, GCN feature extraction, latent feature space alignment, and classification alignment, as shown in Fig. 1. Figure 2 visualizes the latent feature space alignment module in detail to exemplify the distribution changes produced by each alignment individually. In practice, domain, structure and category alignment are conducted simultaneously. The distributions before and after alignment of source domain i (i = 1) (upper) and source domain N_S (lower) with the target domain are also presented in Fig. 2.
(1) Problem setting and notations. There are N_S different source domains with underlying source distributions. The labeled source data are denoted as {(X_{S_i}, Y_{S_i})}, i = 1, ..., N_S, and the unlabeled target data as X_T, where X represents one training batch.
(2) CNN feature extraction module. The CNN feature extraction module consists of a common feature extractor F_c and N_S domain-specific feature extractors. The labeled source domain data and unlabeled target domain data are sent to the CNN feature extraction module. Different from previous MDA methods that merely use one common feature extractor, we design a comprehensive feature extraction network that extracts domain-invariant features for each pair of source and target domains while preserving domain-specific decision boundaries. This module first applies the common feature extractor, followed by the N_S domain-specific feature extractors. Both source and target domain data pass through the common feature extractor F_c(X). Then, each source domain's data pass through its own domain-specific feature extractor:
CF_{S_i}(X_{S_i}) = F_{S_i}(F_c(X_{S_i})), where CF_{S_i}(X_{S_i}) denotes the CNN features of source domain i. The target domain data go through each domain-specific feature extractor:
CF_{S_i^T}(X_T) = F_{S_i}(F_c(X_T)), where CF_{S_i^T}(X_T) denotes the CNN features of the target domain with respect to source domain i. This architecture largely retains domain-specific information between different source domains, while source and target domain distributions are aligned with low computational complexity.
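A minimal sketch of this two-stage extraction, with random linear maps standing in for the shared backbone F_c (ResNet-50 in the paper) and the domain-specific extractors F_{S_i}; all weight shapes and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder linear maps standing in for the shared backbone F_c and
# the N_S = 2 domain-specific extractors F_{S_i} (illustrative only).
W_common = rng.normal(size=(16, 8))
W_specific = [rng.normal(size=(8, 4)) for _ in range(2)]

def F_c(X):
    """Common feature extractor shared by all domains."""
    return X @ W_common

def cnn_features(X_sources, X_target):
    """CF_{S_i}(X_{S_i}) = F_{S_i}(F_c(X_{S_i})); the target batch passes
    through every domain-specific extractor, giving CF_{S_i^T}(X_T)."""
    cf_sources = [F_c(Xs) @ W for Xs, W in zip(X_sources, W_specific)]
    cf_target = [F_c(X_target) @ W for W in W_specific]
    return cf_sources, cf_target
```

Note that the target batch is mapped once per source domain, producing one target feature set per source-target pair, which is what the pairwise alignments below operate on.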
(3) GCN feature extraction module. The GCN feature extraction module consists of domain-specific data structure analyzers and domain-specific GCNs. Current MDA research has not constructed a graph-based latent feature space for structure alignment. In fact, reconstruction-based domain adaptation methods focus on modelling data structure information and have obtained impressive performance, which to some extent verifies its contribution to domain adaptation. In [5], GCN has also been proved effective in processing graph data that contains strong structure information. GCN is a graph embedding technique used to obtain a feature representation h_v ∈ R^m of a graph node v, where h_v is an m-dimensional vector that can be used to produce an output o_v based on the node label. In our method, each input image is viewed as a graph node v, and image classification is implemented as node classification. Given an undirected graph Ḡ, let Ã = A + I, where A is the adjacency matrix and I the identity matrix; the degree matrix is computed as D̃_ii = Σ_j Ã_ij. The parameter matrix and activation matrix (input node matrix) of the l-th layer are denoted as W^(l) ∈ R^{C×F} and H^(l) ∈ R^{N×C}, respectively. When l = 0, H^(0) is the input node matrix. Based on the above settings, the graph convolution model can be expressed as H^(l+1) = σ(D̃^{-1/2} Ã D̃^{-1/2} H^(l) W^(l)), where σ represents the activation function. To align structure features, we use GCN to process instance graphs and extract structure features of different domain data. In our framework, the node features and the relationships between nodes are the features of the input images and their mutual relationships. To obtain the adjacency matrix A, we first input source and target domain data pairs into the domain-specific data structure analyzers (AlexNet) to obtain structure features of the source and target domains. From the pairwise similarities of these structure features, the adjacency matrices A of the source domain and the target domain are obtained, respectively.
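The normalized graph convolution above can be sketched as follows; the cosine-similarity adjacency construction is an illustrative assumption, since the paper's exact similarity measure for the structure analyzer is not reproduced here:

```python
import numpy as np

def build_adjacency(features: np.ndarray) -> np.ndarray:
    """Illustrative adjacency from pairwise cosine similarity of structure
    features; negative similarities are clipped to keep node degrees positive."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return np.maximum(f @ f.T, 0.0)  # N x N similarity matrix

def gcn_layer(H, A, W, activation=np.tanh):
    """One graph convolution: H' = sigma(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # degree vector D_ii
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric normalization
    return activation(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)
```

In the paper the node matrix H^(0) is the batch of CNN features, A comes from the AlexNet structure features, and a single such layer per domain pair suffices (following GCAN [5]).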
With the calculated adjacency matrix and the learned CNN features, densely-connected instance graphs for the GCN can be constructed. They are then delivered to the domain-specific GCNs, which map the instance graphs to the GCN-based latent feature space. The obtained GCN features of source domain i and of the target domain with respect to source domain i are denoted as GF_{S_i}(X_{S_i}) and GF_{S_i^T}(X_T), respectively. (4) Latent feature space alignment module. In this paper, the Maximum Mean Discrepancy (MMD) is used to measure the discrepancy between latent spaces from different domains. It is formulated in a Reproducing Kernel Hilbert Space H as MMD(X_S, X_T) = || (1/|X_S|) Σ_{x_s ∈ X_S} φ(x_s) − (1/|X_T|) Σ_{x_t ∈ X_T} φ(x_t) ||²_H, where φ(·) denotes a mapping from the original sample space to H. By reducing the discrepancy in the CNN-based latent feature space, domain alignment is realized.
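As a minimal illustration of the MMD discrepancy above, the following takes the mapping φ to be the identity (a linear-kernel simplification of the RKHS formulation the paper uses):

```python
import numpy as np

def mmd_linear(xs: np.ndarray, xt: np.ndarray) -> float:
    """Squared MMD with explicit feature map phi = identity (linear kernel):
    || mean(phi(xs)) - mean(phi(xt)) ||^2. An illustrative simplification of
    the kernelized RKHS form."""
    diff = xs.mean(axis=0) - xt.mean(axis=0)
    return float(diff @ diff)
```

Minimizing this quantity over CNN features gives domain alignment, and over GCN features gives structure alignment; identical distributions yield a value of zero.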
Similarly, reducing the discrepancy in the GCN-based latent feature space reduces the structure discrepancy between the different source domains and the target domain.
We set a simple but effective all-in-one feature communication step before feature alignment. Specifically, we concatenate the CNN features and the GCN features so that domain labels and data structure information can be mutually constrained, enabling domain alignment and structure alignment at the same time.
F_{S_i}(X_{S_i}) and F_{S_i^T}(X_T) denote the concatenated features. Although the proposed framework considers both domain invariance and structure consistency, some incorrectly adapted samples that meet both conditions would still fail to be classified correctly due to the lack of category-discriminative features. Therefore, this paper additionally designs category alignment in the training process. Each domain-specific classifier is trained using the corresponding source domain data and labels; the corresponding classification loss is the cross-entropy J between the classifier output and Ŷ_{S_i}, the ground-truth labels of source domain i. Since no target domain labels are available, we use the domain-specific classifiers to obtain pseudo-labels Ŷ_{S_i^T} for the target domain. In this paper, we choose the "centroid-to-centroid distance" to accomplish category alignment. When we calculate the centroid of each category, all pseudo-labeled samples (correct or wrong) are used together, and the negative effects of wrongly labeled samples are neutralized by the correct ones to some extent. The centroid of the samples of category k can be defined as c_k = (1/N_k) Σ_{x_i ∈ X_k} f(x_i), where N_k is the number of samples of category k, X_k denotes the samples belonging to category k, and f(·) denotes the corresponding feature. Meanwhile, according to the theoretical analysis in [7], all identical categories between each pair of source domain and target domain should be aligned, but this would lead to an enormous amount of computation. In fact, the category decision boundary is mainly determined by the nearest neighboring samples of different categories. Therefore, in the actual category alignment, we reduce the distance between sample centroids of the same category; for centroids of different categories, we only increase the discrepancy between the nearest pair of different centroids.
In the category alignment, r_{s1}, r_{s2} represent the closest sample centroids of different categories in the source domain, r_{t1}, r_{t2} represent the closest sample centroids of different categories in the target domain, and λ_C is set to 0.01 according to [7]. (5) Classification alignment module. The classification alignment module is composed of the domain-specific classifiers. It ensures that the classification results of the N_S classifiers on the target domain remain consistent, by penalizing the disagreement between the classifiers' predictions on the same target data. Finally, the overall training loss is a weighted combination of the source classification losses and the domain, structure, category, and classification-consistency alignment losses, where α, β, γ, ζ are the trade-off parameters. At test time, the target domain data are input to the trained feature extraction module and then pass through the trained domain-specific classifiers to generate classification results. (2) Implementation details. All experiments run on PyTorch 1.5.0 on a server with an Intel(R) Core(TM) i7-10700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU. The method is trained for 10000 epochs on Office-31 and 15000 epochs on Office-Home. All training images are randomly cropped into 224 × 224 patches as input. We use batch stochastic gradient descent with momentum 0.9 and batch size 32 to fine-tune all trainable parameters. The learning rate follows the rule in [20], that is, η = η_0/(1 + 10p)^0.75 with η_0 = 0.01. A pre-trained ResNet-50 serves as the common feature extractor. We use the AlexNet model pre-trained on ImageNet as the domain-specific structure analyzer to extract structure features with a 1000-dimensional output. The domain-specific GCNs are designed as single-layer GCN networks following the framework of GCAN [5].
Table 1 Classification accuracy (%) of task D,W → A on Office-31 with different γ and ζ. "AFC" denotes all-in-one feature communication and "Dynamic" denotes (2/(1 + exp(−10p))) − 1, where p changes linearly from 0 to 1.
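The classification consistency loss described above can be sketched as the average absolute disagreement between every pair of domain-specific classifiers' outputs on the same target batch; this pairwise L1 form is an assumption in the spirit of MFSAN-style consistency, as the paper's exact formula is not reproduced here:

```python
import numpy as np

def consistency_loss(probs):
    """Mean absolute difference between every pair of the N_S classifiers'
    softmax outputs on the same target batch (illustrative pairwise L1 form)."""
    n = len(probs)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.abs(probs[i] - probs[j]).mean()
            pairs += 1
    return total / pairs
```

The loss is zero when all classifiers agree on every target sample, so minimizing it drives the N_S classifiers toward consistent target predictions.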

Experiments
(3) Discussion of trade-off parameters. This experiment explores the influence of the trade-off parameters in (15), where γ, ζ control the contributions of category alignment and structure alignment. The classification accuracy under different values of γ and ζ on task D,W → A is shown in Table 1; we observe that the dynamic adjustment strategy for γ, ζ works best. This is because noisy signals influence the early stage of training, so a dynamic adjustment strategy is more reasonable than a fixed factor. For the other two parameters α, β, we follow the setting in MFSAN [3]: α = β = (2/(1 + exp(−10p))) − 1, where p changes linearly from 0 to 1. Considering that both domain alignment and structure alignment are achieved by computing MMD, γ and ζ are set to the same value, and the all-in-one feature communication step allows domain alignment and structure alignment to be accomplished at one time. The results show that this dynamic strategy improves both computing efficiency and adaptation performance.
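The dynamic trade-off factor and the annealed learning rate quoted above can be written directly from their formulas:

```python
import math

def dynamic_factor(p: float) -> float:
    """Dynamic trade-off factor (2/(1 + exp(-10p))) - 1, ramping smoothly
    from 0 toward 1 as training progress p goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0

def learning_rate(p: float, eta0: float = 0.01) -> float:
    """Annealed learning rate eta = eta0 / (1 + 10p)^0.75 following [20]."""
    return eta0 / (1.0 + 10.0 * p) ** 0.75
```

At p = 0 the alignment terms are switched off entirely, which suppresses the noisy early-training signal the text describes; by the end of training the factor is close to 1 and the alignment losses contribute at full weight.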
(4) Convergence analysis. We visualize the training process of the D,W → A task on Office-31 and the R,C,A → P task on Office-Home in Fig. 3. The training loss of each source domain decreases stably as the number of training epochs increases, which indicates that the MMD loss, classification consistency loss and cross-entropy loss all converge gradually. Due to GPU memory limitations, we adopt the training strategy of feeding each source domain into the network one by one to align with the target (the feeding order is D, W for the D,W → A task and R, C, A for the R,C,A → P task).
(5) Comparison with other methods. The compared MDA methods include MFSAN [3], MSCLDA [7], SImpAl [12], MDADFRE [13], MDAN [14], MDMN [15], DARN [16], and CMSDA [17]. In addition, some SDA methods are also compared, including ResNet [18], DDC [8], DAN [9], D-CORAL [19], RevGrad [20], RTN [21], GCAN [5], and FixBi [22]. Three evaluation standards for MDA are adopted: "Single Best" refers to the best performance of SDA, which evaluates whether the best result can be further improved by adding other source domains; "Source Combine" refers to the performance obtained by SDA after all source domains are merged into one domain; "Multi-Source" represents MDA performance. Tables 2 and 3 show the experimental results of our MDA-GM in comparison with the other algorithms. MDA-GM brings clear improvement over existing domain adaptation methods: the average accuracies on Office-31 and Office-Home are 95% and 80.1%, respectively, the highest on most adaptation tasks. Compared with the baseline MFSAN, the category alignment on CNN and GCN features and the structure alignment in the GCN-based feature space that we introduce bring a significant improvement. For instance, the average accuracy on Office-31 exceeds the baseline by 4.8%, and increases by up to 13.4% on the D,W → A task, which suffers from a large domain shift. The average accuracy on Office-Home exceeds the baseline by 6%.
FixBi achieves the second-best results using advanced bidirectional matching and self-penalization techniques. In general, the classification results of the proposed method demonstrate the effectiveness of introducing the GCN-based latent feature space with structure alignment and the additional category alignment. Domain adaptation with multiple source domains generally yields better results than the single-best and source-combine settings, which shows that enriching source domain data can further improve classification results. Simple source-combine is also less competitive in domain adaptation because this strategy does not consider the domain shift that exists between multiple source domains.
(6) Network complexity analysis. The number of network parameters is counted excluding pre-trained parameters. Apart from the methods listed above, DCTN is measured to consume 119.02M parameters, and MDAN, which lacks implementation details, is not included. As shown in Fig. 4, the parameter count of the proposed MDA-GM on Office-Home (3.7M) is smaller than that of CMSDA (4.85M). (In Tables 2 and 3, bold and italic text denote the optimal and suboptimal results, respectively.)
(7) Feature visualization. We visualize the latent feature space representations of the source and target domains as well as their categories with t-SNE [23]. Since the final results are obtained from each source domain classifier, we can visualize them individually. Figure 5 shows the feature alignment performance on Office-Home. The domain alignment results of MFSAN and MDA-GM are clearly better than those of DAN, which is consistent with the quantitative analysis in Table 3. The matching of source and target features by MDA-GM is more accurate and exhibits the clearest decision boundaries. Figure 6 shows the t-SNE visualization of the classification objects in the target domain. MDA-GM has higher category discrimination than MFSAN and DAN, which indicates a better capability to adapt to datasets with large shift.
(8) Ablation analysis. We retain the domain alignment and classification result alignment provided by existing studies and compare the adaptation performance of MDA-GM with (w/) or without (w/o) category alignment and structure alignment. The results are shown in Fig. 7. Category alignment and structure alignment both clearly promote MDA, and structure alignment brings a larger performance improvement than category alignment. All variants achieve similar performance on A,W → D and A,D → W, while ablating either alignment causes a noticeable decrease on D,W → A. This verifies that introducing the GCN-based latent space mapping with the corresponding structure alignment greatly benefits adaptation tasks. As the category alignment is designed to optimize the category decision boundary, ablating it results in the smallest decrease in accuracy (1.7% on Office-31 and 3.2% on Office-Home). Abandoning structure alignment lowers classification accuracy more noticeably, as structure information reflects the inherent properties of the data: this alignment ensures that data with the same structure in the latent feature space can be better distinguished from data with different structure. This verifies the effectiveness of adopting GCN feature extraction and designing structure alignment.

Conclusions
This paper proposes the MDA-GM algorithm, which uses domain labels, category labels and data structure information for image classification. A GCN-based latent feature space is employed in MDA together with the conventional CNN-based latent feature space, extracting domain features with rich structure information. Apart from the existing domain alignment and classification result alignment, we introduce two novel alignment strategies applicable to structure-level and category-level features. In detail, the structure alignment ensures that data with the same structure in the latent feature space are closely clustered; this strategy particularly improves classification performance on adaptation tasks with large domain shift. The category alignment further enhances category-discriminative ability and optimizes the category decision boundary. Comparative experiments on two standard datasets show that the proposed method brings clear improvement over state-of-the-art MDA methods.