GWNN-HF: beyond assortativity in graph wavelet neural network

Graph wavelet neural network exhibits a powerful learning ability in assortative networks, where most adjacent nodes share the target node's label. However, it performs poorly in disassortative networks, where most adjacent nodes have labels different from the target node, so it cannot extract the most useful information across different types of networks. On the one hand, graph wavelet neural network cannot flexibly extract both the similarity information of same-labeled neighbors and the difference information of differently labeled neighbors. On the other hand, it only aggregates neighboring nodes, so it cannot obtain information from nodes that have features similar to the target node but lie far away from it. To solve these problems, we propose the GWNN-HF model, which adapts effectively to different types of networks and learns better node representations. Specifically, we first design low-pass and high-pass filter convolution kernels to obtain low-pass and high-pass signals and then fuse them adaptively, which effectively captures the commonality of same-labeled nodes and the difference of differently labeled nodes. Second, we use the Relaxed Minimum Spanning Tree algorithm to construct a feature correlation graph and use an attention mechanism to fuse the representations of the original graph and the feature correlation graph. Extensive experiments on benchmark datasets clearly indicate that GWNN-HF performs well across different types of network structures.


Introduction
Networks are everywhere in real life and fall into two types: assortative and disassortative. Assortative networks are ubiquitous; examples include citation networks and social networks [1,2]. In assortative networks, most adjacent nodes share the target node's label [3]. There are also many disassortative networks; for example, chemical interactions in proteins often occur between different types of amino acids. In disassortative networks, most adjacent nodes have labels different from the target node [4][5][6][7].
In recent years, graph neural networks have attracted extensive attention because of their excellent performance in graph representation learning. They embed a graph into a low-dimensional space by aggregating and transforming adjacent nodes' information [8][9][10][11]. Graph wavelet neural network (GWNN) is one of the most attractive graph neural network models; it addresses the limitations of SpectralCNN by using the graph wavelet transform instead of the graph Fourier transform [12]. It uses a set of wavelets as its basis, and the wavelet bases in GWNN are highly localized in the vertex domain; this localized property makes GWNN more flexible in adjusting the receptive fields of nodes (via the scaling operation) [13]. These advantages improve node classification and yield better node representations.
Recently published graph neural networks (including GWNN) can learn good node representations in assortative networks [14][15][16][17]. However, their ability to learn node representations declines rapidly in disassortative networks. To enable graph neural networks to extract the most useful information across different types of networks, many GCN-based models have been proposed. For example, FAGCN [18] designs a novel graph convolutional network to adaptively combine low-pass and high-pass signals, but it only considers information from neighboring nodes. Geom-GCN [19] utilizes structural similarity to capture long-range dependencies in disassortative networks, but it fails to capture the difference between nodes with different labels.
The above GCN-based models adapt well to different types of networks. However, no effective method has been proposed in recent years to make graph wavelet neural network learn good node representations regardless of the network type. This problem mainly faces two challenges: on the one hand, how to design new filters that let GWNN adapt effectively to different types of networks; on the other hand, how to make full use of feature correlation to improve node representations. To handle these challenges, we propose the GWNN-HF model, shown in Fig. 1. Firstly, building on the graph wavelet neural network algorithm, we design low-pass and high-pass filters to obtain the corresponding low-pass and high-pass signals from node features; the low-pass signals retain the commonality of same-labeled neighbors, while the high-pass signals capture the difference between differently labeled neighbors. We also design an adaptive fusion approach that fuses the low-pass and high-pass signals to automatically shorten or enlarge the distance between nodes in the original graph and the feature correlation graph. Secondly, we build a feature correlation graph based on node features so that nodes with similar features are connected, and then use an attention mechanism to adaptively combine the node representations from the feature correlation graph and the original graph. This lets the target node adaptively obtain information from nodes with similar features and thus learn a better representation. Our contributions can be summarized as follows:

• We propose GWNN-HF, which designs two innovative filters that efficiently integrate low-pass and high-pass signals on the original graph and the feature correlation graph, respectively, to enhance adaptability in both assortative and disassortative networks.

• To take full advantage of feature correlation, we use the Relaxed Minimum Spanning Tree algorithm to construct a feature correlation graph, which captures global information from nodes that are distant in the original graph but have similar features.

• Experiments show that the proposed model is superior to representative graph neural network methods.

Graph neural network
From the perspective of the spatial domain, graph neural networks usually focus on aggregating and transforming neighborhood information with different designs. GraphSage [8] takes first-order neighbors as the neighborhood and defines weight functions as various aggregators over it. GAT [20] uses the attention mechanism to learn the weights of neighbor nodes.
From the perspective of the spectral domain, SpectralCNN [21] extends convolutional neural networks to graph representation learning by defining the convolution kernel in the spectral domain using graph signal processing theory; the kernel is a trainable diagonal matrix. ChebNet [3] approximates the convolution kernel with a polynomial of the Laplacian matrix. GraphHeat [22] designs a more powerful low-pass filter through the heat kernel. GWNN [23] uses a wavelet basis instead of a Fourier basis to further improve the efficiency of the model.

Feature correlation graph
AM-GCN [24] uses the KNN algorithm to construct a feature graph, which can learn the most relevant information adaptively from the topology graph and node feature. SimP-GCN [25] constructs feature graph to balance structure and feature information adaptively, which can capture node similarity of original feature space and can perform well in disassortative networks.

Low-pass and high-pass signals
SpGAT [26] proposes a novel attention mechanism in the spectral domain, which effectively learns representations of low-pass and high-pass signals with weighted filters and graph wavelet bases. This captures the global information of the graph efficiently with fewer learned parameters than GAT. From a theoretical point of view, FAGCN [18] analyzes the roles of low-pass and high-pass signals in learning node representations: low-pass signals retain the similarity information of node features, high-pass signals capture the difference information of node features, and both signals have an important influence on the final node representation.

Preliminaries
We mainly study a simple graph $G = \{V, E, A\}$, where $V$ is the node set, $E$ is the edge set, and $A$ is the adjacency matrix with $A_{ij} = A_{ji}$. Let $Y$ be the set of all possible class labels and let $X \in \mathbb{R}^{n \times d}$ be the $d$-dimensional feature matrix of all nodes in the graph.

Graph wavelet neural network
Graph wavelet neural network [23] uses a wavelet basis in place of the Fourier basis. Since our work builds on GWNN [23], we briefly introduce the model below.
Graph wavelet neural network defines a set of wavelet bases $\phi_s = (\phi_{s1}, \phi_{s2}, \dots, \phi_{sn})$, where $s$ is a scaling parameter. The wavelet basis can be written as $\phi_s = U G_s U^T$, where $G_s = \mathrm{diag}(g(s\lambda_1), g(s\lambda_2), \dots, g(s\lambda_n))$ is a scaling matrix with the heat kernel $g(s\lambda_i) = e^{-s\lambda_i}$. The graph wavelet transform is defined as $\hat{x} = \phi_s^{-1} x$, and the inverse wavelet transform as $x = \phi_s \hat{x}$ [23,27].
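For concreteness, the sketch below computes the wavelet basis by explicit eigendecomposition of the normalized Laplacian under the heat-kernel definition above; the GWNN paper also gives a Chebyshev polynomial approximation that avoids the eigendecomposition, which this sketch omits, and the function and variable names are ours.

```python
import numpy as np

def wavelet_basis(adj: np.ndarray, s: float = 1.0):
    """Graph wavelet basis and its inverse via the heat kernel
    g(s * lambda) = exp(-s * lambda), from the normalized Laplacian."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt  # L = I - D^-1/2 A D^-1/2
    lam, U = np.linalg.eigh(lap)                            # eigendecomposition of L
    phi = U @ np.diag(np.exp(-s * lam)) @ U.T               # wavelet basis phi_s
    phi_inv = U @ np.diag(np.exp(s * lam)) @ U.T            # inverse basis phi_s^-1
    return phi, phi_inv

# Wavelet transform: x_hat = phi_inv @ x; inverse transform: x = phi @ x_hat.
```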

Necessity of introducing high-pass signals on the graph
In computer vision, the information in an image can be expressed by different signals; high-pass signals mainly describe rapidly changing details, such as the outline of an image [28]. Similarly, for non-Euclidean graph data, graph signal processing theory [29] orders the Laplacian eigenvalues in ascending order: smoothly changing signals are carried by the eigenvectors corresponding to smaller eigenvalues and are regarded as low-pass signals, while sharply changing signals are carried by the eigenvectors corresponding to larger eigenvalues and are regarded as high-pass signals. Many studies have shown that graph neural networks succeed thanks to their exploitation of low-pass signals. However, some research shows that low-pass signals only perform well in assortative networks; in disassortative networks their effect is far inferior to that of high-pass signals. Hence both low-pass and high-pass signals are helpful for learning node representations.
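To make the frequency view concrete, the following NumPy sketch (our own illustration of the graph signal processing claim above, not code from the paper) splits a graph signal into low- and high-frequency components by projecting onto the Laplacian eigenvectors with small and large eigenvalues.

```python
import numpy as np

def split_signal(lap: np.ndarray, x: np.ndarray, cut: int):
    """Decompose a graph signal into low- and high-frequency parts:
    eigenvectors below the cutoff index carry smooth (low-pass) variation,
    those above carry sharp (high-pass) variation; low + high == x."""
    lam, U = np.linalg.eigh(lap)       # eigenvalues in ascending order
    coeffs = U.T @ x                   # graph Fourier coefficients
    low = U[:, :cut] @ coeffs[:cut]    # smooth component
    high = U[:, cut:] @ coeffs[cut:]   # sharp component
    return low, high
```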

GWNN-HF
In this section, we introduce GWNN-HF in detail. As shown in Fig. 1, the goal of the model is to obtain information from nodes that are distant in the original graph but have similar features, and to capture the difference between nodes with different labels while retaining the commonality of nodes with the same label. To this end, a feature correlation graph is constructed from node features using the Relaxed Minimum Spanning Tree (RMST) algorithm. On top of low-pass filtering, high-pass signals are introduced into the feature correlation graph and the original graph, respectively. The filtered representations of the original graph and the feature correlation graph are fused by an attention mechanism. Finally, the output node representation is obtained by a linear transformation with a multi-layer perceptron.

Construct a new feature correlation graph
Although graph wavelet neural network has proven effective in many applications, its performance can degrade significantly when the graph structure is not optimal; for example, its performance in disassortative networks is clearly reduced.
Making full use of feature correlation can solve this problem. Building a feature correlation graph based on feature similarity connects nodes with highly similar features. In this graph, a neighbor may be far from the target node in the original topology or close to it, so the feature correlation graph can capture global information from nodes that are distant in the original graph but have similar features. Combining the original graph with the feature correlation graph takes both local and global information into account. Next, we introduce the construction of the feature correlation graph.
The construction of the feature correlation graph consists of two steps: first, calculate the similarity matrix of node features; second, use the Relaxed Minimum Spanning Tree (RMST) algorithm to build the feature correlation graph from the similarity matrix.

Algorithm 1 Build feature correlation graph
Input: node feature matrix X
Output: feature correlation adjacency matrix $A^f$
1: Compute the cosine similarity distance between every pair of node features and store the distances in a similarity matrix S;
2: Use Kruskal's minimum spanning tree algorithm to build a minimum spanning tree, then record the maximum edge weight $mlink_{ij}$ on the tree path between each node pair (i, j);
3: For every node pair (i, j), add the edge to $A^f$ if $d(i, j) \le mlink_{ij} + \eta\,(d(i, i_k) + d(j, j_k))$.

Firstly, the cosine similarity distance is used to measure the distance between node features; we store the distances between all node pairs of the whole graph in a similarity matrix S.
Secondly, according to the similarity matrix S, we apply the Relaxed Minimum Spanning Tree (RMST) algorithm [30] to construct the feature correlation graph.
$$A^f_{ij} = \begin{cases} 1, & d(i,j) \le mlink_{ij} + \eta\,(d(i, i_k) + d(j, j_k)) \\ 0, & \text{otherwise,} \end{cases}$$

where $A^f$ represents the adjacency matrix of the feature correlation graph, $d(i, j)$ is the direct distance between node i and node j, $i_k$ and $j_k$ are the k-th nearest neighbors of node i and node j, respectively, and $mlink_{ij} = \max\{z_{i,k}, z_{k,h}, \dots, z_{m,j}\}$ is the maximum edge weight along the minimum spanning tree path between the node pair (i, j). The hyperparameter $\eta$ controls the relative weight of the spanning tree path against the neighborhood distances of the two nodes. The procedure for constructing the feature correlation graph is shown in Algorithm 1.
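A minimal sketch of the RMST construction under the rule reconstructed above is given below; the O(n²) path-maximum computation and the helper names are our own choices, and the original RMST paper [30] may implement the relaxation differently.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.metrics.pairwise import cosine_distances

def rmst_graph(X: np.ndarray, eta: float = 0.5, k: int = 1) -> np.ndarray:
    """Connect (i, j) when the direct distance d(i, j) does not exceed the
    largest edge on their MST path, relaxed by eta times the distances to
    their k-th nearest neighbors."""
    D = cosine_distances(X)                  # similarity-distance matrix S
    n = len(D)
    W = D + 1e-12                            # keep zero distances from reading as "no edge"
    np.fill_diagonal(W, 0.0)
    mst = minimum_spanning_tree(W).toarray()
    mst = np.maximum(mst, mst.T)             # symmetrize the tree adjacency

    # mlink[i, j]: maximum edge weight on the MST path from i to j,
    # computed by a traversal from every root (O(n^2) overall on a tree).
    mlink = np.zeros((n, n))
    for root in range(n):
        stack, seen = [root], {root}
        while stack:
            u = stack.pop()
            for v in np.nonzero(mst[u])[0]:
                if v not in seen:
                    seen.add(v)
                    mlink[root, v] = max(mlink[root, u], mst[u, v])
                    stack.append(v)

    dk = np.sort(D, axis=1)[:, k]            # distance to each node's k-th neighbor
    relax = eta * (dk[:, None] + dk[None, :])
    A_f = ((D <= mlink + relax) & ~np.eye(n, dtype=bool)).astype(float)
    return A_f
```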

Introducing high-pass signals into graph wavelet neural network
Since both low-pass and high-pass signals are important, we want to obtain both to enhance the representation ability of graph wavelet neural network. The following describes how low-pass and high-pass signals are maintained in the original graph and the feature correlation graph.
Firstly, consider the original graph. As Fig. 1 shows, since the network type is not known in advance, we need to preserve both the commonality of nodes with the same label and the difference of nodes with different labels. Therefore, for the original graph we design the convolution kernel of the low-pass filter $I - \epsilon_1 \Lambda_o$ and the convolution kernel of the high-pass filter $I + \epsilon_1 \Lambda_o$, where $\Lambda_o$ is the diagonal eigenvalue matrix of the original graph's Laplacian and $\epsilon_1$ is the trainable parameter shared by the two filters, whose value is limited to [0, 1]. According to graph signal processing theory, smoothly changing signals are carried by the eigenvectors corresponding to smaller eigenvalues and are regarded as low-pass signals, while sharply changing signals are carried by the eigenvectors corresponding to larger eigenvalues and are regarded as high-pass signals. The convolution kernel of the low-pass filter can be rewritten as $g_\theta(\lambda_i) = 1 - \epsilon_1 \lambda_i$: as the eigenvalue $\lambda_i$ increases, $g_\theta(\lambda_i)$ decreases, which suppresses high-pass signals. The convolution kernel of the high-pass filter can be rewritten as $g_\theta(\lambda_i) = 1 + \epsilon_1 \lambda_i$: as $\lambda_i$ increases, $g_\theta(\lambda_i)$ increases, so high-pass signals are amplified. In this way, the low-pass kernel mainly attends to the part with smaller eigenvalues, whose eigenvectors carry the smoothly changing signals as low-pass signals; similarly, the high-pass kernel mainly attends to the part with larger eigenvalues, whose eigenvectors carry the rapidly changing signals as high-pass signals.

The low-pass representation $Z^L_o$ and the high-pass representation $Z^H_o$ are obtained as follows:

$$Z^L_o = \phi_{s_o}\,(I - \epsilon_1 \Lambda_o)\,\phi^{-1}_{s_o}\,X, \qquad Z^H_o = \phi_{s_o}\,(I + \epsilon_1 \Lambda_o)\,\phi^{-1}_{s_o}\,X,$$

where $\phi_{s_o}$ and $\phi^{-1}_{s_o}$ represent the wavelet basis and inverse wavelet basis of the original graph, respectively. Having obtained the low-pass and high-pass representations of the node features, we set a trainable parameter $\alpha$ to adaptively fuse the two signals. The fusion on the original graph is

$$Z_o = \alpha\,Z^L_o + (1 - \alpha)\,Z^H_o,$$

where $Z_o$ represents the final representation of the original graph. Secondly, the low-pass and high-pass signals of the node features on the feature correlation graph are obtained in the same way, with $\epsilon_2$ as the trainable parameter of the low-pass and high-pass filters on the feature correlation graph (also limited to [0, 1]) and $\phi_{s_f}$, $\phi^{-1}_{s_f}$ the wavelet basis and inverse wavelet basis of the feature correlation graph, respectively.
The low-pass representation $Z^L_f$ and the high-pass representation $Z^H_f$ are obtained as follows:

$$Z^L_f = \phi_{s_f}\,(I - \epsilon_2 \Lambda_f)\,\phi^{-1}_{s_f}\,X, \qquad Z^H_f = \phi_{s_f}\,(I + \epsilon_2 \Lambda_f)\,\phi^{-1}_{s_f}\,X,$$

where $\Lambda_f$ is the diagonal eigenvalue matrix of the feature correlation graph's Laplacian. However, note that the Relaxed Minimum Spanning Tree algorithm considers both the minimum spanning tree path and the neighbors of each node, so most adjacent nodes in the feature correlation graph are similar; introducing too many high-pass signals may damage the representations of nodes with similar features in this graph. Consequently, when fusing the low-pass and high-pass signals of the node features, we design a hyperparameter $\beta$ and a dedicated trainable parameter $\gamma$, where $\beta$ is mainly used to suppress the high-pass signals:

$$Z_f = \gamma\,Z^L_f + \beta\,(1 - \gamma)\,Z^H_f,$$

where $Z_f$ represents the final representation obtained on the feature correlation graph.
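To make the two filters and their fusion concrete, here is a PyTorch sketch under the formulas reconstructed above; the sigmoid reparameterization of the fusion weight and the clamping of $\epsilon$ are our implementation choices, not details from the paper. With `beta=None` the module plays the role of the original-graph branch (fusion weight $\alpha$); with a fixed `beta` it plays the feature-graph branch ($\gamma$ and $\beta$).

```python
import torch
import torch.nn as nn

class LowHighFilter(nn.Module):
    """Low-/high-pass wavelet-domain filtering with adaptive fusion."""
    def __init__(self, beta: float = None):
        super().__init__()
        self.eps = nn.Parameter(torch.tensor(0.5))   # filter parameter, kept in [0, 1]
        self.mix = nn.Parameter(torch.tensor(0.0))   # raw fusion weight (alpha or gamma)
        self.beta = beta                             # fixed high-pass suppressor, if any

    def forward(self, phi, phi_inv, lam, X):
        # phi, phi_inv: (inverse) wavelet bases; lam: Laplacian eigenvalues.
        eps = torch.clamp(self.eps, 0.0, 1.0)
        z_low = phi @ torch.diag(1.0 - eps * lam) @ phi_inv @ X   # g = 1 - eps*lambda
        z_high = phi @ torch.diag(1.0 + eps * lam) @ phi_inv @ X  # g = 1 + eps*lambda
        w = torch.sigmoid(self.mix)                  # fusion weight in (0, 1)
        scale = 1.0 if self.beta is None else self.beta
        return w * z_low + scale * (1.0 - w) * z_high
```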

Fuse the representation of the original graph and feature correlation graph
The attention mechanism can extract the most useful parts of the features and suppress the useless parts, adaptively adjusting the weights in different situations to obtain important information [24]. When learning node representations in assortative networks, the original graph structure already improves the quality of the node representation to a certain extent, and the attention mechanism can adaptively assign a higher weight to the original graph representation. When learning node representations in disassortative networks, feature correlation compensates for the inadequacy of relying on the original graph structure, and the attention mechanism can assign a greater weight to the feature correlation graph, helping feature correlation play a greater role. Therefore, to obtain good node representations across different types of network structures, we use an attention mechanism to compute attention weights $K_o$ and $K_f$ for $Z_o$ and $Z_f$, so that the feature correlation graph better complements the original graph.
The softmax function is used to normalize the attention weights $k^i_o$ and $k^i_f$ to obtain the final weight values. The fused representation is then passed through a one-layer MLP for feature transformation to obtain the final representation:

$$Z = (K_o \cdot Z_o + K_f \cdot Z_f)\,W + B,$$

where $W$ is the weight matrix of the final feature transformation layer and $B$ is the bias matrix. Finally, we define the cross-entropy loss, which minimizes the discrepancy between the true labels and the predicted labels of the nodes:

$$\mathcal{L} = -\sum_{i \in V_{train}} \sum_{c \in Y} Y_{ic} \ln Z_{ic},$$

where $Y_{ic}$ is the true indicator that node i belongs to class c and $Z_{ic}$ is the predicted probability that node i belongs to class c. The overall architecture of GWNN-HF is given in pseudocode in Algorithm 2.
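A sketch of the attention fusion and the loss follows; the one-linear-layer scorer is our assumption (the paper does not specify the exact parameterization of $k^i_o$, $k^i_f$, and AM-GCN-style attention, for instance, uses a tanh-activated projection).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Attention fusion of the two graph views, then the MLP output layer."""
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)          # shared attention scorer (our assumption)
        self.out = nn.Linear(dim, num_classes)  # final transformation with W and bias B

    def forward(self, z_o, z_f):
        # Per-node attention weights K_o, K_f, normalized with softmax.
        k = F.softmax(torch.cat([self.score(z_o), self.score(z_f)], dim=1), dim=1)
        z = k[:, 0:1] * z_o + k[:, 1:2] * z_f   # weighted fusion of the two views
        return self.out(z)                      # class logits Z

def cross_entropy_loss(logits, labels, train_mask):
    # Cross-entropy over the labeled training nodes, matching the loss in the text.
    return F.cross_entropy(logits[train_mask], labels[train_mask])
```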

Experiments
In this part, we evaluate the effectiveness of the GWNN-HF model in assortative and disassortative networks. In particular, we answer the following questions: (1) How does GWNN-HF perform in assortative networks? (2) How does GWNN-HF perform in disassortative networks? (3) What is the effect of the high-pass signals? (4) What is the effect of the feature correlation graph? (5) How much training time does GWNN-HF require?

Datasets
Since the performance of graph neural networks differs between assortative and disassortative networks, we select several representative datasets of the two types for our experiments. Specifically, for assortative networks, we select three popular citation networks widely used with graph neural networks (Cora, Citeseer, and Pubmed [31]).
In a citation network, an edge represents the (undirected) citation relationship between two papers, and the label represents the field of the paper. For disassortative networks, we choose three Web datasets (Cornell, Texas, and Wisconsin [19]), where edges represent hyperlinks between two pages and node features are bags of words from the pages. Note that assortativity measures the extent to which connected nodes share the same label in a dataset; its formula [19] is

$$H(G) = \frac{1}{|V|} \sum_{v \in V} \frac{|\{u \in N(v) : y_u = y_v\}|}{|N(v)|},$$

where $N(v)$ is the neighbor set of node v and $y_v$ is its label. In an assortative network most connected nodes share the same label, so the assortativity is relatively high; in a disassortative network most adjacent nodes have different labels, so the assortativity is low. According to their assortativity values, this paper uses Cora, Citeseer, and Pubmed as assortative networks and Cornell, Texas, and Wisconsin as disassortative networks [25]. Statistics of the six datasets are reported in Table 1.
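The measure above is straightforward to compute; a short sketch (our own, with hypothetical argument names) is:

```python
import numpy as np

def node_homophily(adj: np.ndarray, labels: np.ndarray) -> float:
    """Average fraction of same-label neighbors over all nodes (the
    assortativity measure above): close to 1 for assortative graphs,
    low for disassortative ones."""
    scores = []
    for v in range(len(adj)):
        nbrs = np.nonzero(adj[v])[0]
        if len(nbrs) > 0:                  # skip isolated nodes
            scores.append(np.mean(labels[nbrs] == labels[v]))
    return float(np.mean(scores))
```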

Baselines
To assess the effectiveness of our model, we select the following baselines for node classification tasks.
GCN [31]: It updates the representation of the central node by aggregating the neighbor node feature information using a normalized Laplace matrix.
GAT [20]: It is a graph neural network model that utilizes an attention mechanism to aggregate node feature.
GWNN [23]: It uses graph wavelet as a set of basis to replace the eigenvectors of Fourier basis.
KNN-GCN [32]: It constructs a k-nearest neighbor graph based on node features and performs graph convolution on that graph.
JK-NET [7]: It flexibly uses neighborhood information of different orders to obtain a better node representation.
GCNII [33]: Based on GCN, it uses residual connections and identity mapping to achieve better performance.
Geom-GCN [19]: It seeks to capture long-range dependencies in disassortative networks, using geometric relations defined in a latent space to build a structural neighborhood for aggregation. Geom-GCN-S, Geom-GCN-I, and Geom-GCN-P are its three variants. Since it is mainly designed for disassortative networks, we only report its performance there.
SimP-GCN [25]: It balances structure and feature information adaptively and captures the similarity of node pairs through self-supervised learning.

Experimental settings
We implement our approach in PyTorch and use a GPU to speed up the experiments. We train GWNN-HF with the Adam optimizer [34] for at most 1000 epochs with a learning rate of 0.01, stopping early when the validation performance does not improve for 100 consecutive epochs (patience = 100).
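A training loop matching these settings is sketched below, assuming a model callable on node features and a PyG-style data object with `x`, `y`, `train_mask`, and `val_mask` attributes; whether the paper monitors validation loss or accuracy for early stopping is our assumption.

```python
import copy
import torch
import torch.nn.functional as F

def train(model, data, lr=0.01, max_epochs=1000, patience=100):
    """Adam with lr 0.01, at most 1000 epochs, early stopping with patience 100."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, wait = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        opt.zero_grad()
        logits = model(data.x)
        loss = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            logits = model(data.x)
            val_loss = F.cross_entropy(logits[data.val_mask], data.y[data.val_mask]).item()
        if val_loss < best_val:              # validation improved: save the model
            best_val, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:             # stop training early
                break
    model.load_state_dict(best_state)
```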

Performance comparison
In this section, we answer the first and second questions by comparing the performance of our model and the baselines in assortative and disassortative networks, respectively. To ensure fairness, we directly use the experimental results reported by other papers [25]. Because GWNN was not evaluated on disassortative networks, we ran the GWNN [23] experiments on the disassortative networks ourselves, following the data processing of [33].

Comparison in the assortative network
For experiments on the assortative networks, we follow the widely used semi-supervised setup: 20 labeled nodes per class for training, 500 nodes for validation, and 1000 nodes for testing; details can be found in GCN [31]. Using the same training/validation/test configuration, we report the mean classification accuracy (in percent) of our model on the test sets over 10 runs in Table 2, with the highest accuracy highlighted in each column.
Table 2 shows that GWNN-HF performs well on node classification in assortative networks, outperforming GWNN by 0.9%, 1.4%, and 0.5% on the Cora, Citeseer, and Pubmed datasets, respectively. This shows that GWNN-HF improves GWNN's ability to learn node representations in assortative networks by fusing low-pass and high-pass signals and introducing feature correlation. The potential reasons why GWNN-HF outperforms the listed comparison algorithms in assortative networks are as follows: (1) GCN, GAT, and GWNN update the central node representation with different strategies for aggregating low-order neighbor information. However, these methods limit the ability to obtain global information about nodes and thus reduce the quality of the node representation. In contrast, GWNN-HF obtains the feature information of nodes that are distant in the original graph but have similar features by additionally constructing a feature correlation graph from the feature perspective. No matter how far a node is from the central node in the original graph, as long as its feature correlation with the central node is high, it becomes a low-order neighbor; absorbing the information of such neighbors benefits the representation of the central node.
(2) KNN-GCN constructs a feature graph from the features and learns node representations on it with GCN. Comparing it with GCN shows that, in assortative networks, aggregating the feature information of neighbors in the original topology helps learn the central node representation because of the high assortativity; ignoring the original topology therefore leaves KNN-GCN's central node representations inadequate in assortative networks. SimP-GCN performs better on the Pubmed dataset, indicating that its self-supervised learning approach can capture pairwise node similarity, but GWNN-HF outperforms it on Cora and Citeseer, indicating the effectiveness of feature correlation.
(3) JK-NET selectively combines the aggregated representations of different layers in the final output layer; GCNII uses initial residual connections and identity mapping to obtain deeper feature information. Although these operations enrich the node representation from multiple perspectives, they cannot distinguish whether the acquired information is beneficial or harmful, so the learned node representation remains inadequate. GWNN-HF designs dedicated low-pass and high-pass filters, which effectively obtain the similarity information of same-labeled nodes and the difference information between differently labeled nodes. As seen in Table 1, although the assortativity of Cora, Citeseer, and Pubmed is relatively high, it is less than 1 in all three networks, meaning some neighbors still have labels inconsistent with the central node. On top of the similarity information of same-labeled nodes, additionally obtaining the difference information among differently labeled neighbors benefits the node representation.

Comparison in the disassortative network
In this section, we report the performance of GWNN-HF on three disassortative networks: Cornell, Texas, and Wisconsin. Following the common design for disassortative networks [33], we randomly split the nodes of each class into 60%, 20%, and 20% as the training, validation, and test sets. We report the average accuracy of all models on the test sets over 10 random splits in Table 3, with the highest accuracy in each column in bold. Table 3 shows that GWNN-HF outperforms the other comparison algorithms on all three disassortative network datasets; compared to SimP-GCN, GWNN-HF gains 0.33%, 0.54%, and 1.22% on Cornell, Texas, and Wisconsin, respectively. The potential reasons why GWNN-HF outperforms the other algorithms in disassortative networks are as follows: (1) Table 3 clearly shows that GCN, GAT, and GWNN perform poorly on node classification in disassortative networks. These models can only capture the similarity information of neighbor node features, yet the labels of many neighbors in disassortative networks are inconsistent with the label of the central node. Because they cannot flexibly extract both the similarity information of same-labeled neighbors and the difference information of differently labeled neighbors, the central node obtains inadequate feature information from its neighbors in disassortative networks.
(2) JK-NET and GCNII adopt different strategies to obtain global information, but both rely on the topology of the graph when learning node representations. Due to the high disassortativity of these networks, aggregating neighbor information to update node representations is not very useful. The KNN-GCN algorithm does not rely on the graph topology; it constructs a new graph from the features to learn node representations and thus performs better than JK-NET and GCNII. This illustrates that node feature information is important when learning node representations in disassortative networks, while the original graph topology may even be harmful.
(3) Geom-GCN and SimP-GCN both design strategies to obtain long-range dependencies and thereby reduce the harm of the original topology when learning node representations in disassortative networks. GWNN-HF absorbs the advantages of both: it constructs a feature correlation graph to obtain information from nodes with distant locations but similar features, and on this basis introduces high-pass signals to capture the differences among the many label-inconsistent neighbors in disassortative networks, which further enhances the node representations.

Ablation experiment
To better understand the effect of the different modules in the algorithm, we conduct an ablation study to answer the third and fourth questions mentioned above. Specifically, we compare the following variants:
GWNN-H: It fuses low-pass and high-pass signals but does not consider the role of feature correlation.
GWNN-F: It adaptively fuses the original graph and the feature correlation graph, but it does not consider the role of high-pass signals.
GWNN-HF: It fuses low-pass signals and high-pass signals and considers the role of feature correlation.

Comparison of GWNN-F and GWNN-HF
Since most existing work based on the graph wavelet transform is regarded as low-pass filtering [35], we mainly focus on the impact of high-pass signals on the overall algorithm. To better explore this influence, this section compares GWNN-F and GWNN-HF on the three citation network datasets and the three Web page datasets; the only difference between GWNN-F and GWNN-HF is the presence of high-pass signals. The results are shown in Fig. 2.
Firstly, the role of high-pass signals is analyzed from the perspective of assortative networks: GWNN-HF is superior to the GWNN-F variant on the three citation network datasets, so the high-pass signals play a role even in assortative networks. Although the assortativity of the three citation networks is high, it is still less than 1, meaning a small number of neighbors around the central node still have inconsistent labels. By fusing low-pass and high-pass signals, the model preserves most of the similarity information of same-labeled neighbors while additionally absorbing the difference information of differently labeled neighbors, making the node representation richer.
Secondly, the role of the high-pass signals is analyzed from the perspective of disassortative networks. GWNN-HF also exceeds the GWNN-F variant on the three Web page datasets. Moreover, the gap between GWNN-HF and GWNN-F is larger on the disassortative networks than on the assortative ones, indicating that high-pass signals bring a greater gain in disassortative networks. Given the low assortativity of these networks, adaptively fusing low-pass and high-pass signals clearly enriches the learned node representation: it absorbs the difference information among differently labeled neighbors while retaining most of the similarity information of same-labeled neighbors.
In summary, high-pass signals benefit the GWNN-HF algorithm in both assortative and disassortative networks.

Comparison of GWNN-H and GWNN-HF
To better explore the influence of feature correlation, this section compares GWNN-H and GWNN-HF on the three citation network datasets and the three Web page datasets; the only difference between GWNN-H and GWNN-HF is the presence of feature correlation. The results are shown in Fig. 3.
Firstly, the role of feature correlation is analyzed from the perspective of assortative networks. GWNN-HF outperforms the GWNN-H variant on the three citation network datasets, indicating that introducing feature correlation effectively improves the node representation. Specifically, the feature correlation graph turns nodes that are distant from the central node but have similar features in the original graph into direct low-order neighbors. Convolution filtering on the feature correlation graph can then absorb their feature information to update the central node's representation, capturing long-distance dependencies and effectively enhancing the node representation.
Secondly, the role of feature correlation is analyzed from the perspective of disassortative networks. GWNN-H falls below GWNN-HF on the three Web datasets, which strongly indicates that feature correlation plays an important role in disassortative networks. Specifically, since the labels of most neighbors in disassortative networks are inconsistent with the label of the central node, the original graph structure may not help enhance the central node's representation. Furthermore, Table 3 shows that KNN-GCN, which builds a graph topology from the features using the KNN algorithm and applies GCN layers to update the central node representation, surpasses GCN on the original graph in disassortative networks. This also demonstrates that the original graph structure in disassortative networks harms the central node representation to some extent, and constructing a feature correlation graph solves this problem.
In summary, feature correlation helps improve GWNN-HF's ability to learn node representations across different types of network structures.

Time analysis
Here, we mainly answer the fifth question mentioned above. Tables 4 and 5 show the average running time per epoch on the assortative and disassortative networks. Note that the Pubmed dataset is larger than the others; to maintain fairness, we run all variants of the model on Pubmed with an A6000 GPU, while the other datasets are trained with a 1050 GPU. To better evaluate the training time of the attention mechanism, we use a trainable parameter alpha instead of the attention mechanism to fuse the representations of the original graph and the feature correlation graph, and name this variant GWNN-alpha. From Tables 4 and 5, we find that although GWNN-HF performs better in both assortative and disassortative networks, its ability to adapt to different network structures inevitably introduces extra trainable parameters, including the parameters of the low-pass and high-pass filters. In addition, since we introduce the feature correlation graph, the running time of GWNN-F is also larger than that of GWNN, indicating that convolution filtering on the feature correlation graph increases the number of trainable parameters.
To better analyze the number of trainable parameters, we examine the time complexity. GWNN-HF performs two linear transformations, two convolution layers for message passing, and one attention mechanism in each training iteration. Let $n$ be the number of nodes in the graph, $p$ the number of features, $q$ the dimension of the hidden layer, and $c$ the number of output categories. The time complexity of the two linear transformations is $O(pqc)$. The two convolution layers for message passing have $(4n + 6)$ trainable parameters in the convolution kernels, so the time complexity of this part is $O(n)$. The attention mechanism that fuses the node features of the original graph and the feature correlation graph includes two linear transforms to compute the attention coefficients, giving a time complexity of $O(qh)$, where $h$ is the dimension of the attention hidden layer. The total time complexity of GWNN-HF is therefore $O(pqc + qh + n)$.

Conclusion
Throughout this paper, we put forward the GWNN-HF model for the node classification task in assortative and disassortative networks. Firstly, we construct an additional feature correlation graph and use an attention mechanism to fuse the representations of the original graph and the feature correlation graph. Secondly, we fuse low-pass and high-pass signals, considering not only the similarity of nodes with the same label but also the difference between nodes with different labels. Extensive experiments show that GWNN-HF performs well across different types of network structures.

Binfeng Huang was born in 1998. He is a Master's candidate. His main research interests include machine learning and graph representation learning.
Wenjie Zheng was born in 1996. He is a Ph.D student in the School of Computer Science and Engineering, Nanjing University of Science and Technology. His current research interests include multimodal emotion analysis, multimodal mental health.
Fulan Qian was born in 1978. Ph.D. She is an associate professor and supervisor of the Master's students at the School of Computer Science and Technology, Anhui University, China. Her research interests include machine learning, recommendation algorithms, adversarial robustness, and 3D object detection.
Shu Zhao was born in 1979. Ph.D. She is a professor and supervisor of the PhD students at the School of Computer Science and Technology, Anhui University, China. Her research interests include network representation learning, knowledge graph, and social network analysis.
Jie Chen was born in 1982. Ph.D. She is an associate professor and supervisor of the Master's students at the School of Computer Science and Technology, Anhui University, China. Her research interests include machine learning, text sentiment classification and granular computing.
Yanping Zhang was born in 1962. Ph.D., professor. Her research interests include intelligent computing, quotient space theory, machine learning and intelligent information processing.