1.3 Overall statistics and distributions
A total of 1280 documents matched our search query. One of the papers was retracted, so we excluded it from the list. In total, these documents have been cited 7121 times, an average of 5.56 citations per document. Table 1 shows the distribution of the document types.
The document types with the highest average citations per paper are reviews, conference papers, and articles, which together constitute 96.24% of the data and received on average 49.44, 6.31, and 4.04 citations, respectively. Approximately 78.58% of citations were to documents published within the last three years (2017 to 2019).
Table 1
Distribution of document types

Document type | Number | Average citations per paper
Conference Paper | 854 | 6.31
Article | 369 | 4.04
Conference Review | 38 | 2
Review | 8 | 49.44
Book Chapter | 8 | 0.25
Note | 1 | 1
Book | 1 | 0
Figure 1 shows the publication trend of documents in this field. The first paper in our document set dates back to 2004: Scarselli, Tsoi (77), published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Scarselli, Tsoi, and Gori, the authors of that paper, are among the top authors of the field. They proposed an architecture similar to recursive neural networks in which each unit stores the current node state and, when activated, calculates the next state using its neighbors' states.
Only a few documents (41) were published before 2017. Thereafter, research in this field blossomed, with a significant rise in the number of documents; the average annual growth of the documents from 2017 to 2019 is ~447%, which shows that this field of research is very young and is attracting more and more attention. Note that the citations curve in Fig. 1 shows the number of citations to the papers published in a particular year. For instance, the plot shows that the papers published in 2017 have been cited around 1800 times by papers published thereafter. We observe two local peaks in the citations curve before 2017, one in 2005 and the other in 2009, which are mainly due to the papers of Gori, Monfardini (78) and Scarselli, Gori (79). There is a rise in the number of citations in 2016 due to Li, Tarlow (35). The year 2017 is an important milestone, with a significant increase in the number of citations; some of the most cited papers of the field appeared in this year, including (80) and Bronstein, Bruna (81). The year 2018 is the most cited year, with notable papers including Yan, Xiong (82), Schlichtkrull, Kipf (83), Ying, He (16), and Zhang, Cui (84). More than 50% of the documents were published in 2019, with remarkable documents including AlQuraishi (85), Wang, He (86), and Zhang, Qi (45).
Figure 2 shows the subject distribution of the GNN papers across 22 different categories. The most frequent subjects are Computer Science (86%), Mathematics (22.90%), Engineering (22.59%), Decision Sciences (10.16%), Social Sciences (8.44%), Materials Science (5.16%), Physics and Astronomy (4.76%), Biochemistry, Genetics and Molecular Biology (3.98%), Business, Management and Accounting (3.67%), and Arts and Humanities (3.67%).
2.3 Top authors
Table 2 shows the top authors in this field based on their h-index (87). Note that the numbers reported in this table are limited to GNN papers, so the overall values can be larger. Franco Scarselli is the most prolific and impactful researcher in this field, with an h-index of 10. His most frequently used keywords, in descending order, are graph neural networks, graphical domains, recursive neural networks, and deep neural networks. He is an associate professor at the Department of Information Engineering and Mathematics at the University of Siena. His main research topics are artificial intelligence, machine learning, artificial neural networks, graph neural networks, and deep learning. In addition, he has published two of the most-cited GNN papers, which are introduced in the “Must-read papers” section.
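As defined in the endnotes, a researcher's h-index is h if h of their papers have at least h citations each. A minimal sketch of this computation on a hypothetical list of citation counts:

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h

# Hypothetical citation counts for one author's GNN papers.
print(h_index([120, 45, 30, 12, 9, 7, 4, 2, 1, 0]))  # -> 6
```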
Table 2
Most prolific and impactful researchers

H-index | Author | All citations | Docs | Affiliation | Most used keywords
10 | Franco Scarselli | 929 | 19 | University of Siena | Graph Neural Networks, Graphical Domains, Recursive Neural Networks, Deep Neural Networks
8 | Marco Gori | 891 | 11 | University of Siena | Graphical Domains, Graph Neural Networks (GNNs), Biodegradability, Graph Processing
8 | Wang Xiang | 252 | 33 | National University of Singapore | Graph Neural Network, Recommendation, Collaborative Filtering, Embedding Propagation
8 | Markus Hagenbuchner | 732 | 14 | University of Wollongong | Graphical Domains, Recursive Neural Networks, Vapnik–Chervonenkis Dimension
8 | Ah Chung Tsoi | 732 | 14 | University of Wollongong | Graphical Domains, Recursive Neural Networks, Approximation Theory
7 | Jian Tang | 165 | 21 | Syracuse University | Representation Learning, Network Embedding, Graph Neural Networks, Graph Convolutional Network, Graph Attention
6 | Sanja Fidler | 204 | 7 | University of Toronto | Deep Learning, Grouping and Shape, Segmentation
6 | Gabriele Monfardini | 830 | 6 | Università degli Studi di Siena | Graphical Domains, Graph Neural Networks (GNNs), Graph Processing, Relational Neural Networks
5 | Jure Leskovec | 336 | 7 | Stanford University | Graph Neural Networks, Knowledge-Aware Recommendation, Label Propagation
5 | Raquel Urtasun | 142 | 8 | University of Toronto | Graph Neural Networks, Inference, Message-Passing, Probabilistic Graphical Models
5 | Ivan Titov | 356 | 5 | Universiteit van Amsterdam | –
5 | Michael Bronstein | 767 | 8 | University of Lugano | Graph Convolutional Neural Networks, Geometric Deep Learning, Graph Neural Networks, Recommender Systems
3.3 Scientific collaboration
Studying the co-authorship patterns shows that each paper has 5.73 authors on average. Figure 3 shows the distribution of the number of authors and the average number of citations per paper. Sügis, Dauvillier (88) authored the most collaborative paper, with 18 authors. The number of citations increases from single-author to double-author papers, after which we observe a decrease in the average citations per paper, except for five-author papers. Collaborative papers (written by two or more authors) have been cited more than single-author papers on average (5.75 vs 3.82).
4.3 Top countries and institutions
A total of fifty-one countries contributed to GNN documents. China (with 593 documents) published the greatest number of documents in this field, followed by the United States (with 377 documents) and Canada (with 82 documents). Figure 4 shows the collaboration map of the most prolific countries. The size of the nodes in the graph indicates the number of documents published by the respective country, the edges show co-authorship, and the node colors indicate clusters. Clustering has been done with the VOS algorithm (89) based on collaborations. There are six clusters, which can be explained partly by geographical distribution. Cluster one includes European countries, such as the Netherlands, Belgium, and Spain. Cluster two has more diversity, with three Asian countries plus the United States and Germany. Cluster three consists of East Asian countries, such as China, Japan, and South Korea. Cluster four consists of an East Asian region (Hong Kong), Australia, and Italy, which transfers knowledge between East Asia and Europe. It is also clear that the United States acts as a bridge between China and European countries. Geographical patterns are less clear in cluster five, which consists of one Southeast Asian country (Singapore), one Middle Eastern country (Israel), and one Northwestern European country (Switzerland). Cluster six consists of one Northwestern European country (Ireland) and Canada.
Table 3 shows the top ten prolific and impactful institutions in this field. Chinese institutions have published many documents, as expected given China's population and scientific output. There is just one Italian institution among the top ten most prolific institutions. On the other hand, the top-cited list is dominated by European and American institutions. The University of Amsterdam is the most cited institution; it hosts some highly influential researchers, such as Thomas Kipf, Max Welling, and Ivan Titov, who have contributed to important models and published highly cited papers. Interestingly, Facebook Research appears among the most influential institutions.
Table 3

Most prolific
Institution | Country | Docs
Chinese Academy of Sciences | China | 87
University of Chinese Academy of Sciences | China | 67
Tsinghua University | China | 43
Peking University | China | 41
Beijing University of Posts and Telecommunications | China | 31
Institute of Automation Chinese Academy of Sciences | China | 29
Beihang University | China | 29
Tencent | China | 28
Shanghai Jiao Tong University | China | 23
Università degli Studi di Siena | Italy | 23

Most impactful
Institution | Country | Citations
University of Amsterdam | Netherlands | 975
University of Siena | Italy | 944
Canadian Institute for Advanced Research | Canada | 767
University of Wollongong | Australia | 714
Hong Kong Baptist University | Hong Kong | 651
New York University | United States | 492
Università della Svizzera Italiana | Italy | 459
Facebook Research | United States | 435
Swiss Federal Institute of Technology in Zurich | Switzerland | 435
Université catholique de Louvain | Belgium | 435
5.3 Top publication sources
Table 4 shows the top ten publication sources that have published the greatest number of documents in this field. To evaluate these sources, we also included their SJR and impact factor in the table. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) has published ~10.94% of the papers. Proceedings of the IEEE International Conference on Computer Vision is the most impactful source in this table. Six of these sources are conferences, three are journals, and one is a book series. Together, these 10 sources published around 27% of the documents in this field. To show the focus areas of these sources, we provide their most-used keywords in the fifth column. In the following, we briefly review the most impactful papers of these sources based on the most frequent keywords.
Table 4

Sources | Docs | SJR 2019 | IF 2019 | Most used keywords
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 140 | 0.42 | – | Graph convolutional network, Representation learning, Knowledge graph
IEEE Access | 43 | 0.77 | 3.74 | Graph neural network, Deep learning, Link prediction
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | 32 | 13.39 | 10.25 | Categorization, Deep learning, Knowledge graph
International Conference on Information and Knowledge Management Proceedings | 25 | 0.51 | – | Recommender system, Heterogeneous graph, Link prediction
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | 25 | 1.004 | – | LSTM, Graph embedding, Heterogeneous graph
Proceedings of the IEEE International Conference on Computer Vision | 22 | 13.63 | – | –
Neurocomputing | 19 | 1.17 | 4.43 | Relational learning, Attention mechanism, LSTM
ACM International Conference Proceeding Series | 14 | 0.2 | – | Auto-encoder, Skeleton-based, Action recognition
Knowledge Based Systems | 13 | 1.75 | 5.92 | Aspect-level, Recommender system, Graph neural network
Communications in Computer and Information Science | 12 | 0.188 | – | Attention mechanism, Network embedding, Graph convolutional network
Graph convolutional network (GCN) (80), as a special kind of GNN, bridges the gap between spatial and spectral methods. The GCN propagation rule is equivalent to aggregating each node representation with the representations of its direct neighbors (90). Node representations can thus be enriched by second-order neighbors' representations simply by adding another GCN layer. Yang, Lu (52) proposed Graph R-CNN for scene graph generation; they used an attentional GCN to integrate contextual information from neighboring objects in the scene. GCN has also been used to model the spatial and semantic connections between objects for image captioning: Yao, Pan (48) leveraged this idea and proposed GCN-LSTM, which uses an LSTM with an attention mechanism for sentence generation. Interestingly, GCN has found its way into bioscience, too: Gievska and Madjarov (91) leveraged a modified version of GCN to predict protein functions from their structure, which is essentially modeled as a graph.
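A minimal sketch of this propagation rule, using the symmetrically normalized aggregation popularized by Kipf and Welling (80); the toy graph, feature dimensions, and random weight matrices are hypothetical, and a trained model would learn the weights:

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One GCN layer: combine each node's features with those of its direct
    neighbors via the symmetrically normalized adjacency, then apply ReLU."""
    a_tilde = adj + np.eye(adj.shape[0])                # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_tilde.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_tilde @ d_inv_sqrt          # D^-1/2 (A + I) D^-1/2
    return np.maximum(a_norm @ features @ weights, 0)   # ReLU non-linearity

# Toy 4-node path graph with random node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 8))
h1 = gcn_layer(adj, h0, rng.normal(size=(8, 16)))
# Stacking a second layer enriches each node with second-order neighbors.
h2 = gcn_layer(adj, h1, rng.normal(size=(16, 2)))
print(h2.shape)  # (4, 2)
```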
Representation learning, which in this context mainly means graph representation learning, aims to learn a low-dimensional vector representation of nodes (or edges) for graph data mining (32). The learned vectors can be used in different downstream tasks such as node classification (75, 92), link prediction (92), and community discovery (93). DeepWalk (94) and node2vec (95) are examples of node representation learning. Sun, Man (96) proposed a representation learning method that incorporates entity description embeddings (built with Doc2Vec (97)) into translation-based models for medical knowledge graph representation learning.
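A minimal sketch of the random-walk idea behind DeepWalk (94) and node2vec (95): truncated random walks are generated from each node and treated as "sentences" for a skip-gram model, which then produces the node vectors. The toy adjacency list and walk parameters are hypothetical, and the skip-gram training step is omitted:

```python
import random

def random_walks(neighbors, walk_length=5, walks_per_node=10, seed=0):
    """Generate truncated random walks; each walk is a 'sentence' of node ids
    that a skip-gram model (as in DeepWalk/node2vec) would embed."""
    rng = random.Random(seed)
    walks = []
    for start in neighbors:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = neighbors[walk[-1]]
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Hypothetical toy graph given as an adjacency list.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walks(graph, walk_length=4, walks_per_node=2)[:3])
```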
Another frequent keyword is knowledge graph. Formally, a knowledge graph is a collection of triplets (h, r, t), where h and t are head and tail entities and r represents the relation from h to t (98). The most popular knowledge graphs are WordNet, NELL, DBpedia, Freebase, Google's Knowledge Graph, and YAGO, which have empowered different natural language processing tasks including relation extraction, named entity recognition, and question answering (99). Knowledge graphs have been used as datasets for testing models (100). They have also been exploited to mine the relationships between classes for zero-shot recognition (101).
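To make the triplet notation concrete, translation-based embedding models of the kind mentioned above (TransE is a standard example, named here only for illustration) score a triplet (h, r, t) by how well the head embedding plus the relation embedding approximates the tail embedding. A minimal sketch with random, untrained embeddings and hypothetical entity names:

```python
import numpy as np

def translation_score(h, r, t):
    """Translation-based plausibility score: a true triplet (h, r, t) should
    satisfy h + r ≈ t, so a smaller distance means a more plausible fact."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(1)
emb = {name: rng.normal(size=50) for name in ["Paris", "France", "capital_of"]}
print(translation_score(emb["Paris"], emb["capital_of"], emb["France"]))
```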
Graph neural networks have also been used in the context of the Internet of Things (IoT). Zhang, Zhang (102) represented an IoT system as a complete graph and proposed a GNN-based modeling approach for IoT (GNNM-IoT), which models the relationships between sensors with a GNN. Yin, Li (103) modeled interaction data in recommender systems with a bipartite user–item graph and used message-passing layers to improve the latent factors of users and items.
Deep learning, the broader field that encompasses GNNs, has also been used as a keyword by authors. Shi, Zhang (51) proposed a GCN-based model for skeleton-based action recognition that consists of two types of graphs: one represents the pattern common to all the data, and the other represents the unique pattern of each type of data; the structures of these graphs are trained together with the convolutional parameters. Chen, Wei (53) proposed Multi-Label image recognition with Graph Convolutional Networks (ML-GCN) for multi-label image recognition. The idea is to model label dependencies based on object co-occurrences in the images and to use a GCN to map the label graph into inter-dependent object classifiers.
Link prediction is the task of inferring likely upcoming interactions between nodes, given the existing graph (92). GNNs have been a popular tool for link prediction. Tan, Zhao (104) proposed the Combination-based knowledge Embedding model (CombinE), a knowledge graph embedding method that jointly minimizes the norm of the difference between entities' plus/minus combinations and the relation. Jing, Wang (32) introduced the Variable Heat Kernel Representation (VHKRep) for graph representation learning, which captures implicit global features with a heat diffusion kernel; they showed the effectiveness of their method on link prediction and node classification. Since knowledge graphs are incomplete, a line of research has focused on learning knowledge graph representations (defined earlier in this section).
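Once node embeddings have been learned by any of the methods above, a common baseline for link prediction is to score a candidate edge by the sigmoid of the inner product of its endpoint embeddings. A minimal sketch with hypothetical, untrained embeddings:

```python
import numpy as np

def link_score(z_u, z_v):
    """Probability-like score for a candidate edge (u, v) from node embeddings."""
    return 1.0 / (1.0 + np.exp(-np.dot(z_u, z_v)))  # sigmoid of inner product

rng = np.random.default_rng(2)
z = rng.normal(size=(5, 16))            # embeddings of 5 nodes (untrained)
candidates = [(0, 1), (0, 4), (2, 3)]
ranked = sorted(candidates, key=lambda e: link_score(z[e[0]], z[e[1]]), reverse=True)
print(ranked)  # candidate edges ranked from most to least likely
```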
LSTM (Long Short-Term Memory), a powerful sequence modeling method, has been used as a rival to GNNs in the flood prediction task (105). Lu, Lv (106) combined a GCN within the LSTM cell, calling it Graph LSTM (GLSTM), to handle graph sequences rather than sequential vectors for the road speed prediction task.
Graph embedding and network embedding, which have been used interchangeably, are also among the most frequent keywords. Graph embedding is an effective method to represent graph data in a low-dimensional space for graph analytics (107). Hou, Chen (108) proposed a model named Property Graph Embedding (PGE), which incorporates node and edge properties into the graph embedding procedure. Zhang, Song (109) proposed the Heterogeneous Graph Neural Network (HetGNN), based on the idea of leveraging heterogeneous structural and heterogeneous content information simultaneously.
Relational learning refers to the learning paradigm in which there may be relationships between examples, or the examples may have an internal structure (110). Interestingly, Trentin and Di Iorio (111) modeled graph classification as Bayesian maximum-a-posteriori estimation; specifically, they calculated the class probability of a graph by multiplying the class prior probability by the conditional probability of the graph relations.
The attention mechanism enables a model to focus on the most relevant parts of the input (112). In the graph context, attention is defined as a function that assigns a relevance score in [0, 1] to each of a node's neighbors; this score specifies the amount of attention the model gives to a particular neighbor (113). Xie, Chen (114) proposed Attention-based Graph Convolution Networks (AGCN) for point cloud learning. They modeled the learning as a message propagation algorithm among adjacent points. The model has three parts: local structural feature learning, a point attention layer, and the global point network, with the attention mechanism modeling the relationships among k adjacent points.
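A minimal sketch of attention-weighted neighbor aggregation in this spirit: a scoring function assigns a relevance score to each neighbor, the scores are normalized with a softmax so they lie in [0, 1] and sum to one, and the neighbors' features are averaged with these weights. The scoring vector and toy features below are hypothetical and untrained:

```python
import numpy as np

def attend(node_feat, neighbor_feats, a):
    """Aggregate neighbor features with attention weights in [0, 1].
    `a` is a learnable scoring vector applied to concatenated feature pairs."""
    scores = np.array([a @ np.concatenate([node_feat, nb]) for nb in neighbor_feats])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax: weights sum to 1
    return weights, weights @ neighbor_feats    # weighted average of neighbors

rng = np.random.default_rng(3)
node = rng.normal(size=8)
neighbors = rng.normal(size=(4, 8))             # 4 neighbors with 8-dim features
a = rng.normal(size=16)                         # untrained attention parameters
alphas, aggregated = attend(node, neighbors, a)
print(alphas.round(3), aggregated.shape)
```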
Graph auto-encoders map graphs to low-dimensional vectors (115). Liu and Sabbata (12) utilized variational graph auto-encoders (116) to predict tweet geolocations; the model predicts links between an unknown tweet and existing tweets. Wang, Xu (115) proposed a training strategy to improve the training performance of graph auto-encoders: they injected noise into the adjacency matrix and used the noisy matrix to replace both the input and the output.
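A minimal sketch of the decoding side of a (non-variational) graph auto-encoder: an encoder (e.g., a GCN) produces node embeddings Z, the adjacency matrix is reconstructed as the element-wise sigmoid of Z Zᵀ, and training minimizes the reconstruction error. The toy graph is hypothetical, and Z is left untrained here:

```python
import numpy as np

def decode(z):
    """Reconstruct edge probabilities as sigmoid(Z Z^T)."""
    logits = z @ z.T
    return 1.0 / (1.0 + np.exp(-logits))

def reconstruction_loss(adj, adj_hat, eps=1e-9):
    """Binary cross-entropy between the true and reconstructed adjacency."""
    return -np.mean(adj * np.log(adj_hat + eps) + (1 - adj) * np.log(1 - adj_hat + eps))

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(4)
z = rng.normal(size=(4, 16))   # untrained embeddings; an encoder would produce these
print(reconstruction_loss(adj, decode(z)))
```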
Skeleton-based and action recognition are two keywords that co-occurred three times. To capture joint dependencies for action recognition, recent methods construct a skeleton graph whose vertices and edges are joints and bones, respectively, and apply a GCN to extract correlated features (20). More recently, Ding, Yang (19) proposed the Semantics-guided Graph Convolutional Network (Sem-GCN); to aggregate information from the L-hop joint neighbors, the architecture utilizes three semantic graph modules: structural graph extraction, actional graph inference, and attention graph iteration. Yang, Ding (18) presented an end-to-end generative GCN to learn the joint graph connections from data; the model uses self-attention to construct the weighted spatial graph of skeleton frames.
Categorization, or classification, is a fundamental task in machine learning that refers to predicting the class of a given sample. Node classification, one of the basic graph analysis tasks, is usually performed to test GNNs (117). Li, Chen (20) proposed the Actional-Structural Graph Convolution Network (AS-GCN), which stacks actional-structural and temporal graph convolutions for action recognition; structural links are specified by the physical structure of the bones, and collaboratively moving joints specify actional links. Kim, Kim (118) proposed the Edge-Labeling Graph Neural Network (EGNN), which utilizes a deep network for edge-labeling few-shot learning. To update the nodes, the model aggregates features from the inter-/intra-class neighbors of each node; after L updates, the edge label can be predicted from the final edge feature.
A recommender system aims to provide personalized product or service recommendations for users in order to manage the growing amount of information (119). In this context, GCN has been used for click-through rate (CTR) prediction (120), session-based recommendation (121), and agent-initiated recommendation (120).
Heterogeneous graph is another frequent keyword; it refers to a graph with more than one type of node or edge (122). Li, Qin (123) constructed a heterogeneous graph composed of six kinds of nodes and eight kinds of edges for cross-domain aspect detection, and proposed the GCN-based Anti-Spam (GAS) model, which uses a heterogeneous graph to capture both the local and global contexts of a comment. Liu, Chen (124) introduced Graph Embeddings for Malicious accounts (GEM) for detecting malicious accounts, which operates on an account–device heterogeneous graph.
Aspect-level sentiment analysis, a subtask of sentiment analysis, aims to discover sentiments about entities, such as a laptop, and their aspects, such as battery life (36, 125). Zhou, Huang (9) utilized a Syntax- and Knowledge-based Graph Convolutional Network (SK-GCN); to enhance the sentence representation with respect to a given aspect, they leveraged the syntactic dependency tree and a commonsense knowledge graph using two GCNs. Zhao, Hou (11) utilized a bidirectional attention mechanism with position encoding to model aspect-specific representations between each aspect and the context words, and then exploited a GCN over these representations to capture the sentiment dependencies between aspects in one sentence.
6.3 Must-read papers
Citation count is considered an effective measure of the impact of a research paper (126–129). In this section, we review the most-cited papers. We also present a list of available review papers along with their suggested future directions and open issues.
Table 5 shows the ten papers with the greatest number of citations. Note that the third paper in this table is a review paper, which is also included in Table 6; since our purpose in this section is to review the main ideas of these highly cited papers, we skip it here. Interestingly, eight out of ten papers are conference papers, which indicates the relative importance of conferences compared to journals in this field.
The most impactful paper is an article published in IEEE Transactions on Neural Networks, which changed its title in 2011; the retitled journal, IEEE Transactions on Neural Networks and Learning Systems, has an impact factor of 2.633 and is a prestigious journal in the field of deep learning. In this paper, Scarselli, Gori (79) proposed an architecture with forward and backward components. In the forward phase, the model computes node states as a function of the target node's features, the features of the target node's neighbors, the previous states, and the features of the edges connected to the target node; it stops when the difference between two consecutive states is less than or equal to a threshold. In the backward phase, the model computes the gradient of a quadratic loss with respect to the model parameters. In effect, they extended the framework of Gori, Monfardini (78) by conditioning the message-passing updates on initial edge features.
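A minimal sketch of the forward phase described above: node states are updated repeatedly from neighbor states and node features until consecutive states differ by less than a threshold. The transition function below is a simple contractive map chosen only for illustration (it is not the parameterization of (79) and omits edge features), and the toy graph is hypothetical:

```python
import numpy as np

def gnn_forward(adj, feats, w, threshold=1e-5, max_iters=200):
    """Iterate node states from neighbor states and node features until two
    consecutive state matrices differ by no more than the threshold."""
    n, d = feats.shape
    states = np.zeros((n, d))
    for _ in range(max_iters):
        # Damped average of neighbor states plus a transformation of own features.
        deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
        new_states = np.tanh(0.5 * (adj @ states) / deg + feats @ w)
        if np.abs(new_states - states).max() <= threshold:
            return new_states
        states = new_states
    return states

adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
rng = np.random.default_rng(5)
feats = rng.normal(size=(3, 4))
print(gnn_forward(adj, feats, w=0.1 * rng.normal(size=(4, 4))).round(3))
```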
The second paper is the famous GCN paper of Kipf and Welling (80), a revolutionary paper in this field that bridges the spectral and spatial approaches. Basically, in each layer the model combines each node's representation with those of its direct neighbors.
Monti et al. [26] proposed Mixture Model Networks (MoNet), a model that extends CNNs to graphs and manifolds. The model associates each neighbor y of a point x with a d-dimensional pseudo-coordinate vector u(x, y); it then applies a set of Gaussian kernels with learnable parameters to these coordinates, instead of using fixed kernels. Yan, Xiong (82) presented Spatial-Temporal Graph Convolutional Networks (ST-GCN) for action recognition, which learn both the spatial and temporal patterns using a GNN; the spatio-temporal graph is constructed from intra-body edges between joints, based on their natural connections, and inter-frame edges that connect the same joints in neighboring frames. Li, Tarlow (35) introduced Gated Graph Neural Networks (GG-NNs); this model, a modification of the graph neural network (the first paper in Table 5 (79)), uses gated recurrent units (GRUs) to generate sequences. Gori, Monfardini (78) proposed a neural network model that acts directly on graphs; the system computes the state of node n (xn) as a function of its features along with the features and states of its neighbors. Schlichtkrull, Kipf (83) proposed Relational Graph Convolutional Networks (R-GCNs) for multi-graphs and evaluated the model on link prediction and entity classification tasks; the architecture is quite simple: in each layer, the representation of each node is obtained by combining that node with its neighbors in the different graphs. Ying, He (16) developed PinSage to generate node embeddings for web-scale recommendation; the architecture computes the target node's embedding from its previous representation and the representations of its neighbors, which are in turn computed from their own neighbors. In contrast to mainstream GCNs, which are based on powers of the graph Laplacian, PinSage operates by sampling the target node's neighborhood. Finally, Marcheggiani and Titov [42] utilized a GCN over syntactic dependency trees as a sentence encoder for semantic role labeling; their experiments showed that stacking GCN and LSTM layers outperformed the state of the art on CoNLL-2009.
Table 5

Title | Citations | Authors | Year | Document type | Source
The graph neural network model | 578 | Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. | 2009 | Article | IEEE Transactions on Neural Networks
Semi-supervised classification with graph convolutional networks | 577 | Kipf, T.N.; Welling, M. | 2017 | Conference Paper | 5th International Conference on Learning Representations, ICLR 2017
Geometric Deep Learning: Going beyond Euclidean data | 435 | Bronstein, M.M.; Bruna, J.; Lecun, Y.; Szlam, A.; Vandergheynst, P. | 2017 | Review | IEEE Signal Processing Magazine
Geometric deep learning on graphs and manifolds using mixture model CNNs | 214 | Monti, F.; Boscaini, D.; Masci, J.; Rodolà, E.; Svoboda, J.; Bronstein, M.M. | 2017 | Conference Paper | 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Spatial temporal graph convolutional networks for skeleton-based action recognition | 166 | Yan, S.; Xiong, Y.; Lin, D. | 2018 | Conference Paper | 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Gated graph sequence neural networks | 161 | Li, Y.; Zemel, R.; Brockschmidt, M.; Tarlow, D. | 2016 | Conference Paper | 4th International Conference on Learning Representations, ICLR 2016
A new model for learning in graph domains | 159 | Gori, M.; Monfardini, G.; Scarselli, F. | 2005 | Conference Paper | International Joint Conference on Neural Networks, IJCNN 2005
Modeling Relational Data with Graph Convolutional Networks | 147 | Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. | 2018 | Conference Paper | 15th International Conference on Extended Semantic Web Conference, ESWC 2018
Graph convolutional neural networks for web-scale recommender systems | 136 | Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. | 2018 | Conference Paper | 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
Encoding sentences with graph convolutional networks for semantic role labeling | 107 | Marcheggiani, D.; Titov, I. | 2017 | Conference Paper | 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017
Review papers are good starting points for those who want to work in a field and identify research gaps. Table 6 lists the reviews conducted in the field of GNN. Bronstein, Bruna (81) is the most cited review paper; the other seven review papers were all published in 2019 or 2020.
Table 6

Title | Year | Citations | Suggested future directions and open problems
Geometric Deep Learning: Going beyond Euclidean data (77) | 2017 | 435 | 1. Generalization across different domains. 2. Dealing with signals over dynamic structures. 3. Coping with directed graphs. 4. Learning generative models. 5. Developing efficient computational paradigms.
An Overview of Unsupervised Deep Feature Representation for Text Categorization (126) | 2019 | 5 | 1. Exploring more efficient unsupervised deep learning models.
A gentle introduction to deep learning for graphs (127) | 2020 | 3 | 1. Formalizing various adaptive graph processing techniques under a unified framework. 2. Defining a set of benchmarks in order to assess proposed models. 3. Transferring research knowledge to other application fields.
Graph convolutional networks for computational drug development and discovery (128) | 2020 | 2 | 1. Extending GCNs to 3D structures (such as molecular compounds). 2. Exploring motif-based GCN and its application to drug discovery. 3. Defining convolution on hypergraphs (for example, drugs with the same ADRs, targets, or indications).
Introduction to Graph Neural Networks (129) | 2020 | 0 | 1. Deepening GNNs with regard to the over-smoothing problem. 2. Dealing with dynamic networks. 3. Generating optimal graphs for non-structured data. 4. Applying embedding models at web scale.
Learning Combinatorial Optimization on Graphs: A Survey with Applications to Networking (130) | 2020 | 0 | 1. Improving scalability, adaptability, generalization, and run time of GNNs. 2. Automating the above improvements without re-training. 3. Using distributed machine learning.
Application of deep learning in ecological resource research: Theories, methods, and challenges (131) | 2020 | 0 | 1. Standardizing and sharing data for ecological resource research. 2. Increasing the ability to explain hidden layers. 3. Applying more advanced deep learning methods to ecological resource research.
7.3 Keyword analysis
7.3.1 The most used keywords
We analyze the most frequently used keywords in two time spans, 2004–2017 and 2017–2020. The word clouds of these periods are illustrated in Fig. 5; word clouds are used to visually summarize texts (130), and the size of a keyword indicates its frequency in the respective time period. Graph neural network, relational learning, structured pattern recognition, recurrent network, feedforward network, neuroscience, random process, recursive neural network, graph structured data, semigraph, wavelet transform, and graphical domain are the most used keywords in the first period. In the second period, graph convolutional network, graph neural network, deep learning, geometric deep learning, representation learning, machine learning, graph convolution, network embedding, convolutional neural network, knowledge graph, neural network, and action recognition are used most frequently. As is evident from this figure, graph neural network and graph convolutional network are the most frequent keywords in the first and second periods, respectively. This is because, as mentioned previously, early GNNs were more similar to RNNs, with different states in different steps, while recent models are more convolution-based. Also, some topics, such as recursive neural network, have lost their positions over time, which demonstrates the change of approach to deep learning on graphs. Some technical topics have emerged or grown in recent years, including representation learning, graph attention, graph autoencoder, variational autoencoder, spectral graph theory, message-passing, graph isomorphism test, label propagation, and balance theory. While early GNNs were based on message-passing too, the message-passing keyword in more recent papers refers to recent advances such as (80, 131). GNNs have been applied to different applications, including action recognition, semantic segmentation, anomaly detection, drug discovery, sentiment analysis, session-based recommendation, video analytics, scene graph generation, social recommendation, image captioning, human pose estimation, traffic forecasting, visual question answering, traffic speed prediction, name disambiguation, hyperspectral image classification, and knowledge graph completion.
7.3.2 Hot topics
Table 7 shows the ten keywords with the highest average publication year, to reveal topics that have received the most attention recently. To remove possible noise, we only include keywords that appear in more than two papers.
Table 7
Ten keywords with the highest average publication year (frequency > 2)

Keyword | Number of papers | Average publication year | Type
BERT | 3 | 2020 | Model
Dynamic network | 3 | 2020 | Network
Graph attention network | 6 | 2020 | Model
Relation extraction | 5 | 2019.8 | Task
Attention mechanism | 21 | 2019.8 | Model
Human pose estimation | 4 | 2019.8 | Task
Self-supervised learning | 4 | 2019.8 | Learning approach
Semisupervised learning | 4 | 2019.8 | Learning approach
Traffic prediction | 4 | 2019.8 | Task
Adversarial learning | 3 | 2019.6 | Learning approach
The first keyword is BERT, a successful language-understanding model based on the transformer (132), which has recently been used in this field for token representation. Jeong, Jang (43) used BERT as the encoder of context sentences and a GCN as the citation context encoder for context-aware paper recommendation.
Dynamic network refers to a sequence of graph snapshots over time. Mahdavi, Khoshraftar (133) proposed Dynamic joint Variational Graph Auto-Encoders (Dyn-VGAE), which consist of auto-encoders that embed graph snapshots based on their local structures and interact with each other to learn the temporal dependencies of the graphs.
A graph attention network assigns different weights when aggregating different neighbors (112), yielding a weighted average of the neighbors' features; in this way, the model can mitigate the effect of cross-class links. Zhao, Jia (98) used different aggregation functions for representing Out-Of-Knowledge-Graph (OOKG) entities and relations; they leveraged average pooling, max pooling, and attention as the aggregation functions.
Relation extraction is an NLP task that aims to identify relational facts in a piece of text. Xie, Xu (134) proposed a GNN with a propagation rule similar to GCN (80) over a heterogeneous graph composed of sentence and entity nodes for few-shot relation classification.
The attention mechanism, introduced previously, is also among the new topics. You, Tian (135) proposed the Sliced recurrent neural network and Attention treated GCN-based Parallel (SAGP) model for remote sensing image recognition, which is composed of two sub-modules: an improved Sliced Recurrent Neural Network (SRNN), which retains the semantic information of the context and the original image features, and a GCN, which mines high-weight features (obtained by the attention mechanism) and preserves the relationships between features.
The goal of human pose estimation is to identify the poses of human body parts in images or videos (136). Wang, Huang (137) proposed Global Relation Reasoning Graph Convolutional Networks (GRR-GCN) to model the global dependencies of body joints; the model projects the coordinate-space features onto a fully connected graph, over which global relation reasoning is performed by a GCN.
Bin, Chen (138) proposed a model that first feeds images to CNNs to obtain key-point representations. The model then uses two parallel multi-layer Pose Graph Convolutional Network (PGCN) modules, which capture the feature correlations between key points locally and non-locally based on a directed graph over the obtained key-point representations.
Self-supervised learning is an emerging and effective learning strategy that creates a supervised task from unlabeled data; for instance, a model can learn to predict half of an image given the other half (139, 140). Shen, Shen (141) used self-supervised learning to generate data for taxonomy expansion; they suggested TaxoExpan, a GCN-based neural network that learns to predict whether a query concept is the hyponym of an anchor concept. Bo, Wang (142) proposed the Structural Deep Clustering Network (SDCN), which uses a delivery operator to combine the representations of auto-encoders and GCN layers; in effect, it leverages a dual self-supervised strategy to unify these deep learning models. Semisupervised learning is an approach to machine learning in which only a small subset of the training samples are labeled (80); the goal is to infer the labels of the unlabeled samples from the information contained in the feature vectors and the labeled samples (143). Qin, Shang (50) proposed Spectral–Spatial Graph Convolutional Networks (S2GCNs) for Hyperspectral Image Classification (HIC), a semisupervised GCN-based model that utilizes spatial (pixel adjacency) and spectral information.
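A minimal sketch of the semi-supervised node classification setting described above: the loss is computed only on the small labeled subset of nodes, while the unlabeled nodes influence the predictions through the graph-based model that produced them. The predictions, labels, and mask below are hypothetical:

```python
import numpy as np

def masked_cross_entropy(probs, labels, labeled_mask, eps=1e-9):
    """Cross-entropy computed only on labeled nodes; unlabeled nodes contribute
    indirectly through the graph-based model that produced `probs`."""
    picked = probs[labeled_mask, labels[labeled_mask]]
    return -np.mean(np.log(picked + eps))

rng = np.random.default_rng(6)
logits = rng.normal(size=(6, 3))                         # e.g. output of a GCN over 6 nodes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.array([0, 2, 1, 0, 1, 2])                    # true classes (mostly unknown in practice)
labeled_mask = np.array([True, True, False, False, False, False])  # only 2 of 6 nodes labeled
print(masked_cross_entropy(probs, labels, labeled_mask))
```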
Traffic prediction is the task of forecasting real-time traffic based on floating-car and historical data, including flow, average speed, and incidents (144). Zhao, Gao (145) proposed SpatioTemporal Data Fusion (STDF) for traffic prediction, which separates the data into directly and indirectly traffic-related data; the model leverages a GCN for processing the directly related data.
Adversarial learning, the final hot topic, is a learning technique that tries to fool algorithms by presenting deceptive inputs to them. Hong, Kim (146) proposed an architecture based on a Generative Adversarial Network (GAN) to predict missing longitudinal diffusion MRI data; they leveraged graph convolutions in both the generator and the discriminator of the network.
[4] The list of reviews is provided in Table 6.
[5] Titles of the books and book chapters are provided in APPENDIX 1.
[6] Note that the data gathering was done around the middle of 2020.
[7] Note that a journal can be assigned to more than one category in Scopus.
[8] Hirsch (2005) proposed the h-index. A researcher's h-index is h if h of his or her papers have at least h citations each and the remaining papers have at most h citations each.
[9] Based on his Google Scholar profile.
[10] Twenty-four countries with more than five documents have been included in the map.
[11] The list of the next fifty most-cited papers is provided in APPENDIX 4.
[12] http://rank.sid.ir/cloud
[13] List of keywords is provided in APPENDIX 2.
[14] For the continuation of the list of keywords with the highest average publication year, refer to APPENDIX 3.