End-to-end Relation-Enhanced Learnable Graph Self-attention Network for Knowledge Graphs Embedding

Abstract: Existing methods ignore the adverse effect of knowledge graph incompleteness on knowledge graph embedding. In addition, the complexity and large scale of knowledge information hinder the embedding performance of the classic graph convolutional network. In this paper, we analyze the structural characteristics of knowledge graphs and the imbalance of knowledge information. Complex knowledge information requires a model with better learnability rather than linearly weighted qualitative constraints, so we propose an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. First, we construct a relation-enhanced adjacency matrix to account for the incompleteness of the knowledge graph. Second, a graph self-attention network is employed to obtain the global encoding and relevance ranking of entity node information. Third, we propose the concept of the convolutional knowledge subgraph, which is constructed according to the entity relevance ranking. Finally, we improve the training effect of the ConvKB model by changing the construction of negative samples to obtain a better reliability score in the decoder. Experimental results on the FB15k-237 and WN18RR data sets show that the proposed method yields a more comprehensive representation of knowledge information than existing methods in terms of Hits@10 and MRR.


Introduction
The knowledge graph plays an indispensable role in the semantic network, and can benefit many downstream applications, such as personalized recommendation [1] and question answering [2]. Knowledge information is represented by a network structure consisting of triples, in which nodes refer to entity information and connecting edges represent the multi-relational information among entities [3,4], as shown in Fig.1. However, the symbolic nature of the graph structure cannot be processed directly by machine learning, which leads to insufficient utilization and analysis of knowledge base information. Therefore, knowledge graph embedding plays an important role in the analysis of knowledge information: it aims to convert the knowledge information of the graph structure into a low-dimensional dense vector representation.
The knowledge graph is a semantic network graph describing various entities (or concepts) and relations in the real world. Each node in the graph represents an entity, and entities are connected by various relation information to form different pieces of knowledge. We analyzed knowledge information against general graph-structured data, and there are obvious differences between the two. The main points are as follows: • Different knowledge information assigned by triples has different effects on entities. For example, the entity "Yao Ming" appears in different triples: (Yao Ming, profession, basketball player) and (Yao Ming, wife, Ye Li). The knowledge about "Yao Ming" represents the global information related to it, but each triple has a different impact on "Yao Ming"; the one involving "basketball player" has a greater impact.
• The knowledge graph connects diverse knowledge information through a complex relational network. Compared with the simple connecting edges of general graph-structured data, the degree of correlation between entities is closely related to the relation category. Moreover, even the same relation category has different effects on different entity information. For example, the "father-son" relation exerts different influences on the entities "Qianlong" and "Yao Ming".
• The way the knowledge graph is constructed makes knowledge information inherently incomplete, and this incompleteness has a strongly adverse influence on knowledge graph embedding.
To sum up, the complexity of such graphs means that the classical graph convolutional network cannot handle knowledge information well.
In view of the above analysis, this paper proposes an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. First, we construct the relation-enhanced adjacency matrix. Second, the correlation between entity nodes is calculated by the graph self-attention network, and the global encoding of the central entity node is obtained by weighted summation. Then, the T neighboring nodes with the greatest correlation are selected as the knowledge subgraph, and a multi-layer convolution operation is embedded in the graph convolution process to obtain the entity encoding representation. Furthermore, the relational encoding representation is obtained by modeling the relational information, and the entity encoding and the relational encoding are combined to form the triple knowledge encoding. Finally, we improve the training effect of the ConvKB model by changing the construction of negative samples to obtain a better credibility score in the decoder.
The main contributions of this paper are as follows:
1. We construct a relation-enhanced learnable graph self-attention network, which better matches the complexity and diversity of knowledge information.
2. Based on the relevance ranking of entities, we put forward the concept of convolutional knowledge subgraphs, adding multi-layer convolutional networks into the aggregation process of graph convolutional networks to increase the learnability of the model.
3. We improve the training effect of the ConvKB model by changing the construction of negative samples to obtain a better credibility score in the decoder.

Related Work
Existing knowledge graph embedding methods are mainly based on the idea that a relation is a transformation between entities. Bordes et al. [5] proposed the TransE model based on the idea that the head entity can reach the tail entity through a translation by the relation information, learning embedding matrices for entities and relations respectively. In subsequent studies, researchers proposed TransH [6], TransR [7], TransD [8], KG2E [9] and TranSparse [10] by extending the embedding spaces of entities and relations. Ebisu et al. changed the embedding space to a sphere to obtain a more comprehensive entity and relation embedding [11]. Sun et al. defined the relation as a rotation from the head entity to the tail entity, which increased the complexity of the feature space of relational translation and obtained knowledge graph embeddings in complex space [12]. However, these methods only model the characteristics of entities and relations by constructing complex feature spaces, without considering the global structural information and incompleteness of the knowledge graph.
We focus more on recent research results on knowledge graph embedding. Researchers have used neural networks to capture more feature interactions between embeddings, proposing the ConvE model [13], the ConvKB model [14] and the InteractE model [15], which train the knowledge embedding matrix through a credibility score function and thus improve expressiveness. Trouillon et al. handled various binary relations through combinations of complex-valued embeddings [16]. Cai et al. used a generative adversarial network to improve the quality of negative samples and thereby the learning effect of the model [17]. Recently, graph convolutional networks, which have clear advantages on graph-structured data, have been widely used in knowledge graph embedding [18,19,20,21]. Shang et al. proposed to extend the ConvE model with a graph convolutional network, maintaining the transformation characteristics between entities and relations [22]. Many recent methods are devoted to preserving the symmetry and antisymmetry properties of relations to improve the expressiveness of embeddings [23,24,25,26]. Chen et al. and Zhang et al. combined the bidirectional influence between entities and relations by considering relation edge information [27,28]. Hamilton et al. embedded the logical query information of knowledge into knowledge graph embeddings by performing logical operations in a low-dimensional embedding space [29]. However, the above methods do not take into account that, compared with general graph-structured information, knowledge information is imbalanced, so the classic graph convolutional network cannot process it well. In addition, the complexity of knowledge requires models with better learnability rather than linearly weighted qualitative constraints.

Our Approach
In this paper, we propose an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. The method consists of two modules: a knowledge encoder built on the relation-enhanced learnable graph self-attention network, and a knowledge decoder that reconstructs the negative samples.

Knowledge Encoder of the Relation-Enhanced Learnable Graph Self-attention Network
The knowledge graph is a semantic network graph describing various entities (or concepts) and relations in the real world. Each node in the graph represents an entity, and entities are connected by various relation information to form different pieces of knowledge. We analyzed the difference between knowledge information and general graph-structured data: the interconnection and interaction of knowledge information cause each triple to affect the central entity differently. The complexity of knowledge graphs indicates that classical graph convolutional networks cannot handle knowledge information well. To this end, this paper proposes a knowledge information encoder based on the relation-enhanced learnable graph self-attention network, as shown in Fig.2.
The relation-enhanced learnable graph self-attention network is an extension of the classical GCN model. As Fig.2 shows, the model consists of a multi-head self-attention layer and a knowledge convolutional sublayer. It can assign weights to each entity node according to the influence degree of its knowledge information, which is in line with the unequal character of knowledge information. The construction of convolutional knowledge subgraphs allows multi-level convolution operations to be added to the graph aggregation process. Compared with the classical graph convolution model, the feature extraction ability and learnability of the model are improved. In addition, we construct the relation-enhanced adjacency matrix to address the incompleteness of the knowledge graph. For each entity node in the graph, the entity node representation from the previous layer is used as the input of the current layer. The entity representation matrix of the output is obtained through the relation-enhanced learnable graph self-attention network:

$H^{L+1} = \sigma(A H^{L} W_{conv})$

where $A$ is the relation-enhanced adjacency matrix of the knowledge graph; $H^{L} \in R^{n \times F_L}$ is the entity representation matrix of layer $L$ of the learnable graph self-attention network, with $F_L$ the entity representation dimension; and $W_{conv}$ is the trainable parameter matrix. In order to avoid the adverse effect of the incompleteness of the knowledge graph on knowledge representation, we add indirect relationship attributes between entities to the adjacency matrix.
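The layer update above can be sketched in plain Python; `gcn_layer`, `matmul` and `relu` are illustrative names (the paper's actual implementation details are not specified), and ReLU stands in for the unspecified activation $\sigma$:

```python
def relu(x):
    # element-wise ReLU, used here as a placeholder for the activation sigma
    return [[max(0.0, v) for v in row] for row in x]

def matmul(a, b):
    # naive dense matrix multiplication
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, h, w):
    """One propagation step H^{L+1} = sigma(A · H^L · W_conv)."""
    return relu(matmul(matmul(adj, h), w))
```

With an identity weight matrix, the layer simply aggregates each node's features with its neighbours' features, weighted by the adjacency entries.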
Suppose, for example, that the knowledge graph contains the triples (Kobe, team, Lakers) and (Kobe, teammates, Gasol); then there is an indirect relationship between "Gasol" and "Lakers", which we call a horizontal indirect relationship. The indirect relationship index is calculated from two quantities: the relationship path length $k$ and the horizontal indirect number $p$. When there are multiple paths between two entities, the shortest path is taken. In this paper, we stipulate that there can be only one relationship between an entity pair, and we do not calculate indirect relationship attributes between directly adjacent entities. In this way, we construct the relation-enhanced adjacency matrix $A$. For the relation-enhanced learnable graph self-attention network, as shown in Fig.2, the high-level feature representation of entities in the knowledge graph is extracted through a convolutional neural network:

$h_i = \mathrm{conv}(e_i), \quad i = 1, \ldots, n$

where conv is the convolution operation, $e_i$ is the input feature of the $i$-th entity, and $n$ is the number of entities in the knowledge graph. The CNN produces the input matrix of the entity representation $H^{1}$. In this paper, the feature representation of an entity consists of the relationships and entities directly connected to it. It can be expressed as:

$e_i = [w_{i1}, w_{i2}, \ldots, w_{im}]$

where $w_{ij}$ refers to an entity or relation directly adjacent to entity $e_i$. In this paper, m=50, and <EMP> is used to pad entities with fewer neighbors. The classical graph convolutional network (GCN) assigns the same weight to all adjacent nodes and obtains the representation of the central node by aggregating the feature representations of its neighbors. For the knowledge graph, however, the same relation category exerts different influences on different entity information.
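A minimal sketch of building such a relation-enhanced adjacency matrix follows. Since the paper's exact indirect-relationship formula is not reproduced above, the decay `decay ** (k - 1)` over the shortest path length `k` is only a hypothetical placeholder for it; `relation_enhanced_adjacency` and `shortest_path_lengths` are illustrative names:

```python
from collections import deque

def shortest_path_lengths(n, edges, src):
    """BFS over the undirected entity graph; returns hop counts from src."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def relation_enhanced_adjacency(n, edges, decay=0.5):
    # Direct neighbours keep weight 1; entities reachable only indirectly get
    # a score that decays with the shortest path length k (decay**(k-1) here,
    # a placeholder for the paper's indirect-relationship index).
    A = [[0.0] * n for _ in range(n)]
    direct = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    for i in range(n):
        A[i][i] = 1.0
        dist = shortest_path_lengths(n, edges, i)
        for j, k in dist.items():
            if j == i:
                continue
            A[i][j] = 1.0 if (i, j) in direct else decay ** (k - 1)
    return A
```

On the Kobe example (entities 0=Kobe, 1=Lakers, 2=Gasol with edges Kobe-Lakers and Kobe-Gasol), Gasol and Lakers receive a nonzero indirect weight even though no triple connects them directly.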
Through the graph self-attention mechanism, attention scores between different entities are obtained:

$e_{ij} = \sigma\left(W_a^{T} \left[h_i^{L} \,\|\, h_j^{L}\right]\right)$

where $h_i^{L} \in R^{F_L}$ and $h_j^{L} \in R^{F_L}$ are the feature representations of entities $e_i$ and $e_j$ at layer $L$ of the relation-enhanced learnable graph self-attention network, and $\sigma$ is a single-layer feedforward neural network with LeakyReLU as the activation function (negative input slope α = 0.2). Attention weights between entity nodes are obtained by normalizing the scores calculated by the single-layer feedforward neural network:

$a_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}$

where $W_a \in R^{2F_L}$ is the weight matrix of the feedforward neural network and $\|$ denotes vector concatenation. These weights form a relevance matrix $P \in R^{n \times n}$ with $P_{ij} = a_{ij}$. The entity node representation fusing graph structure information is obtained by aggregating the neighboring entities:

$h_i^{L+1} = \sigma\left(\sum_{j \in N_i} a_{ij} h_j^{L}\right)$

where $N_i$ is the set of adjacent nodes of entity $e_i$. In this paper, we constructed the relation-enhanced adjacency matrix $A$ of the knowledge graph, which reduces the negative effects of knowledge graph incompleteness on knowledge representation. Meanwhile, in view of the large number of entity nodes in the graph, this paper follows Literature [30] and uses the multi-head self-attention mechanism in order to acquire a better and more stable representation of entities. The final encoding representation of an entity node is obtained by concatenating the feature vectors of each attention head:

$h_i^{L+1} = \mathrm{concate}\left(h_i^{(1)}, \ldots, h_i^{(S)}\right)$

where concate is the vector concatenation operation and $S$ is the number of self-attention heads.
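The single-head attention computation above can be sketched as follows; `attention_weights` is an illustrative name, features are plain Python lists, and the learnable vector `w_a` plays the role of $W_a$:

```python
import math

def attention_weights(h, w_a, neighbors, alpha=0.2):
    """Single-head graph self-attention over each node's neighbourhood.
    h: list of feature vectors (length-F lists); w_a: weight vector of
    length 2F; neighbors: dict mapping node i to its neighbour list N_i."""
    def leaky_relu(x):
        return x if x > 0 else alpha * x

    def score(i, j):
        cat = h[i] + h[j]  # list concatenation acts as [h_i || h_j]
        return leaky_relu(sum(w * x for w, x in zip(w_a, cat)))

    out = {}
    for i, nbrs in neighbors.items():
        exp_scores = [math.exp(score(i, j)) for j in nbrs]
        z = sum(exp_scores)
        out[i] = {j: s / z for j, s in zip(nbrs, exp_scores)}  # softmax over N_i
    return out
```

The weights for each node sum to one, so they can be used directly as the aggregation coefficients $a_{ij}$ in the weighted sum over $N_i$.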
In the knowledge graph, the connections of each entity reflect its semantic information. We would like to use convolutional neural networks to extract the connection information of each entity, but because each entity has a different number of connections, it is impossible to directly convolve the entity information in the knowledge graph. We therefore construct a convolutional knowledge subgraph, so that the convolution operation can be performed on the aggregated entity representation. As shown in Fig.4, in order of relevance, the convolutional knowledge subgraph of the entity "Kobe" is ["Lakers", "Gasol", "NBA"]. By performing the convolution operation on the convolutional knowledge subgraph, its feature representation is obtained.
We first extract the $T$ most relevant entities for each entity through the correlation matrix $P$ and construct the convolutional knowledge subgraph:

$H_c = g(P, A, T)$

where $g$ refers to the construction function of the convolutional knowledge subgraph; $P$ is the correlation matrix obtained by the multi-head self-attention mechanism (in this paper, $P$ is the average over the $S$ self-attention heads); $T$ refers to the number of neighboring nodes with the greatest correlation (if an entity has fewer than $T$ neighbors, the empty entity <EMP> is used as padding); and $A$ refers to the relation-enhanced adjacency matrix of the knowledge graph. The final relevance matrix $W$ of adjacent nodes is obtained through element-wise matrix multiplication. The entity representation is then obtained through the convolutional neural network:

$H^{L+1} = \mathrm{conv}(H_c; W_1, b_1)$

where conv is the convolution operation, $W_1$ and $b_1$ are the convolution layer parameters, and $H_c \in R^{n \times T \times m}$ is the knowledge subgraph of the entities. The convolution operation over the convolutional knowledge subgraph yields an entity representation matrix $H^{L+1} \in R^{n \times F_{L+1}}$ as the output. In this process, constructing the convolutional knowledge subgraph realizes the convolution operation within the graph convolution network and reduces the model parameters. At the same time, multi-layer convolution networks can be embedded into the aggregation of entity node information according to the graph structure, which improves the representation ability of the classic graph convolution network.
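The subgraph construction step (the function $g$) amounts to a top-$T$ selection per entity; the sketch below uses the illustrative name `knowledge_subgraph` and encodes <EMP> as the index `-1`:

```python
def knowledge_subgraph(P, A, T, emp=-1):
    """For each entity, pick the T most relevant neighbours by the
    element-wise product P ⊙ A, padding with a dummy <EMP> index (emp)
    when an entity has fewer than T neighbours."""
    n = len(P)
    sub = []
    for i in range(n):
        # score every candidate neighbour that the enhanced adjacency allows
        scored = [(P[i][j] * A[i][j], j)
                  for j in range(n) if j != i and A[i][j] > 0]
        scored.sort(reverse=True)          # most relevant first
        top = [j for _, j in scored[:T]]
        top += [emp] * (T - len(top))      # <EMP> padding
        sub.append(top)
    return sub
```

On the Kobe example, with relevance weights favouring Lakers over Gasol over NBA, the subgraph row for Kobe reproduces the ordering from Fig.4.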
The knowledge graph embedding refers to the representation of entities and relationships.
This paper holds the view that relations are manifested in entity information. For example, given the father-son relation, people readily think of the pairs (Kangxi, Yongzheng) and (Yongzheng, Qianlong). Therefore, the relational embedding matrix is constructed on the idea that entity information can explain and reflect relation information. First, the embedding matrix $H$ of entities is used to look up the entity pairs corresponding to each relation category. In this paper, 20 entities are selected as the information representation text for each relation category (<EMP> is used to pad categories with fewer entities). The representation of each category is obtained by a convolutional neural network:

$r_i = \mathrm{conv}(X_i)$

where conv is the convolution operation and $X_i$ is the information feature representation of the $i$-th relation category. To sum up, this paper proposes a relation-enhanced learnable graph self-attention network model. The algorithm is described in Table 1.

Knowledge Decoder with Improved Negative-Sample Construction
The knowledge graph is a network structure composed of triples, and the decoder aims to define a reliability score function $f$ that makes the scores of positive triples higher than those of negative triples.

The embedding representation of each triple is convolved dimension by dimension:

$u_i = \mathrm{conv}(U_{i,:}; W)$

where $U_{i,:} \in R^{3 \times 1}$ is the $i$-th dimensional representation of the triple, the convolution kernel is $W \in R^{3 \times 1}$, and conv refers to the convolution operation, which is adopted to obtain the $i$-th feature. In order to maintain the conversion properties of the triple, the knowledge representation after convolution is combined by a connection operation instead of reshaping:

$v = [u_1, u_2, \ldots, u_d]$

where $v \in R^{1 \times d}$ is the feature representation of the triple after convolution. Such a connection operation not only extracts the global feature information of the triple but also maintains its transformation characteristics. In order to acquire richer feature information, different convolution kernels are set to achieve a multi-channel convolution operation:

$v_\tau = \mathrm{conv}(U; W_\tau, b_\tau)$

where $W_\tau$ and $b_\tau$ are the convolution layer parameters of the $\tau$-th convolutional channel.
The triple feature representations of all channels are concatenated to obtain the triple representation, where $t$ is the number of convolution channels. The reliability score of the triple is then calculated:

$f(h, r, t) = W_z v + b_z$

where $W_z$ and $b_z$ are the fully connected layer parameters. In this paper, we improve the training effect of the ConvKB model by changing the construction of negative samples to obtain a better reliability score in the decoder. The negative samples are updated according to the reliability score after each iteration, with candidate entities drawn from the entity set. When constructing negative samples, the random replacement method is applied to either the head entity or the tail entity, in order to prevent the simultaneous replacement of head and tail entities, which may still yield a positive sample. For the calculation of the negative sample scores, the same parameters $W_z$ and $b_z$ are employed, without computing the model loss or updating the model parameters. The triple with the highest reliability score among the $q$ constructed negative samples is selected as the negative sample and participates in the next training iteration of the model.
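A minimal sketch of this ConvKB-style scoring in plain Python: `convkb_score` is an illustrative name, each channel is a 1x3 kernel sliding over the d x 3 matrix [h; r; t], feature maps are concatenated rather than reshaped, and a dot product with `w_z` stands in for the fully connected layer (bias omitted for brevity):

```python
def convkb_score(h, r, t, kernels, w_z):
    """ConvKB-style reliability score: each 1x3 kernel slides over the
    d x 3 matrix [h; r; t]; the ReLU feature maps of all channels are
    concatenated and reduced by a dot product with w_z."""
    d = len(h)
    feats = []
    for (a, b, c) in kernels:  # one 1x3 convolution kernel per channel
        feats.extend(max(0.0, a * h[i] + b * r[i] + c * t[i])
                     for i in range(d))
    return sum(u * w for u, w in zip(feats, w_z))
```

With the translation-checking kernel (1, 1, -1), any triple satisfying h + r = t dimension-wise produces all-zero feature maps, illustrating how this decoder preserves the transformation characteristics of triples.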
After each iteration, the negative samples are updated according to the reliability score. To improve calculation efficiency, 100 negative samples are randomly selected for each positive sample, and the triple with the highest reliability score is used as the final negative sample.
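The negative-sample construction described above can be sketched as follows; `make_negatives` and `hardest_negative` are illustrative names, and triples are (head, relation, tail) tuples:

```python
import random

def make_negatives(triple, entities, positives, q=100, rng=None):
    """Corrupt the head OR the tail (never both) to build q candidate
    negatives, skipping corruptions that are themselves known positives."""
    rng = rng or random.Random(0)
    h, r, t = triple
    out = []
    while len(out) < q:
        e = rng.choice(entities)
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand not in positives and cand != triple:
            out.append(cand)
    return out

def hardest_negative(candidates, score_fn):
    # the candidate the decoder currently scores highest is kept for training
    return max(candidates, key=score_fn)
```

Filtering candidates against the set of known positive triples prevents a corruption from accidentally being a true fact, and keeping only the highest-scoring candidate per positive sample gives the decoder harder negatives each iteration.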
The model parameters are trained with the loss function:

$\mathcal{L} = \sum_{(h,r,t) \in \xi \cup \xi'} \log\left(1 + \exp\left(l_{(h,r,t)} \cdot f(h,r,t)\right)\right) + \frac{\lambda}{2}\left\|W\right\|_2^2$

where $\xi$ and $\xi'$ refer to the sets of positive and negative samples respectively, and $l_{(h,r,t)} = -1$ for positive triples and $l_{(h,r,t)} = 1$ for negative triples, so that minimizing the loss drives the reliability scores of positive triples above those of negative triples. The parameter matrix $W$ of the model is regularized by L2. By optimizing this loss function and the embedding matrix of the relation-enhanced learnable graph self-attention network, a higher-quality knowledge embedding can be obtained.
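A small sketch of evaluating this soft-margin loss; `convkb_style_loss` is an illustrative name, and the sign convention (positive triples pushed toward high scores, negatives toward low ones) is the one assumed above:

```python
import math

def softplus(x):
    # numerically stable log(1 + exp(x))
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def convkb_style_loss(pos_scores, neg_scores, weights, lam=0.001):
    """Soft-margin loss in the style of ConvKB, with l = -1 for positives
    and l = +1 for negatives, plus an L2 penalty on the parameter matrix."""
    loss = sum(softplus(-s) for s in pos_scores)   # want f(positive) large
    loss += sum(softplus(s) for s in neg_scores)   # want f(negative) small
    loss += lam / 2 * sum(w * w for row in weights for w in row)
    return loss
```

When positive triples already score higher than negatives, the loss is smaller than in the reversed situation, which is exactly the ordering the decoder is trained to produce.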

Experiment Data
This paper verifies the effectiveness of the proposed approach on two benchmark data sets: WN18RR [13] and FB15k-237 [15], which are subsets of WN18 [5] and FB15k [5] respectively. The data set WN18RR consists of 40,943 entities and 11 relation categories. WN18 and FB15k include many reversible relationships, which make triples easy to predict; in WN18RR and FB15k-237 the influence of reversible relationships is removed, yielding a more authentic setting for knowledge representation and knowledge base completion.

Experimental Evaluation Criteria
In this paper, the experimental results of entity link prediction are used to verify the effectiveness of the proposed method, and the prediction results are obtained according to the reliability score function $f$ on the triples of the test set. During testing, this paper follows the filtering protocol of Literature [5]. For each test triple, a group of negative triples is constructed through the random replacement of the head entity and the tail entity.
The present paper employs three benchmark evaluation indicators, MR, MRR and Hits@10, to evaluate the effectiveness of the proposed method. MR is the average rank of the correct label in the probability distribution vector (the smaller the value, the better). MRR is the average reciprocal rank of the correct label (the larger the value, the better). Hits@10 is the proportion of test cases in which the correct label ranks in the top ten (the larger the value, the better).
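Given the filtered rank of each test triple, the three indicators reduce to simple averages; `ranking_metrics` is an illustrative name:

```python
def ranking_metrics(ranks):
    """MR, MRR and Hits@10 from the filtered rank of each test triple."""
    n = len(ranks)
    mr = sum(ranks) / n                          # mean rank (lower is better)
    mrr = sum(1.0 / r for r in ranks) / n        # mean reciprocal rank
    hits10 = sum(1 for r in ranks if r <= 10) / n  # fraction ranked in top 10
    return mr, mrr, hits10
```

For example, ranks of 1, 2 and 20 give MR ≈ 7.67, MRR ≈ 0.517 and Hits@10 ≈ 0.667.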

Setting of Experimental Parameters
For the experimental parameter settings, the output vector of the convolution layer is set to 64 dimensions; the number of convolution kernels is set to 500 with a dropout rate of 0.6; 100 negative samples are constructed for each positive sample, and the one with the maximum reliability score is selected as the negative sample; in the end-to-end training process, the Adam optimizer is used with a learning rate of 0.0001. Table 2 shows the effect of different sizes of the knowledge subgraph on the experimental results, in order to determine the optimal subgraph size T:

Experimental Results and Analysis
As the experimental results in Table 2 show, as the size of the knowledge subgraph increases, each evaluation indicator improves accordingly, and the best results are achieved when T=40. One possible reason is that as the knowledge subgraph grows, more adjacent entity node information can be utilized, and convolution can obtain a richer feature representation of the knowledge subgraph. However, since our knowledge subgraphs are structured by relevance ranking, once the subgraph exceeds a certain size, the less relevant entity node information is likely to introduce redundant information to the central entity, which has an adverse effect. Table 3 presents the effect of the number of self-attention heads on the experimental results, in order to determine the optimal number: as the results in Table 3 show, each evaluation indicator improves as the number h of attention heads increases, and the best results are achieved when h=10.
One possible reason is that as the number h of attention heads increases, a better and more stable representation of entities is obtained. The proposed method is also compared against baseline methods, and the experimental results are presented in Table 5. All baseline results in Table 5 are copied from the original papers, and missing entries indicate that no corresponding score was reported. Here, R-GAT denotes the experimental model proposed in this paper. The results show that the relation-enhanced learnable graph self-attention network takes fuller account of the comprehensive information of knowledge graphs and improves the learnability of the model, leading to a better overall effect. Compared with SACN [22], obvious increases are shown in Hits@10 and MRR, which indicates that, based on the graph structure and the relation-enhanced adjacency matrix, considering the imbalance of knowledge information and constructing convolutional knowledge subgraphs can increase the learnability of the model and thereby obtain a better knowledge representation.
Experiment 5: In order to test the applicability of the method to different data sets, it is also implemented on the data set WN18RR. The experimental results are shown in Table 6.
The experimental results in Table 6 indicate that the method proposed in this paper improves the evaluation indicators Hits@10 and MRR on the data set WN18RR compared with the latest results of SACN [22], which further verifies the applicability of the proposed method.

Conclusions
Aiming at the problem that the complexity and large scale of knowledge information prevent the classic graph convolutional network from achieving better knowledge graph embedding, this paper puts forward an end-to-end relation-enhanced learnable graph self-attention network for knowledge graph embedding. The proposed method promotes the flexibility and learnability of the network model, and entity link prediction experiments on public data sets have achieved good results. In the near future, emphasis should be put on the construction of knowledge-driven neural network models, which transform the feature learning of the model into higher-level knowledge learning.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Human and animal rights This article does not contain any studies with human or animal subjects performed by any of the authors. Informed consent Informed consent was not required as no human or animals were involved.
Authorship contributions Hongbin Wang put forward the idea of the thesis and guided the students in implementing the code. Shengchen Jiang participated in the programming and writing of the thesis. Xiang Hou participated in the revision and improvement of the thesis.