Relationship Classification based on Dependency Parsing and Pre-training Model

As an important part of information extraction, relationship extraction aims to identify the relationships between given entities in natural language text. Building on the pre-training model R-BERT, this paper proposes an entity relationship extraction method that integrates the entity dependency path with a pre-training model: it generates a dependency parse tree through dependency parsing, obtains the dependency path between the given entity pair from that tree, and uses the entity dependency path to exclude noise such as modifier chunks and irrelevant entities in sentences. The model achieves a good F1 score on the SemEval-2010 Task 8 dataset. Experiments on the dataset show that dependency parsing can provide context information to the model and improve performance.


Introduction
Information extraction (IE) aims to extract structured information from large-scale semi-structured or unstructured natural language text [1]. Information extraction is applied, for example, to knowledge graph construction [2], information retrieval [3], question-answering systems, and text summarization. Entity relationship extraction is an important part of information extraction, and its results affect the performance of downstream tasks.
Entity relationship extraction based on deep learning falls into two major categories: supervised and distantly supervised. In supervised entity relationship extraction, the task can be approached by either pipeline learning or joint learning [4]. Pipeline learning extracts the relationships between entities directly on top of entity identification, while joint learning identifies entities while extracting the relationships between them, mainly with end-to-end neural network models. Compared with the supervised setting, distantly supervised entity relationship extraction lacks a human-annotated dataset and therefore takes one more step: distantly aligning a knowledge base to label the unlabeled data. In terms of model construction, there is little difference between distantly supervised extraction and the pipeline method of supervised extraction; the main difference lies in the annotation level of the dataset. In the supervised setting, the entities and the relationship types are given in the dataset, so the relationship extraction task can be treated as a classification task.
Due to the rise of pre-training models, entity relationship extraction has gradually moved toward pre-training-based approaches. Researchers have achieved very good results simply by fine-tuning the pre-training model BERT and then running entity relationship extraction experiments. In 2019, the Ali team [5] took the lead in applying BERT to relationship extraction and achieved the best results at that time, which drew more researchers' attention to pre-training models. Since then, most entity relationship extraction models have been based on pre-training models, usually by training their own pre-training models after changing the initialization parameters of the BERT structure, or by integrating external knowledge.
Existing models pay too much attention to the impact of the whole sentence on relationship classification, without considering the noise introduced by content such as modifier chunks. Moreover, external knowledge is often brought in to assist sentence classification while the syntactic knowledge of the sentence itself is ignored. This paper proposes a method based on dependency parsing, which builds a dependency tree for each instance and obtains the shortest dependency path between entities from that tree. Unlike earlier work that uses the types of dependency relations between words, this paper focuses on the word information along the dependency path between the entities. Parsing is used to enrich the context information the model learns, thereby avoiding the noise caused by modifier chunks and unannotated entities in sentences.

Related Work
Parsing [6] is one of the key technologies in natural language processing; its basic task is to determine the syntactic structure of a sentence or to clarify the dependency relationships between the words in a sentence. Dependency parsing analyzes a sentence into a dependency syntax tree that describes the dependency relationships between words. A dependency relationship is represented by a directed arc, called a dependency arc. The shortest dependency path is the shortest path between two words in the dependency syntactic structure, and the entity dependency path is the shortest path between two entity nodes in that structure. The shortest dependency path can express the syntactic relationship between two nodes. Accordingly, the entity dependency path can concisely express the syntactic relationship between entities, remove modifier chunks, and retain the backbone that clearly expresses the entity relationship. For this reason, dependency parsing is widely used in relationship extraction.
Entity relationship extraction, one of the most critical tasks in natural language processing, is widely used in fields such as information extraction, natural language understanding, and information retrieval.
Early relationship extraction methods include feature-based and kernel-based methods; even in the early feature-based methods, syntactic knowledge was already used for relationship extraction. Today's relationship extraction methods can be divided into two categories: statistical relationship extraction and neural relationship extraction [7]. Statistical relationship extraction annotates the relationships of the target entity pair in the text with traditional machine learning methods. Classical entity relationship extraction methods fall into four categories distinguished by the degree of dataset annotation: supervised, semi-supervised, weakly supervised, and unsupervised. Neural relationship extraction applies deep learning to relationship extraction, and such tasks can be divided into supervised and distantly supervised tasks.
Among the classical statistical relationship extraction methods, Zhou [8] and Guo Xiyue et al. [9] used SVM as a classifier to study the effects of lexical, syntactic, and semantic features on entity relationship extraction; Craven et al. [10] first proposed the idea of weakly supervised machine learning while extracting structured data from text to build a biological knowledge base; Brin [11] used the Bootstrapping method to extract relationships between named entities; and Hasegawa et al. [12] first proposed an unsupervised method for extracting relationships between named entities at the ACL conference.
Traditional methods suffer from error propagation in feature extraction, so entity relationship extraction based on deep learning, which can effectively mitigate this problem, has attracted attention and achieved good results. Zeng et al. [13] first proposed using a CNN to extract word meaning and applying softmax for classification in 2014. Zhang et al. [14] proposed using Bi-LSTM for relationship classification in 2015. Xu et al. [15] revisited the traditional approach and proposed a CNN based on the shortest dependency path. In recent years, many attention-based models have also been applied to relationship extraction. Katiyar et al. [16] first combined an attention mechanism with Bi-LSTM to jointly extract entities and classify relationships in 2017.
Scholars have also proposed a variety of improvements on these basic methods, such as combining PCNN with multi-instance learning [17] and combining PCNN with an attention mechanism [18]. Ji et al. [19] proposed adding entity description information on top of PCNN and attention to assist in learning entity representations. The COTYPE model proposed by Ren et al. [20] and the residual network proposed by Huang [21] both enhanced the effect of relationship extraction.
After the pre-training model was proposed, Wu et al. first applied it to relationship extraction in 2019 and explored how to combine entities and entity locations in the pre-training model by adding identifiers before and after the entities to indicate their locations, instead of using the traditional position vector. This achieved the best results at that time and drew more researchers to pre-training models. Later in 2019, Livio Baldini Soares et al. [22] from the Google team proposed the BERTEM+MTB pre-training model; their paper discussed the effects of different input and output configurations on relationship classification and, based on the results, proposed the Matching the Blanks pre-training task to eliminate the error caused by over-reliance on entities. Peng et al. [23] experimented on the basis of BERT and MTB in 2020, probed through experiments the types of information used by existing models in entity relationship extraction, and concluded that existing models do not make full use of context information. In this paper, the BERT model is used for the experiments. Besides entity information, the entity dependency path is used as a syntactic representation, and the sentence, entity, and syntactic information together form the sentence representation for relationship classification.

Model Introduction
In supervised entity relationship extraction, since the dataset has fully annotated entities and the corresponding relationship types are given, existing models treat the task as a classification task: the model outputs a vector as the sentence representation and predicts the relationship type. This paper proposes a model framework that uses context information for relationship extraction; its architecture is shown in Figure 1.
In this paper, the pre-training model BERT is used as the basic model for relationship extraction, and the architecture consists of three parts. Given a sentence, the shortest dependency path between the entities is first obtained through dependency parsing and, together with the sentence, serves as the input to the model. The tokens obtained through word segmentation are fed to the encoder to obtain a vector representation of each token, and the sentence vector, entity vectors, and dependency vector are concatenated to obtain the final sentence representation, which is also the final vector for classification. This vector is fed to a softmax classifier for prediction, as sketched below.
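The following is a minimal sketch of this concatenation step, assuming mean pooling over entity and path tokens and illustrative dimensions (BERT-base hidden size, 19 labels); it is not the authors' exact implementation.

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 19  # illustrative: BERT-base, 19 relation labels
classifier = nn.Linear(4 * hidden_size, num_labels)

def classify(hidden, e1_idx, e2_idx, path_idx):
    # hidden: (seq_len, hidden_size) token vectors from the BERT encoder
    cls_vec = hidden[0]                      # [CLS] sentence vector
    e1_vec = hidden[e1_idx].mean(dim=0)      # averaged entity-1 token vectors
    e2_vec = hidden[e2_idx].mean(dim=0)      # averaged entity-2 token vectors
    sdp_vec = hidden[path_idx].mean(dim=0)   # averaged dependency-path token vectors
    # Concatenate sentence, entity, and syntactic vectors into one representation.
    final = torch.cat([cls_vec, e1_vec, e2_vec, sdp_vec], dim=-1)
    return torch.softmax(classifier(final), dim=-1)
```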

Dependency syntactic parsing
Parsing is one of the key technologies in natural language processing; its basic task is to determine the syntactic structure of a sentence or the dependency relationships between the words in a sentence. Dependency syntax was first proposed by the French linguist L. Tesnière (1959), who analyzed a sentence into a dependency syntax tree describing the dependency relationships between words. In a dependency grammar structure, a direct dependency relationship between two words forms a dependency pair, in which one word is the core word, also known as the governing word, and the other is the modifier, also known as the dependent word. The dependency relationship is represented by a directed arc, called a dependency arc. Take the sentence "The <e1>show</e1> centered around a <e2>beach theme</e2>." as an example; its dependency structure is shown in Figure 2. To obtain the entity dependency path, the dependency tree is first built from the dependency structure of the sentence. Given the dependency tree and the annotated entities, the path between entities e1 and e2 on the tree can be found; this is the entity dependency path. It is shown in Figure 3, where the red nodes represent entity nodes and the dotted line is the entity dependency path.
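As an illustration, the entity dependency path for this example could be extracted as follows; the choice of spaCy for parsing and networkx for the path search is an assumption made for the sketch, not the toolchain used in the paper.

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")
doc = nlp("The show centered around a beach theme.")

# Build an undirected graph over the dependency arcs (the root's self-loop is skipped).
edges = [(token.head.i, token.i) for token in doc if token.head.i != token.i]
graph = nx.Graph(edges)

# Head tokens of the two entities: "show" (e1) and "theme" (e2).
e1 = next(t.i for t in doc if t.text == "show")
e2 = next(t.i for t in doc if t.text == "theme")

# Shortest dependency path between the entities, expressed as the words on it.
path = nx.shortest_path(graph, source=e1, target=e2)
print([doc[i].text for i in path])  # words on the entity dependency path
```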

Input
The pre-training model BERT is a multi-layer bidirectional Transformer encoder. The input to BERT can be a single sentence or a pair of sentences, and a special tag, [CLS], is the first tag of every sequence.
Given a sentence S, the dependency parse tree is obtained through dependency parsing, and the shortest dependency path between the entities is found according to the target entities (e1, e2). To prevent the path length from being 0 when one entity directly depends on the other, the entity dependency path is defined to include the entities themselves. Special identifiers are inserted before and after the two target entities to emphasize them and help the model capture their locations.
The processed sentence and the entity dependency path are fed into the model. The locations of the node words are obtained from the entity dependency path and encoded as one-hot vectors over the path. A [CLS] tag is added to the beginning of the sentence, and the data is passed through a tokenizer to obtain its token sequence; the encoder then generates a vector representation for each token. A sketch of this preprocessing follows.
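The sketch below illustrates the preprocessing under stated assumptions: the HuggingFace tokenizer/model and the sentence-pair packing of sentence plus path are choices made for the example; the paper only specifies that special identifiers surround the entities and a [CLS] tag leads the input.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

sentence = "The <e1> show </e1> centered around a <e2> beach theme </e2> ."
sdp = "show centered theme"  # entity dependency path for the running example (illustrative)

# Register the marker strings as extra tokens so the tokenizer does not
# split them into subwords, then resize the embedding matrix to match.
tokenizer.add_tokens(["<e1>", "</e1>", "<e2>", "</e2>"])
encoder.resize_token_embeddings(len(tokenizer))

# The tokenizer prepends [CLS] and separates the two segments with [SEP].
inputs = tokenizer(sentence, sdp, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
```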

Classification
Given a sentence x, consisting of the entire sentence and the extracted shortest dependency path sequence, a vector representation r is obtained by feeding x into the relation encoder. After the relationship representation is obtained, a fully connected softmax layer is used to predict the relationship expressed by the sentence, yielding a probability distribution P over all predefined relationship types.
$$p(y \mid x, \theta) = \mathrm{softmax}(W_r r + b_r) \quad (8)$$

where $y \in \mathcal{Y}$ is the target relationship type, and $\theta$ refers to all learnable parameters, including $W_r$ and $b_r$.
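In training, such a layer is typically optimized with cross-entropy over the relation labels. The sketch below is a hypothetical training step using the usual PyTorch idiom (cross-entropy applied to logits before softmax); this is an implementation choice for illustration, not a detail stated in the paper.

```python
import torch
import torch.nn as nn

W = nn.Linear(4 * 768, 19)       # W_r and b_r from Eq. (8); illustrative sizes
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

r = torch.randn(32, 4 * 768)     # a batch of relation representations r
y = torch.randint(0, 19, (32,))  # gold relation labels
loss = loss_fn(W(r), y)          # cross-entropy over Eq. (8)'s distribution
loss.backward()
```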

Dataset
In this experiment, the SemEval-2010 Task 8 dataset was used. The dataset was collected from major data sources according to nine pre-set, mutually exclusive relationships, and contains 10,717 instances: 8,000 for training and 2,717 for testing. All examples in the dataset are annotated with one of the nine relationships or an Other relationship. The distribution of the nine relationship types is shown in Table 1. In addition to the annotated relationship type, each instance also contains two annotated entities, e1 and e2.
The relationship types other than Other are directional; for example, Cause-Effect(e1, e2) and Cause-Effect(e2, e1) are different. Therefore, the experiments usually use 19 relationship types for prediction (nine relations in two directions each, plus Other), as illustrated below.
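For illustration, the 19-way label set can be constructed as follows; the relation names are those of the SemEval-2010 Task 8 inventory.

```python
# The nine SemEval-2010 Task 8 relations; each is directional, plus Other.
RELATIONS = ["Cause-Effect", "Component-Whole", "Content-Container",
             "Entity-Destination", "Entity-Origin", "Instrument-Agency",
             "Member-Collection", "Message-Topic", "Product-Producer"]
LABELS = ([f"{r}(e1,e2)" for r in RELATIONS]
          + [f"{r}(e2,e1)" for r in RELATIONS]
          + ["Other"])
assert len(LABELS) == 19
```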
In this paper, the macro-averaged F1 score from the official scoring script provided with the SemEval-2010 Task 8 dataset was used. Under this scheme, the macro-averaged F1 is computed over the nine actual relationships (excluding the Other type), taking the directionality of the relationships into account. Computing F1 requires precision and recall, as shown in equations (9) to (11):

$$\text{precision} = \frac{TP}{TP + FP} \quad (9)$$

$$\text{recall} = \frac{TP}{TP + FN} \quad (10)$$

$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \quad (11)$$
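The sketch below approximates this scoring by computing per-label F1 over the directed labels (excluding Other) and averaging. The official scorer additionally merges the two directions of each relation when macro-averaging, so this is a simplified stand-in, not the official Perl script.

```python
def macro_f1(gold, pred, labels):
    """gold, pred: lists of label strings; labels: the classes to score."""
    f1s = []
    for label in labels:
        tp = sum(g == p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0   # Eq. (9)
        rec = tp / (tp + fn) if tp + fn else 0.0    # Eq. (10)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)  # Eq. (11)
    return sum(f1s) / len(f1s)

# Usage: average over the 18 directed labels, leaving out "Other".
# score = macro_f1(gold_labels, pred_labels, [l for l in LABELS if l != "Other"])
```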

Hyper-parameter settings
The settings of the hyper-parameters are as follows.

Table 3 compares the performance of the proposed model with various neural network models on the SemEval-2010 Task 8 dataset, showing that the proposed method achieves good results; the highest value in each column is shown in bold. The results in the table show that pre-training models perform much better than neural network models such as CNN and LSTM. This paper also uses a pre-training model for its experiments and selects R-BERT as the baseline. R-BERT builds on the pre-training model and highlights the entity information with special identifiers that indicate the entity locations; it achieved the best results at the time, with an official F1 score of 89.25%. On this basis, this paper obtains the shortest dependency path through dependency parsing and integrates it into R-BERT so that the model can learn the context information of sentences. The results show that the model's F1 score reaches 89.97% after parsing is introduced, which demonstrates that the context information provided by dependency parsing is effective.

Ablation Experiments
The above experimental results validate the proposed method. To further understand which factors besides BERT contributed to these results, three ablation experiments were designed. Since the entity tags "<e1>" and "<e2>" emphasize the entities and add boundary information, which significantly improves classification, these entity tags were retained in every ablation experiment.
In the first experiment, a [CLS] token was added before the sentence input, and the hidden-layer vector of this token alone was used as the sentence representation for classification. In the second experiment, the [CLS] vector and the hidden vector of the entity dependency path were concatenated as the sentence representation; here the entity dependency path did not contain entity information. In the third experiment, the [CLS] vector and the hidden vectors of the entities were concatenated as the sentence representation; in this case, the entity information included the entity tags and thus the boundary information of the entities. In Table 4, SDP denotes the shortest dependency path.

The results in Table 4 show that performance improves after the addition of entity identifiers, which provide the model with entity boundary information and emphasize the entities. There is little difference between using the hidden vector of the entity dependency path and using the hidden vectors of the entities as the sentence representation, though the entity information performs slightly better. These results show that the model can exploit context information, but still needs entity information as a supplement. After combining the entity information with the context information provided by dependency parsing, the model predicts the classification better.

Case study
This section analyzes the results of the R-BERT model and the proposed model in detail and compares the results across relationship types, as shown in Table 5. The results show that, after introducing the entity dependency path, the classification performance improves over the baseline for most relationship types, most notably for Content-Container, Product-Producer, and Instrument-Agency. This indicates that the experiment has successfully integrated the entity dependency path into the pre-training model, which is beneficial for relationship classification.
However, the classification performance for Cause-Effect and Entity-Destination not only fails to improve but drops significantly. We therefore reviewed the classification results of the two models in detail and extracted examples that each model misclassified. Table 6 provides detailed examples of classification errors for these two types. From these examples, we can see that for these two types the proposed model correctly predicted the relationship types but mispredicted the relationship directions, whereas the relationship types predicted by the baseline model differed from the gold labels. Taking Cause-Effect as an example: with the recall of the two models roughly equal, the wrong relationship directions in some predictions mean that the proposed model labels more instances as Cause-Effect than the baseline does, so its precision is lower, and consequently its F1 score for Cause-Effect is lower than the baseline's.
The above results show that the proposed method not only allows the model to learn the context information provided by dependency syntax but also improves the model's predictions. However, for some relationship types the model underuses the context information, leading to correct relationship types but wrong relationship directions. This indicates that there is still room for improvement in the use of context information, which is the focus of our future work.

Conclusion
This paper proposes a pre-training model that integrates dependency parsing for supervised entity relationship extraction. The shortest dependency path between entities, obtained through dependency parsing, concisely expresses the syntactic relationship between the entities, retains the core part that expresses the relationship type, and removes useless modifier chunks and redundant entity information. The context information between entities is obtained through dependency parsing, and a syntactic representation is derived using the same processing as the entity representation in R-BERT; it is concatenated with the sentence vector and entity vectors to form the vector representation for classification. The F1 score rises to 89.97% on the SemEval-2010 Task 8 dataset, an increase of 0.72 percentage points over R-BERT. The analysis and comparison of the results show that the proposed model achieves good results, successfully learns the context information of sentences, largely solves the problems raised, and meets expectations.
A detailed analysis of the results reveals that the dependency path between the entities in some sentences has length 0. To avoid this, this paper also counts the entities themselves as nodes on the path during data processing. However, the model cannot obtain enough context information from such data, and the entity information is merely reused, which leads to cases where the relationship type is predicted accurately but its direction is not, affecting the final prediction results to some extent. In future work, we will design strategies to extract context information for these sentences to improve the overall effect of relationship extraction.

Declarations
Authors' contributions The authors contributed equally to this work.
Funding This work was supported by the National Defense Science and Technology Industrial Technology Research Project (JSQB2017206C002).
Availability of data and material Data for this work were obtained from the web (accessible at www.semeval2.fbk.eu/semeval2.php). Conflict of interest The authors declare no conflict of interest.
Ethical approval This research does not involve any human or animal participation. All authors have checked and agreed to the submission.
Consent for publication The authors have obtained all consents for publication.

Figure 2
Dependency Relationship Graph.

Figure 3
Entity Dependency Path.