Causality extraction model based on two-stage GCN

As one form of indirect causality, cascaded causality can be used for constructing event knowledge graphs, causal inference, scenario analysis, and so on. Existing GCN methods do not mine context information and relevant entity information, resulting in poor causality inference, which inevitably affects the extraction accuracy of cascade causality. To solve this problem, this paper proposes a causality extraction model based on a two-stage GCN to improve extraction accuracy. To obtain rich entity features, this work combines sentiment polarity and a knowledge base to build a causality candidate entity library. Firstly, the BERT model is pre-trained with context information and relevant entity information extracted from the entity library to obtain the final entity nodes. Secondly, every possible edge between two entity nodes is obtained from the semantic dependency graph, and the nodes and edges are input into the first-stage GCN to produce a preliminary directed graph of causality. Finally, this directed graph of causality is input into the second-stage GCN to perform deep multi-hop causality inference. Thus, cascade causality is inferred and extracted by the two-stage GCN model. Experiments show that the extraction accuracy of cascade causality is further improved.


Introduction
Relation extraction takes a given text, extracts entities/objects, and infers their relations, forming a triple (S, R, O) (Bosselut et al. 2019), where S represents the subject entity, R the relationship, and O the object entity. At present, entity relation extraction technology impacts many natural language processing tasks and has been widely applied. Causality is one of the most prominent relationships. Existing methods can quickly identify explicit causality entities, but extraction accuracy is low for implicit causality entities, which require inference. At the same time, the uniqueness and diversity of text, the complexity of semantic structure, the variety of expression, and other factors inevitably increase the difficulty of inference, so many potential causality entities are not extracted.
Given the above problems in the research on causality extraction, the following points are mainly considered. (1) Identification of entity nodes. Previous work proposed a key sentence extraction algorithm for Chinese microblog comments (Zhang et al. 2021), which considers multiple factors and attributes to identify the key sentences of comments; this algorithm can assist the identification of entity nodes. (2) Feature extraction and causality inference. The microblog hot topic word extraction model (Liu et al. 2021a) proposed a feature co-occurrence method, providing a reference for causality feature extraction and causality inference. (3) Long-distance entity causality inference, which can, in particular, effectively identify cascade causality. Previous work used ALN (Association Link Network) (Xu et al. 2017) to achieve a hierarchical division of associated semantics; learning the semantic information of the text provides technical support for long-distance entity causality inference.
Based on the above considerations, we combine BERT and GCN (Graph Convolutional Network) technology to propose a new model, the causality extraction model based on a two-stage GCN. Different from traditional methods, the proposed method extracts the potential causality among cascade entities through two-stage GCN inference. The motivation is that the network can learn more complex entity structures, capture more abundant local and non-local entity features, and achieve multi-hop inference over entities to extract cascade causality entities. The model framework is shown in Fig. 1, and the main contents of this paper can be summarized in the following three points.
• Construct causality candidate entity library We use prior causality knowledge and semantic data to extract causality entities from the dataset. Firstly, the review text is preprocessed, and the NLTK tokenizer is used to segment the words in the text. Secondly, emotional intensity is used to establish a causality seed lexicon. Finally, combined with the knowledge base, the K-means clustering algorithm is used to extend the causality candidate entity library.
• Build the directed graph of causality by first-stage GCN To obtain rich target entity features, the relevant entity information from the causality candidate entity library is input into BERT by entity linking. Firstly, the BERT pre-training model obtains the final entity nodes from the relevant entities and the context semantic information. Secondly, the potential edges between any entity nodes are obtained from the semantic dependency graph. Then, the final entity nodes and edges are input into the first-stage GCN to get the preliminary directed graph of causality.
• Cascaded causality extraction by second-stage GCN Firstly, the directed graph of causality from the first-stage GCN is input into the second-stage GCN. Secondly, the paths to all reachable entities of an entity are found in the entity causality directed graph. Finally, multi-hop inference over the entity nodes is carried out through the second-stage GCN model to extract all potential entity relations.
The advantage is that the two-stage GCN method achieves better multi-hop relational inference and identifies more cascade causality. The main contributions of this paper can be summarized in the following two points:
• Effective construction of the causality candidate entity library The causality candidate entity library is constructed from emotional intensity and part of speech, which makes it convenient to obtain relevant entity information and learn entity features better.
• Strong inference of the two-stage GCN The first-stage GCN conducts entity causality inference to get the preliminary directed graph of causality, and the second-stage GCN realizes multi-hop inference between entities. So the two-stage model proposed in this paper further strengthens the inference ability.
This paper is organized as follows. Section 2 introduces the related work of causality extraction. Section 3 gives the method of building the causality candidate entity library, and Sect. 4 shows the construction method of the causality extraction model. Section 5 provides the experimental analysis of the model. Section 6 summarizes this article and future work.

Research on causality
More and more scholars have paid attention to causality extraction in recent years. Dasgupta et al. (2018) use complex formulas to represent causality and annotate the cause, effect, and causal connectives in sentences through bidirectional long short-term memory networks; they convert the whole sentence into a word vector sequence and add a linguistic layer on top of the Bi-LSTM to achieve good results. Zhang et al. (2015) consider the sequence information of long sentences because of the positional uncertainty of causality information; therefore, word embeddings (Dunietz et al. 2017) are used as input features to extract causal events (Wu et al. 2019) with a Bi-LSTM-based method. Silva Tharini et al. (2017) compared two methods, one based on knowledge features and one on deep learning: the first group of experiments trained an SVM-based causal relationship classification model on semantic knowledge features, and several different CNNs were selected for further experiments. Li and Mao (2019), considering that feature construction requires much engineering, introduced other vital causality features to improve the performance of CNN by reducing the dimension. An et al. (2019) extracted causality from the literature from the perspective of rules, improved the syntactic pattern matching method to simplify sentences, established a verb seed set to learn the characteristics of verbs, and achieved good results. Pechsiri and Piriyakul (2021) extracted causality, mainly causal paths, from web documents; a causal path can be used to explain or express some concepts, and a supervised learning method is used to improve accuracy. Abbas et al. (2021) aimed to extract causality from biomedical literature, implementing and evaluating several commonly used models and reducing class imbalance with random oversampling techniques to improve model performance. Vo et al. (2020) focused on causality extraction from document-level texts and constructed a network. Akkasi and Moens (2021) extract causality from the biomedical field. Shao et al. (2021) proposed a BEL method to simplify sentences and improve the accuracy of causality extraction with BERT. Vargas-Hakim et al. (2022) reviewed convolutional neural networks (CNNs) and the state of the art of encodings for CNNs.
The GCN model is closely related to the CNN model; both aggregate neighborhood information. In recent years, GCN has been able to capture more features thanks to its characteristics, and more and more scholars have turned their attention to GCN, using effective training methods to improve its efficiency (Liu et al. 2021b). Bosselut et al. (2019) transformed implicit knowledge into explicit knowledge through the COMET model, established a common-sense knowledge base, and produced more high-quality new knowledge; experiments show that new knowledge can be inferred through knowledge graphs. Zhu et al. (2019) proposed GP-GCNs to generate parameters for relation extraction and described the embedding, propagation, and classification modules in detail; the GCN adjusts the hyper-parameters during propagation, and both qualitative and quantitative analyses were carried out. The model can infer relations through a multi-hop mechanism. Fu et al. (2019) use a relation-weighted GCN to extract relationships, mainly considering the text's sequential and local features and using a dependency structure so the GCN learns the implicit features between all words in the text; the model improves prediction results with high accuracy on public datasets. Balalia et al. (2020) extracted event information with a dependency graph and proposed an attention-based GCN to capture potential relationships between events. The MHGCN model presented by Gao et al. (2022) embeds entities in each view of the GCN to realize cross-language entity pairing and supplement the knowledge graph; to attend to the characteristics of the entities themselves, Gao et al. also considered relational semantics and entity attributes to learn the structural features of entities better. Zhou et al. (2020) used grammatical dependence and GCN to establish a common-sense knowledge graph, and GCN can flexibly incorporate syntactic information into sentiment analysis tasks (Wei et al. 2021). Zhang et al. (2020a) used GCN learning to extract more accurate features. Zhao et al. (2021), Zhang et al. (2020b), Lei et al. (2021) and Chawla (2021) have proved the effectiveness of convolutional neural networks.

Research on graph neural network
Causality extraction can quickly extract causality entities for explicit causality, but implicit causality often requires inference. The traditional GCN can learn the features of adjacent nodes through hidden layers to infer, but cascade causality needs further inference. Therefore, this paper proposes a two-stage GCN to implement cascade causality inference.

Data pre-processing
The main tasks of data pre-processing include two aspects.
One is to preliminarily screen the content of the text, deleting valueless information and unifying the sentence format. The second is to mark the selected sentences. For example, for causality pairs, cause phrases represent a cause and effect phrases represent an effect. Since this article involves sequence annotation, punctuation is also annotated as a word (labeled 'O'). The causality trigger words are not labeled, and the causality extraction in this paper is not limited to explicit causality with markers. A labeling example is shown in Figs. 2 and 3: C represents the cause, O represents other, and E represents the effect. This paper aims to extract all the causality entity pairs, as shown in Fig. 3.
For example, in {[Bacterial], Cause-Effect, [acne pimples]}, the first entity represents the cause entity and the second represents the effect entity; the relationship Cause-Effect makes up a triple.
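The annotation scheme above can be sketched as a small labeling routine. This is an illustrative stand-in with a hypothetical token list and phrases, not the paper's annotation tool:

```python
def label_tokens(tokens, cause_phrase, effect_phrase):
    """Assign C/E/O tags to each token: C for cause-phrase tokens,
    E for effect-phrase tokens, O for everything else.
    Punctuation and trigger words (e.g. "causes") stay O, as in the paper."""
    cause, effect = set(cause_phrase), set(effect_phrase)
    labels = []
    for tok in tokens:
        if tok in cause:
            labels.append("C")
        elif tok in effect:
            labels.append("E")
        else:
            labels.append("O")
    return labels

tokens = ["Bacterial", "infection", "causes", "acne", "pimples", "."]
labels = label_tokens(tokens,
                      cause_phrase=["Bacterial", "infection"],
                      effect_phrase=["acne", "pimples"])
```

Note that the trigger word "causes" is labeled O, matching the convention that causality trigger words are not annotated.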

Extraction of causality candidate entities
Because causality is usually expressed in phrases with significant emotional tendencies, this paper prioritizes the emotional intensity of words. Existing sentiment lexicons, such as the sentiment lexicon of National Taiwan University (NTUSD) and the HowNet sentiment lexicon of CNKI (i.e., China National Knowledge Internet, a paper retrieval database in China), are widely used. In corpus sentiment analysis, however, the emotional intensity of some specific words cannot be judged accurately, so we need to build a corpus-oriented sentiment lexicon.
(1) Data preprocessing Cleaning and removing incomplete and repetitive data in the corpus, the motivation being to ensure the corpus belongs to the same field. Stop words and special symbols are removed after word segmentation.
(2) Construction of the word vector model We use the Word2Vec model to transform words into word vectors, which lays the foundation for the subsequent neural network.
(3) Construction of a neural network From the dataset, we construct the corpus needed for training. The word vectors convert words into inputs for neural network training, and an emotional classifier is finally obtained.
(4) Construction of the domain sentiment lexicon The sentiment lexicon is mainly composed of the obtained sentiment lexicon and the emotional words in the corpus field. The neural network classifier judges their sentiment polarity, and the required domain sentiment lexicon is obtained.
This paper uses a neural network to construct a binary classifier of word emotion. The training corpus consists of emotional words, and the corresponding labels are the polarities of the emotional words. Each emotional word is converted into a 100-dimensional word vector through Word2Vec, and judging the sentiment polarity of words is a classification problem; therefore, we use a fully connected neural network to construct the classifier.
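A minimal sketch of such a polarity classifier follows. The paper uses 100-dimensional Word2Vec vectors and a fully connected network; this sketch substitutes hand-made 4-dimensional toy vectors and a single-layer logistic classifier so it stays self-contained. All words, vectors, and hyper-parameters here are illustrative assumptions:

```python
import numpy as np

# Toy stand-ins for Word2Vec vectors (the paper uses 100 dimensions).
vectors = {"good":  np.array([1.0, 0.8, 0.1, 0.0]),
           "great": np.array([0.9, 1.0, 0.0, 0.1]),
           "bad":   np.array([-1.0, -0.7, 0.2, 0.0]),
           "awful": np.array([-0.8, -1.0, 0.1, 0.2])}
labels = {"good": 1, "great": 1, "bad": 0, "awful": 0}   # 1 = positive

# Single-layer logistic classifier trained by gradient descent.
w, b = np.zeros(4), 0.0
for _ in range(500):
    for word, y in labels.items():
        x = vectors[word]
        p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # sigmoid output
        grad = p - y                              # cross-entropy gradient
        w -= 0.1 * grad * x
        b -= 0.1 * grad

def polarity(word):
    """Classify a word's sentiment polarity from its vector."""
    p = 1.0 / (1.0 + np.exp(-(w @ vectors[word] + b)))
    return "positive" if p > 0.5 else "negative"
```

In the paper's setting, a deeper fully connected network plays the role of this single sigmoid layer; the training loop and decision rule are analogous.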
Step 1: Determine the seed word set According to the characteristics of the field, corresponding selection criteria are formulated, and words extracted from the corpus are added to the seed word set as seed words.
Step 2: Determine the set of emotional words The seed words are converted into word vectors, and vector similarity (cosine similarity) is calculated. The N words most similar to each seed word are taken as the emotion word set.
Step 3: Use the trained classifier to judge the sentiment polarity of each word The words with sentiment polarity are integrated and added to the field-specific sentiment lexicon.
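Step 2 above can be sketched as a cosine-similarity nearest-neighbor lookup. The vocabulary and 2-dimensional vectors below are hypothetical, standing in for the real Word2Vec space:

```python
import numpy as np

def top_n_similar(seed_vec, vocab, n=2):
    """Return the n words whose vectors are most cosine-similar to seed_vec."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(vocab.items(), key=lambda kv: cos(seed_vec, kv[1]),
                    reverse=True)
    return [word for word, _ in ranked[:n]]

# Hypothetical word vectors; in the paper these come from Word2Vec.
vocab = {"pimples":  np.array([0.9, 0.1]),
         "acne":     np.array([0.8, 0.2]),
         "sunshine": np.array([-0.1, 1.0])}
seed = np.array([1.0, 0.0])              # vector of some seed word
nearest = top_n_similar(seed, vocab, n=2)
```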
As the causality in the dataset field may carry positive or negative emotions, we give more consideration to emotional intensity. Firstly, we calculate the emotional intensity of the causality entity, combined with the manually annotated emotional intensity in the comment text. If a word frequently appears in the dataset and has strong sentiment polarity, it can be considered a seed word. The emotional intensity is divided into four levels: level-0, level-1, level-2, and level-3. The division mainly considers the following aspects: the completeness of components, the weighted average of the emotional intensity of each word, and the frequency with which words appear in the whole annotated dataset. Therefore, the following definition is given.
Definition 1 (Emotional Intensity of Causality Entity (EICE)) Emotional Intensity of Causality Entity is used to measure the emotional intensity of the entity in the annotated dataset. The sentiment polarity intensity of the causality entity is calculated from the causal word. With the help of the emotional intensity characteristics, the causality entity can be extracted more accurately, as shown in formula (1).
where F_i represents the frequency of causality word i in the whole annotated dataset; I_i represents the emotional intensity of the causality word in the sentiment lexicon, taken in absolute value; E_i represents the composition of the causality entity in the dataset (e.g., subject, predicate, object); and W_i represents the initial weight of the causality entity.
The emotional intensity values calculated by formula (1) can be used to establish a causality seed entity library, and appropriate weights can be given to causality entities with different emotional intensity levels. The emotional intensity level is mapped to the range 1-4, taking integer values. The weights are therefore 0.5, 1, 1.5, and 2, increasing by 0.5 for each ascending emotional intensity level. The K-means clustering algorithm is used to cluster the candidate seed entity library, constructing the causality candidate entity seed set. The specific algorithm process is as follows. In Algorithm 1, Steps 1-3 calculate the EIC_i of each word. Step 4 splits the sentence. Next, Avg(EIC_i) is calculated for the segmented clauses, and the most emotionally intensive clause is selected in Step 6. Steps 7-8 determine the part of speech of each clause; if it is a noun, the clause phrase is added to L in Step 9. Steps 11-12 expand the candidate seed entity library.
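The K-means clustering step can be sketched with a minimal implementation. The 2-dimensional toy points stand in for entity feature vectors; the real algorithm clusters the candidate seed entity library in the Word2Vec space:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: returns a cluster index for each point."""
    rng = np.random.default_rng(seed)
    # initialize centers at k distinct random points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (assign == j).any():
                centers[j] = points[assign == j].mean(axis=0)
    return assign

# Two obvious groups of toy entity vectors
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
assign = kmeans(points, k=2)
```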
The seed entity library is obtained on the dataset and expanded with related entities from the domain knowledge base. Specific examples of the causality candidate entity library are shown in Table 1.

Screening of nodes and edges
The representation of entity labels is constructed, and Sect. 3.1 of this paper introduced the unified label framework of entities. At the same time, the dependency edges constructed by the semantic dependency graph are treated as an adjacency matrix input into the GCN. The semantic dependency graph is centered on the core entity in the graph. Therefore, in the pre-training input, each sequence component is a node of the graph, and the dependency between words is an edge of the graph. In this paper, the HanLP tool is used to carry out the semantic analysis, with the result shown in Fig. 4.
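Turning the dependency edges into an adjacency matrix can be sketched as follows. The edge list here is hand-written for illustration; in the paper it comes from HanLP's semantic dependency parse:

```python
import numpy as np

# Hypothetical dependency edges (head index, dependent index) for the
# 5-token sentence "Bacterial infection causes acne pimples".
tokens = ["Bacterial", "infection", "causes", "acne", "pimples"]
edges = [(1, 0), (2, 1), (2, 4), (4, 3)]   # "causes" acting as core word

n = len(tokens)
adj = np.zeros((n, n), dtype=int)
for head, dep in edges:
    adj[head, dep] = adj[dep, head] = 1     # symmetric edges for GCN input
np.fill_diagonal(adj, 1)                    # self-loops, as is usual for GCN
```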

Pre-training
In the causality entity extraction task, BERT mainly learns syntactic structure in its low-level network layers and rich semantic information features in its high-level layers. However, BERT handles the ambiguity of the entities themselves poorly, so this paper learns more complex entity information by entity linking. Context word representations are usually trained on unstructured, unannotated text and do not contain precise semantics for real-world entities, so they usually cannot remember entities beyond those seen. For each sentence, an integrated linker retrieves the relevant entity embeddings and then updates the context word representation in the form of entity attention. The key idea is to model entities explicitly and use the linker to retrieve relevant entity embeddings from the constructed causality candidate entity library. The motivation is to form a knowledge-enhanced entity representation. The model is shown in Fig. 5.
During BERT pre-training, each entity is marked by a special identifier #. BERT pre-training sequence generation uses Masked LM. The Masked LM task is: given a sentence, randomly erase one or several words and predict the erased words from the remaining ones. Specifically, 15% of the words in a sentence are randomly selected for prediction. Of these, 80% are replaced by the special symbol [MASK], 10% are replaced by an arbitrary word, and 10% remain unchanged. The model does not know whether the word embedded at a given position is correct, which forces it to rely more on context information to predict words and gives the model a certain ability to correct errors. Multi-head self-attention consists of three parts, query, key, and value, allowing each vector to focus on other vectors. We train BERT to minimize an objective function that combines next sentence prediction (NSP) with the masked LM (MLM) log-likelihood.
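The 15% / 80-10-10 masking scheme can be sketched directly. The toy vocabulary and sentence are hypothetical; real BERT masking operates on WordPiece subtokens:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """BERT-style masking: pick 15% of positions; of those, 80% become
    [MASK], 10% become a random vocabulary word, 10% stay unchanged."""
    rng = random.Random(seed)
    vocab = ["the", "cause", "effect", "of", "rain"]   # toy vocabulary
    out, targets = list(tokens), {}
    n_mask = max(1, round(mask_rate * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_mask):
        targets[i] = tokens[i]          # the model must predict this word
        r = rng.random()
        if r < 0.8:
            out[i] = "[MASK]"
        elif r < 0.9:
            out[i] = rng.choice(vocab)  # arbitrary replacement word
        # else: leave the original token in place
    return out, targets

masked, targets = mask_tokens(
    ["heavy", "rain", "causes", "flooding", "downtown"])
```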
where h is the encoder parameter in BERT, h_2 is the parameter of the output layer connected to the encoder in the Masked-LM task, and h_1 is the classifier parameter connected to the encoder in the next sentence prediction task. In the first part of the loss function, if the masked word set is M and the vocabulary size is |V|, the task is a multi-classification problem over the vocabulary, giving the loss function in formula (3). The sentence prediction task likewise uses a classification loss.
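The paper's formulas for these losses do not survive in this text. For orientation, the standard BERT joint objective, written with the parameter names used in this section (h for the encoder, h_2 for the MLM output layer, h_1 for the NSP classifier), has the following form; this is the textbook BERT objective, not necessarily the paper's exact notation:

```latex
\mathcal{L}(h, h_1, h_2) = \mathcal{L}_{\mathrm{MLM}}(h, h_2) + \mathcal{L}_{\mathrm{NSP}}(h, h_1)

\mathcal{L}_{\mathrm{MLM}}(h, h_2) = -\sum_{m \in M} \log p\bigl(w_m \mid W_{\setminus M};\, h, h_2\bigr)

\mathcal{L}_{\mathrm{NSP}}(h, h_1) = -\log p\bigl(\mathrm{IsNext} \mid W;\, h, h_1\bigr)
```

Here W is the input token sequence, M the set of masked positions, and W_{\setminus M} the sequence with those positions masked out.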
Therefore, the joint learning loss function is the sum of the losses of the two tasks. During pre-training, the related entity information from the causality candidate entity library is introduced by entity linking, as described in the previous section. This is mainly divided into the following three sub-modules.
(1) Related entity generation module It is responsible for detecting the target entity set M (i.e., all the entities mentioned in the input text) and finding the related entity set E_m corresponding to each target entity m ∈ M from the given causality candidate entity library.
(2) Related entity ranking module It is responsible for scoring and ranking the related entities in each set E_m (i.e., for each entity mention m) and outputs the four highest-scoring related entities as the entity linking result for m.
(3) Unlinkable entity prediction It is responsible for predicting which target entities in the input text cannot be linked to the causality candidate entity library.
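The ranking module's "four highest scores" rule can be sketched as a top-k selection. The mention, candidates, and linker scores below are hypothetical:

```python
def rank_related_entities(mention, candidates, scores, k=4):
    """Score each candidate entity for a mention and keep the k best,
    mirroring the ranking module's four-highest-scores rule."""
    ranked = sorted(candidates, key=lambda c: scores[(mention, c)],
                    reverse=True)
    return ranked[:k]

# Hypothetical linker scores for the mention "pimples"
scores = {("pimples", "acne"):     0.92,
          ("pimples", "skin"):     0.71,
          ("pimples", "bacteria"): 0.65,
          ("pimples", "rash"):     0.60,
          ("pimples", "weather"):  0.10}
top4 = rank_related_entities(
    "pimples", ["acne", "skin", "bacteria", "rash", "weather"], scores)
```

The unlinkable-entity module would correspond to the case where even the best score falls below some confidence threshold.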

Two-stage GCN
In this section, we use the two-stage GCN model for inference. The first-stage GCN mainly learns entity node features and the corresponding relationship features to get a directed graph of causality. The second-stage GCN realizes multi-hop inference between entities, considering the implicit features between all entities to infer causality further. So the two-stage GCN model proposed in this paper strengthens the inference ability. The original input is a sentence sequence; after BERT pre-training, the entity nodes and edges are obtained. GCN can extract regional dependency features of entities. In this paper, the GCN model realizes the causality inference, and the first-stage and second-stage GCN models adopt the same structure. The principle of GCN is shown in Fig. 6.
In each layer, ReLU is the feature activation function; three hidden layers are selected in this paper. In general, GCN accepts all the vertex feature messages transmitted from the previous layer, applies the corresponding transformation, and sums them; finally, an activation function produces the output of the layer. For each hidden layer, there is formula (6).
Here, h_u^l denotes the features of word u in hidden layer l, aggregated over all words connected to u, including u itself, and W represents the weight. We concatenate the output and input word features as the final word features. Firstly, each node sends its transformed feature information to its neighbor nodes, and the feature information of the node is extracted. This step integrates the local structural information of nodes (i.e., over all neighbor nodes), namely the sum operation in formula (6). A nonlinear transformation is then applied to the gathered information to increase the expressive ability of the model.
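A single hidden layer of this aggregate-transform-activate scheme can be sketched as follows. Since formula (6) is not reproduced in this text, the sketch uses mean aggregation over neighbors (including self-loops) as a plausible stand-in; the graph, features, and weights are toy values:

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One GCN hidden layer: aggregate neighbor features (including the
    node itself via self-loops), transform, then apply ReLU."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)
    agg = (a_hat @ h) / deg                  # mean of neighbor features
    return np.maximum(0.0, agg @ w)          # ReLU activation

# Toy graph of 3 entity nodes with 2-d features: 0 -- 1 -- 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
w = np.eye(2)                                # identity weights for clarity
h1 = gcn_layer(adj, h0, w)
```

Stacking three such layers, as the paper does, lets information from 3-hop neighborhoods reach each node.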
The entities are predicted using the word features extracted by GCN, and the causality between entities is extracted. The dependency edges are removed and all entities are predicted; based on the results of the two hidden layers, entity causality is obtained.
(w_1, r, w_2) denotes the score of the entity pair (w_1, w_2) under causality. When extracting causality triples, the relationship between each word pair is judged and, where possible, identified as causality.
The entities and relationships extracted in the first stage do not handle long distances very well. Therefore, the second-stage GCN is proposed to extract cascade causalities through long-distance multi-hop inference. The second stage considers the implicit features between all causal entities in the text, so the extraction accuracy is higher. In the first stage, a complete weighted correlation graph is established for each pair of causality, where (w_1, w_2) is the weight of the edge, representing the probability that entities w_1 and w_2 are in a causality relation. To extract the causality between each entity pair more accurately, the second-stage GCN carries out weighted propagation to achieve a more robust relationship prediction; formula (8) gives the propagation between hidden layers.
where P_r(u, v) denotes the edge weight, indicating the probability that the two entities u and v are causal, and W_r and b_r are the weight matrix and bias of layer l of the GCN hidden layer. Finally, a threshold is set to extract causality entity pairs: if P_r(u, v) > 0.5, the entity pair is considered to have a causality, and vice versa. The classification module takes the target entity pair (w_1, w_2) as input, and we stack the embeddings of (w_1, w_2) together to infer the underlying relationship between each pair of entities. So we can obtain the causality of each pair of entities, as shown in formula (9): P_r(w_1, ce, w_2) = softmax(S_{w_1, ce, w_2}). Here, we use cross-entropy as the final classification loss function, as shown in formula (10).
S represents the set of all entities, and P_r(w_i, w_j) denotes the probability that entities w_i and w_j are in a causality relation. The entire two-stage GCN algorithm process is as follows.
In Algorithm 2, Step 2 calculates the similarity between v_i and v_j and learns features, which are propagated through the hidden layer in Step 3. If the P_r(v_i, v_j) calculated in Steps 5-8 exceeds the set threshold, there is a causal relationship between v_i and v_j. Steps 1-8 describe the first-stage GCN, which mainly learns the local characteristics of nodes; a new graph is then constructed. Steps 9-12 illustrate the second-stage GCN. Step 10 calculates the similarity between w_i and w_j and learns features, which are propagated through the hidden layer in Step 11. If the P_r(w_i, w_j) calculated in Step 14 exceeds the threshold set in Step 15, there is a causal relationship between w_i and w_j. Steps 13-17 complete the extraction of causality triples and improve extraction accuracy.
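The final thresholding step, turning a matrix of pairwise causal probabilities into triples, can be sketched as follows. The score matrix here is a hypothetical second-stage output; note how the multi-hop chain bacteria → infection → pimples also yields the long-distance pair bacteria → pimples:

```python
import numpy as np

def extract_causal_pairs(entities, scores, threshold=0.5):
    """Keep the ordered entity pairs whose causal probability exceeds the
    threshold, forming (cause, 'Cause-Effect', effect) triples."""
    triples = []
    n = len(entities)
    for i in range(n):
        for j in range(n):
            if i != j and scores[i, j] > threshold:
                triples.append((entities[i], "Cause-Effect", entities[j]))
    return triples

entities = ["bacteria", "infection", "pimples"]
# Hypothetical P_r(w_i, w_j) matrix from the second-stage GCN
scores = np.array([[0.0, 0.9, 0.7],
                   [0.1, 0.0, 0.8],
                   [0.0, 0.2, 0.0]])
triples = extract_causal_pairs(entities, scores)
```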

Experimental setup
The datasets used are from open-source databases, beginning with SCIFI. SCIFI contains a total of 1270 valid data items; 1803 sentences in the ECauSE Corpus2.0 corpus contain causality, and a third of them involve overlapping relationships. CaTeRS annotates a total of 1600 sentences in 320 five-sentence short stories extracted from the ROCStories corpus, all of which contain causality. NYT and WebNLG are relation extraction datasets that contain overlapping entity relations; NYT contains 1230 sentences with causality, and WebNLG contains 1420. Mining information in large datasets requires rational allocation and composition of data (Masood et al. 2021). Each dataset is divided into a training set, test set, and validation set in the ratio 8:1:1. The construction of the datasets is shown in Table 2. We use the SCIFI and NYT datasets to verify our method.
The specific operation for perfect experiments is as follows.
Step 1: Obtain the causality dataset The open-source databases SCIFI, ECauSE Corpus2.0, CaTeRS, NYT, and WebNLG are used for causality extraction. After denoising, 7323 sentences conducive to causality analysis are selected.
Step 2: Causality entity pre-processing For the 7323 text sentences obtained, the causality entity labeling is unified. For the entities appearing in the sentences, the special symbol # is used to mark them, such as #entity#, and 7323 sentences with causal entity labeling are obtained.
Step 3: Construct the causality candidate entity library The potential causality entities are extracted from the dataset and put into a candidate entity library; on this basis, a domain knowledge base is used to expand the candidate entity library.
Step 4: Construction of causality extraction model The BERT pre-training is used to convert the text with the semantics of each word into a word vector. At the same time, to learn the entity features better, BERT pre-training is used to obtain context information and related entity information from the causality candidate entity library. Then, the two-stage GCN is used for causality inference. The first stage is to learn the local feature of the entity node, and the second stage is to learn the global feature.
Step 5: Extract the causality entity triples Each pair of entities meeting threshold conditions is identified as causality entities. All entity pairs satisfying causality are extracted to form a triple.
To make the method more persuasive, we compare the models in many aspects. Firstly, the method is tested on multiple datasets to illustrate the effectiveness of the causality extraction model. At the same time, this paper also conducts comparative experiments on other baseline models, including Bi-LSTM + GCN, Bi-LSTM + CRF, 3-layer CNN, GP-GCNs, CNN + RNN, CNN + BiGRU + CRF, and Bi-LSTM + Attention.

Experimental analysis
According to the above analysis, this paper did the following experiments.
This paper selects different models for comparison. Bi-LSTM context encoding (Peters et al. 2019) mainly improves the pre-training process. The CNN model is often used to learn local features, and three layers are selected here. GCN can learn long-distance relationship inference. The causality extraction model based on two-stage GCN is compared with Bi-LSTM + GCN, Bi-LSTM + CRF, 3-layer CNN, GP-GCNs, CNN + RNN, and CNN + BiGRU + CRF, with results shown in Table 3. The two-stage GCN model is more comprehensive. The results in Table 3 show that the two-stage GCN causality extraction method achieves a higher recall score on the SCIFI and CaTeRS datasets. Comparing Bi-LSTM + GCN, Bi-LSTM + CRF, 3-layer CNN, GP-GCNs, CNN + RNN, CNN + BiGRU + CRF, and Bi-LSTM + Attention, it can be seen that, together with GP-GCNs, recall on the SCIFI dataset is higher than for the other methods.
We consider more overlapping entity relationships. For ECauSE Corpus2.0, NYT and WebNLG, due to many overlapping relationships, the overlapping relationships are mainly divided into the following types, as shown in Table 4.
The inference ability of the two-stage GCN is stronger. It can be seen from Table 4 that each overlapping entity needs to be inferred, and GCN can achieve this inference, but the effect of using GCN only once is worse than that of the two-stage GCN, as shown in Fig. 7.
The number of layers of GCN is also an essential factor. To prove the influence of the number of layers, we also compare models with different numbers of layers, as seen in Fig. 8. On the two datasets, two layers give the best effect. Although the effect of three layers is also good, three layers take significantly more time than two. This suggests that considering more layers in the inference process can improve performance, especially when there are more entities, but at a higher time cost. In model training, two-stage training is clearly better than one-stage training. We take two examples from the SCIFI dataset, shown in Table 5. As can be seen from Table 5, the effect of the two-stage GCN is better than that of only a one-stage GCN. Although the one-stage GCN also implements inference, the inference ability of the two-stage GCN is stronger over long distances.
The above experiments show that cascade causality entity extraction relies not only on local features but also on global features. It is necessary not only to judge the relationship between adjacent entities but also to infer cascade entities. Therefore, this work uses a two-stage GCN for efficient inference to extract all potential causality. To learn each pair of causality entities more comprehensively, the candidate entity library introduced in Sect. 3 is constructed to learn entity features, and the two-stage GCN is used to infer the relations between entities. The experiments above examine the method from various aspects and demonstrate its effectiveness.
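The kind of cascade inference the second stage performs can be sketched as transitive chaining over the directed edges produced by the first stage. The function below is a simplified symbolic stand-in for the second-stage GCN's multi-hop inference (the function name and the example edges are hypothetical), showing how a cascaded triple is derived from two direct cause-effect edges:

```python
from collections import defaultdict

def cascade_triples(direct_edges):
    """Chain direct cause-effect edges transitively to recover
    multi-hop (cascade) causality triples -- a simplified stand-in
    for the second-stage GCN's multi-hop inference."""
    graph = defaultdict(set)
    for cause, effect in direct_edges:
        graph[cause].add(effect)

    triples = set()
    for start in list(graph):
        stack, seen = [start], set()
        while stack:                       # depth-first walk from each cause
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    triples.add((start, "C-E", nxt))
                    stack.append(nxt)
    return triples

edges = [("earthquake", "landslide"), ("landslide", "road damage")]
result = cascade_triples(edges)
# result contains the cascaded triple ("earthquake", "C-E", "road damage")
```

Unlike this symbolic sketch, the learned second-stage GCN scores every candidate edge, so it can also suppress spurious chains rather than accepting all transitive paths.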

Conclusions
We propose a causality extraction model based on a two-stage GCN to extract causality effectively. The model mainly uses the two-stage GCN to extract causality and analyzes all causality entity triples in the text, especially the cascade causality. Finally, causality extraction in deep semantics is realized. The contributions of this paper mainly include the following aspects.
(1) A causality candidate entity library has been constructed. During pre-training, the four most closely related entities are selected from the candidate entity library by entity linking, so the pre-trained entity vectors can learn richer entity features.
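The selection of the four most closely related entities can be sketched as a top-k nearest-neighbor search over embeddings. The snippet below is a minimal illustration under assumed cosine similarity (the function name, toy entity names, and 2-d vectors are all hypothetical, not from the paper):

```python
import numpy as np

def top_k_related(entity_vec, library_vecs, library_names, k=4):
    """Return the k candidate-library entities whose embeddings are
    most cosine-similar to the mention's embedding (a hypothetical
    stand-in for the paper's entity-linking step)."""
    lib = np.asarray(library_vecs, dtype=float)
    q = np.asarray(entity_vec, dtype=float)
    sims = lib @ q / (np.linalg.norm(lib, axis=1) * np.linalg.norm(q) + 1e-12)
    top = np.argsort(-sims)[:k]            # indices of the k highest scores
    return [library_names[i] for i in top]

# Toy 2-d embeddings, for illustration only.
names = ["flood", "rain", "drought", "storm", "sun"]
vecs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.8, 0.3], [-0.9, 0.2]]
related = top_k_related([1.0, 0.0], vecs, names, k=4)
```

Feeding these k neighbors into pre-training alongside the mention is what lets the entity vector absorb features of semantically close candidates rather than relying on the mention's context alone.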
(2) A two-stage GCN has been proposed for entity relation inference. In the first stage, the local features of adjacent entity nodes are learned; in the second stage, the features of all nodes are learned. The model thus realizes long-distance relation inference, and a causality judgment is made for each pair of entities, identifying more cascade causality.
In the future, the method in this paper can be considered for application to all relation extraction texts. The proposed method applies well to cascade causality texts, but its effect on more complex, multi-factor texts remains to be improved. In the future, comprehensive analysis combined with sentiment analysis can be considered to broaden the scope of application, such as dialogue systems and situational analysis. The causality extraction model based on the two-stage GCN can help software platforms or relevant departments to extract causality effectively, support management measures or coping strategies, help make the best decisions, and build the foundation for subsequent sentiment analysis.

Data availability Data cannot be made available for privacy reasons.

Declarations
Conflict of interest The authors declare that they have no conflict of interest. This manuscript has been approved by all authors for publication. I (Guangli Zhu) declare on behalf of all co-authors that the work described is original research that has not been published previously. All listed authors have approved the enclosed manuscript.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent No human subjects or individual participants are involved in this study.