SCL-SKG: software knowledge extraction with span-level contrastive learning

Texts from software knowledge communities contain abundant knowledge of the software engineering field. Software knowledge entities and relations can be extracted automatically and efficiently to form a software knowledge graph, which supports software knowledge-centric intelligent applications such as intelligent question answering, automatic document generation and software expert recommendation. Most existing methods suffer from task dependence and entity overlap. In this paper, we propose a software knowledge extraction method based on span-level contrastive learning. At the level of sentence sequence modelling, we model the sentence sequence with the span as the unit and generate abundant positive and negative samples of entity spans through the span representation layer, avoiding the inability of token-level methods to select overlapping entities. At the level of feature learning, we propose supervised entity contrastive learning and relation contrastive learning, which obtain enhanced feature representations of entity spans and entity pairs through positive and negative sample augmentation and the construction of contrastive loss functions. Experiments conducted on a dataset constructed from StackOverflow texts show that our approach outperforms the baseline models.


Introduction
In the context of big-data software engineering, software knowledge communities (e.g., StackOverflow) generate massive amounts of community text containing rich software domain knowledge, such as software development activities, software development technologies, software development tools and software project management (Yin et al. 2018). Therefore, automatically and efficiently extracting software knowledge entities and their semantic relations from software knowledge community texts, and constructing a software knowledge graph, is helpful for software knowledge-centric intelligent applications such as intelligent question answering, automatic document generation and software expert recommendation, and plays an important role in improving software development efficiency and software production quality.
Most existing studies (Tabassum et al. 2020; Ye et al. 2016; Reddy et al. 2019; Lv et al. 2021; Zhu et al. 2015; Zhao et al. 2017; Tang et al. 2022) follow the pipeline paradigm, modeling entity extraction and relation extraction as two independent tasks. Such methods are relatively simple and flexible to implement, but they ignore the interaction and association between entity extraction and relation extraction, which causes several problems (Geng et al. 2021). On the one hand, errors propagate through the serial tasks, so the quality of the earlier stage directly affects the performance of the later stage. On the other hand, because entity extraction and relation extraction are modeled independently, the two tasks share neither parameters nor information, so semantic and dependency information is lost, resulting in redundant entities and a high error rate.
Different from the pipeline paradigm, the joint learning method models entity extraction and relation extraction as a single task and uses a joint loss function to strengthen the association and information sharing between the subtasks. This improves the overall performance of the model, and joint learning has developed into the mainstream approach to information extraction (Han et al. 2020; Ye et al. 2021).
According to the modeling object, triple extraction based on joint learning can be divided into parameter-sharing methods and sequence-tagging methods. To avoid manual feature engineering, parameter-sharing methods (Miwa and Bansal 2016; Zheng et al. 2017a; Li et al. 2017) use neural networks to build a feature encoder and a feature sharing layer, realizing automatic feature extraction and model parameter sharing, which alleviates error propagation and captures inter-task dependence. However, these methods essentially inherit the sequential relation of the subtasks, which produces redundant entities with no matching relation and degrades the quality of joint extraction. To remove such redundant entities, sequence-tagging methods (Zheng et al. 2017b; Bekoulis et al. 2018a, b; Zeng et al. 2018) use a joint tagging scheme to label entity location, entity relation type, entity role and so on, transforming the joint learning model into a sequence tagging model that extracts entities and relations simultaneously. However, these methods are usually token-level tasks, and the inherent sequential characteristic of sentence sequences makes it impossible to select overlapping entities, leading to the entity overlap problem in relation extraction.
The text of software knowledge communities is unstructured user-generated content, and the semantic relations among domain entities are complex. Extracting software knowledge with traditional sequence modeling therefore causes entity overlap and error propagation, which degrades the quality of software knowledge graph construction. The entity overlap problem occurs when a sentence contains multiple entities that overlap with each other. For example, in the sentence "GetHashCode is Method of Base Object Class of .NET Framework", ".NET" and ".NET Framework" are overlapping entities.
Motivated by these problems, this paper proposes a software knowledge extraction method based on span-level contrastive learning. From the perspective of the modeling object, the method models the sentence sequence with the span as the unit, which avoids the disadvantages of token-level sentence sequence modeling and effectively alleviates the entity overlap problem. From the perspective of feature learning, the method introduces contrastive learning to obtain more distinct feature representations of entity spans and entity pairs, thereby improving the accuracy of classification prediction. The main contributions of this paper are as follows: (1) We propose a span-level contrastive learning model named SCL-SKG for software knowledge extraction from software knowledge community texts. SCL-SKG models the sentence sequence with the span as the unit and generates abundant positive and negative samples of entity spans as the data augmentation strategy for contrastive learning.
(2) We introduce a supervised entity contrastive learning algorithm and a supervised relation contrastive learning algorithm to learn effective feature representations of entities and entity pairs that are better suited to the downstream classification tasks.
(3) Experimental results show that the proposed model achieves better performance than the benchmark models, which demonstrates its effectiveness.
The remainder of this paper is organized as follows. Related work is reviewed in Sect. 2, and Sect. 3 presents each module of the proposed approach in detail. Section 4 presents the benchmark dataset, the performance evaluation metrics, and the analysis of the experimental results. Finally, the main conclusions and future work are given in Sect. 5.

Related work
In this section, we review previous work on span-based information extraction and on contrastive learning in natural language processing.

Information extraction based on span
Span-based approaches take one or more words of a sentence sequence as a span unit, generate all possible spans, and model the sentence sequence at the span level. Freed from the inherent sequential characteristic of sentence sequences, span-based methods can select overlapping entities and represent their features, alleviating the error propagation and entity overlap problems caused by token-level sentence sequence modeling. Dixit and Al-Onaizan (2019) proposed a span-based model for joint extraction of entities and relations using a Bidirectional Long Short-Term Memory (Bi-LSTM) network and achieved good performance. Luan et al. (2018) proposed a multi-task classification framework based on shared span representations and constructed a scientific-literature knowledge graph on the scientific abstract dataset SciERC through tasks such as entity recognition, relation classification and coreference resolution. To enhance the interaction between different tasks, Luan et al. (2019) proposed DyGIE, an information extraction framework based on dynamic span graphs, which captures interaction information between spans through graph propagation and improves model performance without requiring additional syntactic analysis. Based on the Bidirectional Encoder Representations from Transformers (BERT) model, Eberts and Ulges (2019) realized joint extraction of entities and relations through strong negative sampling, span filtering and localized context representation. Ding et al. (2021) applied joint extraction to the military domain, proposing a hybrid model that integrates the span method with graph structure and improving extraction performance by incorporating domain-specific vocabulary and syntactic knowledge.

Contrastive learning in natural language processing
Contrastive learning is a discriminative self-supervised learning paradigm that leverages the similarity or dissimilarity of data samples to train an encoder. The encoder learns similar feature representations for data of the same type and, as far as possible, different feature representations for data of different types, yielding more effective feature representations for downstream tasks (Liu et al. 2021).
For text representation tasks, Giorgi et al. (2020) applied contrastive learning to unsupervised sentence representation and demonstrated its feasibility experimentally. To obtain better sentence representations, Gao et al. (2021) proposed a contrastive-learning-based training method that enhances sentence representations through specific data augmentation and the construction of a contrastive loss function, achieving strong performance on seven downstream Semantic Textual Similarity (STS) tasks. To alleviate the representation collapse of BERT, Yan et al. (2021) proposed a contrastive-learning-based sentence representation transfer framework that fine-tunes BERT on unlabeled datasets of the target domain and generates sentence representations better suited to downstream tasks.
For entity and relation extraction tasks, Peng et al. (2020) proposed an entity-masked contrastive pre-training framework for relation extraction that captures sentence context information and entity type information. The framework first uses distant supervision with external knowledge graphs to generate positive and negative samples: sentences expressing the same relation are treated as positives, and sentences expressing different relations as negatives. A random entity-masking strategy is then used during pre-training to obtain better relation representations. Because pre-trained models cannot capture factual knowledge in text, Qin et al. (2021) proposed a contrastive learning framework for the pre-training phase that enhances the understanding of entities and their semantic relations through Entity Discrimination and Relation Discrimination. Su et al. (2021) improved the text representation of the BERT model via contrastive learning with data augmentation methods such as synonym replacement, random swapping and random deletion, enhancing biomedical relation extraction.

Task modeling and model overview
The task of span-level software knowledge extraction proposed in this paper is to automatically identify all possible software knowledge entity spans in software knowledge community texts, predict their entity types, and classify the semantic relations of entity span pairs according to predefined relation types, so as to obtain software knowledge relation triples. The task can be formally defined as a 7-tuple $SKG = \langle X, S, Y_e, Y_r, \Omega_e, \Omega_r, NA \rangle$, where: (1) $X = (x_1, x_2, \ldots, x_n)$ is an input sentence of the software knowledge community text; (2) $S = (s_1, s_2, \ldots, s_m)$ is the candidate span set generated by enumerating spans of $X$, with $s_i = (x_i, x_{i+1}, \ldots, x_{i+k})$; (3) $Y_e(s_i) \in \Omega_e \cup \{NA\}$ is a function that predicts the entity type of candidate span instance $s_i$ and generates the entity set $E = (e_1, e_2, \ldots, e_{|E|})$; (4) $Y_r(e_i, e_j) \in \Omega_r \cup \{NA\}$ is a function that predicts the semantic relation type of an entity pair $(e_i, e_j)$, $e_i, e_j \in E$, and generates the relation set $R = (r_1, r_2, \ldots, r_{|R|})$; (5) $\Omega_e$ is the set of predefined entity types; (6) $\Omega_r$ is the set of predefined relation types; (7) $NA$ denotes a non-entity or the absence of a semantic relation.
For example, given the software knowledge community sentence "GetHashCode is Method of Base Object Class of .NET Framework", the goal of software knowledge extraction is to accurately identify the entity pair $(e_i, e_j)$: "GetHashCode" and ".NET Framework", and to predict the relation $r_{ij}$ of the entity pair as "Inclusion", thereby obtaining the software knowledge relation triple $\langle e_i, r_{ij}, e_j \rangle$.
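To make the notation concrete, the sketch below renders the 7-tuple's main objects as plain Python data structures; the class names, fields and the "API" type label are our own illustrative assumptions, not part of the paper's formalism.

```python
from dataclasses import dataclass

# Illustrative data structures for the task definition; names are assumptions.
@dataclass(frozen=True)
class Span:
    start: int      # index of the first word x_i in the sentence
    end: int        # exclusive end index, so the span length is end - start

@dataclass(frozen=True)
class Entity:
    span: Span
    etype: str      # one of the predefined entity types, or "NA"

@dataclass(frozen=True)
class Triple:
    head: Entity
    rtype: str      # one of the predefined relation types
    tail: Entity

# The running example: <GetHashCode, Inclusion, .NET Framework>
triple = Triple(Entity(Span(0, 1), "API"),          # "API" label is hypothetical
                "Inclusion",
                Entity(Span(8, 10), "Software Framework"))
```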
Following the task definition above, we propose a novel hybrid model for software knowledge extraction, named SCL-SKG, which is based on span-level contrastive learning. The architecture of SCL-SKG is shown in Fig. 1.

BERT contextualized word embedding
Software knowledge community text is user-generated content; it not only has social features such as repetitive content, loose structure and irregular spelling, but also software domain features such as non-uniform naming, complicated terminology and weak semantic signals. We use SWBERT (Tang et al. 2022), a model pre-trained on the software engineering domain, to encode the input sentence and capture dynamic word embeddings. The procedure is as follows: (1) For the sentence sequence $X = (x_1, x_2, \ldots, x_n)$, the corresponding token sequence is obtained by adding the identifiers [CLS] and [SEP] at the beginning and end of the sequence. (2) For each token, a token embedding, a segment embedding and a position embedding are generated and summed to obtain the BERT input embedding. (3) After feature encoding by the pre-trained SWBERT, the dynamic word embedding of sentence sequence $X$ is obtained as a sequence of length $n + 1$:

$$W = (w_{cls}, w_1, w_2, \ldots, w_n) \quad (1)$$

where $w_{cls}$ represents the classification information of the sentence sequence.
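As a rough illustration of this encoding step, the following sketch uses the Hugging Face transformers API; since SWBERT is not publicly released, a generic BERT checkpoint stands in for it here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# SWBERT is assumed; "bert-base-cased" is only a stand-in checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")

sentence = "GetHashCode is Method of Base Object Class of .NET Framework"
# add_special_tokens=True prepends [CLS] and appends [SEP]
inputs = tokenizer(sentence, return_tensors="pt", add_special_tokens=True)

with torch.no_grad():
    outputs = encoder(**inputs)

# W = (w_cls, w_1, ..., w_n): one contextual embedding per token
W = outputs.last_hidden_state.squeeze(0)   # shape: (sequence_length, 768)
w_cls = W[0]                               # sentence-level classification vector
```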

Token and span representation
In view of the entity overlap problem in software knowledge community text, and inspired by related work (Dixit and Al-Onaizan 2019; Eberts and Ulges 2019), we construct a span-level sentence sequence representation layer in SCL-SKG and model sentence sequences with spans as the unit. Span-based methods generally generate span representations by iterating over all words in the sentence sequence, which incurs considerable computational overhead. Therefore, we first filter the sentence sequence, removing words with little meaning and keeping words such as verbs and nouns. Then all words of the filtered sequence are iterated to produce span representations of different lengths, yielding the entity span set $S = (s_1, s_2, \ldots, s_m)$. An entity span instance is represented as $s_i = (x_i, x_{i+1}, \ldots, x_{i+k})$, where $k$ is the length of the span, indicating the number of words contained in the entity span.
For example, for the software knowledge community sentence "GetHashCode is Method of Base Object Class of .NET Framework", filtering yields the sequence "GetHashCode Method Base Object Class .NET Framework", and the generated entity spans include "GetHashCode", "Method", "GetHashCode Method", "Method Base", "Object", "Base Object", "Class", ".NET", ".NET Framework", etc. The SCL-SKG model thus generates the entity spans of sentence sequences through the span representation layer, producing abundant positive and negative samples of entity spans, which serves as the data augmentation method for the subsequent entity contrastive learning.

Fig. 1 Overview of the proposed model
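A minimal sketch of this span generation, assuming a toy stop-word filter and the maximum span width of 10 from Table 3:

```python
STOP_WORDS = {"is", "of", "the", "a", "an"}   # illustrative filter only
MAX_SPAN_WIDTH = 10                            # matches the setting in Table 3

def enumerate_spans(tokens, max_width=MAX_SPAN_WIDTH):
    """Filter low-content words, then enumerate all contiguous spans up
    to max_width words, so overlapping candidates such as '.NET' and
    '.NET Framework' are both kept."""
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    spans = []
    for start in range(len(kept)):
        for width in range(1, max_width + 1):
            if start + width > len(kept):
                break
            spans.append(tuple(kept[start:start + width]))
    return spans

sentence = "GetHashCode is Method of Base Object Class of .NET Framework"
print(enumerate_spans(sentence.split())[:6])
```

Running this on the example sentence reproduces the overlapping candidates listed above, including both ".NET" and ".NET Framework".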

Entity classification based on contrastive learning
To obtain distinctive feature representations of entity spans and improve the accuracy of entity span classification, we introduce entity contrastive learning at the entity classification layer. The entity classification layer of the SCL-SKG model therefore includes two steps: entity contrastive learning and entity classification.

Supervised span-level entity contrastive learning
To make use of the label information of software knowledge entities, we extend contrastive self-supervised learning and propose supervised span-level entity contrastive learning to obtain entity span feature representations better suited to the downstream tasks. Different from self-supervised contrastive learning, supervised span-level entity contrastive learning combines entity label information with data augmentation to generate multiple positive and negative views of the original data samples, and uses a contrastive loss function to constrain the model to learn entity span feature representations that better fit the classification task.
The components of supervised span-level entity contrastive learning are described in detail below.
(1) Data augmentation. Compared with the image processing field, data augmentation in natural language processing is more difficult; common methods include random deletion of words, random insertion of words, random swapping of words, and synonym/antonym replacement (Wei and Zou 2019). However, these methods are likely to disturb the structure and semantic information of sentences, and applying them directly to downstream tasks such as entity extraction and relation extraction degrades model performance.
Based on the entity span set from the span representation layer, we instead use the labels of software knowledge entities for data augmentation, generating multiple positive and negative samples for each entity span instance. Specifically, supervised span-level entity contrastive learning regards entity spans of the same type as positive samples, forming the positive sample set $P(i)$, and regards entity spans of other types and non-entity spans in the same batch as negative samples, forming the negative sample set $N(i)$.
(2) Encoder. In the encoder component, the SCL-SKG model uses the pre-trained model SWBERT to transform the input sentence sequence $X = (x_1, x_2, \ldots, x_n)$ into dynamic word embeddings and extract text features, which can be expressed as $H = \mathrm{SWBERT}(X)$.
(3) Projection network. Following Chen's work in the image field (Chen et al. 2020), the SCL-SKG model uses a Multi-Layer Perceptron (MLP) in the projection network component to project the embedding into another representation space, so as to obtain better feature representations during training, which can be expressed as:

$$z_i = W_2\,\sigma(W_1 h_i) \quad (2)$$

where $W_1$ and $W_2$ are the weights of the hidden layers and $\sigma$ is the ReLU activation function.
(4) Contrastive loss function. Since self-supervised contrastive learning cannot exploit the type information of entities and cannot handle multiple positive samples, we follow Khosla et al. (2020) and extend the self-supervised contrastive loss to obtain the loss function of supervised span-level entity contrastive learning:

$$L_{ec} = \sum_{i \in B(i)} \frac{-1}{|P(i)|} \sum_{z_p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{z_n \in P(i) \cup N(i)} \exp(z_i \cdot z_n / \tau)}$$

where $z_i$ is the current entity span instance, $z_p$ is a positive sample instance of $z_i$, $z_n$ is a negative sample instance of $z_i$, $B(i)$ is the sample set in the batch, $|P(i)|$ is the cardinality of the positive sample set, $N(i)$ is the negative sample set, $\tau$ is the temperature parameter, and the symbol $\cdot$ denotes the inner product used for similarity calculation.
Thus, the span-level entity contrastive learning can be described as Algorithm 1.
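A possible PyTorch rendering of the projection network of Eq. (2) and of the supervised contrastive loss above; the projection sizes are assumptions, and every in-batch span other than the anchor contributes to the denominator, matching $P(i) \cup N(i)$:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """MLP projection z = W2 * ReLU(W1 * h) from Eq. (2);
    the output dimension of 128 is an illustrative assumption."""
    def __init__(self, in_dim=768, hidden_dim=768, out_dim=128):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, out_dim, bias=False)

    def forward(self, h):
        return self.w2(F.relu(self.w1(h)))

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of projected span
    embeddings z (batch x dim). Spans sharing an entity label are
    positives; all other in-batch spans are negatives. Embeddings are
    L2-normalized so the inner product acts as the similarity."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                              # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = positives.sum(dim=1)
    anchors = pos_counts > 0                           # skip anchors without positives
    loss = -(log_prob * positives.float()).sum(dim=1)[anchors] / pos_counts[anchors]
    return loss.mean()
```

The temperature of 0.1 follows the setting in Sect. 4; averaging the log-probabilities over each anchor's positives mirrors the $1/|P(i)|$ factor in the loss.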

Entity classification
The goal of entity classification is to predict the type of each candidate entity span while filtering out non-entity spans. Entity classification therefore consists of the following two steps: (1) Embedding concatenation. After span-level entity contrastive learning, the final embedding of candidate entity span $s_i$ is obtained as

$$x_{s_i} = h_i \circ s_{width} \circ w_{cls}$$

where $h_i$ is the embedding of the entity span instance, $s_{width}$ is the embedding of the entity span length, $w_{cls}$ carries the sentence-level classification information, and $\circ$ denotes concatenation.
(2) Entity type prediction. After embedding concatenation, the candidate entity span $s_i$ is fed into a softmax layer for entity type prediction:

$$\hat{y}_e = \mathrm{softmax}(W_i x_{s_i} + b_i) \quad (3)$$

where $W_i$ is the weight matrix and $b_i$ is the bias.
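A sketch of these two steps as one module; the width-embedding size, the number of width buckets, and the output count of eight entity types plus NA are assumptions based on Sect. 4.

```python
import torch
import torch.nn as nn

class EntityClassifier(nn.Module):
    """Concatenate the span embedding, the span-width embedding and
    w_cls, then predict the entity type with a softmax layer (Eq. 3).
    Dimensions here are illustrative assumptions."""
    def __init__(self, span_dim=768, width_buckets=10, width_dim=25,
                 num_entity_types=9):                   # 8 types + NA, assumed
        super().__init__()
        self.width_embedding = nn.Embedding(width_buckets, width_dim)
        self.linear = nn.Linear(span_dim * 2 + width_dim, num_entity_types)

    def forward(self, h_span, span_width, w_cls):
        s_width = self.width_embedding(span_width)      # embedding of span length
        x = torch.cat([h_span, s_width, w_cls], dim=-1) # x_{s_i}
        return torch.softmax(self.linear(x), dim=-1)
```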

Relation classification based on contrastive learning
To obtain distinctive feature representations of candidate entity pairs better suited to relation classification, we propose supervised span-level relation contrastive learning at the relation classification layer. Similar to entity classification above, the relation classification layer of the SCL-SKG model includes two steps: relation contrastive learning and relation classification.

Supervised span-level relation contrastive learning
Compared with supervised entity contrastive learning, the encoder and projection network of supervised relation contrastive learning remain unchanged; only the data augmentation and the contrastive loss function differ.
In the data augmentation component, supervised relation contrastive learning regards entity pairs with the same relation type as positive samples, forming the positive sample set $P(i)$, and regards entity pairs with other relation types, or with no relation, in the same batch as negative samples, forming the negative sample set $N(i)$.
The contrastive loss function of supervised relation contrastive learning is accordingly defined as:

$$L_{rc} = \sum_{i \in B(i)} \frac{-1}{|P(i)|} \sum_{z_p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{z_n \in P(i) \cup N(i)} \exp(z_i \cdot z_n / \tau)} \quad (7)$$

where $z_i$ is the current entity pair instance, $z_p$ is a positive sample instance of $z_i$, $z_n$ is a negative sample instance of $z_i$, $B(i)$ is the set of candidate entity pairs in the batch, $|P(i)|$ is the cardinality of the positive sample set, $N(i)$ is the negative sample set, $\tau$ is the temperature parameter, and the symbol $\cdot$ denotes the inner product used for similarity calculation.
Similarly, span-level relation contrastive learning can be described as Algorithm 2.

Relation classification
The goal of relation classification is to predict the type of each candidate entity pair while filtering out pairs with no relation. Relation classification therefore consists of the following two steps: (1) Embedding concatenation. After span-level relation contrastive learning, the final embedding of the entity pair $s_{ij}$ is obtained as

$$x_{s_{ij}} = h_i \circ c_{ij} \circ h_j$$

where $h_i$ and $h_j$ are the embeddings of the two entity spans of the pair, $c_{ij}$ is the context of the span pair, and $\circ$ denotes concatenation.
(2) Relation type prediction. After embedding concatenation, the candidate entity pair $s_{ij}$ is fed into a fully connected layer for relation classification:

$$\hat{y}_r = \sigma(W_{ij} x_{s_{ij}} + b_{ij})$$

where $\sigma$ is the activation function, $W_{ij}$ is the weight matrix, and $b_{ij}$ is the bias.
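A corresponding sketch of the relation classification head, with per-type sigmoid scoring to match the binary cross-entropy relation loss described later; the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Concatenate the two entity embeddings and their context, then
    score each relation type with a sigmoid layer; five relation types
    follow Table 1, all other dimensions are illustrative."""
    def __init__(self, dim=768, num_relation_types=5):
        super().__init__()
        self.linear = nn.Linear(dim * 3, num_relation_types)

    def forward(self, h_i, c_ij, h_j):
        x = torch.cat([h_i, c_ij, h_j], dim=-1)   # x_{s_ij}
        return torch.sigmoid(self.linear(x))
```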
In summary, the software knowledge extraction with span-level contrastive learning is described as Algorithm 3.
As shown in Algorithm 3, the pre-trained software engineering language model SWBERT is first used as the input feature encoder to obtain the dynamic word embeddings of the sentence sequence. The sentence sequence is then modeled with spans as the unit to generate rich entity span representations, avoiding the problem that overlapping entities cannot be selected. Finally, supervised contrastive learning is introduced into the entity classification and relation classification tasks, and the feature representations of entity spans and entity pairs are obtained through data augmentation and the contrastive loss functions, improving the performance of entity classification and relation classification.

Experiments
To evaluate the performance of the proposed SCL-SKG model, we carried out ablation experiments and comparative experiments with benchmark models for joint entity and relation extraction. The SCL-SKG model was implemented in Python using the PyTorch deep learning framework. All experiments were conducted on a machine with an Intel Xeon Gold 5117 processor (2.0 GHz) and an NVIDIA Tesla T4 GPU with 16 GiB of memory.

Dataset
Due to the lack of an annotated dataset for the software knowledge extraction task, we built one from StackOverflow text with reference to related work (Ye et al. 2016; Tang et al. 2022). For the types of software knowledge entities and relations, we defined eight entity types and five relation types; details are shown in Table 1.
For dataset annotation, we use the JavaScript Object Notation (JSON) file format to record the sentence sequence, entity types, start and end positions of entity spans, relation types, and head and tail entities, forming the annotated software knowledge dataset. To ensure sound experimental results, the dataset is divided into training, validation and test sets at a ratio of 7:1:2. The dataset consists of 19,013 sentences, 43,769 entities and 25,183 relations, including 452 instances of relations with overlapping entities. Details are shown in Table 2.
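For illustration, one annotated record might look like the following; the field names and the "API" type label are hypothetical, not the dataset's actual schema.

```python
import json

# Hypothetical annotation record; field names are illustrative only.
record = {
    "tokens": ["GetHashCode", "is", "Method", "of", "Base", "Object",
               "Class", "of", ".NET", "Framework"],
    "entities": [
        {"type": "API", "start": 0, "end": 1},                   # GetHashCode
        {"type": "Software Framework", "start": 8, "end": 10},   # .NET Framework
    ],
    "relations": [
        {"type": "Inclusion", "head": 1, "tail": 0},  # indices into "entities"
    ],
}
print(json.dumps(record, indent=2))
```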

Parameter settings
For the span-based module of the SCL-SKG model, the word embedding dimension of the pre-trained language model SWBERT is 768, the batch size is 3, the maximum span width is 10, and the maximum numbers of entity span negative samples and relation negative samples are both 100; Adam is used as the optimizer with an initial learning rate of 5e-5. For the contrastive learning module, the contrastive loss is used as the loss function and the temperature parameter is set to 0.1. The relevant hyper-parameter settings are shown in Table 3.

Evaluation metrics
The general evaluation metrics for information extraction are used to evaluate the model: precision (P), recall (R) and F1 score (F1). Precision is the percentage of correctly recognized samples among all samples recognized by the model; recall is the percentage of correctly recognized samples among all correct samples; the F1 score is the weighted harmonic mean of precision and recall and serves as the comprehensive performance measure. Formally, as in Eqs. (10)-(12):

$$P = \frac{TP}{TP + FP} \quad (10)$$

$$R = \frac{TP}{TP + FN} \quad (11)$$

$$F1 = \frac{2 \times P \times R}{P + R} \quad (12)$$

where TP (True Positive) is the number of correct types recognized as positive examples by the model, FP (False Positive) is the number of wrong types recognized as positive examples, and FN (False Negative) is the number of correct types recognized as negative examples.
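Equations (10)-(12) translate directly into code:

```python
def precision_recall_f1(tp, fp, fn):
    """Eqs. (10)-(12): precision, recall and their harmonic mean,
    guarding against empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```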

Results and discussion
To evaluate the performance of the proposed SCL-SKG model, three state-of-the-art joint extraction models were selected for comparison, covering both the parameter-sharing approach and the joint-decoding approach. The experimental results are shown in Table 4.
The results show that the F1 score of the SCL-SKG model is higher than that of the other three baseline models, achieving better performance.
Compared with token-level approaches, span-based approaches model the sentence sequence with spans as the unit, which alleviates the entity overlap problem and improves performance. Accordingly, the span-based SPERT and SCL-SKG models obtained the highest precision for entity extraction and relation extraction, respectively. Compared with the SPERT model, the SCL-SKG model adds contrastive learning on top of the span-based approach, increasing the F1 scores of entity extraction and relation extraction by 5.99% and 7.21%, respectively. We also evaluated the SCL-SKG model on the dataset with the overlapping entities excluded: the F1 scores of entity extraction and relation extraction changed by only 0.23% and 0.29%, respectively, indicating that the performance of the SCL-SKG model remains stable across the two datasets.
In addition, the results of the SCL-SKG model for each predefined software knowledge entity type and relation type are shown in Figs. 2 and 3.
The results show that the SCL-SKG model performs better on entity extraction than on relation extraction. In the entity extraction task, the SCL-SKG model performs better on entity types such as Software Tool, Software

Ablation experiment and analysis
The ablation experiments on the SCL-SKG model aim to verify the contributions of the proposed entity contrastive learning, relation contrastive learning, and pre-trained model SWBERT to software knowledge extraction.

Contribution of contrastive learning to performance
We used the SCL-SKG model as the benchmark to evaluate the contributions of entity contrastive learning and relation contrastive learning to software knowledge extraction. The experimental results are shown in Table 5.
In Table 5, SCL-SKG-NN denotes the model without entity or relation contrastive learning, SCL-SKG-EC the model with only entity contrastive learning, SCL-SKG-RC the model with only relation contrastive learning, and SCL-SKG-ALL the model with both.
According to the results, introducing only entity contrastive learning increases the F1 scores of entity extraction and relation extraction by 15.52% and 4.33%, respectively; introducing only relation contrastive learning increases them by 0.04% and 6.86%, respectively. This shows that entity contrastive learning and relation contrastive learning yield feature representations of entity spans and entity pairs better suited to the downstream classification tasks, which helps improve software knowledge extraction.

Contribution of pre-trained model to performance
To evaluate the contribution of the pre-trained language model SWBERT to software knowledge extraction, the SCL-SKG model is again used as the benchmark; the experimental results are shown in Table 6.
In Table 6, a model is marked with "✓" if the corresponding feature representation is used, and left unmarked otherwise. SCL-SKG-NN denotes the model with no pre-trained model, SCL-SKG-BERT the model with the general-domain BERT, and SCL-SKG-SWBERT the model with the software-domain pre-trained model SWBERT.
According to the results, introducing the general-domain BERT increases the F1 scores of entity extraction and relation extraction by 3.58% and 3.12%, respectively, while introducing the pre-trained model SWBERT increases them by 7.23% and 8.13%. Compared with BERT, SWBERT thus brings additional gains of 3.65% and 5.01% in the F1 scores of entity extraction and relation extraction, respectively.

Comparison with state-of-the-art models on public dataset
To further evaluate the proposed SCL-SKG model, we compare it with state-of-the-art joint extraction models on three public datasets. The CoNLL04 dataset (Roth and Yih 2004) is an annotated dataset of news articles with four entity types and five relation types. The SciERC dataset (Luan et al. 2018) is derived from 500 abstracts of AI conference and workshop proceedings across four AI communities. The ADE dataset (Gurulingappa et al. 2012) is derived from medical reports of adverse drug effects and has two entity types and one relation type.
Following the evaluation methodology of previous work, we report macro-averaged values for the CoNLL04 and ADE datasets and micro-averaged values for the SciERC dataset. The experimental results are shown in Table 7. According to the results, the SCL-SKG model improves both entity extraction and relation extraction on the CoNLL04 dataset, with F1 scores increased by 0.7% and 0.3%, respectively. On the SciERC dataset, relation extraction also improves, with the F1 score increased by 1.1%. On the ADE dataset, which contains 120 instances of relations with overlapping entities, the SCL-SKG model does not outperform the SPERT model; with overlapping entities included, its F1 scores for entity extraction and relation extraction decrease by only 0.14% and 0.25%, respectively.

Analysis of joint training methods
The loss function of the proposed SCL-SKG comprises four parts: the entity contrastive learning loss $L_{ec}$, the entity classification loss $L_e$, the relation contrastive learning loss $L_{rc}$ and the relation classification loss $L_r$. The entity classification loss $L_e$ uses categorical cross entropy, and the relation classification loss $L_r$ uses binary cross entropy.
To obtain the best joint training result for SCL-SKG, we tried three joint training methods: adding the loss functions, multiplying the loss functions, and a linear combination of the loss functions:

$$L = L_{ec} + L_e + L_{rc} + L_r \quad (13)$$

$$L = L_{ec} \times L_e + L_{rc} \times L_r \quad (14)$$

$$L = L_{ec} + L_{rc} + \lambda_1 L_e + \lambda_2 L_r \quad (15)$$

The adding method sums the four losses, i.e., the entity contrastive loss $L_{ec}$, entity classification loss $L_e$, relation contrastive loss $L_{rc}$ and relation classification loss $L_r$, as shown in Formula (13). The multiplying method multiplies the entity contrastive loss by the entity classification loss and the relation contrastive loss by the relation classification loss, as shown in Formula (14). The linear combination method applies linear weights to the entity classification loss and relation classification loss, as shown in Formula (15), where $\lambda_1$ and $\lambda_2$ are the weighting coefficients.
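A sketch of the three combination strategies as a single function; the weighting coefficients in the linear variant are assumptions, since the paper does not report their values.

```python
def joint_loss(l_ec, l_e, l_rc, l_r, mode="add", lambda_e=1.0, lambda_r=1.0):
    """Combine the four losses: 'add' is Formula (13), 'multiply' is
    Formula (14), and 'linear' follows our reading of Formula (15)
    with assumed weights lambda_e and lambda_r."""
    if mode == "add":
        return l_ec + l_e + l_rc + l_r
    if mode == "multiply":
        return l_ec * l_e + l_rc * l_r
    if mode == "linear":
        return l_ec + l_rc + lambda_e * l_e + lambda_r * l_r
    raise ValueError(f"unknown mode: {mode}")
```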
As can be seen from Figs. 4 and 5, multiplying the loss functions prevents the model from converging and yields poor results. The method of adding loss

Case analysis
The experimental results above show that the span-level contrastive learning model SCL-SKG achieves good performance for software knowledge extraction, and we construct a software knowledge graph containing 43,769 entity instances and 25,183 relation instances. Overview diagrams of the software knowledge graph with 50 nodes and 1000 nodes are shown in Figs. 6 and 7.
As can be seen from Figs. 6 and 7, the nodes of the software knowledge graph represent instances of software knowledge entities, and different entity types are distinguished by different colors. For example, the entity type of the nodes "Java", "PHP" and "C#" is "Programming Language", and the entity type of the nodes "JUnit", "TestNG" and "React" is "Software Framework". The software knowledge instance <Pydantic, Inclusion, Python> describes the fact that the head entity "Pydantic" is linked to the tail entity "Python" via the "Inclusion" edge.
Although the proposed SCL-SKG achieves good results, some specific problems remain. Representative cases are analyzed below, with the results shown in Table 8, where the symbol "[]" marks an extracted software knowledge entity.
In Case 1, the SCL-SKG model not only extracts the software knowledge entities "Apples" and "IOS", but also accurately extracts the entities "Apples Framework" and "IOS 7". In total, 267 of the 453 instances of relations with overlapping entities were accurately identified, indicating that the SCL-SKG model can effectively alleviate the entity overlap problem. In Case 3, the SCL-SKG model extracts the software knowledge entities "BlackBerry", "BlackBerry Dynamics SDK" and "Cylance REST APIs", and accurately identifies the relation between "BlackBerry" and "BlackBerry Dynamics SDK" as "Inclusion". Moreover, although the relation between "BlackBerry Dynamics SDK" and "Cylance REST APIs" is not labeled in the training dataset, the model predicts it as "Brother".
In Case 4, the SCL-SKG model identifies the software knowledge entities "CUDA" and "Compute Unified Device Architecture" as the types "Software Platform" and "Software Tool", respectively, resulting in entity extraction errors, and the "Consensus" relation is not correctly identified.

Conclusion and future work
In view of the task dependence of the traditional pipeline method and the entity overlap in software knowledge community text, we propose a software knowledge extraction method based on span-level contrastive learning that takes software knowledge community texts as its extraction source. In the future, we will integrate source code, issue reports, mailing lists and other types of software resources, extract software knowledge at different granularities, and improve the completeness and applicability of the software knowledge graph. At the same time, we plan to use the software knowledge graph as an auxiliary resource for studying software expert recommendation, intelligent Q&A, automatic document generation and other software engineering problems, expanding the application scenarios of the software knowledge graph and promoting the development of intelligent software development.
The software knowledge extraction model SCL-SKG involves two subtasks: entity extraction and relation extraction. The evaluation criteria for the extraction results are as follows: an entity extraction result is correct if both the boundary and the type of the software knowledge entity span are predicted correctly; a relation extraction result is correct if the boundaries, types and semantic relation of the software knowledge entities are all predicted correctly.
The Multi-head model (Bekoulis et al. 2018a, b) is a joint entity and relation extraction model based on the shared-parameter approach; it uses the BILOU annotation scheme with CRF decoding for entity extraction, and a multi-head selection algorithm with a sigmoid layer for relation extraction. The SPERT model (Eberts and Ulges 2019) is also a shared-parameter joint extraction model; it abandons BIO/BILOU annotation, uses the pre-trained language model BERT to obtain word embeddings of the sentence sequence, and performs joint entity and relation extraction by enumerating all possible entity spans. The NovelTagging model (Zheng et al. 2017b) is a joint extraction model based on the joint-decoding approach; it performs end-to-end joint entity and relation extraction with a new sequence annotation scheme and an LSTM network.

Fig. 2 Extraction results for each entity type

Fig. 3 Extraction results for each relation type

Fig. 4 Joint training methods for entity extraction

Fig. 5 Joint training methods for relation extraction

Table 1 The types of software knowledge entities and relations

Table 2 Details of the dataset

Table 3 Hyper-parameter settings

Table 4 Experimental results on the software knowledge dataset

Table 5 Contribution of contrastive learning to model performance

Table 7 Experimental results on public datasets