Background: Building large-scale medical knowledge graphs requires automatically extracting the relations between entities from electronic medical records (EMRs). The main challenges are the scarcity of labeled corpora and the identification of complex semantic relations in the text of Chinese EMRs. We propose a hybrid method based on semi-supervised learning to extract medical entity relations from a small-scale corpus of complex Chinese EMRs.
Methods: A residual network (ResNet) extracts the semantic features of sentences, and a bidirectional GRU (Gated Recurrent Unit) captures long-distance dependency information. Two attention mechanisms then assign weights to the features extracted by each network, and their outputs are combined for relation prediction. The model is trained on a small, manually annotated relation corpus using a bootstrapping semi-supervised learning algorithm, which continuously expands the dataset during training.
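The two ideas in the Methods paragraph, attention-weighted pooling over two feature sequences and a bootstrapping loop that grows the labeled set, can be illustrated with the minimal sketch below. It is not the authors' implementation. Several details are assumptions not stated in the abstract: dot-product attention with a query vector, fusion by concatenating the two attention outputs, and selection of new training examples by a fixed confidence threshold. The names `attention_pool`, `fuse`, `bootstrap`, `train` and `predict` are hypothetical, and the ResNet and BiGRU encoders are replaced by precomputed feature vectors.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Vector = List[float]
X = TypeVar("X")


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Vector], query: Vector) -> Vector:
    """Weight each position's feature vector by softmax(feature . query) and sum."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def fuse(cnn_feats: Sequence[Vector], gru_feats: Sequence[Vector],
         q_cnn: Vector, q_gru: Vector) -> Vector:
    # Assumption: the two attention outputs are combined by concatenation;
    # the result would feed a relation classifier.
    return attention_pool(cnn_feats, q_cnn) + attention_pool(gru_feats, q_gru)


def bootstrap(labeled: Sequence[Tuple[X, str]], unlabeled: Sequence[X],
              train: Callable[[List[Tuple[X, str]]], object],
              predict: Callable[[object, X], Tuple[str, float]],
              threshold: float = 0.9, max_rounds: int = 5):
    """Retrain, pseudo-label confident unlabeled examples, add them, repeat."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(model, x)
            # Assumption: only predictions at or above `threshold` are trusted.
            if conf >= threshold:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:  # nothing new to add, so stop early
            break
        labeled.extend(confident)
        pool = rest
    return labeled, pool
```

A fixed confidence threshold is the simplest selection rule for bootstrapping; it trades corpus growth against the risk of adding mislabeled examples, which would then be reinforced in later rounds.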
Results: We constructed a small corpus for Chinese EMR relation extraction based on the EMR datasets released at CCKS (China Conference on Knowledge Graph and Semantic Computing). Experimental results show that the proposed method achieves a best F1-score of 89.78% over all relation categories, 13.07% higher than the CNN baseline.