Improving Biomedical Named Entity Recognition with Syntactic Information

DOI: https://doi.org/10.21203/rs.3.rs-21994/v1

Abstract

Background Biomedical named entity recognition (BioNER) is an important task for understanding biomedical texts. It is challenging because large-scale labeled training data and domain knowledge are scarce. Previous studies have shown that syntactic information can be useful for named entity recognition; however, most of them treat that information as a gold reference and fail to weigh it according to its contribution.

Results In this paper, we propose BioKMNER, a BioNER model that uses key-value memory networks to incorporate syntactic information extracted from syntactic structures automatically generated by existing toolkits. Our approach outperforms baselines without memories and achieves new state-of-the-art results on four biomedical datasets compared with previous studies, i.e., 85.67% on BC2GM, 94.22% on BC5CDR-chemical, 90.11% on NCBI-disease, and 76.33% on Species-800.
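
To illustrate the general idea of a key-value memory read, the following is a minimal sketch in plain Python. It is not the BioKMNER implementation: the function names, the dot-product scoring, the residual addition, and the toy vectors are all assumptions made for illustration. The point it shows is that each memory slot receives a weight, so automatically generated syntactic information contributes in proportion to its relevance rather than being trusted uniformly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(hidden, keys, values):
    """Generic key-value memory read (hypothetical helper, not from the paper).

    hidden: encoder vector for one token
    keys:   one vector per memory slot (e.g. embeddings of context features)
    values: one vector per memory slot (e.g. embeddings of syntactic labels)
    Returns the attention weights and the hidden vector plus the weighted values.
    """
    # Score each key against the token representation, then normalize.
    weights = softmax([dot(hidden, k) for k in keys])
    # Weighted sum of value vectors, computed per dimension.
    dim = len(hidden)
    read = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    # Assumed combination step: add the memory read to the hidden vector.
    output = [h + r for h, r in zip(hidden, read)]
    return weights, output


# Toy example with two memory slots and 2-dimensional vectors (made-up numbers).
hidden = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0]]
values = [[0.5, 0.5], [-0.5, 0.5]]
weights, output = memory_read(hidden, keys, values)
```

In this toy setting the first slot's key aligns with the token vector, so it receives the larger weight and its value dominates the read.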

Conclusion Experimental results on four benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on all of them.

Full Text

This preprint is available for download as a PDF.