Using Semi-Automatically Annotation System on Medical Entity Recognition

doi:10.21203/rs.3.rs-2222605/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

It is increasingly common for people to ask questions on the web and seek advice before visiting medical institutions. These corpus resources may be valuable for further research on natural language processing in medicine. Amazon provides a service called “Amazon Comprehend Medical” that helps medical experts extract six kinds of important terms from articles. In this research, we propose a medical entity recognition model that identifies ten types of medical entity terms. We also developed a semi-automatic annotation system to extract medical entity terms from the questions. Our results suggest that the annotation system can reduce labeling time to about 40% of the manual labeling time (a reduction of about 60%) and that it provides a tagging interface for adding medical entity terms manually.

Subject areas: Medical Informatics, Software Engineering, Biomedical Engineering

Keywords: medical entity recognition, medical entity, annotation

In recent years, with rising health awareness and Internet use, more and more netizens search for information on online medical consultation (OMC) platforms, health discussion forums, and social networking sites. OMC services have grown at an average rate of 150% every five years since the year 2000 [1]. These websites not only aggregate health news but also offer online medical consultation; examples include the 5151 Online Health Care Network, Quickly Ask a Doctor, and KingNet [2][3]. It is not unusual for netizens to submit questions and seek advice before visiting medical institutions [4]. It has been stated that 80% of adults in the US and 66% of adults in Europe seek online health advice [5]. These corpus resources may be valuable for further research on natural language processing in medicine. However, they are often recorded as unstructured free text. Medical informatics analysis is difficult to conduct on such unstructured corpora and would benefit from translating them into a structured form. The benefits include reducing the time required for manual expert review and enabling secondary use of the data in large-scale automated processing [6][7]. As a first step toward structuring these texts for future use, identifying the named entities in the medical questions may help.

Table 1 shows some example questions that users might ask on these websites.

Table 1

Examples for health-related questions
Chinese Sentence	English translation
我最近白天一直上廁所而且晚上睡不著。不知道怎麼了?	Recently I keep going to the toilet during the day and I cannot sleep at night. I don’t know what is wrong.
請問晚上睡覺時, 多夢不好睡, 除了吃中藥外, 還能哪些處理能得到降低多夢的情形呢? 一直夢到工作的事, 是否壓力太大造成的呢? 如何才不會一直夢到工作上的事情呢?	I dream a lot at night and sleep poorly. Besides taking Chinese medicine, what other treatments can reduce the dreaming? I keep dreaming about work; is this caused by too much stress? How can I stop dreaming about work all the time?

Previous work has explored a similar idea. The Amazon Comprehend Medical online service extracts medical named entities from unstructured medical sentences [8]. Its interface labels medical condition, medication, test/treatment, procedures, anatomy, and protected health information in the submitted data. In this study, we attempt to identify medical named entities in the corpus of medical questions we collected.

Table 2

Medical Entity Example
Order	Medical Entity Type	Example Terms (Chinese)	Example Terms (English)
1	Health Information	20歲、女性	20 years old, woman
2	Symptom	牙結石	Dental Calculus
3	Disease	沙門氏菌感染	Salmonella Infections
4	Organ	頭、臉頰	Head, Cheek
5	Examination	尿液分析, 紅血球	Urinalysis, Red blood cell
6	Treatment	關節穿刺術	Arthrocentesis
7	Medication	幸福他命糖衣錠	HEPTAMIN TABLETS
8	Time	起床、躺下	Get up, lie down
9	Department	牙科	Dentistry
10	Abbreviation	心電圖、腦波圖	ECG, EEG

We collected a corpus of questions from medical consulting service sites, gathering 82,508 web posts. After discussion within the team, we settled on ten medical entity types: Symptom, Disease, Health Information, Department, Treatment, Examination, Medication, Organ, Time and Abbreviation. Abbreviations are shortened word forms that are frequently ambiguous and pose a problem for subsequent information retrieval. Clinical abbreviations persistently impede NLP performance and practical application in medicine [9]; hence they are treated as a separate entity type in this study. We adopted 52 POS patterns to develop the medical entity recognition method. Table 2 shows example terms for each entity type. To correct errors in the output of this method, we extended the application into a semi-automatic annotating system (SAAS). The system integrates the auto-labeling process and offers a user interface for revising the extracted medical named entities. It also helps reduce labeling time compared with purely manual annotation.

Medical entity recognition

Medical entity recognition is a type of information retrieval that focuses on identifying instances of various types of entities. For example, cancer would be an instance of a disease, swelling would be an instance of a symptom, and so on [10].

Keretna, et al. built a MER system to extract drug name entities from unstructured and informal medical text using a hybrid model of lexicon-based and rule-based techniques. They used a lexicon as the initial step to detect drug name entities and then applied inference rules to extract drug names that the lexicon had missed [11].

Kundeti, et al. [12] mentioned that medical texts contain both structured and unstructured data because doctors describe the patient’s condition in free-form English. It is not efficient to perform any analysis or data mining with this form of data. Hence, conversion of these unstructured documents to structured information is needed.

In Chinese-language medical entity recognition, Liu, et al. [13] investigated the effects of different types of features in Chinese clinical NER tasks using the Conditional Random Fields (CRF) algorithm. Other studies perform named entity recognition with the CRF method in English [14][15]. Xu, et al. [16] described an unsupervised framework for recognizing and linking medical entities in Chinese online medical texts.

Amazon Comprehend Medical

Amazon Comprehend Medical is a natural language processing service that extracts relevant medical information from unstructured text. It uses machine learning models to identify medical information, such as medical conditions and medications, and determines their relationships to each other, for instance, medicine dosage and strength [8].
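As a rough illustration of how such a service is typically consumed, the sketch below groups detected entities by category. It assumes the `boto3` client for `comprehendmedical` and its `detect_entities_v2` call; the region, the `min_score` threshold, the helper names, and the sample response are our own assumptions for illustration, and the sample is hand-written rather than real service output.

```python
from collections import defaultdict


def detect_medical_entities(text, region_name="us-east-1"):
    """Call Amazon Comprehend Medical; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper below runs without boto3

    client = boto3.client("comprehendmedical", region_name=region_name)
    return client.detect_entities_v2(Text=text)


def group_by_category(response, min_score=0.5):
    """Group entity texts by category, dropping low-confidence detections."""
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity.get("Score", 0.0) >= min_score:
            grouped[entity["Category"]].append(entity["Text"])
    return dict(grouped)


# Hand-written sample shaped like a service response (not real output).
sample_response = {
    "Entities": [
        {"Text": "headache", "Category": "MEDICAL_CONDITION", "Score": 0.97},
        {"Text": "ibuprofen", "Category": "MEDICATION", "Score": 0.99},
        {"Text": "head", "Category": "ANATOMY", "Score": 0.31},
    ]
}

print(group_by_category(sample_response))
```

Only `group_by_category` is exercised here; `detect_medical_entities` is shown to indicate where a live call would go.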

Semi-automatic Approach

Semi-automatic interface design can improve system performance and reduce the user's burden [17]. Lu, et al. [18] proposed a semi-automatic approach to constructing a Chinese-English Medical Subject Headings (MeSH) vocabulary based on Web-based term translation. MeSH is a comprehensive controlled vocabulary created and updated by the United States National Library of Medicine (NLM). Their system provides knowledge engineers with candidate terms mined from anchor texts and search-result pages. They constructed a traditional Chinese-English MeSH by translating the English medical terms in MeSH into Chinese with an integrated Web-based term translation method. This method explores two kinds of Web resources, Web anchor texts and search-result pages, to deal with the problem of multilingual translation for diverse unknown (new) Web query terms.

Annotation Data and Procedure

The 5151 Online Health Care Network is an online service through which users consult doctors from 41 medical specialties. We collected 82,508 records of user consultation texts from the 5151 online health care web application. Because the consultation content is our main focus, only health-related consultations were selected.

We randomly selected 100 health descriptions for an interdisciplinary team of physicians and two information scientists to develop an annotation guide. After reviewing the 100 narratives, we obtained guiding principles agreed upon by all members of the team.

Based on the annotation guideline, five annotators, all of them students with medical backgrounds, each marked 100 online health consultation articles. We selected online health consultation posts of no more than 150 words as the texts to be labeled. The collection of 100 online health consultation posts consists of approximately 4,706 words, with an average of 47 words per post (SD 21.1).

Labeling was done using our SAAS. The one hundred labeled online health consultations were used as training and test data for medical entity identification, as described below. Each annotator marked the data independently, and the differences were resolved by SB (a doctor). We also report Cohen’s kappa, a well-known statistic used to assess the reliability of a fixed number of evaluators when categorizing or classifying multiple items [19]. We use this data set and the previously constructed medical entity recognition model to capture all named entities.
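For readers unfamiliar with the statistic, the following minimal sketch computes Cohen’s kappa for one pair of annotators as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The label lists are invented for illustration and are not taken from our corpus; with five annotators, one would have to apply it pair by pair, which is an assumption about usage rather than a description of our procedure.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Invented labels for six items (illustration only).
annotator_1 = ["Symptom", "Symptom", "Disease", "Organ", "Symptom", "Disease"]
annotator_2 = ["Symptom", "Disease", "Disease", "Organ", "Symptom", "Symptom"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.455
```

In this toy case the annotators agree on four of six items (p_o ≈ 0.667) while chance agreement is 14/36 ≈ 0.389, giving kappa = 5/11.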

Preparation

In this section, we briefly introduce the Medical Subject Headings (MeSH) terms, which we use as our symptom and disease dictionary resource, and the tool we use to segment sentences.

Medical Subject Headings

We collected MeSH terms from the disease categories and assigned the terms whose tree numbers start with C23 to the symptom dictionary, because the doctor on our team judged that terms under C23 are symptoms. We cleaned the symptom dictionary by removing extra punctuation and duplicate symptoms, which resulted in 375 symptoms. Examples in the dictionary include “頭昏眼花” (Dizziness), “頭痛” (Headache), “流膿” (Suppuration) and other health-related symptoms.

We likewise collected MeSH terms from the disease categories and assigned the terms whose tree numbers come before C23 to the disease dictionary, because the doctor judged that terms before C23 are diseases. We cleaned the disease dictionary by removing extra punctuation and duplicate diseases, which left 3723 diseases. Examples in the disease dictionary include “手足口病” (Hand, Foot and Mouth Disease), “細菌感染” (Bacterial Infections) and “十二指腸潰瘍” (Duodenal Ulcer).
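The split and cleanup described above can be sketched as follows. This is a minimal illustration, not our production code: the `records` list, its tree numbers, and the function name `build_dictionaries` are hypothetical, and the cleanup is reduced to trimming punctuation at the ends of a term.

```python
import re

# Hypothetical (term, tree number) records; tree numbers are illustrative.
records = [
    ("頭痛", "C23.888.592.612.441"),
    ("頭昏眼花。", "C23.888.592.350"),      # trailing punctuation to clean
    ("頭痛", "C23.888.592.612.441"),        # duplicate, dropped by the set
    ("細菌感染", "C01.150.252"),
    ("十二指腸潰瘍", "C06.405.469.275.800"),
]

# Characters trimmed from both ends of a term (assumed cleanup rule).
EXTRA_PUNCTUATION = " \t,.;:!?,。;:!?、"


def build_dictionaries(records):
    """Split terms into symptoms (C23) and diseases (before C23)."""
    symptoms, diseases = set(), set()
    for term, tree_number in records:
        match = re.match(r"C(\d+)", tree_number)
        if not match:
            continue  # not under the disease categories
        cleaned = term.strip(EXTRA_PUNCTUATION)
        if not cleaned:
            continue
        branch = int(match.group(1))
        if branch == 23:
            symptoms.add(cleaned)
        elif branch < 23:
            diseases.add(cleaned)
    return symptoms, diseases


symptoms, diseases = build_dictionaries(records)
print(sorted(symptoms), sorted(diseases))
```

Using sets makes duplicate removal implicit; terms under branches after C23 are simply ignored in this sketch.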

Tools: CKIP Chinese Word Segmentation

We employed a Chinese word segmentation system, developed by CKIP (Chinese Knowledge and Information Processing Group), that offers optional new-word recognition and part-of-speech (POS) tagging [20]. For example, consider the sentence in Fig. 1: “醫生您好, 我頭痛跟肚子痛, 還有最近晚上都一直上廁所, 不知道是怎麼了? ” It translates to “Hello doctor, I have a headache and a stomachache, and recently I keep going to the bathroom at night. I don’t know what is wrong.” After segmentation, we obtain a sequence of words and POS tags: 醫生(Na) 您好(VH) , (COMMACATEGORY) 我(Nh) 頭痛(VH) 跟(P) 肚子痛(VH) , (COMMACATEGORY) 還(D) 有(V_2) 最近(Nd) 晚上(Nd) 都(D) 一直(D) 上廁所(VA) , (COMMACATEGORY) 不(D) 知道(VK) 是(SHI) 怎麼(VH) 了(T) ?(QUESTIONCATEGORY). We use the POS tags as the basis for extraction. If the POS tag is “VH,” the word may be a symptom entity such as a headache or a stomachache.
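To make the role of the POS tags concrete, the short sketch below parses the tagged sequence from the example above into (word, tag) pairs and lists the words tagged “VH”. It does not call the CKIP system itself; the regular expression and the function name `parse_tagged` are assumptions made for this illustration.

```python
import re

# Tagged output copied from the example sentence above.
TAGGED = (
    "醫生(Na) 您好(VH) , (COMMACATEGORY) 我(Nh) 頭痛(VH) 跟(P) 肚子痛(VH) "
    ", (COMMACATEGORY) 還(D) 有(V_2) 最近(Nd) 晚上(Nd) 都(D) 一直(D) "
    "上廁所(VA) , (COMMACATEGORY) 不(D) 知道(VK) 是(SHI) 怎麼(VH) 了(T) "
    "?(QUESTIONCATEGORY)"
)

# A word (no spaces or parentheses), optional space, then "(TAG)".
TOKEN = re.compile(r"([^\s()]+)\s*\(([A-Za-z_0-9]+)\)")


def parse_tagged(text):
    """Return a list of (word, pos_tag) pairs from CKIP-style tagged text."""
    return TOKEN.findall(text)


pairs = parse_tagged(TAGGED)
vh_words = [word for word, pos in pairs if pos == "VH"]
print(vh_words)  # ['您好', '頭痛', '肚子痛', '怎麼']
```

The output shows why the tag alone is not sufficient: “您好” (hello) and “怎麼” (how) are also tagged VH, so candidates still have to be checked against the entity dictionaries.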

Medical Entity Recognition

We now describe how we identify ten entity types in medical texts: Symptom, Disease, Health Information, Department, Treatment, Examination, Medication, Organ, Time and Abbreviation. Medical entity recognition is divided into three processes: candidate entity generation, medical entity decision, and semi-automatic medical entity extraction. Table 3 shows the definition of each medical entity.

Table 3

Medical Entity Definition
Type	Definition	Total count
Health Information	User personal information	823
Symptom	A physical or mental feature which is regarded as indicating a condition of disease	854
Disease	A disorder of structure or function in a human	3771
Organ	Brain, Teeth, Wrist	439
Examination	Inspection or investigation, especially as a means of diagnosing disease	1222
Treatment	The combating of a disease or disorder	2124
Medication	Medicine	30474
Time	Time information	103
Department	Digestive system	59
Abbreviation	A shortened form of a word or phrase	306

According to corpus analysis, we found:

(1) Long-word medical entities are usually segmented into several fragments by the natural language processing tool.

(2) Entities in the medical texts follow certain recurring patterns.

Medical Entity Generation Model

In this section, we introduce the POS patterns of the candidates, which are based on dictionary analysis. Table 4 shows a patient chief complaint text segmented by the CKIP POS tagger. The output is a set of word and part-of-speech pairs. Medical entity candidates are generated according to the POS patterns of the different entities. We then keep only the candidates whose length is greater than 1. Finally, the dictionaries mentioned above are used to determine whether a candidate is a medical entity.
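The three steps above (pattern matching, length filtering, dictionary lookup) could be sketched as follows. The tokens, their tags, the three patterns, and the two small dictionaries are all hypothetical stand-ins; the 52 patterns actually used are not reproduced here.

```python
def generate_candidates(tokens, patterns):
    """Join consecutive tokens whose POS tags match one of the patterns."""
    tags = [pos for _, pos in tokens]
    candidates = []
    for start in range(len(tokens)):
        for pattern in patterns:
            end = start + len(pattern)
            if tuple(tags[start:end]) == pattern:
                words = [word for word, _ in tokens[start:end]]
                candidates.append("".join(words))
    return candidates


def decide_entities(candidates, dictionaries):
    """Keep candidates longer than one character that a dictionary contains."""
    entities = []
    for candidate in candidates:
        if len(candidate) <= 1:
            continue  # single-character candidates are filtered out
        for entity_type, terms in dictionaries.items():
            if candidate in terms:
                entities.append((candidate, entity_type))
    return entities


# Hypothetical segmenter output: a long entity split into two fragments.
tokens = [("我", "Nh"), ("有", "V_2"), ("缺鐵性", "A"), ("貧血", "Na"),
          ("跟", "P"), ("頭痛", "VH"), ("暈", "VH")]
patterns = [("A", "Na"), ("Na",), ("VH",)]          # assumed patterns
dictionaries = {"Disease": {"缺鐵性貧血"}, "Symptom": {"頭痛", "暈"}}

candidates = generate_candidates(tokens, patterns)
print(candidates)
print(decide_entities(candidates, dictionaries))
```

Multi-tag patterns such as ("A", "Na") are what allow a long entity that the segmenter split into fragments to be reassembled before the dictionary lookup.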

Semi-automatic Medical Entity Extraction System

In this section, we introduce the Semi-Automatic Annotating System (SAAS), which identifies medical entities in text entered by the user. Entities that are not identified successfully can be corrected immediately through semi-automatic annotation, which further improves the success rate of medical entity recognition. For medical researchers, SAAS can reduce labeling time and assist entity labeling by presenting the marked results through a visual interface.

Our SAAS is composed of a web interface, a CKIP-based medical entity recognition unit, and a database. First, the CKIP-based medical entity recognition unit loads the raw data, generates automatically annotated data, and outputs the result to the web interface. Five operators with medical backgrounds then review the generated result and correct it in the web interface. Both the automatically annotated data and the operators' corrected data are stored in the database for further analysis. We also recorded the time consumed during the semi-automatic annotation process and compared it with the time needed for purely manual annotation.

Figure 2 shows the web interface of our semi-automatic annotation system, which is divided into two parts. The upper part is an input box in which the medical text to be annotated is submitted. The lower part is the medical entity recognition result table generated by the system. The ten entity types are labeled in different colors; for example, Health Information is marked in red and Symptom in sky blue. The result table is editable in the web interface, so users can manually correct entities and add unrecognized ones.
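One way to picture the review step is as a merge of automatic annotations with a reviewer's corrections. The sketch below is an assumption-laden illustration rather than the SAAS code: the `Annotation` record, `apply_corrections`, `render`, the fallback color "gray", and the shortened example sentence are all hypothetical; only the red and sky blue colors come from the interface description.

```python
from dataclasses import dataclass

# Red and sky blue come from the interface description; "gray" is assumed.
ENTITY_COLORS = {"Health Information": "red", "Symptom": "skyblue"}


@dataclass(frozen=True)
class Annotation:
    start: int          # character offset where the entity begins
    end: int            # character offset just past the entity
    entity_type: str
    source: str         # "auto" (recognition unit) or "manual" (reviewer)


def apply_corrections(auto, added=(), removed=()):
    """Merge automatic annotations with a reviewer's additions and removals."""
    rejected = set(removed)
    manual_spans = {(a.start, a.end) for a in added}
    kept = [
        a for a in auto
        if (a.start, a.end) not in rejected
        and (a.start, a.end) not in manual_spans   # manual labels win
    ]
    return sorted(kept + list(added), key=lambda a: (a.start, a.end))


def render(text, annotations):
    """Return (entity text, entity type, display color) rows."""
    return [
        (text[a.start:a.end], a.entity_type,
         ENTITY_COLORS.get(a.entity_type, "gray"))
        for a in annotations
    ]


text = "我是一位20歲的女性,最近開始頭暈"
auto = [
    Annotation(4, 7, "Health Information", "auto"),
    Annotation(8, 10, "Health Information", "auto"),
]
manual = [Annotation(15, 17, "Symptom", "manual")]  # added by the reviewer
final = apply_corrections(auto, added=manual)
for row in render(text, final):
    print(row)
```

Keeping the `source` field on every record is what would allow the automatic output and the corrected output to be stored and compared later.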

Table 5 shows the Chinese text and English translation of the medical text from part A of Fig. 3. Table 6 shows the extracted medical entities. Nine of the ten medical entity types were extracted; only Abbreviation was not.

Table 5

Chinese-English comparison/translation of medical texts
	Chinese	English
Medical Texts	我是一位20歲的女性, 在當工程師, 您好, 大約距今大概一個月前, 我發現自己開始頭暈, 大多發生在起床、躺下、抬頭、低頭的時候, 刷牙時腦部晃動也會暈, 極少數發生在轉頭或走路時。平常坐著時, 頭暈的情況雖然不明顯, 但就是覺得頭不舒服, 胃管插入, 影響到工作, 用了喜達諾注射液。我一開始先到小診所看診, 由於我去年曾因缺鐵性貧血而頭暈, 就先做了抽血檢查, 但檢查結果顯示血紅素沒問題。診所的醫師便建議我到大醫院檢查, 暫且幫我轉診到血液腫瘤科。血液腫瘤科也幫我做了抽血檢查, 但這次主要是看血液中鐵的含量, 結果檢查結果出來也是正常, 醫師認為這不太像是血液的問題, 建議我改看耳鼻喉科。	Hello, I am a 20-year-old woman working as an engineer. About a month ago, I started feeling dizzy. It mostly happens when I get up, lie down, look up, or lower my head; I also get dizzy when my head shakes while brushing my teeth, and very rarely when turning my head or walking. When I am sitting normally, the dizziness is not obvious, but my head still feels uncomfortable. A stomach tube was inserted, it affected my work, and I used Starno injection. I first went to a small clinic. Because I had been dizzy from iron deficiency anemia last year, I had a blood test first, but the results showed that my hemoglobin was fine. The clinic doctor then suggested that I go to a large hospital for examination and, for the time being, referred me to the hematology and oncology department. The hematology and oncology department also gave me a blood test, this time mainly to check the amount of iron in my blood, and the results were again normal. The doctor thought this was unlikely to be a blood problem and suggested that I see the otolaryngology department instead.

Corpus Characteristics and Annotation Agreement

Five annotators semi-automatically annotated 500 online health consultation texts (posts). As shown in Fig. 3, Symptom was the most frequently annotated entity, followed by Medication and then Time. Examination and Abbreviation had the fewest annotated instances.

In Fig. 4, the results labeled by the five annotators are grouped by medical entity type. The annotator Osborn added the most terms in total (510), followed by Freedom (399) and Jim (213).

Figure 7 shows that it takes an average of 36.21 minutes to label 100 online health consultation texts (posts) using the semi-automatic annotation system.

Manually labeling 100 online health consultation texts (posts) takes 91.96 minutes. The semi-automatic annotation system therefore uses about 60% less time for medical text labeling.
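The time saving follows directly from the two reported figures, as the short check below shows. The function name `relative_saving` is ours; the two timings are the ones reported above.

```python
semi_auto_minutes = 36.21   # per 100 posts, semi-automatic annotation
manual_minutes = 91.96      # per 100 posts, manual annotation


def relative_saving(baseline, improved):
    """Fraction of the baseline time that is saved."""
    return (baseline - improved) / baseline


saving = relative_saving(manual_minutes, semi_auto_minutes)
speedup = manual_minutes / semi_auto_minutes
print(f"time saved: {saving:.1%}")   # time saved: 60.6%
print(f"speed-up: {speedup:.2f}x")   # speed-up: 2.54x
```

Put differently, semi-automatic labeling takes about 39% of the manual time, which corresponds to a reduction of roughly 60%.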

Error Analyses

For the error analyses, we focused on medical entity non-recognition errors. We randomly selected 100 medical texts for medical entity recognition. After review, a total of 163 medical entities had not been identified successfully. We group these unrecognized named entities into two types of errors and give illustrative examples of both; in all examples, annotated named entities are shown in bold. The two types are: (1) the medical entity is not in the dictionary, and (2) the POS patterns are insufficient. Of the 163 unidentified entities, 77 are not in the dictionary we created and can be resolved by adding them to it. The other 86 are not recognized because the POS patterns are insufficient. These 86 POS pattern errors can be further subdivided into two kinds: (a) a custom POS tag has not been added (Unadded Error), and (b) the POS pattern rules are insufficient (Insufficient Error). Table 8 shows examples, including the Chinese health consultation, its English translation, and the medical entities that were not successfully identified under each error type.

Table 8

Example of Medical Entity Recognition errors
Chinese sentence	English Translation	Not in the dictionary error	POS pattern insufficient: Unadded Error	POS pattern insufficient: Insufficient Error
我要看腎科在桃園有家中醫把脈準的...。	I want to see a nephrologist. In Taoyuan, is there a Chinese medicine practitioner who is accurate at pulse taking...		把脈 (pulse taking)	腎科(Nephrology)
本人去手術是關於乳房腫瘤檢驗報告是說不典行乳管增生不曉得這種是啥? 要如何保養手術後ㄉ身體及為何會發生? 手術後是否會復發 ….。	I had surgery for a breast tumor. The test report says it is atypical ductal hyperplasia. What is this? How should I take care of my body after surgery, and why does it happen? Will it recur after surgery? …	手術 (Surgery)、乳房腫瘤(Breast tumor)		非典行乳管增生 (Atypical ductal hyperplasia)
初期手臂痠痛, 現在已經延伸到肩部, 試過推拿, 也是過針灸, 但成果都短暫是否該持續的做某種復健才好?	Initially my arm was sore, and now the soreness has extended to my shoulder. I have tried massage and also acupuncture, but the results were short-lived. Should I keep doing some kind of rehabilitation?		手臂痠痛(Sore arm)	推拿(Massage), 復健(Rehabilitation)
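As a quick consistency check on the error counts reported in this section, the snippet below confirms that the two error types add up to the 163 unrecognized entities and computes each type's share. The dictionary keys are shorthand labels of our own.

```python
# Counts reported in the error analysis above.
error_counts = {
    "not in dictionary": 77,
    "POS pattern insufficient": 86,
}

total = sum(error_counts.values())
shares = {name: count / total for name, count in error_counts.items()}

print(total)  # 163
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Roughly 47% of the misses could thus be addressed by extending the dictionaries alone, while the remaining 53% call for additional POS tags or pattern rules.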

Principal Results

Figure 4 shows that, of the ten entity types, Symptom has the highest number of labels that the five operators needed to correct or add manually. This may be because descriptions of symptoms are more variable and divergent than those of other entities: different people may describe the same symptom in different ways. For example, the symptom “頭暈” (dizziness) can also be described as “頭昏昏的” (light-headed) or “天旋地轉” (spinning). The entity type with the second highest number of errors is Organ, probably because in Chinese the same organ can be expressed by phrases of different lengths, for example “頭” (head) and “頭部” (head). The third is Medication, mostly because the drugs mentioned in the submitted medical texts are mainly Chinese herbal medicines, whereas the pre-built drug dictionary mainly contains Western medicines.

Chinese poses an additional challenge for medical entity recognition. Unlike Latin-based or Germanic-based languages, languages written without spaces between words raise problems of word delimitation [21]. Major problems in tokenization and segmentation include tokenizing direction, efficiency, and ambiguity [22][23]. The CKIP word segmentation system was developed to address these problems, and we adopted it for the POS tagging step in our SAAS.

Our medical entity recognition is trained with ten entity dictionaries and part-of-speech pairs; therefore, the recognition results are limited by the scope of the training corpus. At present, we use a part-of-speech pattern based medical entity recognition method. We define different POS patterns for different entities, so the Symptom and Disease entities, which appear most frequently in medical texts, will require many additional pattern rules. In the future, we will increase the amount of training material and feed our semi-automatically labeled concepts into semi-supervised machine learning methods to improve performance.

In this study, we developed an annotation guideline for extracting medical entity information from online health consultation posts. We then developed medical entity recognition models that automatically extract medical entities from medical text. In addition, we use semi-automatic medical entity recognition models to identify ten entities: Symptom, Disease, Health Information, Department, Treatment, Examination, Medication, Organ, Time and Abbreviation. For entities that are not extracted, we provide the ability to add them manually. Our experiments show that the system reduces the time needed for medical text annotation. As the number of labeled tags grows, the time required for tagging decreases.

ME: Medical Entity

MERM: Medical Entity Recognition Model

NLP: Natural Language Processing

SAAS: Semi-automatic annotating system

CKIP: Chinese Knowledge and Information Processing Group

POS: Part of speech

MeSH: Medical Subject Headings

NLM: National Library of Medicine

Acknowledgements

We would like to thank the Chinese Knowledge and Information Processing Group, Institute of Information Science, Academia Sinica, for providing complimentary access to the Chinese segmentation system.

Conflicts of Interest

None declared.

[1] I. Al-Mahdi, K. Gray, and R. Lederman, “Online Medical Consultation: A review of literature and practice,” vol. 164, p. 4, 2015.
[2] “5151線上健康照護網 -.” http://www.5151.tw/dm.php (accessed Oct. 03, 2022).
[3] KingNet國家網路醫藥, “免費醫藥諮詢服務-KingNet國家網路醫藥,” KingNet 國家網路醫藥. https://www.kingnet.com.tw/inquiry/ (accessed Oct. 03, 2022).
[4] E. Sillence, P. Briggs, P. R. Harris, and L. Fishwick, “How do patients evaluate and make use of online health information?,” Soc. Sci. Med., vol. 64, no. 9, pp. 1853–1862, May 2007, doi: 10.1016/j.socscimed.2007.01.012.
[5] L. G. Kean, “Half of American adults have searched online for health information.,” p. 42.
[6] K. Kreimeyer et al., “Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review,” J. Biomed. Inform., vol. 73, pp. 14–29, Sep. 2017, doi: 10.1016/j.jbi.2017.07.012.
[7] D. Demner-Fushman, W. W. Chapman, and C. J. McDonald, “What can natural language processing do for clinical decision support?,” J. Biomed. Inform., vol. 42, no. 5, pp. 760–772, 2009, doi: 10.1016/j.jbi.2009.08.007.
[8] “Extract Health Data - Amazon Comprehend Medical – Amazon Web Services,” Amazon Web Services, Inc. https://aws.amazon.com/comprehend/medical/ (accessed Oct. 03, 2022).
[9] S. Moon, B. McInnes, and G. B. Melton, “Challenges and Practical Approaches with Word Sense Disambiguation of Acronyms and Abbreviations in the Clinical Domain,” Healthc. Inform. Res., vol. 21, no. 1, pp. 35–42, Jan. 2015, doi: 10.4258/hir.2015.21.1.35.
[10] P. Hucklenbroich, “‘Disease Entity’ as the Key Theoretical Concept of Medicine,” J. Med. Philos. Forum Bioeth. Philos. Med., vol. 39, no. 6, pp. 609–633, Dec. 2014, doi: 10.1093/jmp/jhu040.
[11] S. Keretna, C. P. Lim, and D. C. Creighton, “A hybrid model for named entity recognition using unstructured medical text,” 2014 9th Int. Conf. Syst. Syst. Eng. SOSE, pp. 85–90, 2014.
[12] S. R. Kundeti, J. Vijayananda, S. Mujjiga, and M. Kalyan, “Clinical named entity recognition: Challenges and opportunities,” 2016 IEEE Int. Conf. Big Data Big Data, pp. 1937–1945, 2016.
[13] K. Liu, Q. Hu, J. Liu, and C. Xing, “Named Entity Recognition in Chinese Electronic Medical Records Based on CRF,” in 2017 14th Web Information Systems and Applications Conference (WISA), Liuzhou, Guangxi Province, China, Nov. 2017, pp. 105–110. doi: 10.1109/WISA.2017.8.
[14] P. Pathak, R. Goswami, G. Joshi, P. Patel, and A. Patel, “Crf-based clinical named entity recognition using clinical nlp,” presented at the Proceedings of International Conference on Natural Language Processing, 2013.
[15] C. Jochim and L. Deleris, “Named entity recognition in the medical domain with constrained CRF models,” 2017, pp. 839–849.
[16] J. Xu, L. Gan, M. Cheng, and Q. Wu, “Unsupervised Medical Entity Recognition and Linking in Chinese Online Medical Text,” J. Healthc. Eng., vol. 2018, pp. 1–13, 2018, doi: 10.1155/2018/2548537.
[17] F. Ciravegna and Y. Wilks, “Designing Adaptive Information Extraction,” Annot. Semantic Web, vol. 96, p. 112, 2003.
[18] W.-H. Lu, S.-J. Lin, Y.-C. Chan, and K.-H. Chen, “Semi-automatic construction of the Chinese-English MeSH using web-based term translation method,” 2005, vol. 2005, p. 475.
[19] M. L. McHugh, “Interrater reliability: the kappa statistic,” Biochem. Medica, vol. 22, no. 3, pp. 276–282, 2012.
[20] “CKIP CoreNLP.” https://ckip.iis.sinica.edu.tw/service/corenlp/ (accessed Oct. 12, 2022).
[21] M. A. Inuzuka, A. S. Rocha, and H. A. Nascimento, “Segmentation of words written in the Latin alphabet: a systematic review,” 2020, pp. 291–302.
[22] Y. Yao and K. Ten Lua, “Splitting-merging model of chinese word tokenization and segmentation,” Nat. Lang. Eng., vol. 4, no. 4, pp. 309–324, 1998.
[23] C.-R. Huang, K.-J. Chen, L.-L. Chang, and F.-Y. Chen, “Segmentation standard for Chinese natural language processing,” 1997, pp. 47–62.
