Considering that the terminology data could be reused for ontology and knowledge graph construction, we referred to several ontology classification and workflow strategies [19, 20] and to the authoritative terminology systems UMLS and SNOMED CT [21, 22] when establishing the terminology criteria. The designed workflow comprised five steps: (1) classification schema design, (2) concept and sub-concept assignment, (3) terminology editing strategy, (4) terminology property development, and (5) online deployment.
Classification schema design
Based on a review of a large volume of COVID-19 data and on suggestions from two experts in medical informatics, we defined 10 first-level top nodes for the classification schema: disease; anatomic site; clinical manifestation; demographic and socioeconomic characteristics; living organism; qualifiers; psychological assistance; medical equipment, instruments and materials; epidemic prevention and control; and diagnosis and treatment technique.
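Writing the first-level schema down as a fixed set makes the grouping explicit; for instance, "medical equipment, instruments and materials" is one node, not three. The sketch below is a hypothetical Python encoding: the enum `TopNode`, its member names and the helper `is_top_node` are illustrative assumptions, not part of COVID Term or TBench.

```python
from enum import Enum


class TopNode(Enum):
    """Hypothetical encoding of the 10 first-level nodes of the COVID Term schema."""
    DISEASE = "disease"
    ANATOMIC_SITE = "anatomic site"
    CLINICAL_MANIFESTATION = "clinical manifestation"
    DEMOGRAPHIC_SOCIOECONOMIC = "demographic and socioeconomic characteristics"
    LIVING_ORGANISM = "living organism"
    QUALIFIERS = "qualifiers"
    PSYCHOLOGICAL_ASSISTANCE = "psychological assistance"
    MEDICAL_EQUIPMENT = "medical equipment, instruments and materials"
    EPIDEMIC_PREVENTION = "epidemic prevention and control"
    DIAGNOSIS_TREATMENT = "diagnosis and treatment technique"


def is_top_node(label: str) -> bool:
    """Return True if `label` names one of the first-level nodes (case-insensitive)."""
    return label.strip().lower() in {node.value for node in TopNode}


if __name__ == "__main__":
    print(len(TopNode))               # 10
    print(is_top_node("Qualifiers"))  # True
```

Using an enum rather than free-text strings means that a sub-concept can only be attached under one of the ten agreed roots, which suits a schema that is meant to stay fixed while lower levels change.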
Concept and sub-concept assignment
COVID Term was designed as a 6-level hierarchy with only a few terms at the sixth level. This relatively shallow structure makes the terminology easier for users to search and browse, whereas too many layers would confuse and mislead them. Figure 1 shows the hierarchical structure of COVID Term. The second layer contained body tissue structure, symptoms and signs, pathogen, route of transmission, clinical typing, extreme behaviors, medical protective items, specimen types, monitoring index, clinical trial, laboratory diagnostic method, imaging diagnostic method, etc. The third layer included protective bodywear, face shield, protective glasses, gown, coverall, mask, gloves, respiratory tract specimens, other specimens, etc. The fourth layer involved serum specimens, surgical mask, etc. The fifth layer covered disposable surgical mask, convalescent serum sample, acute serum sample, etc. The sixth layer contained only a few virus types. An agile model was adopted during data processing, i.e. the structure was adjusted by adding, altering or deleting specific substructures when necessary.
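The depth limit and the agile add/alter/delete operations can be illustrated with a small tree structure. This is a minimal sketch under stated assumptions: the class `ConceptNode`, the constant `MAX_DEPTH` and the method names are hypothetical, and the parent links in the usage example are assumed for illustration, since the text lists terms per level without stating which node each one sits under.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_DEPTH = 6  # COVID Term is organized in at most six levels


@dataclass
class ConceptNode:
    """One concept in the hierarchy; first-level (top) nodes have no parent."""
    label: str
    # Back-reference to the parent; excluded from repr/eq to avoid infinite recursion.
    parent: Optional["ConceptNode"] = field(default=None, repr=False, compare=False)
    children: list["ConceptNode"] = field(default_factory=list)

    @property
    def depth(self) -> int:
        """Level of this node: top nodes are level 1."""
        return 1 if self.parent is None else self.parent.depth + 1

    def add_child(self, label: str) -> "ConceptNode":
        """Attach a sub-concept, refusing to grow the tree past MAX_DEPTH levels."""
        if self.depth >= MAX_DEPTH:
            raise ValueError(f"cannot add {label!r}: would exceed {MAX_DEPTH} levels")
        child = ConceptNode(label, parent=self)
        self.children.append(child)
        return child

    def remove_child(self, label: str) -> None:
        """Delete a sub-concept together with its whole substructure."""
        self.children = [c for c in self.children if c.label != label]

    def rename(self, new_label: str) -> None:
        """Alter a concept label in place without moving it."""
        self.label = new_label

    def path(self) -> list[str]:
        """Labels from the top node down to this node."""
        prefix = [] if self.parent is None else self.parent.path()
        return prefix + [self.label]


if __name__ == "__main__":
    # Hypothetical parent links, chosen only to mirror the levels named in the text.
    top = ConceptNode("medical equipment, instruments and materials")
    items = top.add_child("medical protective items")
    mask = items.add_child("mask")
    surgical = mask.add_child("surgical mask")
    disposable = surgical.add_child("disposable surgical mask")
    print(disposable.depth)  # 5
    print(" > ".join(disposable.path()))
```

Enforcing the limit inside `add_child` keeps the six-level cap intact however often the structure is reshaped, while `remove_child` drops an entire substructure in one step, matching the "deleting specific substructures" operation described above.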
Terminology editing strategy
A bilingual terminology system for a worldwide emergent disease should be correct, authoritative and highly relevant, presenting exact bilingual concepts, semantic types, etc. Therefore, the resources we used were limited to authoritative publications (websites, reports, documents, etc.), e.g. situation reports and documents from the World Health Organization, journal articles (preprints, open-access articles, etc.), national regulations, policy documents and professional books. Bilingual terms were mostly extracted from bilingual WHO documents, textbooks and related papers, and definitions were in most cases taken from textbooks and related papers. Data processing was performed on TBench, a work platform for cross-lingual terminology operation [23]. After each round of data processing, two examiners with professional backgrounds and relevant practical experience were invited to validate the accuracy of the terminology. When the two examiners disagreed, a third party comprising clinical experts was involved.
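The validation step follows a simple protocol: two independent verdicts per term, with a third party consulted only when they conflict. The sketch below models that rule in Python under the assumption that each reviewer can be represented as a function returning accept or reject; in the actual workflow the reviewers are human experts, and the names `validate_term`, `review_round` and `Reviewer` are hypothetical.

```python
from typing import Callable

# A reviewer inspects a term entry and returns True (accept) or False (reject).
# Reviewers are human experts in practice; functions stand in for them here.
Reviewer = Callable[[dict], bool]


def validate_term(term: dict, examiner_a: Reviewer, examiner_b: Reviewer,
                  arbiter: Reviewer) -> tuple[bool, str]:
    """Return (accepted, decided_by). The arbiter is consulted only on disagreement."""
    verdict_a, verdict_b = examiner_a(term), examiner_b(term)
    if verdict_a == verdict_b:
        return verdict_a, "examiners"
    return arbiter(term), "third party"


def review_round(terms: list[dict], examiner_a: Reviewer, examiner_b: Reviewer,
                 arbiter: Reviewer) -> tuple[list[dict], list[dict], int]:
    """Validate every term produced in one processing round.

    Returns accepted terms, rejected terms, and how many needed the third party.
    """
    accepted, rejected, escalated = [], [], 0
    for term in terms:
        ok, decided_by = validate_term(term, examiner_a, examiner_b, arbiter)
        (accepted if ok else rejected).append(term)
        if decided_by == "third party":
            escalated += 1
    return accepted, rejected, escalated
```

Counting escalations per round is an optional addition, not something the text reports; it simply shows where the third party enters the loop and would let a team track how often the examiners disagree.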
Terminology property development
Each term was assigned several properties: concept ID, semantic type, Chinese preferred term and English preferred term as obligatory items, and Chinese synonyms, English synonyms, bilingual definitions and definition source as optional items. Processing dates and times were generated automatically by the system. Among these properties, the concept ID could be linked directly to other systems through automatic mapping, and each definition had to come with a source that users could consult. Synonyms were not required, but additional synonyms broaden the search scope and help users locate terms.
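The property set above can be read as a record type with required fields, optional fields, a system-generated timestamp and a rule that every definition carries a source. The following is a minimal Python sketch of such a record, assuming field names of our own choosing (`concept_id`, `zh_preferred`, `Definition`, etc. are hypothetical); the text does not specify how TBench or the COVID Term website actually stores entries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Definition:
    text: str
    language: str  # e.g. "zh" or "en"
    source: str    # every definition must cite a source users can consult

    def __post_init__(self) -> None:
        if not self.source.strip():
            raise ValueError("a definition must come with a source")


@dataclass
class TermRecord:
    # Obligatory items
    concept_id: str
    semantic_type: str
    zh_preferred: str
    en_preferred: str
    # Optional items
    zh_synonyms: list[str] = field(default_factory=list)
    en_synonyms: list[str] = field(default_factory=list)
    definitions: list[Definition] = field(default_factory=list)
    # Generated by the system at creation time, never entered by editors
    processed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc), init=False)

    def __post_init__(self) -> None:
        required = {
            "concept_id": self.concept_id,
            "semantic_type": self.semantic_type,
            "zh_preferred": self.zh_preferred,
            "en_preferred": self.en_preferred,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if missing:
            raise ValueError(f"missing obligatory items: {', '.join(missing)}")

    def search_keys(self) -> set[str]:
        """All strings a search can match: preferred terms plus synonyms."""
        return {self.zh_preferred, self.en_preferred,
                *self.zh_synonyms, *self.en_synonyms}


if __name__ == "__main__":
    # "C0001" and the semantic type label are hypothetical placeholders.
    mask = TermRecord("C0001", "medical device", "口罩", "mask",
                      en_synonyms=["face mask"])
    print(sorted(mask.search_keys()))
```

`search_keys` shows why synonyms, although optional, widen retrieval: each synonym adds another string a query can hit without changing the obligatory preferred terms.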
Online deployment
To date, the terminology has been updated twice, each update adding more term branches with richer information. Up-to-date COVID-19 resources, e.g. the coronavirus theme of The Lancet, the NIH 2019 novel coronavirus theme and the WHO COVID-19 theme [24-26], were continuously followed by the COVID team to provide the most recent terminology. The COVID-19 terminology was named COVID Term, and we built a website for it to make it accessible to users. Earlier versions were also released on the PHDA (Population Health Data Archive) [27].