In category 1, “Drugs for diagnosis and treatment”, three of the top 10 terms for the general course of the disease and for the early stage coincided: “Trastuzumab” (1st place in both cases), “Anthracycline” (9th and 2nd places, respectively), and “Lapatinib” (10th place in both cases).
In the “Bioorganic chemistry” category, six of the top 10 terms coincided: “Methylation” (1st and 9th places for the general course of the disease and the early stage, respectively), “Lipid” (2nd and 5th places), “Peptide” (4th and 8th places), “Aromatase” (5th and 1st places), “Fatty” (7th and 6th places), and “Ligand” (9th and 2nd places).
In the “Methods and technologies of diagnosis and treatment” category, six of the top 10 terms coincided: “Chemotherapy” (1st place both for the general course of the disease and for the early stage), “Adjuvant” (2nd place in both cases), “Radiotherapy” (3rd and 4th places, respectively), “Radiation” (6th and 10th places), “Surgery” (9th and 3rd places), and “Mastectomy” (10th and 6th places).
In the “Genetics” category, seven of the top 10 terms for the general course of the disease and for the early stage coincided: “Expression” (1st place in both cases), “Pathway” (2nd and 8th places, respectively), “Gene” (3rd place in both cases), “MiR- (microRNA)” (4th and 2nd places), “RNA” (5th place in both cases), “Mutation” (6th and 4th places), and “DNA” (7th and 8th places).
In the “Molecular biology” category, three of the top 10 terms for the general course of the disease and for the early stage coincided: “Receptor” (1st place in both cases), “Estrogen receptor” (3rd and 7th places, respectively), and “Estrogen” (7th and 5th places).
In the “Breast cancer” category, four of the top 10 terms for the general course of the disease and for the early stage coincided: “Breast Cancer Patient” (3rd and 2nd places, respectively), “BRCA” (4th place in both cases), “Breast Cancer Survivor” (6th place in both cases), and “Luminal” (8th place in both cases).
In the “Cancer” category, three of the top 10 terms coincided: “Tumo(u)r” (1st place for the general course of the disease and 2nd place for the early stage), “Metastasis” (2nd and 4th places, respectively), and “Recurrence” (4th and 3rd places).
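The category-by-category comparison above amounts to intersecting two ranked top-10 lists and reporting the rank of each shared term in both. A minimal Python sketch follows, using the “Cancer” category as input. The function name `top_n_matches` is hypothetical, and the strings `g3`, `e1`, and so on are placeholders for terms that the text does not list; only the three matched terms and their ranks come from the text.

```python
def top_n_matches(general, early, n=10):
    """Return {term: (rank_general, rank_early)} for terms found in both
    top-n lists. Ranks are 1-based positions in each list."""
    general_rank = {term: i for i, term in enumerate(general[:n], start=1)}
    early_rank = {term: i for i, term in enumerate(early[:n], start=1)}
    return {
        term: (general_rank[term], early_rank[term])
        for term in general_rank
        if term in early_rank
    }


# "Cancer" category. Placeholders (g3, e1, ...) stand in for terms that
# are not named in the text; they are NOT real data.
general = ["Tumo(u)r", "Metastasis", "g3", "Recurrence", "g5",
           "g6", "g7", "g8", "g9", "g10"]
early = ["e1", "Tumo(u)r", "Recurrence", "Metastasis", "e5",
         "e6", "e7", "e8", "e9", "e10"]

matches = top_n_matches(general, early)
for term, (rank_general, rank_early) in matches.items():
    print(f"{term}: general {rank_general}, early {rank_early}")
print(f"{len(matches)} of 10 terms matched")
```

The same function could be applied to each of the seven categories once the full ranked lists are available.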
According to the Kdif values (the relative share of mismatched terms, Table 2) and Fig. 3, there are “original terms” specific to the early stage of breast cancer. The percentage of “original terms” differs across the seven categories described above and is highest in two of them (“Breast cancer” and “Methods of diagnosis and treatment”).
According to the Kcoin values (Table 2), top terms are actively used to describe the early stage of the disease in two categories (“Bioorganic chemistry” and “Drugs for diagnosis and treatment”).
The top 10 terms are closest in the “Bioorganic chemistry” and “Cancer” categories. The “Breast cancer” category is characterized by a wide variety of terms; it is the most representative and specific.
The main goal of compiling a terminological portrait is to create a basis for automatic extraction and coding of scientific and clinical information to improve the search precision in PubMed.
To check the feasibility of the search strategy in the PubMed database, we used the “Methods of diagnosis and treatment” category of the terminological portrait. The search precision based on the terminological portrait, calculated as described above, increases (Table 5). Search relevance was assessed by the number of documents containing the terms “breast” and “early” in their titles. Non-relevant publications usually concerned other types of cancer and mentioned breast cancer only in the text.
To calculate the search precision, 10 queries were performed (one for each of the top 10 terms) for terms that relate to the general course of the disease and also occur in the description of the early stage (Table 3, A), as well as 10 queries for the original terms that were encountered only in the description of the early stage (Table 3, C). After K1, K2, and K3 had been calculated for each query, their arithmetic mean values were computed (Fig. 3). K1 is the search precision for the term “breast”. K2 is the search precision for the terms “breast AND early”. K3 is the search precision assessed as the share of relevant articles containing the term “early” in the title among the articles containing the term “breast”.
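The calculation of K1, K2, and K3 and their averaging over queries can be illustrated with a short Python sketch. It rests on one reading of the definitions above, which is an assumption: K1 and K2 are taken as shares of all titles retrieved by a query, and K3 as the share of “early” titles among the “breast” titles. The function name `precision_scores` is hypothetical, and the titles are invented for illustration; they are not PubMed results.

```python
from statistics import mean


def precision_scores(titles):
    """Return (K1, K2, K3) for the titles retrieved by one query.

    K1: share of titles containing "breast".
    K2: share of titles containing both "breast" and "early".
    K3: share of titles containing "early" among those containing "breast".
    Matching is a plain case-insensitive substring test (an assumption).
    """
    lowered = [title.lower() for title in titles]
    n_total = len(lowered)
    n_breast = sum("breast" in t for t in lowered)
    n_both = sum("breast" in t and "early" in t for t in lowered)
    k1 = n_breast / n_total if n_total else 0.0
    k2 = n_both / n_total if n_total else 0.0
    k3 = n_both / n_breast if n_breast else 0.0
    return k1, k2, k3


# Invented example titles for two queries (not real search results).
queries = [
    [
        "Adjuvant chemotherapy in early breast cancer",
        "Radiotherapy after mastectomy for breast cancer",
        "Chemotherapy outcomes in lung cancer",
        "Early detection of colorectal cancer",
    ],
    [
        "Surgery for early breast cancer",
        "Early-stage breast tumours and surgery",
    ],
]

per_query = [precision_scores(titles) for titles in queries]
# Arithmetic mean of each coefficient across all queries.
mean_k1, mean_k2, mean_k3 = (mean(values) for values in zip(*per_query))
print(per_query)
print(mean_k1, mean_k2, mean_k3)
```

A plain substring test would also count titles containing words such as “breastfeeding”, so a real evaluation would likely need word-level matching.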