Diagnosis Prediction of Tumours of Unknown Origin Using Immunogenius, A Machine Learning-based Expert Support System for Immunohistochemistry Profile Interpretation

DOI: https://doi.org/10.21203/rs.3.rs-58003/v1

Abstract

Immunohistochemistry (IHC) remains the gold standard for the diagnosis of pathological diseases. This technique supports pathologists in making precise decisions regarding differential diagnosis and subtyping, and in creating personalized treatment plans. However, the interpretation of IHC results is challenging in complicated cases. Furthermore, the rapidly increasing volume of IHC data is making it even harder for pathologists to reach definitive conclusions.

Methods

We developed ImmunoGenius, a machine-learning-based expert support system for pathologists, to support the diagnosis of tumours of unknown origin. Based on Bayes' theorem, we developed a reactive native mobile application for the iOS and Android platforms. ImmunoGenius includes the IHC profiles of 584 antibodies in 2009 neoplasms.

Results

We trained the software using data from 634 real cases, validated it with data from 382 cases, and compared the precision hit rates. The precision hit rate was 78.5% for the training dataset and 78.0% for the validation dataset, with no significant difference between the two. The main reasons for discordant predictions were a lack of disease-specific IHC markers and overlapping IHC profiles among similar diseases.

Conclusion

The results of this study show that a machine-learning-based expert system has the potential to support pathologic diagnosis by providing a second opinion on IHC interpretation based on an IHC database. Incorporating contextual data, including clinical and histological findings, might be required to refine the system in the future.

Background

Immunohistochemical staining (IHC) is an essential staining method for differentiating tumour origin in pathologic diagnosis. It allows the origin of cells to be inferred by investigating the expression of specific antigens in the tissue [1–6]. In 1941, Dr Albert Coons developed an indirect immunofluorescence staining technique [1, 7]. Initially, it was designed for staining fresh tissue samples, which were visualized by fluorescence microscopy. However, with the introduction of enzyme-conjugated antibodies and paraffin embedding, IHC became a regularly used assay in the diagnosis of pathological conditions [2–6]. At the same time, the role of IHC has been extended from classifying the cellular origin of tumours to subtyping tumours, determining treatment efficacy, predicting patient prognosis (prognostic markers), and differentiating precancerous lesions by evaluating molecular changes [1–3, 8].

However, the rapidly expanding knowledge about IHC positivity in each neoplasm often leads to conflicting interpretations in routine practice, especially in complicated cases [9]. For example, a combination of TTF-1 (lung and thyroid), galectin-3 (100% in papillary thyroid cancers), and napsin A (lung adenocarcinomas) is used to determine the tumour origin of a lung mass in patients with thyroid nodules [10, 11]. However, TTF-1 positivity ranges from 21% to 91% across lung cancer subtypes, galectin-3 shows 49% positivity in a subset of lung adenocarcinomas, and napsin A shows a positivity of less than 5% in thyroid cancers, which means that the IHC results by themselves cannot exclude rare exceptions [11–13]. The interpretation of IHC results can also be biased by the experience and knowledge of the individual pathologist [2, 4, 6]. Presently, thousands of new antibodies and IHC staining data from various tumours are available to researchers. Over a million studies using IHC-based assays have been published since 2010. Therefore, it is not feasible for pathologists to memorize the expression of all the molecular markers recognized by the constantly evolving repertoire of antibodies in tumours from different tissues of origin [14].

Algorithmic approaches and standardized IHC panels for certain diagnoses have been used to solve this problem[9, 14, 15]. However, in clinical practice, each case is unique and sensitive, and generalized application of particular IHC panels in some cases can be time-consuming and labour-intensive.

Thus, we developed an expert support system in the form of an iOS and Android mobile application, based on a machine-learning algorithm and an IHC database, that assists pathologists in making a precise diagnosis.

Methods

This study was approved by the Institutional Review Board of the Catholic University of Korea, College of Medicine (SC17RCDI0074).

Development Of Machine-learning Algorithm Using Probabilistic Decision Tree

According to Bayes’ theorem, the posterior (post-event) probability can be calculated when the prior (pre-event) probability is given. Bayes’ theorem is stated mathematically as

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}, \qquad P(B) \neq 0,$$

where A and B are events [16]. P(A|B) and P(B|A) are conditional probabilities: the likelihood of event A occurring given that B has occurred, and vice versa. P(A) and P(B) are the probabilities of observing A and B independently of each other [16].

The probabilistic decision tree is one of the predictive modelling approaches used in statistics and data mining. It is often used for machine-learning algorithms, especially when the test node results are binary (Fig. 1). We adopted this approach for our machine-learning algorithm because IHC results are binary and the calculated probabilities can be collected in a database.

To apply the probabilistic decision tree algorithm, we need a database in the form of a two-dimensional table of tests and diseases that holds the probability of positivity of each test for each disease (Fig. 2). Test results are binary. The probability of positivity is the proportion of positive cases among all cases of the disease. Once the test results are obtained, the probability of each disease is calculated by multiplying its prior probability by the probability of each test being positive or negative, as observed. Comparing the resulting posterior probabilities identifies the disease with the highest probability.
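The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ImmunoGenius source code: the function name `rank_diagnoses`, the uniform default prior, the clamping constant `eps`, and the toy positivity values are all hypothetical.

```python
def rank_diagnoses(positivity, results, priors=None, top_k=10, eps=0.01):
    """Rank diseases by posterior probability given binary IHC results.

    positivity: {disease: {antibody: P(positive | disease)}}
    results:    {antibody: True/False} for each stained antibody
    priors:     {disease: prior probability}; uniform if omitted (assumption)
    eps:        clamp so a 0% or 100% entry never rules a disease out (assumption)
    """
    if priors is None:
        priors = {d: 1.0 / len(positivity) for d in positivity}
    scores = {}
    for disease, profile in positivity.items():
        likelihood = 1.0
        for antibody, is_positive in results.items():
            if antibody not in profile:
                continue  # no database entry: treated as uninformative (assumption)
            p = min(max(profile[antibody], eps), 1.0 - eps)
            likelihood *= p if is_positive else 1.0 - p
        scores[disease] = priors[disease] * likelihood
    total = sum(scores.values())  # normalising constant, P(B) in Bayes' theorem
    posterior = {d: s / total for d, s in scores.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy database with made-up positivity values, for illustration only.
toy_db = {
    "lung adenocarcinoma": {"TTF-1": 0.75, "napsin A": 0.75, "galectin-3": 0.50},
    "papillary thyroid carcinoma": {"TTF-1": 0.95, "napsin A": 0.10, "galectin-3": 0.95},
}
ranked = rank_diagnoses(
    toy_db, {"TTF-1": True, "napsin A": True, "galectin-3": False}
)
```

With these toy values the lung adenocarcinoma entry ranks first, because the napsin A and galectin-3 results are both unlikely under the thyroid profile. Multiplying the per-antibody terms treats the stains as conditionally independent given the disease, which is a simplification.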

Construction of IHC Database

As shown in Supplementary Table 1, important textbooks on IHC, such as the Classification of Tumours series (IARC, Lyon, France), and literature from the World Health Organization (WHO) were used to build an IHC database of the IHC expression profiles of all tumours [4, 5, 17–28]. Over 5000 different neoplasms were recorded based on the WHO classification. Neoplasms without an IHC expression profile were excluded. Where the IHC profile of a tumour subtype differed from that of the primary type, it was recorded separately.

Each tumour's IHC positivity was recorded as shown in the textbook. If no exact numerical value was given, qualitative expressions such as “always positive”, “often positive”, or “rarely/occasionally positive” were converted to assigned values as follows: “always”: 95%; “often”: 75%; “in about a half of cases”: 50%; “seldom”: 30%; “rarely/occasionally”: 10%; and “never”: 0%. If the positivity differed between textbooks, the average value was used in the database. The IHC database is shown in Supplementary Fig. 1.
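As a minimal sketch of this encoding step, the mapping and the averaging across sources could look as follows. The numeric values are the ones listed in the text; the names `QUALITATIVE_POSITIVITY`, `to_positivity`, and `merge_sources` are hypothetical and do not come from the application.

```python
from statistics import mean

# Values assigned in the text to qualitative textbook wording.
QUALITATIVE_POSITIVITY = {
    "always": 0.95,
    "often": 0.75,
    "in about a half of cases": 0.50,
    "seldom": 0.30,
    "rarely/occasionally": 0.10,
    "never": 0.0,
}


def to_positivity(entry):
    """Convert one textbook entry to a fraction in [0, 1].

    A number is taken as an exact positivity; a string is looked up
    in the qualitative table above.
    """
    if isinstance(entry, (int, float)):
        return float(entry)
    return QUALITATIVE_POSITIVITY[entry.strip().lower()]


def merge_sources(entries):
    """Average the positivity reported by several textbooks for one tumour/antibody pair."""
    return mean([to_positivity(e) for e in entries])
```

For example, `merge_sources(["often", 0.85])` averages 0.75 and 0.85.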

Around 600 antibody names and their synonyms used in IHC were recorded from the textbooks and checked against online references (Supplementary Table 2).

Development of ImmunoGenius, A Mobile Application for iOS and Android

The “ImmunoGenius” mobile application for iOS and Android was developed using NoSQL (Fig. 3) and can be accessed on iPhones, Android phones, and iPads. It is designed to search for diseases; when a disease is selected, it generates a table with the IHC antibody names in the first row and the disease names in the left column. The IHC profiles are shown in the corresponding cells as “++” for 75–100% positivity, “+” for 50–74%, “+/-” for 30–49%, “-/+” for 10–29%, and “–” for 0–9%, with graded shades (Fig. 4). Users can compare different IHC profiles and add or remove diseases and IHC antibodies to customize the table. Importantly, users can add their own IHC results through a button on the right-hand side. Once the IHC results are entered, the diagnosis presumption algorithm calculates the top 10 most probable diagnoses, which are shown along with their estimated probabilities (red numbers).
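The display grading can be expressed as a simple threshold function. The sketch below is illustrative only; the function name `grade_symbol` is hypothetical, and an ASCII hyphen stands in for the dash used for the lowest grade.

```python
def grade_symbol(positivity_percent):
    """Map a positivity percentage (0-100) to the symbol shown in the table."""
    if not 0 <= positivity_percent <= 100:
        raise ValueError("positivity must be between 0 and 100")
    if positivity_percent >= 75:
        return "++"
    if positivity_percent >= 50:
        return "+"
    if positivity_percent >= 30:
        return "+/-"
    if positivity_percent >= 10:
        return "-/+"
    return "-"
```

The text defines the bins on whole percentages; with this sketch a fractional value such as 74.5 falls into the lower bin.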

Validation Of Diagnosis Presumption Algorithm Using Patient Data

To assess the precision of the diagnosis presumption algorithm, IHC profile data were collected from cases diagnosed by pathologists using conventional methods, and the original diagnoses were compared with the top 10 results from the presumptive diagnosis algorithm. The IHC profile data of 1000 tumours of unknown origin (TUOs) collected between 2010 and 2017 at Yeouido and Seoul St. Mary’s Hospitals, College of Medicine, The Catholic University of Korea, were used in this study. Any data related to patient identification, except the original diagnosis and the IHC results, were blinded before data processing. TUOs were defined as cases in which, in the clinical or pathological setting, an immunohistochemical differential diagnosis is needed to differentiate between primary and metastatic lesions, or between subtypes of cancer, for a confirmative diagnosis. In such cases, the histological findings alone cannot exclude the possibility of misdiagnosis or misclassification (e.g. determination of tumour origin in ascites, pleural fluid, or lymph nodes; determination of primary or metastatic lesions and pathologic subtyping in needle biopsy samples of lung, liver, or kidney, where metastasis is common and clinicoradiologic findings are not confirmative). For training and validation, the retrieved dataset was divided in a 6:4 ratio. Cases with inadequate IHC profiles, such as those lacking markers of tumour origin, those tested with fewer than three antibodies, and those with inconclusive results, were excluded. In addition, markers that are only prognostic, such as EGFR or p53, were eliminated. Supplementary Fig. 3 shows an example of the IHC profile dataset retrieved from patients. A presumptive diagnosis was counted as precise when the diagnosis obtained by conventional methods was included in the top 10 presumptive diagnoses generated by the algorithm. A case was also counted as concordant when the original and presumptive diagnoses showed no significant difference in IHC profile and differed only in location (e.g., gastrointestinal stromal tumour of the stomach vs. small intestine). The hit rates of the training and validation data were compared to test the functionality of the algorithm. The algorithm was considered validated if there was no statistically significant difference between the training and validation datasets.
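The top-10 criterion can be sketched as below. This is an assumption-laden illustration rather than the study's code: `is_hit`, `hit_rate`, and the `canonical` hook are hypothetical names, and `strip_site` is a deliberately crude stand-in for the rule that diagnoses differing only in location count as concordant.

```python
def is_hit(original, ranked, top_k=10, canonical=lambda name: name):
    """True if the original diagnosis appears among the top_k presumptive diagnoses.

    canonical maps a diagnosis name to a comparison key, so that names
    differing only in anatomical site can be treated as the same entity.
    """
    target = canonical(original)
    return any(canonical(name) == target for name in ranked[:top_k])


def hit_rate(cases, top_k=10, canonical=lambda name: name):
    """cases: list of (original diagnosis, ranked list of presumptive diagnoses)."""
    hits = sum(is_hit(orig, ranked, top_k, canonical) for orig, ranked in cases)
    return hits / len(cases)


def strip_site(name):
    """Hypothetical normaliser: drop a trailing ' of the <site>' from a name."""
    return name.split(" of the ")[0]


example_cases = [
    ("gastrointestinal stromal tumour of the stomach",
     ["gastrointestinal stromal tumour of the small intestine", "leiomyoma"]),
    ("schwannoma", ["leiomyoma"]),
]
strict_rate = hit_rate(example_cases)
lenient_rate = hit_rate(example_cases, canonical=strip_site)
```

In this toy example the strict comparison scores neither case as a hit, while the site-insensitive comparison scores the first case as a hit.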

Statistical Analysis

Time and computational complexity were assessed by testing the mobile application. The chi-square test was used to compare the hit rate between original and presumptive diagnoses. A web-based statistical tool (“http://web-r.org”) was used for the statistical analysis.

Results

Construction of IHC Database. Recruitment of Training and Validation Dataset

Detailed information on 2009 different types of cancer, 584 IHC antibodies, and their IHC profiles was recorded in the IHC database. Six hundred and thirty-four cases were used for the training dataset and 382 cases for the validation dataset.

Training Data

The training and validation data were recruited from 634 and 382 cases, respectively. On average, 6.8 IHC antibodies (range 1–13) were used for diagnosis. A wide variety of tumours from 32 organs was included. From the training data, 19 cases were excluded because of an inconclusive diagnosis and 53 cases because fewer than three IHC antibodies had been tested. As a result, 562 cases were used for training. The organs and the original diagnoses of the training cases are shown in Tables 1 and 2. The most common organs were lung (20.6%), liver (9.8%), kidney (6.6%), stomach (6.6%), and large intestine/rectum (5.3%) (Table 1). Ascites and peritoneum accounted for 5.7% of the cases, and pleural fluid and pleura for 5.2% (Table 1). Primary carcinoma accounted for 41.3% of the cases, followed by metastatic carcinoma (26.9%), benign mesenchymal tumour (21.4%), benign (normal) lesion (5.9%), and malignant mesenchymal tumour (4.6%) (Table 2). The hit rate of the presumptive diagnosis of the training data (top 10) was 78.5% (Table 2). The error rate was highest in malignant mesenchymal tumours (30.8%), followed by metastatic carcinoma (25.8%), benign mesenchymal tumours (23.3%), primary carcinoma (18.1%), and benign (normal) lesion (12.1%).

Table 1

The organs of the training and validation dataset of TUO
Organ	Training data		Validation data
Organ	No.	%	No.	%
Ascites (cell block)	19	3.4%	11	2.9%
Pleural fluid (cell block)	15	2.7%	12	3.1%
Lymph node	12	2.1%	9	2.4%
Peritoneum	13	2.3%	8	2.1%
Pleura	14	2.5%	7	1.8%
Lung	116	20.6%	75	19.6%
Liver	55	9.8%	43	11.3%
Kidney	37	6.6%	31	8.1%
Breast	17	3.0%	11	2.9%
Soft tissue	18	3.2%	14	3.7%
Female genital tract including uterus and vulva, vagina	17	3.0%	12	3.1%
Adnexa	8	1.4%	6	1.6%
Bladder and urinary tract	7	1.2%	5	1.3%
Adrenal gland	10	1.8%	5	1.3%
Prostate	15	2.7%	11	2.9%
Testis	8	1.4%	5	1.3%
Pancreas	7	1.2%	4	1.0%
Stomach	37	6.6%	20	5.2%
Small intestine	4	0.7%	1	0.3%
Large intestine and rectum	30	5.3%	23	6.0%
Gallbladder	3	0.5%	1	0.3%
2	0.4%	1	0.3%
Brain (CNS)	11	2.0%	6	1.6%
Meninges	22	3.9%	16	4.2%
Naso- and oropharynx	13	2.3%	10	2.6%
Skin	14	2.5%	11	2.9%
Soft tissue	20	3.6%	13	3.4%
Bone	6	1.1%	5	1.3%
Thyroid gland	2	0.4%	1	0.3%
Thymus	3	0.5%	2	0.5%
Salivary gland	5	0.9%	3	0.8%
Eye	2	0.4%	0	0.0%
Total	562	100.0%	382	100.0%

Table 2

The original diagnoses of the training and validation dataset of TUO.
Diagnosis category	Training data				Validation data
Diagnosis category	No.	%	Errors	%	No.	%	Errors	%
Primary carcinoma	232	41.3%	42	18.1%	163	42.7%	25	15.3%
Metastatic carcinoma	151	26.9%	39	25.8%	98	25.7%	26	26.5%
Benign (normal) lesion	33	5.9%	4	12.1%	22	5.8%	3	13.6%
Benign mesenchymal tumor	120	21.4%	28	23.3%	80	20.9%	24	30.0%
Malignant mesenchymal tumor	26	4.6%	8	30.8%	19	5.0%	6	31.6%
Total	562	100.0%	121	21.5%	382	100.0%	84	22.0%

Validation Data

The organs and the original diagnoses of the validation cases are shown in Tables 1 and 2. The most common organs in the validation dataset were similar to those in the training dataset: lung (19.6%), liver (11.3%), kidney (8.1%), stomach (5.2%), and large intestine/rectum (6.0%) (Table 1). Ascites and peritoneum accounted for 5.0% of the cases, and pleural fluid and pleura for 4.9% (Table 1). Primary carcinoma accounted for 42.7% of the cases, followed by metastatic carcinoma (25.7%), benign mesenchymal tumour (20.9%), benign (normal) lesion (5.8%), and malignant mesenchymal tumour (5.0%). The hit rate of the presumptive diagnosis of the validation data (top 10) was 78.0% (Table 2). The error rate was highest in malignant mesenchymal tumours (31.6%), followed by benign mesenchymal tumours (30.0%), metastatic carcinoma (26.5%), primary carcinoma (15.3%), and benign (normal) lesion (13.6%).

The Precision Error Rates Between Training And Validation Dataset

The error rates of the precision diagnosis were 21.5% and 22.0% for the training and validation datasets, respectively, which were not significantly different (p-value = 0.866) (Table 3). The overall hit rate was 78.3% (Table 3).

Table 3

The comparison of Precision error rates between the training and validation dataset of TUO
Precision diagnosis	Training data		Validation data		Total		P-value
Precision diagnosis	No.	%	No.	%	No.	%	P-value
Accurate results	441	78.5%	298	78.0%	739	78.3%	0.866
Error results	121	21.5%	84	22.0%	205	21.7%
Total	562	100.0%	382	100.0%	944	100.0%
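The comparison in Table 3 can be reproduced from the four cell counts with a short script. The sketch below uses a Pearson chi-square test without continuity correction; the source does not state whether a correction was applied, so that choice is an assumption, and `chi_square_2x2` is a hypothetical helper name.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: training, validation; columns: accurate, error (counts from Table 3).
statistic, p_value = chi_square_2x2(441, 121, 298, 84)
```

Under this assumption the script yields a p-value of roughly 0.87, in line with the value reported in Table 3.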

Discussion

In the present study, we verified the estimated diagnostic probability of TUOs from IHC results using a probabilistic decision tree and the corresponding mobile application. The precision diagnosis produced by the probabilistic decision tree algorithm, with a hit rate of 78.3%, can be a convincing aid to decision making for pathologists. The hit rates of the training and validation datasets did not differ significantly (78.5% vs. 78.0%, p-value = 0.866).

The hit rate of the presumptive diagnosis was generally poor, mainly because of the large number of disease entities (2009 vs. 104). The most common sites in the data were lung, liver, kidney, ascites and peritoneum, and pleural fluid/pleura, where metastatic lesions are often found in clinical practice. In the lung, IHC was commonly used for subtyping between small cell carcinoma, adenocarcinoma, and squamous cell carcinoma, as well as for determining the origin of the tumour and whether it was primary or metastatic. In the kidney, IHC was also used for subtyping (clear cell, chromophobe, papillary, etc.) and for determining whether the tumour was primary or metastatic. For ascites and peritoneum, IHC was used to distinguish metastatic carcinoma from reactive mesothelial cells/macrophages. For pleural fluid and pleura, IHC was used to distinguish between metastatic adenocarcinoma (from the lung), mesothelioma, and reactive mesothelial cells/macrophages. In the stomach, the primary differential diagnosis was between spindle cell neoplasms, including gastrointestinal stromal tumours (GIST), schwannoma, and leiomyoma. Finally, in the colon/rectum, benign spindle cell neoplasms and neuroendocrine cell tumours (carcinoid) were the most common diseases.

The primary cause of inaccurate presumptive diagnosis was an atypical IHC profile (compared with that described in the textbooks; about two thirds). Other major causes included overlapping IHC profiles between adenocarcinomas of the gastrointestinal tract, uncertainty about the origin of squamous cell carcinoma (for which there is no site-specific marker), mesenchymal neoplasms that express both epithelial and mesenchymal markers, tumours with mixed or combined entities (e.g. squamous transformation of adenocarcinoma of the lung after chemotherapy, combined germ cell tumour, etc.), and tumours with no disease-specific markers. Cases with typical IHC markers tended to receive an accurate presumptive diagnosis. In other words, a precise differential diagnosis cannot be made using the IHC profile alone in about 22% of cases, and clinicopathologic findings along with the patient history should be considered. Thus, this algorithm should be used and interpreted together with contextual information in a comprehensive and integrated manner. This study clearly showed the feasibility and clinical utility of making a diagnosis using the probabilistic decision tree algorithm and the iOS and Android mobile application in the differential diagnosis of tumours using IHC profiles.

Conclusions

The overall hit rate of this machine-learning algorithm was 78.3%, and the hit rates were not significantly different between the training and validation data, indicating relatively robust generalization. A lack of disease-specific markers, overlapping IHC profiles between diseases, a lack of site-specific markers, mixed/combined tumours, and atypical IHC profiles are the leading causes of error in this system. Nevertheless, this system will be useful in assisting pathologists to make precise decisions during disease diagnosis. Integrated interpretation with contextual information, such as clinical and pathological findings, should be considered along with the use of this application before making a final decision. Further studies are needed on recommending IHC panels for particularly complex differential diagnoses and on applying artificial neural network algorithms to optimize the disease diagnosis, organ incidence, and antibody weights.

Patents

Copyright of ImmunoGenius, the mobile application developed during this project, is owned by The Catholic University of Korea, Industry-Academic Cooperation Foundation. The content of the software, including the idea, database, user interface, and source code, is protected.

Declarations

Conflicts of Interest

The authors declare no conflict of interest.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2016R1D1A1A02937427); partly funded by a research grant from the Institute of Clinical Medicine Research, Catholic University of Korea, Yeouido St. Mary’s Hospital; and supported by the Po-Ca Networking Groups funded by the Postech-Catholic Biomedical Engineering Institute (PCBMI) (No 5-2016-B0001-00149).

Author Contributions:

Conceptualization, YC and HY; methodology, YC, MYC, and HY; software, YC; validation, YC and ZZ; formal analysis, YC; investigation, YC, JC, and YK; resources, YC; data curation, YC; writing (original draft preparation), YC; writing (review and editing), YC, MYC, and NT; visualization, YC and NT; supervision, MYC and HY; funding acquisition, YC, JC, and YK.


References

Elias JM. Immunohistochemistry: a brief historical perspective : commentary. Natick: Eaton Pub; 2000. pp. 7–13.
Buchwalow IB, Böcker W. Immunohistochemistry: basics and methods. Heidelberg: Springer; 2010.
Matos LL, Trufelli DC, de Matos MG, da Silva Pinhal MA. Immunohistochemistry as an important tool in biomarkers detection and clinical practice. Biomark Insights. 2010;5:9–20.
Chu PG, Weiss LM. Modern immunohistochemistry. 2nd ed. Cambridge; New York: Cambridge University Press; 2014.
Dabbs DJ. Diagnostic immunohistochemistry: theranostic and genomic applications. 4th ed. Philadelphia: Elsevier/Saunders; 2014.
Kalyuzhny AE. Immunohistochemistry: essential elements and beyond. Cham: Springer; 2016.
Coons AH. Labeling Techniques in the Diagnosis of Viral Diseases. Bacteriol Rev. 1964;28:397–9.
Werner B, Campos AC, Nadji M, Torres LFB. Practical use of immunohistochemistry in surgical pathology. J Bras Patol Med Lab. 2005;41(5):353–64.
DeYoung BR, Wick MR. Immunohistologic evaluation of metastatic carcinomas of unknown origin: an algorithmic approach. Semin Diagn Pathol. 2000;17(3):184–93.
Bishop JA, Sharma R, Illei PB. Napsin A and thyroid transcription factor-1 expression in carcinomas of the lung, breast, pancreas, colon, kidney, thyroid, and malignant mesothelioma. Hum Pathol. 2010;41(1):20–5.
Yu H, Li L, Liu D, Li WM. [Expression of TTF-1, NapsinA, P63, CK5/6 in Lung Cancer and Its Diagnostic Values for Histological Classification]. Sichuan Da Xue Xue Bao Yi Xue Ban. 2017;48(3):336–41.
El-Maqsoud NM, Tawfiek ER, Abdelmeged A, Rahman MF, Moustafa AA. The diagnostic utility of the triple markers Napsin A, TTF-1, and PAX8 in differentiating between primary and metastatic lung carcinomas. Tumour Biol. 2016;37(3):3123–34.
Gweon HM, Kim JA, Youk JH, Hong SW, Lim BJ, Yoon SO, et al. Can galectin-3 be a useful marker for conventional papillary thyroid microcarcinoma? Diagn Cytopathol. 2016;44(2):103–7.
Kandalaft PL, Gown AM. Practical Applications in Immunohistochemistry: Carcinomas of Unknown Primary Site. Arch Pathol Lab Med. 2016;140(6):508–23.
Lin F, Prichard J. Handbook of practical immunohistochemistry: frequently asked questions. 2nd ed. New York ; Heidelberg: Springer; 2015.
Lesaffre E, Lawson A. Bayesian biostatistics. Chichester: John Wiley & Sons; 2012.
Aaltonen LA, Hamilton SR, World Health Organization. International Agency for Research on Cancer. Pathology and genetics of tumours of the digestive system. Lyon: IARC Press; 2000.
DeLellis RA, International Agency for Research on Cancer., World Health Organization. International Academy of Pathology., International Association for the Study of Lung Cancer. Pathology and genetics of tumours of endocrine organs. Lyon: IARC Press; 2004.
Travis WD, World Health Organization., International Agency for Research on Cancer., International Association for the Study of Lung Cancer. International Academy of Pathology. Pathology and genetics of tumours of the lung, pleura, thymus, and heart. Lyon: IARC Press; 2004.
LeBoit PE, International Agency for Research on Cancer., World Health Organization. International Academy of Pathology., European Organization for Research on Treatment of Cancer., UniversitätsSpital Zürich. Departement Pathologie. Pathology and genetics of skin tumours. Lyon: IARC Press; 2006.
Louis DN, International Agency for Research on Cancer. World Health Organization. WHO classification of tumours of the central nervous system. Lyon: IARC Press; 2007.
Swerdlow SH, International Agency for Research on Cancer. World Health Organization. WHO classification of tumours of haematopoietic and lymphoid tissues. 4th ed. Lyon: IARC Press; 2008.
Rekhtman N, Bishop JA. Quick reference handbook for surgical pathologists. Heidelberg: Springer; 2011.
Lakhani SR, International Agency for Research on Cancer. World Health Organization. WHO classification of tumours of the breast. Lyon: IARC Press; 2012.
Fletcher CDM, World Health Organization. International Agency for Research on Cancer. WHO classification of tumours of soft tissue and bone. 4th ed. Lyon: IARC Press; 2013.
International Agency for Research on Cancer (IARC). Moch H. WHO classification of tumours of the urinary system and male genital organs. 4th ed. Lyon: IARC Press; 2016.
International Agency for Research on Cancer
Louis DN. International Agency for Research on Cancer. WHO classification of tumours of the central nervous system. Revised 4th ed. Lyon: IARC Press; 2016.
ImmunoQuery. http://www.immunoquery.com Accessed 14 Feb 2019.