Among MLMs, deep learning approaches based on convolutional neural networks have previously been reported for medical images [20]. One such architecture, ResNet50, has emerged as superior in image classification tasks [21, 22], while SVM classification has been judged useful for discriminating between two classes by constructing a decision boundary from one or more feature vectors [23]. We therefore applied both approaches in this study.
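As a conceptual illustration of the second approach, an SVM assigns a case to one of two classes according to which side of a learned decision boundary its feature vector falls on. The following minimal sketch uses purely hypothetical weights, not values learned in this study:

```python
# Hypothetical illustration (not this study's actual pipeline): a linear
# SVM classifies a sample by the sign of the score w.x + b, i.e. by which
# side of the decision boundary the feature vector lies on.
def svm_predict(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "teratoma" if score >= 0 else "necrosis"

# toy weights and bias, chosen for illustration only
w, b = [0.8, -0.5], -0.1
print(svm_predict(w, b, [1.0, 0.2]))  # → teratoma
```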
Here, we evaluated the efficacy of MLMs in distinguishing between residual teratoma and necrosis in clinical PC-RPLND specimens. The radiomics-based ResNet50 algorithm achieved a diagnostic accuracy of 80.0%, corresponding to a sensitivity of 67.3%, a specificity of 90.5%, and an AUC of 0.84, while SVM classification (using six clinical variables) achieved a diagnostic accuracy of 74.8%, corresponding to a sensitivity of 59.0%, a specificity of 88.1%, and an AUC of 0.84.
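For reference, sensitivity, specificity, and accuracy of the kind reported above follow directly from a 2×2 confusion matrix. The helper below uses illustrative counts only, not the study's raw data:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall correct fraction
    return sensitivity, specificity, accuracy

# illustrative counts only, not taken from this study
print(diagnostic_metrics(8, 2, 9, 1))  # → (0.8, 0.9, 0.85)
```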
There are two previous studies of radiomics-based MLMs that likewise discriminate between necrosis (benign) and teratoma/viable GCT (malignant) in PC-RPLND specimens [24, 25]. A study by Lewin et al. used SVM classification and reported a diagnostic accuracy of 71.7%, a sensitivity of 56.2%, a specificity of 81.9%, and an AUC of 0.74 [24]. Another study, by Baessler et al., used a random forest model and obtained a diagnostic accuracy of 81%, a sensitivity of 88%, and a specificity of 72% [25]. The performance of our model is superior to the Lewin model; compared with Baessler's report, our model is superior in specificity (90.5% versus 72%) but slightly inferior in accuracy (80.0% versus 81%). This might be due to the larger study population in the Baessler report and the heterogeneity introduced by multiple CT scanners/vendors. Furthermore, our model was developed to distinguish residual teratoma from necrosis in clinical PC-RPLND specimens, not benign from malignant histology.
Several nomograms based on logistic regression analysis have been reported to distinguish benign from malignant histology in PC-RPLND specimens, with favorable AUCs ranging from 0.77 to 0.84 [8, 9, 18]. Among these, the most promising nomogram includes the following clinical variables: presence of PPT components in the primary site, post-chemotherapy LNS, percentage of lymph node shrinkage after chemotherapy, and pre-chemotherapy serum tumor marker levels (AFP, HCG, and LDH) [8]; we therefore applied these clinical variables to our study. To the best of our knowledge, the only study featuring a machine learning model and these six clinical variables for predicting the histology of PC-RPLND specimens reported an AUC of 0.76 [24]. We selected two clinical variables from the initial six and achieved an AUC of 0.84, similar to previously reported results. Although external validation is needed, our model achieved a similarly favorable AUC using fewer clinical variables than previously reported nomograms and MLMs. Superfluous variables can apparently be excluded safely: Lewin et al. incorporated extensive clinical variables into their model with no improvement in AUC [24], a result similar to ours.
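Such nomograms are essentially logistic regression models: each clinical variable contributes a weighted term to a linear score, which a sigmoid maps to a predicted probability. A minimal sketch follows, with purely hypothetical coefficients rather than those of the cited nomograms:

```python
import math

def nomogram_probability(intercept, coeffs, values):
    """Logistic-regression-style probability from weighted clinical variables.
    Intercept and coefficients here are hypothetical, for illustration only."""
    z = intercept + sum(c * v for c, v in zip(coeffs, values))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps the score to (0, 1)

# a zero linear score maps to a probability of exactly 0.5
print(nomogram_probability(0.0, [0.7, -0.3], [0.0, 0.0]))  # → 0.5
```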
Since the ensemble learning paradigm widely used in medical science strengthens models by integrating multiple MLMs into a single output, we next attempted this tactic [26–28]. However, predictive performance was not improved by ensemble learning, probably because of performance limitations within the individual models comprising the ensemble. Reports on ensemble learning have yet to establish whether the number of integrated models adversely or beneficially affects performance [27]. Accounting for this in the initial design of any MLM integration for ensemble learning will be a requirement for future studies.
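One common way to integrate multiple MLMs into a single output, shown here as a generic sketch rather than our exact implementation, is majority voting over the base models' predictions:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most base models for a single case."""
    return Counter(predictions).most_common(1)[0][0]

# hypothetical outputs from three base models for one case
print(majority_vote(["necrosis", "teratoma", "necrosis"]))  # → necrosis
```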
There are several limitations to this study. First, owing to the retrospective design and single-institution analysis, no external cohort validation was performed. Although we used cross-validation to mitigate this limitation, overfitting of the trained model could have occurred. Second, the study population was relatively small. Third, it might be impossible to detect microscopic residual teratoma within large necrotic lymph nodes by a radiomics approach, which limits this tactic.
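Cross-validation of the kind used here partitions the available cases into k folds, each serving once as a held-out validation set. A minimal index-splitting sketch (illustrative, not our exact implementation):

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k folds; each fold is held out once as the
    validation set while the remaining indices form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    return [(sorted(set(range(n_samples)) - set(fold)), fold) for fold in folds]

# e.g. 10 samples, 5 folds: the first split holds out indices 0 and 5
train, val = k_fold_indices(10, 5)[0]
print(val)  # → [0, 5]
```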
Despite these limitations, we developed predictive models for PC-RPLND histology using MLMs with performance equal to or better than previously reported models. Additional validation is required to evaluate whether our models can discriminate between residual teratoma and necrosis reliably enough to spare patients unnecessary PC-RPLND. Furthermore, it would be interesting to compare our machine learning method with assessments by experienced urologists and radiologists in predicting the histology of PC-RPLND specimens.