Is it feasible to predict lymph node metastasis intraoperatively or postoperatively in early-stage lung adenocarcinoma: the application of machine learning algorithms?

Abstract

Background
Lymph node metastasis (LNM) status can be a critical decisive factor for clinical management of lung cancer. Accurately evaluating the risk of LNM during or after the surgery can be helpful for making clinical decisions. This study aims to incorporate clinicopathological characteristics to develop reliable machine learning (ML)-based models for predicting LNM in patients with early-stage lung adenocarcinoma.

Methods
A total of 709 lung adenocarcinoma patients with tumor size ≤ 2 cm were enrolled for analysis and modeling with multiple ML algorithms. The receiver operating characteristic (ROC) curve and the decision curve were used to evaluate each model's predictive performance and clinical usefulness. Feature selection based on the potential models was performed to identify the predictive factors that contributed most.

Results
LNM occurred in 11.3% (80/709) of patients with lung adenocarcinoma. Most models reached high areas under the ROC curve (AUCs) > 0.9. In the decision curve, all models performed better than the treat-all and treat-none lines. The random forest classifier (RFC) model, with a minimal number of 5 variables introduced (including carcinoembryonic antigen, solid component, micropapillary component, lymphovascular invasion and pleural invasion), was identified as the optimal model for predicting LNM, because of its excellent performance in both ROC and decision curves. The cost-efficient application of RFC model could precisely predict LNM during or after the operation of early-stage adenocarcinomas (sensitivity: 87.5%; specificity: 82.2%).

Conclusions
Incorporating clinicopathological characteristics, it is feasible to predict LNM intraoperatively or postoperatively by ML algorithms.
Trial registration: NA

Background
Lung cancer has been reported to be the most common cancer type worldwide and the leading cause of cancer death [1]. Among lung cancer cases, which vary in their pathological characteristics, 80-85% can be categorized as non-small cell lung cancer (NSCLC) [2]. In the treatment of NSCLC, lymph node dissection (LND) during radical surgery is considered crucial [3]. A better understanding of the lymph node metastasis (LNM) pattern helps to demarcate the extent of LND. Many studies have focused on LNM in late-stage lung cancer, but LNM in small-size NSCLC should not be ignored, as its incidence can reach 10% [4,5]. Moreover, occult LNM (OLNM) is not rare in early-stage NSCLC [6][7][8] and might lead to a poor prognosis, especially for patients who received sublobar resection and sublevel excision of lymph nodes. Thus, it is necessary to precisely evaluate the risk of LNM intraoperatively and postoperatively, even in patients with no preoperatively suspected lymph node involvement.
Machine learning (ML) generally refers to an algorithm-based process that predicts an outcome from large data files, presuming that a pattern exists in the data that will identify the outcome. Compared with traditional statistical models, ML predictive analysis has several benefits, including fewer outcomes required for each predictor, no requirement for a specific hypothesis, and allowance for interaction between variables [9,10]. ML-based predictive analysis has been used validly in the medical field [11,12]. To the authors' knowledge, very few studies have reported the application of ML algorithms for evaluating the risk of LNM in lung cancer patients. This study aims to find validated ML models for the prediction of LNM in early-stage adenocarcinomas by incorporating clinical characteristics and postoperative histological patterns.

Study population
This study enrolled 709 NSCLC patients who had received lobectomy with systematic lymph node dissection at Peking Union Medical College Hospital (PUCMH) from January 2013 to December 2019. Enrolled patients had a single NSCLC focus with a maximum diameter ≤ 2 cm on CT. Patients who met any of the following conditions were excluded: 1) diagnosed with small cell lung cancer; 2) diagnosed with multiple lung cancers; 3) preoperative radiotherapy or chemotherapy; 4) distant metastasis; 5) incomplete clinical information. The study was approved by the Institutional Review Board at PUCMH, Chinese Academy of Medical Science. All patients signed written consent.

Clinicopathological characteristics
This study included a total of 19 variables in three categories. Preoperative clinical characteristics included age, gender, smoking status, and serum carcinoembryonic antigen (CEA). Radiographical features were recorded from CT by one radiologist and two thoracic clinicians independently, and included tumor imaging density, tumor side, tumor maximum diameter and specific signs such as spiculation, vessel convergence, lobulation and pleural indentation. Disagreements were resolved by consensus. Histologically, cancer lesions were divided into four subtypes: atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), microinvasive adenocarcinoma (MIA) and invasive adenocarcinoma (IA) [13]. For all tumor lesions, histological details were further examined by pathological experts at our hospital, including the presence of papillary, micropapillary, solid, acinar and lepidic components.
Additionally, lymphovascular invasion (LVI) and pleural invasion (PI) were also considered risk factors for LNM. Pathological staging was based on the 8th edition of the TNM Classification for lung cancer [14].
Overfitting, in which a model becomes too specific to its training data to be suitable for another dataset, is a common risk, especially when the number of variables is large. Cross-validation has been proven effective in avoiding overfitting [22,23]. In this study, enrolled patients were randomly and equally split into five datasets for 5-fold cross-validation. In each run, one dataset was used as the testing group and the remaining four as the training group. This process was repeated 5 times for each algorithm to find the optimal models. The performance of ML-based models was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) for predictive ability and by the decision curve for clinical usefulness.
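The 5-fold procedure and the AUC metric described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names `five_fold_splits` and `roc_auc` are hypothetical, the split is a plain random one (the text does not say whether folds were stratified by LNM status), and the AUC is computed from its rank-based definition rather than with any particular library.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of patients to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]  # near-equal fold sizes
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 709 is the cohort size reported in the text; the folds hold 141 or 142 patients each.
    for train_idx, test_idx in five_fold_splits(709):
        print(len(train_idx), len(test_idx))
```

In a real run, a model would be fitted on `train_idx` and `roc_auc` evaluated on the predictions for `test_idx` in each of the five iterations.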

Feature selection
A classifier-specific evaluator of feature contribution was applied to each model to select variables. The potential models with the best predictive performance and clinical usefulness were selected to identify predictive risk factors. For each model, a list of variables ordered by their predictive contribution was returned. A lower rank number indicated greater relevance to the model.
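As a rough illustration of how per-model contribution scores can be turned into ranked lists, the sketch below converts importance scores into ranks (rank 1 being the largest contribution) and averages the ranks across models. The helper names `rank_features` and `mean_rank`, the averaging step, and all numbers are assumptions made for illustration; the text does not state how ranks from different models were combined, and the scores are not values from the study.

```python
def rank_features(importances):
    """Map each feature to its rank; rank 1 is the largest contribution."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def mean_rank(per_model_importances):
    """Average each feature's rank across models (illustrative aggregation only)."""
    ranks = [rank_features(imp) for imp in per_model_importances.values()]
    features = list(ranks[0])
    return {f: sum(r[f] for r in ranks) / len(ranks) for f in features}


# Made-up importance scores for two hypothetical models; not data from the study.
toy_importances = {
    "model_a": {"solid": 0.40, "CEA": 0.35, "LVI": 0.25},
    "model_b": {"solid": 0.30, "CEA": 0.50, "LVI": 0.20},
}

if __name__ == "__main__":
    print(mean_rank(toy_importances))
```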

Statistical analysis
Univariate analysis was performed using SPSS 25.0 (IBM, New York, USA). Normality of quantitative data was assessed with the Shapiro-Wilk test. Normally distributed quantitative parameters were compared using Student's t test and reported as mean ± standard deviation (SD), while non-normally distributed quantitative parameters were compared using the Mann-Whitney U test and reported as median with interquartile range (IQR). Pearson's chi-square test (or Fisher's exact test when necessary) was used to compare the distribution of categorical variables. ML-based models were developed using the Python programming language (version 3.7). Decision curve analysis (DCA) was performed using R software (version 3.6.3). Statistical significance was defined as a two-sided P value < 0.05.

Results
Table 1 lists the clinical characteristics of all 709 patients involved in this study. Patients were aged from 51 to 64 years, with a median age of 58 years. LNM was observed in 80 (11.3%) patients. The node-positive group had a median CEA concentration of 3.63 ng/ml, significantly higher than that of the node-negative group, indicating that a higher serum CEA level could be a risk factor for LNM. Additionally, a larger tumor size was significantly associated with LNM (p < 0.001). In terms of the radiologic characteristics of the lung cancer foci, the node-positive and node-negative groups differed significantly in tumor density (p < 0.001) and pleural indentation (p = 0.02), but not in spiculation (p = 0.315), vessel convergence (p = 0.226) or lobulation (p = 0.154). There was no pure ground-glass opacity (pGGO) lesion in the node-positive group. Further, analysis of clinicopathological features showed that the presence of a micropapillary component (p < 0.001), solid component (p < 0.001), acinar component (p = 0.001), LVI (p < 0.001) and VPI (p < 0.001) could be possible risk factors for LNM, while the presence of a lepidic component indicated LNM-free disease (p < 0.001). All node-positive patients were confirmed by pathology to have invasive adenocarcinoma.
Predictive performance of ML-based models
Six supervised ML algorithms were used to develop efficient and reliable predictive models with 19 clinicopathological variables, and their predictive performance is illustrated in Fig. 1 and Table 2.

To further compare the clinical usefulness of the models, DCA was performed (Fig. 2). First, across almost the entire reasonable range of thresholds, all models performed better than the two extreme lines (treat-all and treat-none). Most of them showed similar net benefits under most circumstances, except for the DT model. At thresholds < 0.28, LR presented slightly higher net benefits than the other models. However, at thresholds ≥ 0.28, the RFC model performed best at most values of threshold probability. In the threshold range of 0-0.4, MNB performed nearly the worst among all models except DT. At thresholds > 0.4, the net benefits of ADB and ANN decreased sharply and were lower than those of the other models except DT. Therefore, in addition to RFC and LR, XGB and GBDT, which showed stably higher net benefits than the other four models, were also identified as potential models.
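For readers unfamiliar with decision curves, the quantity plotted on the vertical axis is the net benefit at a given threshold probability. The sketch below uses the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), together with the treat-all reference line; the treat-none line is zero at every threshold. The function names are hypothetical and the toy data in the usage example are invented for illustration; the study itself performed DCA in R.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference line for treating everyone, regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Invented toy data: 1 = node-positive, 0 = node-negative.
    labels = [1, 1, 0, 0, 0]
    risks = [0.9, 0.6, 0.7, 0.2, 0.1]
    for pt in (0.1, 0.3, 0.5):
        print(pt, net_benefit(labels, risks, pt), net_benefit_treat_all(labels, pt))
```

A model is clinically useful at a threshold when its net benefit exceeds both the treat-all value and zero, which is the comparison made against the two extreme lines above.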

Variable importance
Based on four potential models (RFC, LR, XGB and GBDT) with great predictive performance and clinical usefulness, the top 10 important variables for LNM prediction and their rank are shown in Fig. 3 [24].
According to the application, the optimal cutoff point of risk probability to distinguish LNM (+) from LNM (-) was 13.85% (sensitivity: 87.5%; specificity: 82.2%). Figure 4 shows the risk probability distribution of all patients, which has been standardized by the following formula: (risk probability - 13.85%) / standard deviation.
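The cutoff and the standardization formula can be expressed directly in code. The following is a minimal sketch with several assumptions: the function names are hypothetical, a risk equal to the cutoff is counted as LNM (+), and the sample standard deviation is used because the text does not say which estimator was applied. The toy inputs are invented and are not patient data.

```python
from statistics import stdev

CUTOFF = 0.1385  # optimal cutoff of risk probability reported in the text


def sensitivity_specificity(labels, risks, cutoff=CUTOFF):
    """Sensitivity and specificity when risks >= cutoff are called LNM (+)."""
    tp = sum(1 for y, r in zip(labels, risks) if y == 1 and r >= cutoff)
    fn = sum(1 for y, r in zip(labels, risks) if y == 1 and r < cutoff)
    tn = sum(1 for y, r in zip(labels, risks) if y == 0 and r < cutoff)
    fp = sum(1 for y, r in zip(labels, risks) if y == 0 and r >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def standardize(risks, cutoff=CUTOFF):
    """(risk probability - cutoff) / standard deviation, as in the formula above."""
    sd = stdev(risks)  # assumption: sample standard deviation
    return [(r - cutoff) / sd for r in risks]


if __name__ == "__main__":
    # Invented toy data: 1 = node-positive, 0 = node-negative.
    labels = [1, 1, 0, 0]
    risks = [0.50, 0.10, 0.20, 0.05]
    print(sensitivity_specificity(labels, risks))
    print(standardize(risks))
```

After standardization, patients predicted as LNM (+) have values at or above zero and those predicted as LNM (-) have negative values, which makes the cutoff easy to read off a plot.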

Discussion
LNM status is crucial for the treatment of early-stage NSCLC. To date, lobectomy plus systematic lymph node dissection is the standard management to achieve a low recurrence rate and prolong survival [3,25]. However, compared with selective LND or lymph node sampling, systematic LND is more likely to cause a series of postoperative complications [26,27]. On other occasions, sublobar resection, including segmentectomy and wedge resection, has been recommended for early-stage NSCLC patients; it showed survival outcomes similar to those of lobectomy [28,29] and could also preserve more lung function. However, sublevel surgery such as selective LND and sublobar resection is more likely to leave residual tumor, and thus lead to a poor prognosis, if LNM has occurred. Moreover, occult LNM makes the situation more complicated. It has been estimated that the occurrence rate of OLNM could be between 10.8% and 17.2% in stage I lung cancer [30][31][32]. Patients with LNM might mistakenly undergo sublevel surgery, leading to a poor prognosis. For these patients, salvage management might be necessary. Therefore, more effort should be devoted to accurately predicting LNM status during or after the operation.
Previous studies have revealed some possible predictive factors for LNM in NSCLC. Yu et al reported several independent risk factors, including tumor size, pleural invasion, and carcinoembryonic antigen [33]. Pani et al found that histologic subtypes could be related to lymph node status [34]. Another similar study suggested different lymph node dissection strategies for different combinations of clinicopathological features, CEA concentration and albumin level [35]. These studies used univariate and multivariate analysis to reveal clinicopathological predictors for different LNM patterns. Our study, in contrast, adopts ML algorithms to predict LNM by incorporating a large series of clinicopathological features. Among the predictive models, we found that RFC, GBDT, XGB and ANN all achieved AUCs higher than 0.9, similar to the LR model. However, in the decision curve, LR performed better than the others at thresholds < 0.28, while RFC performed best at most thresholds ≥ 0.28 and consistently maintained a high net benefit. It is noteworthy that all models performed better than the treat-all and treat-none lines, indicating that our models have value in clinical practice and that patients could gain more benefit if management were guided by the predictions of these models.
Furthermore, based on the four potential models that we identified as performing well in both ROC and decision curves, the top ten variables were determined: solid component, CEA, pleural invasion, tumor imaging density, LVI, micropapillary component, histological subtype, acinar component, lepidic component and gender. In addition to CEA and imaging density, which have been reported by previous studies [4,5], many histological features were also strongly related to the occurrence of LNM. Besides pleural invasion and LVI, histological growth patterns such as the presence of solid, micropapillary and acinar components indicated a high risk of LNM, while the presence of a lepidic component could indicate LNM-free disease. These variables are conventionally not included in the intraoperative pathology report. Our study emphasizes the importance of these histological features in the prediction of lymph node status. Thus, intraoperative pathology could include more detailed information about adenocarcinomas to further evaluate LNM risk, especially for patients in whom the choice between lobectomy and sublobar resection is difficult. Importantly, risk evaluation of LNM after surgery might be necessary for early-stage adenocarcinoma patients. For those who received sublobar resection or sublevel LND, salvage management and close follow-up could be required if a high risk of LNM is indicated by our ML model.
In recent years, predicting metastasis with machine learning algorithms, as a promising alternative to other invasive or noninvasive diagnostic methods, has been shown to be feasible in lung adenocarcinoma and colorectal cancer [11,12]. These studies based their predictions on CT images and histologic evidence and obtained satisfactory results. However, considering that the sample sizes in the two studies were not large, the validity of machine learning prediction needs to be further confirmed in a larger NSCLC patient population. Another methodological problem that remains to be addressed is that the false-positive and false-negative rates need to be low enough to achieve good clinical utility. A high AUC represents high predictive accuracy but does not necessarily prove good clinical utility, because false-positive or false-negative results could reduce net benefit [36]. To seek a model with both high predictive accuracy and high net benefit, we adopted DCA, which has been widely shown to be efficient and interpretable in the evaluation of clinical utility [37]. From the decision curve, it was clear that RFC had the highest net benefit across the longest stable range of clinically reasonable preferences.
To further enhance the clinical usefulness of our study, a dynamic application of the RFC model with 5 clinicopathological variables was developed, so that clinicians and patients worldwide can benefit from our study and evaluate LNM risk easily. Node-positive patients could be precisely identified by the RFC application (sensitivity: 87.5%; specificity: 82.2%; Fig. 4).
This study is not without limitations. The retrospective nature of the analysis inevitably introduces data acquisition bias. Additionally, the enrolled patients are from a single center and share an ethnicity. Future studies are expected to validate the predictive performance of the RFC model, and to assess more possible clinicopathological variables, in a multicenter population.

Conclusions
This study comprehensively evaluated various ML-based predictive models and identified RFC model as the optimal one that accurately predicted LNM in early-stage adenocarcinomas. By feature selection, some clinicopathological characteristics were found to be strongly related to LNM.

Declarations

Ethics approval and consent to participate
The study was approved by the Institutional Review Board at PUCMH, Chinese Academy of Medical Science.

Consent for publication
Informed consent in written form has been received from all patients.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
YJW and JHL analyzed and interpreted the data; YJW and YMC wrote the manuscript; YMC and PCW performed the statistical analysis; YYW, CH, LG, XYL and ZLW collected the data; NXL and SQL supervised the study. All authors read and approved the final manuscript.

Figure 4. The standardized risk probability of each patient based on the random forest classifier (RFC) model. X-axis: each patient; Y-axis: the standardized risk probability.