The significance of spread through air spaces in the prognostic assessment model of stage I lung adenocarcinoma and the exploration of its invasion mechanism

Spread through air spaces (STAS) is a crucial invasive mode of lung cancer and has been shown to be associated with early recurrence and metastasis. We aimed to develop a prognostic risk assessment model for stage I lung adenocarcinoma based on STAS and other pathological features and to explore the potential relationship between CXCL-8, Smad2, Snail, and STAS. 312 patients who underwent surgery at Harbin Medical University Cancer Hospital with pathologically diagnosed stage I lung adenocarcinoma were reviewed in the study. STAS and other pathological features were identified by H&E staining, and a prognostic risk assessment model was established. The expression levels of CXCL8, Smad2, and Snail were determined by immunohistochemistry. The nomogram was established based on age, smoking history, STAS, tumor lymphocyte infiltration, tissue subtype, nuclear grade, and tumor size. The C-index for DFS was (training set 0.84 vs validation set 0.77) and for OS was (training set 0.83 vs validation set 0.78). Decision curve analysis showed that the model constructed has a better net benefit than traditional reporting. The prognostic risk score validated the risk stratification value for stage I lung adenocarcinoma. STAS was an important prognostic factor associated with stronger invasiveness and higher expression of CXCL8, Smad2, and Snail. CXCL8 was associated with poorer DFS and OS. We developed and validated a survival risk assessment model and the prognostic risk score formula for stage I lung adenocarcinoma. Additionally, we found that CXCL8 could be used as a potential biomarker for STAS and poor prognosis, and its mechanism may be related to EMT.


Introduction
Lung cancer is the malignant tumor with the highest morbidity and mortality, among which adenocarcinoma accounts for the vast majority (Siegel et al. 2021). With the popularization of physical examination and the development of imaging technology, more and more stage I lung adenocarcinomas have been diagnosed. Although most patients have a favorable prognosis, the cumulative incidence of local recurrence and distant metastasis within 5 years after complete surgical resection is approximately 12%-15% and 17%-22%, respectively (Hung et al. 2012;Varlotto et al. 2015). For patients with stage IA lung adenocarcinoma, postoperative observation and follow-up should be performed instead of adjuvant chemotherapy, and stage IB patients with high-risk factors may be considered with adjuvant chemotherapy according to National Comprehensive Cancer Network (NCCN) guidelines (National Comprehensive Cancer Network 2021). Exploring and discovering the clinical and pathological factors of poor prognosis in patients with stage I lung adenocarcinoma and establishing a prognosis prediction model are the keys to breakthrough therapy.
At present, the pathological reports of lung cancer mainly focus on tumor size, degree of differentiation, histological type, pleural invasion, tissue margins, and lymph node dissection. However, these clinicopathological factors are insufficient to reflect prognostic differences. Therefore, our study added clinicopathological features, such as the proportion of tumor tissue content, the degree of tumor cell differentiation, and the nuclear grade that reflect the characteristics of tumor parenchyma, the proportion of intratumoral stroma content, the proportion of stromal tumor-infiltrating lymphocytes (sTIL), and the proportion of peritumoral tumor-infiltrating lymphocytes (pTIL) that reflected the characteristics of tumor stroma. Most importantly, our study incorporated the observation and analysis of STAS, which proved to be a strong prognostic factor, significantly associated with recurrence-free and poorer overall survival. We demonstrated that STAS was an independent prognostic risk factor for stage I lung adenocarcinoma and contributed significantly to prognostic risk assessment models. However, the relationship between the morphological and molecular properties of STAS remains unclear.
Interleukin-8 (CXCL8, IL-8) is a granulocyte chemokine with multiple roles in the tumor microenvironment, including recruiting immunosuppressive cells to tumors, enhancing tumor angiogenesis, and promoting epithelial-to-mesenchymal transition (EMT) (Fousek et al. 2021;David et al. 2016;Palena et al. 2012). CXCL8 is also a potential biomarker whose overexpression plays an important role in the metastasis and poor prognosis of melanoma (Wu et al. 2012), breast (Fang et al. 2017), and colorectal cancers (Xiao et al. 2015). In our previous study, analyzing the STAS transcriptome in lung adenocarcinoma by high-throughput sequencing found that CXCL8 promotes STAS occurrence and correlates with survival (Zeng et al. 2022). In this study, we investigated the relationship between CXCL8, EMT markers Small mothers against decapentaplegic protein 2 (Smad2), Zinc finger protein transcription factor (Snail) and STAS in clinical samples, which might provide new insights into the mechanism of STAS formation.

Methods
Patients 312 patients with stage I lung adenocarcinoma who underwent surgical resection at Harbin Medical University Cancer Hospital between January 2013 to December 2016 were retrospectively reviewed. The inclusion criteria included: primary stage I (T1/T2aN0M0) lung adenocarcinoma confirmed by pathologists according to the criteria of the American Joint Committee on Cancer (AJCC 8th). Patients with multiple nodules, incomplete clinical and pathological data, other malignancies, positive surgical margins, neoadjuvant therapy, or who died in the perioperative period were excluded. Clinical information was collected through medical records, including age, smoking history, surgical resection, and stage. Follow-up information was obtained through electronic medical records and telephone interviews. This study was approved by the Ethics Committee of the Harbin Medical University Cancer Hospital.

Pathological assessment
Pathological data were independently assessed by two pulmonary pathologists blinded to clinical information, and inconsistent cases were discussed and reached a consensus, including tumor size, tissue subtype, STAS, visceral pleural invasion, tumor stromal ratio (TSR), sTIL, pTIL, and nuclear grades. The tumor area was completely sampled and reassessed per patient.
Histological subtyping was done according to the International Association for the Study of Lung Cancer, the American Thoracic Society, and the European Respiratory Society (IASLC/ATS/ERS) in 2011 classification of lung adenocarcinoma, including low-grade pattern (lepidic), intermediate (acinar or papillary), high-grade pattern (micropapillary or solid). High-grade patterns were classified using a 20% cutoff value. STAS was defined as micropapillary clusters, solid nests, or single cancer cells that invade the lung parenchyma beyond the tumor margins. TSR was assessed as a percentage per tenfold (10%, 20%, 30%, etc.) for all tumor areas. The cut-off value was 50%, and they were divided into two groups, the stroma-rich group (TSR < 50%) and the stromapoor group (TSR ≥ 50%). The level of sTIL was calculated according to the proportion of mononuclear cells in the tumor stroma, granulocytes and other polymorphonuclear leukocytes were excluded and were assessed as a percentage per fivefold (5%, 10%, etc.). The level of pTIL was calculated according to the proportion of mononuclear cells in the peritumoral tumor and was assessed as a percentage per fivefold (5%, 10%, etc.). The degree of nuclear atypia was assessed as follows. Grade 1 was mild atypia-uniform nuclei in size and shape. Grade 2 was moderate atypianuclei in intermediate size with slight irregularity in shape. Grade 3 was severe atypia-enlarged nuclei of varied sizes and irregular contours, with some nuclei at least twice as large as others. Elastic fibers staining was used when pleural invasion cannot be assessed microscopically.

Statistical analysis
Disease-free survival (DFS) was defined as the date of surgery to the recurrence. Overall survival (OS) was defined as the time of surgery to death. The last follow-up date was July 1, 2021. The training and testing set were generated via random sampling with a ratio of 70/30 to validate the model constructed. Patients in the training set and validation set were compared in terms of baseline variables.
We constructed the prognosis models for both DFS and OS endpoints, respectively. Univariate Cox proportional hazards were performed on each factor first. Then all significant factors on univariate analysis were entered into multivariate Cox regression analysis using a stepwise procedure with an inclusion criterion of 0.10 and exclusion of 0.15. Given the known existence that age would affect the prognosis of cancer patients, the multivariate analysis was adjusted for age as a potential confounder. Finally, the prognosis model for DFS and OS was produced using the training set and validated in the validation set, respectively. DFS and OS nomogram was generated with R package rms. The calibration curves of 3, 5, and 7 years were drawn to display the prediction probability of the constructed model. We used the time-dependent ROC analyses to evaluate our model's discrimination ability using the timeROC package. The model clinical benefit was measured from the decision curve analysis (DCA). And model performances of the DFS model and OS model were validated in the validation set.
The prognostic risk score was constructed by coefficients and their features based on the multivariate Cox regression analysis. To benchmark the prognosis power of the score, the ROC curves were applied compared to other single common characteristics. The patients were divided into the high-score and low-score groups by taking the median score value, and the survival between the two groups was estimated.
All statistical analyses were performed by R version 4.0.3 and GraphPad Prism 9.0. All tests were two-sided and p-value < 0.05 were considered statistically significant. The Chi-square tests were used to compare different groups for categorical variables. For the continuous variables the normality was tested by the Shapiro-Wilk normality test, and the Student's t test was done if normally distributed or the Wilcoxon rank-sum if nonnormal. Kaplan-Meier survival between the two groups was statistically analyzed by the Log-rank test.

Patients characteristics
According to the inclusion and exclusion criteria, 312 lung adenocarcinoma patients were enrolled in the study. We split the dataset into a training set (70% of the patients, n = 219) and a validation set (30% of the patients, n = 93) listed in Table 1. Only the visceral pleural invasion differed significantly between the groups. Both the training and validation sets had no statistical differences in STAS, tissue subtype, TSR, nuclear grades, smoking history, surgical resection, stage, age, sTIL, pTIL, and tumor size.

Construction and validation of the DFS nomogram
Six significant variables associated with DFS by univariate Cox analysis were included in the multivariate analysis. We further adjusted for factors that related to the outcomes of patients, including age, and stepwise regression was used. In the covariate-adjusted model, age, smoking history, STAS, tissue subtype, sTIL, nuclear grades, and tumor size, were significant predictors of DFS (Table 2). Then, the nomogram was constructed to predict DFS at 3, 5, and 7 years. The predictive ability of the model measured by the C-index was 0.84 (95% CI 0.77-0.90) in the training cohort and 0.77(95% CI 0.66-0.88) in the validation cohort. The nomogram was well calibrated, which demonstrated good agreement between nomogram-predicted probability and actual observed probability. The variable with a high AUC above 0.8 was considered to be an effective predictor. In ROC analysis, the AUCs were 0.87, 0.89, and 0.87 in the training set respectively, and higher than 0.8 in the validation set, indicating the good quality and accurate prediction of our model for DFS (Fig. 1). The DCA curves showed that beyond the 7 years, the net benefit of the DFS model was higher than the combined variables included in the traditional pathology report, which demonstrated the clinical utility and value of our DFS model (Fig. 2). Taken together, the calibration curves, ROC analysis, and DCA curves indicated that our model enabled valuables prediction for DFS.

Construction and validation of the OS nomogram
The same method was applied to screen significant variables to build the model and validate the model for the prediction of OS. The final OS model contained age, smoking history, STAS, tissue subtype, sTIL, and nuclear grades (Table 3), and was presented as a nomogram. The C-index of the model was 0.83 (95% CI 0.76-0.91) in the training cohort 1 3 and 0.78(95%CI: 0.68-0.89) in the validation cohort. The majority of the calibration curves followed the diagonal line, which indicated reliable risk estimates. For the OS model, the AUCs were 0.83, 0.86, and 0.88 at 3-,5-, and 7-year, respectively, which showed a better prediction accuracy (Fig. 3, 4). STAS spread through air space, TSR Tumor stromal ratio, sTIL stromal tumor-infiltrating lymphocytes, pTIL peritumoral tumor-infiltrating lymphocytes, HR hazard ratio, CI confidence interval Statistically significant (P < 0.05).

Clinical significance of the prognostic risk score
Based on the coefficients and the level of variables, the prognostic risk score of each lung adenocarcinoma patient was calculated. The prognostic risk score of each patient was calculated according to the following formula, DFS-score = Age*0.05284 + Smoki n g h i s t o r y ( y e s ) * 0 . 9 8 3 4 5 + S T A S ( ye s ) * 0 . 8 7 9 2 9 + s T I L * -0 . 0 1 7 6 0 + Ti s s u e s u btype (high-grade patterns ≥ 20%) *0.75327 + Tumor size*0.43494 + Nuclear grades(Grade2)*1.06172 or (Grade3)*1.66893;OS-score = Age*0.04627 + Smoking history(yes)*0.96185 + STAS(yes)*0.97015 + sTIL*-0.01705 + Tissue subtype(high-gradep atterns ≥ 20%)*0.838 93 + Nuclear grades(Grade2)*1.06471 or (Grade3)*1.62548, We took the AUC as a measurement of the score's predictive value (Supplement Table 1, 2). For DFS, the AUCs of our score in the training set at 3, 5, 7 years were 0.868, 0.886, and 0.869. For OS, it was 0.830, 0.862, and 0.876. ROC analyses showed that the score was better than other clinical variables (Supplement Fig. 1, 2). Finally, the patients were divided into high-risk and low-risk subgroups based on the median score (DFS = 5.46457; OS = 4.05548). The KM curves showed a significant difference in survival between the high-and low-risk subgroups for DFS and OS in the training cohort and the validation cohort (Fig. 5, 6).

Association between STAS and clinicopathological parameters
In this cohort, STAS was observed in 143 patients (45.8%). The relationship between STAS and clinicopathological parameters is summarized in Table 4. STAS is more common in lung adenocarcinomas with more aggressive behavior, such as larger tumor size (P = 0.0016), higher-grade patterns (P < 0.0001), and higher nuclear grade (P < 0.0001). In addition, it was significantly associated with tumor-infiltrating lymphocytes, especially peritumoral tumor-infiltrating lymphocytes (P < 0.0001).

STAS and immunohistochemical expression
The clinical and pathological information of the 100 patients who underwent immunohistochemistry is summarized in Supplement Table 3. The association between STAS and immunohistochemical expression is summarized in Fig. 7. The expression of CXCL8, Smad2, and Snail was higher in the STAS-positive group, and the expression of CXCL8 was correlated with the expression of Smad2, and Snail. Survival analysis showed that the STAS-positive group exhibited worse DFS and OS. High expression of CXCL8 predicts poor prognosis (DFS, P = 0.0055; OS, P = 0.0118), high expression of Smad2 is associated with shorter DFS (P = 0.0253) and no statistical significance with OS, while snail expression has little effect on survival. Fig. 1 Prognostic nomogram for predicting 3-, 5-, and 7-year disease-free survival probability of stage I lung adenocarcinoma patients (A). The calibration curves for predicting diseasefree survival probabilities (B1) at 3-, 5-, and 7 years in the training cohort; at 3-, 5-, and 7 years (B2) in the validation cohort. ROC analysis of nomogram to predict the 3-, 5-, and 7 years DFS rate (C1) in the training cohort; 3-, 5-, and 7 years DFS rate (C2) in the validation cohort. STAS spread through air space, HP highgrade patterns, sTIL stromal tumor-infiltrating lymphocytes

Discussion
Although the research on the diagnosis and treatment of stage I lung adenocarcinoma has gradually deepened in recent years, no major breakthrough has been made ineffective biomarkers and treatment methods. Surgical resection was the preferred treatment for stage I lung adenocarcinoma, but the prognosis remained unresolved, and some patients experienced recurrence and metastasis within a short period of time (Schuchert et al. 2019). Based on clinical features and content of traditional postoperative pathology reports such as tumor size, tissue subtype, and pleural invasion, it is difficult to detect and alert patients with poor prognosis. This study was the first to comprehensively analyze the pathological characteristics of tumor parenchyma and stroma, and aimed to demonstrate and clarify more clinicopathological factors that could be observed by pathologists in HE sections that indicated poor prognosis, so as to enrich and improve the content of traditional lung adenocarcinoma pathology report, and to establish a prognostic model that could be beneficial to clinical risk stratification of patients with stage I lung adenocarcinoma.
In this research, we initially included factors from traditional pathology reports such as age, smoking history, surgical resection, clinical stage, tumor size, tissue subtype, and pleural invasion, and newly added factors that could   more fully reflect the characteristics of tumor parenchyma and stroma, such as STAS, TSR, sTIL, pTIL, and nuclear grade. After univariate and multivariate Cox analysis, we finally identified independent variables affecting prognosis in stage I lung adenocarcinoma, including age, smoking history, tissue subtype, tumor size, STAS, sTIL, and nuclear grade to construct DFS and OS prognostic models, respectively, in which tumor size has less influence on OS and was excluded from the OS nomogram. The results of the C-index for DFS were (training set 0.84 vs validation set 0.77) and for OS was (training set 0.83 vs validation set 0.78) and ROC analysis with the AUC above 0.8 showed that both DFS and OS models we constructed exhibited better survival prediction ability. We also assessed the calibration curve that showed the predicted probabilities of the model are in good agreement with the actual observed probabilities, which also ensured repeatability. Through DCA measurement, the net benefit of the model incorporating STAS, sTIL, and nuclear levels was significantly higher than traditional reports, indicating the clinical application value of the model. In addition, we established the formula for calculating DFS and OS prognostic risk scores according to the different weights of risk factors, and ROC analysis verified that the score performance of this formula was better than that of clinical single variables. By this prognostic risk score formula, we further divided patients into high and low-risk groups with the median risk score as the cutoff point. Kaplan-Meier survival analysis showed a significant survival difference between the two risk groups. All results were subsequently validated by the validation cohort. The establishment and validation of prognostic models and risk score formulas are of great value for the overall diagnosis and clinical treatment of stage I lung adenocarcinoma. For pathologists, we should pay attention to more factors in the process of diagnosis, so that the routine report can be supplemented and improved. For clinicians, it could more accurately identify intermediate and high-risk groups and be used to guide the need for adjuvant therapy and close clinical follow-up. The model and formula we established verified that STAS has a larger weight in the prognosis of stage I lung adenocarcinoma. STAS has emerged as a new invasive modality for lung adenocarcinoma and was significantly associated with poorer recurrence-free and overall survival (Kadota et al. 2015;Masai et al. 2017;Gutierrez et al. 2022). Adenocarcinomas with the micropapillary subtype could also be observed in breast and ovarian cancers and the high invasiveness of the clustered cancer nest was prone to vascular and lymphatic vessel invasion causing metastasis to lymph nodes and distant organs (Eren et al. 2022;Fauvet et al. 2012). In lung adenocarcinoma, we could observe that the STAS phenomenon often appeared around the cancer tissue containing micropapillary components and solid components, often as a group of micropapillary or solid cancer cells free in the alveolar cavity or growing in the alveolar wall surface. In our analysis, STAS was associated with micropapillary and solid components, consistent with the previous conclusions (Gutierrez et al. 2022;Kadota et al. 2019). In addition, our study found for the first time that interstitial lymphocytic infiltration and peritumoral lymphocytic infiltrating cells were related to STAS. Tumor-infiltrating lymphocytes are critical for prognosis, Kim et al. reported that high levels of sTIL were associated with better DFS and OS in non-small cell lung cancer (Kim et al. 2019). Understanding the prognostic significance and the mechanisms of TIL recruitment into tumors and its relationship with STAS has important implications for the development of effective immunotherapies, which deserve further investigation. Furthermore, STAS was an independent prognostic risk factor for DFS and OS in this study. So far, many studies have investigated the formation mechanism of STAS, scholars support that EMT was involved in the development of STAS. Liu et al. supported that STAS was more easily observed in lung adenocarcinoma with high metastasis-associated protein 1 (MTA1) expression levels. MTA1 was considered to be an EMT-related protein that promoted tumor aggressiveness, stemness, and cisplatin resistance (Liu et al. 2018a, b). Liu et al. reported that STAS was associated with overexpressed Twist and Slug and showed a worse prognosis . This might provide a potential mechanism for STAS. Our previous study accomplished a comprehensive analysis of a microarray dataset of STAS that reported high expression of CXCL8 in STAS-positive samples, but its relationship with EMT was unclear.
Previous studies have shown that CXCL8 promotes tumor progression in lung adenocarcinoma and is a poor prognostic factor (Liu et al. 2018a, b). CXCL8 and its receptors CXCR1/CXCR2 play important roles in tumor development, CXCL8-CXCR1/2 Axis can activate multiple signaling pathways, such as the phosphoinositide-3 kinase/Akt (PI3K/Akt), mitogen-activated protein kinase/ extracellular signal-regulated kinase (MAPK/ERK), phospholipase C/protein kinase C(PLC/PKC), Janus kinases/ activator of transcription protein 3 (JAK/STAT3) signaling to promote tumor proliferation, tumor cell invasion and migration, maintain stemness, and anti-tumor immunity (Waugh et al. 2008;Cheng et al. 2019;Alassaf et al. 2020;Hu et al. 2020). EMT was thoughted to be one of the mechanisms by which tumor cells acquire invasive and metastatic functions. Previous studies have shown that CXCL8 could interact with Snail to induce EMT to facilitate invasion and metastasis in colon cancer (Hwang et al. 2011). In addition, CXCL8-CXCR2 promoted EMT in oral cancer through the TGF-β signaling pathway (Meng et al. 2020). TGF-β/SMADs signaling was a classical pathway leading to EMT. Smad2 and Snail were EMTrelated proteins. The results of this study showed that the CXCL8 expression was correlated with Smad2 and Snail. Furthermore, STAS was more likely to be observed with high expression of CXCL8, Smad2, and Snail. This may provide a relevant mechanism for the formation of STAS, whether CXCL8 interacts with EMT-related pathways to endow STAS with higher invasive and metastatic behavior, which is the direction we need to study further. On the other hand, CXCL8 was also a potential biomarker whose overexpression predicted a poor prognosis (Yang et al. 2020;Jia et al. 2018). Our findings suggested that CXCL8 overexpression was associated with worse DFS and OS. We concluded that CXCL8, Smad2, and Snail were related to the frequency of STAS. CXCL8 might serve as a potential biomarker for STAS formation and poor prognosis in lung adenocarcinoma.
The current study has several limitations. First, the nomogram obtained in this study was only applicable to stage I lung adenocarcinoma. Besides, more indicators could be expected to be added in the future for a comprehensive evaluation, such as molecular indicators, etc.
In conclusion, we proposed and validated the nomogram and the prognostic risk score formulas for predicting the DFS and OS of stage I lung adenocarcinoma mainly based on pathological features. As a pathologist, we try our best to give clinical indicators that reflect the prognosis in H&E-stained sections, which is helpful for the clinical evaluation of the prognosis of stage I lung adenocarcinoma. It also can be easily applied in practice and incorporated into routine pathological diagnosis. We riskscored factors strongly associated with prognosis, enabling efficient patient stratification. Finally, we found that STAS was significantly associated with stronger invasive behavior. In addition, highly expressed CXCL8, Smad2, and Snail were associated with STAS. Most importantly, CXCL8 could serve as a potential biomarker for STAS formation and poor prognosis and a routine immunohistochemical detection index for stage I lung adenocarcinoma, and the mechanism may be related to EMT.