Immune Score Closely Associated With Tumor Microenvironment as A Prognostic Factor in Lung Adenocarcinoma

Background: The tumor microenvironment plays a vital role in tumor biology and has recently attracted widespread attention. However, the prognostic signicance of integrated immune scores in lung adenocarcinoma has not yet been identied. This study aimed to systematically estimate the association between immune scores and prognosis and develop a clinical nomogram to predict the survival of patients with lung adenocarcinoma. This study also systematically explored the underlying prognostic factors of the immune score in lung adenocarcinoma. Methods: Public datasets for lung adenocarcinoma was acquired from The Cancer Genome Atlas data portal. The immune score of each sample was calculated using the ESTIMATE algorithm. Univariate and multivariate Cox regression analyses identied several signicant prognostic factors and further developed a prognostic nomogram. The C-index, calibration curve, and ROC curve were used to evaluate the predictive accuracy and discriminative ability of the resultant nomogram. Results: We found that patients with higher immune scores had a better prognosis (log rank test p = 0.0004). The nomogram that integrated the immune score could effectively stratify high-risk LUAD patients in terms of clinical response. Patients in the high-risk groups usually had a worse prognosis (log rank test p < 0.0001) and higher mortality. The mortality rate in high and low risk groups was 42.67% and 26.37%, respectively (p < 0.0001). In addition, correlation analysis showed that the immune score was signicantly dependent on the mRNA expression of immunotherapy-associated biomarkers (PD-1, PD-L1, and LAG3) as well as on the presence of certain immune cell subtypes, but had no correlation with tumor mutation burden. Conclusion: The immune score is a prognostic factor in lung adenocarcinoma. The nomogram with an integrated immune score can effectively predict the survival of patients with lung adenocarcinoma. The mechanism by which the immune score estimates the prognosis of patients with lung is related to the tumor immune microenvironment. immune

Increasing evidence suggests that the tumor microenvironment (TME), a cellular milieu consisting of tumor cells and neighboring tumor-related non-tumor cells, is crucial in tumor biology [5,6]. Immune cells, as one of the key ingredients of TME, play a signi cant role in cancer progression and aggressiveness [7,8]. Previous studies have reported that TME, especially tumor-in ltrating immune cells, contributes to the phenotype and the stage of cancer [9,10]. Moreover, research has shown that the presence of different tumor-in ltrating immune cell subtypes leads to different clinical outcomes in cancer patients. For example, cytotoxic T cells, memory T cells, and T helper 1 cells are positively correlated with good clinical outcomes in cancers such as melanoma, head and neck cancer, ovarian cancer, breast cancer, bladder cancer, colorectal cancer, and lung cancer [11]. It has been also found that the type and density of tumorin ltrating immune cells are useful for distinguishing the clinical stage and prognosis of non-small cell lung cancer [12] and colorectal cancers [13]. A useful tool in predicting the characteristics and purity of immune cells and stromal cells in tumor tissues is the ESTIMATE algorithm developed by Yoshihara et al. [14]. It uses gene expression data to calculate immune and stromal scores. The percentage of immune cells and stromal cells represents the immune score and stromal score, respectively. The ESTIMATE algorithm has been used in studies on multiple cancers, which have shown the effectiveness of such big databased algorithms [15,16]. Due to the prognostic signi cance of the integrated immune in ltration score in breast cancer [17], gastric cancer [18], oral squamous cell carcinoma [19], and ovarian cancer [20], its value can be used as an additional indicator to TNM staging system for patient survival prediction. However, very few studies have reported integrated immune in ltration in lung adenocarcinoma.
In our study, we used the ESTIMATE algorithm to calculate the immune score of patients with LUAD. We aimed to systematically estimate the association of immune score with prognosis and develop a prognostic nomogram for LUAD patients. We also used several bioanalytical tools to explore the mechanisms in uencing the immune score in LUAD.

Methods
Data source RNA sequencing and clinical data for lung adenocarcinoma were acquired from The Cancer Genome Atlas (TCGA) data portal (https://tcga-data.nci.nih.gov/tcga/). In all, 594 samples including 535 lung adenocarcinoma samples and 59 healthy controls were screened in our study. A subfraction of 489 LUAD samples included a complete corresponding baseline and follow-up data (age, sex, and pathological stage) as well as corresponding mRNA expression data.
Estimation of immune cell score and tumor-in ltrating immune cell-type composition Immune scores of each sample were calculated by applying the ESTIMATE algorithm to the mRNA expression data [21]. The optimal cutoff point for the immune score was calculated on the basis of the prognostic signi cance using the X-Tile software (Yale University School of Medicine, New Haven, CT, USA) [22], followed by division of the 489 LUAD patients into low and high immune score groups based on the obtained cutoff value. The relative levels of different tumor-in ltrating immune cell types were quanti ed using the CIBERSORT algorithm, which is a gene expression-based deconvolution algorithm able to accurately distinguish 22 immune cell types [23].

Statistical analysis
All statistical analyses were performed using R 3.6.1 (www.r-project.org) and IBM SPSS Statistics 19.0 (IBM, Inc., Armonk, NY, USA). We used univariate and multivariate Cox regression analyses to identify all signi cant prognostic factors and construct an optimal prognostic nomogram. Kaplan-Meier (K-M) curves were generated for survival analysis, and the log rank test was used for comparison. The predictive accuracy and discriminative ability of the nomogram and other clinical parameters were evaluated using the concordance index (C-index) and the calibration curve. The sensitivity and speci city of the nomogram were evaluated using a time-dependent receiver operating characteristic (ROC) curve.
The relationship between the immune scores and the included variables were compared using the t-test of one-way ANOVA. The chi-squared test was used to analyze mortality of the patients in low and high risk groups. Correlation analysis was also conducted to explore a potential relationship between the immune score and the expression of immune checkpoint regulators. In our study, a two-tailed p-value < 0.05 was considered statistically signi cant. Differentially expressed genes (DEGs) were identi ed using the limma package, and the established cutoffs were fold change in expression > 1.5, and p < 0.05.

Patient characteristics
After data screening, 489 patients were included in this study. Exclusion criteria were the lack of data on patient's survival time, the lack of complete baseline and follow-up data (age, sex, and pathological stage) as well as data duplications. Each score corresponded to a patient. The average age was 65.32 years (SD = 9.96, range 38-88), and 452 (68.10%) patients within this set were older than 60 years. In all, 377 (77.10%) patients were in stage I and stage II of cancer, 112 (22.90%) patients were at stage III and stage IV of cancer. Local lymph node metastasis occurred in 320 patients (65.44%), and distant metastasis occurred in 162 patients (33.13%). The patient's median immune score was 948.01 (range -1355.9-2905.3). Patients were divided into low (307, 62.78%) and high (182, 37.22%), immune score groups based on the cutoff value of 1246.3 (Fig. 1). The median overall survival (OS) was 651 days (range 4-7248 days). Table 1 presents the clinical characteristics of the different patient subgroups distinguished in the training cohort based on immune scores. The average ages of the low and high immune score groups were 62.76 years (SD = 10.12) and 66.27 years (SD = 9.62), respectively.

The relationship between immune scores and clinical characteristics
The results showed that the immune score was signi cantly negatively correlated with patient's age (p = 0.0226), patient's sex (p < 0.0001), tumor size (p = 0.0008), distant metastasis (p = 0.0419), and pathological stage (p = 0.01), but had no correlation with lymph node metastasis (p = 0.3189, Fig. 2). Furthermore, we investigated the relationship between immune scores and prognosis. We found that compared with the low immune score group, the high immune score group displayed a better prognosis (log rank test p = 0.0004, Fig. 3).

Multivariate analyses and prognostic nomogram for OS
The results of the multivariate Cox regression analyses are shown in Fig. 4. Similarly to the univariate analysis, the high immune score subgroup was signi cantly associated with worse prognosis (HR = 0.55, 95% CI = 0.39-0.78, p < 0.001), indicating that the immune score remained an independent prognostic indicator after adjustment for other clinical features. In addition, lymph node metastasis and new tumor events also showed independent prognostic characteristics (N1 (HR = 2.19, 95% CI = 1.20-3.97, p = 0.01) N2-N3 (HR = 2.85, 95% CI = 1.16-7.03, p = 0.023), new tumor events HR = 3.03, 95% CI = 2.14-4.28, p < 0.001)). We then integrated all signi cant independent factors and other commonly used clinical features for estimating OS to construct a prognostic nomogram (Fig. 5). The C-index of the resultant nomogram was 0.723 (95% CI = 0.681-0.767). Consistent with our Cox multivariable regression results, the immune score is characterized by a wider risk point range (0-56) than that of the traditional TNM staging system (0-24). The importance of each clinical parameter in relation to OS is indicated in Additional le 2: Fig.   S2. It shows that the immune score signi cantly contributes to the prognostic nomogram. Moreover, we observed that tumor size, lymph node metastasis, and new tumor events also contributed signi cantly to the nomogram. The risk point ranges were 0-68, 0-96, and 0-100, respectively (Fig. 5). The calibration plot of the 3-year and 5-year survival rates showed that the predicted value of the nomogram was consistent with the actual observed value (Fig. 6a and Fig. 6b).
Comparison of predictive accuracy for OS between nomogram and other clinical characteristics As shown in Fig. 4, the hazard ratios of lymph node metastasis and new tumor events for survival were higher than the hazard ratios for the other factors. Therefore, we compared the predictive power of the immune score nomogram with lymph node metastasis and new tumor events as prognostic tools in patients with LUAD. The C-index for OS predicted by lymph node metastasis and new tumor events was 0.623 and 0.605, respectively, which was signi cantly lower than the C-index predicted by the immune score nomogram (C-index = 0.723). In addition, compared with conventional staging systems, our nomogram also displayed better accuracy in predicting survival. The C-index of the TNM staging system was 0.661, which was also signi cantly lower than that of the immune score model (C-index = 0.723). Therefore, our results suggest that the integrated multi-factor nomogram is a useful predictor for the survival of patients with LUAD.
Performance of the immune score nomogram Each patient was assigned a risk score based on the immune score nomogram, which allowed us to divide the patients into low and high-risk groups. The median risk score, risk score distribution, and survival status of each patient are shown in Additional le 3: Fig. S3a. The high-risk group had a higher mortality rate (42.67%) than the low risk group (26.37, p < 0.0001). Moreover, the Kaplan-Meier curve revealed that patients in the high-risk group exhibited poorer prognoses than patients in the low-risk group did (log rank p < 0.0001; Additional le 3: Fig. S3b). The median survival for the high-risk group was 2.70 years, while that for the low risk group was 7.18 years. ROC analysis was used to assess the prognostic value of the scoring model. The area under the curve (AUC) in 3-year and 5-year OS prediction was 0.793 and 0.779, respectively (Additional le 3: Fig. S3c). Furthermore, we used the ROC curve to compare the performance of the immune score nomogram with other factors in survival prediction. As shown in Additional le 3: Fig. S3d, the AUC of age, tumor size, lymph node metastasis, distant metastasis, and stage were 0.538, 0.639, 0.646, 0.527, and 0.655, respectively. All of these values were signi cantly lower than the AUC of the immune score model (AUC = 0.814).
Bioinformatic analyses in the search for the mechanisms contributing to the immune score Genomic analysis performed on our data showed that the C>A mutation was the most common singlenucleotide variant (SNV) mutation type and that TTN was the most frequently mutated gene in the examined lung adenocarcinoma patient group. Moreover, lung cancer driver gene KRAS also tended to be highly expressed in this data subset (Fig. 7a). Tumor mutation burden (TMB), which plays an essential role in tumor immune therapy, was signi cantly different between the low and high immune score groups (p < 0.0001, Fig. 7b), and 279 DEGs were identi ed in the two groups (p < 0.05, Additional le 4: Table S1). The KEGG pathway analysis revealed that various mutations were enriched in genes relevant for several pathways, including the Staphylococcus aureus infection pathway and the allograft rejection pathway. They were also prevalent in genes coding for the cell adhesion molecules (CAMs). Next, we investigated the relationship between the immune score and the mRNA expression of immunotherapy-associated markers (PD-1, PD-L1, CTLA4, and LAG3) and found a signi cant correlation with the expression of PD-1, PD-L1, and LAG3 (Fig. 8). In addition, we analyzed the association between the immune score and the tumor microenvironment, using the CIBERSORT algorithm, which allowed us to obtain the immune cell landscape in each sample (Fig. 9a). We found that the numbers of ten immune cell groups were signi cantly different in the high and low immune score groups, among which naive B cells, plasma cells, gamma delta T cells, resting dendritic cells, and activated dendritic cells were signi cantly more prevalent in the low immune score group, while memory B cells, CD8 T cells, activated memory CD4 T cells, M1 macrophages, and activated mast cells were signi cantly more abundant in the high immune score group (Fig. 9b).

Discussion
LUAD is a highly heterogeneous disease [24,25], which increases the di culty of treatment and prognostic prediction. The tumor microenvironment has recently attracted widespread attention as an important factor in LUAD progression [26,27]. Tumor in ltrating immune cells, one of the TME components, have been proven to play a key role in this process [28,29]. In the present study, we adopted the ESTIMATE algorithm to calculate the immune score and evaluated the prognostic signi cance of this parameter in lung adenocarcinoma. So far, a prognostic signi cance of immune scores has been validated in several other cancers [30,31]. Our study found that the immune score signi cantly impacted the OS in LUAD as patients in the high immune score group showed a better OS than patients in the low immune score group. The possible reason is that patients with higher immune scores have stronger immune systems which enhance the anti-tumor properties of the tumor microenvironment, and ultimately help control and eliminate tumors [32,33]. All in all, the abovementioned results suggest that the immune score is closely associated with patient outcome. Moreover, we found that immune scores were negatively correlated with several clinical characteristics, such as age, tumor size, M stage, and pathological stage. These results imply that young or early cancer patients may better respond to immunotherapies than old or advanced cancer patients may. However, no correlation was found between the immune score of patients with LUAD and lymph node metastasis. This suggests that the immune response mobilization in the state of lymphatic metastasis may not be enough to further stimulate the anti-tumor response. However, it is important to emphasize that this result requires further research. We also found that the higher the immune score tended to be ascribed to female patients more frequently than to male patients. The results was consistent with Carmen Behrens et al. Report [34], immune in ltration in LUADs was signi cantly higher in female patients. Moreover, Nakamura H et al. [35] also reported that some immune in ltration cells, such as the proportion of CD4 T cells and the number of lymphoblasts, were signi cantly higher in female lung cancer patients than in male patients.
In a previous study, we found that lymph node metastasis-related biomarkers, DNA epigenetic regulation, as well as other clinicopathological and molecular mechanisms could affect the survival of patients with LUAD [36][37][38][39]. The contribution of immune cells to lung cancer has been well recognized, and immune gene signatures are considered as relevant predictive biomarkers for immune therapy responses [40,41]. Nevertheless, they have not been well used in clinical practice. In this study, we highlighted the importance of the collective immune score rather than the separate immune gene signatures in patients with LUAD. It was achieved by constructing a nomogram, with adjustment for possible confounding factors that integrated the immune score with other common OS indicators in lung adenocarcinoma. We found that the nomogram could effectively stratify high-risk LUAD patients. Interestingly, patients who had high-risk scores usually were characterized by poor prognosis and signi cantly higher mortality rates than patients with low risk scores. The effectiveness of our model was further statistically validated. The results of this analysis lead to the conclusion that the immune score can serve as an effective supplement to the TNM staging system for patient survival prediction.
Furthermore, we performed several bioinformatics analyses to explore the possible mechanisms contributing to the immune score in LUAD. First, we explored the association between TMB and immune score by performing genomic analysis and found signi cant differences between high and low immune score groups. The high immune score group had a lower TMB than the low immune score group. The TMB is the total number of mutations per coding area of a tumor genome, including base substitutions, insertions, or deletions detected per million bases [42]. It is known to be exploited as a biomarker to evaluate the patient's response to PD-1/PD-L1 blockade in many tumors such as non-small cell lung cancer (NSCLC). The higher TMB tends to have a better prognosis in several cancers [43,44]. However, in our study, the results were the opposite: the high immune score group had low TMB and a good prognosis. It can be explained by the fact, that the tumor microenvironment modulates the immune response of the body in a very complex way. For example, studies have reported that there is no signi cant correlation between PD-L1 expression and TMB in most cancer subtypes. Therefore, TMB and PD-L1 are two independent biomarkers to predict the response to immune checkpoint modulators [45,46]. The reason for the discrepancy between biomarker characteristics in cancer may be the different immunogenicity of the underlying disease, which triggers different preexisting anti-tumor T cell responses [47]. Therefore, different immune response mechanisms may contribute to the overall immune score. In the correlation analysis between the immune score and the immunotherapy-related markers, we established that the immune score is associated with the expression of PD-1, PD-L1, and LAG3. In recent years, the immune checkpoint proteins (programmed cell death ligand 1/programmed death receptor 1, PD-L1/PD-1) have been used as crucial targets for immunotherapy in many cancers, including LUAD [48]. For example, PD-L1 expression is indicative of worse prognosis for NSCLC patients, but the probability of clinical bene t from immune-modulating drugs is greater in NSCLC patients expressing PD-L1 than in patients who do not express the protein [49,50]. Lymphocyte-activation gene 3 (LAG3) is a transmembrane protein with structural homology to CD4 co-receptors [51]. It is found in activated CD4 T cells, regulatory T cells, Tr1 cells, activated CD8 T cells, natural killer cells, dendritic cells, B cells, and exhausted effector T cells [17,18]. LAG3 has multiple biological functions, for example, it is known to negatively regulate the proliferation, activation, and effector functions of T cells [52,53]. In our study, we proved that the immune score is strongly correlated with immune-related biomarkers such as PD-1, PD-L1, and LAG3. Next, we used the CIBERSORT algorithm to obtain the immune cell landscape of all examined patients with LUAD and explored the relationship between the immune cell subtypes and immune scores of these patients. We found that ten types of immune cells signi cantly differ in numbers between the high and low immune score groups. Among them, memory B cells were signi cantly more abundant in the high immune score group than in the low immune score group (p < 0.0001). It has been previously reported that memory B cells formed in the germinal center after the initial infection (which are essential for the rapid and powerful antibody-mediated immune response during re-infection) were associated with better patient outcomes in LUAD [54]. These results are consistent with the present study. Macrophages play an essential role in the tumor environment. They are divided into two subtypes (M1, M2). M1 macrophages are involved in the in ammatory response and anti-tumor immunity, while M2 macrophages are involved in anti-in ammatory response and are characterized by tumor-promoting properties [55]. In our study, we found that M1 macrophages were signi cantly more prevalent in the high immune score LUAD patient group and had anti-tumor effects. Contrarily, there was no signi cant difference in the number of M2 macrophages between the high and low immune score groups.
Nevertheless, this research has some limitations. Our study used the data from public databases to establish an immune score nomogram related to the prognosis of patients with LUAD and explored the possible mechanisms contributing to the immune score and patient prognosis. It is important to remember that the obtained results need to be further expanded and veri ed in order to be used in clinical practice. The clinical information available in the database is not complete. For example, the relationship between the constructed model and the common driving genes of lung cancer such as EGFR, ALK, and ROS1 needs further study. Moreover, the corresponding regulatory mechanisms also require further clinical veri cation. For example, the relationship between the expression of immunotherapy-related markers (PD-1, PD-L1, LAG3, and CTLA4) and the immune score should be determined by immunohistochemistry.

Conclusions
In summary, this study provided a comprehensive understanding of the association between the immune score and lung adenocarcinoma prognosis and established a robust prognostic nomogram. This predictive model may help easily estimate the survival of patients and identify potential patients who may bene t from adjuvant therapy.    Figure 1 X-tile show the immune score plots. The immune score of 489 LUAD patients ranging from -1335.9 to 2905.3, the best cutpoint divided LUAD patients into two subgroups. low immune score subgroup ranging The calibration curve of overall survival (OS) at 3 and 5 years for the LUAD. Nomogram-predicted probability of OS is plotted on the x-axis; actual OS is plotted on the y-axis.

Figure 7
Page 24/26 Mutation analysis of lung adenocarcinoma. a General overview of mutation in lung adenocarcinoma. b Association with immune score and TMB (tumor mutation burden), *** represent for p value <0.0001.

Figure 8
Correlated analysis the relationship between immune score and immuotherapy associated biomarkers base on the mRNA expression data.