Identication of an Immune-related Seven-lncRNA Signature Predicting Prognosis and Tumor-inltrating Immune Cells in Lung Adenocarcinoma

Background: Lung adenocarcinoma (LUAD) is a common cancer. Immunotherapy is one of the major treatments but showing diverse ecacy in LUAD. Long non-coding RNAs (lncRNAs) are emerging as important players in immune regulation in cancer. Herein, we identied and validated a prognostic signature of immune-related lncRNAs in LUAD and explored its correlation with tumor-inltrating immune cells (TIICs) by bioinformatics analysis. Methods: Immune-related lncRNAs were acquired using Pearson correlation analysis between lncRNAs from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases and immune genes from the ImmPort website and Molecular Signatures Database. The risk signature was constructed in the TCGA group through univariable Cox, lasso and multivariable Cox regression analyses. The prognostic value of the established risk signature was validated in both TCGA and GEO datasets. The interacted TIICs and immune pathways with each single lncRNA and the risk signature were investigated respectively in ImmLnc database, Cibersortx database and gene set enrichment analysis (GSEA) analyses. Results: A seven immune-related lncRNAs prognostic signature was constructed and it stratied LUAD into high and low risk groups. High risk group showed poorer overall survival (OS) in comparison with low risk group via survival analysis.The seven-lncRNAs signature was closely correlated with various TIICs and immune pathways mostly involved in T cell activation, antigen processing and presentation, chemokines and cytokine receptors. Conclusions: The seven lncRNAs model was identied as a predictable signature for prognosis of LUAD patients probably due to its immunomodulation role. This study might provide a new target for enhancing the ecacy of immunotherapy in this mortal disease. operating characteristic curve; AUC, area GSEA, gene set analysis; RMA, Robust Multiarray Average; FDR, false discovery rate; ncRNAs, non-coding RNAs; PD-1, programmed death 1; CRNDE, colorectal neoplasia differentially expressed; TME, tumor microenvironment; CXCL5, C-X-C motif chemokine ligand 5; NK cells, natural killer cells.


Background
In 2018 globocan report, new lung cancer cases were 2,093,876 and the deaths were 1,761,007, both ranked the rst position among all malignancies, indicating lung cancer was still the biggest threaten to people worldwide [1]. Lung cancer is generally classi ed into two distinct classes: small cell lung cancer and non-small cell lung cancer (NSCLC), and the latter occupies more than 80% of lung cases. Lung adenocarcinoma (LUAD) is the most common subtype of NSCLC, but it is still a heterogeneous disease with diverse molecular signatures and short of predictable prognostic markers [2]. Recently, the application of immune checkpoint blockade (ICB) has dramatically changed the therapeutic scenario of LAUD and improved its overall survival (OS) [3]. But unsatisfactorily, only a fraction of LUAD patients responds to ICB even in the cases with high programmed death-ligand 1(PD-L1) expression [4]. Multifaceted factors cause this diverse e cacies including mismatch repair de ciency, mutation or neoantigen load and PD-L1 level, but none of these factors is su cient to predict the prognosis of LUAD under ICB treatment [5,6].
The purpose of ICB treatment is aimed to remove the inhibition role of cancer cells on TIICs, and thus unleash TIICs activation. So the quantity and intrinsic state of TIICs is one of the core factors determining ICB e cacy. The function state of cytotoxic T lymphocytes (CTLs) in tumors predict cancer immunotherapy response, and the interaction of each gene with CTLs in ltration in tumors in uence patient survival [7,8]. TIICs can even induce hyperprogressive disease (HPD) de ned as an accelerated growth of cancer cells during ICB treatment. Immune cells including tumor-associated macrophages (TAMs) [9], CD4 T cells [10], CD8 + T cells [11], effector Regulatory T (eTreg) cells [12] and neutrophil-tolymphocyte ratio [13] were signi cantly associated with HPD after ICB treatment, hinting the impacts of TIICs on diversi ed outcomes of ICB in NSCLC.
Long non-coding RNAs (lncRNAs) are non-protein-coding RNAs length greater than 200 nucleotides, joining in genetic transcription, mRNA stability and protein function regulation [14]. LncRNAs are intense participants in numerous stages of tumor immunity through mediating TIICs functions [15], including T cells [16], monocytes [17], macrophages [18] and granulocytic myeloid-derived suppressor cells (G-MDSCs) [19]. LncRNA NKILA induces tumor immune evasion by promoting apoptosis of CTLs and type 1 helper T (T H 1) cells [16]. LncRNA Pvt1 mediates the immunosuppression function of G-MDSCs weaken ICB e ciency [19], and accumulating evidences further deepen the understanding of the modulation of lncRNAs on TIICs function in recent reports [16,20]. Herein, we constructed a seven-lncRNAs prognostic signature from ltered immune-related lncRNAs, and explored the correlation of this signature with TIICs in LUAD by comprehensive bioinformation analysis, with the purpose of providing an alternative predictor for LUAD prognostication and possible targets for ICB sensitization.

Acquisition of the immune-related lncRNAs in LUAD
The format of fragments per kilobase of exon per-million fragments (FPKM) of RNA-seq and clinical data of LUAD were obtained from The Cancer Genome Atlas (TCGA, https:// portal.gdc.cancer.gov). Raw data of gene expression, clinical information and platform les were obtained from online database of Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) [21]. The TCGA datasets were assumed as the discovery group, and two independent GEO cohorts (GSE31210 [22,23] and GSE50081 [24]) were processed as the validation groups. Patients with overall survival less than 30 days were excluded since they might die of other diseases rather than LUAD [25].

Construction and validation of the risk model
Immune-related lncRNAs and clinical information were merged to build the prognostic signature using the data from the patients with complete information. The signature was constructed by univariate and multivariate Cox proportional hazards regression and Lasso regression from the TCGA datasets and validated in two GEO datasets. The risk score was calculated as follows: Pearson analysis within Gene Expression Pro ling Interactive Analysis (GEPIA2) database [28] was used to con rm the correlation between the seven lncRNAs and immune genes originated from the TCGA group. According to the median risk score estimated by the lncRNAs through their coe cients of the signature, the LUAD patients were separated into two groups (high risk group and low risk group). Then the prognostic value between clinical features (including age, gender, T, N, M and stage) and the lncRNAs within the signature were compared by univariable and multivariable Cox analyses. Besides, the correlation analysis between the clinical feature and the seven lncRNAs and signature were implemented via Analysis of Variance (ANOVA). The Kaplan-Meier method was used to draw the survival curves, and the sensitivity and speci city of the model were evaluated by the receiver operating characteristic (ROC) curve analysis.

TIICs and immune pathways analyses between two risk groups
The TIICs and immune pathways involved in each single lncRNA were investigated in the ImmLnc database(http://bio-bigdata.hrbmu.edu.cn/ImmLnc/) [29]. Differences of TIICs and immune pathways between the high and low risk groups were analyzed respectively by Cibersortx database [30]

Statistical analysis
All data and gures were processed with R (version 3.6.1) and perl language. The raw data of gene expression pro les from GEO database were standardized through the Robust Multiarray Average (RMA) method. P value less than 0.05 was considered to be signi cant in this study. "***", "**", "*" and "ns (not signi cant)" represent p < 0.001, p < 0.01, p < 0.05 and P > 0.05, respectively.

Patients and immune-related lncRNAs
After wiping out patients with incomplete clinical information, a total of 486 patients in TCGA datasets, 226 patients in GSE31210 and 127 patients in GSE50081 were included in this study. The TCGA cohort was performed as the training group, and two independent datasets of GSE31210 and GSE50081 were served as the validation groups. The detailed clinical features were summarized in Table 1. A total of 17,351 lncRNAs obtained from TCGA datasets and 1,075 lncRNAs identi ed from the GEO group were screened and 1,069 common lncRNAs were included in this study. Then through co-expression analysis with 1,986 immune-related genes discerned in TCGA datasets, 206 immune-related lncRNAs were discovered and applied to establish the prognostic risk model. 3.2 The establishment of seven immune-related lncRNAs signature The prognostic signature was constructed utilizing these 206 immune-related lncRNAs. 13 lncRNAs were picked out as candidate prognostic biomarkers to build the model at the p-value < 0.005 (Fig. 1a) [31]. 12 lncRNAs were ltrated as vital factors after the logistic least absolute shrinkage and selection operator (Lasso regression, Fig. 1b

The seven lncRNAs were closely related with immune genes
To explore the mechanism of the prognostic signature, the immune gene analysis of the seven lncRNAs was applied in the TCGA datasets and validated in the GEPIA2 database. Twenty immune genes were signi cantly correlated with the seven lncRNAs (absolute correlation coe cient > 0.4 and p-value < 0.05, Table 2), and the correlations were also con rmed in GEPIA2 analysis ( Fig. 2a-t).

The signature independently predicted LUAD prognosis
To test the prognostic value of the model, the clinical information of both training group and validation groups were employed. According to the median immune risk score, the patients were divided into high risk group and low risk group. Using Kaplan-Meier survival analysis, the OS in high risk group was shorter than that in low risk group in the TCGA group (Fig. 3a), and similar results were obtained in two GEO datasets (Fig. 3b, 3c). Within this model, the mortality of patients increased in high risk score group with raised expression level of LINC01116, CASC15, LINC00857, LINC00460, albeit decreased LINC00324, CRNDE, GAS6-AS1 levels( Fig. 3d-f).
Clinicopathological features (such as age, gender, TNM and stage), individual lncRNA and seven-lncRNA signature were respectively analyzed for their correlations with OS by univariable and multivariable Cox regression methods in TCGA group. The risk score calculated from seven lncRNAs was signi cantly negative correlated with OS ( Fig. 4a-b). And its value of prognosis predictor was also validated using the time-dependent ROC analysis with the area under the curve (AUC) of ROC curves showing 0.743, 0.676 and 0.842 in three datasets ( Fig. 4c-e). Further results showed that the risk score was positively correlated with T, N, M pathologic stage, but was not with the advanced stage of T4 and N3, which may be interpreted by limited T4 or N3 cases in TCGA and no cases in GSE50081 and GSE 31210 ( Fig. 5a-d).

TIICs and immune pathways were related with the seven lncRNAs
To investigate the immune cells and pathways involved in the seven lncRNAs, the ImmLnc database was performed in per lncRNA. From the TIICs inquiry, most frequent TIICs engaged in individual lncRNA in the seven-lncRNA signature (more than 4 lncRNAs) were B cells, Dendritic, Macrophage and CD4 T cell (Table 3). And the immune pathways of Antigen Processing and Presentation, Chemokines, Cytokine Receptors, Cytokines and Antimicrobials were embroiled in three lncRNAs of the signature (LINC00460, GAS6-AS1 and LINC00857) (P adjust < 0.05 and absolute Score > 0.995) ( Table 4).    Mast cells resting were found (P < 0.01, Fig. 6a).
In order to explore the correlation between the model and immune microenvironment, GSEA analysis of the immunological signatures was carried out between high and low risk groups in the TCGA datasets.
Within the 4872 gene sets of high risk group, 59.36% of them were signi cantly enriched at the false discovery rate (FDR) < 25%, while 23.20% were signi cant at nominal p-value < 1% and 38.94% were at nominal p-value < 5%. In the low risk group, 39.94% gene sets were enriched at FDR < 25%, 9.48% of them were at nominal p-value < 1% and 20.22% at nominal p-value < 5%. Contrastly, the high risk group was distinctly enriched in the immunological signatures, especially in the pathways of macrophage and CD4 T cells ( Fig. 6b-g).

Discussion
Lung adenocarcinoma is a deadly malignancy, which is prone to local relapse and distant metastasis [2].
Besides genetic abnormality delineating the process of cancer, lncRNAs are gradually concerned for their increasing role in the progression of cancer. LncRNAs are long-term studied for their role of promoting proliferation and metastasis in the progression of LUAD [32], but it is still lack of lncRNAs model in predicting LUAD prognosis.
In this study, we constructed a seven lncRNAs model as a prognostic signature for LUAD. We rst screened out 206 immune-related lncRNAs using Pearson correlation analysis between lncRNAs and immune genes, and then construct a seven lncRNAs signature via Cox proportional hazards regression and logistic least absolute shrinkage and selection operator. High and low risk groups were split by the median risk score among LUAD patients from each cohort, and poorer overall survival of the high-risk group were validated in both TCGA and two GEO datasets. Further we unraveled that more in ltration of activated memory CD4 T cells, resting NK cells, Macrophages M0 and activated Mast cells, and less of Monocytes, resting Dendritic cells and resting Mast cells were discovered in high-risk group. Immune pathways related to Macrophage and memory CD4 T cell were also signi cantly enriched in the high-risk patients.
Immune destruction and recruited immune cells to tumor site are emerging hallmarks in cancer [33]. can shape cancer destiny [34]. TIICs are considered as active players in cancer progression and cause heterogeneous responses to variant treatments [35].
LncRNAs play vital roles in the function of immune cells within tumors [16,36]. Here we found that all the seven lncRNAs were closely associated with TIICs, and the most relevant TIICs were B cells, macrophage, neutrophils and Dendritic cells. Antigen-present is an important process for anti-tumor immune reaction.
Immune pathways especially Antigen Processing and Presentation involved in the seven lncRNAs and signature are deeply engaged in the recruitment of TIICs and cancer development. Our study indicated that immune pathways related to macrophage and memory CD4 T cell were distinctly enriched in high risk groups.
Various lncRNAs were reported being predictors in the prognosis of B-cell lymphoma via stimulation or inhibition role on B cells. Macrophage is a vital participant in the process of Antigen Processing and Presentation and immunotherapy [37]. It can be regulated by several lncRNAs [38]. Macrophages originate from monocytes of bloodstream and becoming TAMs after migration. TAMs drive disease to deteriorate via promoting cancer metastasis in early-stage lung cancer [39]. LncRNAs participate in the development and polarization of macrophages, which might showed big impact on cancer progression and the e cacy of PD-L1 antibody [36]. Neutrophils hold the bidirectional traits in promoting or averting cancer progression, and were also identi ed in our studies to be relevant to lncRNAs. Dendritic cells are specialized as antigen-presenting cells serving immune process, and their interaction with tumor cells are presented as immune-activator or immune-suppressor during cancer progression, which can be regulated by lncRNAs [40]. Among the seven lncRNAs, CRNDE, GAS6-AS1, LINC00324 and LINC00857 have the most intimate connection with these TIICs (p < 0.05 and absolute Rs value > 0.2). LncRNA colorectal neoplasia differentially expressed (CRNDE) promotes proliferations and metastasis by instigating suppressor TIICs.
Prominently, within tumor microenvironment (TME), in ltrating T cells are the strongest predictor for PD-L1 e cacy. We found that high risk group had a greater in ltration level of T cells CD4 memory activated, while less of T cells CD4 memory resting. T cells can also be recruited by chemokines into cancer area from circulation [34]. Intriguingly, decreased LINC00324 might lead to immune evasion of cancer cells through defective NK cells via inhibiting immune pathways of cytokine and chemokine signaling and Ag processing [41][42][43][44]. Noticeably, LINC00460 was enriched in four immune pathways by the ImmLnc analysis, indicating its critical role in immune system. C-X-C motif chemokine ligand 5 (CXCL5) which was closely correlated with LINC00460 in the signature could increase PD-L1 expression via PI3K/AKT pathway [45] and regulate the function of tumor-associated neutrophils which inhibit the activities of Th1 and Th17, CTLs and NK cells [46][47][48][49]. Similarly, VEGFC/VEGFR3 signaling closely connected with LINC01116 and LINC00460 were mainly expressed on immune cells including macrophages and NK cells, and might also disturb ICB e cacy via affecting dendritic cells and CD8 T cells [50][51][52], or inducing the naïve T cells via CCL21 [53].

Figure 6
Page 32/32 The signi cantly in ltrated TIICs and immune pathways within two risk groupsof LUAD. Differently in ltrated TIICsof the two risk groups in the Cibersortx analysis(a).Top six enriched plots from GSEA analysis(b-g). Abbreviations: GSEA, gene set enrichment analysis; ES, enrichment score; NES, normalized ES; p, normalized p-value.