Individualized Prognostic Score for Lung Adenocarcinoma Based on Immune Status

Background: Lung adenocarcinoma (LUAD) is the prevalent non-small cell lung cancer (NSCLC) histological subtype, with rising morbidity and mortality. This study was conducted with the motive to develop a prognostic model by examining the relation between immune-related gene (IRGs) proles. Methods: We combined immune-related gene (IRG) expression proles and progression-free intervals (PFIs) in 594 LUAD patients by using TCGA dataset. The COX regression analysis and computational difference technique were used to predict survival-associated and differentially expressed IRGs in LUAD patients. Computational biology was also used to investigate the characteristics and molecular mechanisms of these LUAD-specic IRGs. Using multivariable COX analysis, a novel prognostic index based on immune-related genes was established. Results: There was a strong correlation between the clinical outcome of LUAD patients and 58 differentially expressed immune-related genes. These genes were found to be actively involved in a KEGG pathway involving cytokines and cytokine receptors, as demonstrated by functional enrichment analysis. A prognostic signature comprised of IRGs (CRABP1, S100A16, FGF2, RBP2, FURIN, SLC11A1, ANGPTL4, VIPR1, BDNF, IL11, INHA, SEMA4B, TNFRSF11A) performed relatively well in prognostic prediction and was associated with tumour stage, tumour burden, age, number of lesions, and metastasis. Surprisingly, the IRG-based prognostic index effectively predicted immune cell inltration. Conclusions: Our ndings collectively identied multiple clinically relevant IRGs, identied regulators of the immunological repertoire, as well as established the critical role of a tailored, surveillance, IRG-based immune signature in the diagnosis, and prognosis of LUAD.


Introduction
Globally, lung cancer (LC) is the main cause of cancer-related deaths, causing around 2 million diagnoses as well as 1.8 million deaths with a 21% ve-year survival rate in recent years (1,2). Non-small-cell lung cancer (NSCLC) is the major histological subtype of LC, responsible for 84% of all LC diagnoses apart from the other subtype called small-cell lung cancer (SCLC), with a declining contribution of 13% (3). The heterogeneous class of NSCLC includes adenocarcinoma (LUAD), large cell carcinoma (LCC) and squamous cell carcinoma (SCC) (4).
LUAD is the most frequently diagnosed and fatal subtype among the three, with an average 5-year survival rate of less than 20% (5). It accounts for approximately 60% of all NSCLCs (6), which could be due to the increasing proportion of smokers in the majority of countries. The prognosis in patients with LUAD is di cult, which could be due to the high risk of metastasis (7).
Numerous therapies, including chemotherapy, molecular targeted therapy, surgical resection, and radiotherapy, have been developed in the clinical treatment of LUAD over the last decades, but overall survival time for LUAD patients has not improved signi cantly, owing in part to a dearth of useful molecular biomarkers (8). As a result, identifying novel prognostic gene markers for LUAD is critical for increasing LUAD patients' overall survival. Given that the immune system has been implicated in the development and progression of cancer, examining the differential expression of immune-related genes (IRGs) in tumour tissue samples is critical for optimising immunotherapy, understanding the tumour microenvironment, predicting patient prognosis and improving clinical diagnosis. Despite widespread recognition of IRGs' importance in cancer growth and treatment, no rigorous genome-wide pro ling investigation into their clinical impact and molecular processes on LAUD has been conducted (9). This systematic, integrated investigation of IRGs in LUAD contributes to the understanding of their clinical relevance and sheds light on possible molecular features. The large number of LUAD samples available for this study aided in the generation of robust results. We identi ed many IRGs that play a critical role in the beginning and progression of LUAD. These IRGs have the potential to be useful clinical biomarkers. Bioinformatics systems will permit a more in-depth examination of the molecular mechanisms behind these proteins. Most signi cantly, a customised immune prognostic signature based on differentially expressed IRGs was proposed to predict clinical outcomes.

Data Collection
The transcriptome RNA sequencing datasets as well as clinical follow-up data concerning patients with lung adenocarcinoma (LUAD) were obtained rst from TCGA Data Portal (https://cancergenome.nih.gov/). HTSeq-FPKM was used to collect the expression data, which included 535 primary LUAD cells and 59 non-tumor samples, as well as clinical information for these patients. A detailed list of IRGs was obtained from the Immunology Database and Analysis Portal (ImmPort) database (https://www.immport.org/shared/home), which reliably and timely updates immunology data, facilitating the exchange of immunology data among cancer researchers and giving a list of IRGs for cancer researchers to use. The GSE31210 dataset was a second set of transcriptome pro ling data from LUAD patients that was acquired from the GEO database and used to validate the predictive IRG model. It was derived and retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo

Differential Gene Analysis and Filtering
The training set consisted of instances from the TCGA database. The Wilcox Test was used to determine whether IRGs were differentially expressed between LUAD and nearby non-tumor lung tissues. All transcriptional data were subjected to differential gene analysis with a false discovery rate (FDR) of 0.05 and a log2 |fold change| greater than one as cutoff values. Next, differential IRGs were ltered out of all differentially expressed genes to remove those that were not. Univariate Cox regression analysis was performed to nd IRGs that were associated with survival time using the R software survival package.

Functional Enrichment Analysis
The functional enrichment of IRGs was studied using Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses and Gene Ontology (GO). The threshold for false discovery rate (FDR) was set to 0.05.

Investigation of the Molecular Mechanisms Governing the Function of Immune Genes
It is very well established that transcription factors (TFs) play a critical role in determining the extent of gene expression. Correlation analysis was used to determine the relationship between prognosisassociated IRGs and differential TFs, and 318 TFs were retrieved from the Cistrome Cancer database (http://www.cistrome.org/). The corFilter was set to 0.4 and the P-value was 0.001 for the analysis. Additionally, Cytoscape programme version 3.7.2 was used to obtain the regulatory network for these IRGs and TFs.

IRGs-Based Prognostic Risk Model Construction and Validation
Multivariate Cox regression analysis was used to analyse the IRGs, as well as the weighted score was used to calculate the risk score on every respective patient. The predictive model's score was calculated as follows: (exprIRG1coef1) + (exprIRG2coef2) +... + (exprIRGncoefn). LUAD patients were classi ed into a low-risk and even a high-risk group using the median risk score as the cut-off number. Following a cutoff point of the median risk score, individuals with LUAD were divided into two groups: those at low risk and those at high risk. The R packages "survival" was used to perform Kaplan-Meier analysis and "survminer" was used to visualise the survival curves. Additionally, a receiver operating characteristic curve called ROC curve was produced and the area under the curve (AUC) was determined to determine the predictive potential of this risk model. The "survivalROC" R package was used to perform these calculations. Following the trend of risk scores, the survival status of patients was determined, as well as the expression of the IRGs involved in this model. Additionally, multivariate and univariate studies were conducted to see whether the risk score could be used independently as a prognostic predictor.
Furthermore, the link between IRG expression and other clinicopathological variables was explored.

Immune Cells In ltration and Correlation Analysis
LUAD patients were evaluated for the prevalence of tumor-in ltrating immune cells using the CIBERSORT method. Each sample with abundance ratio matrix of 22 immune cell types was calculated using the gene expression data, which include different types of T cell (T follicular helper cells, naïve CD4 T cells, activated memory CD4 T cells, resting memory CD4 T cells, CD8 T cells, γδ T cells, and Tregs), neutrophils, macrophages (M0, M1, M2), memory B cells, resting dendritic cells, activated DCs, naïve B cells, resting natural killer cells, activated NK cells, plasma cells, resting mast cells, eosinophils, activated mast cells, resting dendritic cells, and monocytes (10). The CIBERSORT results for samples with a P 0.05 suggested that the inferred fractions of immune cell populations were accurate and suitable for further study.

Statistical Analysis
All of the analyses were carried out with the help of the R programming language (version 3.6.2). The Wilcox test was used to determine whether or not there was any difference in gene expression between the TCGA cohorts. Cox regression was used to conduct univariate and multivariate analyses. The log-rank test was used to conduct the survival analysis. Pearson correlation analysis was used to determine the correlation. Student's t-tests were used to determine differences in clinicopathologic characteristics. In all statistical tests, P values less than 0.05 were considered signi cant.

Screening and Identi cation of IRGs
The TCGA data portal was used to obtain transcriptome RNA sequencing data for LUAD samples (https://cancergenome.nih.gov/). It was obtained using the HTSeq-FPKM method the expression data for 535 primary LUAD and 59 non-tumor tissues. Comparing tumour and non-tumorous tissues' gene expression using LUAD transcriptome RNA data revealed 6778 differentially expressed genes, of which 5176 were up-regulated in tumours and 1600 were down-regulated in tumors ( Figures 1A and 1C). From this list of genes, we were able to identify 505 deferentially expressed IRGs, with 338 of them being upregulated and 167 being down-regulated ( Figures 1B and 1D). Due to the fact that survival duration and survival status are the primary clinical outcomes for patients, we concentrated our efforts on identifying molecular biomarkers that could serve as effective prognostic indications. Using univariate Cox regression analysis, 54 prognosis-associated IRGs (p-IRGs) were identi ed by merging the differential immune gene matrix and the clinical outcome matrix ( Figure 1E).

Function Enrichment Analysis
The Gene Ontology (GO) is now divided into three categories, which are as follows: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC) (CC). On 505 differentially expressed IRGs, we applied functional enrichment analysis and discovered 1,263 GO keywords and 45 signi cant KEGG pathways. The analysis of gene functional enrichment revealed that the in ammatory pathways were the ones that were most frequently implicated. It was cellular components, biological processes, and molecular activities that were the most frequently mentioned in biological terms, with "humoral immune response", "immunoglobulin complex," and "signalling receptor activity" being the most frequently mentioned (Figure 2A). When it comes to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, differentially expressed IRGs were shown to be the most commonly enriched in cytokinecytokine receptor interactions ( Figure 2B). Following that, we ran a similar enrichment study on p-IRGs and discovered that the immune microenvironment was a signi cant factor ( Figure 2C). Additionally, the most often discovered KEGG pathways were the Cytokine-cytokine receptor interaction, the Chemokine signalling route, and the Ras signalling pathway ( Figure 2D).

Association Between Differential IRGs and Transcription Factors
Transcription factors (TFs) are responsible for the regulation of gene expression. So we investigated putative molecular mechanisms that could explain the clinical importance of IRGs and TFs retrieved from databases. After following the analysis of 318 transcription factors (TFs), we discovered that 70% were differently expressed between the LUAD and the non-tumor lung samples (Figures 3A, B). According to the results of the correlation study between TFs and p-IRGs, it can be determined that 29 TFs are involved, which includes TTF2, TCF21, SOX9, RXRG, PRKDC, PDX1, NCAP, MYH11, MYBL2, LMO, LMNB1, KLF, IRF4, H2AFX, FOXP3, FOXM1, FOS, FLI1, EZH2, ETV1, ETS1, ERG, EPAS1, E2F7, E2F1, CENPA, CBX7, CBX3, and BRCA1 were found to be mutually mediated with 26 immunological genes. Following that, we constructed a regulatory network based on the foregoing ndings, which we then illustrated using the Cytoscape software programme ( Figure 3C).

IRGs Risk Model Construction and Validation for LUAD Patients
The 54 p-IRGs identi ed by merging the differential immune gene matrix and the clinical outcome matrix are not the typical driver genes for LUAD, so we hypothesised that aberrant expression of these IRGs would be a predictive factor.
We then entered 54 p-IRGs into a multivariate Cox regression analysis which acquired 13 risk-related immunological genes (r-IRGs) and their coe cients that were used in the creation of the risk model, which was then used in the formulation of the risk model (Table 1).   TABLE 1 The names and coe cients of risk immune-related genes (r-IRGS) in the risk model. The 13 r-IRGs involved VIPR1, FURIN, RBP2, FGF2, SLC11A1, SEMA4B, BDNF, IL11, TNFRSF11A, ANGPTL4, INHA, S100A16, CRABP1. In the model, the risk score was generated using the coe cients expression values. We refer to this risk score as the "immune-based prognostic index" in this section (IPI). On the basis of the median value of IPI, patients were divided into two groups: those at high risk of developing cancer and those at low risk. Figures 4A-C shows the survival status of the discovery cohort (TCGA) as well as the expression of r-IRGs. Based on the median risk score, patients with LUAD in different cohorts were categorised into high-and low-risk groups, respectively. Because of the statistically substantial difference between the survival curves of the high risk and low risk groups reported in both discovery (TCGA) and validation (GSE31210) cohorts, IPI has been shown to be a good predictor of mortality ( Figure 5A, Figure 5B). The dependability of this risk model was validated using ROC analysis, which we performed. The area under the receiver operating characteristic (ROC) curve for the discovery cohort (TCGA) and validation cohort (GSE31210) was 0.747 and 0.688, respectively, for the two cohorts ( Figure 5C, Figure 5D).
As a result of these ndings, IPI has a moderate potential for developing a prognostic signature based on IRGs in the context of survival monitoring. An further study was carried out by the prognostic model, which was a univariate COX regression analysis. It was demonstrated that the clinical stage, the T stage, the N stage, and the risk score were all strongly connected with the prognosis of LUAD patients (P < 0.001) ( Figure 6A). Once this was completed, a multivariate COX regression analysis was conducted on both clinical characteristics and the risk score. Only the clinical stage and the risk score were shown to be signi cantly associated with the prognosis of LUAD patients (P 0.05). ( Figure 6B). Finally, IPI has the potential to become a standalone predictor.

Clinical utility of prognostic signature
In order to gain a better understanding of the clinical signi cance of r-IRGs and IPI, we investigated the link between them and clinical pathology, including age, gender, clinical stage, regional lymph node metastasis (N), tumour burden (T), and distant metastases (M). ANGPTL4 was positively correlated with advanced T-stage cases ( Figure 7A, P = 0.014). FURIN was higher expressed in male patients ( Figure 7B) and positively correlated with lymph node metastases ( Figure 7C

IRI and Immune Cell In ltration Correlation
For the purpose of determining the potential of IRI to forecast the condition of the immune microenvironment, the CIBERSORT algorithm was employed to evaluate the quantity of tumor-in ltrating immune cells in the LUAD patients' immune microenvironment. In our study, we discovered that risk score was signi cantly positively correlated to NK cell resting levels (Figures 8).

Discussion
In ltrates of lymphocytes are widespread in lung adenocarcinoma (LUAD), the most common kind of non-small-cell lung cancer. This shows that the immune system is actively involved in the formation and progression of the cancer (5). The identi cation of many important gene makers associated with survival prognosis, as proven by existing research, is critical, particularly for cancers with substantial tumour heterogeneity and poor prognosis. In addition to the present histopathological staging system, "Immunoscore" is a quantitative forecasting tool based on tumour immune contexture that is currently being tested in clinical trials in a variety of cancer types as a supplement (11,12).
As a result, in this retrospective investigation, we developed a 13-gene survival risk predictive model connected to the overall survival of LUAD patients. By using a multivariate logistic regression analysis, researchers were able to determine the relationships between these prognosis-related genes and clinical survival. In addition, the capacity of the risk model to predict the condition of the immunological microenvironment was explored. We rst assessed LUAD immune in ltration using the TCGA LUAD cohort and then identi ed the IRGs that were differentially expressed between the LUAD and nearby nontumor LUAD samples in the LUAD. Using massive RNA sequencing and weighted immune-related gene expression, it is possible to determine the health of tumour cells. According to the KEGG analysis, the most signi cant pathway was found to be enriched in the process of Cytokine-cytokine receptor interaction. GO terms that were signi cantly enriched in terms of biological process and molecular function included terms that were highly connected to receptor-ligand action. Exactly as predicted, the functional enrichment analysis revealed that these genes were largely involved in signalling pathways that were relevant to the immune system. Furthermore, these enhanced signalling pathways are typically implicated in tumorigenesis-related processes such as tumour growth, invasion, and metastasis, implying that immune-related genes may be useful in predicting clinical outcomes. The TCGA LUAD cohort was utilised to develop a prognostic gene signature, which was then used to predict the survival outcomes of LUAD patients based on their gene expression pro les. A greater risk score was found to be associated with a poorer overall survival rate.
All of the genes associated with risk model construction are found in S100A16, CRABP1 RBP2, FURIN, FGF2, SLC11A1, SEMA4B, BNDF IL11, INHA, ANGPTL4, TNFRSF11A, and VIPR1 (see list below).The S100 protein family of genes is one of those that plays a critical role in the development and progression of tumours at various stages. A large number of human pathophysiological processes, including adipogenesis, in ammation, tumour progression, and osteoporosis, have been linked to the S100 gene family. S100A16, the S100 family's latest member has been linked to several human pathophysiological conditions, including tumour progression, in ammation, adipogenic differentiation, and osteoporosis (13)(14)(15). When it comes to lung cancer, high S100A16 expression has been demonstrated to be signi cantly associated with a bad prognosis (16-18). SEMA4B is a member of the semaphorin protein family, which is responsible for largely regulating cell motility and adhesion. In vitro studies have demonstrated that SEMA4B inhibits the invasion of non-small cell lung cancer via inhibiting the PI3K/Ak signalling pathway (19). IL-11 is a member of glycoprotein (GP) 130 cytokine family that also includes IL-6 and IL-27 (20). These cytokines share a receptor component GP130 and can activate the Janus kinase (JAK)/STAT3 pathway (21)(22)(23)(24). In the past, the binding of interleukin-11 (IL-11) to its transmembrane coreceptor, IL-11R, and the subsequent signalling cascade have been related with platelet maturation, neurogenesis, osteoclastogenesis, and adipogenesis, among other things (25,26). Both in mouse models and in humans, interleukin-11 (IL-11) has been linked to a variety of in ammation-associated malignancies, mostly because of its potential to activate the JAK-STAT3 signalling pathway. (27). This protein, also known as angiopoietin-like 4 (ANGPTL4), is a member of the angiopoietin superfamily of secreted proteins that are structurally linked to factors that modulate angiogenesis. ANGPTL4 has been found in a number of solid tumours, indicating that it may play an essential role in cancer growth and progression, altered redox regulation, anoikis resistance, metastasis and angiogenesis (28-30). Hypoxia conditions, which are a signi cant characteristic of the tumour microenvironment, may serve as a link between ANGPTL4 and carcinogenesis in one instance. Indeed, hypoxia-inducible factor-1 (HIF-1) promotes the overexpression of cyclooxygenase-2 (COX-2) in response to hypoxia (HIF-1) (31-33), Prostanoids, particularly prostaglandins PGE2, are produced as a result of this process. Increased levels of PGE2 activate an intracellular signalling cascade, which results in the stimulation of the ANGPTL4 gene expression in the cells (34). A type I homotrimeric transmembrane protein, tumour necrosis factor receptor superfamily member 11a (TNFRSF11a), shares the highest amount of homology with the CD40 antigen (35). In addition to the lung, skeletal muscle, brain, heart, kidney, skin and liver, TNFRSF11a is found in high concentrations in certain malignancies. Following an earlier work in mice (35)(36)(37), it was discovered that TNFRSF11a-mediated intracellular signalling is required for mammary gland development through controlling the expansion of the stem and progenitor cell compartments, as well as some cancers. In a recent study in mice, it was discovered that TNFRSF11a-mediated intracellular signalling is required for mammary gland development via regulating the expansion of the stem and progenitor cell compartments (38). TNFRSF11a overexpression in mice, on the other hand, resulted in aberrant proliferation and poor differentiation, resulting in an increase in the incidence of cancer (39). The vasoactive intestinal peptide (VIP), also known as PACAP, is a critical neuropeptide that regulates lung physiology. It primarily acts via the VAPC1 and VAPC2 receptor subtypes. VIP is an autocrine lung cancer growth factor whose receptor is expressed in a large number of lung cancer cell lines. Several studies have shown that VIPR1 inhibits the growth of several cancers, including prostate cancer, lymphoblastoma, and medulloblastoma, implying that VIPR1 may signi cantly inhibit the growth and development of cancer cells. These ndings shed light on the role of VIPR1 expression in regulating lung adenocarcinoma's biological behaviour (40). The remainder of the LUAD genes' function and regulation mechanism are largely unclear.
In summary, we developed an immunological signature based on IRGs for the purpose of predicting prognosis in patients with LUAD. This model has the potential to be a valuable tool for differentiating  Immune-related genes with differential expression. A heatmap (A) and volcano plot (B) of genes that are differently expressed between lung adenocarcinomas (LUAD) and surrounding non-tumor tissues. The heatmap (C) and volcano plot (D) depict differential expression of immune-related genes (IRGs). The black dots indicate undifferentiated genes, whereas the red and green spots indicate differentiated genes.
(E) Forest plot of hazard ratios illustrating immune-related gene prognostic values.