Development of a Prognostic Model Based on Differentially Expressed Immune Genes of Lung Adenocarcinoma

Due to the enormous heterogeneity and molecular complexity, the ecacy of existing lung adenocarcinoma risk prediction models were less than satisfactory. In this study, we downloaded the immune-related genes were from InnateDB, and the differentially expressed immune genes (DEIGs) were analyzed with edgeR and DESeq2 algorithm. In total, 1359 DEIGs were identied. Cytoscape was employed to build a mRNA-miRNA-lncRNA network which was consists of 8 lncRNAs, 7 miRNAs and 117 DEIGs. Functional enrichment analysis indicated that the 117 DEIGs were involved in immune and inammatory responses and actively involved in a MAPK signaling pathway. A prognostic signature based on ten DEIGs including ASPH, CAV1, FKBP4, GRIK2, FURIN, SLC6A8, FSCN1, CKAP4, HAPLN2 and IL22RA2 which performed well was correlated with tumor burden, tumor stage and metastasis. The similar result was obtained in the validation dataset GSE72094 (p<0.0001). The prognostic index reected inltration by B cell, CD4+T cell, CD8+T cell and dendritic cell. We also found that ASPH and FSCN1 were over-expression in A549, H1299, PC-9 cells, and the positive expression of ASPH, FSCN1, MS4A1 and CD40LG had a correlation with the TNM stage, cellular differentiation and the lymphnode metastasis(p<0.05). High levels of ASPH and FSCN1, low levels of MS4A1 and CD40LG expression are all associated with poor overall survival in LUAD (p<0.05). In conclusion, our results identied DEIGs of clinical signicance, veried the effect of the DEIGs-based, personalized prediction model for lung adenocarcinoma prognosis. Together, our results revealed that ASPH and FSCN1 could be a prognostic marker for lung adenocarcinoma. no ceRNA the endpoint. The risk prediction model of 10 immune genes has not been studied before. The prognostic model based on ten DEIGs which was proved useful in prognostic predictions, and correlated with tumor burden, tumor stage, and metastasis. Several studies have shown that ASPH is highly expressed in the tumor tissues of patients with small-cell lung cancer, colon cancer, malignant glioma, hepatocellular carcinoma, cholangiocarcinoma and pancreatic cancer, and can affect the migration ability of tumor cells and disease progression (54). It is reported that ASPH can induce antigen-specic activation of CD4+ and CD8+T cells in human peripheral blood monocytes(55). Kanlikilicer et al. found that exosomal miRNA confers chemo-resistance via targeting CAV1/p-gp/M2-type macrophage axis in ovarian cancer(56). Recent research revealed that FKBP4 is a member of the immunophilin protein family, which plays a role in immunoregulation and basic cellular processes involving protein folding and tracking associated with HSP90(57, 58). Meanwhile, there is plenty of evidence that GRIK2 and CKAP4 has close connection with the initiation, progression and prognosis of various malignancies including lung cancer(59–61). Recent studies have conrmed that FSCN1 is up-regulated in a variety of tumors, and can be used as a tumor marker in Hodgkin's lymphoma and Epstein Barr virus-associated lymphoma (62), as a predictor of prognosis and recurrence time of pancreatic ductal adenocarcinoma, and can promote immune escape of tumor cells(63, 64). These results corroberated the prognostic value of our model. The remaining genes merit further study to explore their potential as prognostic biomarkers for LUAD patients. indicator, but also as an immune status indicator. Lots of studies demonstrated that the tumor-inltrating immune cells play an important role in regulating tumor growth, metastasis and angiogenesis of lung cancer. TIL-Bs plays a bidirectional regulating role. Wang SS et al. showed that high-density TIL-Bs was closely related to the good prognosis of lung cancer patients (65). Kinoshita T et al. showed that high expression levels of TIL-Bs, CD4+T cells and CD8+T cells in the tumor microenvironment of NSCLC patients predicted long disease-free survival (66).These reports are in line with our results. We explored the relationships between the risk score of our model and immune cell inltration to investigate the tumor–immune interactions. Our analysis indicated that the risk score of our model was inversely related to the inltration of B cell, CD4+ T cell and dendritic cell. The high-risk patients had lower inltration levels of B cell, CD4+ T cell and dendritic cell. Thus, our model predicted risk score could serve as an indicator for immune inltration status. We have also validated the expression of ASPH, FSCN1, MS4A1 and CD40LG in lung cancer cells and tissues, and revealed that the expression of ASPH, FSCN1, MS4A1 and CD40LG are independent prognostic factors. Furthermore, high levels of ASPH and FSCN1 and low levels of MS4A1 and CD40LG expression are associated with poor overall survival in lung adenocarcinoma. These results suggest that our model can be used as a reliable risk-prediction tools in patients with LUAD. the of DEIGs


Introduction
Lung cancer is the most common cause of cancer deaths globally. There were 2.2 million incident cases of lung cancer and 1.9 million deaths in 2017. From 2007 to 2017, lung cancer cases increased by 37% (1,2), in which adenocarcinoma accounts for 40% of all cases, and its incidence rate is on the rise (3). Nearly 33%-52% of patients die within ve years even with complete surgical resection (4). It remains a huge burden on the healthcare system and a signi cant unmet medical need. In recent years, tumor immunotherapy has become a very important therapeutic strategy for many types of cancer (5)(6)(7). Based on a number of clinical studies, immunotherapy, such as blockade of immune checkpoint therapy, has been approved by FDA for using in lung cancer recently (8-10). However, only a small patient population can bene t from immunotherapy, limiting its clinical application. Therefore, it is important to nd biomarkers that can reliably predict the prognosis and patient survival.
Extensive research show that immune microenvironment and immunogenomics play a key role in the development of tumor (11), the prognostic impact of the immune microenvironment in lung cancer has also been demonstrated (12). While these results analysed the signi cance of immunology in lung adenocarcinoma, the molecular mechanisms still remain unclear, particularly with regards to immunogenomic effects. Although previous studies have reported survival strati cation based on the immune-related gene expression in patients with lung adenocarcinoma (13,14), deeper veri cation is still lacking.
In this study, we combined multiple datasets from TCGA to develop and validate an individualized prognostic signature for LUAD. We systematically analyzed the expression and prognostic value of DEIGs and built an effective prognostic index for LUAD. Our aim is to delve deep into the clinical application value of DEIGs and their potential as biomarkers on prognostic strati cation. Our results could improve the estimation of lung adenocarcinoma prognosis and offer a foundation for further immune-related research.

Materials And Methods
All methods were carried out in accordance with relevant guidelines and regulations. The studies involving human participants were reviewed and approved by Ethics Committee of Tumor Hospital of Shaanxi Province. Written informed consent for participation was waived for this study by Ethics Committee of Tumor Hospital of Shaanxi Province in accordance with the national legislation and the institutional requirements.

PPI network construction and hub-genes identi cation
The PPI network was built with genes in the ceRNA network with STRING (v11.0) (25). Afterwards, the edge le was downloaded and imported into Cytoscape.
And hub-genes were clustered by MCODE App in Cytoscape, with 3 as the "Degree cutoff".

Construction of immune signatures
We used the 117 differentially expressed mRNA preserved in the nal ceRNA network as input for model construction. The total TCGA LUAD patient cohort was randomly sampled to obtain 70% patients of training set and 30% patients of test set. The GSE72094 queue serves as the validation data set. Then the prediction model was built on the most frequent gene set with effective coe cients in the lasso regression using R package 'glmnet' (26) for 1000 iterations on the training dataset. The function "cv.glmnet" was employed to perform 10-fold cross-validation of glmnet. The risk score was de ned as the sum of the normalized expression of genes multiplied by their coe cients in the gene set. Receiver operating characteristic curve (ROC) was used to evaluate the cutoff of risk scores as a predicting factor for the survival of LUAD patients at ve years prior to death. After dividing the patient into two groups according to the risk score, 'Survminer' was employed for survival analysis for both training and testing data. Statistical analysis was performed by R (3.6.1). Heatmaps were generated using R package pheatmap (1.0.12).

Univariate and Multivariate cox regression
Cox proportional hazards modeling was applied to investigate the relationship between predictor variables and survival time. The correlations among predictor variables were adjusted in multivariate cox regression. The categorical variables were transformed into numeric ones before cox regression. For example, Stage I converted to "1", Stage II converted to "2", Stage III converted to "3", and Stage IV converted to "4". The Staging variable was then treated as a numeric variable in the regression analysis.

Cell culture
The Human Bronchial Epithelioid Cells, BEAS-2B and human NSCLC cell lines, H1299, PC9, A549, were purchased from the American Type Culture Collection (ATCC; Manassas, VA, USA). The cells were cultured in RPMI-1640 or DMEM medium (Hyclone, UT, USA) supplemented with 10% fetal bovine serum (FBS; Transgenomic Co., Shanghai, China) at 37°C in a humidi ed incubator containing 5% CO2 in air. Cells were replenished daily with fresh medium and were harvested by trypsinization and split at a 1:3 ratio with fresh medium every 2 d (27, 28).

Western blot analysis
After drug treatment, cells were harvested in ice-cold PBS and lysed with radioimmunoprecipitation assay buffer con-taining protease and phosphatase inhibitors, then centrifuged (12,000 rpm, 20 minutes) at 4°C. The protein supernatant was collected and its concentration determined using BCA protein assay reagent (both from Wolsen Company, Xi'an, People's Republic of China). Each protein sample was subjected to sodium dodecyl sulfate polyacrylamide gel electrophoresis (8-12%), then transferred onto polyvinylidene di uoride membranes (EMD Millipore, Billerica, MA, USA), blocked with 10% non-fat milk in 0.05% Tris-based saline-Tween 20 for 2 hours at RT. Membranes were labeled with primary antibodies against ASPH, FSCN1 (1:1000, Santa Cruz Biotechnology), and anti-GAPDH (1:1000, Santa Cruz Biotechnology) overnight at 4°C (29). Subsequently, blots were washed with Tris-based saline-Tween 20 for 30-40 minutes, then further probed with secondary antibody at RT for 1 hour and subjected to reaction with enhanced chemiluminescence solution and visualized using ChemiDoc XRS+ system (BioRad Laboratories Inc., Hercules, CA, USA). For quanti cation of speci c protein bands, lms were scanned and analyzed using Labworks software (Mission Viejo, CA, USA) (30,31).

Immunohistochemistry
A total of 60 patients of LUAD getting surgery with no prior chemotherapy or radiotherapy at Tumor Hospital of Shaanxi Province were recruited in the study between January 2014 and December 2015. A mouse polyclonal anti-ASPH and anti-CD40LG antibody at a dilution of 1:50, anti-FSCN1 and anti-MS4A1 antibody at a dilution of 1:100 (all from Santa Cruz Biotechnology, INC), which were speci c for immunohistochemistry. PBS was used to displace the primary antibody as the negative control. The Histological diagnosis was performed by 3 independent, experienced pathologists for all the cases. The Immunohistochemistry (IHC) technique was performed as previously described (34). 5 µm-thick sections were cut from the human lung adenocarcinoma tissue and xed in 10% buffered formalin overnight and para n-embedded. The slides were depara nized and rehydrated in graded alcohols, followed by antigen retrieval in a microwave oven. Slides were blocked with 10% normal goat serum for 20 minutes at 37°C to reduce nonspeci c binding. The slides were incubated overnight at 4°C (35). After being washed, Horseradish peroxidase (HRP) conjugated goat anti-rabbit IgG was used as secondary antibody, and then visualized with 3,3′-diaminobenzidine (DAB) solution. Finally, hematoxylin was used to counterstain the section. The percentage of positive cells was classi ed into 5 score ranges: <10% (0),10-25% (1), 25-50% (2), 50-75% (3), and >75% (4). The intensity of staining was divided into 4 groups: no staining (0), light brown (1), brown (2), and dark brown (3). The staining positivity was determined using immunoreactivity score (IRS) which is the product of intensity score and quantity score. An overall score of >6 as strong positive, >3 as weak positive, and ≤3 was de ned as negative.

Results
Identi cation of differentially expressed DEIGs TCGA LUAD dataset legacy-archive (hg19) was downloaded, including four datasets: RNA count, miRNA count, somatic mutation and somatic CNV (Table   S1). Gene expression data of "Primary solid Tumor" and "Solid Tissue Normal" samples were accessed by R package "TCGA biolinks".The DEIGs were identi ed by obtaining the overlapping subset of a list of immune-related genes from InnateDB and the list of different expressed genes. A total of 7477 immune-related genes were downloaded from InnateDB. The differentially expressed gene analysis was performed by edgeR and DESeq2, and only DEIGs detected by both methods were included. A total of 1359 differentially expressed genes were identi ed, 713 up-regulated and 646 down-regulated ( Table 1). As expected, functional enrichment analysis indicated that the cytokine-cytokine receptor interactions and the in ammatory pathways were among the top terms  Construction of mRNA-miRNA-lncRNA Network Cytoscape was employed to build a mRNA-miRNA-lncRNA network using differentially expressed lncRNA, miRNA and immune-related target mRNA ( Figure 2). In total, the ceRNA network consisted of 8 lncRNA (AGAP11, CASC2, GAS5, MIAT, PVT1, SNHG1, SNHG12, SNHG3),7 miRNA(hsa-mir-1276, hsa-mir-133b, hsamir-137, hsa-mir-451a, hsa-mir-543, hsa-mir-551a, hsa-mir-577) and 117 target mRNA. (Table 2)  Characteristics of target genes As expected, the DEIGs were mainly involved in immune and in ammatory responses by using gene functional enrichment analysis. The most frequent biological terms among biological processes, cellular components, and molecular functions were "myeloid cell differentiation," "cytoplasmic region" and "protein heterodimerization activity"( Figure 3A). For KEGG enrichment, MAPK signaling pathway was the most often enriched by differentially expressed target genes ( Figure 3B). The 117 target genes were used to construct a protein-protein interaction network. Two subnetworks were extracted via MCODE algorithms.
The rst MCODE network had six proteins, SH3GL2, TFRC, SYT1, FZD4 and REPS2. The second MCODE network had ve components, PDIA6, CDH2, CKAP4, CP and SPP1( Figure S1). The mutations of those 117 target genes were investigated in 569 tumors. The top 50 genes were visualized using OncoPrint. Different mutation type was indicated by different color, and the missense mutation is the most common type ( Figure S2).

Evaluation of clinical outcomes
The optimized model consisted of the following genes: ASPH, CAV1, FKBP4, GRIK2, FURIN, SLC6A8, FSCN1, CKAP4, HAPLN2, IL22RA2, which could serve as prognostic marker for LUAD patients. The area under curve of the receiver operating characteristic (ROC) curve was 0.699 for 3years, 0.627 for 5 years, 0.681 for 10years, indicating the prognostic model based on DEIGs has de nite potential in survival monitoring. The heatmaps showed that the gene expression pro les were different between the cases of the high and low risk score groups (Figure 4). Univariate Cox regression analysis suggested that the prognostic signature, age, tumor stage, pathologic stage and metastasis status are all associated with prognosis( Table 3).The prognostic model based on DEIGs was identi ed as an independent predictor using multivariate cox regression analysis after the adjustment of other parameters( Figure 5). We validated our model and selected genes using two additional datasets from GEO database. We obtained signi cant results in GSE72094( Figure S3).

Correlation between Prognostic Signature and Immune In ltration
We analyzed the relationship between model predicted risk score and immune cell in ltration to verify the function of immune related genes in tumor immune microenvironment. As shown in the gure 6, to some extent, the risk score of our model is inversely  Table 4 showed the characteristics of both groups. We found that the positive expression of ASPH, FSCN1, MS4A1 and CD40LG had a correlation with the TNM stage, cellular differentiation and the lymph node metastasis (p<0.05). But were not associated with the age and sex(p>0.05  (Figure 9).

Discussion
Lung cancer is still one of the deadliest cancers worldwide, in which adenocarcinoma is the most common histological subtype. Chemotherapy, radiotherapy, targeted therapy and immunotherapy are the most common therapeutic methods for lung adenocarcinoma (36), due to its molecular complexity and cellular heterogeneity, the effect of the treatment is unsatisfactory. And the shortage of effective prognostic biomarkers to guide cancer therapy is one of the reasons for the poor prognosis. Several studies con rmed that the interaction between the tumor and its immune microenvironment is crucial in the development and progression of the tumor. The emergence of cancer cells often occurs in densely in ltrated in ammatory environments (37), and the immunoscore has been de ned to quantify the in situ immune in ltrate by Galon J et al (38).
Due to the close relationship between tumor microenvironment and immunotherapy, it is necessary to nd the immune-related genes related to the prognosis of LUAD patients and construct the relevant immune model, which can provide guidance for the immunotherapy of LUAD. Our study is an integrated bioinformatic analysis of publicly available datasets, underscoring the potential to reveal molecular characteristics and clinical signi cance by analyzing of DEIGs in LUAD. What's more, an individualized immune prognostic model based on DEIGs was developed for LUAD.The risk score calculated by this model independently predicted the outcome of patients with LUAD, and we also evaluated the in ltration of immune cells by exploring the relationship between the model and immune cells. These results indicate that the immune-related risk prediction model constructed by us is of great signi cance for predicting the prognosis of LUAD patients and may provide new immunotherapeutic targets.
Although the signi cance of DEIGs in tumor progression and immunotherapeutics has been con rmed already (39)(40)(41), the molecular mechanisms and clinical signi cance are unclear. We found several DEIGs signi cantly involved in the tumorigenesis and development of LUAD and built an immune-related ceRNA network which may perform a variety of biological functions and in uence immune status in the human body and participate in the occurrence and development of various tumors (42). The indepth exploration of their molecular mechanisms was analyzed by bioinformatics. We found that the MAPK (mitogen-activated protein kinases) signaling pathway was the top enriched pathway from the 117 target genes. MAPKs are intracellular class of serine/threonine protein kinases (43), and lung adenocarcinoma has been associated with mutations that activate MAPK pathway in previous studies (44,45). As expected, functional enrichment analysis indicated that the DEIGs were mainly involved in immune and in ammatory responses. Furthermore, a PPI network was constructed for the 117 target genes, among which, 10 hub genes were identi ed by cytoscape. Among these hub genes, ve of them have been reported, including FZD4, PDIA6, CDH2, CKAP4 and SPP1, a possible connection between SH3GL2 and CP and LUAD has also been reported (46, 47), but mechanism is unclear. Until now, the function and mechanism of SYT1, TFRC or REPS2 in LUAD have not been reported (48-51). The network will be useful to provide information for studying its function and molecular mechanism in the future.
The 117 ceRNA network target genes were used in the modeling process. Although previous researchers have also modeling for LUAD patients' survival prediction (14,52,53), no study ever proposed a signature that chose the DEIGs which is also the ceRNA network target genes as the endpoint. The risk prediction model of 10 immune genes has not been studied before. The prognostic model based on ten DEIGs which was proved useful in prognostic predictions, and correlated with tumor burden, tumor stage, and metastasis. Several studies have shown that ASPH is highly expressed in the tumor tissues of patients with small-cell lung cancer, colon cancer, malignant glioma, hepatocellular carcinoma, cholangiocarcinoma and pancreatic cancer, and can affect the migration ability of tumor cells and disease progression (54). It is reported that ASPH can induce antigen-speci c activation of CD4+ and CD8+T cells in human peripheral blood monocytes (55). Pinar Kanlikilicer et al. found that exosomal miRNA confers chemo-resistance via targeting CAV1/p-gp/M2-type macrophage axis in ovarian cancer(56). Recent research revealed that FKBP4 is a member of the immunophilin protein family, which plays a role in immunoregulation and basic cellular processes involving protein folding and tra cking associated with HSP90 (57,58). Meanwhile, there is plenty of evidence that GRIK2 and CKAP4 has close connection with the initiation, progression and prognosis of various malignancies including lung cancer (59)(60)(61). Recent studies have con rmed that FSCN1 is up-regulated in a variety of tumors, and can be used as a tumor marker in Hodgkin's lymphoma and Epstein Barr virusassociated lymphoma (62), as a predictor of prognosis and recurrence time of pancreatic ductal adenocarcinoma, and can promote immune escape of tumor cells (63, 64). These results corroberated the prognostic value of our model. The remaining genes merit further study to explore their potential as prognostic biomarkers for LUAD patients. In conclusion, in this study, we successfully constructed a risk prediction model of 10 immune genes, and systematically analyzed their prognostic value in LUAD. The model constructed in this study is an independent prognostic factor for survival in patients with LUAD. Our preliminary results give us a possible hint to discuss the relationship between immune related genes and the prognosis of LUAD. Through further biological function analysis, we can understand the function of these DEIGs in the occurrence and development of LUAD, thus providing a new idea for the treatment of LUAD. However, there were also some limitations of our studies, rendering further studies into the mechanisms of DEIGs modulated LUAD progression necessary.

Declarations
Availability of data and materials All data generated or analyzed during this study are included in this published article and its Supplementary Information les. The datasets generated and/or analyzed during the current study are available in the TCGA repository(https://portal.gdc.cancer.gov/). The immune-related genes were derived from InnateDB (https://www.innatedb.com/). While the estimated in ltration abundance of immune cells of LUAD samples were obtained by TIMER (https://cistrome.shinyapps.io/timer/).  Relationships between the risk score and estimated in ltration abundances of immune cells.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. supplement.pdf