Identication of molecular markers associated with the progression and prognosis of lung adenocarcinoma:a bioinformatic study

Objective: For the identication of genes of prognostic signicance related to tumor microenvironment (TME) in lung adenocarcinoma. Methods and Materials: Transcriptome data and clinical data of lung adenocarcinoma originated from the Cancer Genome Atlas (TCGA) database. Immune scores and stromal scores were calculated by “Estimation of Stromal and Immune cells in Malignant Tumors using Expression data” algorithm. Based on these calculated scores, the samples were classied as high and low score groups. The average score of respective gene in different groups was calculated. A heat map was created to screen out genes exhibiting differential expressions. The interaction of up-regulated differentially expressed genes (DEGs) and down-regulated DEGs was harvested from a Venn diagram and then covered in the overlapping genes. The core genes inuencing prognosis of lung adenocarcinoma were screened out by function enrichment analysis, protein-protein interaction network analysis and Kaplan-Meier (K-M) method on the overlapping genes. Results: A total of 515 samples of lung adenocarcinoma were harvested from TCGA database. As revealed from the results, a high immune score was related to a high survival rate, while the matrix did not show signicant relationships to survival rate. A total of 775 DEGs and 367 overlapping genes were harvested. The functions of these overlapping genes were tightly correlated with DEGs and immune response and were noticeably improved in cytokine-cytokine receptor interaction and chemokine signaling pathway. Eight genes, namely, CCR5, CCR2, CCL14, GNRH2, PKHD1L1, MS4A1, FCER2 and FDCSP, were correlated with prognosis of lung adenocarcinoma. Conclusion: The genes and pathways affecting prognosis of lung adenocarcinoma were screened out, which offer ideas for subsequent study on lung adenocarcinoma.


Introduction
Cancer is affecting increasing people worldwide on a year to year basis; it has turned into a primary factor threatening human health. Besides, lung cancer heads the list of cancers in morbidity and mortality [1]. In accordance with the report of International Agency to study cancer (IARC), there are about 2.1 million lung cancer patients in the world. Lung cancer has two major types, namely, small-cell lung carcinoma (SCLC) (20%) and non-small-cell lung carcinoma (NSCLC) (80%) [2]. However, most patients with NSCLC were diagnosed in late stage, and the 5-year survival rate was less than 15% [3]. Though an increasing number of treatments have been used in clinic, drug resistance and intolerance still affect the therapeutic effect of lung cancer. Therefore, for the enhancement of the prognosis of lung cancer patients, it is necessary to understand the pathogenesis of tumor and explore more treatment strategies.
Tumor microenvironment(TME) is critical to tumor development. The in ammatory cells and immunomodulatory mediators in TME may participate in tumor progression [4][5][6]and may help tumor escape with the disease progression [7,8]. For instance, tumor associated macrophages (TAMs) can present different phenotypes according to the stimulation of TME (e.g., chemokines), promoting tumor progression or speci c immune response [9][10][11]. Moreover, tumor associated broblasts (CAF) can promote immunosuppression by inducing myoid derived suppressor cell (MDSC) to generate reactive oxygen species (ROC) [12]. In terms of treatment, some studies proved that some drugs can exert high therapeutic effect by improving TME. For instance, dexamethasone Increases Cisplatin-Loaded Nanocarrier Delivery and E cacy in Metastatic Breast Cancer by Normalizing the Tumor Microenvironment [13]. The vaccine targeting Legumain builds the novel paradigm since a reduction in the density of TAMs in the TME inhibits the release of factors potentiating tumor growth and angiogenesis. As a result, the TME is remodeled, and its immunosuppressive milieu decreases; thus, potentiates the DNA vaccine's ability to effectively suppress metastasis, vascularization, and proliferation of tumor cells [14]. Though we have some knowledge of the relationship between TME and tumor, the interaction between TME and lung adenocarcinoma has been rarely known.
The Cancer Genome Atlas (TCGA) database is a database that collects cancer patients' information.
Thus far, it has collected more than 20000 patients' information, including 33 kinds of tumors. Big data has facilitated the development of all walks of life, and medicine is no exception. Yoshihara et al.
developed an algorithm [10] that can select genes that affect tumor prognosis by calculating the immune score and stromal score of TME. This algorithm is applied in hepatocellular carcinoma and leukemia [15,16], which it is proved to be feasible. In the present study, based on TCGA database and by the algorithm created by Yoshihara et al., the aim is to screen out genes affecting prognosis of lung adenocarcinoma and offer ideas for lung adenocarcinoma diagnosis and treatment.

Methods And Material
Transcriptome data and clinical data of lung adenocarcinoma originated from TCGA website(https://portal.gdc.cancer.gov/). The clinical data covered survival time, survival state, age, sex and stages. The Yoshihara's algorithm is adopted in the downloaded RNA expression data. The stromal, immune and estimate scores of cancerous tissue samples were calculated by R (3. 6. 2) estimate package, of which the survival score was the sum of stromal score and immune score.
The samples were split into high score group and low score group with the median of immune score and stromal score as the dividing line. Multivariate survival regression analysis was conducted with R package; meantime, survival time was adopted as an independent variable, and survival rate was adopted as a dependent variable (p<0.05) to determine the relationships between stromal, immunity and survival.
To investigate the relationships between sex and immune, stromal score, R software was used to carry out Wilcoxon rank-sum test (p<0.05) and a box plot was plotted.
The samples were split into high score group and low score group based on median of immune score and stromal score and Wilcoxon rank-sum test was adopted (p<0.05). The average value of respective gene in different groups was calculated. Besides, the logarithm of "average value in high score group/average value in low score group" was adopted, in which Log2FC was acquired. differentially expressed genes (DEGs) according to Log2FC>1 or Log2FC<-1 and FDR<0.05 were selected. Genes complying with Log2FC>1 and FDR<0.05 were termed as up-regulated genes and genes complying with Log2FC<-1 and FDR<0.05 were down-regulated genes. Pheatmap in R package was used to plot the heat map which can observe the expression of genes in different groups.
The interactions of up-regulated genes in stromal group and immune group were collected, as well as those and of down-regulated genes. A Venn diagram was plotted with VennDiagram package in R package.
The overlapping genes were analyzed using clusterPro ler, org.Hs.eg.db, ggplot2, stringi, DOSE tool package, gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG)in R language (p<0.05, q<0.05). The study screened out the top 10 interactions exhibiting the most highly enriched MF, CC and BP in GO and all 28 KEGG pathways. GO expressed the functions of genes in biological process, cellular component and molecular functions, while KEGG was used for delving into the enrichment ways of genes.
This study conducted PPI analysis on DEG genes on STRING (https://string-db.org/) data platform. minimum required interaction score = 0.9. After deleting free proteins, the top 30 most connected genes were taken as core genes with R, with which, a bar diagram was plotted.
K-M survival analysis on DEGs that affected prognosis of lung adenocarcinoma with survival package. p<0.05 suggested a statistical signi cance. Genes correlated with prognosis were screened out.

Results
A total of 515 issue samples from patients were harvested from TCGA database, including 535 cancer tissue samples and 59 normal sample. Clinical information of 522 patients was obtained, covering 242 male cases and 280 female cases ( COX regression revealed that high immune score and estimate score were correlated with a high survival rate (p=0.0207 and p=0.0257, Fig. 1A and Fig. 1C). The effects in stromal score group was insigni cant (p=0.1030, Fig. 1B).
From the box plot, the stromal score, immune score and estimate score of females were all higher than those of males (p=0.1081, p=0.0360 and p=0.00362, Fig.2A, Fig.2B and Fig. 2C).
Heat map which can observe the expression of genes in different groups was plotted based on stromal score and immune score ( Fig. 3A and Fig. 3B). The pink represents the low score group, while the blue represents the high score group. Low expression is denoted in green, median expression is represented in black, while high expression is in red. According to Log2FC>1 or Log2FC<-1 and FDR<0.05,we found 683 up-regulated genes and 120 down-regulated genes in stromal score group. The immune score group included 611 up-regulated genes and 164 down-regulated genes. In Venn diagram, 300 up-regulated common genes (Fig. 4A) and 67 down-regulated common genes (Fig. 4B) were screened out.
GO enrichment analysis (Fig. 5) suggested a close relationships between DEGs and immune response. Immune response-activating cell surface receptor signaling pathway, regulation of lymphocyte proliferation, T cell activation and leukocyte proliferating process were highly enriched. In KEGG enrichment analysis, hematopoietic cell lineage,, viral protein interaction with cytokine and cytokine receptor chemokine signaling pathway and cytokine-cytokine receptor interaction were most highly upregulated. The mentioned pathways displayed close relationships to immune reaction and virus.
4. Discussion TME, composed of endothelial cells, in ammatory mediators, mesenchymal stem cells, immune cells and stromal cells, displays tight relationships to occurrence and development of a tumor [17,18]. It has been demonstrated that the immune cells in TME of lung cancer are phase-dependent [19,20], and the immune cells in different stages can act as potential biomarkers. Tumor cells were capable of modifying stromal cells in TME, while stromal cells can boost the growth of tumor cells [18,21].Stromal cells and immune cells are becoming a novel way for cancer treatment.
In this study, the estimate, stromal and immune scores of respective sample originating from TCGA database were calculated. According to the outcomes, high immune score displayed correlations with a high survival rate and longer survival time. The stromal score, immune score and estimate score of females were overall higher than those of males. The samples were split into high score group and low score group in accordance with immune score and stromal score. Genes exhibiting differential expressions were screened out. Interaction of genes was harvested and 300 up-regulated genes, 67 downregulated genes were harvested. GO enrichment analysis, PPI network and KEGG enrichment analysis were performed on DEGs and as revealed from the results, there was a close relationships between DEGs and immune response. Cytokine-cytokine receptor interaction, chemokine signaling pathway and viral protein interaction with cytokine and cytokine receptor pathway were mostly highly enriched. FPR2, C3AR1 and MCHR1 were most connected in PPI network. As suggested from the investigation of the relationships between single gene and survival, CCR5, CCR2, CCL14, GNRH2, PKHD1L1, FCER2, MS4A1 and FDCSP exhibited a close relation with survival.
CCR5 and CCR2 are chemokine receptors, pertaining to G-protein-coupled receptors (GPCRs). Nearly 40% of drugs take effect via GPCRs [22,23]. Multiple cells (e.g., B cells, natural killer cells and monocytes) can express CCR5. When chemokine ligand is combined with CCR5, they may activate immune reaction and induce migration [24,25]. Innate αβ T cells are capable of inducing CCR5-depdent immunogenic macrophage programming, activating and amplifying T cells, thereby inhibiting tumor growth [26]. CCR5 may help elevate the survival rate of tumor-in ltrating macrophages, and CCR2 can keep macrophages dominant in tumor in ltrating cells [27]. The chemotaxis experiment demonstrated proved that RUNX3 can facilitate CD8 T cell recruitment by up-regulating CCL3 and CCL20, in which CCR5 and CCR6 act as receptors [28]. In terms of treatment, anti-CCR5 therapy in colorectal cancer is capable of avoiding the use of tumor cells on immune cells [29,30]. CCR5 also can facilitate tumor metastasis and migration [31][32][33][34][35].PRDX6 boosts lung tumor development through its mediated and CCL5-related activation of the JAK2/STAT3 pathway [36]. CCR5 also can facilitate neovascularization [37,38]. For CCR2,CCL2-CCR2 can impact the recruitment of in ammatory cells as well as differentiation of effector T cells [39,40]. Moreover, it can also activate T cells and memory T cells [41]. Circulating brocytes make the preparation of the lung for cancer metastasis by adopting Ly-6C+ monocytes through CCL2 [42]. In hepatocellular carcinoma, oncogene is able to induce aging hepatocytes to secrete cytokines and attract immune cells to clear away them, as an attempt to prevent the occurrence of cancer, which is termed "senescence surveillance". The process requires to be assisted by CCR2 myelocytes, so the reduction in CCR2 will facilitate the progression of [43]. In brief, CCR5 and CCR2 may affect tumor development by in uencing the expression and migration of immune cells.
CCL14, also termed as chemokine 14, drives the cells to migrate to a speci c site. Chemokine has been extensively employed; it also leads different subtypes of immune cells to TME [44,45]. Moreover, chemokine is capable of directly affecting tumor cells and endothelial cells in TME, thereby regulating tumor growth, angiogenesis and tumor migration. It displays relationships to tumor development, so it marks a good prognosis [46,47]. The role of CCL14 in tumor progression continues to be not clear. Some research proved that CCL14 inhibited hepatocellular carcinoma (HCC) in vitro through the inhibition of the progression of cell cycle and promoting cell apoptosis and inhibited HCC development in vivo through Wnt/β-catenin signal pathway [48]. By suppressing CCL14, angiogenesis and metastasis of breast cancer can be inhibited [49]. A research highlighted that CCL14 acts as an individual element of epithelial ovarian cancer (EOC) prognosis, and upregulation of CCL14 is associated with a more conducive prognosis in EOC patients [50]. Thus, CCL14 may be a non-speci c prognostic factor. In contrast, CCL14 may facilitate invasion, proliferation of bone marrow and macrophage polarization in multiple myeloma. It may also has a relation with chemotherapy resistance [51]. As indicated from the data on TCGA, CCL14 acts as a bene cial factor of prognosis, complying with the conclusions in breast cancer and hepatocellular carcinoma. More studies are required to clarify why CCL14 achieves different results in different cancers.
GNRH2 (gonadotropin releasing hormone 2) refers to a subtype of gonadotrophin releasing hormone, released by hypothalamus and facilitating secretion of luteinizing hormone [52][53][54]. GnRH2 may be the oldest form of GnRH [55] exhibiting a strong conservative property [56]and has remained almost unchanged during the long-term evolution. GnRH2 is produced in kidneys, bone marrow, lymph glands, heart, lungs, etc. [57,58]. GnRH2 has been proven capable of inhibiting proliferation of prostatic cancer [59,60], ovarian cancer, breast cancer and endometrial cancer [61][62][63][64]; it also exerts better effects than GnRH1 [65]. According to Ling Poon S et al., GnRH2 can inhibit ribosomal phosphoprotein of cancer cells and further inhibit protein translation and cell proliferation [66]. Extracellular vesicles may be its way to regulate tumor proliferation [67]. Moreover, GnRH2 is of implication in cell apoptosis. Several studies con rmed that GnRH2 facilitated apoptosis of human granulosa cells [68] and regulated autophagy [69].
In tumor invasion, GnRH2 can up-regulate the content of metal protease [70], which is crucial in different aspects of biological activities (e.g., cell proliferation, differentiation and vascularized cell migration) [71]. GnRH2 impacts cell apoptosis and cancer metastasis, as well as prognosis of cancer. Shiota M et al.
reported that GnRH2 mutation would down-regulate the in uences of androgen deprivation therapy in patients with metastatic prostatic cancer [72]. GnRH2 can impact tumor proliferation, metastasis, prognosis and others, and it plays a role in multiple tumors. Besides its stability, this gene is an ideal index.
In the present study, the genes and pathways tightly correlated with prognosis in TME of lung adenocarcinoma were explored with TCGA database and the algorithm created by Yoshihara et al.; they provide novel insights for subsequent studies The mentioned genes and pathways were proven effective in other tumors, whereas their role in lung adenocarcinoma was not reported. Further experimental research is still required to determine the actual effects before their clinical application. The mentioned genes and pathways impact TME and TME while TME conversely impacts tumor proliferation and metastasis; thus, they interact with each other.
Some limitations remain in the present study. First, all the data are from TCGA database with an insu cient sample size; second, the mentioned results have not been experimentally veri ed. All these factors may cause outcome bias.
In brief, eight genes were tightly correlated with survival after an estimate analysis, and they may act as potential biological markers of lung adenocarcinoma Declarations Funding This research did not receive any speci c grant from funding agencies in the public, commercial, or notfor-pro t sectors.

Ethics Statement
The data in this article are all from TCGA database, so there is no ethical approval and consent participation section.

Consent for Publication
Not applicable.

Availability of Data and Materials
The datasets generated during and analyses during the current study are available in the TCGA database.

Con icts of Interest
The authors declare no con ict of interest.

Authors' Contributions
Kexin Du and Xiaohan Wang analyzed the data, and Kexin Du wrote the article.