Comprehensive Analysis of Prognostic Value and Immune Infiltration of ATII-associated Genes in Non-small Cell Lung Cancer

Background: Lung cancer is one of the most commonly diagnosed cancers and the leading cause of cancer-related death worldwide. Alveolar type II (ATII) cells are a key component of the distal lung epithelium and have a secretory function that is essential for maintaining normal lung homeostasis. ATII cells can dedifferentiate into a stem-like state capable of continuous differentiation, proliferation and damage repair, which helps initiate and maintain tumor progression. However, the potential value of ATII-associated genes as clinical biomarkers and therapeutic targets in non-small cell lung cancer (NSCLC) has not been fully elucidated.
Methods: We used the Gene Expression Profiling Interactive Analysis (GEPIA) and Oncomine databases to explore the expression of ATII-associated genes (AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC) in NSCLC patients. We then used the Kaplan-Meier plotter and the GEPIA website to evaluate the prognostic impact of differential expression of these genes. Finally, we analyzed the correlation between the eight ATII-associated genes and immune cell infiltration using the TIMER website.
Results: The expression levels of AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC were markedly reduced in lung cancer tissues and were also related to clinical cancer stage. Low mRNA expression of AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC was associated with short overall survival (OS) in NSCLC patients, low expression of CLDN18, FOXA2, NKX2-1, PGC, SFTPB, SFTPC and SFTPD was significantly related to a reduced first progression (FP) survival, and low CLDN18, FOXA2 and SFTPD mRNA expression led to a short post-progression survival (PPS).


Background
Lung cancer is one of the most commonly diagnosed cancers and the leading cause of cancer-related death worldwide [1][2][3]. Non-small cell lung cancer (NSCLC) is the main type of lung cancer, accounting for approximately 85% of cases, and mainly comprises lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) [2][4][5][6]. Studies have shown that NSCLC patients have an average five-year survival rate of 15% [1][7]. This poor survival rate is attributable to many factors, including that most patients are at an advanced stage at the time of diagnosis and that currently available therapies are limited [8][9]. Within the last decade, with improvements in treatment technology and equipment and the emergence of precision radiotherapy, the diagnosis and treatment of lung cancer have improved to a certain extent [8][10][11]. However, despite these advances, the overall prognosis of NSCLC has not yet improved significantly.
Alveolar cells consist mainly of alveolar type I (ATI) cells and alveolar type II (ATII) cells [12][13]. Among them, ATII cells are a key component of the distal lung epithelium and have a secretory function that is essential for maintaining normal lung homeostasis [14]. In recent years, substantial evidence has shown that abnormalities of ATII cells and abnormal expression of ATII-associated genes are significantly related to the occurrence and development of several diseases, including cancer [15]. ATII cells are essential for normal lung function. One of the pathological features of the idiopathic pulmonary fibrosis (IPF) lung is the senescence of ATII cells [15][16][17]. ATII cells are also involved in the occurrence and development of COPD through the upregulated expression of many anti- or pro-inflammatory genes, including the genes encoding heme oxygenase 2 (HO-2) and inducible nitric oxide synthase (iNOS) [15]. Importantly, several studies have also shown that ATII cells play a crucial role in the oncogenesis of lung cancer [7][18]. Using single-cell RNA sequencing, Ashley et al. detected abnormal expression of ATII-associated genes in lung cancer cells from lung cancer tissues, including aquaporin 4 (AQP4), surfactant pulmonary-associated protein B (SFTPB), surfactant pulmonary-associated protein C (SFTPC), surfactant pulmonary-associated protein D (SFTPD), claudin 18 (CLDN18), forkhead box A2 (FOXA2), NK2 homeobox 1 (NKX2-1, also known as thyroid transcription factor-1, TTF-1) and pepsinogen C (PGC) [19]. However, the potential value of these ATII cell-related genes as clinical biomarkers and therapeutic targets in NSCLC has not been fully clarified. Therefore, in the present study, we performed an in-depth and comprehensive analysis of the potential value of ATII-associated genes in the main subtypes of NSCLC, LUAD and LUSC, based on multiple large bioinformatics databases. The study aims to provide clinicians with additional information for assessing and adjusting the diagnostic methods and treatment options for NSCLC patients. Ultimately, this may help improve the overall prognosis of NSCLC patients.

Differential Expression of ATII-associated genes in Patients With NSCLC
We first explored the expression levels of the eight ATII-associated genes in lung cancer tissues and adjacent normal tissues using the Oncomine database. The mRNA expression levels of AQP4, CLDN18, FOXA2, NKX2-1, PGC, SFTPB, SFTPC and SFTPD were all markedly reduced in lung cancer tissues in multiple datasets (Figure 1). Furthermore, we used the GEPIA dataset to compare the mRNA expression of the ATII-associated genes between 483 LUAD and 347 normal tissues and between 486 LUSC and 338 normal tissues. Our results indicated that AQP4, CLDN18, PGC, SFTPB, SFTPC and SFTPD were expressed at low levels in LUAD tissues, and that AQP4, CLDN18, FOXA2, NKX2-1, PGC, SFTPB, SFTPC and SFTPD were expressed at lower levels in LUSC tissues (Figure 2). We also compared the relative expression levels of the eight ATII-associated genes in LUAD and LUSC tissues. SFTPB was the most highly expressed of the eight genes in both LUAD and LUSC (Figure 3).

Correlation Between mRNA Expression of Different ATII-associated genes and Tumor Stages of NSCLC Patients
Lung cancer is divided into four stages according to disease progression. As the disease develops, the patient's physiology and physical condition also change continually. Therefore, we used GEPIA to assess the correlation between the expression of ATII-associated genes and the pathological cancer stage of LUAD and LUSC patients. We found a significant correlation between the expression of all eight ATII-associated genes and the pathological stage of NSCLC: AQP4 (P = 1.81e-06), CLDN18 (P = 4.64e-06), FOXA2 (P = 0.000128), NKX2-1 (P = 0.000756), PGC (P = 3.08e-07), SFTPB (P = 3.33e-07), SFTPC (P = 1.4e-08) and SFTPD (P = 1.54e-07). NSCLC patients in more advanced cancer stages almost all tended to show higher mRNA expression of ATII-associated genes (Figure 4). These data suggest that the eight ATII-associated genes might play a significant role in the tumorigenesis and progression of NSCLC.

Prognostic Features of ATII-associated genes in Patients With Lung Cancer
To analyze the prognostic value of ATII-associated genes in NSCLC patients, we assessed the correlation between these genes and prognosis using the Kaplan-Meier plotter (Table 1). Low mRNA expression of AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC was associated with short OS, and low expression of CLDN18, FOXA2, NKX2-1, PGC, SFTPB, SFTPC and SFTPD was significantly related to a reduced FP. Finally, low CLDN18 (HR = 0.98, p = 0.032), FOXA2 (HR = 0.74, p = 0.021) and SFTPD (HR = 0.96, p = 0.021) mRNA expression led to a short PPS. However, no significant association was found between the ATII-associated genes and disease-free survival (DFS) in NSCLC patients (Table 1).
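The comparison behind these survival results, a median split of patients by expression followed by a log-rank test between the two groups, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Kaplan-Meier plotter's own implementation: the function names `median_split` and `logrank_test` are hypothetical, patients exactly at the median are assigned to the low group by choice, and hazard ratios are not computed.

```python
import math
from statistics import median


def median_split(expression):
    """Label each patient 'high' or 'low' relative to the cohort median.

    Assumption: values equal to the median go to the 'low' group.
    """
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    groups : group label per patient (exactly two distinct labels)
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    first = labels[0]
    observed_minus_expected = 0.0
    variance = 0.0
    # Visit every distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == first)
        d = sum(1 for _, ti, e in at_risk if ti == t and e)
        d1 = sum(1 for g, ti, e in at_risk if ti == t and e and g == first)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in the first group.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In use, `groups = median_split(expression)` would be passed together with the follow-up times and event indicators to `logrank_test`, and the resulting p-value compared with the 0.05 cutoff stated in the methods.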

Genetic Alteration, Expression and Protein/Gene Interaction Analyses of ATII-associated genes in Patients With NSCLC
Epigenetic alteration plays a vital role in early malignancies, so a comprehensive analysis of the molecular characteristics of ATII-associated genes was further performed on the LUAD and LUSC samples, respectively. We used the cBioPortal online tool to analyze alterations of the ATII-associated genes in LUAD (TCGA, Pan-Cancer Atlas) and LUSC (TCGA, Pan-Cancer Atlas). As a group, two or more types of alteration were detected in the different subtypes of NSCLC, and the eight ATII-associated genes were altered in 273 of 1053 samples from patients with NSCLC (26%) (Figure 6A). Moreover, the mutation rates of AQP4, CLDN18, FOXA2, NKX2-1, PGC, SFTPB, SFTPC and SFTPD were 3%, 5%, 2.4%, 9%, 2.8%, 1.8%, 5% and 1.1% of the investigated lung cancer samples, respectively (Figure 6A).
Moreover, a protein-protein interaction (PPI) network analysis of ATII-related genes was conducted with STRING. The results showed that deleted in malignant brain tumors 1 (DMBT1), a recently discovered candidate tumor suppressor gene, was closely connected with alterations of the ATII-associated genes (Figure 6B). In addition, some genes that play an important role in immune response regulation, blood cell proliferation, defense mechanisms and the acute phase response were also significantly connected with alterations of the ATII-associated genes, including microfibril-associated glycoprotein 4 (MFAP4) and pulmonary surfactant-associated protein A1 (SFTPA1) (Figure 6B). The GeneMANIA results also revealed that the functions of the differentially expressed ATII-associated genes and their associated molecules, such as leucine-rich repeat kinase 2 (LRRK2), lysosomal-associated membrane protein 3 (LAMP3), cathepsin E (CTSE), ATP-binding cassette transporter A3 (ABCA3), forkhead box F1 (FOXF1) and napsin A (NAPSA), were primarily related to lung development, the late endosome and aspartic-type peptidase activity (Figure 6C).
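As a quick arithmetic check of the group-level alteration frequency quoted above, the percentage can be recomputed from the two counts given in the text. The function name `alteration_frequency` is hypothetical and used only for illustration.

```python
def alteration_frequency(altered, total):
    """Percentage of samples carrying an alteration in any queried gene."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * altered / total


# Counts taken from the text: 273 altered samples out of 1053.
frequency = alteration_frequency(273, 1053)
print(f"{frequency:.1f}%")  # prints 25.9%, which rounds to the reported 26%
```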

Immune Cell Infiltration of ATII-associated genes in Patients With NSCLC
Immune cell levels are associated with the proliferation and progression of cancer cells. In this study, to verify whether ATII-associated genes are involved in cancer-related inflammation and immune cell infiltration, and thereby affect the clinical outcome of NSCLC patients, we used the TIMER database to comprehensively analyze the correlation between the eight ATII-associated genes and immune cell infiltration (Figure 7, Figure 8).
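A correlation between gene expression and an immune infiltration estimate of the kind described above can be illustrated with a plain Spearman rank correlation. This is a sketch under assumptions: the helper names are hypothetical, the inputs would be per-sample expression values and infiltration scores, and any adjustment that TIMER itself applies (for example for tumor purity) is not reproduced here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(expression, infiltration):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(expression), average_ranks(infiltration))
```

A value near 1 or -1 indicates a strong monotonic relationship between expression and infiltration, while a value near 0 indicates none.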

Discussion
The occurrence of lung cancer is a multistep process. For example, LUAD is thought to progress from atypical adenomatous hyperplasia (AAH) to adenocarcinoma in situ (AIS) [20], and pre-invasive lesions can be observed in the airways before the development of LUSC [21]. Distinct molecular events and other malignant phenotypes cause normal lung cells to gain or lose functions, leading to deregulation of key genetic signals involved in cell proliferation, differentiation, apoptosis, migration and invasion [22][23]. A previous study showed that ATII cells can dedifferentiate into a stem-like state capable of continuous differentiation, proliferation and damage repair. Therefore, ATII cells are suspected to be the cell of origin in oncogene-driven lung cancers and may help maintain tumor progression [19].
In recent years, these eight ATII-associated genes have been confirmed to play key roles in growth and development and in multiple diseases, including several cancers. For example, FOXA2 has been shown to play crucial roles in lung morphogenesis, surfactant protein production, goblet cell differentiation and mucin expression [24]. In addition, Liu et al. found experimentally that the histone demethylase PHF8 can drive the development of neuroendocrine prostate cancer (NEPC) by epigenetically upregulating FOXA2 [25]. Thyroid transcription factor 1 (TTF-1, or NKX2-1) has long been known as an important developmental regulator driving the maturation and morphogenesis of the brain, lungs and thyroid [26][27]. Studies have demonstrated that NKX2-1 gene mutations are related to compensated congenital hypothyroidism and to unexplained respiratory distress due to lung hypoplasia in neonates [28]. NKX2-1 amplification and overexpression have also been shown to contribute to lung cancer cell proliferation and survival [29].
Interestingly, some researchers found the opposite phenomenon: NKX2-1 can constrain lung adenocarcinoma in part by repressing the embryonically restricted chromatin regulator Hmga2 [30]. Thus, the oncogenic and inhibitory functions of NKX2-1 in the same tumor type confirm its role as a bifunctional lineage factor.
Aquaporins (AQPs) are water channel proteins capable of selectively transporting water and other small solutes across cell membranes [31][32]. In the lung, AQPs are thought to facilitate fluid transport in the alveolar space, airway humidification, pleural fluid absorption and submucosal gland secretion. AQP4 is a member of the aquaporin family that was first discovered in 1994 [32][33]. Changes in AQP4 expression are associated with many central nervous system (CNS) diseases, including epilepsy, edema, stroke and glioblastoma [34]. In addition, AQP4 expression is low in breast cancer, undifferentiated thyroid carcinoma and stomach cancer [35][36][37][38]. In contrast, studies have found that AQP4 is highly expressed in lung cancer and is involved in the invasion of lung cancer cells [39][40].
Surfactant proteins (SP) are involved in surfactant function and innate immunity in the human lung. In cystic fibrosis (CF), a genetic contribution of the surfactant protein genes SFTPB, SFTPC and SFTPD has been demonstrated. In addition, a study has shown that the major genetic mutations associated with childhood interstitial lung disease (ILD) also occur in surfactant-related genes, including SFTPA1, SFTPA2, SFTPB, SFTPC, ABCA3 and NKX2-1. Finally, CLDN18 is required for intercellular connectivity and has been reported to be involved in cell migration and metastasis, making it an oncogene in various cancer types, including pancreatic, esophageal, ovarian and lung cancer.
In this study, we first systematically analyzed the expression of the eight ATII-associated genes (AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC) in lung cancer. The expression levels of all eight genes were lower in lung cancer tissues than in normal tissues. Additionally, we verified that differential expression of these genes was related to clinical cancer stage in NSCLC patients. These results suggest that all eight ATII-associated genes are involved in oncogenesis and might play a significant part in the tumorigenesis and progression of NSCLC. Moreover, all eight ATII-associated genes were found to be notably related to OS in lung cancer patients, with low mRNA expression associated with short OS. Seven genes (SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC), that is, all except AQP4, were significantly associated with FP, with lower mRNA expression related to shorter FP, while low CLDN18, FOXA2 and SFTPD mRNA expression led to a short PPS. Taken together, these results indicate that ATII-associated genes might be risk factors for the survival of NSCLC patients and could be potential prognostic biomarkers. In addition, our study showed that the expression of ATII-associated genes might be significantly correlated with the infiltration of six immune cell types. The tumor microenvironment (TME) is complex and continuously evolving and can affect tumor progression and recurrence [41]. Immune cells are important constituents of the tumor stroma and take a critical part in this process [42]. This result suggests that ATII-associated genes may reflect immune status in addition to disease prognosis.
We analyzed ATII-associated genes comprehensively in NSCLC in terms of their expression, mutation, survival impact and association with immune cell infiltration. Undeniably, our study has some limitations. All of our results were obtained from public databases, so they need to be validated experimentally. The potential mechanisms and molecular interactions of the eight ATII-associated genes in the progression of NSCLC should also be explored further.

Conclusion
In conclusion, this work provides evidence for the value of ATII-associated genes (AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC) as clinical biomarkers and therapeutic targets in NSCLC. We hope that these eight ATII-associated genes will become new prognostic biomarkers in NSCLC and that our results will provide new inspiration for the design of immunotherapies.

Oncomine
The Oncomine database (www.oncomine.org) is a publicly accessible online cancer microarray database that provides genome-wide expression analysis [43]. In this study, it was used to analyze the transcription levels of ATII-associated genes in NSCLC tissues and their corresponding adjacent normal control samples. A p-value < 0.05, a fold change of 2 and a gene rank in the top 10% were set as the significance thresholds. Student's t-test was used for the analysis.

Gene Expression Profiling Interactive Analysis (GEPIA)
GEPIA (http://gepia.cancer-pku.cn/index.html) is an interactive web server for analyzing the RNA sequencing expression data of 9736 tumors and 8587 normal samples from the TCGA and Genotype-Tissue Expression datasets [44]. GEPIA offers customizable functions such as tumor/normal differential expression analysis, profiling according to cancer type or pathological stage, patient survival analysis, similar gene detection, correlation analysis and dimensionality reduction analysis. In this study, we performed the pathological type and stage analysis of the eight ATII-associated genes using the "LUAD" and "LUSC" datasets. Student's t-test was used to generate p-values, and the p-value cutoff was 0.01.
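The three Oncomine significance thresholds can be read as a single filter applied to each tumor-versus-normal comparison. The sketch below is illustrative only: the `Comparison` record, its field names and the toy values are hypothetical placeholders rather than Oncomine output, and it assumes that fold change is reported as a signed value whose magnitude is compared with the cutoff.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    gene: str            # gene symbol (placeholder names below)
    p_value: float       # p-value of the tumor vs. normal comparison
    fold_change: float   # signed fold change (assumption: negative = lower in tumor)
    rank_percent: float  # gene rank within the dataset, as a top-N percentage


def is_significant(c, p_cut=0.05, fold_cut=2.0, rank_cut=10.0):
    """Apply the thresholds from the text: p < 0.05, fold change 2, top 10% rank."""
    return (
        c.p_value < p_cut
        and abs(c.fold_change) >= fold_cut
        and c.rank_percent <= rank_cut
    )


# Toy records with made-up values, used only to exercise the filter.
toy = [
    Comparison("GENE_A", 0.001, -3.5, 2.0),  # passes all three thresholds
    Comparison("GENE_B", 0.20, -4.0, 1.0),   # fails the p-value threshold
    Comparison("GENE_C", 0.01, 1.3, 5.0),    # fails the fold-change threshold
]
hits = [c.gene for c in toy if is_significant(c)]
print(hits)  # ['GENE_A']
```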
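The tumor-versus-normal comparison by Student's t-test mentioned above can be sketched as a pooled-variance two-sample statistic. This is a minimal illustration, not GEPIA's internal code: the function names are hypothetical, the log2(TPM + 1) transform is an assumed preprocessing step, and converting the statistic into a p-value (to compare with the 0.01 cutoff) requires the Student's t distribution, which is left to a statistics library.

```python
import math
from statistics import mean, variance


def log2_tpm(values):
    """Transform expression values as log2(TPM + 1) (assumed preprocessing)."""
    return [math.log2(v + 1) for v in values]


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic.

    Returns (t, degrees_of_freedom). A negative t means the mean of
    group_a is lower than the mean of group_b.
    """
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two samples")
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    if pooled == 0:
        raise ValueError("t statistic is undefined when both groups are constant")
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

With tumor samples as `group_a` and normal samples as `group_b`, a strongly negative statistic would correspond to reduced expression in tumor tissue.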

Kaplan-Meier Plotter
Kaplan-Meier Plotter (https://kmplot.com/analysis/) is a prognostic biomarker assessment tool that can assess the effect of 54k genes on survival in 21 cancer types [45]. In this study, LUAD and LUSC patients were split into high- and low-expression groups based on the median expression of each ATII-associated gene, and the prognostic value of the ATII-associated genes in LUAD and LUSC was analyzed with regard to OS (overall survival), FP (first progression) and PPS (post-progression survival). The hazard ratio (HR) with 95% confidence intervals and the log-rank P value were calculated, and a p-value < 0.05 was considered statistically significant.

cBioPortal
cBioPortal (www.cbioportal.org) is a comprehensive web resource for visualizing and analyzing multidimensional cancer genomics data [46][47]. In this study, we analyzed alterations of the ATII-associated genes in LUAD (TCGA, PanCancer Atlas) and LUSC (TCGA, PanCancer Atlas), including mutations, structural variants and copy-number alterations.

STRING
STRING (https://string-db.org/) is a database of known and predicted protein-protein interactions (PPI) [48]. In this study, we used STRING to construct a PPI network of the ATII-associated genes and their co-expressed genes and to explore their interactions.

GeneMANIA
GeneMANIA (http://www.genemania.org) is a website that provides information on protein-protein, protein-DNA and genetic interactions, pathways, reactions, gene and protein expression data, protein domains and phenotypic screening profiles [49]. In this study, we used it to obtain weights that indicate the predictive value of the ATII-associated genes.

TIMER
The TIMER web server (https://cistrome.shinyapps.io/timer/) is a comprehensive resource for the systematic analysis of the infiltration of different immune cells and their clinical impact across diverse cancer types [50]. In this study, we used the "Gene module" and the "Survival module" to explore the correlation of the expression of the eight ATII-associated genes with immune cell infiltration and with clinical outcome in LUAD and LUSC, respectively.

Prognostic value of ATII-associated genes in LUAD and LUSC (Kaplan-Meier plotter). Low mRNA expression of AQP4, SFTPB, SFTPC, SFTPD, CLDN18, FOXA2, NKX2-1 and PGC was associated with short overall survival (OS), low expression of CLDN18, FOXA2, NKX2-1, PGC, SFTPB, SFTPC and SFTPD was significantly related to a reduced FP, and low CLDN18, FOXA2 and SFTPD mRNA expression led to a short PPS in NSCLC patients.

Correlations between differentially expressed ATII-associated genes and immune cell infiltration (TIMER).