Identification of Overall Survival-related FAMGs
To investigate the biological process of the FAMGs, functional enrichment analysis showed that these FAMGs were enriched in the cellular response to stimulus and metabolic-related biological processes such as steroid metabolism, bile acid, and epoxygenase P450 pathway (Figure S1A). KEGG analysis confirmed that cytochrome P450 metabolism, Wnt, steroid, Phenylalanine, and bile acid synthesis ranked as the top pathways (Figure S1B). Then, we explore the interplay of these FAMGs through PPI network analysis, and two modules (named CYP3A4, and KRT19) were identified by the number of nodes (Figure S1C). In module CY3A4, these membrane-associated CYPs are the major enzymes involved in drug metabolism. Genes in module KRT19 are the regulators in Wnt signaling which control cell proliferation and growth. This further promoted us to assess their clinical relevance in LUAD. A total of 22 FAMGs were found to be correlated with patients’ outcomes by univariate Cox regression analysis in the TCGA LUAD dataset (Figure 1A). Among these OS-related FAMGs, decreased survival was observed in patients with high expression of 11 OS-related FAMGs such as TCN1, RHOV, DKK1, and GPR115, while elevated expression of the remaining FAMGs was considered as protective factors.
Construction and Validation of The Prognostic Signature Based on FAMGs
The 22 FAMGs with prognostic significance were delivered for further analysis. Stepwise multivariate Cox proportional hazard regression analysis was employed to determine the minimum set of features that generated the most powerful prediction for patients’ prognoses. This identified 10 genes of significance (GPR115, SOAT2, CDH17, MOGAT2, COL11A1, TCN1, LGR5, SLC34A2, RHOV, and DKK1) which comprise the optimal prognostic signature (Figure 1B). The formulation was listed as follows:
FAScore = [Expression level of GPR115 *(-0.0573)] + [Expression level of SOAT2*(-0.0987)] + [Expression level of CDH17 *(0.0608)] + [Expression level of MOGAT2 *(-0.1079)] + [Expression level of SLC34A2 *(-0.0651)] + [Expression level of COL11A1 *(0.0522)] + [Expression level of TCN1 *(0.0502)] + [Expression level of LGR5*(-0.0663)] + [Expression level of RHOV *(0.0917)] + [Expression level of DKK1 *(0.0771)]
The FAScore for an individual patient was calculated, and patients were divided into low- and high-risk groups using median FAScore as the cutoff value. We generated a Kaplan-Meier curve and decreased survival was observed in patients within the high-risk group as compared to those patients in the low-risk group (Figure 1C, p=1.19e-11). The number of deaths was increasing along with elevated FAScore (Figure 1D-E). We noted that six model genes (GPR115, CDH17, MOGAT2, COL11A1, TCN1, and RHOV) were significantly up-regulated in LUAD patients as compared to adjacent normal tissues (Figure S1D), while two genes (SLC34A2 and LGR5) were markedly down-regulated in patients (Figure S1D). Additionally, six model genes were up-regulated in high-risk patients (Figure S1E), while the expression of the remaining four genes (SLC34A2, LGR5, MOGAT2, and SOAT2) was observed to be increased in low-risk patients (Figure S1E). The area under curve (AUC) values of the receiver operating characteristics (ROC) curve for 1-, 3-, and 5-year were 0.80, 067, and 0.67 (Figure 1F), indicating that the prognostic signature has a robust capacity for monitoring prognosis.
Clinicopathological features such as clinical stages and age were correlated with disease progression and patient’s prognosis. To further test its predictive independence, univariate Cox regression analysis was performed in the TCGA LUAD dataset, and we found that FAScore, involved lymph nodes, tumor size, and clinical stages correlated with decreased survival, while patients receiving treatment have favorable survival (Figure 2A). Multivariate Cox regression analysis showed that the prognostic signature could serve as an independent predictor after adjusting for other clinical parameters such as treatment (Figure 2B). Similar observations were found in the GSE68465 LUAD dataset (Figure 2C-D). In the GSE72094 dataset, the FAScore could serve as an independent indicator for OS prediction (Figure 2E-F). In addition, patients with oncogenic driver mutations including EGFR and KRAS were associated with clinical survival, which was consistent with previous reports.
Validation of the Prognostic Signature in Independent Cohorts
To verify the reproducibility of the prognostic signature in LUAD, we applied this signature formula in six independent publicly available cohorts (GSE30219 (n=85), GSE31210 (n=226), GSE50081 (n=127), GSE68465 (n=442), GSE72094 (n=398), and GSE11969 (n=90)) totaling 1,368 LUAD patients to assess the predictive capability of the proposed signature for patients’ prognosis. Patients were also stratified into low- and high-risk groups, and we found that patients in the low-risk group in these six cohorts have prolonged OS than those in the high-risk group (Figure 3A-F). The predictive performance of the signature in each validation set was also calculated and showed that the AUCs of 1-, 3-, and 5-year all exceeded 0.6. Specifically, the AUC of 1-year in the GSE31210 validation set was 0.91 (Figure 3B). These data convinced that the signature has robust risk stratification in the microarray-based platform for LUAD patients (Figure 3A-F).
Differentially expressed and Functional Enrichment Analysis
The differentially expressed genes (DEGs) were identified using the edgeR package. To look at the pathways involved in the difference of the malignant characteristics between low- and high-risk groups, Gene Set Enrichment Analysis (GSEA) was performed in Hallmark and KEGG pathway gene sets. Epithelial-mesenchymal-transition (EMT), DNA repair, oxidative phosphorylation, G2M checkpoint, and hypoxia were found to be enriched in high-risk patients (Figure S2A-D, Table S2), suggesting dysregulated metabolism and cell cycle, and the malignant transition was active in patients within the high-risk group. These pathways were also observed in the KEGG gene set enrichment analysis (Figure S2E-H). Increasing reports indicated that these pathways were correlated with resistance to ICB therapy32.
Predictive Potential of The Signature for Patients’ Relapse
Relapse has been one of the main challenges for patients receiving various types of treatment such as chemotherapy or radiotherapy. We used four LUAD data sets (TCGA, GSE30219, GSE31210, and GSE50081) with available clinical information on patients’ relapse or progression to investigate the predictive potential of the signature. We found that patients in the low-risk group have significantly favorable progression-free survival as compared to those patients in the high-risk group (Figure 4A). A similar result was also observed in the GSE30219 set (Figure 4B). As for disease relapse, patients with low FAScores in the GSE31210 set showed better survival (Figure 4C), suggesting the signature may be an indicator for monitoring patients’ relapse or disease progression, which was verified in an independent LUAD set (GSE50081) as well (Figure 4D).
Tumor Immune Microenvironment Analysis
Recognition of the dual role of TIME in anti-tumor immunity has led to remarkable leaps forward in tumor immunotherapy33. To delineate the TIME landscape in both risk groups, each patient was scored based on profiling of 29 immune signatures using single sample gene set enrichment analysis (ssGSEA) and found that patients in the low-risk group showed enhanced immune activities (Figure 5A). Further ESTIMATE analysis showing higher immune scores and ESTIMATE scores, and decreased tumor purity (Figure 5B-D) in patients within the low-risk group in contrast to the high-risk group demonstrated the notion (Figure 5B-C). Next, we interrogated the infiltrated immune cell subsets by multiple deconvolution algorithms. CIBERSORT analysis showed that CD8+ T cells and B cells increased in patients in the low-risk group, whereas activated CD4+ T cells, neutrophils, and mast cells were elevated (Figure 5E), which was confirmed by TIMER (Figure 5F) and EPIC (Figure 5G) infiltration analysis. In addition, xCell deconvolution also convinced that CD8+ and CD8+ central memory T cells, and microenvironment scores were higher than those of patients in the high-risk group (Figure 5H). Several co-stimulatory molecules were up-regulated in low-risk patients such as CD27/28, and ICOS (Figure 5I), while the co-inhibitory molecules were shown differential expressed in both groups, CD47, IDO2, CTLA4, and PD-1, were increased expression in low-risk patients, but PD-1/L1 and B7-H3 were decreased (Figure 5J), indicating that patients in low-risk groups might benefit from the immunotherapy targeting PD-1/CTLA4. We noted that IL2, CD160, and KLAG1, the positive regulators of immune cell proliferation and activation expressed highly in patients in the low-risk group, and H2AX, a key immune cell senescence marker, expressed higher in high-risk patients, this might be a sign of immune cell exhaustion in this group (Figure 5K).
The Signature Correlates with Anti-tumor Immunity and Therapy Response
Given the difference of elevated tumor-infiltrating lymphocytes (TILs) (Figure 5E-H, Figure 6A) such as cytotoxic T cells and NK cells in high- and low-risk groups and their critical roles in predicting the efficacy and treatment response. We investigated the association between the signature and widely used immunotherapy markers PD-L1 expression (Figure 5J) and tumor mutation burden (TMB) in the LUAD cohort. FAScore was positively correlated with the TMB of patients (Figure 6J). Accounting reports have revealed that the repertoire of T cell receptors (TCR), which recognize antigens presented by the major histocompatibility complexes (MHC), could serve as a predictive indicator of responsiveness to immunotherapy34;35. We conducted the repertoire analysis of TCR and found that patients in the low-risk group exhibited higher TCR richness and diversity (Figure 6B). B cell receptor richness in low-risk patients was also increased (Figure 6C), while the BCR diversity was comparable (Figure 6D).
IFN‐γ is a pleiotropic cytokine with antitumor or pro-tumorigenic roles36, and TGF-β is an important cancer-promoting cytokine that contributes to the suppression of anti-tumor immunity37 (TGF-β and Cancer Immunotherapy). We subsequently scored the IFN‐γ and TGF-β responses and found that both cytokine responses were enhanced in high-risk patients (Figure 6E-F). The proliferation index was accordingly markedly higher in patients with high-risk scores (Figure 6G). In addition, we found that FAScore was significantly correlated with TMB (Figure 6H). A lower aneuploidy score has been observed in patients with complete or partial responses to immune checkpoint blockade38, and the same decreased trend was found in low-risk patients as compared to those in high-risk patients (Figure 6I). Mutant-allele tumor heterogeneity (MATH), a hallmark of cancer that is a promising biomarker for clinical outcomes and patients’ response to therapy39, was decreased in low-risk patients in contrast to those high-risk patients (Figure 6J). Furthermore, the number of cancer stem cells (CSCs) was estimated using an mRNA expression-based stemness index and found that CSCs was decreased in low-risk patients (Figure 6K).
Assessment of responsiveness to therapy including immunotherapy is a critical challenge before treatment. These identified features supported that the signature implies predicting responsiveness to therapies. We found that patients with progressive disease have higher FAScores than those patients with responders after receiving chemo/radiotherapy in the TCGA dataset (Figure 6L). IPS analysis indicated that low-risk patients might benefit from the anti-PD-1/CTLA-4-based immunotherapies (Figure 6M). Next, to verify the hypothesis, we applied the signature to the dataset from patients receiving anti-PD-1 treatment and found that responders have significantly lower risk scores as compared to patients in the high-risk group (Figure 6N).