Prognostic model of inflammation-related genes. The flow chart of this study was presented (Supple Fig. 1). 551 samples (54 none-tumor samples and 497 tumor samples) of LUAD were downloaded from TCGA. Excluding none-tumor samples and samples with incomplete information of survival time and survival status, 464 eligible samples were obtained. 254 inflammation-related genes were obtained from Molecular Signatures Database(MSigDB). R-package "survival" was used for univariate COX analysis to obtain the prognostic genes. As shown in the forest map (Fig. 1A), 54 inflammation-related genes associated with OS were obtained. In order to avoid the possibility of collinearity between gene expression levels, LASSO regression is used to perform variable screening and re-sampling 1000 times for these 54 genes, and selected the genes with more than 900 repetitions. The figure shows that the optimal number of variables is equal to 15(Figure. 1B-D). We further conducted multivariate Cox regression analysis on these 15 genes by stepwise method, and finally included 8 genes to construct a COX proportional hazard model (Figure. 1E).
Validation of risk groups and genetic prognostic models. Regression coefficients of eight genes included in the prognostic model are shown (supple table. 1). The risk score of each sample is calculated as follows formula: Risk Score= 0.1241*CCL20 expression - 0.6192*CCR2 expression + 0.3660*GNAI3 expression + 0.1240*ITGA5 expression + 0.2050*NMI expression + 0.1330*PCDH7 expression + 0.430*PSEN1 expression - 0.3382*SLC11A2 expression. The two best cut-off values of risk score are equal to 1.05, 2.01 respectively, the tumor samples were divided into high-, medium- and low-risk group (Supple Fig. 2). A total of 289 tumor samples of GSE30129 from GEO were taken as the validation group, risk score and risk group were defined as the same way, the two best cutoff values of risk score are equal to 11.28 and 15.7(Supple Fig. 3). Survival analysis based on R-package “survival” showed that the overall survival differences were statistically significant among the three risk groups both in the two data sets(p-value<0.001). (Fig. 2A,E). R-package “timeROC” was used to draw time-dependent ROC curves. AUC at 1-, 3- and 5- years was equal to 0.715, 0.719, 0.699 in the cohort of TCGA(Fig. 2B), and 0.608, 0.631, 0.639 in GSE30129, which shows that the model has good predictive ability(Fig. 2F). Principal component analysis(PCA) and uniform manifold approximation and projection analysis(UMAP) indicate that all samples can be well clustered as risk groups(Fig. 2C,D,G,H). Univariate and multivariate analysis suggested that risk group can serve as an independent prognostic signature of LUAD both in TCGA cohort and GSE30129(Fig. 2L,Q,M,R). The survival state curve shows most of the survival cases are concentrated in the low-risk group, while the death cases tend to be concentrated in the high-risk group, indicates there is a significant correlation between the risk score and the survival status of the patients, (Fig. 2I,J,N,O).
Clinical correlation of the prognostic index. The clinical baseline characteristics of TCGA cohort (Supple Table 2) and GSE30129 (Supple Table 3) are shown. The risk score was significantly correlated with the clinical features such as tumor stage (p-value < 0.001), T (p-value < 0.001) and N (p-value < 0.01) (Fig. 3A), the proportion of samples of later stage, T and N tend to be higher as the level of risk group raise up(Fig. 3B-D). The decision curve analysis shows that the net benefit rate of risk score is the highest at each threshold probability compared with other clinical characteristics, suggesting that it have practical application value (Fig. 3I). The ROC curve constructed by clinical characteristics showed that the AUC of risk score and tumor stage in predicting 1-year survival was 0.715 and 0716, which were close to each other. The AUC of 3-year and 5-year survival predicted by risk score was 0.719 and 0.699, higher than 0.688 and 0.644 of 3-year and 5-year survival predicted by tumor stage(Fig. 3E-G). The C-index of risk score was higher than that of other clinical characteristics in univariate regression model with bootstrap re-sampling (1000 times) method(Fig. 3H).
Construct and verify the prognostic nomogram. We integrated signatures including risk groups, T, M, N and age into the COX proportional hazard model to construct a nomogram(Fig. 4A). The TCGA data set and GSE30129 were included in 313 cases and 280 cases with complete clinical information respectively. The C-index of the nomogram in the TCGA cohort and GSE30129 are equal to 0.745 and 0.692 respectively. According to the calibration curves(Fig. 4B-C), the prediction probability of this nomogram is very close to the observation probability after 1000 simulations of the 1-, 3- and 5-years calibration curves drawn by bootstrap re-sampling method in the TCGA cohort and GSE30629.
Gene set enrichment analysis. Gene set enrichment analysis(GSEA) was carried out on the high- and low-risk group in TCGA by using GSEA software(version 4.1.0). In terms of KEGG pathways, the high-risk group mainly focused on cell cycle regulation, DNA replication, homologous chromosome, p53 signaling pathway, pentose phosphate pathway, etc. In the low-risk group, the main pathways were enriched in T and B cell receptor signal pathway, cell adhesion signal pathway, chemokine signal pathway, etc. In terms of GO functions, the high-risk group mainly focused on mitosis, epithelial polarization, protein regulation, desmosome, cadherin, etc. The low-risk group focused on adaptive immune response, B cell receptor signal transduction, lipid metabolism, immune-related T cell activity, T cell selectivity, cytokine receptor activity, immune checkpoint activity (Fig. 5), etc.
Tumor microenvironment and tumor mutation burden. Tumor microenvironment was quantified by score of stromal cells (stromal score) and score of immune cells (immune score). Sum of immune score and stromal score equals the tumor microenvironment (estimate score).Immune score, stromal score and estimate score were negatively correlated to risk score. Tumor purity was positively correlated with risk score (Fig. 6). Scatter plots of correlation coefficients showed that there was a positive correlation between tumor mutation burden (TMB) and risk score(Supple Fig. 4E). In the waterfall plot, the proportion of samples with mutations in the high-risk group was significantly higher than that in the low-risk group, and missense mutations were the most common type of mutation. The most frequently mutated gene was TP53, which was also the gene with the most significant difference in mutation ratio among the high- and low- risk group, followed by TTN(Supple Fig. 4A-C). TP53 gene and KRAS gene have mutually exclusive mutation relationship(Supple Fig. 4D).
Correlation analysis between prognostic model and immune. In order to further explore the correlation between risk score and immune status, 16 kinds of immune cells and related immune functions in all samples were scored by ssGSEA. Most of the estimated infiltration abundances of immune cells in the high-risk group were down-regulated with statistical significance (p-value<0.05) (Fig. 7A), including B cells、dendritic cells (DCs)、immature dendritic cells (iDCs)、activated dendritic cells (aDCs)、plasmacytoid dendritic cells (pDCs)、CD8+ T cells、macrophages、mast cells、neutrophils、T helper cells、T follicular helper cell (Tfh)、Type 1 T help Cells (Th1 cells)、tumor infiltrating lymphocyte (TIL)、regulatory T cells (Treg), both risk score and TMB were negatively correlated with abundance of multiple types of immune cells (Fig. 7D). In terms of immune function, the activity scores of high-risk group were significantly lower in check-point、cytolytic activity、Human lymphocyte antigen (HLA)、inflammation-promoting、T cell co-inhibition、T cell co-stimulation、type II IFN response(Fig. 7C). CIBERSORT method was used to calculate the relative content of 22 types of immune cells in each sample. Similar to the result of ssGSEA, the relative content of B memory cells, resting CD4 T memory cells, monocytes, M0 macrophages, resting dendritic cells, mast cells were higher in the samples in the low-risk group (Supple Fig. 5). Correlation analysis showed that the expression values of most immune checkpoint related genes were up-regulated in the low-risk group, including PD-CD1 and BTLA, while TNFSF9 and CD276 were down-regulated in the low-risk group(p-value<0.05)(Fig. 7B). Previous studies have summarized the immune infiltration patterns of various cancer types, which can be defined as six types, including: C1(wound healing), C2 (INF-γ dominant), C3 (inflammatory), C4(lymphocyte depleted), C5 (immunologically quiet), C6(TGF-β dominant). In this study, the ensemble average of risk score of the sample with type C1 (WoundHealing) was the highest, and that of the sample with type C3 (Inflammatory) was the lowest(Fig. 7E).
Immunotherapy effect and drug sensitivity analysis. There was no statistical difference in IPS between the high-risk and low-risk groups among samples that predicted a negative response in both immunotherapy regimens(Fig. 8A). Among tumor samples that were predicted to have a positive immune response to a single regimen of PD-1/PD-L1 (Fig. 8B) or CTLA4 inhibitors (Fig. 8C) and a positive immune response to both regimens (Fig. 8D), the ensemble average of immunophenoscore (IPS) in the low-risk group were higher than those in the high-risk group, and the differences were statistically significant. The IC50 values of commonly used LUAD targeted drugs and chemotherapy drugs were calculated to analyze the drug sensitivity of 464 samples from TCGA. The IC50 values of cisplatin (Fig. 8E), docetaxel (Fig. 8F), erlotinib (Fig. 8G) and gefitinib (Fig. 8H) were significantly increased in the low-risk group in TCGA, suggesting higher sensitivity of patients in the high-risk group to use these drugs.