The mRNAsi is higher in LC tissues and correlated with clinical characteristics
In order to explore the potential relationship between mRNAsi and LC, the mRNAsi of tumor tissue was compared with that of normal tissue. The results showed that the mRNAsi in the tumor group was relatively higher than that in the normal group, and also higher than the corresponding paired adjacent tissues (Figure 1A and 1B). When the tumor group was divided into the low mRNAsi group and the high mRNAsi group according to the median value, the results also showed that there was statistical differences between the two groups (Figure 1C). In terms of different grades, the mRNAsi of LC showed significant differences, with a gradually increasing trend (Figure 1D). Besides, the mRNAsi of Stage II and T2 was increased than that of Stage I and T1, respectively (Figure 1E and 1F). In addition, compared with the low mRNAsi group, the patients in the high mRNAsi group had significantly poorer overall survival (Figure 1G).
WGCNA analysis of differential genes between the low and high mRNAsi tumor groups
We noted that mRNAsi was closely related to overall survival, and there was a significant difference in mRNAsi scores between the low and high mRNAsi tumor groups, suggesting that the differential genes between the two groups regulate stem cell characteristics to some extent. Thus, we analyzed the transcriptome data from the two groups. The results revealed that there were 998 DEGs, of which 290 genes were up-regulated and 708 genes were down-regulated (Figure 2A). WGCNA results showed that the yellow module was highly correlated with mRNAsi (Figure 2B and 2C). According to the filtering criteria of cor.MM> 0.8 and cor.GS> 0.5, 32 key genes of yellow module were screened out (Figure 2D). The heatmap and box plot were pictured based on the expression values of the module genes. The results displayed that the key genes of the yellow module were significantly down-regulated in LC tissues (Figure 2E and Figure 3A).
GO analysis disclosed that these genes were enriched in extracellular structures and angiogenesis (Figure 3B). In terms of KEGG analysis, the results revealed that these key genes mainly focused on PI3K-AKT signaling pathway, ECM− receptor interaction and focal adhesion (Figure 3C). Furthermore, the correlation analysis of these genes showed that there was a strong correlation between the key genes (Figure 3D). As shown in Figure 3E, there is a close interaction between key genes at the protein level. Among them, the edge number of COL4A4, FBN1, DCN, LUM, SPARC and THBS2 is more than 7, indicating that these genes are the core genes in the network (Figure 3F).
Screening and analysis of methylation-driven genes
We conducted a combined analysis of DNA methylation and gene expression profiles in the low and high mRNAsi group, and screened out 43 methylation-driven genes (Figure 4A-C). After the intersection of these 43 genes and 998 DEGs, 6 methylation-driven genes are obtained, namely AQP1, CKMT2, DCAF4L2, GRHL2, RAB25, SPINT2. Among them, DCAF4L2 is hypomethylation, while the rest are hypermethylation. Then, we extracted the values of the methylation sites of these genes, conducted correlation analysis of each methylation site in combination with gene expression profiles, and finally screened the methylation sites according to the correlation coefficient less than -0.3. There were 67 methylation sites that met the conditions, among which AQP1 accounted for 10, CKMT2 for 11, DCAF4L2 for 13, GRHL2 for 18, RAB25 for 4, and SPINT2 for 11 (Figure 4D-I and Supplementary table 4). Additionally, univariate COX regression and KM survival analysis for these 67 methylation sites showed that CKMT2|cg26206748, GRHL2|cg18899777 and GRHL2|cg18899777 were related to survival (Figure 4J-L).
Screening and analysis of CNV-driven genes
After comparing the CNV values of the low and high mRNAsi groups, 234 differential copy number genes were screened, which were mainly concentrated on chromosome 16 and chromosome 17 (Figure 5A). Subsequently, 181 CNV-driven genes were selected from the 234 genes through the combined analysis of CNV and gene expression profiles. After the intersection of these 181 genes and 998 DEGs, 4 CNV-driven genes are obtained, namely ABR, EFNB3, FOXF1, ZFP3, whose number variation types were mostly concentrated in the loss of one copy number and mostly occurred in the high mRNAsi group. As expected, the gene expression of these four genes decreased with the loss of copy number. (Figure 5B-E).
Analysis of survival-associated alternative splicing of DEGs
Among the 998 DEGs, 217 DEGs had alternative splicing events, of which AT events occurred the most, followed by ES events (Figure 6A). Univariate COX regression analysis showed that 58 alternative splicing events in 217 DEGs were associated with survival (Figure 6B-E). After extracting the gene expression of splicing factors, correlation analysis was performed on the 58 events to construct a splicing network related to survival (Figure 6F).
Construction of lncRNA‐miRNA‐mRNA ceRNA network related to tumor stemness
803 DEmRNA and 133 DEmiRNA were obtained from the low and high mRNAsi tumor groups. Enrichment analysis of transcription factors using FunRich showed that DEmRNA transcription factors were mainly enriched in NFIC, MYFS, FOSB, FOS, JUND, JUNB, JUN, TEAD1, RREB1 and SRF, while DEmiRNA transcription factors were chiefly concentrated in FOXJ2, RORA, NKX6-1, FOXA1, NFIC, MEF2A, SP4, POU2F1, SP1 and EGR1 (Figure 7A-B). The results of pathway analysis showed that DEmiRNA was mainly involved in Glypican pathwy, proteoglycan syndecan-mediated signaling pathway, IFN-gamma pathway, VEGF and VEGFR signaling network, etc (Figure 7C). GO analysis results presented that DEmiRNA participated in cell communication, signal transduction and GTPase activity, etc (Figure 7D-F). Subsequently, 9 lncRNAs, 5 miRNAs and 70 mRNAs were selected based on TargetScan, miRcode and miRTarBase databases (Figure 7G).
Stemness in tumor cells is related to tumor microenvironment in LC
We evaluated immune score by ESTIMATE method. Interestingly, we observed a phenomenon that the mRNAsi score was slightly negatively correlated with the immune score (Figure 7H). Meanwhile, patients with high mRNAsi scores and low immune scores have a poor prognosis (Figure 7I).
In order to further explore the relationship between immune infiltration and stemness in tumor cells, the proportion of 22 immune cells was evaluated based on CIBERSORT. Figure 8A summarizes the results obtained from 45 tumor samples (p<0.05). Compared with the low mRNAsi tumor group, the high group had higher proportion of CD8+ T cell, memory activated CD4+ T cell and follicular helper T cell, while naïve B cell, memory resting CD4+ T cell and Tregs decreased (Figure 8B). Besides, the mRNAsi score was positively correlated with the proportion of CD8+ T cell, memory activated CD4+ T cell and follicular helper T cell, while negatively correlated with naïve B cell, memory resting CD4+ T cell and Tregs (Figure 8C-G).
Validation of the relationship between stemness and tumor microenvironment.
To systematically understand the relationship between stemness in tumor cells and tumor microenvironment, we evaluated immune score and the proportion of 22 immune cells on 25 types of tumors, including gastric carcinoma, esophageal carcinoma and colon cancer. For immune environment, the immune scores of 13 tumors were significantly negatively correlated with the mRNAsi scores, with brain lower grade glioma being the most correlated (Figure 9A-B). Even though this phenomenon was not evident in the remaining tumors, a negative correlation trend was observed in these tumors (Figure 9A).
Additionally, Figure 9C summarized the results of differential analysis of 22 immune cells from 25 types of tumor (p<0.05). Among the 25 types of tumors, about half of them had apparent differences in the proportion of CD8+ T cell, resting memory CD4+ T cell, activated memory CD4+ T cell, follicular helper T cell and M1 macrophages. Moreover, the proportion of CD8+ T cell, activated memory CD4+ T cell, follicular helper T cell and M1 macrophages in the high mRNAsi group was mostly up-regulated, while the proportion of resting memory CD4+ T cell was mainly down-regulated. Besides, about one-third of the tumors had obvious contrasts in the proportion of naïve B cell, Tregs and resting Mast cells, which was mainly down-regulated in the high mRNAsi group.
Construction and validation of prognosis model based on immune genes
After the intersection of 1789 immune genes and 998 DEGs, 104 differential immune genes were obtained (Figure 10A). Then, 8 candidate genes were selected following univariate COX analysis and KM survival analysis, which were used for multivariate COX analysis to construct a prognosis model for LC. As shown in figure 10B, each gene in the model was closely associated with survival. Moreover, the prognosis model performed reasonably well in predicting between good and poor prognosis for LC patients (Figure 10C). To further evaluate the efficiency of prognostic model, survival receiver operator characteristic curves were applied into the model, showing that the AUC values of 1-year, 3-year and 5-year were 0.749, 0.750, 0.649, respectively (Figure 10E). The result of risk survival analysis of the ICGC dataset also showed that the prognosis model could clearly distinguish the prognosis of LC patients (Figure 10D). The AUC values of 1-year, 3-year and 5-year were 0.774, 0.802, 0.776, respectively (Figure 10F). To further evaluate the efficiency of prognostic models, we stratified patients using clinical factors. Similarly, the prognosis model still demonstrated a perfect ability to predict the prognosis (Figure 11A-G).
The detailed information between risk value, survival status and model genes is visualized as Figure 12A-D. The prognostic value of clinical risk factors and risk score were evaluated by univariate and multivariate COX analysis. The results showed that pathological T and M stage, pathological stage and riskscore were important prognostic factors in univariate COX analysis, while riskscore was an independent prognostic parameter in multivariate Cox regression. In summary, the prognosis model does performed reasonably well in predicting outcomes in LC patients (Figure 12E-F).
The relationship between prognosis model gene CNV and immune cells
We assessed the relationship between CNV and immune cells on basis of TIMER website (https://cistrome.shinyapps.io/timer/). As shown in Figure 12G-K, arm-level gain of FABP6 and high amplication of BRIC5 and FABP6 were related to CD4+ T cell, Macrophage and Neutrophil. Moreover, high amplication of FABP6 was also associated with B cell, CD8+ T cell and dendritic cell. Besides, arm-level gain of IL17R and High amplication of EPO and OXTR were connected with CD4+ T cell, Macrophage and CD8+ T cell, respectively (Figure 12G-K).