Identifying 4 prognostic-related differentially expressed ER Stress-related genes
The process of this study was exhibited in the flow chart (Fig 1). We already obtained 23 normal samples and 554 tumor samples with mRNA expression, and the clinical data from the Cancer Genome Atlas (TCGA) in total. 178 ER Stress-related gene expression was compared practically between the normal and tumor groups, and then 41 ERGs were selected as differentially expressed genes based on |log2FC|>1 and FDR<0.05, among them, 12 samples (CAV1, ITPR1, LRRK2, THBS1, ATF3, BCL2, SERP2, MAP3K5, CLU, PRKN, CREB3L2, PPP1R15A) were downregulated, while 29 samples (LONP1, GRINA, EIF2AK1, HYOU1, AUP1, BAX, EDEM2, AIFM1, HM13, PDIA3, ERMP1, DNAJC10, XBP1, CXCL8, ATP2A1, MANF, DERL3, CALR, BAK1, ERO1A, PPP1CA, PDIA4, CREB3L3, P4HB, SDF2L1, TRIB3, RNF183, PDIA2, ERN2), were upregulated. The expression level of the 41 DEGs was presented in the heatmap (Fig 2A). We further conducted the protein-protein interaction (PPI) manipulation (Fig 2B), which already presents a direct interaction between the ERGs. 132 ERGs were identified as hub genes with a restriction of the interaction score as the highest confidence of 0.9, among them 32 were DEGs (Table 1).
Construction and validation of the 4 ERGs -based risk signature for prognostic effectiveness
To identify the candidates for constructing a risk signature, we first filtered 6 genes related to the prognosis of the EC according to the uni-Cox regression analysis with P<0.05 (BAK1, CREB3L3, DNAJC10, PPP1R15A, TRIB3, XBP1) (Fig 3A). Then, the LASSO regression analysis confirmed that these 6 genes are related to the survival of the EC patients (Fig 3B, C). To avoid overfitting the prognostic signature, we applied the minimum likelihood of deviance of the first-rank value of Log(λ) during LASSO regression analysis. We also performed a multi-Cox regression analysis, and then 4 genes were finally identified as significant risk signature genes (CREB3L3, PPP1R15A, TRIB3, XBP1). Among them, PPP1R15A and XBP1 are related to the protective effect with HRs<1, whereas CREB3L3 and TRIB3 are the increased risk factors with HRs>1. The calculation formula of the risk scoring is as below:
Risk Score = (0.3965)×(CREB3L3 exp.)+(-0.4152)×(PPP1R15A exp.)+(0.2223)×(TRIB3 exp.)+(-0.3508)×(XBP1 exp.).
After screening out, we first sorted out 523 clinical samples from TCGA. Next, we randomly divided the patients into the training and testing groups with ~1:1ratio (of 262 samples in the training group and 261 samples in the testing group). The risk scores of the training and testing groups were then calculated separately. Based on the median score, all samples from both the training and testing groups were classified into a high-risk group and a low-risk group. As demonstrated, patients from the low-risk group already exhibit a better prognosis with statistically lower risk and fewer deaths than the high-risk group (Fig 4A, B). Furthermore, the OS between the high- and low-risk groups were significantly different with P<0.05 (Fig 4C). The AUCs of the training and testing cohorts were 0.711 or 0.678 at 1 year, 0.696 or 0.625 at 2 years, 0.690 or 0.610 at 3 years, respectively (Fig 4D). The above data imply that the risk score can be used as a valid factor for effectively predicting the prognosis of EC patients. Both the PCA and tSNE analyses indicate that the risk signature constructed by us can be effectively utilized to well distinguish high-risk patients from low-risk patients (Fig 4E, F). The heatmaps constructed by us can also be used to visualize the variance of the prognostic ERGs expression between the high- and low-risk groups (Fig 4G). It should be noted that the risk score can only be used as an independent predictor through key uni-Cox and multi-Cox regression analyses with a hazard ratio (HR) of the risk score, and then 95% confidence interval (95% CI) was 1.422, and 1.244-1.627 (P<0.05) in uni-cox regression while 1.717, and 1.009-1.360 (P<0.05) in multi-cox regression (Fig 5A).
Nomograph construction and the clinical correlation
According to 4 independent prognostic factors such as risk score, age, stage, and grade (all p < 0.05 in uni-Cox and multi-Cox), we built a clinical nomogram to predict the prognosis of the EC patients through a comprehensive score (Fig 5B). We also utilized the 1- and 2- and 3-year calibration plots to present or prove that the nomogram had a reliable prediction efficiency, which can be used to exhibit excellent accordance between the predictive and actual survival in both the training and testing groups (Fig 5C, D). As shown, the diagonal line presents the ideal prediction. Furthermore, the performed ROC analyses show that the AUC of the 1-and 2- and 3-year predictive survival according to the nomograph are 0.691and 0.689and 0.676 in the training group, and then 0.677 and 0.623 and 0.608 in the testing group, which means that the nomograph can be effectively used to predict (Fig 5E, F).
Enriched functions based on the risk signature
To further illustrate the biological functions and the enriched pathways of the risk signature, we performed both the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology (GO) enrichment analysis between the high- and low-risk groups. The KEGG demonstrates that the pathways enriched in the low-risk group are: protein export, alpha-linolenic acid metabolism, fatty acid metabolism, ether lipid metabolism, and peroxisome, respectively, whereas the pathways in the high-risk group are: DNA replication, RNA polymerase, type II diabetes mellitus, glycosaminoglycan biosynthesis chondroitin sulfate and heparan sulfate (Fig 6A). Moreover, the GO showed that genes are mainly enriched in the response to the factors such as endoplasmic reticulum stress, endoplasmic reticulum lumen, and endoplasmic reticulum protein-containing complex, which indicates that the genes chosen by us are closely related to the ER Stress to some extent (Fig 6B).
Relationship between 4 ERGs-based risk signature and clinicopathological characteristics
The correlation between the survival probability and different clinicopathological characteristics was further interrogated. The data demonstrate that the patients from the low-risk group present much better survival overall, and the risk scores are significantly related to the issues such as age, grade, and stage features (Table 2). The patients with a higher stage and grade level are more likely to present significant differences based on the risk score (Fig 6C). Instructively, the risk signature constructed by us is convictive and reliable for diagnosing and treating EC patients.
Immune infiltration differences between two subgroups
Generally, immune abnormalities will play a critical role in oncogenesis, thus we continuously analyzed the enrichment of the immune cells and then immune pathways based on the risk score through a single sample gene set enrichment analysis (ssGSEA) method. According to the box plot, the samples such as B cells naive, plasma cells, T cells CD4 memory resting, macrophages M1, macrophages M2, dendritic cells resting, dendritic cells activated, mast cells resting, mast cells activated, and neutrophils, are all sensitive to the risk score with a p<0.05 criteria. Among them, only macrophages M1 and macrophages M2 are higher in the high-risk group (Fig 7A). Continuously, we analyzed the survival probability of the samples including immune cells, dendritic cells activated, NK cells activated, plasma cells, T cells CD8, and T cells regulatory (Tregs), which all exhibited a significant difference between the high- and low-risk groups (Fig S1A). The dendritic cells are activated and the plasma cells show a better prognosis in the low-risk group, whereas the NK cells are activated, and the T cells CD8 and T cells regulatory (Tregs) are in the high-risk group. It is evident that the ERGs are closely related to immune cell infiltration, thus we already performed the CIBERSORT to further explore the tumor microenvironment (TME). The compositions of 22 immune cells in the samples and the relative percentage filtered in the high- and low-risk groups were analyzed further (Fig 7B, C). Meanwhile, we further investigate the correlation between the abundance of several typical immune cells and the mRNA expression of 4 ERGs through TIMER database (Fig 8). Macrophages are significantly related to all 4 ERGs. TRIB3 expression level is only correlated with the abundance of another immune cell, neutrophil. CREB3L3 is strongly correlated with B cells and CD4+ T cells with similarly positive tendencies. Except for CD4+ T cells, XBP1 shows significantly negative correlations with CD8+ T cells, neutrophils, and dendritic cells. PPP1R15A is positively related to the abundance of neutrophils and dendritic cells. Compared with the diploid/normal group, we further figure out that immune cell infiltration is also related to CNVs of 4 ERGs (Fig 9). Arm-level deletion of TRIB3, XBP1, and PPP1R15A decreases the infiltration of immune cells. Compared with the arm-level deletion, the arm-level gain of CREB3L3 is more likely to decrease the infiltration of CD8+ T cells, neutrophils, and dendritic cells. High amplification of PPP1R15A mainly affects B cells, CD8+ T cells, macrophages, and dendritic cells. TRIB3, CREB3L3, and XBP1 are lack of deep deletion.
Besides the immune cells, we have also explored the relationship between the immune pathways and the risk score. The box plot also shows the typical characteristics of the microstructures including B_cells, CCR, checkpoint, iDCs, aDCs, HLA, neutrophils, NK_cells, T_helper_cells, TIL, Treg, T_cell_co-inhibition, T_cell_co-stimulation, Type_I_IFN_Response, and Type_II_IFN_Response, are significantly different according to the risk score (Fig 7D). The survival analysis of the immune pathways was further conducted, and the results indicate thattheimmune pathways with prominent survival differences,all have better survival in the high-risk group than in the low-risk group (Fig S1B). Because the immune checkpoints are significantly different between two subgroups, the expression distinction was then analyzed, where only a few are expressed higher in the high-risk group, like CD 40. As demonstrated, the majority of the immune checkpoints are higher in the low-risk group, especially corresponding to CTLA4 and CD28. Unfortunately, there is no significant difference in PD-1 or PD-L1 (Fig 7E).
The above outcomes suggest that the risk score will exhibit a considerable impact on the prognosis of various immune cells and pathways, which may helpfully predict new individualized immunotherapy for EC patients.
Consensus clusters selection based on the expression level
The consensus clustering analysis of all 523 EC samples in the TCGA cohort for recognizing the relationship between the expression levels of 4 risk-signature ERGs and subtypes, was performed. We already selected the clustering variable (k) = 3 from 2 to 9 based on a higher intragroup correlation and a lower intergroup correlation (Fig 10A), with 225 cases in cluster 1, 142 cases in cluster 2, and 156 cases in cluster 3. Survival analysis indicates that there exists an evident difference in survival among the three subgroups above. As compared to Cluster 2 and Cluster 3, Cluster 1 has a much better prognosis (Fig 10B).
The issues including the information on mRNA expression, clusters, and clinical characteristics concerning grade and stage, and age, were integrated and displayed in the heatmap. As shown, the parameter grade and stage were significantly different among the three clusters (Fig 10C). Furthermore, we further explored the expression differences of 30 immune checkpoints in three clusters, among them, the samples of CD244, TNFSF9, Lag3, CD200, TNFSF15, HHLA2, ICOSLG, TNFRSF25, CD40, TNFRSF14, TNFSF14, CD44, and BTNL2, are more statistically significant than others (p<0.001) (Fig 10D).
Diverse potential chemotherapies of two subgroups
Because chemotherapy is one of the effective and widely-accepted treatments for EC patients, we further did a prediction of the sensitivity of potential chemotherapies based on IC50 for each sample in both the low- and high-risk groups. Generally, the patients with a lower risk score are more sensitive to medicines such as AKT inhibitor VIII, Bicalutamide, Docetaxel, and Temsirolimus. While the higher-risk score patients should be more sensitive to Pyrimethamine, Shikonin, Rucaparib, and Veliparib (Fig 11). These findings might provide new options for future clinical treatment.
Dual verification patterns of the 4 ERGs-based risk signature
The protein level of 4 risk-signature ERGs (CREB3L3, PPP1R15A, TRIB3, XBP1) was verified according to the HPA database (Fig 12A). Unfortunately, we couldn’t obtain successfully a protein expression of XBP1 in the HPA database. Besides this gene, we find that both the CREB3L3 and TRIB3 are upregulated, but PPP1R15A is downregulated on the protein level. The mRNA levels of 4 ERGs are different in 40 EC cell lines (Fig 12B).
We then performed a series of Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) to verify the mRNA expression of 4 risk-signature ERGs (CREB3L3, PPP1R15A, TRIB3, XBP1) in EC and normal tissues, separately. The expression level of these genes differs significantly between the tumor and normal tissues, but CREB3L3 and TRIB3, and XBP1 are overexpressed, while PPP1R15A is expressed conversely (Fig 12C). The primer sequences are provided in Table 3. Overall, the experimental results further confirm the reliability of the risk signature. (* if p < 0.05, ** if p < 0.01, and *** if p < 0.001)