Gastric cancer patients’ characteristics and stable prognostic gene identification
The detailed characteristics of the patients in this study are summarized in Supplemental Table S1. A total of 362 patients with clinical information in the TCGA data set were included in modeling. The mean age at diagnosis was 67.0 years (range: 30 to 90 years); 234 (64.6%) patients were male and 128 (35.4%) were female. All included patients had OS and PFI information. The mean OS time was 603.7 days, and the mean PFI time was 543.6 days. The bootstrap procedure described in the Materials and Methods retained 1,446 genes, and subsequent survival analysis identified 425 of these 1,446 genes as stable prognostic genes (Supplemental Table S2).
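The bootstrap screen is specified only in the Materials and Methods, so the following Python sketch is an assumption and not the study's exact procedure. It resamples patients with replacement, splits each resample at a gene's median expression, applies a two-group log-rank test, and keeps genes that reach significance in a large fraction of resamples. The function names and the thresholds `n_boot`, `alpha`, and `min_freq` are hypothetical, and the second-stage survival analysis is not reproduced.

```python
import math
import random
import statistics


def logrank_p(times, events, groups):
    """Two-sided log-rank p-value for two groups coded 0/1 (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events, groups))
    n_risk = len(data)        # patients still at risk
    n1_risk = sum(groups)     # group-1 patients still at risk
    o_minus_e = 0.0           # observed minus expected events in group 1
    var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        # Gather every patient tied at time t.
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0 and n_risk > 1:
            frac1 = n1_risk / n_risk
            o_minus_e += d1 - d * frac1
            var += d * frac1 * (1 - frac1) * (n_risk - d) / (n_risk - 1)
        n_risk -= removed
        n1_risk -= removed1
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    # The chi-square (1 df) upper tail equals the two-sided normal tail.
    return math.erfc(math.sqrt(chi2 / 2))


def bootstrap_stable_genes(expr, times, events, n_boot=100, alpha=0.05, min_freq=0.8, seed=0):
    """Return genes whose median split is log-rank significant in >= min_freq of resamples.

    expr maps a (hypothetical) gene name to one expression value per patient,
    in the same patient order as times and events.
    """
    rng = random.Random(seed)
    n = len(times)
    hits = dict.fromkeys(expr, 0)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        t = [times[i] for i in idx]
        e = [events[i] for i in idx]
        for gene, values in expr.items():
            v = [values[i] for i in idx]
            med = statistics.median(v)
            grp = [1 if x > med else 0 for x in v]
            # Skip degenerate splits such as constant expression.
            if 0 < sum(grp) < n and logrank_p(t, e, grp) < alpha:
                hits[gene] += 1
    return sorted(g for g, h in hits.items() if h / n_boot >= min_freq)
```

A univariate Cox model could replace the median-split log-rank test in this sketch; which test was used at this stage is not stated in this section.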
Construction of immune infiltration subgroups using stable prognostic genes
First, unsupervised clustering based on the 425 stable prognostic genes was used to classify the 362 cancer tissues into molecular subtypes, and the R package ConsensusClusterPlus was used to assess cluster stability and select the optimal number of clusters. This identified two patient clusters, Type1 and Type2 (Fig. 1a-b). Kaplan-Meier curves showed statistically significant differences between the two clusters in both OS (Fig. 1c) and PFI (Fig. 1d). Among clinical features, tumor grade differed significantly between the two types, with Type2 patients having a significantly more advanced grade than Type1 patients (Supplemental Table S3). Subsequent cell infiltration analysis revealed significant differences in stromal and immune cell abundance between Type1 and Type2 patients, including neutrophils (t = -3.6) and endothelial cells (t = -13.3) (Fig. 1e). When we examined the relationship between immune and stromal scores and OS, higher neutrophil (Fig. 1f) and endothelial cell (Fig. 1g) scores were associated with poorer survival, although this finding contradicts previous results. The violin plot shows significantly fewer neutrophils and endothelial cells in Type1 than in Type2, which is consistent with the better survival of Type1. Finally, we compared our molecular types with established molecular subtypes of gastric cancer. Type1 patients were mainly concentrated in the C1, C2, GI.CIN, and GI.HM-indel subtypes, whereas Type2 patients were mostly in the C1, C2, C3, GI.CIN, and GI.GS subtypes (Fig. 1h, Supplemental Table S3).
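The Python sketch below illustrates the idea behind consensus clustering as used by ConsensusClusterPlus: repeatedly subsample patients, cluster each subsample, record how often each pair of patients falls in the same cluster, and compare candidate cluster numbers. The k-means base clusterer with farthest-first seeding, 80% item subsampling, and the proportion of ambiguous clustering (PAC) used to pick k are assumptions for illustration; the algorithm, distance, and selection criterion used in the study are not stated in this section. In real use each point would be one patient's vector of the 425 stable-gene expression values.

```python
import random


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first seeding; returns one cluster label per point."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=50, p_item=0.8, seed=0):
    """Fraction of resamples in which each pair of items, when both sampled, shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    m = max(k, round(p_item * n))
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample items without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering: share of item pairs with lower < consensus < upper."""
    n = len(consensus)
    vals = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(lower < v < upper for v in vals) / len(vals)


def best_k(points, ks=(2, 3, 4), **kwargs):
    """Pick the k with the lowest PAC; ties go to the smaller k."""
    return min(ks, key=lambda k: pac(consensus_matrix(points, k, **kwargs)))
```

A stable k gives a consensus matrix whose entries sit near 0 or 1, which is why a low PAC is read as evidence for that number of clusters.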
Construction of prognostically relevant gene set
To develop clinically useful gene sets, a LASSO Cox regression model was used to reduce the dimensionality of the 425 stable prognostic genes. All cases were then divided into training and validation cohorts for prognostic analysis; clinical features did not differ significantly between the two cohorts (Supplemental Table S1). LASSO models based on OS and PFI information generated two stable gene sets (Supplemental Figure S1a-d): the OS gene set contained 18 genes and the PFI gene set contained 21 genes (Supplemental Table S4). Cox regression was then performed on each gene set to establish two prognostic models, yielding stable gene risk scores for OS (SGRS-OS) and PFI (SGRS-PFI). All cases were divided into high- and low-score groups using cutoff values calculated in the whole cohort (0.14 for SGRS-OS and 1.44 for SGRS-PFI). In both the training and validation sets, Kaplan-Meier curves showed that patients in the high SGRS-OS group had a worse prognosis (Fig. 2a-b). In ROC analysis, SGRS-OS as a continuous variable showed higher predictive ability than the TNM staging system in both the training and validation cohorts. Because stage is a categorical variable, SGRS-OS was also converted into a four-level categorical variable to improve comparability; even as a categorical variable, SGRS-OS retained good predictive accuracy (Supplemental Figure S2a-b). Similar results were obtained for SGRS-PFI in patients with documented PFI information (Fig. 2c-d, Supplemental Figure S2c-d). The predictive ability of SGRS-OS and SGRS-PFI, both analyzed as continuous variables, was then tested in each subgroup of the whole cohort stratified by immune subtype, grade, sex, stage, and age.
As shown in the forest plots, higher values of both scores consistently identified patients with poor prognosis in each subgroup (Fig. 2e-f).
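Assuming, as is common for Cox-based signatures, that each SGRS is the model's linear predictor (the sum of each gene's coefficient times its expression), the scoring and grouping steps can be sketched in Python as below. The cutoffs 0.14 and 1.44 come from the text; the gene names and coefficients are hypothetical placeholders for the sets in Supplemental Table S4, treating scores above the cutoff as high risk is an assumption, and the four-level recoding by cohort quartiles is an assumed coding because the section does not say how the categories were formed. LASSO selection itself is not reproduced; the sketch starts from fitted coefficients.

```python
import statistics

# Cutoffs reported in the text, calculated in the whole cohort.
SGRS_OS_CUTOFF = 0.14
SGRS_PFI_CUTOFF = 1.44

# Hypothetical genes and Cox coefficients for illustration only; the actual
# 18-gene (OS) and 21-gene (PFI) sets are listed in Supplemental Table S4.
EXAMPLE_COEFS = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}


def risk_score(expression, coefs):
    """Cox linear predictor: sum of coefficient * expression over the model genes."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def risk_group(score, cutoff):
    """Dichotomize a score; 'above the cutoff means high risk' is an assumption."""
    return "high" if score > cutoff else "low"


def quartile_category(scores):
    """Recode continuous scores as 1-4 by cohort quartiles (assumed four-level coding)."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    return [1 + (s > q1) + (s > q2) + (s > q3) for s in scores]
```

For a hypothetical patient with all three example genes at expression 1.0, the score is 0.42 - 0.31 + 0.18 = 0.29, which exceeds the 0.14 OS cutoff, so the patient would fall in the high SGRS-OS group.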
Stable gene set predicts the efficacy of chemotherapy in gastric cancer
Compared with supportive care (15), systemic chemotherapy offers survival and quality-of-life benefits and has become the standard treatment for metastatic or unresectable GC (16). Chemotherapy outcome is therefore crucial for the survival of patients with gastric cancer. We selected patients with chemotherapy information and examined the relationship between chemotherapy outcome and SGRS-OS. Low SGRS-OS was associated with good chemotherapy outcomes, whereas patients with high SGRS-OS tended to have poor chemotherapy outcomes, and the ROC curve for predicting chemotherapy efficacy indicated acceptable accuracy (Fig. 3a). SGRS may therefore help predict chemotherapy efficacy and provide a useful reference for predicting patient survival. To provide a quantitative tool for predicting the probability of patient mortality, we constructed two nomograms incorporating the prognostic factors and the scores obtained from the stable gene sets (Fig. 3b-c). The calibration plots showed that the predictions of both nomograms agreed closely with the ideal model (Fig. 3d-e).
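The section does not state how chemotherapy outcome was coded, so the Python sketch below assumes a hypothetical binary label (1 = poor outcome, 0 = good outcome). It computes the ROC AUC of SGRS-OS as the probability that a randomly chosen patient with a poor outcome has a higher score than a randomly chosen patient with a good outcome, counting ties as half.

```python
def roc_auc(scores, labels):
    """ROC AUC: probability that a random positive case scores above a random negative case.

    labels: 1 = poor chemotherapy outcome (hypothetical coding), 0 = good outcome.
    A tie between a positive and a negative case counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(n_pos × n_neg), which is adequate for a chemotherapy subcohort; the rank-based Mann-Whitney formulation gives the same value faster on larger data.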
Identification of SGRS-OS- and SGRS-PFI-related clinical characteristics and biological pathways
We also examined the correlations between the stable gene set scores and clinical features and molecular subtypes (Fig. 4a-b). Among clinical features, SGRS-OS and SGRS-PFI were significantly higher in patients with more advanced stage. Grade was also associated with the scores, whereas age and sex had less influence on them. Among molecular subtypes, SGRS values were higher in C3, C6, and Type2 than in the other subtypes. At the pathway level, both SGRS-OS and SGRS-PFI were significantly correlated with the base excision repair, DNA replication, and RNA degradation pathways (Fig. 4c). These associations suggest that gastric cancer progression is closely related to gene expression, supporting the use of gene expression profiles to predict gastric cancer prognosis.
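The section does not say how pathway activity was scored or which correlation coefficient was used. As a sketch under those assumptions, the Python code below takes precomputed per-patient pathway scores and correlates each with an SGRS using Spearman's rank correlation (Pearson correlation of average ranks, so ties are handled). The pathway values in any usage are hypothetical, and p-values for the reported significance are omitted.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    return pearson(average_ranks(x), average_ranks(y))


def correlate_with_pathways(sgrs, pathway_scores):
    """Map each pathway name to its Spearman correlation with per-patient SGRS values."""
    return {name: spearman(sgrs, scores) for name, scores in pathway_scores.items()}
```

A rank-based coefficient is used here because it only assumes a monotonic relationship between a risk score and pathway activity, not a linear one.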
Identification of CGB8 as a potential biological target
To further explore the roles of the model genes in gastric cancer development, we performed differential expression analysis of these genes between gastric cancer and normal samples. Nine differentially expressed genes (DEGs) were identified, of which 6 were up-regulated and 3 were down-regulated (Fig. 5a). To further validate the stable signature, the expression levels of these 9 DEGs were compared between normal and GC tissues in The Human Protein Atlas (THPA). Immunohistochemical results showed that CGB8 (ENSG00000213030.5) expression was up-regulated in GC tissues, confirming the difference in CGB8 levels between normal and GC tissues (Fig. 5b). ROC curve analysis was also performed to evaluate the sensitivity and specificity of CGB8 for diagnosing GC. The ROC curve of CGB8 in the TCGA database (Fig. 5c) showed acceptable sensitivity and specificity, with an AUC of 0.700. In addition, survival analysis identified CGB8 as a risk factor for gastric cancer progression (Fig. 5d). Notably, the expression and function of CGB8 in gastric cancer remain largely unknown. We therefore propose CGB8 as a potential biological target and sought to characterize its role in gastric cancer development.
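The differential expression method and thresholds are not given in this section. As an assumption, the Python sketch below tests one gene at a time with a Wilcoxon rank-sum (Mann-Whitney U) test on log2-scale expression, using a tie-corrected normal approximation and hypothetical cutoffs of |log2FC| > 1 and p < 0.05, with no multiple-testing correction. It also reports the tumor-versus-normal ROC AUC, which equals U divided by n1 × n2, the same quantity as the diagnostic AUC reported for CGB8.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(tumor, normal):
    """Wilcoxon rank-sum test with a tie-corrected normal approximation.

    Returns (U statistic for the tumor group, two-sided p-value).
    """
    n1, n2 = len(tumor), len(normal)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    combined = list(tumor) + list(normal)
    n = n1 + n2
    ranks = _average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var <= 0:  # every value identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def classify_gene(tumor, normal, lfc_cut=1.0, alpha=0.05):
    """Label a gene 'up', 'down', or 'ns'; expression is assumed to be log2-scaled already."""
    lfc = sum(tumor) / len(tumor) - sum(normal) / len(normal)
    u1, p = rank_sum_test(tumor, normal)
    auc = u1 / (len(tumor) * len(normal))  # tumor-vs-normal AUC equals U / (n1 * n2)
    if p < alpha and lfc > lfc_cut:
        status = "up"
    elif p < alpha and lfc < -lfc_cut:
        status = "down"
    else:
        status = "ns"
    return status, lfc, p, auc
```

Packages designed for expression data would normally replace this per-gene test; the sketch only shows how fold change, a significance test, and the diagnostic AUC relate for a single gene.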