Combining Gene Expression Signature With Clinical Features for Survival Stratification of Gastric Cancer

Background: The AJCC staging system is considered the gold standard in clinical practice. However, it has pitfalls in assessing the prognosis of gastric cancer (GC) patients with similar clinicopathological characteristics. We aimed to develop a new clinic and genetic risk score (CGRS) to improve prognosis prediction for GC patients. Methods: Gene expression profiles of the training set from the Asian Cancer Research Group (ACRG) cohort were used to develop a genetic risk score (GRS) with LASSO-Cox regression. CGRS was established by integrating GRS with a clinical risk score (CRS) derived from the Surveillance, Epidemiology, and End Results (SEER) database. GRS and CGRS were validated in the ACRG validation set and four other independent GC cohorts with different data types (microarray, RNA sequencing, and qRT-PCR). Multivariable Cox regression was used to evaluate whether GRS and CGRS were independent prognostic factors. Results: We established GRS based on a nine-gene signature comprising APOD, CCDC92, CYS1, GSDME, ST8SIA5, STARD3NL, TMEM245, TSPYL5, and VAT1. GRS and CGRS dichotomized GC patients into high and low risk groups with significantly different prognosis in four independent cohorts, including our Zhejiang cohort (all HR > 1, all P < 0.001). Both GRS and CGRS were prognostic independent of the AJCC staging system. Receiver operating characteristic (ROC) analysis showed that the area under the ROC curve of CGRS was larger than that of the AJCC staging system in most cohorts studied. A nomogram and a web tool (http://39.100.117.92/CGRS/) based on CGRS were developed so that clinicians can conveniently assess GC prognosis in clinical practice. Conclusions: CGRS, which integrates a genetic signature with clinical features, is robust in predicting GC prognosis and can easily be applied in clinical practice through the web application.


Background
Gastric cancer (GC) is one of the most commonly diagnosed cancers and the third leading cause of cancer-related death worldwide [1,2]. Despite improvements in diagnosis, surgery, and other treatments over the past few decades, the prognosis of patients with advanced GC, which accounts for approximately 65% of GC cases, remains very poor [3]. The AJCC staging system, based on clinical and pathological characteristics, is considered the gold standard for predicting GC prognosis; however, stratifying GC patients with similar clinical and pathological characteristics remains a major challenge.
Emerging studies show that gene expression profiles of tumor tissues obtained by microarray or RNA sequencing provide prognostic information [4,5]. Gene expression profiling has yielded many tools with potential prognostic value for clinicians in a variety of cancers, such as lung cancer [6,7], breast cancer [8,9], and large B-cell lymphoma [10,11]. Large-scale studies such as The Cancer Genome Atlas (TCGA) and the Asian Cancer Research Group (ACRG) have produced publicly available expression profiles of GC tissues [12,13], and researchers have developed various approaches for survival stratification of GC patients [14][15][16]. However, model overfitting, lack of adequate validation, and failure to generalize across data platforms hinder their clinical application. Although clinical [17] and genetic [14][15][16][18] models for risk stratification of GC patients have been established, a tool that integrates clinical and genetic information of GC patients has yet to be developed.
In this study, we established a new prognostic signature, the clinic and genetic risk score (CGRS), by integrating gene expression profiles with clinical characteristics. CGRS was confirmed to accurately predict GC prognosis in four different cohorts and significantly stratified stage III GC patients into high and low risk groups with different survival times. Furthermore, an easy-to-use nomogram and a web application based on CGRS were developed to facilitate its use in clinical practice.

Patients included in this study
Four cohorts with publicly available GC gene expression profiles were included: the ACRG cohort (GSE62254, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62254) [13], the TCGA cohort (http://firebrowse.org/?cohort=STAD) [12], the Singapore cohort (GSE15459, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15459) [18], and the Korea cohort (GSE84437, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84437). The SEER database (https://seer.cancer.gov/) was used to generate the clinical risk score [19]. An additional validation set of archived fresh frozen tumor specimens from GC patients who underwent surgery between 2008 and 2013 was obtained from Zhejiang Cancer Hospital. All aspects of this study were approved by the ethics committee of Zhejiang University School of Medicine. All participants gave written informed consent. The research was conducted in accordance with the 1975 Declaration of Helsinki. Detailed clinical and pathological information for all patients is provided in Table 1.

Gastric cancer gene expression datasets and preprocessing
We collected four cohorts (ACRG, TCGA, Singapore, and Korea) comprising gene expression profiles of GC patients with survival data available online. The ACRG cohort was randomly split into training and validation sets. The other cohorts, based on different platforms, were used as additional validation sets. Gene mutation statuses were obtained from the TCGA and ACRG cohorts. For Affymetrix microarray data, CEL files were downloaded and normalized with the MAS5 algorithm using Custom Chip Definition Files (Brainarray v.22, http://brainarray.mbni.med.umich.edu/), followed by log2 transformation and quantile normalization [20]. For Illumina microarray data, IDAT files were downloaded and normalized with Illumina GenomeStudio software (https://www.illumina.com), followed by log2 transformation and quantile normalization. For RNA-seq data, RSEM values were downloaded and log2-transformed after adding a shift of 0.001. Additional details are included in the Supplementary Materials and Methods.
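The log2 transformation and quantile normalization steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' pipeline: it assumes a genes × samples matrix of already summarized intensities (for example MAS5 output), and the function name is hypothetical.

```python
import numpy as np


def log2_quantile_normalize(expr: np.ndarray, shift: float = 0.0) -> np.ndarray:
    """Log2-transform a genes x samples matrix, then quantile-normalize its columns.

    Tied values receive distinct ranks in order of appearance, a simplification;
    some implementations average the reference values over tied ranks instead.
    """
    logged = np.log2(expr + shift)
    # Reference distribution: mean of the sorted values across samples at each rank.
    reference = np.sort(logged, axis=0).mean(axis=1)
    # Rank of each value within its own sample (0 = smallest).
    ranks = np.argsort(np.argsort(logged, axis=0, kind="stable"), axis=0)
    # Every sample is mapped onto the same reference distribution.
    return reference[ranks]
```

For the RNA-seq (RSEM) data, the text describes only the shifted log2 step, which corresponds to `np.log2(rsem + 0.001)` without the quantile step.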
RNA extraction, amplification, and real-time quantitative RT-PCR
Total RNA was extracted from fresh frozen tissues, and 1 μg of total RNA was reverse-transcribed using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems). qRT-PCR was performed on a Roche Real-Time PCR System using the ChamQ Universal SYBR qPCR Master Mix Kit (Vazyme Biotech) and gene-specific primers (Supplementary Table 1). Relative gene expression was calculated with the ∆∆Ct method. Additional details are included in the Supplementary Materials and Methods.
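The ∆∆Ct calculation can be illustrated with the standard 2^(−∆∆Ct) formula. In this sketch the reference gene would be the housekeeping gene RNU6-1 mentioned in the Results; the choice of calibrator (a reference sample or a cohort average) is not stated in the text, so `calibrator_delta_ct` is an assumed input, and the function name is hypothetical.

```python
import numpy as np


def relative_expression(ct_target, ct_reference, calibrator_delta_ct):
    """Relative quantification by the 2^(-ddCt) method.

    ct_target, ct_reference: Ct values of the gene of interest and of the
    reference (housekeeping) gene measured in the same sample(s).
    calibrator_delta_ct: dCt of the calibrator; an assumed input, since the
    calibrator is not specified in the text.
    """
    # Normalize the target to the housekeeping gene within each sample.
    delta_ct = np.asarray(ct_target, dtype=float) - np.asarray(ct_reference, dtype=float)
    # Express the result relative to the calibrator.
    delta_delta_ct = delta_ct - calibrator_delta_ct
    return 2.0 ** (-delta_delta_ct)
```

A lower Ct means more template, so a sample whose normalized Ct is one cycle below the calibrator's comes out at twice the calibrator's expression.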

Functional annotation and pathway enrichment analysis
Correlations between genes and GRS were assessed with the Pearson correlation test. Genes with an absolute R greater than 0.4 and a P value below 0.05 were considered statistically significant. Genes co-expressed with GRS in the training set were clustered using AutoSOME [21]. Genes in the top two clusters were tested for enrichment against curated gene sets from the Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb) using the R clusterProfiler package (version 3.6.0) [22,23]. Gene Set Enrichment Analysis (GSEA) was performed with the Java program using gene set collections from MSigDB [24]. Enrichment Map was used to visualize networks discriminating the low CGRS group from the high CGRS group [25]. Additional details are included in the Supplementary Materials and Methods.
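The correlation filter can be sketched with SciPy's `pearsonr`. The thresholds mirror the Methods (|R| > 0.4, P < 0.05); the Results report P < 0.01 for the same step, so `p_max` is left as a parameter. The function name and the genes × samples input layout are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr


def genes_correlated_with_score(expr, score, gene_names, r_min=0.4, p_max=0.05):
    """Return (gene, R, P) for genes whose Pearson correlation with a score passes both thresholds.

    expr: genes x samples array; score: one value per sample (for example GRS).
    """
    selected = []
    for name, row in zip(gene_names, np.asarray(expr, dtype=float)):
        r, p = pearsonr(row, score)
        # Keep genes that are both strongly and significantly correlated, in either direction.
        if abs(r) > r_min and p < p_max:
            selected.append((name, float(r), float(p)))
    return selected
```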

Statistical analysis
All statistical tests were two-sided, except for one-sided hypergeometric tests. P values below 0.05 were considered statistically significant. Each gene was fitted in a univariate Cox proportional hazards regression, and genes with a likelihood ratio test P value below 0.01, assessed by 10,000 permutations, were considered prognostic. These prognostic genes were then fitted into multivariate Cox models adjusted for patients' clinical characteristics, and the remaining genes (P < 0.01) were considered independent prognostic factors for GC patients. To obtain a minimal gene set, features were selected with the LASSO penalty using 10-fold internal cross-validation. The nine prognostic genes retained were integrated into GRS. CRS, based on age and AJCC stage, was developed from the SEER database. CGRS was defined as the combination of GRS and CRS weighted by their coefficients in a multivariate Cox model. Receiver operating characteristic and prediction error curves were produced using the survcomp (version 1.28.5) and (version 1.4.18) packages, respectively [26,27]. Nomograms and calibration plots were generated with the rms package (version 5.1.2) [28]. Decision curves were generated with the rmda package (version 1.6) [29]. All analyses were conducted in R (version 3.4.4). Additional details are provided in the Supplementary Materials and Methods.
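The univariate screening step (Cox likelihood ratio test with a permutation P value) can be sketched in NumPy. This is an illustrative re-implementation under stated assumptions, not the authors' R code: it uses the Breslow partial likelihood for tied event times, a plain Newton-Raphson fit without step-halving safeguards, and the add-one permutation estimate (b + 1)/(B + 1). All function names are hypothetical.

```python
import numpy as np


def _partial_loglik(beta, x, event, at_risk):
    """Cox partial log-likelihood for one covariate (Breslow handling of ties).

    at_risk[i, j] is 1 when subject j is still at risk at subject i's time.
    """
    eta = beta * x
    log_denominator = np.log(at_risk @ np.exp(eta))
    return float(np.sum(event * (eta - log_denominator)))


def _fit_beta(x, event, at_risk, max_iter=50, tol=1e-9):
    """Newton-Raphson maximization of the partial likelihood, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        w = at_risk * np.exp(beta * x)            # risk-set weights, one row per subject
        s0 = w.sum(axis=1)
        mean = (w @ x) / s0                       # weighted covariate mean in each risk set
        var = (w @ x ** 2) / s0 - mean ** 2       # weighted covariate variance
        score = np.sum(event * (x - mean))        # first derivative of the log-likelihood
        information = np.sum(event * var)         # negative second derivative
        if information <= 0:
            break
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


def _lrt(x, event, at_risk):
    """Likelihood ratio statistic 2 * (l(beta_hat) - l(0))."""
    beta = _fit_beta(x, event, at_risk)
    return 2.0 * (_partial_loglik(beta, x, event, at_risk)
                  - _partial_loglik(0.0, x, event, at_risk))


def cox_permutation_test(x, time, event, n_perm=10000, seed=0):
    """Observed likelihood ratio statistic and permutation P value for one gene."""
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    observed = _lrt(x, event, at_risk)
    rng = np.random.default_rng(seed)
    # Shuffle expression across patients while keeping each (time, event) pair fixed.
    exceed = sum(_lrt(rng.permutation(x), event, at_risk) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting the expression vector breaks any gene-survival link, so the permuted statistics approximate the null distribution without relying on the chi-square approximation; the text reports 10,000 permutations per gene.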

Results
Identification of prognostic genes from the training set
To develop a new model for precisely predicting GC prognosis, we selected the ACRG dataset, which has detailed clinical information and gene expression profiles (Fig. 1a). We evaluated the impact of sample size on prognostic power for two genes, TEAD1 and GZMB [30,31], which have previously been reported as prognostic factors for GC patients. The data showed that about 150 patients were required for a reliable assessment of prognostic power (Supplementary Fig. 1a and b). We therefore randomly split the 300 GC patients of the ACRG cohort into training (n = 150) and validation (n = 150) sets (Table 1). Univariate Cox proportional hazards regression was used to identify prognostic genes in the training set, yielding 2069 survival-associated genes (P < 0.01, 10000 permutations; Supplementary Table 2). To eliminate noise caused by other factors, these genes were further fitted into a multivariate Cox proportional hazards model adjusted for patients' clinical characteristics, including AJCC stage, age, gender, and Lauren subtype. Finally, 558 genes whose expression was independently and significantly associated with survival were identified in the training set (P < 0.01; Supplementary Table 2).

GRS for GC patients' prognosis prediction
To obtain the minimal set of genes for building GRS, we applied the LASSO penalty algorithm to assess the prognostic value of the 558 previously identified genes (Supplementary Fig. 1c). After LASSO selection, nine genes were retained (Fig. 1b, Supplementary Table 2). One gene, ST8SIA5 (ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 5), was significantly associated with favorable prognosis. The remaining eight genes, whose expression was significantly associated with adverse outcomes, were STARD3NL (STARD3 N-terminal like), GSDME (gasdermin E), TMEM245 (transmembrane protein 245), VAT1 (vesicle amine transport 1), CCDC92 (coiled-coil domain containing 92), TSPYL5 (TSPY like 5), APOD (apolipoprotein D), and CYS1 (cystin 1). Coefficients for these nine genes were determined by a multivariate Cox regression model, and GRS was calculated from the normalized expression levels of the nine genes (Fig. 1c).
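GRS is a weighted sum of the normalized expression levels of the nine genes. The sketch below shows that linear form; the actual coefficients appear only in Fig. 1c, so the values used here are hypothetical placeholders whose signs follow the text (ST8SIA5 protective, the other eight adverse). The function name is also hypothetical.

```python
# Hypothetical placeholder coefficients: the fitted values are shown only in Fig. 1c.
# Signs follow the text: ST8SIA5 is protective (negative), the other eight genes are adverse (positive).
GRS_COEFFICIENTS = {
    "APOD": 0.1, "CCDC92": 0.1, "CYS1": 0.1, "GSDME": 0.1, "STARD3NL": 0.1,
    "TMEM245": 0.1, "TSPYL5": 0.1, "VAT1": 0.1, "ST8SIA5": -0.1,
}


def genetic_risk_score(expression, coefficients=GRS_COEFFICIENTS):
    """Weighted sum of normalized expression values (one float per gene symbol)."""
    missing = sorted(set(coefficients) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in coefficients.items())
```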
Patients in the training set were assigned to high (n = 75) and low (n = 75) GRS groups using the median GRS value as the cutoff. Kaplan-Meier analysis showed a significant difference in 5-year overall survival between the high and low GRS groups (HR = 2.70, 95% CI = 2.07 to 3.52, P = 2.35e-12) (Fig. 1d). Further univariate Cox analysis revealed that GRS remained prognostic in each subgroup (Fig. 1e). Moreover, after calculating correlations between GRS and global gene expression profiles in the training set, we found 1205 genes significantly correlated with GRS (absolute R > 0.4, P < 0.01) (Supplementary Table S4). These genes were clustered, and the two largest clusters in the training set were compared with gene sets from the Molecular Signatures Database to assess enrichment of biological pathways and processes.
The results indicated that cluster 1 was enriched for genes associated with the extracellular matrix and genes expressed in stem cells, whereas cluster 2 overlapped with cell cycle-related genes and genes highly expressed in early-stage cancer (Fig. 1f; Supplementary Table 3).
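According to the figure legends, P values for the Kaplan-Meier comparisons of high and low risk groups came from the log-rank test, while HRs came from Cox regression. A minimal NumPy version of the two-group log-rank test is sketched below for illustration; it is not the code used in the study, and the function name is hypothetical.

```python
import math

import numpy as np


def logrank_test(time, event, group):
    """Two-group log-rank test.

    time: follow-up times; event: True if death observed; group: True for
    the first group (e.g. high GRS). Returns (chi-square statistic, P value).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e = 0.0
    variance = 0.0
    for t in np.unique(time[event]):              # each distinct event time
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        died = (time == t) & event
        d = died.sum()
        d1 = (died & group).sum()
        o_minus_e += d1 - d * n1 / n              # observed minus expected deaths in group 1
        if n > 1:                                 # hypergeometric variance contribution
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```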
To further validate the prognostic value of GRS, patients from the ACRG validation set were stratified into high and low GRS groups using the median GRS value of the training set as the cutoff (this cutoff is used throughout). GRS was significantly associated with overall survival (HR = 1.49, 95% CI = 1.21 to 1.83, P = 1.66e-4), which was further confirmed in the whole ACRG cohort (HR = 1.89, 95% CI = 1.61 to 2.22, P = 3.70e-14) (Fig. 2a and b). To further evaluate GRS, 192 patients from the Singapore cohort, 433 patients from the Korea cohort, and 388 patients from the TCGA cohort were each stratified into high and low GRS groups according to this cutoff. GRS remained significantly associated with GC prognosis in all cohorts (HR = 1.31, 95% CI = 1.09 to 1.57, P = 4.77e-3 in the Korea cohort; HR = 1.46, 95% CI = 1.20 to 1.77, P = 1.40e-4 in the Singapore cohort; HR = 1.29, 95% CI = 1.10 to 1.52, P = 2.21e-3 in the TCGA cohort) (Fig. 2c-e). Multivariate Cox analysis showed that GRS was prognostic independent of the AJCC staging system, age, gender, and Lauren subtype (Supplementary Table 4). Moreover, GRS was also prognostic within subgroups of patients harboring wild-type or mutant TP53, MUC16, ARID1A, or PIK3CA in the ACRG and TCGA cohorts, for which gene mutation statuses were available (Supplementary Table 5).
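The key design choice here is that the cutoff is fixed once on the training set and then applied unchanged to every validation cohort, rather than re-estimating a median per cohort. A minimal sketch follows, assuming that scores exactly equal to the cutoff go to the low-risk group (the text does not say how ties are handled); the helper names are hypothetical.

```python
import numpy as np


def training_median_cutoff(training_scores):
    """Cutoff fixed once on the training set (median GRS or CGRS)."""
    return float(np.median(training_scores))


def assign_risk_groups(scores, cutoff):
    """Label each patient 'high' or 'low' relative to a fixed cutoff.

    Assumption: a score exactly equal to the cutoff is labelled 'low'.
    """
    return np.where(np.asarray(scores, dtype=float) > cutoff, "high", "low")
```

Locking the cutoff keeps the validation cohorts truly independent of the threshold choice; a per-cohort median would always force a 50/50 split regardless of the cohort's actual risk distribution.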
To make GRS convenient to apply in clinical practice, we performed qRT-PCR assays on fresh frozen tumor specimens for the nine GRS genes and one additional housekeeping gene, RNU6-1, which lacks prognostic association and shows stable expression [32]. Specimens were obtained from 109 GC patients who underwent gastrectomy between 2008 and 2013 at Zhejiang Cancer Hospital (hereafter the Zhejiang cohort; Table 1). GRS remained significantly prognostic in the Zhejiang cohort (Table 2; Supplementary Table 4): patients with low GRS survived longer than those with high GRS (HR = 1.40, 95% CI = 1.12 to 1.75, P = 2.93e-3) (Fig. 2f). Further multivariate Cox analysis showed that GRS was associated with GC prognosis independent of age, gender, and AJCC stage in the Zhejiang cohort (Supplementary Table 4). Taken together, these data suggest that GRS may be applied for GC prognosis prediction in clinical practice across different platforms, such as microarray, RNA sequencing, and qRT-PCR.

CGRS for prognosis prediction of GC patients
Given that AJCC stage and age are significantly associated with GC prognosis, and that GRS is a prognostic factor independent of both, we integrated GRS with clinical variables to create CGRS for predicting GC survival. First, the SEER database, containing 33250 GC patients, was used to determine coefficients for AJCC stage and age with a multivariate Cox regression model. The clinical risk score (CRS) for each patient can then be calculated as CRS = 0.021 × Age (years) + AJCC stage value, where the stage values are 0 (stage I), 0.31 (stage II), 0.75 (stage III), and 1.56 (stage IV) (Fig. 2g). Univariate and multivariate Cox analyses, as well as Kaplan-Meier curves, showed that CRS was significantly associated with GC prognosis in all cohorts studied (Table 2). Since there was no significant difference in patient distribution between the ACRG training set and the SEER set (Supplementary Table 6), we integrated CRS with GRS into CGRS using the formula determined by a multivariate Cox regression model in the ACRG training set: CGRS = 1.25 × CRS + 0.88 × GRS (Fig. 2h). When patients were stratified into high and low CGRS groups using the median value of the ACRG training set, CGRS was significantly associated with GC prognosis (HR = 2.70, 95% CI = 1.57 to 2.16, P = 2.53e-19) (Fig. 2i). Moreover, CGRS showed strong robustness in predicting overall survival of GC patients in the internal (HR = 1.80, 95% CI = 1.51 to 2.16, P = 3.21e-10 in the ACRG validation set; HR = 2.12, 95% CI = 1.85 to 2.42, P = 1.27e-27 in the whole ACRG cohort) and external validation sets (HR = 2.10, 95% CI = 1.71 to 2.57, P = 2.33e-13 in the Singapore cohort; HR = 1.72, 95% CI = 1.44 to 2.05, P = 1.61e-9 in the TCGA cohort; HR = 2.72, 95% CI = 1.71 to 4.33, P = 2.00e-5 in the Zhejiang cohort) (Fig. 2j-n).
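The CRS and CGRS formulas given above translate directly into code. The stage weights and coefficients are taken from the text; representing stages as Roman-numeral strings and the GRS value in the usage line are assumptions for illustration, and the function names are hypothetical.

```python
# Stage weights and coefficients as reported in the text (derived from SEER
# and the ACRG training set); the string stage labels are an assumption.
STAGE_WEIGHTS = {"I": 0.0, "II": 0.31, "III": 0.75, "IV": 1.56}


def clinical_risk_score(age_years: float, ajcc_stage: str) -> float:
    """CRS = 0.021 * age (years) + stage weight."""
    return 0.021 * age_years + STAGE_WEIGHTS[ajcc_stage]


def combined_risk_score(crs: float, grs: float) -> float:
    """CGRS = 1.25 * CRS + 0.88 * GRS."""
    return 1.25 * crs + 0.88 * grs


# Usage: a 60-year-old stage III patient with a hypothetical GRS of 0.5.
crs = clinical_risk_score(60, "III")
cgrs = combined_risk_score(crs, grs=0.5)
```

For this example, CRS = 0.021 × 60 + 0.75 = 2.01 and CGRS = 1.25 × 2.01 + 0.88 × 0.5 = 2.9525.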
Further univariate and multivariate Cox analyses confirmed the survival prediction power of CGRS (Supplementary Table 7). Additionally, CGRS remained prognostic within subgroups of patients harboring wild-type or mutant TP53, MUC16, ARID1A, or PIK3CA in the ACRG and TCGA cohorts, for which gene mutation statuses were available (Supplementary Table 8). Together, these results indicate that CGRS can assess GC prognosis independent of other clinical characteristics, including AJCC stage, age, gender, and Lauren subtype, across different platforms.
Prognosis prediction of GRS and CGRS in different AJCC stages
The AJCC staging system is generally considered the gold standard for evaluating GC prognosis in current clinical practice [33]; however, it has shortcomings in predicting the outcome of patients with similar clinical and pathological characteristics [34,35]. We therefore applied GRS and CGRS to GC patients within the same stage. Because of the small number of stage I GC patients, the performance of GRS and CGRS fluctuated across cohorts (Supplementary Fig. 3). For stage II GC patients, GRS and CGRS significantly stratified patients into high and low risk groups in several independent cohorts, but failed to do so in the Singapore cohort and the ACRG validation set owing to the relatively small number of patients (Supplementary Fig. 4). GC is often diagnosed at an advanced stage, and stage III accounts for about 35% of GC cases [36,37]. Both GRS and CGRS classified stage III patients into high and low risk groups with statistically significantly different survival times in all training and validation sets (all HRs > 1, all P < 0.05; Fig. 3). Further multivariate Cox analysis confirmed the robustness of GRS and CGRS in stage III GC patients (Table 2; Supplementary Table 9). Finally, in stage IV GC patients, the performance of GRS and CGRS was unstable across cohorts because of the relatively small population size (Supplementary Fig. 5). Together, these data indicate that both GRS and CGRS can predict the prognosis of stage III GC patients and may serve as important complements to the AJCC staging system.

The association between GRS, CGRS and molecular subtypes
Several molecular subtyping systems for GC have been established in the past few years [12,13,18].
Here, we systematically analyzed the association between our risk scores and the molecular subtypes of GC. The TCGA study divided GC into four molecular subtypes. Although the TCGA subtypes were not significantly associated with clinical outcome, the microsatellite instability (MSI) group, which has a relatively favorable outcome, exhibited lower GRS and CGRS values (Supplementary Fig. 6a-c). Further analysis indicated that CGRS and GRS were negatively correlated with mutation load and DNA methylation level (Supplementary Fig. 6g-i), consistent with the better prognosis of GC patients with high mutation or methylation loads (Supplementary Fig. 6d-f). Among the subtypes defined by the Singapore study, the metabolic subtype, whose patients survive relatively longer, had lower GRS and CGRS values than the other subtypes, whereas the invasive subtype, associated with relatively poor prognosis, displayed high GRS and CGRS values (Supplementary Fig. 7a-c).
In the ACRG cohort, GC patients have been classified into four molecular subtypes with different clinical outcomes. The MSS/EMT subtype, which has the poorest outcome, had relatively higher GRS and CGRS values (Supplementary Fig. 7d-f). Taken together, these results suggest that CGRS and GRS are significantly associated with molecular subtypes that differ significantly in survival.

Comparisons with other established GC signatures
To investigate the prediction accuracy of GRS and CGRS for GC prognosis, we compared them with three other published gene signatures [15,16,38]. All three signatures were significantly associated with GC prognosis in multiple cohorts (Supplementary Table 10). Since GRS and CGRS share no genes with the other signatures, we compared the signatures and the AJCC staging system by computing ROC curves in four cohorts. GRS had a larger area under the curve (AUC) than the published signatures (Fig. 4a and b), and prediction error curve analysis also indicated that GRS had a lower prediction error rate in evaluating GC prognosis (Supplementary Fig. 8). However, GRS showed no advantage over the AJCC staging system in predicting GC prognosis. In contrast, CGRS, which integrates GRS with clinical characteristics, predicted GC prognosis with greater sensitivity and specificity according to ROC analysis (Fig. 4a and b), and prediction error curve analysis revealed that CGRS had a relatively lower prediction error rate in four independent cohorts (Supplementary Fig. 8). These results demonstrate that CGRS outperforms the AJCC staging system and several published signatures in predicting GC prognosis in most of the cohorts studied.
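The comparisons above rely on ROC and prediction error curves computed with R packages. As a closely related and easily inspected measure of discrimination for survival scores, Harrell's concordance index is sketched below; it is a swapped-in illustration, not the ROC analysis behind Fig. 4, and the function name is hypothetical. Pairs with tied survival times are skipped for simplicity.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index; a higher risk score should mean shorter survival.

    A pair (i, j) is comparable when the shorter follow-up time ends in an
    observed death. Pairs with tied times are skipped (a simplification).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[j] > time[i]:            # j outlived i, whose death was observed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5        # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which makes it a convenient single-number check alongside AUC.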

Potential clinical application of CGRS
To facilitate clinical application of CGRS, we generated an easy-to-use nomogram predicting the 1-, 3-, and 5-year overall survival probability of GC patients from CGRS (Fig. 5a). Calibration of the nomogram was evaluated by plotting predicted against observed probabilities at 1, 3, and 5 years; the survival probabilities predicted by the nomogram were close to the observed probabilities at all three time points (Fig. 5b). Furthermore, decision curve analysis showed that CGRS could bring greater clinical benefit to high-risk GC patients (Fig. 5c). We also developed an online tool for conveniently applying CGRS in clinical practice (http://39.100.117.92/CGRS/). In the web application, oncologists only need to select the data type and then enter the age, AJCC stage, and nine gene expression values of an individual GC patient. Clicking the Calculate button returns the predicted 1-, 3-, and 5-year overall survival probabilities for that patient. These findings indicate that the nomogram and web application may accelerate the use of CGRS for predicting GC prognosis in clinical practice.
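A Cox-based nomogram ultimately maps a risk score to t-year survival through S(t | CGRS) = S0(t)^exp(β × (CGRS − c)), where S0(t) is the baseline survival at a reference score c. The sketch below assumes this standard Cox formulation; the baseline survival values, β, and c are hypothetical placeholders, because the fitted values behind Fig. 5a and the web tool are not reported in the text.

```python
import math

# Hypothetical placeholders: the fitted baseline survival, coefficient and
# reference score behind Fig. 5a are not reported in the text.
BASELINE_SURVIVAL = {1: 0.90, 3: 0.70, 5: 0.60}   # S0(t) at the reference score
BETA = 1.0
REFERENCE_CGRS = 3.0


def predicted_survival(cgrs: float, years: int) -> float:
    """Cox model prediction S(t | CGRS) = S0(t) ** exp(BETA * (CGRS - reference))."""
    return BASELINE_SURVIVAL[years] ** math.exp(BETA * (cgrs - REFERENCE_CGRS))


# Usage: predicted 1-, 3- and 5-year survival for a hypothetical CGRS of 3.5.
for t in (1, 3, 5):
    print(t, round(predicted_survival(3.5, t), 3))
```

A score above the reference raises the exponent above 1 and pushes every predicted survival probability down, which is what the points-to-probability axes of a nomogram encode graphically.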

Biological pathways involved in GC prognosis
To investigate the biological processes and pathways involved in GC prognosis, we dichotomized patients from the ACRG and TCGA cohorts into high and low CGRS groups using the median CGRS value of the ACRG training set. GSEA was then performed to identify prognostic biological processes and pathways, and functional networks based on significantly enriched gene sets (FDR < 0.05) were built with Enrichment Map (Fig. 6a and b; Supplementary Table 11). Intriguingly, cell cycle, RNA transcription, apoptosis, and cell metabolism pathways were significantly enriched in low CGRS patients from the ACRG (Fig. 6c-f) and TCGA cohorts (Fig. 6i-l). In contrast, extracellular matrix pathways, which play important roles in tumor invasion and metastasis, were significantly enriched in high CGRS patients (Fig. 6g and m). T cell receptor gene sets were also significantly enriched in high CGRS patients, suggesting that these patients might carry more neoantigens relevant to immunotherapy (Fig. 6h and n). Taken together, these data suggest that genes related to the cell cycle and the tumor microenvironment might be involved in GC prognosis.
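The core GSEA statistic is a weighted running sum over genes ranked by their association with the phenotype (here, high versus low CGRS). The sketch below computes only that enrichment score, following the standard weighted running-sum definition used by GSEA; it omits the permutation-based significance and FDR estimation performed by the Java GSEA program, and the function name is hypothetical.

```python
import numpy as np


def enrichment_score(ranked_genes, ranked_stats, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    ranked_genes: gene symbols sorted from most positive to most negative statistic.
    ranked_stats: the matching statistics; hits are weighted by |stat| ** weight.
    Returns the running sum's maximum deviation from zero (positive or negative).
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    if not in_set.any() or in_set.all():
        raise ValueError("gene set must be a non-empty proper subset of the ranked genes")
    hit_weights = np.abs(np.asarray(ranked_stats, dtype=float)) ** weight * in_set
    p_hit = np.cumsum(hit_weights) / hit_weights.sum()    # weighted fraction of hits so far
    p_miss = np.cumsum(~in_set) / np.sum(~in_set)         # fraction of misses so far
    running = p_hit - p_miss
    return float(running[np.argmax(np.abs(running))])
```

A positive score means the gene set concentrates at the top of the ranking (for example, genes higher in one CGRS group); a negative score means it concentrates at the bottom.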

Discussion
The current assessment of GC prognosis is mainly based on the AJCC staging system [33,37]. However, the AJCC staging system is not sufficiently sensitive or accurate in predicting the survival of GC patients with similar clinical and pathological characteristics [34,35]. Signatures based on genetic or clinicopathological features have been built to address this problem [14][15][16], but model overfitting and lack of adequate validation have largely hindered their clinical application [39]. To our knowledge, no signature integrating clinical data with gene expression profiles has been established for GC patients. Here, we developed CGRS by integrating a genetic signature with a clinical risk score for GC patients. CGRS was validated in four independent cohorts to ensure its robustness in evaluating GC prognosis (Fig. 2). CGRS showed greater sensitivity and specificity than previously published prognostic signatures according to both ROC and prediction error curve (PEC) analyses (Fig. 4; Supplementary Fig. 8), which are widely used to assess predictive power in survival analysis. Moreover, CGRS performed better than the AJCC staging system in the TCGA, ACRG, and Zhejiang cohorts (Fig. 4). In subset analysis, CGRS stratified stage III GC patients into high and low risk groups with significantly different overall survival rates. Therefore, our results indicate that CGRS is robust in predicting GC prognosis.
Another major challenge for prognostic risk scores is their complex calculation in clinical practice. Since nomograms have been developed for predicting prognosis in cancer [40,41], we established an easy-to-use nomogram based on CGRS for predicting the overall survival probability of GC patients with 95% confidence intervals, which was supported by the calibration plot and decision curve (Fig. 5). Furthermore, we developed a web-based tool (http://39.100.117.92/CGRS/) to facilitate the clinical application of CGRS. Users only need to select the data type (RNASeq, microarray, or qRT-PCR), enter the nine gene expression values, age, and AJCC stage of the individual patient, and press the Calculate button; the predicted 1-, 3-, and 5-year overall survival probabilities are then calculated for the patient. Thus, our nomogram and web application based on CGRS can easily be applied to evaluate the prognosis of GC patients in clinical practice.
In line with our expectations, several genes in GRS are also present in other prognostic signatures [42][43][44]. For example, the APOD gene, which encodes a component of high-density lipoprotein, has been reported to promote cell migration through interaction with growth factors [45], and higher APOD mRNA levels indicate poor survival in patients with breast or colorectal cancer [46,47]. TSPYL5 contributes to breast cancer progression by reducing p53 protein levels and inhibiting the expression of p53 target genes [48], and has also been documented as an independent prognostic factor in patients with breast or liver cancer [43,49]. GSDME plays an important role in pyroptosis and has been reported as a prognostic factor in oesophageal squamous cell carcinoma [50,51].
For the other genes in GRS, no association with prognosis has been reported so far. The biological functions and potential mechanisms of the nine genes in GC need further investigation.
Figure 2. … ACRG (k), Singapore (l), TCGA (m), and Zhejiang (n) cohorts. The median value from the ACRG training set was used as the cutoff to classify patients into high and low risk groups. HRs and 95% CIs were calculated using Cox regression. P values were calculated using the log-rank test. Tick marks on curves represent censoring.
Figure 3. Kaplan-Meier analysis of 5-year overall survival of stage III GC patients based on GRS or CGRS. a-f 5-year overall survival prediction of stage III GC patients according to GRS in the ACRG training set (a), ACRG validation set (b), ACRG (c), Singapore (d), TCGA (e), and Zhejiang (f) cohorts. g-l 5-year overall survival prediction of stage III GC patients according to CGRS in the ACRG training set (g), ACRG validation set (h), ACRG (i), Singapore (j), TCGA (k), and Zhejiang (l) cohorts. The median value from the ACRG training set was used as the cutoff to classify patients into high and low risk groups. HRs and 95% CIs were calculated using Cox regression. P values were calculated using the log-rank test. Tick marks on curves represent censoring.
Figure 5. Nomogram based on CGRS predicting 1-, 3-, and 5-year overall survival probability of GC patients. a The nomogram was generated using data from all platforms; 1-, 3-, and 5-year overall survival probabilities can be read from CGRS. b Calibration curves for predicting 1-, 3-, and 5-year overall survival probability of GC patients. The y-axis represents the actual overall survival probability, and the x-axis represents the nomogram-predicted survival probability. Solid vertical lines represent the 95% CIs, and the gray line indicates an ideal nomogram with 100% accuracy. c Decision curve analysis (DCA) of the nomogram at 1, 3, and 5 years, assessing its clinical benefit. The y-axis represents the net benefit, and the x-axis represents the threshold probability. Gray solid lines assume that all patients will die by 1, 3, and 5 years ("treat all"). Solid horizontal lines assume that no one will die by 1, 3, and 5 years ("treat none"). Red solid lines indicate the prediction model of the nomogram.
Figure 6. Association between CGRS and biological pathways and processes, evaluated by gene set enrichment analysis (GSEA). a, b Networks of biological pathways and processes differing between low and high CGRS GC patients in the ACRG (a) and TCGA (b) cohorts. Nodes indicate enriched gene sets; similar nodes are grouped and annotated. Node size is proportional to the number of genes within the gene set. The thickness of blue lines is proportional to the number of genes shared between gene sets. Low connective and uninformative sub-networks and nodes