3.1 Establishment and verification of a prognostic risk model based on the ARG signature
To identify ARGs related to survival, we performed univariate Cox analysis on the mRNA expression data of each ARG. We identified 116 and 22 ARGs related to OS in GC patients from the ACRG and TCGA databases, respectively. The 13 prognostic ARGs shared by both datasets were retained for subsequent analysis (Supplementary Table 2, Fig. 1A).
To further reduce the number of genes in the signature, LASSO regression analysis was performed on the 13 ARGs (Fig. 1B, C). A multivariate Cox regression analysis was then performed on the five genes selected by LASSO to develop the risk signature (Fig. 1D). Finally, based on 300 GC cases in the training set, a risk signature comprising five ARGs was constructed. The prognostic risk score was defined as the linear combination of the expression levels weighted by the regression coefficients from the multivariate Cox regression analysis: risk score = A2M \(\times\) 0.275279 - SNCG \(\times\) 0.438526 + AGTR1 \(\times\) 0.100184 + CTF1 \(\times\) 0.355774 + SERPINE1 \(\times\) 0.049352.
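As a minimal sketch of how the risk score formula above could be applied, the following Python snippet computes the score for a single patient from a dictionary of expression values. The coefficients are those reported in the text; the function name `risk_score`, the example expression values, and the assumption that the input is normalized in the same way as the training data are illustrative only and are not taken from the study.

```python
# Coefficients of the multivariate Cox model as reported in the text.
# The second gene is written SNCG here, following the gene list in Section 3.2.
COEFFICIENTS = {
    "A2M": 0.275279,
    "SNCG": -0.438526,
    "AGTR1": 0.100184,
    "CTF1": 0.355774,
    "SERPINE1": 0.049352,
}


def risk_score(expression):
    """Return the coefficient-weighted sum of the signature gene expression.

    `expression` maps gene symbol -> expression value (hypothetical input,
    assumed to be normalized the same way as the training data).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values for one patient (not real data).
example_patient = {
    "A2M": 1.0,
    "SNCG": 1.0,
    "AGTR1": 1.0,
    "CTF1": 1.0,
    "SERPINE1": 1.0,
}
print(round(risk_score(example_patient), 6))
```

With all five expression values set to 1.0, the score reduces to the sum of the coefficients, which makes the sign of each contribution easy to check: only SNCG lowers the score.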
3.2 mRNA and protein expression changes in gastric cancer
To determine the mRNA and protein levels of the five ARGs, we compared RNA-seq data of the ARGs between normal and tumor tissues in the ACRG dataset. The mRNA levels of A2M, SNCG, CTF1, and AGTR1 were significantly decreased in tumor tissues, while the mRNA level of SERPINE1 was significantly increased (Fig. 2A). We further verified the protein expression levels of the ARGs by immunohistochemistry and found that the protein levels were consistent with the mRNA expression levels (Fig. 2B, C). The five ARGs were also analyzed in five different GC datasets through the cBioPortal database. The frequencies of gene alterations in A2M, SNCG, CTF1, AGTR1, and SERPINE1, including amplifications, deep deletions, and missense mutations, were 7%, 1.7%, 0.5%, 2%, and 4%, respectively (Fig. 2D).
3.3 Prognostic value of the ARG signature in the ACRG training set
We drew a heat map showing the expression profiles of the five ARGs (Fig. 3A). We then sorted the risk scores from low to high and divided the samples into low-risk and high-risk groups based on the median value (Fig. 3B). The survival status and follow-up time of each GC patient are shown in Fig. 3C. Kaplan-Meier survival curves for the training set showed a significant difference in overall survival between the high-risk and low-risk groups (P < 0.0001) (Fig. 3D). Time-dependent ROC curve analysis at different time points was used to evaluate the signature in the ACRG training set. The AUC values for 1-year, 3-year, 5-year, and 7-year OS in the training set were 0.68, 0.67, 0.69, and 0.69, respectively (Fig. 3E). In addition, Cox regression analysis showed that the ARG signature was an independent predictor after adjusting for clinicopathological features (Fig. 3F, J).
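The median-based grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function name `split_by_median`, the patient IDs, and the toy scores are hypothetical, and assigning scores equal to the median to the low-risk group is an arbitrary tie rule chosen here.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median.

    `scores` maps patient ID -> risk score. Scores equal to the median are
    placed in the low-risk group; that tie rule is an assumption.
    """
    cutoff = median(scores.values())
    groups = {
        pid: ("high" if score > cutoff else "low")
        for pid, score in scores.items()
    }
    return cutoff, groups


# Hypothetical risk scores (not real data).
toy_scores = {"P1": 0.10, "P2": 0.85, "P3": 0.40, "P4": 1.20}
cutoff, groups = split_by_median(toy_scores)
print(cutoff, groups)
```

The resulting group labels are what a Kaplan-Meier comparison or a Cox model would take as the grouping variable.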
The recurrence status and follow-up time of each GC patient are shown in Fig. 3G. Kaplan-Meier curves for the training set showed a significant difference in RFS between the high-risk and low-risk groups (P < 0.0001) (Fig. 3H). Time-dependent ROC curve analysis at different time points was again used to evaluate the signature in the training set. The AUC values for 1-year, 3-year, 5-year, and 7-year RFS in the training set were 0.71, 0.71, 0.73, and 0.72, respectively (Fig. 3I).
To further verify the prognostic value of the ARG signature across demographic and clinical features, we performed a subgroup analysis on the data from the ACRG training set. In subgroups stratified by age, male sex, diffuse or intestinal histology, and clinical stage, the difference between the high-risk and low-risk groups remained significant (Fig. 4).
3.4 GSEA of the high-risk and low-risk groups
To further study the potential functional mechanisms underlying the different prognoses of the low-risk and high-risk subgroups of GC patients in the ACRG and TCGA cohorts, we performed GSEA with GO gene sets in both cohorts. Enrichment was found mainly in ribosomal subunit, structural constituent of ribosome, protein localization to endoplasmic reticulum, ribosome, translational initiation, cell adhesion molecule binding, establishment of protein localization to endoplasmic reticulum, and cadherin binding (Fig. 5A, B). We then performed GSEA with KEGG gene sets in ACRG and TCGA and found enrichment mainly in Ribosome, Parkinson disease, Protein processing in endoplasmic reticulum, Spliceosome, RNA transport, and Huntington disease (Fig. 5C, D). These results may help to gain insight into the cellular biological effects of the ARGs.
3.5 Verification of the prognostic value of the ARG signature in the validation set
In the validation set, we sorted the risk scores from low to high and divided GC patients into high-risk (N = 200) and low-risk (N = 133) groups based on the risk score. The risk score distribution is shown in Fig. 6A. The survival time and risk score of each GC patient are shown in Fig. 6B. A heat map of the expression profiles of the five ARGs is shown in Fig. 6C. The Kaplan-Meier survival curves of the low-risk and high-risk groups in the validation set are shown in Fig. 6D. Time-dependent ROC curve analysis at different time points was used to evaluate the signature. The AUC values for 1-year, 3-year, and 5-year OS in the validation set were 0.573, 0.597, and 0.598, respectively (Fig. 6E).
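To make the idea of an AUC at a fixed follow-up time concrete, the sketch below computes a deliberately simplified version. It is not the time-dependent ROC method used to produce the values above: patients censored before the horizon are simply dropped here, whereas dedicated survival ROC estimators account for censoring. The function name `auc_at_time` and the toy records are hypothetical.

```python
def auc_at_time(records, horizon):
    """Naive AUC of the risk score for death by `horizon`.

    `records` is a list of (risk_score, follow_up_time, event) tuples, where
    event is 1 for death and 0 for censoring. Cases died at or before the
    horizon; controls were followed beyond it. Patients censored before the
    horizon are dropped, which is a simplification.
    """
    cases = [s for s, t, e in records if t <= horizon and e == 1]
    controls = [s for s, t, e in records if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Fraction of case/control pairs in which the case has the higher score,
    # counting ties as one half (Mann-Whitney form of the AUC).
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical (risk score, follow-up in months, event) records, not real data.
toy_records = [
    (0.9, 10, 1),
    (0.2, 40, 0),
    (0.6, 20, 1),
    (0.7, 50, 1),
    (0.3, 15, 0),
    (0.1, 60, 0),
]
auc_36 = auc_at_time(toy_records, horizon=36)
print(round(auc_36, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of patients who die before the horizon above those who survive past it.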
3.6 Tumor immunity analysis
We used the CIBERSORT algorithm to process the ACRG data and examine the relationship between risk score and tumor immunity. The percentages of immune cell types in the low-risk and high-risk groups of the training set are shown in Fig. 7A. In the immune cell type-specific analysis, low-risk patients showed higher levels of activated dendritic cells, whereas high-risk patients showed higher levels of naive B cells, naive CD4 T cells, and gamma delta T cells (Fig. 7B). The correlations between immune cell types and the ARGs are shown in Fig. 7C.
3.7 Independent prognostic value of risk model and ARG nomogram
We further evaluated the prognostic value of the risk model. In the training cohort, the AUC (area under the curve) values of the risk score, age, gender, stage, T, M, and N were 0.709, 0.529, 0.501, 0.443, 0.699, 0.734, and 0.587, respectively (Fig. 8A). We then validated these factors in the validation cohort, where the AUC values of the risk score, age, gender, stage, T, M, and N were 0.573, 0.613, 0.47, 0.593, 0.556, 0.516, and 0.596, respectively (Fig. 8B). We found that the risk scoring model we established has good independent prognostic value. To better predict the prognosis of GC patients, we established a nomogram model (Fig. 8C) based on OS-related variables (age, gender, grade, stage, T, M, N, and risk score) in the ACRG database. We used ROC curves of the training cohort to evaluate the prognostic value of the nomogram model and found that the AUC values at 1, 3, and 5 years in the training cohort were 0.81, 0.806, and 0.792, respectively (Fig. 8D).
3.8 Predictive performance and clinical applicability evaluation of ARG nomogram
To further evaluate the predictive performance and clinical applicability of the prognostic nomogram, calibration curves and decision curve analysis (DCA) were performed. The calibration curves of the nomogram showed good agreement between predicted and observed outcomes in the training set (Fig. 9A-C). In addition, DCA indicated that the combined nomogram model provided the highest net benefit compared with the TNM staging system (Fig. 9D-F).
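For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability, commonly defined as TP/n - (FP/n) x pt/(1 - pt). The sketch below computes it for a binary outcome. It is a simplified illustration, not the analysis behind Fig. 9: survival data would first require an event status at a fixed time horizon, and the function name `net_benefit` and the toy inputs are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    `predicted_risks` are probabilities in [0, 1], `outcomes` are 1 (event)
    or 0 (no event), and `threshold` is the threshold probability pt.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(predicted_risks) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical predicted risks and observed outcomes (not real data).
toy_risks = [0.9, 0.8, 0.3, 0.6, 0.1]
toy_outcomes = [1, 1, 0, 0, 0]
nb = net_benefit(toy_risks, toy_outcomes, threshold=0.5)
print(round(nb, 3))
```

A decision curve repeats this calculation over a range of thresholds for each model and compares the results with the treat-all and treat-none strategies.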