Identification and functional annotation of DEARGs
The flow diagram of the integral analysis is described in Figure 1. We obtained 217 autophagy genes expressed in 306 CC patients and 8 normal controls. Among these expressed autophagy genes, there were 53 differentially expressed genes (Figure 2a), including 21 down-regulated and 32 up-regulated (FDR < 0.05, |log2FC| > 1). The box plot revealed the expression pattern of differential genes in tumor and normal samples (Supplementary Figure 1a), all DEARGs were represented in Additiona1 file 4. To rummage the signal pathways of DEARGs, we performed GO and KEGG enrichment analysis (Figure 2b,c). In the term of biological process (BP), the enriched pathways such as autophagy, macroautophagy, cellular response to external stimulus, intrinsic apoptotic signaling pathway and extrinsic apoptotic signaling pathway closely related to cell proliferation and migration. For the cellular component (CC) part, autophagosome, autophagosome membrane and vacuolar membrane were involved in autophagy. The results of molecular function (MF) of the genes showed these genes participated in many binding reactions. KEGG analysis showed 53 DEARGs were included in critical pathways associated with cancer development, such as apoptosis, autophagy – animal, platinum drug resistance and p53 signaling pathway. The detailed results of significant top 8 GO annotation and top 10 KEGG pathway enrichment were shown in Additiona1 file 5. The PPI network showed the connection between the various genes (Figure 2d). The hub genes might be identified by the number of connections, CASP3 had the most connected nodes with other genes (Supplementary Figure 1b).
Construction and verification of autophagy prognostic model in the training set
We first performed univariate Cox regression analysis to find autophagy gene signatures related to OS and RFS respectively. Among the 35 genes related to the OS, 13 genes were protective with HR < 1, and 22 genes were dangerous with HR > 1 (Figure 3a). Among the 9 genes related to the RFS, 3 genes were protective with HR < 1, and 6 genes were dangerous with HR > 1 (Figure 3b). Lasso regression was adopted to further screen variables to ensure the stability of our model, and finally 20 candidate genes related to OS and 8 candidate genes related to RFS were obtained (Figure 3c-f). Multivariate Cox regression analysis was utilized to determine the genes that constructed the predictive model and their regression coefficients. For OS and RFS models, 9 autophagy-related gene signatures (VAMP7, MTMR14, ATG4D, KLHL24, TP73, NAMPT, CD46, HGS, ATG4C) and 3 autophagy-related gene signatures (SERPINA1, HSPB8, SUPT20H) were gotten. The coefficient of each gene was shown in Table 1. Moreover, we could find that at the level of mRNA or protein, not all risk signatures related to survival were differential genes (Supplementary Figure 2a-l). The results from the K-M analysis of single-gene illustrated 8 genes were significantly related to survival, including five protective genes like ATG4D, KLHL24, TP73, HSPB8, SERPINA1 (P = 0.025, 0.009, 0.030, 0.050, 0.014 respectively) and three dangerous genes like CD46, HGS, SUPT20H (P = 9.609e−04, 0.007, 0.014 respectively), which was consistent with the Cox regression analysis (Supplementary Figure 3a-l). Simultaneously, it could be speculated that these risk genes might have a synergistic effect, not just individual genes. According to the expression of model genes and regression coefficients to calculate the patient's risk score, the formula was as follows:
Among them, coef represents the coefficient and xi represents the relative expression value of each risk signature.
The median risk score was applied as a cut-off value to divide patients into high- and low-risk groups. The OS and RFS survival analysis respectively between the two groups were significantly different (Figure 4a, c, P = 2.636e−08 for OS, P = 2.104e−03 for RFS),lower risk score generally presages a better survival prognosis.
The AUC of ROC analyses at 1, 3, and 5 years was 0.783, 0.830, and 0.824 for OS and 0.682, 0.793, and 0.843 for RFS (Figure 4b, d,). Additionally, the risk score, the number of survivals and the expression of model genes between the high- and low-risk group had a difference. The expression of protective genes was higher in the low-risk group, and the expression of dangerous genes was higher in the high-risk group. The low-risk groups had fewer deaths than the high-risk groups (Figure 4e-j).
OS and RFS prognostic risk models were independent predictive indicators in the CC patients of the TCGA database
We further analyzed whether the risk score could assume independent predictive indicators for CC patients. The clinical information was given in Table 2. Univariate Cox regression analysis showed that age (P = 0.030, HR = 1.022), stage (P < 0.001, HR = 1.811), T (P = 0.017, HR = 1.397), BMI (P = 0.036, HR = 0.954) and the risk score (P < 0.001, HR = 2.973) correlated with OS of CC patients (Figure 5a). Differently, only risk score (P < 0.001, HR = 2.867) was associated with RFS of CC patients (Figure 5c). Next, we performed multivariate Cox regression analysis using the above clinical parameters and risk score. The results showed that the prognostic risk model could serve as an independent prognostic indicator to predict OS and RFS in patients respectively (P < 0.001, HR = 2.759 for OS, P < 0.001, HR = 2.785 for RFS) (Figure 5b,d). Furthermore, multivariate Cox analysis showed that stage significantly correlated with OS (P = 0.009, HR = 1.436). These results demonstrated that both prognostic models could be independently used to predict OS and RFS in CC patients. The nomogram calculated the survival prediction value of the individual at 1, 3, and 5 years based on the total score and the probability of the outcome event (Figure 5e, f). The calibration curve showed that the predicted risk was consistent with the actual risk (Supplementary Figure 4a-f).
Verification of autophagy-related predictive signatures in the testing group
The effect of the prediction model is likely to be different due to changes in scenarios and populations. To verify the validity of the prognostic model and improve its generalizability, we conducted external verifications on the OS and RFS prognosis model. Since other appropriate gene expression data and clinical information about CC patients were not available, we chose UCEC and HNSCC as external verification data. We downloaded the transcriptome data of UCEC patients and clinical data including OS from the UCSC Xena database, then calculated the risk score based on the expression of nine OS model genes and regression coefficients in the UCEC patients from our testing group. Furthermore, GSE117973 containing mRNA expression information and PFS clinical data was used for validating the prognostic model of RFS and risk score was calculated. In the OS and RFS validation sets, patients were divided into high- and low-risk groups based on the calculated median risk score respectively. K-M survival curves showed that in UCEC patients and HNSCC patients, the higher risk score was associated with adverse outcomes (Supplementary Figure 4g, i). The AUC also proved that the risk signature had good accuracy with 0.571, 0.635, and 0.669 at 3, 5 and 7 years respectively for the OS of UCEC patients. For the PFS of HNSCC patients, the AUC was 0.443, 0.571, 0.635 at 1, 3 and 5 years respectively (Supplementary Figure 4h, j).
Diagnostic value of risk signatures in GEO consolidated data sets in CC
We identified the diagnostic application of risk signatures in the combined data using ROC curve analysis. The AUC for ATG4C, ATG4D, CD46 , HSPB8, MTMR14 and SUPT20H were 0.704, 0.640, 0.697, 0.627, 0.743 and 0.705 respectively (95% CI = 0.612-0.796, 0.542-0.737, 0.603-0.791, 0.528-0.726, 0.654-0.832, 0.614-0.796), (Figure 6a-f). This demonstrated that risk signatures were potential diagnostic markers. Diagnostic values of other risk signatures without statistical significance were shown in Supplementary Figure 5a-f.
Single-gene Gene Set Enrichment for the risk signatures
Single-gene GSEA of the risk signatures has shown the potential roles of the genes in CC. The visualized results were filtered with P < 0.05 and FDR < 0.25, as shown in Supplementary Figure 6a-k. These genes were enriched in six different signal pathways, namely KEGG_ REGULATION_OF_AUTOPHAGY (ATG4C, ATG4D, KLHL24, TP73), KEGG_ MTOR_SIGNALING_PATHWAY (HGS, VAMP7), KEGG_NEUROTROPHIN_SIGNALING_PATHWAY (KLHL24), KEGG_PROSTATE_CANCER (SERPINA1, SUPT20H), KEGG_PROSTATE_CANCER (SUPT20H) and KEGG_FOCAL_
ADHESION (SUPT20H). Details were shown in Additiona1 file 6.
Risk score could be applied as a sensitive indicator of target therapy
Pearson correlation analysis manifested our OS-related risk score was significantly related to the mRNA expression level of AKT1(cor = 0.1565, P = 0.0075), BCL2 (cor = -0.2589, P < 0.0001), MTOR (cor = 0.2013, P = 0.0006), TP53 (cor = 0.2013, P = 0.0006), VEGFA (cor = -0.3215, P < 0.0001) and RFS-related risk score was correlated with the mRNA expression level of AKT1 (cor = 0.2630, P = 0.0002), TP53 (cor = 0.1744, P = 0.0130), VEGFA (cor = 0.2093, P = 0. 0028), as shown in Figure 6g-n. These results pointed out that patients with higher OS-related risk score and RFS-related risk score might respond better to therapies targeting AKT1 and VEGFA, patients with lower OS-related risk score might respond better to therapies targeting BCL2. Besides, patients with higher OS-related risk score might be more sensitive to drugs targeting MTOR. Interestingly, patients with higher OS-related risk score and RFS-related risk score had opposite sensitivity to TP53, which was worth exploring further.
RNA expression in CC and normal samples
Finally, among the 12 autophagy biomarkers related to the survival of cervical cancer, based on literature search, we selected 6 genes (ATG4C、ATG4D、CD46、TP73、SERPINA1 and HSPB8) to verify their RNA expression in CC and normal cervical tissue samples. The results showed that except for SERPINA1, the other five genes were down-regulated in cervical cancer(Figure 7a-f, all P < 0. 05). This result shows that ATG4C, ATG4D, CD46, TP73 and HSPB8 may play a protective role in the progression of cervical cancer. Interestingly, ATG4D, CD46, TP73 and HSPB8 were negatively correlated with the risk score, which is consistent with the experimental results. ATG4C is positively correlated with the risk score, which is contrary to the above qPCR results.