3.1 Acquisition of Candidate ARLs in CESC
The flowchart illustrates the main idea and content of this study (Fig. 1). We obtained 202 ARLs by Pearson correlation analysis with screening conditions of correlation coefficients > 0.4 and p < 0.001 (Table S3). And the co-expression relationship between ARGs and ARLs was demonstrated by plotting the sankey diagram (Fig. 2A).
3.2 Construction of Prognostic Signature of ARLs
In order to enhance patient prognosis and further the development of precision medicine, we created a prognostic model relying on ARLs in CESCC patients. Initially, we integrated the expression of ARLs with the clinical survival data from the TCGA-CESC cohort. We obtained 11 OS-related ARLs by univariate regression analysis (P < 0.05) (Fig. 2B) and lasso regression analysis, and analyzed their regression coefficients and cross-validation trends (Fig. 2C,D). Finally, we identified 7 candidate ARLs by multifactorial regression analysis, namely MIR210HG,AP001528.1,AC119427.1,AC124045.1,PTPRD-AS1,LINC00683 and KIAA0087.A prognostic prediction model was developed based on these 7 ARLs and this formula was obtained by weighting the regression coefficients on the identified ARLs: Risk Score=(0.3546×ExpressionMIR210HG)+(0.7189×ExpressionAP001528.1)+(0.3485×ExpressionAC119427.1)+(-0.5958×ExpressionAC124045.1)+(0.6929×ExpressionPTPRD-AS1)+(-1.7296×ExpressionLINC00683)+(2.2408×ExpressionKIAA0087). In addition, we also conducted co-expression analysis of ARGs and ARLs (Fig. 2E), based on the results we can see VEGFA, NRP1 and APP have more correlation with OS-related ARLs compared to other genes.
3.3 Validation of the prognostic value and signature independence of the model
To verify whether the Signature constructed by 7 OS-related ARLs has good prognostic value for CESC patients, we analyzed and evaluated the model. We divided CESC patients into high-risk and low-risk groups according to their risk scores for subsequent validation of the model. First, we ranked the CESC patients by risk score in the All, Train, and Test groups and plotted the patients' survival status as scatter plots, and the same trend of increasing mortality with increasing risk score was observed in all three groups (Fig. 3A-F). Also, we used Kaplan-Meier analysis. and in the All group it could be seen that the prognosis and clinical outcome of patients in the high-risk group were much worse than those in the low-risk group (p < 0.001) (Fig. 3G). In the time-dependent ROC curves, the AUCs at 1,3 and 5 years were 0.792, 0.735 and 0.742, respectively (Fig. 3J). survival curves in the Train group also showed significantly better clinical outcomes for patients in the low-risk group than in the high-risk group (P < 0.001) (Fig. 3H). In the time-dependent ROC curves, the AUCs at 1,3 and 5 years were 0.781, 0.724 and 0.745, respectively (Fig. 3k). The Kaplan-Meier analysis for the Test group led to the same conclusion as the previous two groups (p = 0.04) (Fig. 3I).The ROC curves for the Test group showed AUCs of 0.936, 0.762 and 0.830 (Fig. 3L). The ROC curves for all three groups showed high specificity and accuracy of this model. We also performed a Multi-index ROC analysis, and from the results it is clear that risk scores have an advantage over traditional clinicopathological characteristics in predicting the prognosis of CESC patients(Risk,AUC = 0.705).
3.4 Principal Components Analysis
By performing principal components analysis (PCA) analysis on all genes, all ARGs, all ARLs and the candidate lncRNAs(Fig. 4A-D), we could observe whether they could clearly distinguish the patients in the low-risk and high-risk groups. The results of Fig. 4D's study demonstrate that the model we developed is remarkably predictive because the lncRNAs used to build the model are able to discriminate patients in the low-risk group from those in the high-risk group more effectively than the other three.
3.5 Association between clinicopathological features and risk scores
Figure 5A shows the rickscore, age, gender, grade, T stage, M stage and N stage of the CESC sample in the TCGA database. Figure 5B-F details the proportion of patients with different clinicopathological characteristics in the high and low risk groups. By analyzing the correlation between risk scores and clinicopathologic features, we can conclude that the frequencies of the four clinicopathologic features, grade, T stage, M stage and N stage, differed more significantly between the high and low risk groups.
3.6 Survival analysis of groups with different clinicopathological characteristics
We used the R package "survival" and "survminer" to analyze the survival of subgroups with different clinicopathological characteristics in order to further understand whether the prognosis of patients in the high and low risk groups differed among the subgroups. The samples were separated into various subgroups according to age (65 years old, > 65 years old), grade (G1-2, G3-4), T stage (T1-2, T3-4), N stage (N1, N2), and gender (only female).Subsequently, we assessed survival in subgroups and plotted the corresponding Kaplan-Meier curves (Fig. 6). Overall survival was substantially worse in the high-risk group than in the low-risk group in all remaining clinical categories, with the exception of age = 65 and N0 stage. These findings suggest that a risk-prognosis model based on seven ARLs associated with prognosis can predict the prognosis of different clinical subgroups of CESC patients more accurately.
3.7 Development of nomogram based on clinicopathological features
To clarify whether the risk-prognosis model and different clinicopathological characteristics could be used as independent prognostic indicators for patients with CESC, we performed univariate and multifactorial Cox regression analyses of the TCGA-CESC cohort. In terms of the results of the univariate Cox regression analysis, the risk score was a significant prognostic predictor (P = 0.007, HR = 1.098, 95CI = 1.026–1.175) (Fig. 7A). Follow-up multifactorial Cox regression analysis further validated that the risk score had superior independent predictive power of prognosis (P = 0.003, HR = 1.109, 95CI = 1.035–1.188) (Fig. 7B). We used the patient's age, T stage, M stage, N stage, grade, and riskscore to construct the nomogram (Fig. 7C), which can be used to predict the survival of CESC patients at 1, 3, and 5 years, and to quantify the patient's OS using the nomogram. The nomogram combines the prognostic model developed by the 7 ARLs with other clinicopathological features, which helps to improve the model's deficiencies and thus improves the prognostic accuracy. The 1,3 and 5-year calibration curves of the column line graph likewise illustrate good agreement between predicted and observed outcomes, demonstrating its powerful ability to predict prognosis (Fig. 7D).
SC cohort.
3.8 Functional Enrichment Analysis
Prior to the functional enrichment analysis, we obtained 334 differentially expressed ANGs, and details of these genes are visible in Table S4. To clarify the relationship between risk scores and certain biological functions and signaling pathways, we performed GO functional analysis on differentially expressed genes. We used FDR < 0.05 and p < 0.05 conditions to screen for significantly enriched items. According to the results of GO functional analysis (Fig. 8A), Biological Processes (BP) mainly included cilium organization,cilium assembly,epidermis development,skin development, Cellular Component (CC) mainly includes motile cilium, apical part of cell, apical plasma membrane, Cellular Component (CC) mainly includes motile cilium, apical part of cell, apical plasma membrane, cytoplasmic region, plasma membrane bounded cell projection cytoplasm, and immunoglobulin complex.However, unfortunately, no significantly enriched items related to Molecular Function (MF) were screened accordingly. After GSVA analysis, we found 45 significantly enriched pathways (Fig. 8B,Table S5). From GSEA analysis, we obtained pathways active in both high risk and low risk groups (Fig. 8C,Table S6). Finally, we were surprised by the results of these analyses and found that the results of partial functional enrichment analysis were closely linked to cellular immune responses. To elucidate in detail the immune landscape in the high and low risk groups in CESC patients, we performed a systematic immune correlation analysis of the TCGA-CE
3.9 Risk Score Predicts TME and Immune Cell Infifiltration
The tumor is infiltrated by multiple immune cells, and these immune infiltrating cells have both tumor-promoting and anti-tumor effects[22]. Immunotherapy, a novel and promising therapeutic strategy, is also frequently used in the treatment of cervical cancer, and its therapeutic efficacy is closely related to the immunogenicity of TME. On the basis of this, we conducted a systematic study of immune infiltration and immunological function. We used seven different algorithms, XCELL, TIMER, QUANTISEO, MCPCOUNTER, EPIC, CIBERSORT-ABS, and CIBERSORT, to analyze the correlation between different risk scores and the number of immune infiltrating cells (Fig. 9A). Then, in order to compare the immunological infiltration in the high-risk and low-risk groups, we applied the CIBERSORT algorithm. We observed significant distinctions in the expression of B cells naive, Plasma cells, T cells CD8, Tregs, Macrophages M0, Macrophages M2, and Mast cells activated between high and low risk groups (Fig. 9B).Based on this, we speculate that the expression and immune activity of these cells may be affected by ARLs, but the shortcoming is that many experimental studies are still required to verify this speculation. Immunotherapy is seen as a promising therapeutic strategy for tumors, and immune checkpoint inhibitors can be used as immunotherapy by reducing the immune escape of tumor cells [23]. Immune checkpoints are important for immunotherapy, therefore, we performed an analysis of immune checkpoints between high and low risk groups(Fig. 9C).It is notable that only two genes, CD44 and TNFSF9, were notably downregulated in the low-risk group compared to the high-risk group, while CD40LG, CD27, BTLA, CD200, CD28, PDCD1, HHLA2, IDO2, VTCN1, TNFRSF14, CD160, IDO2, LAG3, LAGLS9, CD48, and TIGIT16 genes were significantly upregulated. The expression product of CD44, a non-kinase transmembrane glycoprotein, is a cancer stem cell marker that binds to ligands and induces cell proliferation, increases cell survival and increases cell viability, thereby mediating tumor progression and metastasis[24] [25; 26]. It has been shown that TNFSF9 can promote metastasis of pancreatic cancer through Wnt/Snail signaling and macrophage M2 polarization [27]. It has also been shown that by targeting TNFSF9 can inhibit the growth and metastasis of prostate cells [28].However, unfortunately, there are no relevant experiments in cervical cancer that can draw similar conclusions as the previous ones. However, based on this we can make a wild guess that low expression of CD44 and TNFSF9 in the low-risk group makes their prognosis better than that of the high-risk group. The risk score combined with patient-specific immune checkpoint gene expression can be used clinically to determine and adjust relevant treatment regimens, thus allowing patients to better benefit from clinical treatment. The above-mentioned studies on immune cell infiltration and immune checkpoints have yielded promising results, but our immune-related studies did not stop there. Since changes in immune cell infiltration usually lead to changes in immune function, we further studied the immune function of CESC patients by ssGSEA analysis. Cytolytic activity,HLA,and inflammation promoting were much lower in the high-risk group compared to the low-risk group (Fig. 9D). Finally, we also performed tumor immunosuppression and rejection (TIDE) analysis on the risk model. The implementation of immune checkpoint inhibitor therapy was less successful for patients in the high-risk group due to the TIDE scores in the high-risk group being significantly higher than those in the low-risk group (P༜0.05), which indicated a higher chance of immune escape in the high-risk group(Fig. 9E).
3.10 Analysis of Tumor Mutation Burden
Tumor mutation burden (TMB) represents the number of cancer mutations, with more mutations in cancer cells producing more neoantigens and thus increasing the chance of T cell recognition. Immune checkpoint inhibitor therapy seems to have superior clinical results in patients who have high TMB[29]. In the CESC sample, we examined tumor mutations in the high- and low-risk categories. Mutations occurred in 81.5% of the samples in the low-risk group, with TTN (29%), PIK3CA (27%), KMT2C (18%), and MUC16 (16%) being the top four genes with high mutation frequency (Fig. 10A). In the high-risk group, 82.61% of the samples were mutated, and the top four genes with high mutation frequency were still TTN (30%), PIK3CA (30%), KMT2C (19%), and MUC16 (17%) (Fig. 10B). PIK3CA mutations induce the β-catenin/SIRT3 signaling pathway thereby promoting glycolysis and proliferation of cervical cancer cells, thereby accelerating cancer progression[30]. Further research has revealed that patients with positive PIK3CA mutations have lower overall survival when treated with cisplatin. Hence, this population of patients may benefit from PIK3CA inhibitor therapy in conjunction with cisplatin [31]. TMB between the low-risk and high-risk groups was compared, and it was discovered that there was no discernible difference between the two groups (P = 0.28) (Fig. 10C). We also performed an in-depth analysis regarding TMB, dividing CESC patients into low and high mutation groups and performing Kaplan-Meier analysis, and discovered that the high mutation group's survival rate outperformed the low mutation group by a wide margin(p = 0.04) (Fig. 10D). Finally, we combined the two characteristics of TMB and risk score and divided the sample into four groups, H-TMB + high risk, H-TMB + low risk, L-TMB + high risk, and L-TMB + low risk, and then conducted Kaplan-Meier analysis and found that the survival rate of the H-TMB + low risk group was significantly higher than the other three groups, while the prognosis of L-TMB + high risk group was the worst (Fig. 10E). When we use the built model with TMB to forecast a patient's prognosis, we may get a more comprehensive and accurate prediction result.
3.11 Drug Sensitivity Analysis
Figure 11 shows the 12 popular immunotherapeutic drugs with notable variations in chemosensitivity across individuals at high- and low-risk. We discovered that Afatinib (P = 1.9e-05), Erlotinib (P = 0.00063), Gefitinib (P = 0.00087), Ibrutinib (P = 0.00041), Lapatinib (P = 7.6e-05), Sapitinib (P = 1.5e -05), Trametinib (P = 0.00017), Ulixertinib (P = 5.8e-05), and VSP34_8731 (P = 0.00019) displayed better IC50s in the low-risk group compared to the high-risk group. However, three drugs, AZD4547, EPZ004777, and GNE-317, had high IC50 in the high-risk group. Based on the risk score, we can study the response of CESC patients during immunotherapy in more depth and thus make the drug treatment more precise.
3.12 Validation of the built model by RT-qPCR
We explored the expression of the seven OS-related ARLs involved in modeling in cervical cancer cells and normal cervical cancer epithelial cells by qRT-PCR experiments. We found that the expression of AC119427.1(Fig. 12A) and MIR210HG(Fig. 12E) was significantly lower in normal tissues than that in cervical cancer tissues. However, AP001528.1(Fig. 12B), LINC00683(Fig. 12D) and PTPRD-AS1(Fig. 12F) were highly expressed in normal tissues. From the experimental results, we speculate that the expression status of these lncRNAs may be closely related to the development of cervical cancer.