A riskscore-based nomogram for the prediction of overall survival in cervical squamous cell carcinoma

Based on the ESTIMATE algorithm, we acquired the immune/stromal scores of cervical squamous cell carcinoma (CSCC) patients collected from The Cancer Genome Atlas (TCGA) dataset. Subsequently, we analyzed the DEGs between high- and low immune score groups using R package edgeR and performed K-M analysis to illustrate the relationship between differentially expressed genes (DEGs) and the overall survival to select survival-related DEGs. Then the LASSO regression model was constructed with the package “glmnet” in R to evaluate the riskscore of each patient. Finally, we developed a nomogram composing riskscore and clinicopathological characteristics to predict the overall survival (OS) of CSCC patients. The R software v3.6.1 was used for statistical analyses. All statistical tests were two-tailed.


Background
Cervical cancer represents one of the most common solid tumors in females worldwide, with 530,000 new cases and 270,000 deaths annually around the world [1].The overwhelming majority of these new cases arise in developing countries, including China [1].With the improvement of early diagnosis rate of cervical cancer, the incidence has been decreased due to application of various screening methods and effective vaccines against HPV.Nonetheless, on account of the lack of related knowledge education, healthcare resources and economic factors, the incidence rate of cervical cancer in some underdeveloped and remote areas remains high, with a younger trend [2].There are multiple methods to treat cervical cancer, including surgical resection, radiotherapy, chemotherapy, immunotherapy and so on.However, the incidence of patients with stage I-III experiencing metastatic disease ranges from 15-61% [3].Moreover, the median overall survival for patients with disease progression varied a lot, with a range of 7 to 53 months [4], indicating that it is crucial to identify prognostic factors for cervical cancer.To our knowledge, lymph node status and the International Federation of Gynecology and Obstetrics (FIGO) stage are important prognostic factors for cervical cancer, but it seems that the prediction e ciency of these factors is limited [5].
Recently, Substantial progress has been made in immunotherapy for cervical cancer, and immune checkpoint inhibitors have been playing its role as a new therapeutic option for these patients [6].
However, clinical studies have shown the response rate of cervical cancer to checkpoint immunotherapy to be in the region to 10-25% [7,8], so it is necessary to narrow down the patients who could be expected to have satisfactory effects using ideal biomarkers [9].The tumor microenvironment (TME) consists of immune cells, stromal cells, extracellular matrix molecules and in ammatory mediators [10].In cervical cancer, the composition of the TME has an impact on survival, with a higher ratio of CD8 TILs to CD4 Tregs being associated with less extensive disease (lymph node negativity) and improved survival rates [11,12], suggesting that the understanding of the TME may help identify optimal biomarker to predict overall survival in cervical cancer and improve the e cacy of cancer treatments in the future.To our knowledge, there has not been a prognostic nomogram based on immune-related factors in cervical cancer up to now.
ESTIMATE was a R package designed by Yoshihara et al [13] to obtain immune and stromal scores based on the speci c gene expression signatures.Several reports have documented the application of ESTIMATE in multiple tumors [14][15][16][17].However, whether the prognosis in CSCC, which is the most common histologic subtype of cervical cancer, could be investigated using ESTIMATE algorithm is still unclear.In the present study, the TME and the gene expression pro le of CSCC were investigated to establish a riskscore prognostic model for CSCC based on The Cancer Genome Atlas (TCGA) database.

Data collection
The transcriptional pro les and clinical information of CSCC patients were retrieved from the TCGA website (https://cancergenome.nih.gov/).The inclusive criteria were as follows: (1) only patients with primary CSCC were enrolled, (2) no history of other malignancies, (3) only samples with RNA-sequencing data, (4) complete clinicopathological materials including age and stage, which are found to be correlated with overall survival in our study, (5) complete survival data with a survival time of more than 1 mo, (6) minimum follow-up of 60 days.Ultimately, 210 CSCC samples with complete RNA-seq data and the corresponding clinical information were subjected to consequent analyses.This study meets TCGA's publication guidelines.Immune and stromal scores were calculated by applying the ESTIMATE algorithm [13] to the mRNA expression data.

Overall survival curve
Kaplan-Meier (K-M) plots were performed to illustrate the correlation of immune/stromal scores with patients' overall survival (OS).The relationship was tested by the log-rank test.

Differentially expressed genes analysis
To screen for differentially expressed genes (DEGs), 210 CSCC patients obtained from the TCGA dataset were divided into high and low immune score groups according to the ESTIMATE results.The differentially expressed genes (DEGs) were identi ed using the package edgeR [40] in R software (Version 3.6.1;https://www.r-project.org/).The genes that satis ed the inclusion criteria of adj.P-value < 0.05 and |log FC|≥1 were identi ed as statistically signi cant DEGs and included in this study.K-M analysis was performed to illustrate the relationship between DEGs and patients' OS.The relationship was tested by the log-rank test and P < 0.01 was considered as signi cant.Volcano plot and heatmaps were generated using the "gplots" and "pheatmap" package in R software, respectively.

GO and KEGG enrichment analysis of DEGs
The Gene Ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analyses were performed by "clusterPro ler" package [41] in R to identify potential biological processes (BP), molecular functions (MF), cellular components (CC) and signaling pathways related to the DEGs.

Construction of the riskscore prognostic model
To prevent over tting [42], the least absolute shrinkage and selection operator (LASSO) regression with L1-penalty was applied, which is an appropriate solution to establish signatures if there are numerous correlated covariates with small sample size [43].In our study, the LASSO regression model was constructed with the package "glmnet" in R and the penalty parameter "lambda" was used to choose the best model based on leave-one-out cross-validation [44].Finally, based on the LASSO analysis, we extracted genes from DEGs which were regarded as signi cant in K-M analysis and their corresponding coe cients.Combining expression levels of the selected RNAs ( ) with the relative coe cients, a riskscore for each sample was calculated: , the represents the coe cient of each mRNA from the model.CSCC patients were divided into low-and high-risk groups based on the median riskscore.The K-M survival curves for risk were performed.The predictive ability of the riskscore prognosis model was assessed by "timeROC" package in R software.

Estimation of immune in ltration
The CIBERSORT algorithm [45] was applied to estimate immune in ltration cell composition including 22 types of immune cells, based on normalized gene expression pro les.Samples with the CIBERSORT P < 0.05 were considered eligible for subsequent analysis.For each sample, all the estimates of immune cell type fractions summed up to 1 [46].

Development and validation of the nomogram
Using the package of "rms" in R, a nomogram that can visualize the prognostic ability of various factors in a single gure [47] was established based on the riskscore and clinicopathological data.The concordance index (C-index) and receiver operating characteristic (ROC) analysis were used to determine its predictive accuracy.The calibration curves [48] were used to determine its discriminatory capacity.The internal validation of the nomogram was conducted by bootstraps with 1000 resamples.

Statistical analysis
The R software v3.6.1 was used for statistical analyses.All statistical tests were two-tailed.

Results
Immune scores were signi cantly related to CSCC prognosis The ow chart of our study procedure is presented in Fig.  2A, B and C).To explore the in uence of immune/stromal scores on prognosis, we divided 210 CSCC patients into high-and low-score groups and constructed K-M curves.We found that the immune scores were signi cantly positively correlated with overall survival (OS) (P = 0.005) (Fig. 2D).On the contrary, stromal scores were not associated with OS (P = 0.364) (Fig. 2E).Then, we investigated the correlation between immune/stromal scores and clinical characteristics.The results showed that the distributions of immune/stromal scores did not vary with stage (Fig. 3A and B) and age (Fig. 3C and D).

Differentially expressed genes with immune score
As stromal scores were not correlated with overall survival in this study, we only explored the difference of gene expression pro les between high-and low immune score groups to construct the prognostic model.
Finally, according to the inclusion criteria, 641 DEGs were identi ed and exhibited by volcano plot (Fig. 4A).Compared with the low immune scores group, 504 genes were upregulated and 137 genes were downregulated in the high immune score group (Supplementary Table 1).Among these DEGs, 72 DEGs were signi cantly positively correlated with OS (P < 0.01) ( Supplementary Table 2).An additional le shows this in more detail [see Additional le 1].The heatmap showed the expression pro les of the 72 genes of prognostic value of cases in low and high immune scores groups (Fig. 4B).

Functional enrichment analysis
The potential functions of the 641 DEGs were evaluated by GO analysis and KEGG pathway.The top 5 GO terms identi ed included immunoglobulin complex, antigen binding, adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains, lymphocyte mediated immunity and complement activation classical pathway (Fig. 5A).The KEGG pathway analysis revealed that the DEGs were enriched in allograft rejection, viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs) and staphylococcus aureus infection (Fig. 5B).
Construction of a riskscore model and evaluation of its predictive ability Seventy-two genes of prognostic value were included in LASSO analysis, in which the contributions of 72 genes were weighted by their relative coe cients (Fig. 6A and B).In consequence, two genes were selected for constructing the riskscore model, and the nal riskscore formula was as follows: Risk score = [-0.0115× expression level of FOXP3] + [-0.0201 × expression level of ZAP70].Then the riskscore of each sample was calculated according to the formula and the samples were devided into high-or low-risk groups based on the median riskscore.The distribution of the riskscore was indicated in Fig. 7A.As indicated in the K-M curve plot (Fig. 7B), there was a signi cant difference between high-and low-risk groups in patients' OS.Patients in the low-risk group had signi cantly longer OS than those in the highrisk group.The area under the ROC curve (AUC) of the prognostic model for OS was 0.766, 0.717 and 0.701 for the rst, third, and fth years, respectively (Fig. 7C), demonstrating that the riskscore model had good performance.
Correlation of the riskscore with immune in ltration By using the CIBERSORT algorithm, we acquired the relative fractions of 22 immune cell types.185 cases of CSCC in the TCGA dataset were enrolled for subsequent analysis after the lter criteria with P value < 0.05, with 97 cases in low-risk group and 88 cases in high-risk group.As shown in the bar plot, the T cells CD8 and the macrophage M0 were the most signi cant enrichment of immune cells among 22 immune cell types (Fig. 8A).As shown in the heatmap, the fractions of immune cells in CSCC varied signi cantly between the low-and high-risk groups (Fig. 8B).The vioplot showed the fraction of T cells CD8, the macrophage M1 and resting Mast cells was signi cantly higher in low-risk group compared with high-risk group.In contrast, the fraction of resting CD4 memory T cells, the macrophage M0 and activated Dendritic cells were signi cantly higher in high-risk group patients compared with low-risk group (Fig. 8C).
As the expresion T cells CD4 naive in all samples were estimated as zero, we eliminated this cell type in the analysis of the correlation between different tumor-in ltrating immune cells (TIICs).The corHeatmap indicated a weak to moderate correlation in the proportions of 21 types of TIICs (Fig. 8D).These results indicated that the different immune cell in ltration in patients with CSCC may be a potential prognostic biomarker and targets for immunotherapy.

A nomogram based on the riskscore model
To establish a clinically applicable method to evaluate the prognosis of CSCC patients, a nomogram was generated for OS prediction using the three prognostic factors including riskscore, age and pathologic stage (Fig. 9A).The concordance index (C-index) was 0.746.The calibration plots for the possibility of 3and 5-year survival showed an optimal agreement between the prediction by the novel nomogram (Fig. 9B-C).The AUC was 0.805, 0.723 and 0.748 for the 1, 3 and 5-year survival times, respectively (Fig. 10A).However, the AUC was 0.3, 0.375 and 0.358 for the 1, 3 and 5-year survival times (Fig. 10B) when immunescore was included as a single prognostic factor to predict patients' OS, suggesting that the nomogram has a much better predictive accuracy than the immunescore model and our method to construct a prognostic model is effctive.In brief, the nomogram has a good predictive accuracy and aid clinical management.

Discussion
Cervical cancer remains a huge challenge for public health in developing countries.The past decade has witnessed an astonishing outpouring of research on the treatment of cervical cancer.Signi cant progress has been made in precise immunotherapy, which is taking the place of chemotherapy gradually.Further, it is urgently needed to identify effective biomarkers in tumor immune microenvironment correlated with prognosis.
In our study, we applied the ESTIMATE algorithm to calculate immune and stromal scores of 210 CSCC samples obtained from the TCGA dataset, where the results showed that high immune scores were signi cantly associated with better OS.As stromal scores were were not found to be signi cantly related to prognosis, we just analyzed the DEGs between high-and low immune score groups.Then 641 DEGs were identi ed.GO analysis showed that the DEGs were mainly involved in the TME, such as immunoglobulin complex, antigen binding, adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains, lymphocyte mediated immunity and complement activation classical pathway.K-M analysis of 641 DEGs suggested that 72 DEGs were signi cantly correlated with prognosis.Subsequently, LASSO regression was applied to construct a riskscore model that could identify CSCC patients with unfavorable outcomes.Finally, we selected 2 genes including FOXP3 and ZAP70.Though the difference of expression level of FOXP3 and ZAP70 between two groups were not obvious in the heatmap (Fig. 4B), both FOXP3 and ZAP70 satis ed the criteria of logFC > 1.It may result from the expression of FOXP3 and ZAP70 being in the range of 0-2, which was all green in the heatmap and was hard to distinguish.However, it did not mean the difference of expression level of FOXP3 and ZAP70 between two groups was not obvious, as they satis ed the criteria of FC > 2. The heatmap only showed the difference of expression level between low-immune-score group and high-immune-score group, it did not re ect the contribution of predictive ability of genes to the overall survival.The AUC for the riskscore model for predicting the 1-, 3-, and 5-year survival were 0.766, 0.717 and 0.701, respectively, indicating that this model had a good performance for OS prediction and modifying the TME may impede or eliminate tumors.The genes (FOXP3 and ZAP70) that formed our riskscore model may be potential targets and provide better performance in combination.
FOXP3 is a transcription factor belonging to the family of forkhead box (FOX) protein.All the family members have a forkhead (FKH) domain which fuctions as a transcriptional repressor or activator [18][19][20] and regulate the transcription of approximately 700 genes [21].FOXP3 was found to be expressed in both regulatory T cells [22,23], based on which the detection of the FOXP3 expression in tumor samples was used as an indicator of regulatory T cells [24], and in tumor cells, including breast cancer, lung cancer, gastric cancer, thyroid cancer, colon cancer, bladder cancer and glioblastoma cell lines [25][26][27][28][29].However, the role of FOXP3 in cancer is inconsistent and even the opposite.In some types of human tumors, the expression of FOXP3 is upregulated with promotion of the development of tumors, leading to a poor prognosis.While in some other types of tumors, it is a totally different story [30].It was reported that the expression level of FOXP3 was high in the cytoplasm and nucleus as well as cancer interstitium in cervical cancer, but low in cervical intraepithelial neoplasia (CIN), and the expression of FOXP3 was even not observed in normal cervical epithelium, indicating that FOXP3 may not be involved in the initiation of this malignancy but may facilitate tumor growth [31].It was also found that the expression of FOXP3 in metastatic lymph nodes was signi cantly stronger than that in normal lymph nodes, suggesting the correlation between FOXP3 and tumor metastasis [31].The protein tyrosine kinase zeta-associated protein (ZAP70) is a protein tyrosine kinase acting in the T cell receptor signal transduction system, which has been identi ed to play a crucial role in T cell activation and other immune response [32][33][34].Studies have accumulated pointing to a correlation of ZAP70 expression in CLL cells with IGVH status [35][36][37].ZAP70 expression has been considered as an effective prognostic factor in chronic lymphocytic leukemia [37].However, the expression of ZAP70 and the impact on prognosis in cervical cancer are unclear.For all we know, there has not been a riskscore prognostic model consisting of these two gene signature in cervical cancer.
Then, we constructed a nomogram comprising riskscore, age and the pathologic stage.
The nomogram we developed was the visualization of Cox regression model speci cally to provide a more individualized prediction of outcome of each patient.Each variable had a corresponding score on the "points" scale at the top.After summing all of the assigned scores and locating it on the "total points" scale, it was easily able to discover the corresponding probability of 1-, 3-and 5-year OS.The results of both AUC curves and calibration plot showed that the nomogram we established had a good predictive accuracy.
The past decade has witnessed the evolution of the treatment of metastatic cervical cancer from cytotoxic agents to immunotherapy.A high frequency of PD-L1 expression has been reported in up to 80% of CSCC patients, making this disease a prime candidate for PD-1/PD-L1 inhibition [38].Futhermore, identifying immune mechanisms to overcome immune suppression and understanding the biomarkers of prognosis will contribute to the development of effective individualized targeted approaches.We investigated immune mechanisms and the component of TIICs subpopulation between patients in the low-and high-risk groups.The results have shown a signi cantly higher fraction of T cells CD8, the macrophage M1 and resting Mast cells in low-risk group patients, while a signi cantly higher fraction of resting CD4 memory T cells, the macrophage M0 and activated Dendritic cells in high-risk group patients.
A recent study has demonstrated the levels of CD8(+) and CD8(+) CD28(+) T cells in the early stage cervical cancer group were markedly lower than those in the CIN group and the control group and the ratio of CD4(+)CD25(+)/CD4(+) in the cervical cancer group was signi cantly higher than in the control group [39], which is consisitent with the results of our study.Based on all these results, we could draw a conclusion that the riskscore model is associated with the immunosuppressive environment.
As far as we know, the prognostic nomogram we developed is the rst one based on immune-related factors in CSCC.However, we are obliged to admit the limitations in our study.The riskscore model is necessary to be further validated in multicenter clinical trials and prospective studies.

Conclusion
In conclusion, our research developed a riskscore model that is based on two immune-related genes to predict the OS of CSCC, which provides an immunological perspective to uncover the mechanisms and assists in clinical decision making for personalized treatment.The nomogram we developed based on riskscore could predict overall survival in CSCC and may bene t those patients through individualized immunotherapy.

Figure 2 The
Figure 2

Figure 4 Comparison
Figure 4

Figure 5 GO
Figure 5

Figure 9 The
Figure 9