Consensus Clustering Identified Two ICD-Related Subtypes
We used consensus clustering to identify ICD-related clusters among the cervical cancer samples, which separated the samples into two subtypes (Fig. 1A and 1B). Based on this typing, all samples were classified into two groups. ICD-related genes were generally expressed at low levels in cluster C1, so we designated it the ICD-low subtype. Cluster C2, in contrast, displayed high expression of ICD-related genes and was designated the ICD-high subtype (Fig. 1C).
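The idea behind consensus clustering can be illustrated with a minimal sketch: cluster many random subsamples and record how often each pair of samples lands in the same cluster. The source does not state the software, clustering algorithm, or resampling settings, so the 1-D two-means clusterer, the 80% subsampling fraction, and the toy scores below are all assumptions made for illustration.

```python
import random

def two_means_1d(values, iters=20):
    """Tiny 1-D k-means with k=2 (illustrative stand-in for a real clusterer)."""
    lo, hi = min(values), max(values)  # start the two centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if g0:
            lo = sum(g0) / len(g0)
        if g1:
            hi = sum(g1) / len(g1)
    return labels

def consensus_matrix(scores, n_resamples=200, frac=0.8, seed=0):
    """Fraction of resamples in which samples i and j cluster together,
    counted only over resamples that contain both samples."""
    rng = random.Random(seed)
    n = len(scores)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), max(2, int(frac * n)))
        labels = two_means_1d([scores[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Hypothetical per-sample ICD expression summaries: four low, four high.
toy_scores = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
cm = consensus_matrix(toy_scores)
```

Consensus values near 1 within a group and near 0 between groups indicate a stable two-cluster solution, which is the pattern a consensus heat map is meant to show.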
Patients in the ICD-high subtype survived longer than those in the ICD-low subtype (Fig. 1D), and the difference was statistically significant (P = 0.013). The expression differences between the two subtypes were further examined with a heat map and a volcano plot (Fig. 1E, F).
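Survival comparisons of this kind are usually drawn as Kaplan-Meier curves. The sketch below shows how such a curve is estimated from follow-up times and event indicators; the source does not describe its implementation, and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    Returns [(time, survival_probability)] at each time with >= 1 death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve

# Hypothetical cohort: the patient at time 2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Computing one curve per subtype and comparing them (for example with a log-rank test) yields the kind of P value reported for Fig. 1D.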
We then performed protein-protein interaction (PPI) analysis on the ICD-related genes using the GeneMANIA database (Fig. 1G), which identified IL17A, TLR4, IFNG, TNF, HMGB1, and IL10 as the core genes.
Enrichment Analysis
To characterize the biological functions of ICD-related genes, we used the GO and KEGG databases to examine their functional characteristics and potential pathways (Fig. 2A-D). In the GO enrichment analysis, ICD-related genes were mainly involved in leukocyte-mediated immunity, positive regulation of cell activation, positive regulation of leukocyte activation, lymphocyte-mediated immunity, the external side of the plasma membrane, the plasma membrane signaling receptor complex, and antigen binding, among others. In the KEGG enrichment analysis, ICD-related genes were mainly involved in cytokine-cytokine receptor interaction, cell adhesion molecules, the chemokine signaling pathway, hematopoietic cell lineage, and viral protein interaction with cytokine and cytokine receptor, among others.
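GO and KEGG enrichment of a gene list is commonly scored with a hypergeometric over-representation test. The sketch below shows that calculation; the source does not state which tool or test was used, so this is an assumption, and the parameter names are illustrative.

```python
from math import comb

def enrichment_p_value(background, term_size, list_size, overlap):
    """One-sided hypergeometric P value, P(X >= overlap).

    background: number of genes in the background set
    term_size:  number of background genes annotated to the GO/KEGG term
    list_size:  number of genes in the query list
    overlap:    number of query genes annotated to the term
    """
    upper = min(term_size, list_size)
    favorable = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(background, list_size)

# Toy example: all 5 query genes fall in a 5-gene term out of 10 background genes.
p_toy = enrichment_p_value(background=10, term_size=5, list_size=5, overlap=5)
```

In practice the P values for all tested terms are then corrected for multiple testing before terms are reported as enriched.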
We further performed GSEA to compare pathway enrichment between the ICD high and low expression subtypes (Fig. 2E–F). Different gene sets were enriched in each subtype. In the high expression subtype of ICD-related genes, allograft rejection, antigen processing and presentation, autoimmune thyroid disease, the chemokine signaling pathway, and cytokine-cytokine receptor interaction were enriched. In the low expression subtype, drug metabolism cytochrome P450, maturity onset diabetes of the young, metabolism of xenobiotics by cytochrome P450, O-glycan biosynthesis, and retinol metabolism were enriched.
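The core of GSEA is a running-sum statistic over a ranked gene list. The sketch below implements a simplified, unweighted version (each hit counts equally); standard GSEA weights hits by the ranking metric and assesses significance by permutation, neither of which is shown here. Gene names in the example are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes: genes ordered from most up-regulated in one subtype
                  to most up-regulated in the other
    gene_set:     genes belonging to the pathway of interest
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set)
    n_hit = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running = 0.0
    best = 0.0
    for g in ranked_genes:
        running += 1 / n_hit if g in members else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Placeholder gene names: a set at the top of the list scores +1,
# a set at the bottom scores -1.
toy_ranking = ["g1", "g2", "g3", "g4"]
es_top = enrichment_score(toy_ranking, {"g1", "g2"})
es_bottom = enrichment_score(toy_ranking, {"g3", "g4"})
```

A positive score corresponds to a pathway concentrated at the top of the ranking (here, one subtype) and a negative score to a pathway concentrated at the bottom (the other subtype).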
Somatic Mutations In ICD-related Genes High And Low Expression Subtypes
We analyzed somatic mutations in the high and low expression subtypes of ICD-related genes (Fig. 3A, B). The somatic mutation frequency was greater in the ICD high expression subtype than in the ICD low expression subtype (87.29% vs. 82.53%). TTN, PIK3CA, KMT2C, MUC16, and KMT2D were commonly mutated in both subtypes, but the mutation frequencies of individual genes differed significantly between subtypes; for example, the ICD high expression subtype showed a high frequency of TTN and PIK3CA mutations, whereas in the low expression subtype these mutations accounted for only 23% and 27% of the total, respectively.
Tumor Microenvironment Landscape In ICD-related Genes High And Low Expression Subtypes
We investigated the characteristics of the tumor microenvironment in the subtypes with high and low expression of ICD-related genes (Fig. 4A–D). Compared with the low expression subtype, the high expression subtype had higher stromal, immune, and ESTIMATE scores and lower tumor purity. These differences were statistically significant (P < 0.001).
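The inverse relationship between the scores and tumor purity can be illustrated with a short sketch. It assumes the scores come from the ESTIMATE method, in which the combined score is the sum of the stromal and immune scores, and it uses the purity formula published with that method for Affymetrix data. The source does not state which platform or formula was applied, so the constants below should be read as an assumption.

```python
import math

def estimate_score(stromal_score, immune_score):
    """Combined ESTIMATE score: sum of the stromal and immune scores."""
    return stromal_score + immune_score

def tumor_purity(combined_score):
    """Purity approximation published with the ESTIMATE method
    (assumed here; derived for Affymetrix-based scores)."""
    return math.cos(0.6049872018 + 0.0001467884 * combined_score)

# Hypothetical scores: a more infiltrated sample has lower estimated purity.
purity_low_infiltration = tumor_purity(estimate_score(-500.0, -500.0))
purity_high_infiltration = tumor_purity(estimate_score(500.0, 500.0))
```

Because purity is a decreasing function of the combined score over the usual score range, a subtype with higher stromal and immune scores is expected to show lower tumor purity, as reported here.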
Immune Landscape In ICD-related Genes High And Low Expression Subtypes
The CIBERSORT method was then used to investigate differences in the infiltration of 22 immune cell types between the two subtypes (Fig. 5A) and the correlations among immune cells (Fig. 5B). These analyses show the immune cell composition of each cervical cancer sample and the relationships among cell types. In particular, T cells CD8, T cells CD4 memory activated, T cells follicular helper, Macrophages M1, and Dendritic cells resting were significantly increased in the subtype with high expression of ICD-related genes, while T cells CD4 memory resting, Macrophages M0, Dendritic cells activated, Mast cells activated, and Eosinophils were significantly increased in the low expression subtype (Fig. 5C).
We then investigated the differences in the expression of human leukocyte antigen (HLA) genes and immune checkpoints between the two subtypes (Fig. 5D, E). Most HLA genes and immune checkpoints were upregulated in the high ICD-related gene expression subtype, suggesting that this subtype may be more likely to benefit from immunotherapy.
Construction And Validation Of The ICD Risk Signature
We then constructed a prognostic risk model based on six ICD-associated genes. First, univariate Cox regression analysis identified 10 genes associated with prognosis (Fig. 6A; Additional file 1: Table S1). These genes were then entered into a LASSO regression model, which retained 6 genes (ATG5, FOXP3, IFNG, IL1B, PDIA3, TNF) for the construction of the prognostic model (Fig. 6B, C).
We calculated a risk score for each sample as the sum, over the six genes, of each gene's coefficient multiplied by its expression in that sample, and used the median risk score across all samples to divide the samples into high-risk and low-risk groups, yielding the ICD-associated gene risk model (Fig. 6D). As the risk score increased, the number of patients who died also increased; the expression of the high-risk genes ATG5, IL1B, PDIA3, and TNF rose with increasing risk, while the expression of the low-risk genes FOXP3 and IFNG fell. Survival analysis showed that patients in the high-risk group of the TCGA cohort had considerably worse survival than those in the low-risk group (P < 0.01), and the GEO cohort supported this finding (Fig. 6E).
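The risk score and median split described above can be sketched as follows. The coefficient values and expression values are placeholders chosen only to match the reported direction of each gene (positive for ATG5, IL1B, PDIA3, and TNF; negative for FOXP3 and IFNG); the fitted LASSO coefficients are not given in this section.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())

def split_by_median(scores):
    """Label each sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in scores.items()}

# Placeholder coefficients (NOT the fitted values from the study).
toy_coefficients = {"ATG5": 0.4, "FOXP3": -0.3, "IFNG": -0.2,
                    "IL1B": 0.1, "PDIA3": 0.2, "TNF": 0.3}

# Hypothetical expression values for three samples.
toy_expression = {
    "sample_1": {"ATG5": 1.0, "FOXP3": 3.0, "IFNG": 3.0, "IL1B": 1.0, "PDIA3": 1.0, "TNF": 1.0},
    "sample_2": {"ATG5": 3.0, "FOXP3": 1.0, "IFNG": 1.0, "IL1B": 3.0, "PDIA3": 3.0, "TNF": 3.0},
    "sample_3": {"ATG5": 2.0, "FOXP3": 2.0, "IFNG": 2.0, "IL1B": 2.0, "PDIA3": 2.0, "TNF": 2.0},
}

toy_scores = {s: risk_score(expr, toy_coefficients) for s, expr in toy_expression.items()}
toy_groups = split_by_median(toy_scores)
```

With a strict "greater than median" rule, a sample whose score equals the median falls in the low-risk group; the source does not say how ties were handled.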
[Figure legend, partial] Cox analysis identified 6 genes most associated with OS in the TCGA dataset; (C) distribution of risk scores, survival status of patients, and expression heat map of the prognostic 6-gene signature in the TCGA database; (D, E) survival analysis between the ICD high-risk and ICD low-risk groups in the TCGA and GSE44001 cohorts.
A Nomogram Of ICD-related Genes For Cervical Cancer
We combined the clinical data and risk scores of cervical cancer patients in the TCGA database and carried out a multivariate Cox regression analysis to create a nomogram (Fig. 7A). The nomogram included age, grade, T, N, and risk, and was used to predict patients' overall survival at 1, 3, and 5 years. In addition, we used univariate and multivariate Cox regression analyses (Fig. 7B, C; Additional files 2 and 3: Tables S2 and S3) to investigate whether the risk score, compared with other clinical traits such as age, grade, T, and N, could serve as an independent predictor for cervical cancer patients. The results indicated that the risk score from our model could be used as an independent predictor.
We also used the risk score of ICD-related genes to predict OS at 1, 3, and 5 years for cervical cancer patients (Fig. 7D). The AUCs of the ROC curves for OS at 1, 3, and 5 years were 0.809, 0.695, and 0.709, respectively. To compare the performance of the risk score with that of other clinical traits in predicting OS, we also built ROC curves for the risk score and each clinical trait (Fig. 7E). The risk score had the highest AUC (0.809), indicating that, among the predictors compared, the risk model was the most accurate for predicting OS in cervical cancer patients.
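An AUC can be read as the probability that a randomly chosen patient who died has a higher risk score than a randomly chosen survivor. The sketch below computes it that way for a simple binary outcome; the time-dependent ROC analysis typically used for 1-, 3-, and 5-year OS also accounts for censoring, which this simplified version does not. The toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: predicted risk per patient
    labels: 1 if the event (death) occurred by the time horizon, else 0
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical risk scores and outcomes for four patients.
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.809, 0.695, and 0.709 should be read.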
The Association Of ICD Risk Signature With Immune Cells
Correlation analysis between the ICD risk model and immune cells showed that B cells naive, Dendritic cells resting, Macrophages M1, Mast cells resting, T cells CD4 memory activated, T cells CD8, T cells follicular helper, T cells gamma delta, and T cells regulatory (Tregs) were negatively correlated with the risk score (Fig. 8A), while Dendritic cells activated, Macrophages M0, Mast cells activated, Neutrophils, NK cells resting, and T cells CD4 memory resting were positively correlated with the risk score (Fig. 8B).
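A correlation between the risk score and an immune cell fraction can be computed as in the sketch below. It uses Spearman's rank correlation as an assumption, since the source does not name the correlation method, and the example values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: a cell fraction that falls as the risk score rises.
toy_risk = [0.2, 0.5, 0.9, 1.4]
toy_cd8_fraction = [0.30, 0.22, 0.15, 0.05]
rho = spearman(toy_risk, toy_cd8_fraction)
```

A negative coefficient, as in this toy example, corresponds to cell types that become less abundant as the risk score rises.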
Protein Expression Of ATG5, FOXP3, IFNG, IL1B, PDIA3 And TNF
We retrieved data from the Human Protein Atlas (HPA) website to examine the protein expression of ATG5, FOXP3, IFNG, IL1B, PDIA3, and TNF in normal tissues and cervical cancer tissues. Compared with normal tissue, ATG5, FOXP3, IFNG, PDIA3, and TNF were highly expressed in cervical cancer tissues (Fig. 9A–E). IL1B expression was not detected in either normal or cancer tissues because of its low expression level.