Identification of differentially expressed IRGs in HGSOC patients
We downloaded and normalized the gene expression profiles of 5 FT tissues and 341 HGSOC from the GTEx and TCGA databases, respectively. Using the Wilcoxon test, we finally screened 1544 genes that were differentially expressed between FT and HGSOC (Figure 1A). Of these, 99 genes served as differentially expressed IRGs after intersection with the immunologically relevant gene list downloaded from the ImmPort database. The results were visualized by heatmaps and volcano plots and are shown in Figure 1B-C.
Evaluation of survival-related IRGs
The inclusion criteria of the patients involved in our study were as follows: (1) the pathological type was HGSOC, (2) clinical parameters were completed, and (3) minimum follow-up of 90 days. Finally, 341 patients from the TCGA database were included in the training cohort to obtain the IRG prognostic signature (Additional table 1). We also aimed to use the gene expression profiles and clinical information of ovarian cancer patients from the GEO database to validate our IRG prognostic signature. Therefore, we selected 757 patients from GSE49997, GSE140082, GSE26712 and GSE32062 that met the inclusion criteria for the GEO test cohort. After normalizing the gene expression profiles in both the TCGA and GEO databases, nine survival-related IRGs were identified by performing univariate Cox proportional hazard regression analysis in the TCGA training cohort (Figure 2A).
Construction of the prognostic signature using the TCGA training cohort
Through further Lasso regression analysis and multivariate Cox regression analysis, we obtained 9 optimal survival-related IRGs and combined them into a prognostic signature of HGSOC patients. The prognostic model included HLA-F, PSMC1, PI3, CXCL10, CXCL9, CXCL11, LRP1, STAT1 and OGN. The survival prognosis of a certain patient was predicted by calculating the risk score, which was the sum of each survival-related IRG above times its coefficient, as shown in Figure 2B. The comprehensive risk score was imputed as follows: (-0.08217 × expression level of HLA-F) + (-0.12360 × expression level of PSMC1) + (0.12159 × expression level of PI3) + (0.28363× expression level of CXCL10) + (-0.07392 × expression level of CXCL9) + (-0.39712 × expression level of CXCL11) + (0.07471 × expression level of LRP1) + (-0.03230 × expression level of STAT1) + (0.11022 × expression level of OGN). Patients in the TCGA training cohort were divided into low-risk and high-risk groups according to the median risk score, which was -0.15764. As shown in Figure 2C-D, the AUC (area under the curve) values for the prognostic signature at 1, 2, and 3 years were 0.622, 0.709 and 0.670, respectively, which proved the accuracy of this signature, and the survival prognosis of the low-risk group was significantly better than that of the high-risk group (p=1.018e-8). The 3-year survival rates of the low-risk and high-risk groups were 76.7% (95% CI: 69.8–84.2%) and 55.4% (95% CI: 47.8–64.2%), respectively. The 5-year survival rates of the low-risk and high-risk groups were 45.5% (95% CI: 37.2–55.5%) and 21.2% (95% CI: 15.1–29.6%), respectively. To present the relationship between the risk score and survival status more intuitively, we ranked the HGSOC patients according to risk score. As the risk score increased, the survival time decreased gradually, and the survival status was more likely to be death. The different expression patterns of those 9 IRGs in the low-risk group and high-risk group were visualized in the heatmap (Figure 2E-G). The expression of OGN, LRP1 and PI3 in the high-risk group was relatively higher than that in the low-risk group, and the expression of CXCL9, CXCL10, CXCL11, PSMC1, HLA-F and STAT1 was lower than that in the low-risk group.
Verification prognostic signature using GEO test cohort
Next, we calculated the risk scores of 757 patients in the GEO test cohort (Additional table 2) according to the above formula. Using -0.15764 as the cut-off value, we divided 387 patients into the low-risk group and 370 patients into the high-risk group. In the GEO test cohort, the low-risk group also showed a better survival prognosis than the high-risk group (p=2.632 e-2). The 3-year survival rates of the low-risk and high-risk groups were 66.7% (95% CI: 61.6–72.3%) and 59.3% (95% CI: 54.0–65.2%), respectively. The 5-year survival rates of the low-risk and high-risk groups were 44.1% (95% CI: 37.6–51.7%) and 39.6% (95% CI: 33.1–47.4%), respectively (Figure 3A). The AUC values for the prognostic signature at 1, 2, and 3 years were 0.545, 0.581 and 0.572, respectively (Figure 3B). When patients were ranked in ascending order of the risk score, the survival prognosis (including status and time) worsened. The different expression patterns of these 9 IRGs in the low-risk and high-risk groups had the same tendency as those in the TCGA training cohort (Figure 3C-E).
Age and risk score were independent prognostic factors of HGSOC
To evaluate the independent prognostic significance of the risk score in the TCGA training cohort, we conducted univariate Cox regression analyses and multivariate Cox regression analyses, in which other clinical characteristics (including age, grade, stage) were also involved. All the factors were treated as category variables. As a result, age and risk score were considered independent prognostic factors of HGSOC (Figure 4A-B). Elderly patients might have had a worse survival outcome (hazard ratio of death = 1.016 [95% CI, 1.002-1.029; p=0.020], and patients who had a higher risk score were more likely to suffer a poor prognosis (hazard ratio of death = 2.876 [95% CI, 2.084-3.969; p<0.001]). In addition, patients aged ≥60 years had a higher risk score than patients aged <60 years (p=0.018). However, there was no significant relationship between the risk score and pathological stage (Figure 4C-D). Finally, for better prediction of the 1-, 3-, and 5-year survival rates of HGSOC patients, we constructed a prognostic nomogram combining risk score, age and stage (Figure 4E). The total points were used to evaluate patient survival outcomes.
Immunological feature differences between the low-risk group and the high-risk group
First, we investigated the contribution of immune cells and stromal cells in tumour tissues of the TCGA training cohort. Based on the ESTIMATE algorithm, the stromal score, immune score and ESTIMATE score were computed, which represented stromal content, immune infiltration and tumour purity in tumour tissue, respectively. Tumour tissues in the high-risk group have fewer immune elements(p=0.028) and tend to have more stromal components, although the differences were not significant (p=0.126). There was also no difference in tumour purity between these two groups(p=0.379) (Figure 5A-C). Second, regarding the relationship between immune cell infiltration conditions and risk scores, we found that as the risk scores increased, the proportions of neutrophil, DCs, CD8+ T cells, CD4+ T cells and B cells decreased (the correlation indices were -0.122, -0.202, -0.326, -0.160 and -0.198; the p values were 0.026, 1.909e-4, 9.165e-10, 0.003 and 2.658e-4, respectively). The risk scores had no significant correlation with macrophages (p=0.889) (Figure 5D). Then, regarding the difference in HLA-related gene expression between the low-risk group and the high-risk group, we found that among 24 HLA-related genes, 21 genes were expressed at higher levels in the low-risk group than in the high-risk group (including HLA-F, HLA-E, HLA-DRB1, HLA-DRA, HLA-DQA2, HLA-DPB2, HLA-DPB1, HLA-DOB, HLA-DMB, HLA-DMA, HLA-C, HLA-B, HLA-A) (Figure 5E). This result indicated that patients in the low-risk group might have greater antigen processing and presentation capability. Finally, we also used gene set enrichment analysis (GSEA) to analyse the pathways enriched in patients in the low-risk group and high-risk group. There were 14 gene sets significantly enriched at p<0.05 in the high-risk group and 24 gene sets significantly enriched at p<0.05 in the low-risk group, which are listed in additional table 3. We selected 5 immune-related or meaningful pathways enriched in the low-risk group and high-risk group and showed them in Figure 5F. As a result, ECM receptor interaction, hedgehog signalling pathway, focal adhesion, adherens junction and tight junction were key pathways enriched in patients in the high-risk group. Intestinal immune network for IgA production, primary immunodeficiency, antigen processing and presentation, autoimmune thyroid disease and allograft rejection were key pathways enriched in patients in the low-risk group.
Identification of the potential transcription factors that regulate the expression of the nine prognostic IRGs
There were 32 transcription factors which differentially expressed between FT and HGSOC. After expressive correlation analysis, we found 3 transcription factors could regulate the expression of the nine IRGs in prognostic signature. As shown in Figure 5G, FOS, NR4A1 and NR2F1 could upregulate the expression of LRP1. NR2F1 could not only upregulate the expression of OGN, but also downregulate the expression of CXCL11 and CXCL10.