Immune subtype analysis of TCGA-OC datasets
We searched, downloaded and organized the OC database in TCGA. We obtained 546 TCGA-OC samples, including 500 tumor tissue samples, 2 tumor metastasis samples, and 44 normal tissue samples. After normal and tumor metastasis samples were filtered, 345 OC samples were collected. Samples with a survival time of less than 30 days were filtered, and 321 OC samples were obtained for subsequent analysis.
Stromal cells are regarded to play an important role in tumor development and drug resistance, so it is necessary to analyze the tumor purity. We analyzed the tumor purity of the 321 OC samples based on Estimation of STromal and Immune cells in MAlignant Tumour tissues using Expression data (Supplementary Table S1). We found that the tumor purity was negatively associated with the stromal score, immune score and estimate score (Table 1), and patients with low tumor purity had more immune cells (Supplement Table 1). Furthermore, we evaluated the proportions or abundances of various immune cells using bioinformation through CIBERSORT (Fig. 1). The results of the immune cell abundance matrix were analyzed using ssGSEA, and hierarchical agglomerative clustering was analyzed. R package NbClust (Charrad et al., 2014) can be used to evaluate the optimal clustering of samples; the number of clusters varied from 2 to 10, and then the optimal cluster number was determined to be 2 using the evaluation index (Figs. 2A–2C,). We classified the 321 TCGA-OC cohorts into high-immunity subtype (S1) and low-immunity subtype (S2) for the pursuant analysis (Figs. 2D and 2E). From Fig. 3A, we found that the S1 subtype had higher stromal and microenvironment scores.
TABLE 1. Analysis of TumourPurity in TCGA-OC cohorts
|
TumourPurity
|
StromalScore
|
ImmuneScore
|
ESTIMATEScore
|
TumourPurity
|
1
|
-0.89
|
-0.89
|
-0.99
|
StromalScore
|
-0.89
|
1
|
0.62
|
0.89
|
ImmuneScore
|
-0.89
|
0.62
|
1
|
0.90
|
ESTIMATEScore
|
-0.99
|
0.89
|
0.90
|
1
|
The results of immune infiltration were compared and analyzed according to the immune subtypes (S1 subtype and S2 subtype) (Fig. 3B). The abundances of immune infiltration of T cells CD8, T cells CD4 memory-activated, macrophages M1, mast cells resting and regulatory T cells (Tregs) in the S1 subtype were significantly higher than those in the S2 subtype (Fig. 3B). Moreover, DCs resting, macrophages M0 and mast cells activated had higher abundances in the S2 subtype than in the S1 subtype. (Fig. 3B).
Analysis of differentially expressed genes in S1 and S2 subtypes
We assessed the differential expression genes (DGEs) in the two subtypes. A total of 1111 genes were screened from the two subtypes for the subsequent analysis visualized in the volcano and heatmap plots (Figs. 3C and 3D; Supplementary Table S2). We analyzed the gene functions of these DGEs using the KEGG pathway and GO. The GO analysis demonstrated that in the biological process group, these DEGs were mainly involved in immune response-activating cell surface receptor signaling, immune response-activating signal transduction, regulation of immune effector process, adaptive immune response based on somatic and lymphocyte-mediated immunity (Fig. 4A). The external side of plasma membranes, immunoglobulin complexes, and contractile fibers enriched in the cellular component group (Fig. 4B). In addition, in the molecular function group, antigen binding, actin binding, immune receptor activity, and immunoglobulin receptor binding were associated with DGEs (Fig. 4C). We conducted the KEGG pathway analysis and found that these immune-related DGEs were involved in several pathways, including cell adhesion molecules, Staphylococcus aureus infections, viral protein interactions with cytokines and cytokine receptors (Fig. 4D).
Moreover, we selected the top GSEA results of the GO enrichment and KEGG pathway analysis (Figs. 4E-4P). In the GO enrichment, adaptive immune responses, immunoglobulin complexes, complement activation, and lymphocyte-mediated immunity (Fig. 4E) were associated with DGEs of the two subtypes. The DGEs involved in the functions of cell cycles were down-regulated (Figs. 4H and 4I). In the KEGG pathway analysis, cytokine receptor interactions, cell adhesion molecules and hematopoietic cell lineages were associated with DGEs (Fig. 4K). The DGEs involved in the pathway of vascular smooth muscle contraction were up-regulated in the OC patients (Fig. 4O).
The screening for prognostic signatures and the establishment of the prognosis model
We performed the univariate Cox regression analysis to screen IRGs. Survival was used for the univariate Cox regression analysis. We selected the 18 most significant genes from the univariate Cox regression analysis for the next LASSO Cox analysis to build the risk model (Table 2). The 8 IRGs from the 1111 IRGs associated with OC patients were significantly related to OS (Table 2). Next, LASSO was performed to conduct a prognosis risk model using the significant IRGs. Eighteen IRGs were significant in the univariate Cox regression analysis using risk models (Fig. 5). After LASSO screening, we used multiple assays to screen the 18 genes to construct the IRG prognosis risk model (IRGPRM) for OC patients. Six immune-related prognosis biomarkers for OC patients were obtained (Fig. 5C; Table 3) after the univariate Cox regression analysis and multivariate Cox regression analysis. The IRGPRM was constructed based on the expression of the six immune-related mRNAs. The IRGPRM was constructed with the following formula: IRGPRM = - 0.093 × CCL22 − 0.034 × CCL21 − 0.027 × CD79A − 0.039 × CTLA4 − 0.093 × MS4A2 − 0.041 × MS4A1. We divided OC patients into high-risk (n = 161) and low-risk groups (n = 160) using the median risk as the separator threshold. Scatter plots of risk value distribution and survival time were drawn respectively (Figs. 6A and 6B). We analyzed the differential expression levels of the six IRGs for each patient in the two risk groups (Fig. 6C). The expression levels of the six IRGs were higher in the low-risk group than those in the high-risk group in OC (Fig. 6C). We found that the expression patterns of the IRGs were in accordance with the HR values (Table 3). The Kaplan–Meier analysis indicated that the low-risk group had better OS than the high-risk group in OC (Fig. 6D). The results of receiver operating characteristics (ROCs) that predict 1-, 3- and 5- year OS indicated that the IRGPRM could be used as risk prediction for OC patients, with area under curves (AUCs) of 0.639, 0.620 and 0.586, respectively (Fig. 6E). The 1-, 3- and 5- year OS predictions were calibrated. All the calibrated curves were well fit by the nomogram analysis (Figs. 6F–6H).
TABLE 2. Lasso Cox analysis of risk models based on the eighty immune-related DEGs
Variable
|
Beta
|
StandardError
|
Z
|
P
|
HR
|
HRlower
|
HRupper
|
CCL22
|
-0.19
|
0.07
|
-2.84
|
0.01
|
0.82
|
0.72
|
0.94
|
CCL21
|
-0.10
|
0.04
|
-2.48
|
0.01
|
0.91
|
0.84
|
0.98
|
CD79A
|
-0.12
|
0.05
|
-2.48
|
0.01
|
0.89
|
0.81
|
0.98
|
CTLA4
|
-0.18
|
0.08
|
-2.42
|
0.02
|
0.83
|
0.72
|
0.97
|
CCR4
|
-0.20
|
0.08
|
-2.38
|
0.02
|
0.82
|
0.70
|
0.97
|
FCRL3
|
-0.41
|
0.17
|
-2.38
|
0.02
|
0.67
|
0.48
|
0.93
|
MS4A2
|
-0.33
|
0.15
|
-2.27
|
0.02
|
0.72
|
0.54
|
0.96
|
IRF4
|
-0.19
|
0.09
|
-2.22
|
0.03
|
0.82
|
0.69
|
0.98
|
MS4A1
|
-0.21
|
0.10
|
-2.22
|
0.03
|
0.81
|
0.67
|
0.98
|
CD6
|
-0.18
|
0.08
|
-2.15
|
0.03
|
0.84
|
0.71
|
0.98
|
CD27
|
-0.13
|
0.06
|
-2.14
|
0.03
|
0.88
|
0.77
|
0.99
|
TNFRSF4
|
-0.18
|
0.08
|
-2.14
|
0.03
|
0.84
|
0.71
|
0.99
|
ICOS
|
-0.17
|
0.08
|
-2.13
|
0.03
|
0.85
|
0.72
|
0.99
|
CSF2RB
|
-0.15
|
0.07
|
-2.09
|
0.04
|
0.86
|
0.75
|
0.99
|
P2RY8
|
-0.19
|
0.09
|
-2.04
|
0.04
|
0.83
|
0.70
|
0.99
|
SLAMF1
|
-0.22
|
0.11
|
-2.00
|
0.05
|
0.81
|
0.65
|
1.00
|
BATF
|
-0.14
|
0.07
|
-1.98
|
0.05
|
0.87
|
0.75
|
1.00
|
LAX1
|
-0.18
|
0.09
|
-1.97
|
0.05
|
0.83
|
0.70
|
1.00
|
Var: Cox univariate analysis IRGs; Beta: coefficient; HR: relative risk value; Z: Z value of statistical test; P: the P value tested by Cox test with multi-factor; DEG: differentially expressed genes.
TABLE 3. six IRGs used to construct the risk prognosis model
Characteristics
|
HR
|
CI95
|
P
|
CCL22
|
-0.09
|
0.91
|
0.28
|
CCL21
|
-0.03
|
0.97
|
0.47
|
CD79A
|
-0.03
|
0.97
|
0.72
|
CTLA4
|
-0.04
|
0.96
|
0.68
|
MS4A2
|
-0.09
|
0.91
|
0.60
|
MS4A1
|
-0.04
|
0.96
|
0.77
|
Var: Cox univariate analysis IRGs; HR: relative risk value; CI95.high: range of 95% confidence interval of relative risk value HR; P: the P value tested by Cox test with multi-factor; DEG: differentially expressed genes.
Verification of the prognostic capability of the prognosis risk model
Two GEO datasets (GSE41613, n = 62; GSE42743, N= 37) were used to test the prognosis model. We calculated the risk value for each OC patient in the two GEO datasets according to the prognosis model. We divided patients in the GSE41613 (Figs. 7A–7C) and GSE42743 datasets (Figs. 8A–8C) into high-risk and low-risk groups for the subsequent analysis. The same expression pattern of the six IRGs was observed in the TCGA (Fig. 6C) and GSE41613 datasets (Fig. 7C). However, the expression patterns were different between the GSE42743 dataset (Fig. 8C) and the other two datasets (Figs. 6C and 7C). In our study, the high-risk group was associated with worse OS than the low-risk group (p = 0.011, Fig. 7D and p = 0.0078, Fig. 8D) in the two datasets. The result was similar to that observed in the TCGA dataset. To verify the capability of the six IRGPRM, we evaluated the conformance index and the AUCs to determine the risk scores of the two GEO datasets. We found that the model had an accurate predictive capability with the 1-, 3-, and 5-year AUCs of 0.774, 0.781 and 0.765 in GSE41613, respectively (Fig. 7E). Although there was no database of 5-year OS in GSE42743, the analysis results of the 1- and 3-year OS ROC curves indicated that the model could accurately predict the prognosis of OC patients (AUCs = 0.971 and 0.868, respectively, Fig. 8E). The above analyses explained that the risk scores predicted by the six IRGPRM saccurately described the prognosis of OC.
The effects of the expression levels of IRGs on the distribution of immune cell infiltration
We explored the correlations between the six IRGs and immune infiltration levels, planning to reveal the underlying mechanism by which the six IRGs influenced the prognosis of OC. Firstly we found that the six IRGs were up-regulated in the low-risk group (Fig. 9A; Supplement Table 2). We assessed the correlations of the expression levels of the six IRGs with infiltrating levels of immune cells (Figs. 12B–G) using TIMER. The results illustrated that the expression of the six genes had nearly the same effect on the microscopic characteristics of immune infiltration (Figs. 9B–9G). Intriguingly, the relatively high expression of the six IRGs promoted tumor immune infiltration (Figs. 9B–9G), and the expression of the six genes was higher in the low-risk group than in the high-risk group. These findings suggested that the signatures of the six IRGs might play a momentous role in the formation of the immune infiltration in OC. The correlation between the six IRGs and immune infiltration may explain why the low-risk group had better OS in OC to some extent.
Nomogram analysis of the clinical application
The risk values predicted by the model were combined with other clinical indicators to construct a nomogram for predicting the prognosis of patients. Using the clinical information of TCGA data, including clinical stages, ages and genders, we screened variables significantly correlated with prognosis through the univariate COX regression analysis and multivariate Cox regression analysis and found that clinical stages, risk scores and ages were adverse factors for the prognosis of OC (Table 4). Risk scores and clinical stages were used to construct the prognosis model for developing subsequent nomograms (Table 5). We found that higher total scores led to a shorter survival time in OC (Figs. 10A and 10B). We confirmed the ROC curves of the nomogram of the1-, 3-, and 5-year survival probability (Fig. 10C) and found that the model had an accurate predictive capability with the AUCs of 0.653, 0.691 and 0.676 (Fig. 10C), and the predicted valued matched the actual values well (Figs. 10D–10F). These analyses explained that the risk scores predicted by the six IRGPRM accurately described the prognosis of OC.
TABLE 4. HR analyses of the clinical characteristics and risk score with the OS based on univariate cox analysis
Variable
|
Beta
|
HR
|
HRlower
|
HRupper
|
P
|
Stage
|
0.41
|
1.51
|
1.23
|
1.86
|
0.00
|
riskScore
|
1.66
|
5.27
|
2.07
|
13.41
|
0.00
|
age_at_index
|
0.02
|
1.02
|
1.00
|
1.03
|
0.02
|
gender
|
-0.09
|
0.92
|
0.64
|
1.32
|
0.64
|
Beta: coefficient; HR: relative risk value; P: the P value tested by Cox test with univariate cox analysis;
TABLE 5. HR analyses of the clinical characteristics and risk score with the OS based on multivariate cox analysis
Variable
|
Beta
|
HR
|
HRlower
|
HRupper
|
P
|
riskScore
|
1.43
|
4.19
|
1.60
|
10.97
|
0.00
|
age_at_index
|
0.02
|
1.02
|
1.01
|
1.04
|
0.01
|
Stage
|
0.42
|
1.52
|
1.23
|
1.88
|
0.00
|
Beta: coefficient; HR: relative risk value; P: the P value tested by Cox test with multivariate cox analysis.