Identification of DEGs in ovarian cancer process and immune invasion process
We aimed to identify DEGs that are closely related to the occurrence of ovarian cancer and the immune infiltration process. Compared with 88 normal samples, 2,908 genes were up-regulated and 3,162 genes were down-regulated in 379 tumor samples (Supplementary Fig. 1A). These 6070 (2908 + 3162) genes could be regarded as closely related to the occurrence of OV, and were used in subsequent WGCNA analysis.
It is generally believed that the degree of tumor immune infiltration is closely related to the content of stromal cells and immune cells in tumor samples. Compared with the low StromalScore group, a total of 734 genes were up-regulated and 398 genes were down-regulated in the high StromalScore group (Supplementary Fig. 1B). Compared with the low ImmuneScore group, 629 genes were up-regulated and 520 genes were down-regulated genes in the high ImmuneScore group (Supplementary Fig. 1C). From the intersection of StromalScore and ImmuneScore, the up- (420 genes, Fig. 1B) and down-regulated genes (263 genes, Fig. 1C) were identified, these genes can be regarded as genes that play a key role in the progress of tumor immune infiltration.
Results of DEGs GO and KEGG gene enrichment analysis
To verify the method of grouping according to the scores assigned by the ESTIMATE algorithm was indeed applicable to our investigations, and achieve a better understanding of the roles of the identified 683 DEGs (420 + 263) in ovarian cancer, GO and KEGG pathway enrichment analysis was performed. The top five GO terms were: T cell activation; regulation of lymphocyte activation; leukocyte cell-cell adhesion; lymphocyte differentiation; and regulation of T cell activation. These terms showed that the DEGs obtained according to the ESTIMATE algorithm are closely related to the immune process in tumor tissues and confirmed the effectiveness of the StromalScore and ImmuneScore (Fig. 2A,2B). It can be observed that many genes are involved in these five GO terms. CCL19, ZNF683, PLA2G2D, and CD2 genes not only had the largest fold-change in expression, but were also in all of the top five GO terms, indicating that these genes may be more critical in the immune process. The top three KEGG terms were: viral protein interaction with cytokine and cytokine receptor between viral proteins and cytokines as the basis of viral infection and pathogenicity; cytokine-cytokine receptor interaction; and chemokine signaling pathway (Fig. 2C, D). About 15% of human cancers can be attributed to virus infection23. In addition to their association with tumor metastasis and inflammation, chemokines are also closely related to regulation of the immune system. Chemokines not only affect the migration and differentiation of lymphocytes24, but also are closely related to the maturation, differentiation and functional effects of T and B lymphocytes25. PF4, CXCL9, CXCL13, and CCL19 were also enriched in all of the top three KEGG pathways and had the largest fold-changes on expression.
WGCNA of DEGs related to ovarian cancer
To make the connectivity of the gene regulatory network obey the power law distribution, we exponentially weighted the correlation coefficients of genes. A soft threshold (weight) of beta = 8 better meets the requirements of scale-free networks (Fig. 3A). We obtained 14 gene modules through dynamic clustering and then performed a correlation analysis between the gene modules and the occurrence of tumors (Fig. 3B). Based on previous reports26, we assumed that when the correlation coefficient > 0.65, the module was the key gene module in the process of disease, and the hub genes were selected from these modules. As shown in Fig. 3B, the red, blue, turquoise, black, green, purple, pink, grown, magenta, and yellow modules had a coefficient > 0.65 and were included in the subsequent analysis. In addition, the correlation analysis between modules (Fig. 3C) showed that the similarity among red, blue, and turquoise gene modules was high; the similarity among the brown, magenta, and yellow gene modules was high; and the similarity among the tan, black and pink gene modules was high. The scatter plot of gene importance is shown in Fig. 3D and Supplementary Fig. 1D-1F. A total of 2,526 genes were obtained by selecting key genes in the upper quartile of the horizontal and vertical coordinates of these gene modules27. A total of 46 genes were obtained from the intersection of these genes with the DEGs obtained using the ESITIMATE algorithm (Fig. 3E). These genes were regarded as related both to the occurrence of ovarian cancer and the degree of immune infiltration.
The establishment of the IGCI score
The degree of immune infiltration is often closely related to the tumor recurrence and the amount of tumor stem cells. The tumor recurrence is related to the patient’s final survival status. Therefore, 46 genes from the intersection of the key genes of WGCNA and the DEGs by ESTIMATE algorithm were used and then Lasso regression was performed on these genes and 7 clinical factors (Age; Asian,1 means yes, 0 means not; Black,1 means yes, 0 means not; White,1 means yes, 0 means not; Stage; Pharmaceutical.Therapy, 1 means yes, 0 means not; Radiation.Therapy, 1 means yes, 0 means not) in the training set. The summary of the patient's clinical information is shown in Table 1. To prove our random grouping of patients is reasonable, we used the t-test for continuous variables and the chi-square test for discrete variables to compared the clinical information of the training set and the test set. As shown in Table 1, all clinical information has no significant difference between the training set and the test set (P > 0.05), indicating this grouping can be used in subsequent studies.
Table 1
Patient clinical baseline data
Characteristics | Train sets (n = 130) | Test sets (n = 128) | t/χ2 value | P value |
Age (mean ± sd.) | 59.55 ± 11.46 | 59.09 ± 11.32 | 0.3261 | 0.7466 |
Race (%) | | | 2.1846 | 0.3354 |
White | 115 (88.5) | 115 (89.8) | | |
Black | 9 (6.9) | 11 (8.60) | | |
Asian | 6 (4.6) | 2 (1.6) | | |
Stage (%) | | | 3.9815 | 0.1366 |
I or II | 13 (10.0) | 7 (5.5) | | |
III | 91 (70.0) | 103 (80.5) | | |
IV | 26 (20.0) | 18 (14.0) | | |
Pharmaceutical (%) | | | 0.55596 | 0.4559 |
Yes | 125 (96.2) | 126 (98.4) | | |
No | 5 (3.8) | 2 (1.6) | | |
Radiation (%) | | | 1.9711 | 0.1603 |
Yes | 4 (3.1) | 10 (7.8) | | |
No | 126 (96.9) | 118 (92.2) | | |
Survival state (%) | | | 0 | 1 |
Death | 65 (50.0) | 64 (50.0) | | |
Alive | 65 (50.0) | 64 (50.0) | | |
In lasso regression, by minimizing partial likelihood deviance, we selected the penalty coefficient lambda (Supplementary Fig. 2A, 2B); the remaining 10 factors were included in the subsequent study. The levels of these 10 factors in the training set are shown in Supplementary Table 1. Through Multivariate Cox proportional hazards regression analysis, we constructed a prognostic score based on immune genes and clinical information (IGCI) for patients with ovarian cancer:
$$\text{I}\text{G}\text{C}\text{I} \text{s}\text{c}\text{o}\text{r}\text{e}=0.0232\times \text{A}\text{g}\text{e}-0.6457\times \text{W}\text{h}\text{i}\text{t}\text{e}-3.6304\times \text{P}\text{h}\text{a}\text{r}\text{m}\text{a}\text{c}\text{e}\text{u}\text{t}\text{i}\text{c}\text{a}\text{l}.\text{T}\text{h}\text{e}\text{r}\text{a}\text{p}+0.4089\times \text{F}\text{G}\text{F}7-0.1290\times \text{C}\text{C}\text{R}1+0.0085\times \text{C}\text{D}14$$
The high level of Age, FGF7 and CD14 was found to be disadvantageous for patients’ prognosis, while high level of White, Pharmaceutical.Therap, and CCR1 was favorable factor for patients’ prognosis. We calculated the hazard radio (HR) of each gene and the corresponding 95% confidence interval, as shown in Figure 4A. More details of the multivariate Cox proportional hazards regression are shown in Supplementary Table 2.
The patient groups were subdivided according to median IGCI score in the training set. As the IGCI score gradually increased, the survival time decreased, and the proportion of patients’ deaths gradually increased in both training and test set (Supplementary Figure 2C,2D), which indicated the accuracy of IGCI score. In addition, analysis of patient survival status based on IGCI score groups showed significant differences (Figure 4B,4C). The P-value in the training set is less than 0.001 (Figure 4B), and the P-value in the test set is less than 0.05 (Figure 4C). In the training set, the 1-, 3-, and 5-year OS rates in the low-risk group were 97.3% (95% CI = 91.3–100.0), 77.1% (95% CI = 65.4–90.8), and 47.0% (95% CI = 33.0–65.9); the 1-, 3-, and 5-year OS rates for the high-risk group were 90.8% (95% CI = 84.0–98.1), 46.5% (95% CI = 33.7–64.0), and 14.5% (95% CI = 6.5–32.0). In the test set, the 1-, 3-, and 5-year OS rates in the low-risk group were 97.1% (95% CI = 92.0–100.0), 78.2% (95% CI = 66.9–91.3), and 34.3% (95% CI = 21.8–54.0); the 1-, 3-, and 5-year OS rates for the high-risk group were 86.1% (95% CI = 77.1–96.3), 48.7% (95% CI = 35.4–66.8), and 16.1% (95% CI = 7.1–36.5). The IGCI score obtained was used to predict the disease status of patients at 1-, 3-, and 5-years. The AUC of the obtained ROC curves in test set were 0.766, 0.662, and 0.628 (Figure 4D–4F). The AUC of the obtained ROC curves in training set were 0.802, 0.720, and 0.711 (Supplementary Figure 3A–3C). These results reflected the practical application value of the IGCI score.
TIMER database (https://cistrome.shinyapps.io/timer/) was used to explore the relationship between the genes in the IGCI score and the content of various immune cells. There was a positive correlation between FGF7 expression and the infiltration of Neutrophil cells (Cor = 0.107, P = 1.86e-02) and there was a negative correlation between FGF7 expression and the infiltration of B cells (Cor = -0.135, P = 3.09e-03, Figure 4G). CCR1 expression was positively associated with the infiltration of B cells (Cor = 0.139, P = 2.21e-03), CD8+ T cells (Cor = 0.321, P = 6.23e-13), CD4+ T cells (Cor = 0.309, P = 4.22e-12), Macrophage cells (Cor = 0.335, P = 4.90e-14), Neutrophil cells (Cor = 0.529, P = 5.61e-36) and Dendritic cells (Cor = 0.433, P = 2.47e-23; Figure 4H). CD14 expression was positively associated with the infiltration of B cells (Cor = 0.193, P = 1.97e-05), CD8+ T cells (Cor = 0.402, P = 4.24e-20), CD4+ T cells (Cor = 0.329, P = 1.34e-13), Macrophage cells (Cor = 0.495, P = 4.23e-31), Neutrophil cells (Cor = 0.633, P = 4.41e-55) and Dendritic cells (Cor = 0.562, P = 2.22e-41; Figure 4I).The above results further verify that the genes in the IGCI score are closely related to the immune infiltration process of OV, and may be effective prognostic markers.
In order to explore the content of prognostic genes at the protein level, we used data on immunohistochemistry (IHC) datasets (The Human Protein Atlas database, http://www.proteinatlas.org/) to explore the content of FGF7, CCR1, and CD14 in protein levels in ovarian cancer and control groups. The results are shown in Figure 5A,5B. The corresponding sample information is shown in the Supplementary Table3. The IHC database lacks the corresponding data for the CCR1 gene. FGF7 (Antibody:HPA043605) and CD14 (Antibody:HPA001887) have higher protein content in the tissues of ovarian cancer patients, the staining of IHC sections is deeper, and the results of CD14 are more obvious. These two genes are disadvantages in our prognostic model. The results of IHC and the prognostic model are consistent.
To assist the clinical work of ovarian cancer, we established a nomogram based on the IGCI score (Figure 5C). At present, AJCC stage is often used to predict the prognostic status of OV patients. In addition, Carter et al.20 proved that a signature of chromosomal instability containing 25 genes (CIN25) can also effectively predict the prognostic status of OV patients. In order to verify the validity of the IGCI score and the nomogram, we calculated the C-index of the IGCI score and compared with the C-index of the AJCC stage and CIN25. The results are shown in Table 2. The C-index of the IGCI score in the training and test set were 0.701 (95% CI: 0.625, 0.779) and 0.630 (95% CI: 0.542, 0.719), which were significantly higher than the results of the AJCC stage (P <0.05) and CIN25 (P <0.05). In addition, we have plotted the 3-year and 5-year survival rate calibration curves to evaluate the predictive power of the IGCI score. In the training (Supplementary Figure 3D,3E) and test (Figure 5D,5E) set, our prediction results are close to the ideal results (red lines), and the errors are within the standard error range. This shows that the prediction results of the IGCI score are accurate.
Table 2 C-index results for IGCI score, AJCC stage and CIN25.
Method
|
Training set
|
Test set
|
C-index(95%CI)
|
P
|
C-index(95%CI)
|
P
|
IGCI score
|
0.701(0.625,0.779)
|
|
0.630(0.542,0.719)
|
|
AJCC stage
|
0.568(0.493,0.642)
|
<0.05
|
0.541(0.477,0.604)
|
<0.05
|
CIN25
|
0.543(0.461,0.625)
|
<0.05
|
0.571(0.533,0.608)
|
<0.05
|
Genetic mutation analysis
To understand the types of gene mutations that are closely related to ovarian cancer, we first generated a waterfall map of the type of genetic mutation (Figure 6). From TCGA database, we obtained 409 samples with complete information for types of genetic mutations and clinical data and the top 10 genes with the most mutations were selected for display (Figure 6). The waterfall map showed no clear relationships among the stage of cancer, patient age and gene mutations (Figure 6). Compared to other genes, TP53 had a variety of mutation types, while TTN and DST were found to be more prone to mutations in the intron regions. The survival analysis of gene mutations revealed that the mutation of five genes significantly affect the prognosis of patients (Figure 7A): TOP2A (P= 0.0018), CSMD3 (P= 0.0119), FLG2 (P= 0.0166), TRRAP (P= 0.0389), and HMCN1 (P= 0.0443). Five genes showed marginal significance (0.05 <P<0.1, Figure 7B): LRP1 (P=0.0556), SZT2 (P=0.0638), FAT3 (P=0.0654), RB1 (P=0.0866), and DNAH9 (P=0.0906).
We performed a Chi-squared test (Table 3) on the gene mutations and the median ImmuneScore grouping. The mutations with P <0.05 included: BRCA1, RBAK, ZNF462, ADGRV1, RB1, and VWF. The mutations with 0.05 <P <1 included: APOB, CSMD1, DNAH3, KCNA6, COL11A1, ZNF638, AMER1, SPOCD1, SORBS2, ZNF774, LTN1, KIF3C, RANBP6, ADCY5, CSMD3, and PKHD1.
Table 3 Chi-squared test of gene mutation and immune infiltration scores
Gene
|
Mutation + high score
|
Mutation + low score
|
Wild + high score
|
Wild + low score
|
P
|
BRCA1
|
10
|
2
|
126
|
134
|
0.0387
|
RBAK
|
0
|
6
|
136
|
130
|
0.039
|
ZNF462
|
0
|
6
|
136
|
130
|
0.039
|
ADGRV1
|
8
|
1
|
128
|
135
|
0.0419
|
RB1
|
1
|
8
|
135
|
128
|
0.0419
|
VWF
|
8
|
1
|
128
|
135
|
0.0419
|
APOB
|
11
|
3
|
125
|
133
|
0.0547
|
CSMD1
|
3
|
11
|
133
|
125
|
0.0547
|
DNAH3
|
9
|
2
|
127
|
134
|
0.0647
|
KCNA6
|
0
|
5
|
136
|
131
|
0.0709
|
COL11A1
|
5
|
0
|
131
|
136
|
0.0709
|
ZNF638
|
0
|
5
|
136
|
131
|
0.0709
|
AMER1
|
0
|
5
|
136
|
131
|
0.0709
|
SPOCD1
|
0
|
5
|
136
|
131
|
0.0709
|
SORBS2
|
5
|
0
|
131
|
136
|
0.0709
|
ZNF774
|
5
|
0
|
131
|
136
|
0.0709
|
LTN1
|
0
|
5
|
136
|
131
|
0.0709
|
KIF3C
|
5
|
0
|
131
|
136
|
0.0709
|
RANBP6
|
0
|
5
|
136
|
131
|
0.0709
|
ADCY5
|
0
|
5
|
136
|
131
|
0.0709
|
CSMD3
|
19
|
9
|
117
|
127
|
0.0725
|
PKHD1
|
10
|
3
|
126
|
133
|
0.0881
|
The RB1 and CSMD3 mutation had small P-value in both the prognostic process (RB1:P=0.0866; CSMD3:P=0.0119) and the chi-square test (RB1:P=0.0419; CSMD3:P=0.0725). This shows that RB1 and RSMD3 mutations may play an important role in OV patients. In order to explore the clinical value of RB1 and RSMD3 mutations, we used the GDSC database for drug sensitivity analysis. RSMD3 mutation was not included in the GDSC database. The result of the RB1 mutation was shown in Figures 7C and 7D. The efficacy of AZD6482, Pictilisib, AZD8186, Gefitinib in RB1 mutant samples was more significant (P <0.05), while the efficacy of AZD7762, RO-3306 in RB1 mutant samples decreased (P <0.05). The above results indicate that the occurrence of RB1 mutation may make the patient's response to drugs change greatly. Compared with the wild type, the prognosis of patients with RB1 mutation is better (Figure 7B). On this basis, the use of drugs with enhanced efficacy for these patients may get better treatment results.