Prognostic Value of Genes Related with Tumor Microenvironment in Ovarian Cancer


 Background

Ovarian cancer (OV) is the fifth leading cause of cancer death among females. Growing evidence supports a key role of tumor microenvironment in growth, progress, and metastasis of OV. However, the impacts of gene expression signatures related with OV microenvironment on prognosis have not been well-established . This study aimed to apply ESTIMATE algorithm to extract genes related with tumor microenvironment that predicted poor outcomes in OV patients.

Methods

The gene expression profile of OV samples were downloaded from The Cancer Genome Atlas (TCGA) database. The immune scores and stromal scores of 469 OV samples were available based on the ESTIMATE algorithm. To better understand impacts of gene expression signatures related with OV microenvironment on prognosis, these samples were categorized based on their ESTIMATE scores into high and low score groups. A different OV cohort from the Gene Expression Omnibus (GEO) database was used for external validation.

Results

The molecular subtypes in OV patients were correlated with stromal scores, in which the mesenchymal subtype had the highest stromal scores (p < 0.0001). Poor prognosis were found in patients (especially for patients with overall survivals (OS) < 5 years) with higher stromal score (p = 0.0376). 449 differentially expressed genes (DEGs) in stromal scores group were identified and 26 DEGs were significantly associated with poor prognosis in OV patients (p < 0.05). Eventually, 6 genes have further validated to be significantly associated with poor outcomes in 40 patients from a different OV cohort of GEO database (p < 0.05).

Conclusion

In this study, several genes related with tumor microenvironment that predicted poor prognosis in OV patients were extracted. In addition, some previously overlooked genes could be potential prognostic biomarkers for OV.

relationship between BRCA1/BRCA2 mutations and OV prognosis, the prognostic signi cance of BRCA1 mutations have not been well illuminated.

ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data)
has been proposed to be a new algorithm to score the fraction of immune and stromal cells in tumor tissues, then predict the in ltration level of immune and stromal cells and tumor purity based on gene expression signatures from the TCGA database [8]. Despite several reports have used the ESTIMATE algorithm in some tumors, such as breast cancer [9], head and neck squamous cell carcinoma [10], Glioblastoma multiforme [11], and gastric cancer [12], its application in OV has not been studied in detail.
In this study, we applied ESTIMATE algorithm to mine genes related with tumor microenvironment that predicted poor outcomes in OV patients. Notablely, we have further validated these genes by a different OV cohort from the Gene Expression Omnibus (GEO) database (Series GSE32063).

Data Preparation
Level 3 date of gene expression pro le of OV samples were downloaded from the TCGA data coordination center (https://tcga-data.nci.nih.gov/tcga/), using the Affymetrix HT Human Genome U133a (HT-HG-U133A) microarray platform on September 8, 2017. Clinical data of OV samples, including gender, age, histological type, clinical stage, neoplasm histologic grade, molecular subtypes, mutation status, and survival information were also obtained from TCGA database. ESTIMATE scores, including immune scores and stromal scores of 469 OV samples were available based on the ESTIMATE algorithm to the normalized expression matrix. A dataset from the GEO database were used for validation in this study, including 40 OV patients from Series GSE32063.
Identi cation and analysis of differentially expressed genes (DEGs) According to the ESTIMATE scores obtained from the ESTIMATE algorithm, all OV samples were divided into two groups, namely high (immune or stromal) scores group and low (immune or stromal) scores group. R package limma was used for the identi cation of DEGs [13]. The DEGs between two groups (high group vs. low group) were screened based on the following cut-offs: false discovery rate (FDR) adjusted p-value < 0.05 and fold change (FC) > 1.5. "Upregulated DEGs" represented the genes that were upregulated in the high group compared with the low group and "downregulated DEGs" showed the genes that were downregulated in the high group compared with the low group. statistically signi cant.

Construction of protein-protein interaction (PPI) network
The PPI network of upregulated DEGs in the stromal group was constructed by the STRING database [16] and Cytoscape software 3.6.1 [17].

Overall survival (OS) curve
The relationship between gene expression levels of DEGs and OS in OV patients was analyzed by Kaplan-Meier survival curve with a log-rank (Mantel-Cox) test.

Statistical Analyses
All statistical analysis were carried out using GraphPad Prism 5 (Version 5.01) and R version 3.5.1. The data analyses in this study were performed by standard statistical tests whenever appropriate. Comparisons between groups were evaluated by unpaired t test and one-way analysis of variance. P < 0.05 was deemed to be statistical signi cant.

Correlation of ESTIMATE scores with OV prognosis
The gene expression pro les and clinical date of 469 OV female patients diagnosed pathologically between 1992 and 2009 were downloaded from the TCGA database. Among them, 63 (13.4%) cases were of differentiated subtype, 80 (17.1%) cases of immunoreactive subtype, 64 (13.6%) cases of mesenchymal subtype, 74 (15.8%) cases of proliferative subtype, and 188 (40.1%) cases of unknown molecular subtype. ESTIMATE algorithm was used to evaluate in ltration of immune and stromal cells and tumor purity. According to ESTIMATE algorithm, immune scores ranged from -1498.58 to 2774.16, and stromal scores were distributed between -1988.05 to 1837.43, respectively. In short, immune scores of patients were higher than stromal scores in the entire OV cohort. Immunoreactive subtype showed the highest average immune scores of 4 subtypes, followed by mesenchymal subtype, differentiated subtype, and proliferative subtype (Fig. 1A, p < 0.0001). Similarly, mesenchymal subtype ranked the highest average stromal scores, followed by immunoreactive subtype, differentiated subtype, and proliferative subtype (Fig. 1B, p < 0.0001), suggesting that both immune and stromal scores were obviously correlated with the classi cation of molecular subtypes in OV patients. The proliferative subtype patients had the lowest immune scores and stromal scores.
In this study, there were 10 BRCA1 mutant patients with OV, 287 BRCA1 wild-type patients, and 172 patients of unknown status. In addition, there were 9 BRCA2 mutant patients, 288 BRCA2 wild-type patients, and 172 patients of unknown status. We drew the distribution of ESTIMATE scores between BRCA1/2 mutant and wild-type patients with OV. However, no signi cant difference was found in ESTIMATE scores between BRCA1/2 mutant and wild-type patients ( Fig. 1C-F).
To better understand the correlation of ESTIMATE scores with OV prognosis, we categorized the 469 OV patients into high and low scores groups according to the top half of 235 samples with higher ESTIMATE scores and the bottom half of 234 samples with lower ESTIMATE scores. We found that patients with low immune scores showed no difference in median survival compared with patients with high scores (p = 0.5707, Fig. 1H). Median survival of patients with high stromal scores was lower than the patients with low scores based on Kaplan-Meier curve, although it showed no statistical signi cance (p = 0.1145). And the trend was mainly re ected in the shorter median survival (OS < 5 years) (p = 0.0376, Fig. 1G).

Comparison of DEGs with ESTIMATE scores
To explore the association of gene expression signatures with ESTIMATE scores, we compared gene expression pro les of 469 OV samples downloaded from TCGA database. Heatmaps and PCA plots demonstrated visualized distribution of gene expression pro les between high and low ESTIMATE scores groups in Fig upregulated DEGs shared in the immune scores and stromal scores groups, and 13 downregulated DEGs shared (Fig. 2G, H). Since only stromal scores showed distinct association with OV prognosis, we determined to mainly explore the DEGs in stromal scores group for the subsequent analysis in this study.
To predict the potential function of 428 upregulated DEGs (Table 1) in stromal scores group, we performed the GO function and KEGG pathway enrichment analysis of these genes. There were 355 GO terms of BP, 64 GO terms of CC, and 76 GO terms of MF indicated signi cant difference based on FDR < 0.05 and -log(FDR) > 1.301. Top 10 GO terms were identi ed, such as immune response, extracellular space, and extracellular matrix (ECM) structural constituent ( Fig. 3A-C). In addition, a total of 54 KEGG pathway categories showed to be signi cant based on FDR < 0.05 and -log(FDR) > 1.301. Top 10 KEGG pathway categories were identi ed, such as staphylococcus aureus infection, phagosome, ECM-receptor interaction, which were related with immune response (Fig. 3D).

Correlation of expression of DEGs with OV prognosis
To estimate the possible role of DEGs in OS, we ploted Kaplan-Meier survival curves with log-rank (Mantel-Cox) test. Among the 428 upregulated DEGs in the stromal scores group, a total of 44 DEGs were screen out to be signi cantly associated with OV prognosis. Of these 44 DEGs, 26 DEGs (hazard ratio (HR) > 1, Table 2) obviously predicted poor outcomes for OV (p < 0.05, selected genes are shown in Fig.   4A-F).

PPI network of prognostic DEGs
To functionally explore the interactions between these prognostic DEGs, we constructed PPI network by the STRING database and Cytoscape software, which consisted of 41 nodes. We de ned the color of node continuously by log (FC) value of prognostic DEGs in the network. Similarly, we identi ed the size of node continuously based on degree value which was the number of connections the node and other nodes. In this PPI network, MMP9, CXCL10, GZMB, CXCL9, CXCL11, CXCL13, C5AR1, GBP1, CD2, GBP2 and CX3CR1 were the notable nodes with higher degree values, because they showed strong association with other nodes (Fig. 5).

Validation of prognostic DEGs in a external cohort
To determine whether the prognostic DEGs identi ed from the TCGA database also have signi cant prognostic value in a different OV cohort, we obtained and analyzed gene expression signatures of a dataset with 40 OV samples from the GEO database (Series GSE32063). Eventually, 6 genes, including CH25H, CX3CR1, GFPT2, NBL1, TFPI2, and ZFP36 were veri ed to be signi cantly correlated with poor outcomes (p < 0.05, Fig. 6A-F).

Discussion
Because the early symptoms and signs of OV are atypical or non-existent, most patients are initially diagnosed at an advanced stage with low ve-year survival rates [18]. Therefore, it is necessary to develop new methods to screen for early OV and new treatment strategies. With the impact of tumor microenvironment on OV prognosis has been increasingly emphasized [19], the interaction between tumor microenvironment components and OV prognosis was urgently needed to better understand. In this study, we identi ed several genes related with tumor microenvironment that predicted poor outcomes in OV patients from the TCGA database.
The mesenchymal subtype is featured by the overexpression of several genes related with activated stroma at the molecular level [20], although the stromal cells of the mesenchymal subtype have not been well described. It has been gradually recognized that stromal components are not only a scaffold for integrity of tissue structure, but also play a role in tumor genesis, growth, invasion and metastasis [21,22].
In this study, the molecular subtypes in OV patients were found to be signi cantly correlated with the level of stromal scores, in which the mesenchymal subtype had the highest stromal scores. We further compared median survival of high and low stromal scores groups, and found that patients with high stromal scores were shown to be associated with low median survival, indicating poor prognosis. Therefore, our results demonstrated mesenchymal subtype was associated with poor outcomes, which was consistent with previous reports [23]. Stromal components in tumor microenvironment may signi cantly contribute to the behaviors of OV with mesenchymal subtype.
The prognostic signi cance of BRCA1/2 mutations in OV is still unclear. Previous report found a better prognosis for BRCA2 mutant patients, with no signi cant difference in prognosis for BRCA1 mutant patients compared with wild-type patients [24]. But several studies have showed that patients with BRCA1 and BRCA2 mutations had a better outcome compared with wild-type patients [25,26], whereas other studies have demonstrated no signi cant difference [27]. Our study also showed no signi cant prognostic difference, but small sample sizes of BRCA1/2 mutations might lead to inaccurate estimates of survival.
Increasing evidences have showed that tumor development is in uenced not only by its internal tumor cells but also by its external tumor microenvironment components [28-31]. However, these studies have focused limited attention on stromal components in the tumor microenvironment. In this study, through gene expression pro le of 469 OV samples, we rstly analyzed 449 DEGs identi ed from the comparison between high and low stromal scores groups, some of which participated in tumor microenvironment, as shown by GO function and KEGG pathway enrichment analysis. This was consistent with previous studies that stromal components were correlated with the building of tumor microenvironment in OV [32-34].
Then, we performed survival analysis of these 449 genes and determined that 44 DEGs were signi cantly associated with OV prognosis, of which 26 DEGs obviously predicted poor prognosis. In addition, we constructed PPI network of 44 DEGs and found that these genes were highly interrelated. Eventually, we have further validated 6 genes (CH25H, CX3CR1, NBL1, TFPI2, GFPT2 and ZFP36) related with tumor microenvironment to be signi cantly associated with poor outcomes in 40 patients from a different OV cohort of GEO database. Of these 6 genes, as mentioned above, it has been demonstrated that the expression of CX3CR1, TFPI2 and GFPT2 were associated with poor prognosis in OV patients. Although the remaining 3 genes, including CH25H, NBL1, and ZFP36, have not previously showed correlation with OV prognosis, our results supported that these genes may be potential prognostic biomarkers for OV. As a reticulum-associated membrane protein, CH25H has been shown to inhibit infection of several viruses through catalyzing cholesterol converting into 25-hydroxycholesterol (25HC) [43]. It has been previously demonstrated that the overexpression of NBL1 suppressed tumor growth [44]. Furthermore, ZFP36 has been reported to play a key role in the post-transcriptional regulation of tumor necrosis factor (TNF) and the modulation of mRNA stability [45].

Conclusion
we applied ESTIMATE algorithm based on stromal scores to extract several genes related with tumor microenvironment from TCGA database. Notablely, we have further validated these genes by a different OV cohort from the GEO database, which may be signi cantly associated with poor prognosis in OV patients. In addition, some previously overlooked genes could be potential prognostic biomarkers for OV. However, there were also some limitations in this study. First, fewer genes have been shown to involve in OV prognosis, possibly due to the small sample size of OV cohort for validation. Second, considering that this was a retrospective study on the basis of public databases, it was di cult to include regional differences, since the database of this study was from the United States, while the database for validation was from Japan. Therefore, well-designed, prospective, and multicenter clinical studies are still imperative to further verify the prognostic value of these genes related with tumor microenvironment by large data-based analyses, which may help us to develop novel prognostic biomarkers and therapeutic targets for OV in clinical practice.

Availability of data and materials
The datasets generated during the current study are available in the TCGA repository (https://tcgadata.nci.nih.gov/tcga/). TCGA are publicly available databases.

Ethics approval and consent to participate
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study has been approved by the ethics committee of Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University.

Consent for publication
Not applicable.

Competing interests
The authors declare no con icts of interest.