Identification of the genes related to lymphovascular metastasis
We firstly defined LVSI status based on the information of lymphatic invasion and venous invasion available in TCGA clinical metadata. Patients with either lymphatic invasion positive or venous invasion positive were regarded as LVSI-positive. Those absent of both types of invasions were defined as LVSI-negative. Differential expression analysis was performed to identify LVSI-associated genes in ovarian cancer, using the transcriptome profiles of 136 LVSI-positive and 56 LVSI-negative samples. DEGs related to metastasis were obtained by analyzing the transcriptome data of high-grade serous ovarian cancer samples from GSE2109. There were eight significantly up-regulated DEGs (POSTN, LUM, THBS2, COL3A1, COL5A1, COL5A2, FAP, FBN1) common in both datasets (Figure 1a, Additional file 1, Table S3). When validated in another independent dataset GSE30587, all the identified DEGs were significantly elevated in omental metastases compared with the paired primary ovarian tumors (Additional file 2, Figure S1a). Therefore, the eight genes associated with both LVSI status and metastasis were identified as a candidate geneset suggestive of lymphovascular metastasis, hereafter referred to as the Lymphovascular Metastasis Gene Signature (LMGS). Interestingly, according to the expression profiling based on the parabiosis model of ovarian cancer hematogenous metastasis, four genes (POSTN, LUM, COL3A1, COL5A2) of the LMGS were shown to be significantly up-regulated in the omental metastases generated through a hematogenous route (Additional file 2, Figure S1b). This result further indicates the potential role the LMGS in the hematogenous spread of ovarian cancer.
In order to explore the biological rationale of the LMGS, GO term enrichment analysis was performed[10] showing a strong association of the LMGS with extracellular matrix (ECM) organization (Figure 1b). Protein–protein interaction (PPI) analysis also revealed that the LMGS was closely connected in a biological functional network (Additional file 2, Figure S1c), rather than a randomly combined gene panel.
Considering that the dysregulation of ECM correlates with poor prognosis in multiple cancer types including ovarian cancer[16], we then conducted GSEA analysis to investigate whether the overexpression of LMGS was linked to the key biological traits suggestive of cancer progression. Serous ovarian cancer samples obtained from TCGA dataset were dichotomized based on the expression level of the LMGS (gene Z-score cutoff of +1)[16]. In the LMGS overexpression group, the majority of the genesets correlated with the hallmarks of cancer were significantly enriched, including angiogenesis, epithelial–mesenchymal transition (EMT), apoptosis, hypoxia, and inflammatory response (Figure 1c-1g).
The LMGS is predominantly expressed by tumor stromal components.
Since all eight genes of the LMGS were closely connected in terms of the biological function, the overall upregulation degree of the LMGS in different samples was compared in addition to the expression levels of an individual gene. Here, we applied the GSVA algorithm to generate a sample-wise enrichment score of the LMGS (referred as LMGS score) as described previously[11]. As an unsupervised and non-parametric gene set enrichment (GSE) analysis of a single-sample expression profiling, the GSVA algorithm outperforms the classic GSE methods such as ssGSEA and PLAGE, providing higher statistic power to detect subtle activation changes[11].
Molecular subtyping often provides clues to the underlying mechanism of a certain phenotype. Hence, we investigated the association between the overexpression of the LMGS and the previously identified molecular subtypes of ovarian cancer. Notably, the LMGS was significantly enriched in the C1 subtype clustered in the Tothill dataset (Figure 2a and Additional file 3, Figure S2a), a subtype characterized by stromal activation and extensive desmoplasia[17]. Besides, similar analysis demonstrated that the upregulation of the LMGS related with the TCGA mesenchymal subtype (Figure 2b and Additional file 3, Figure S2b), which displayed high infiltration levels of stromal cells as well as the worst survival[18].
We thus hypothesized that the LMGS overexpression as well as the process of lymphovascular metastasis might at least partly relate to the transcriptional traits dominated by non-epithelial components of tumor tissues. Therefore, the expression pattern of the LMGS was analyzed in three independent transcriptome datasets of microdissected ovarian cancer samples. As expected, the LMGS score was significantly higher in tumor stroma in comparison to epithelial components of ovarian cancer (Figure 2c, Additional file 3, Figure S2c and S2d), demonstrating that the high level of the LMGS is predominantly attributed to tumor stromal components. The analysis of the dataset GSE40595 additionally suggests that the LMGS may be responsible for the transition of stromal components in carcinogenesis of ovarian cancer, as the LMGS expression was elevated in tumor stroma compared with normal ovarian stroma (Figure 2d and Additional file 3, Figure S2e).
Both quantitative and qualitative changes in ovarian cancer stroma are responsible for the increased expression of the LMGS.
Expression profiles from most public datasets were generated from bulk tumor samples with various degrees of stromal infiltration. To better decipher the role of tumor stroma in the process of lymphovascular metastasis, we applied the ESTIMATE algorithm to analyze tumor purity inferred by the infiltration levels of non-tumor cells. Strikingly, reduced tumor purity as well as elevated mesenchymal infiltration with statistical significance was observed in LVSI-positive samples compared with LVSI-negative ones (Figure 3a and 3b). Tumor purity predicted by an alternative approach known as the ABSOLUTE algorithm produced consistent results. Besides, omental metastases displayed similar trends in comparison to primary ovarian tumors (Figure 3c and 3d) according to two independent datasets (GSE2109, GSE30587). Taken together, the primary tumors with LVSI-positive status biologically resemble to metastatic tumors in terms of the expression pattern of the LMGS, which is potentially ascribed to the increased proportion of tumor stroma.
Validated in three large independent datasets (TCGA, GSE9891, GSE26712), a remarkable negative correlation was observed between the expression of the LMGS and tumor purity in serous ovarian cancer samples (Figure 3e, Additional file 4, Figure S3a-S3b). Considering that stromal cells and immune cells are both major cellular components of tumor stroma, we then tested whether the infiltration of immune cells was involved in the expression changes of the LMGS. Although statistic significant, the correlation between the LMGS scores and the ESTIMATE immune scores, which represent the infiltration levels of immune cells, was relatively weak to draw powerful conclusions (Additional file 4, Figure S3c-S3e). The purity-corrected correlation analysis generated from the TIMER database also showed similar results (Additional file 1, Table S4). The correlations between the individual gene expression of the LMGS and the abundance of immune infiltrates, including B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells, were all significant but extensively weak in ovarian cancer samples. By contrast, the activation of the LMGS in serous ovarian cancer samples from three datasets was strongly and positively correlated with the presence of stromal cells represented by ESTIMATE stromal scores (Figure 3f, Additional file 4, Figure S3f-S3g). Moreover, quantification of eight immune and two stromal cell populations inferred by the MCP-counter revealed that infiltration of fibroblasts was remarkably elevated in primary ovarian cancer samples with LVSI-positive status compared to the LVSI-negative ones (Figure 3g, Additional file 4, Figure S3h). Similar results were obtained when exploring the infiltration patterns in omental metastases compared with primary ovarian tumors (Figure 3h-3i, Additional file 4, Figure S3i-S3j). These data indicate that the increased infiltration of stromal cells, especially fibroblasts, is responsible for the reduction of tumor purity in the process of lymphovascular metastasis.
As tumor stromal activation involves both quantitative and qualitative changes of stromal components, we then investigated whether the overexpression of the LMGS also reflected the activation of tumor stroma related to cancer-associated fibroblasts (CAFs). GSEA was conducted revealing that in LVSI-positive samples, gene signatures representative of wound healing were enriched (Figure 4a-4c), a process where myofibroblasts resembling CAFs are recruited and reprogrammed[19]. In the stromal profiles of the dataset GSE40595, pathways involved in CAF activation were also shown to be positively correlated to the upregulation of the LMGS (Figure 4d-4f). Besides, based on in the transcriptome profiles of the ovarian cancer stroma from two independent datasets, a significant and positive correlation was validated between the expression of the eight genes in the LMGS and several common CAF markers (Figure 4g and S4a).
Of note, GO annotation as well as GSEA uncovered a potential link between the upregulation of the LMGS and the activation of the TGF-β signaling pathway (Figure 1b and 4c), the role of which in CAF activation has been validated in several cancer types including ovarian cancer[20, 21]. Hence, we investigated whether the expression of the LMGS was partly regulated by the TGF-β pathway. Strikingly, according to the transcriptome profiles in the dataset GSE40266, all eight genes of the LMGS were significantly unregulated in the ovarian fibroblasts NOF151-hTERT treated with either TGF-β1 or TGF-β2 compared to controls (Figure 4i). Together, these data suggest that in addition to the increased infiltration of the ovarian cancer stroma, the activation of CAFs also contributes to the overexpression of the LMGS in lymphovascular metastasis.
To further investigate the effects of the LMGS overexpression in ovarian cancer cells, we ranked the ovarian cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) database by the GSVA scores of the LMGS. Top-ranked cell lines displayed a mesenchymal phenotype based on the EMT phenotype annotation (Additional file 1, Table S5) generated from published sources[22], implicating that ovarian cancer cells with high levels of the LMGS expression may acquire invasive phenotypes partly through EMT.
The overexpression of the LMGS is associated with increased invasiveness and poor prognosis in patients with serous ovarian cancer
The activation of tumor stroma is considered as an important determinant of poor survival in patients with solid tumors including ovarian cancer, we next evaluated the prognostic values of the LMGS in patients with serous ovarian cancer.
In three large validation datasets, the LMGS was significantly upregulated in patients undergoing suboptimal cytoreduction (Figure 5a, Additional file 5, Figure S4a-S4c). Late-stage patients also displayed higher expression levels of the LMGS compared to those with stage I-II diseases (Figure 5b, Additional file 5, Figure S4d-S4f). These data uncovered the correlation between the LMGS overexpression and the increased invasiveness of serous ovarian cancer, which notoriously relates to poor survival.
In order to evaluate the prognostic values of the individual gene expression of the LMGS, we conducted a meta-analysis of the transcriptome profiles from multiple public datasets. According to the hazard ratio (HR) generated from the forest plots, high expression of any gene of the LMGS was significantly correlated with worse overall survival (OS). These results remained significant when adjusted for tumor stage and debulking status. Moreover, similar results were obtained in the analysis of progression-free survival (PFS) (Table 1), suggesting that the prognostic effects of the eight genes in both OS and PFS are independent from these clinical parameters.
Next, four largest cohorts with most comprehensive clinical information from independent datasets were used to validate the overall prognostic value of the LMGS in patients with serous ovarian cancer. It was confirmed that patients with high levels of the LMGS had worse overall survival in each validation dataset (Figure 5c-5f). When applied to the entire cohort combining the above validation datasets, the overexpression of the LMGS was further shown to be an independent predictor of OS (log-rank P=0.0205, HR=1.306[1.042-1.636]) in patients with late-stage diseases undergoing optimal cytoreduction (Figure 5g).