DOI: https://doi.org/10.21203/rs.3.rs-1393066/v1
Ovarian cancer is a gynecological malignant tumor with the highest mortality, it preferentially metastasizes to omental tissue, leading to intestinal obstruction and death. scRNA-Seq is a powerful technique to analyze tumor heterogeneity. Analyzing omentum metastasis of ovarian cancer at the single cell level may be more conducive to analyzing and understanding omentum metastasis and prognosis of ovarian cancer at the cellular function and genetic levels.
The scRNA-seq data of omentum metastasis site were acquired from the GEO (Gene Expression Omnibus) database, the data were clustered by Seruat package and annotated by SingleR package. Cell differentiation trajectories were reconstructed through monocle package, and obtained the marker genes of every branch. The microarray data of ovarian cancer were extracted from GEO and were clustered into omentum metastasis associated subpopulations according to the marker genes of single cell differentiation trajectories. The tumor microenvironment and immune infiltration differences between subpopulations were analyzed by estimate package and CIBERSORT. After that, based on the survival information and the intersection of bulk RNA-seq data of TCGA (The Cancer Genome Atlas) and marker genes of GEO subpopulations, univariate Cox and Lasso regression were performed to further establish an omentum metastasis-associated genes (OMAGs) signature, the signature was then tested by GEO data. Finally, the clinicopathological characteristics of TCGA-OV and GEO were screened by univariate and multivariate Cox regression analysis, in order to draw the nomogram.
The scRNA-seq data (GSE147082) were clustered into 18 cell clusters, and annotated into 14 cell types. Reconstruction of differentiation trajectories divided the cells into 5 branches, a total of 833 cell trajectories related characteristic genes were obtained. Two microarray datasets (GSE132342 and GSE135820) were integrated to gain the expression data of 510 genes in 7846 patients, which were subtyped into 3 subpopulations by the cell trajectories related characteristic genes. K-M survival analysis showed that the prognosis of cluster2 was the worst and cluste1 was the best, p < 0.001. The tumor microenvironment showed that the ESTIMATE Score, Stromal Score in cluster 2 is significantly higher than that in the other two clusters, while Immune Score, Tumor Purity were significantly lower, p < 0.001; The immune infiltration analysis showed differences in the fraction of 10 immune cells among the 3 subpopulations, p < 0.05. Immune checkpoint analysis showed that there were significant differences in the expression levels of 6 genes in the 3 subpopulations, which were significantly correlated with prognosis. Subsequently, the scRNA-seq characteristic genes of 5 branches, GEO characteristic genes of 3 subpopulations were intersected with the bulk RNA-seq of 379 patients in TCGA-OV, and 11 candidate genes of OMAGs (omentum metastasis-associated genes) were obtained by Lasso regression. The composition of OMAGs was further refined by Cox multivariate regression analysis, and established a 6-OMAGs signature (RiskScore = 6.669*VCAN + 0.173*FN1–0.128*IGKC + 0.258*RPL21 + 2.133*UCHL1 + 0.457*RARRES1). Taking TCGA data as training set and geo data as test set, ROC curves were drawn to verify its sensitivity. Ultimately, 5 clinicopathological characteristics (age, tumor status, initial treatment outcome, stage, grade) were screened by univariate and multivariate Cox regression analysis, and the nomogram was drawn.
Finally, we constructed a clinical prognostic model related to omentum metastasis of ovarian cancer, which was composed of 6-OMAGs and 5 clinicopathological features.
Ovarian cancer is the gynecological malignant tumor with the highest mortality[1]. At present, no effective method for early detection of ovarian cancer has been found. 70% of clinical ovarian cancer is advanced at the time of diagnosis, while after surgery and chemotherapy, 70% of patients will still have metastasis within 2–3 years[2]. Ovarian cancer metastasis has distinct characteristics: it preferentially metastasizes to omental adipose tissue, leading to intestinal obstruction and death[3]. However, the mechanism of this tropism is still remained unclear. Finding the cause of this transfer and preventing it has been a crucial problem that scholars hope to solve for many years.
The omentum is a special adipose tissue in the peritoneal cavity, which is composed of adipocytes and stromal blood vessels, including preadipocytes, fibroblasts, vascular endothelial cells and a variety of immune cells can promote various immune responses, including inflammation, tolerance and fibrosis, thereby promoting peritoneal immunity[4]. Studies have found that after the initial cytoreductive surgery, there are still residual cancer cells latent. The "latent cancer cells" may flow into the omentum through the production of substances such as inflammatory factors, resulting in the change of omental microenvironment: epithelial mesenchymal transformation (EMT), angiogenesis, immune infiltration, inflammation, etc.[5–8], drive the formation of niche before metastasis and contribute to successful transmission, that is, pre metastatic microenvironment (PMN). When tumor cells are planted in the omentum, they can support tumor growth through immune and metabolic mechanisms[7]. However, the preference for omentum preferential metastasis of ovarian cancer is still unclear.
Single cell RNA sequencing (scRNA-seq) uses optimized next-generation sequencing technology to define the global gene expression profile of single cells, which is helpful to isolate the previously hidden heterogeneity in the cell population, especially for the study of tumor heterogeneity[9]. In recent studies, Hu et al.[10] identified 6 subtypes of fallopian tube epithelium cells in normal human fallopian tube tissues by scRNA-seq, revealed intra-tumoral heterogeneity in serous ovarian cancer and defined SOC subtypes that correlate with patient prognosis; Olalekan et al.[11] using scRNA-seq from ovarian tumors resected from omental metastases, revealed immune cell types and subsets with possible roles in the management of disease; Hornburg et al.[12] used scRNA-seq to describe the cell spacing of ovarian cancer microenvironment and shed light on how tumor, immune, and stromal cells interact and shape T cell infiltration.
In the present study, scRNA-seq data of ovarian cancer omentum metastasis site was used to explore significant significantly different genes related to prognosis in different cell clusters, combined with GEO microarray data and TCGA-OV bulk RNA-seq data, a risk model of 6-OMAGs (omentum metastasis-associated genes) signature was constructed. Integrated with the clinicopathological information of GEO and TCGA-OV, an ovarian cancer omentum metastasis related prognostic model of risk score and 5 clinical parameters was established. We believe our results will improve our understanding of molecular mechanism and characteristics of ovarian cancer omentum metastasis and its connection with patient survival.
The scRNA-seq data was downloaded from GEO (Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/) database, accession number GSE147082, including 9,885 cells 16041 genes isolated from the omental metastatic site of 6 ovarian cancer patients. The bulk RNA-seq data of 379 ovarian cancer patients 56461 genes in TCGA-OV were obtained from TCGA (The Cancer Genome Atlas, https://portal.gdc.cancer.gov/) database. The microarray data were acquired from GEO with accession number GSE132342 and GSE135820, which contained 3769 and 4077 patients of 510 genes, respectively .
By using the R package Seurat[13] (Versio:4.0.6), the scRNA-seq data was converted into a Seurat object by the function CreateSeuratObject(), while the PercentageFeatureSet() function is used to calculate the percentage of mitochondrial genes, unqualified data was rejected by the criteria: 1) cells less than 50 genes were detected; 2) genes identified in less than 3 cells; 3) mitochondrial DNA reads for more than 5%. The data was then normalized with the method “LogNormalize”. The top 1500 genes with large coefficient of variation between cells were extracted by the function FindVariableFeatures(). Subsequently, PCA (principal component analysis) was performed, in order to evaluate the most significant principal components for cell cluster analysis. Next, cell cluster classification was assessed by t-NSE (t-Distributed Stochastic Neighbor Embedding), after that, the maker genes of each clusters were screened by the function FindAllMarkers() with the cutoff values of |log2 fold change(FC)|≥1, the expression ratio of cell population ≤ 0.25 and adjust Pvalue ≤ 0.05. Afterwards, singleR[14] (Versio:1.8.0) pakage was utilized for cell clusters annotation, while the reference was loaded from celldex package, HumanPrimaryCellAtlasData(). Cell differentiation trajectories was reconstructed through monocle[15] (Versio:2.22.0) package, cell trajectory difference between each cluster was also analyzed, followed by the differential expression analysis and function enrichment, GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes).
Both datasets GSE132342 and GSE135820 were already log2 transformed, since they used the same platform (NanoString nCounter Human CodeSet OTTA2014), the two gene expression count data were merged and the batch effecter was removed then standardized by using Limma[16] (Version: 3.50.0) package. Based on the results of cell cluster classification and cell trajectory analysis of scRNA-seq previously, the 7846 patients in total were divided into different groups via ConsensusClusterPlus[17] (Version: 1.58.0) package, determining the number and membership of possible clusters in the data set provides quantitative evidence, then Kaplan Meier survival curve and histogram of clinicopathological characteristics as well as the up and down-regulated gene in branches were showed to find the differences between groups.
In tumor microenvironment, immune cells and stromal cells are two main types of tumor cells, by using the expression data, estimate algorithm can predict the score of stromal cells and immune cells, thereby predict the content of these two cells; Finally, the tumor purity in each tumor sample can be calculated[18]. The ESTIMATE Score, Immune Score, Stromal Score, Tumor Purity of different groups of GEO patients were calculated respectively by using the estimate (https://bioinformatics.mdanderson.org/estimate/rpackage.html) R package, and visualized by violin plot.
CIBERSORT is an analytical tool developed by Newman et al.[19] to provide an estimation of the abundances of member cell types in a mixed cell population, using gene expression data. With R package “e1071” (Version: 1.7-9) loaded as a precondition, the CIBERSORT was used to estimate the different abundance of 22 immune cells among different groups of GEO microarray count data, based on the grouping results of scRNA-seq data analysis above, in order to further analyze the clinical significance of immune cells infiltrated by omentum metastasis ovarian cancer. K-M curve of high and low level of B cells naive, Macrophages M2, T cells CD4 memory activated, T cells CD4 naive separately, to show differences in survival.
Several important immune checkpoints related with cancer were analyzed by Limma[16] (Version: 3.50.0) package. Differentially expressed immune checkpoint-related genes significantly associated with overall survival (OS) in clustered GEO patients were determined by K-M survival analysis.
The marker genes of different branches of scRNA seq were crossed by the genes in TCGA bulk seq and GEO microarray data, the intersection genes of the 3 datasets were taken as the candidate genes for OMAGs. The expression of these candidate genes in TCGA-OV were combined with the survival data for Lasso regression analysis, to obtain the genes related to prognosis. The glmnet[20] (Version: 4.1-3) and survival (Version: 3.2–13) R package was used to perform lasso Cox regression for analysis and 10-fold cross-validation to narrow the range of OMAGs (omentum metastasis-associated genes) candidate genes. It is a biased estimation for processing data with multicollinearity that can realise the selection of variables while estimating parameters to better solve the problem of multicollinearity in regression analysis. The OMGAs candidates were then screened again by Cox multivariate regression analysis in order to determine the ultimate OMAGs risk model. The calculation of the OMAGs signature was as follows: Risk score = Ʃ (βi x Expi), where βi represented the coefficient of gene i, standing for the weight of gene i, and Expi represented the expression level of gene i. Then the TCGA-OV dataset was set as the training set while GEO dataset as the test set, the effects of high and low risk scores on survival and the prognostic value of OMAGs risk model were evaluated.
The clinicopathological features and risk score of TCGA-OV cohort were analyzed by univariate Cox regression and multivariate Cox regression using survival (Version: 3.2–13) R package, with p < 0.05 as the criterion, and the HR and regression coefficient for each prognostic features were calculated. Finally, the rms (Version: 6.2-0) R package was used to construct a nomogram to predict OS of ovarian cancer patients, which incorporated these clinicopathological features and OMAGs. The nomogram model was a prognostic statistical model made using simple graphs according to previous studies.
A total of 9885 high-quality single cell samples were acquired from GSE147082. The quality control chart was shown in Fig. 2A, which illustrates the range of scRNA quantity, the sequencing counts, and the proportion of mitochondrial sequencing counts of each cell. 1500 variable genes were isolated from single cell samples, and the top 10 genes with the most significant difference were shown in Fig. 2B. The differences among 6 patients’ samples were showed by PCA chart (Fig. 2C). PCA was carried out based on highly variable genes (Fig. 2D,E), these single cell samples were divided into 20 different components, based on the standard deviation using ElbowPlot function (Fig. 2F), JackStraw function was used to resample the test and the p values of each PCs were calculated to find statistically significant components (Fig. 2G). Since there was no obvious elbow point, and the p values were all < 0.01, we selected 14 PCs for downstream analysis. the t-NSE algorithm was used for non-linear dimension reduction and successfully clustered the single-cell samples into 18 clusters (Fig. 2H). Afterwards, the 18 cell clusters were annotated by singleR package label.fine, and a total of 14 cells types were recognized (Fig. 2I).
The fate decisions and differentiation trajectories of ovarian cancer omentum metastasis were reconstructed with the Monocle package. The pseudotime, cluster, cell type and state estimation analysis of single cells (Fig. 3A-D) were performed based on the highly variable marker genes at the screening criteria of |log2 fold change(FC)|≥1 and adjust p value < 0.05. Among them, branch 1 has 320 characteristic genes, branch 2 has 11 genes, branch 3 has 462 genes, branch 4 has 282 genes and branch 5 has 289 genes. By combining these genes, a total of 833 differentiation trajectories of ovarian cancer omentum metastasis characteristic genes were obtained.
The 2 datasets GSE132342 and GSE135820 were merged, with the removement of batch effect and normalization, a total of 77846 patients’ microarray data of 510 genes and clinicopathology information were obtained. The GEO microarray data were clustered according to the 833 differentiation trajectories of ovarian cancer omentum metastasis characteristic genes in the previous analysis. The calculation was repeated 50 times using the ConsensusClusterPlus package, setting kmax = 9. According to the output results, the K value was determined as 3. Geo patients were divided into 3 subpopulations, and the characteristic genes of the 3 clusters were obtained. K-M curve was shown for survival analysis (Figure. 4A), with p < 0.01, indicating that there was significant survival difference among the 3 clusters of patients. It can be seen that the prognosis of the patients in cluster 2 is the worst, which is significantly different from the patients in cluster 1 and 3, P < 0.01, while the patients in cluster 1 and 3 have certain differences after 8 years of long-term survival. The histogram showed the diversity clinical information of the patients in 3 clusters. It can be seen from the histogram of patient clinical feature analysis that the prognosis of category 2 is the worst in which the tumor stage of the patients was later (Figure. 4B), the diagnosis age was higher (Figure. 4C), the proportion of omentum at the site of surgical resection was significantly higher than that in the other two clusters (Figure. 4D), and the sample source omentum was also higher than that in the other two clusters (Figure. 4E). Figure. 4F shows the expression difference of representative characteristic genes in the 5 branches of GEO patients, in which there were no significantly up-regulated or down-regulated genes in branch 2.
The tumor microenvironment analysis was carried out on the gene expression data of 3 clusters of GEO patients using the estimate R package. The results are shown in Fig. 5A. Showed that among cluster 2 with the worst prognosis, the estimate score and matrix score were significantly the highest, and the immune score and tumor purity score were the lowest. Then the CIBERSORT package was used for immune infiltration analysis, and only the cells with more accurate classification (P < 0.05) were selected for comparison. The proportion of immune cells in 22 cells in each group is shown in Fig. 5B, and the proportion difference is shown in Fig. 5C. It can be seen that the content of Plasma cells, T cells CD8, T cells CD4 naïve, T cells follicular helper, NK cells activated, Monocytes, Macrophages M1, Mast cells activated, Eosinophils, Neutrophils cells in 3 clusters were significantly different. Then, the K-M survival curves of 4 immune cells whose proportion is significantly related to survival and prognosis are drawn respectively. The Fig. 5D showed that the higher proportion of T cells CD4 Naïve and B cells Naïve has a better prognosis, while the higher proportion of Macrophases M2 and T cells CD4 memory activated has a worse prognosis.
A list of 38 immune checkpoint genes was obtained from the literature search, which intersected with GEO microarray data, and the expression difference between 3 clusters of the intersecting genes were analyzed. 6 immune checkpoints with significant difference were screened. Their expression levels in the 3 clusters were shown in Fig. 6A. Geo patients were divided into high expression and low expression groups according to the expression levels of the 6 differential immune checkpoints, K-M curves were drawn for survival analysis (Fig. 6B-G). It can be seen that high expression of CD8A, CD274, CTLA4, JAK2 and PDCD1 is associated with better prognosis, while high expression of IDH8 has lower early survival rate and better expression, and the survival rate decreases about 9 years, which is lower than that of low expression group.
The genes in TCGA-OV cohort, GEO cohort, and the 833 characteristic genes of 5 branches obtained from scRNA-seq analysis above were intersected, and a total of 76 candidate genes of OMAGs were obtained. 11 candidate OMAGs genes were screened by Lasso regression (Fig. 7A-B), which were "VCAN", "TIMP3", "FN1", "COLA2", "IGKC", "GSTP1", "RPL21", "UCHL1", "BNIP3", "RARRES1" and "IGF2". Then, multivariate Cox regression analysis was carried out to obtain 6 prognosis related genes "VCAN", "FN1", "IGKC", "RPL21", "UCHL1", "RARRES1". The coef value and P value of each gene were shown in Table 1. According to the expression level of 6-OMAGs, a molecular risk prognostic evaluation model was constructed, TCGA data is used as the training set (Fig. 7C), and geo data is used as the verification set (Fig. 7D). The relationship between the expression of 6-OMAGs and survival was verified respectively. It can be seen that the survival rate with high-risk score was lower and the survival rate with low-risk score was higher in both the training set and the test set. Then, the ROC curve was drawn and the AUC area was calculated. It can be seen that the sensitivity in the training set was 0.58 in 3 years, 0.684 in 5 years and 0.801 in 10 years (Fig. 7E), it increases with time, and the sensitivity in the test set is lower than that in the verification set. It may be because one is high-throughput sequencing data and the other is microarray data, which belong to different cohorts, therefore, there is a certain batch effect.
ID | coef | HR | HR.95L | HR.95H | pvalue |
---|---|---|---|---|---|
VCAN | 6.668590835 | 787.2854077 | 9.741963423 | 63623.55167 | 0.00292216 |
FN1 | 0.173180557 | 1.189080783 | 1.053293958 | 1.342372751 | 0.005122717 |
IGKC | -0.127767185 | 0.880058246 | 0.813871806 | 0.95162716 | 0.001360535 |
RPL21 | 0.258144129 | 1.294525384 | 1.073943409 | 1.560413663 | 0.006760239 |
UCHL1 | 2.132910074 | 8.439390361 | 1.43812874 | 49.52498873 | 0.018157007 |
RARRES1 | 0.456691178 | 1.578841229 | 1.140789919 | 2.185099627 | 0.005879973 |
Univariate Cox analysis on the clinicopathology characteristics of TCGA-OV patients was performed. The results were shown in Fig. 8A, in which age, Tumor status (with or without tumor), the initial treatment outcome and risk score have obvious statistical significance. Subsequently, multivariate Cox analysis was carried out on these clinical characteristics. The results were shown in Fig. 8B, which proved that age, tumor status, initial treatment outcome and risk score are the influencing factors of prognosis. It may be that most patients are stage III-IV and grade 3–4, so the grade of tumor stage’s p value > 0.05, but considering that it is a clinically recognized factor clearly related to tumor prognosis, it is still included in nomogram mapping. After that, we analyzed the expression levels of 6-OMAGs in 18 clusters of scRNA-seq of ovarian cancer omentum metastases. The results were shown in Fig. 8C-D. VCAN and FN1 had the highest expression in cluster1, corresponding to epithelial cells; The expression of IGKC was the highest in cluster 12, corresponding to B cells; RPL21 expressed in all clusters, but the highest expression in cluster11, corresponding to B cell Naïve; UCHL1 was highly expressed in cluster 4, but very low in other clusters, cluster 4 corresponds to epithelial cells; RARRES1 was significantly higher in clusters 4 and 6 than in other clusters, corresponding to epithelial cells and fibroblasts. Finally, we mapped nomograms related to the prognosis of ovarian cancer according to risk scores and clinicopathology characteristics (Fig. 8E). A horizontal line was drawn to determine the point of each variable, and the total points of each ovarian cancer patient, which were distributed between 0 and 100, was calculated by summing the points of all variables. Finally, a vertical line between the total points axis and each prognostic axis was utilized to evaluate the 1-, 3-, and 5-year OS of ovarian cancer patients.
In this study, we combined scRNA-seq data of 6 omentum metastasis sites from 6 ovarian cancer patients, GEO microarray data of 7846 patients, TCGA-OV bulk RNA-seq data of 379 patients and clinical information to construct a prognosis prediction model of ovarian cancer composed of 6-OMAGs signature and 5 clinicopathological features.
As we can know from the previous researches, VCAN (versian) is a chondroitin sulfate proteoglycan[21], while proteoglycans possibly participate in tumor progression and metastasis, whereas they are ubiquitous constituent of extracellular matrix and cell surface[22]. Clinical trials as well as in vivo and in vitro experiments have verified that VCAN is related to the formation of tumor microenvironment and partly epigenetically regulated, can form a prognostic model together with other genes, which is completely consistent with our results[23–27].
FN1 (fibronectin), together with transforming growth factor beta-induced protein and tenascin C had been proved some related to transcoelomic spreading via the peritoneal fluid or malignant ascites[28], moreover, networks built with FN1 and CDKN1A as the center were significantly relevant with epithelial-to-mesenchymal transition, could be hallmarks to taxanes resistance in ovarian cancer cells[29], also used as a marker and target in tumor prognosis and treatment[30–32].
IGKC (immunoglobulin kappa C), expressed in plasma cells, had been identified as one of the top genes of a prognostic B cell-metagene in breast cancer, correlate d to favorable prognosis and response to chemotherapy[33], study showed plasma cell infiltration in epithelial ovarian cancer has a significant impact on tumor progression and prognosis[34]. High expression of IGKC is bond up with good outcome[35], its coef value is negative therefore negatively correlated with risk score in our model, which is consistent with previous studies.
As we know, ribosomal proteins (RPs) are involved in the cellular process of translation, and in recent years, RPs’ dysfunction was found in tumors, like mutation, expression levels changes and correlation with differentiation[36–38]. RPL21 (ribosomal protein gene 21) has been found to be associated with the proliferation of pancreatic cancer cells[39] and may be used as a potential marker for cervical intraepithelial neoplasia 1[40], but there are hardly any reports in ovarian cancer, our findings may provide a reference for finding new mechanisms and therapeutic targets for omentum metastasis of ovarian cancer.
UCHL1 (ubiquitin carboxyl-terminal esterase L1) is an oncogene encoding a deubiquitinating enzyme which adjusts the balance between mTOR complexes[41], and pays a significant role in ubiquitin system as well as different cellular processes[42]. It has been widely studied and confirmed related to tumor progression, such as prostate cancer, lymphoma, lung cancer, breast cancer, neuroblastoma, and so on[42–45]. In ovarian cancer, UCHL1 was found may be one of the markers related to early stages of high-grade serous carcinoma, immunogenicity[46, 47], and cisplatin resistance[48, 49]. Our results also suggest its value in omentum metastasis and prognosis of ovarian cancer.
RARRES1 (retinoic acid receptor responder 1) is one of the common methylated loci in several cancers and believed to be a putative tumor suppressor gene[50, 51], promoter hypermethylation and loss of RARRES1 seemed to facilitate to cancer progression[52]. However, its specific role in omentum metastasis of ovarian cancer remains to be clarified, which needs further research.
Although there are various models related to the prognosis of ovarian cancer, few focus on omentum metastasis. Our model suggests the potential mechanism of omentum metastasis of ovarian cancer to a certain extent. There are still many deficiencies in our research, such as lack of first-hand data and no experimental verification, so more clinical data are needed to test it to improve its accuracy in the follow-up research.
Acknowledgements:
None.
Authors’ contributions:
Dongni Zhang wrote the main manuscript text and run bioinformatics analysis, Wenping Lu guided and modified the manuscript, Shasha Cui retrieved and downloaded data, Xiaoqing Wu, Heting Mei and Zhili Zhuo prepared figures1-8. All authors reviewed the manuscript.
Funding:
Beijing Municipal Natural Science Foundation, 7212192.
Availability of data and materials:
Please contact the corresponding author Wenping Lu ([email protected]).
Competing interests:
None.