Tumor cell heterogeneity in ovarian cancer
A schematic diagram of the study design was shown in Figure 1. Following the quality control standard, 9609 cells from ascites samples of ovarian cancer patients were included in the analysis (Figure 2A and B). Variance analysis showed 1500 highly variable genes in a total of 10048 corresponding genes, and the top 10 are IGLL5, IGJ, WT1-AS, CCL17, GNLY, CCL5, MZB1, HBB, and NKG7 (Figure 2C). Principal component analysis (PCA) was performed to identify available dimensions and screen correlated genes. PCA showed a mixed representation of cells intra-and inter-patient (Figure 2D). PCA results revealed that the P value of the first 20 principal components (PCs) were less than 0.05, we selected 20 PCs for follow-up analysis (Figure 2E). Afterward, ovarian cancer cells were successfully classified into 13 separate clusters by the t-distributed stochastic neighbor embedding (tSNE) algorithm (Figure 2F). A total of 6453 marker genes were identified and the top 10 differential expression genes from all 13 clusters were displayed in the heatmap (Figure 2G).
Identification of immune cells and GSVA
The resulting cell clusters were annotated as epithelial, stromal (fibroblasts, endothelial cells, and melanocytes), or immune cells (Figure 3A) by singleR and established marker genes(20). Cluster 9 and 10 containing 419 cells, was annotated as epithelial cells; clusters 2 and 7, containing 1895 cells, were annotated as stromal cells; cluster 0, 1, 3, 4, 5, 6, 7, 8, 11 and 12, containing 6706 cells, was annotated as immune cells. Immune cells (n =6706) were subsetted and annotated as macrophages, B cells, T cells, and Dendritic cells (DC); cluster0, 1, 3, and 4, containing 5930 cells, were annotated as macrophages; clusters 6, containing 384 cells, was annotated as B cells; cluster 8, containing 320 cells, was annotated as T cells; cluster 12, containing 72 cells, was annotated as DC (Figure 3B).
GSVA has been performed to analyze the B cell function status, such as anti-apoptosis, pro-apoptosis, proliferation, Naïve, cytokines, and Germinal center. The function status of pro-apoptosis showed heterogeneity in the same patient (Supplement figure1). For T cells, we estimated cytotoxic, native, regulatory, exhausted, costimulatory, G1/S, and G2/M related gene signatures. The function status of cytotoxic in patient2 is significantly enriched (Supplement figure2). Figure 3C showed a comprehensive analysis of the four kinds of immune cells. Macrophages are the largest proportion (88.4%) of immune cells. M2 up signature was highly enriched in macrophages (Figure 3C). Macrophages marked by CD163, CSF1R, MS4A7, VSIG4; B cells marked by CD79B, MZB1; T cells marked by CD2, CD3D; DC marked by CD1E, CD83 (Figure 3D). In summary, B cells, T cells, and macrophages showed heterogeneity in the same patient and among different patients.
High M2 and low M1 were associated with poor survival in ovarian cancer patients
CIBERSORT based Nu support vector regression algorithm was performed to calculate the immune score of the transcribed profile of bulk RNA seq data from TCGA-OV patients. The estimated proportion of 22 immune cell types was shown in Figure 4A and 4B. Among them, four kinds of tumor immune infiltration cells were negatively correlated with overall survival of OV patients, including macrophages M2 (P=0.031), mast cell activated (P=0.0033), T cell CD4 memory activated (P=0.04), and neutrophils (P=0.027), while macrophages M1 (P=0.00042) were positively correlated with overall survival of OV patients.
Identification of TAM-related genes
The expression values of 4043 genes with a coefficient of variation values greater than 0.1 were utilized to construct the gene co-expression network of OV by R package WGCNA. The samples of TCGA-OV were clustered by the average linkage and Pearson’s correlation values. We selected β = 3 (scale free R2 > 0.9) as the soft-thresholding power to construct a scale-free network. Dynamic hybrid cutting was used to construct a hierarchical clustering tree. Each tree leaf represented a single gene on the tree. A series of genes with similar expression data are clustered into branches to form a gene module. Furthermore, 10 modules were constructed (Figure 5A). Among 10 modules, the turquoise module had a strong relationship with the black and pink module (Figure 5B).
We correlated gene module and CIBERSORT fraction, found that the black module showed a higher correlation with macrophage M1(R2=0.52, P=2e-27) and B cell naïve (R2=0.5, P=5e-25), the pink module was highly correlated to macrophage M1 (R2=0.54, P=2e-30) and T cell CD8 (R2=0.36, P=1e-12), and the turquoise module was highly correlated to macrophage M1 (R2=0.47, P=1e-22) and macrophage M2 (R2=0.31, P=1e-09; Figure 5C). We were interested specifically in macrophages, so focused on the black, pink, and turquoise module that showed correlation with macrophages was identified as TAM-related modules. The scatter diagram showed a highly positive correlation between gene significance and module membership in the black module (cor=0.9, P=4.9e-32), turquoise module (cor=0.73, P=8.9e-124), and pink module (cor=0.79, P=3.6e-17; Figure 5D), indicating that these modules are highly correlated with TAMs. According to the cut-off criteria, 219 genes in 3 modules were identified as TAM-related gene signatures (Supplement table1).
Molecular subtype based on TAM‑related gene signature
To establish a classification of the expression patterns of the 219 TAM-related gene signatures, K-means-based unsupervised consensus clustering was performed on 375 OV patients from the TCGA database. The cumulative distribution function (CDF) plot showed k =2 (from 2 to 6) is the optimal number of clusters (Figure 6A). The consensus heatmap showed all OV patients were classified into two clusters: 151 (40.0%) in cluster 1 and 224 (60.0%) in cluster 2 (Figure 6B). The heatmap revealed the differentially expressed genes between the two molecular subtypes (Figure 6C). Kaplan- Meier survival analysis indicated that patients with cluster 1 had a significantly better OS than patients with cluster 2 (P=0.0071) (Figure 6D). The violin plot showed that cluster 1 had significantly higher macrophages M1 scores compared to cluster 2 (P=6.8e-13; Figure 6E). GSEA was performed on cluster one over cluster two of OV patients. Upregulated pathways included pathways related to apoptosis, antigen processing and presentation, angiogenesis, epithelial mesenchymal transition (EMT), and macrophage M1 upregulation in cluster one (Figure 6F). We compared 6 main immune checkpoints between two molecule subtypes of OV patients. PDCD1 (PD1; P=6.2e-10), CTLA4(P=6.6e-16), CD274 (PDL1; P=5.4e-06), CD80(P=7.7e-07), PDCD1LG2 (PDL2; P=1.9e-10), and CD86 (P=1.6e-05) were highly expressed in Cluster 1 (Figure 6G).
CD163, TAP1, VSIG4, MS4A7, CD3E, and IGKV4-1 are the 6 most significant survival-predicting TAMRG-based signature in human OV
Univariate Cox analysis was performed, and 25 prognosis-associated TAM-based signatures were identified in the TCGA training set (Supplement figure 3A). Least absolute shrinkage and selection operator (LASSO) followed by multivariate Cox analysis was then performed, and 6 significant survival-predicting GDRGs were identified (Supplementary Figure 3B and 3C): CD163 (HR=1.19, 95%CI 1.05-1.34, P<0.01), transporter 1 (TAP1, HR=0.74, 95%CI 0.63-0.87, P<0.001), V-set and immunoglobulin domain containing 4 (VSIG4, HR=1.15, 95%CI 1.04-1.28, P<0.01), immunoglobulin kappa chain variable 4-1 (IGKV4-1, HR=0.64, 95%CI 0.47-0.86, P<0.01), CD3E (HR=0.82, 95%CI 0.70-0.96,, P=0.016), and membrane spanning 4-domains A7 (MS4A7, HR=1.16, 95%CI 1.02-1.32, P=0.023). We explored the expression of 6 gene signatures in the scRNA-seq set (Supplement figure 4A; MS4A7, VSIG4, and IGKV4-1 were not provided). CD163, MS4A7, and VSIG4 were significantly upregulated in macrophages. CD3E was significantly upregulated in T cells. TAP1 was upregulated in macrophages and B cells. The expression of IGKV4-1 was not detected. Furthermore, Gene Expression Profiling Interactive Analysis (GEPIA) database was performed to explore the expression of these 6 gene signatures in 426 OV (TCGA) samples and 88 normal (GTEx) samples. We found that CD163 was upregulated, while TAP1, CD3E, and IGKV4-1 were downregulated in OV compared to normal samples (Supplementary Figure 4B).
Generation and validation of the 6 gene signatures based prognostic risk score model
The prognostic risk score model was developed based on the above 6 gene signatures with the following formula: Risk score = ExpCD163 ´ 0.0284 + ExpCD3E ´ (-0.0297) + ExpIGKV4 ´ (-0.0007) + ExpTAP1 ´ (-0.0130) + ExpVSIG4 ´ 0.0009 + ExpMS4A7 ´ 0.0487. The risk scores of all patients in the TCGA training set were calculated, and the patients were divided into either a high-risk (high score) group or a low-risk (low score) group using the median value of the risk score as the cutoff value (Figure 7B). Kaplan-Meier survival analysis demonstrated that patients in the high-risk group had significantly poorer OS than those in the low-risk group (log-rank, P = 0.00016; Figure 7A). The C-index of the 6 gene signature for OS prediction was 0.614 (95% CI=0.593 to 0.636). Time-dependent receiver operating characteristic (ROC) analysis also indicated that the 6 gene signature showed excellent performance in predicting the 3-, 5- and 10-year OS rates, with respective area under the curve (AUC) values of 0.624, 0.68, and 0.718 (Figure 7C). The relationship between infiltration proportion of six immune cell types and the risk score was analyzed to validate the effect of 6 gene signatures on TAM-related genes. A significantly negative correlation was found between the risk score and infiltration proportion of the macrophages M1 (R=0.38, P=1.4e-14). The infiltration proportion of macrophages M2 was positively correlated with the risk score (R=0.29, P=1.3e-08; Supplement figure 4C).
Then, the predictive formula was validated similarly using the GEO cohort. As shown in Figure 7E, 185 OV patients were classified into high-risk or low-risk groups. Consistent with the results from the TCGA training set, the survival analysis also demonstrated that patients in the high-risk group had significantly poorer OS than patients in the low-risk group (log-rank, P = 0.0026; Figure 7D). The C-index of the 6 gene signature was 0.611 (95% CI=0.584 to 0.638). Time-dependent ROC analysis also suggested favorable values in predicting the 3-, and 5-year OS rates, with a respective area under the curve (AUC) values of 0.674, and 0.704 in the GEO validation set (Figure 7E). These results indicate that the TAM-based prognostic risk score model can serve as a prognostic predictor for different populations of OV patients.
Comparison with clinical characteristics and other signatures
We compared the model predictive accuracy of 6 gene signatures with clinical characteristics including age, stage, grade, and residual as well as 8 reported ovarian cancer prognostic signatures. Continuous prognostic scores were calculated from each signature to compare among different datasets. Of the 13 survival predicted factors, MPSOV had a highest mean C-index (0.613) compared with stage (0.519), grade (0.532), age (0.593), residual (0.556; Supplement table 2) and other signatures (0.516 to 0.584; Supplement table 3).