Analysis of single-cell RNA-seq data
Ovarian cancer single cell sequencing data(GSE147082) from 6 samples containing a total of 9885 cells. According to strict quality control process, quality control and filtering of scRNA-seq data were carried out, in which cells with Feature_RNA < 200 or percent.mt > 20% were filtered. After quality control, the number of detected genes was nFeature and count was evenly distributed in 6 samples (Figure S1A, B), and the proportion of ribosomes in cells was approximately 5% (Figure S1A, C). After filtration, 9583 high-quality cells were obtained, and subsequent analyses were based on these cells. According to the correlation analysis, it was found that the number of detected genes (nFeature) had a strong positive correlation with the sequencing depth (UMIs total number, nCount) (Figure S1D). We then normalized the data using the LogNormalize method. The normalized data were analyzed for variance analysis to identify highly variable genes, and the first 2000 highly variable genes were selected for the downstream analysis (Figure S1E).
We then apply the FindIntegrationAnchors function in the Seurat package to remove the batching effect and perform a t-SNE clustering analysis. Finally, 9583 cells from 6 samples were clustered into 16 cell clusters (Figure 1B).Visual clustering results showed that after removing the batch effect, the difference of sample origin was no longer the main difference among cell clusters (Figure 1A right).At this point, we need to know the cell type of each cell cluster, and use the R software package SingleR to annotate the known cell type of the cell cluster after clustering.The annotated cells mainly contain six known cell types: :Monocytes(CD16−, CD14−)、T cells ( CD4+ Th1)、NK cells、B cells(naive)、T cells(CD4+, memory TREG)、Monocytes(CD16+, CD14+) (Figure 1C).Marker genes (CD14, FCGR3A, COL1A1, CTLA4, FOXP3, IL2RA, CD79A, MS4A1, CD96) of known immune cells were used to verify cell types identified (Sakaguchi et al., 2020; Abplanalp et al., 2021). As shown in Figure 1D, marker genes of immune cells are highly expressed in corresponding cell clusters. In addition, FindAllMarkers function were used to identify the differentially expressed marker genes among cell clusters. 1000 cells were randomly selected from each cell cluster and the top 10 differentially expressed genes in each cell type were displayed with heatmap (Figure 1E). The patient samples were ranked by disease score (Olalekan et al., 2021), from low to high: PT-1 to PT-6, as shown in Table1 (Olalekan et al., 2021). The proportion of cells from different sample sources indicated that the proportion of tumor infiltrating B cells varied greatly among all samples. As shown in Figure 1F, the sample with the largest proportion of tumor infiltrating B cells was GSM4416535, whose ovarian cancer stage was ypT3a Nx M1/ Ivb, and the lowest disease severity score was PT-1. And GSM4416537, which had the second largest proportion of tumor-infiltrating B cells, had a disease score of PT-2. The results showed that the proportion of tumor infiltrating B lymphocytes in the tissues of 6 patients with ovarian cancer was opposite to the seriousness of the disease.
Marker gene enrichment analysis of B cells
We used GO and KEGG enrichment analysis to study the enrichment of 140 marker genes in related pathways found in tumor infiltrating B cells of ovarian cancer samples, as shown in Figure 2A-C. The results showed that the biological process (BP) of marker genes in B cells were mainly enriched in: positive regulation of cell activation, positive regulation of lymphocyte activation, positive regulation of leukocyte Activation, B cell activation, antigen receptor-mediated signaling pathway, immune response - activating cell surface Receptor signaling Pathway, B cell receptor signaling Pathway. Cellular component (CC) was mainly enriched in: Immunoglobulin complex, external side of plasma membrane, immunoglobulin complex, Circulating, Blood Microparticle, MHC Class II Protein Complex. Molecular function (MF) are mainly enriched in: Antigen binding, immunoglobulin receptor binding, MHC class II protein complex binding, MHC protein complex binding, immune Receptor activity, peptide binding, peptide antigen binding, MHC class II receptor activity, immunoglobulin Binding. Figure 3D mainly shows the top 10 pathways in KEGG enrichment analysis results, which are as follows: Cell adhesion molecules, Hematopoietic Cell lineage, Hematopoietic Cell lineage, and Intestinal immune network for IgA Production, epicardial thyroid disease, Antigen processing and presentation and other pathways. These enrichment results suggested that the marker genes we screened were closely related to the function of B cells, further indicating that the genes we screened were reliable marker genes of B cells.
Construction of prognostic model of marker gene in tumor infiltrating B cells
To further assure that the maker genes we screened for TIL-B cells were immune-related genes, the ImmuneSigDB subset (V7.5.1) from the immune-related dataset C7 was downloaded from the GSEA database(Liberzon et al., 2015). 88 TIL-B cell marker genes were obtained by intersection with marker genes found in TIL-B cells (Figure 3A), which could be considered as immune gene sets associated with TIL-B cells. To investigate which of these genes are associated with patient survival prognosis, we obtained a dataset of TCGA-OV cohort of ovarian cancer patients from the TCGA database (N=379), which was randomly assigned to a training set (N=265) and a test set (N=114) in a 7:3 ratio. Based on the TCGA training set, we adapted univariate Cox regression analysis of marker genes in 88 TIL-B cells to search for potential prognostic genes. The results showed that 10 of these genes were significantly correlated with the survival and prognosis of patients (Figure 3B): IGHM (HR = 0.93, P=0.0314), CCR7 (HR=0.66, P=0.015), ISG20 (HR=0.74, P=0.0098), IGKC (HR=0.95, P=0.0375), IGHG1(HR=0.95, P=0.0277), IGJ (HR=0.91, P=0.0106), CD38 (HR=0.64, P=0.003), SLAMF7 (HR=0.76, P=0.0053), MZB1 (HR=0.86, P=0.0184) and FKBP11(HR=0.74, P=0.0441). LASSO Cox regression analysis was then used to further screen these genes and lambda.min, which was cross-validated 10 times, was selected as the optimal lambda (Figure 3C, D). The corresponding model contained six non-zero coefficients (Figure 3E). Patients with high expression of these genes had better survival prognosis than those with low expression, such as IGL, SLAMF7, CCR7, FKBP11, CD38 and ISG20 (Figure 3F). In order to construct a prognostic model for patients, all six prognostic protective genes were selected for multivariate Cox regression analysis to construct a prognostic model for TIL-B cells. In the process of model construction, the AIC model was selected based on the model AIC value, and the optimal AIC model was selected through the stepwise backward algorithm. The optimal model ultimately contained genes ISG20 and SLAMF7 (Figure 3g). Based on these two genes, a prognostic score model for TIL-B was constructed: RiskScore = -0.23*exp(ISG20) -0.23*exp(SLAMF7).
Validation of prognostic model performance
The risk score of each sample in the training set, test set and external test set was calculated using the TIL-B immune-related prognosis model constructed by us, and the patients were split into high and low risk groups according to the best cutoff value of risk score. Heatmap was used to demonstrate the expression of relevant genes in TIL-B immune-related prognosis model in patients, and it was found that the expression values of ISG20 and SLAMF7 were contrary to the trend of risk score (training set -Figure 4A, test set - Figure 5A, external test set - Figure 6A). Subsequently, the Wilcox test was used to test the difference between the expression of ISG20 and SLAMF7 in the high-risk and low-risk groups. As shown in the box diagram in Figure S2, ISG20 and SLAMF7 were up-regulated in the low-risk group in all data sets, and their expression levels were negatively correlated with risk score. Scatter plots were used to show the risk score, survival time and survival status of patients in each data set (training set-Figure 4 B and C, test set-Figure 5 B and C, external test set-Figure 6 B and C). In order to verify whether the risk score constructed by us is associated with the prognosis of patients, we used Log Rank to test the K-M survival curve of patients in the high and low score groups, and the results showed that patients in the high risk group had a worse prognosis than those in the low risk group (Log Rank test of training set p<0.0001, HR=2.72, As shown in figure 4D; Log Rank test p=0.0038, HR=2.31, as shown in Figure 5D; External test set log Rank tests P =0.022, HR=1.34, as shown in Figure 6D).The 3-year and 5-year survival were predicted by the model. ROC curves (AUC) were 0.619 (3 years) and 0.736(5 years) in the training set and 0.694 (3 years) and 0.758 (5 years) in the test set (Figure 5E). ROC curves (AUC) in the external test set were 0.6(3 years) and 0.61(5 years) respectively (Figure 6E). These results indicate that the model constructed based on TIL-B immune-related genes has a good prognostic value for survival.
Assessment of tumor immune cell infiltration
Tumor infiltrating immune cells play an important role in tumor progression and clinical treatment response. EPIC, MCPcounter and ssGSEA were used to evaluate the tumor infiltrating immune cells using bulk RNA sequencing data. It is generally believed that patients with higher proportion of CD8+T cells, B cells, Cytotoxic lymphocytes, NK cells and aDC cells in tumor microenvironment have better clinical prognosis(Becht et al., 2016b; Fridman et al., 2012; Becht et al., 2016a). In our results, the proportion of CD8+T cells, B cells, Cytotoxic lymphocytes, and aDC cells in the low-risk group was higher than that in the high-risk group (Figure 7).
Immune checkpoints and regulators of immunogenic cell death
Patients with ovarian cancer are commonly diagnosed at advanced stage or with metastatic disease, and the usual surgical interventions have failed to react. A combination of radiotherapy, chemotherapy, targeted therapy, and immune checkpoint inhibitors is used to treat cancer, and even therapeutic tumor vaccines are used. Immune checkpoints (ICs) and immunogenic cell death (ICD)(Krysko et al., 2012; Huang et al., 2021) are important in cancer immunotherapy strategies and chemotherapy(Gebremeskel and Johnston, 2015), and we studied their expression levels in patients with different risk scores. It can be seen that in the TCGA-OV cohort data set, ICs expression levels in different risk groups are significantly different. Overall, ICs expression levels in the low-risk group are higher than those in the high-risk group (Figure 8A). The expression of ICs related genes CD86, LAG3, LAIR1 and CD44 in the TCGA-OV dataset and the external GEO dataset were significantly different in the high and low risk groups, and the expression was up-regulated in the low risk group (Figure 8A, B). CD86, as a ligand of Cytotoxic T-lymphocyte-associated protein 4 (CTLA-4), can interact with it to inhibit T cell activation (Dyck and Mills, 2017). LAG3 is an important ICs with important implications for cancer, infectious diseases, and autoimmunity. The binding of leukocyte associated immunoglobulin-like receptor 1 (AIR-1) to its ligand results in the loss of immune function in the tumor microenvironment (TME), and reduced T cell function and immune response of antigen presenting cells (Xu et al., 2020). Patients in the high-risk group all had low ICs expression levels, suggesting that ICs inhibitors may be less effective in the high-risk group. As for ICD-related gene sets, most of them were also highly expressed in the low-risk group, and CXCL10 and TLR4 showed statistically significant differences in TCGA-OV and GEO data sets (Figure 8C, D), suggesting that patients in the low-risk group might have better chemotherapy effects. Consequently, the risk score can reflect the expression level of ICs and ICD regulatory factors, and can be used for drug guidance in tumor immunotherapy and chemotherapy.
Clinical nomogram construction
Considering individual differences of clinical patients, clinically related characteristics of patients were included (Table 2), and the relationship between clinically related characteristics and risk score and prognosis of patients was studied by univariate Cox regression and multivariate Cox regression analysis. Univariate Cox regression analysis showed that risk score and clinical stage were correlated with the prognosis of patients (Figure 9A). Multivariate Cox regression analysis showed that, after adjusting other factors, there was no significant correlation between clinical stage and prognosis (P>0.05), while risk score (HR=3.151, P=0.013) was significantly correlated with prognosis (Figure 9B). This indicated that the risk score was an independent prognostic factor. Subsequently, we established a prognostic linear plot based on risk score to quantitatively estimate the 3,5-year survival probability of ovarian cancer patients (Figure 9C). The Calibration curves shows that TCGA-OV cohort data (Figure 9D) and external validation data GSE18520 (Figure E) are in good agreement with the optimal prediction probability (gray dotted line in figure) at 3 and 5 years. These results suggest that this comprehensive nomogram can be used as a reliable tool for predicting OS in ovarian cancer patients.