Endometrial stromal cell viability and purity from patient samples
Endometrial stromal cells isolated from endometrial biopsies were grown in culture and stored frozen (Fig. 1A). We selected 22 frozen samples for analysis and to ensure pure, viable cells of mesenchymal lineage from thawed preparations conducted two channel FACS sorting with forward and side scatter (Fig. 1B), propidium iodine (PI) exclusion (Fig. 1C) and platelet derived growth factor β+ (PDGFB+) expression (Fig. 1D). The mean cell concentration after initial thaw was 2.19 x 106; (Range 0.172 x 106 – 3.46 x 106) with 90.0% remaining viable after the thawing process. One sample did not yield a sufficient cell concentration, and two samples had a final viability < 80% and were not carried forward to single cell analysis, resulting in a final 19 samples.
Single cell RNA-sequencing
In order to increase the size of the patient cohort, we used a multiplexing approach to combine samples from different patients into four microfluidic runs of the 10X Genomics Chromium Platform (P1: Patients 1–5, P2: 6–10, P3: 11–14 and P4: 15–19) (Fig. 1A). We obtained sequencing data from the four scRNA-seq libraries constructed from our four pools of 19 endometrial stromal cells (ESC).
For each pool, the number of cells obtained were 10,411 (P1), 9,869 (P2), 10,894 (P3) and 9,808 (P4) making a total of 40,982 cells. Demuxlet identified 3,982 doublets that were randomly distributed across each pool (Supplementary Fig. 1A-D). We also excluded 358 ambient cells, 2,884 with > 10% mitochondrial DNA that were considered stressed or dying cells, and cells with either very high (> 6,500), or very low (< 200) numbers of expressed genes. From the initial 40,982 cell dataset we retained 33,758 high quality singlets for analysis with an average read depth of 58,541 (P1), 58,277 (P2), 53,825 (P3) and 55,836 (P4) for each pool. We detected a median of 19,803 (P1), 20,095 (P2), 20,651 (P3) and 22,472 (P4) unique molecular identifiers (UMI) per cell with the total number of genes with measurable expression of 21,885 (P1), 21,736 (P2), 22,178 (P3) and 21,996 (P4). In total, 20,590 unique genes were identified across all four pools with the median number of genes expressed per cell as 3,780 (P1), 3,761 (P2), 3,941 (P3) and 4,050 (P4). After doublet-filtering and quality control, between-pool batch effects were corrected using Harmony (Fig. 1E-F).
We next investigated transcriptome similarity and assessed whether potential cellular subtypes were present through unsupervised Louvain clustering using Seurat v3.0.2 (14). A critical step in deriving relevant data from single cell datasets is selecting the appropriate resolution for clustering. Increasing resolution increases cell clusters, although potentially at the expense of biological relevance (Supplementary Fig. 2). Using the clustree (15) package, we produced a cluster tree with 13 levels of resolutions ranging from 0.01-1.0 (Fig. 1G) to visualise the similarity between cells at multiple resolutions, and track how cells move between clusters as resolution is varied. At a coarse resolution (0.1), three distinct nodes were identified that established stable clusters with minimal movement across nodes at increasingly finer resolutions. At this course resolution, there was also a distinct spatial separation for each cluster (Fig. 1H) and the number of cells from each experimental pool within each cluster was consistent (Fig. 1I). In contrast, clustering at a finer resolution (0.6) generated 20 clusters which lacked distinct spatial resolution and revealed mixing between clusters and possible over clustering resulting from technical artifacts (Supplementary Fig. 2).
Cell cycle scoring
Genes can be periodically regulated during the cell cycle (16), influencing their transcriptome and affecting the ability to accurately cluster cells based on phenotype. To characterise cell cycle for each cell, we calculated G1, G2M and S scores using the CellCycleScoring function in Seurat and human cell cycle phase gene expression profiles (17). The majority of cells analysed (69.65%) showed a G1 phenotype, whilst 15.77% of cells were classified as G2M and 14.58% were classified as S phase (Fig. 1J). We observed an enrichment of the proliferating cells (G2M) in cluster 1 (54.94%). Cluster 1 also had an increased proportion of cells in S phase (31.94%) compared to clusters 0 (7.88%) and 2 (12.64%). The majority of cells in cluster 0 and 2 were classified as the quiescent G1 phase (90.80% and 84.11% respectively).
Differential gene expression between clusters reveals discrete signatures
We next examined the differentially expressed genes (DEGs) that underlie these cluster differences (Fig. 2A). Setting a log fold change (logFC) > 0.25 and adjusted p value < 1x10− 4 we found 152 significant DEGs between cells in cluster 0 and all other cells (Supplementary table 1). A comparison between cluster 1 and all remaining cells found 707 DEGs (Supplementary table 2), and cluster 2 (Supplementary table 3) and all other cells found 113 DEGs. Spatial representation of three of the top DEGs in cluster 0 (IGFBP5; logFC = 0.984; adj. p value < 1.0 x10− 305; MMP11; logFC = 0.887; adj. p value < 1.0 x10− 305 and ACTA2; logFC = 0.728; adj. p value < 1.0 x10− 305) (Fig. 2B) revealed strong variation and non-synonymous distribution within the cluster, accompanied by low but consistent expression in the two other clusters. A similar spatial resolution was observed for UBE2S (logFC = 1.67; adj. p value < 1.0 x10− 305) with high expression in cluster 1 but low, consistent expression in the remaining clusters, although both PTTG1 (logFC = 1.833; adj. p value < 1.0 x10− 305) and UBE2C in particular (logFC = 2.02,; adj. p value < 1.0 x10− 305) showed a limited expression confined mostly to cluster 1. In cluster 2, MMP3 (logFC = 2.31; adj. p value < 1.0 x10− 305), CST1 (logFC = 2.19; adj. p value < 1.0 x10− 305 and MMP10; (logFC = 1.56; adj. p value < 1.0 x10− 305) showed significant differential expression, and both CST1 and MMP10 expression were limited mostly to cluster 2. We also performed differential expression analysis between specific cluster pairs (cluster 0 vs 1, 1 vs 2, and 0 vs 2) and detected 242, 246 and 17 significant DEGs, respectively (logFC > 0.5, adj. p value < 1x10− 4; Supplementary Tables 4, 5 and 6).
To gain further insight into the biological differences underlying the three clusters, we performed pathway analysis using the top 200 significant DEGs for each cluster using Reactome, KEGG and gene ontology databases (Fig. 2C). This revealed significantly enriched processes involved in extracellular matrix organisation (adj. p value < 1.33x10− 12) and focal adhesion (adj. p value = 1.1x10− 3) for cluster 0. For cluster 1 we observed significant enrichment of cell cycle (adj. p value = 6.97x10− 12), progesterone-mediated oocyte maturation (adj. p value = 4.28x10− 8) and oocyte meiosis (adj. p value = 2.53x10− 7), whereas cluster 2 DEGs were enriched for extracellular matrix organisation (adj. p value = 1.89x10− 14) but also antigen processing and presentation (adj. p value = 0.0049), and allograft rejection (adj. p value = 0.0389), indicating a potentially immune-reactive cell population (Fig. 2C).
Cell- type annotation identification
Mesenchymal maturation can take multiple pathways leading to divergent progeny such as fibroblasts, adipocytes and smooth muscle cells (18). To annotate our transcriptomically defined cell clusters we applied SingleR (19). This utilises the transcriptomic signatures from the Human Primary Cell Atlas, a database curated from publicly available microarray datasets of human primary cells (20). The analysis confirmed a close alignment with cells of mesenchymal lineage, albeit with variations in mesenchymal progeny distributed across the clusters (Fig. 3A). The five most prevalent cell types identified were fibroblasts (84.78%), mesenchymal stem cells (MSCs, 13.66%), smooth muscle cells (1.21%), induced pluripotent stem (IPS) cells (0.24%) and tissue stem cells (0.11%) (Fig. 3B).
Overlay of the different cell types with the clustering analysis revealed fibroblasts were the predominant cell type of cluster 0, representing (98.34%) of the cells in this cluster, with 0.93% identified as MSCs. As this was the largest cluster of fibroblast cells we designated this cluster ‘fibroblast major’. Similarly, cluster 2 while distinct from cluster 0 was predominately fibroblasts (97.34%) with the inclusion of some MSCs (2.50%) and was subsequently termed ‘fibroblast minor’. The clustering differences observed between the fibroblast clusters (clusters 0 and 2) could not be attributed to cell cycle differences (Fig. 2E). Cluster 1 identified predominantly as MSCs (57.42%), and as such was named the ‘MSC cluster’, although 42.53% of cells within this cluster also identified as fibroblasts (Fig. 3C).
Cell fate trajectory between MSCs and fibroblasts
To investigate dynamic biological processes within our dataset we applied pseudotime and RNA velocity analysis using Monocle 2 and scVelo (21, 22) (23, 24). This allowed the opportunity to study cellular differentiation or lineage progression by ordering individual cells along a trajectory, which can then be used to infer the state of individual cells in processes such as cell maturation (Fig. 3D). Overlay of the clustering data on the pseudotime trajectory predictions suggested a directional progression from cluster 1 (MSC cluster) as the root cell directing a cell fate lineage towards cluster 0 (fibroblast major) (Fig. 3E). This directional progression was also observed in the RNA velocity analysis (Fig. 3F), supporting the hypothesis that the cell differentiation pathway extends from cluster 1 (MSC cluster) to cluster 0 (fibroblast major). Cluster 2 (fibroblast minor) in contrast, was spread uniformly across the differentiation trajectory. Finally, we overlaid cell cycle information onto the trajectory plot and observed that most G2M phase cells, as well as the S phase cells corresponding to MSCs and the less differentiated fibroblasts consistent with the initial cell cycle analysis of each cluster and indicative of higher proliferative ability of the MSCs. (Fig. 3G).
Deconvolution of individual patient cells
To ascertain whether cell types, or cell clusters were associated with clinical phenotype we assigned each cell from the 33,758 cell dataset to the source patient. To demultiplex the individual patient samples in the 4 microfluidic pools we collected SNP genotyping information and used demuxlet (25) to assign each cell to the patient from which it was derived. Demuxlet uses statistical modelling to identify RNA-seq reads that overlap single nucleotide polymorphisms (SNPs). Using SNP data and imputation generated from genotyping the most likely donor for each cell can be identified. Demuxlet identified an average of 1,229 SNPs (range = 13 − 3,970) per cell across pools; P1 (mean = 1,264 SNPs/cell), P2 (mean = 1,270 SNPs/cell), P3 (mean = 1,213 SNPs/cell) and P4 (mean = 1,176 SNPs/cell) (Supplementary Fig. 3A-D), allowing the confident assignment of 100% of the cells. Using this method the number of cells analysed for each patient in each pool was identified (Fig. 4A).
Clinical relationship with identified cell types
Using the clinical data we compared the number of MSCs, fibroblasts and smooth muscle cells derived from women with and without endometriosis. In total, 16,650 cells (49.3%) were sourced from 9 women without endometriosis and 17,108 (50.7%) cells from 10 women with endometriosis. The results indicate no significant variation in the percentage of fibroblasts from the endometrium of cases (84.93%) compared to controls (81.74%) (Fig. 4B). There was also no difference between the percentage of MSCs derived from the cases (14.48%) versus the controls (17.83%) (Fig. 4C). The percentage of smooth muscle cells derived from cases (0.24%) was larger compared to controls (0.08%) (Fig. 4D) although the difference did not reach significance (p = 0.0738).
Our data indicates that fibroblasts could be split into two distinct groups. We therefore also compared the number of fibroblast major (cluster 0), fibroblast minor (cluster 2) and MSC cluster (cluster 1) cells that were from women with and without endometriosis (Fig. 4E). The MSC cluster (cluster 1) had a total of 9,033 cells of which 4,199 (46.5%) were from women with endometriosis and 4,834 (53.5%) from women without endometriosis. The fibroblast major cluster (cluster 0) contained 22,881 cells with 11,418 (49.9%) from women with endometriosis and 11,463 (50.1%) from women without endometriosis. The fibroblast minor cluster (cluster 2) which contained 1,844 cells consisted of 1,491 (80.9%) cells from women with endometriosis and only 353 (19.1%) cells from women without endometriosis (Fig. 4F). A Chi-squared test indicated significantly more fibroblast minor cells (cluster 2) were from women with endometriosis (p < 0.0001).
Relationship between cell type, cell clusters and in vitro growth
Finally, we performed an analysis of cell growth rates for a continuous 100 hour period using a subset of 11 cell preparations and the xCELLigence assay. We compared growth rates of individual cell preparations, endometriosis status, and the percentage of cell type (MSCs and fibroblasts) or cell subset (MSC cluster, fibroblast major, fibroblast minor). For individual preparations, the growth rates varied for both endometriosis cases and controls (Fig. 5A). Grouping cell preparation by endometriosis status showed variable rates of proliferation between cases and controls at different time points. Cells from controls had an initial (0–15 hours) increased rate of proliferation with the growth rate eventually plateauing after 35 hours. In contrast the growth rate of cells from endometriosis cases continued to increase until the end of the incubation period (100 hours) resulting in an increased number of cells from endometriosis cases, although the difference was not significant (p = 0.4725)(Fig. 5B).
We also investigated the association between the contents of each cell preparation and growth rates by plotting the correlation between the percentage of each cell type, as determined by scRNA-seq and SinglR ananlysis and cell growth rates against time. The analysis revealed a positive association between MSC content and cell index that reached the strongest correlation between 8.25–9.75 hours (Pearson’s r = 0.6364, p = 0.0402) (Fig. 5C). Similarly there was an opposite negative correlation with the percentage of fibroblasts and cell index between 8.5–9.5 hours (Pearson’s r = -0.618, p = 0.0478 ) (Fig. 5D).
As the scRNA-seq analysis identified two subsets of fibroblasts (fibroblast major and fibroblast minor) we further assessed the association between the percent content of these cells in each cell preparation and growth rates. We found that each subtype displayed contrasting growth profiles, with the fibroblast major cluster (cluster 0) showing a non-significant positive association with growth rate and the fibroblast minor cluster (cluster 2) showing a significant negative correlation between 17.25–26 hours (Pearson’s r = -0.681, p = 0.025) (Fig. 5E). confirming the relative presence of each fibroblasts influenced in vitro growth profiles at a later time points after seeding compared to MSCs.