2.2. Features and utilities
Framework of BMDB. The BMDB web server is organized into four primary functional groups: "Project Catalog" for data management, "Exploration" for display of integrated datasets, "Reference Atlas" for investigating reference atlases, and "De novo Analysis" for analysis of users' own data (Fig. 1B). Within the "Exploration" group, the "Project Information" sample manager comprises "Publication Info" and "Sample Info" and is accompanied by four interactive visualization modules: "Clusters Visualization", "Developmental Trajectory", "Pathway Enrichment", and "Cell-cell Communication" (Supplementary Fig. 1A). Detailed publication information can be reached through the DOI link, which leads to the publication's homepage. "Sample Info" supports management of the data and associated reference information. The "Clusters Visualization" module provides basic data visualization, including cell type, the percentage of a particular cell type, and gene expression within cell clusters. The "Developmental Trajectory" module visualizes the cell-fate trajectory within the chosen lineage. The "Pathway Enrichment" module visualizes pathways enriched among highly variable genes across distinct cell clusters. The "Cell-cell Communication" module visualizes the activation of intercellular signaling pathways between cell types.
Construction of reference atlas. A high-quality atlas of mouse bone marrow niche cells was constructed with a pipeline that integrates canonical analysis methods (Fig. 1C, Supplementary Fig. 1B and Supplementary Fig. 1C), based on three representative datasets (29–31). UMAP plots were used to present the initial clustering of all cells. A manual filtering process was then applied to obtain unbiased cell type annotation, using classical gene markers from a previously published article (29). This annotation categorized the cells into six types: MSCs, chondrocyte-like cells (CLCs), osteolineage cells (OLCs), fibroblasts, endothelial cells (ECs), and pericytes (Supplementary Fig. 2). The annotation of each cell type was then refined using marker genes identified by FindMarkers (Supplementary Fig. 3, Supplementary Table 2). Cell type identification proceeded through a sequence of steps: elimination of cells with excessive potential, verification by developmental trajectory analysis, and manual integration of clusters. A final manual correction of cell types then established a comprehensive transcriptome reference, which will be used in the BM landscape project. A comparable approach was used to construct the human fetal single-cell reference atlas of the BM niche from previously published datasets (32). Based on established cell markers, the human fetal BM niche was found to contain three primary cell types: hematopoietic cells, ECs, and stromal cells (Supplementary Fig. 4A). After removal of hematopoietic cells, ECs and stromal cells were annotated further using previously published marker genes (32).
This analysis revealed four subsets of ECs (aECs, ECs_prolif, lyECs, and sECs) and three types of stromal cells: bone marrow stromal cells (BMSCs), chondrocytes, and OLCs (Supplementary Fig. 4B). BMSCs, chondrocytes, and OLCs were then annotated in finer detail to delineate subclusters based on documented cell markers. Specifically, two subsets of BMSCs, namely Cxcl12-abundant reticular cells (CARs) and fibroblasts; two subsets of chondrocytes, namely Chondrocytes and Chondrocytes-prolif; and two subsets of OLCs, namely osteoblasts (OBs) and osteoprogenitor cells (OPs), were identified and annotated (Supplementary Fig. 4C, Supplementary Fig. 5). The "Reference Atlas" page lists all processed datasets and associated information. For further analysis and visualization, users can view the UMAP of the integrated atlas together with detailed displays of cell type composition and gene expression (Fig. 2).
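The two-level annotation of the human fetal atlas described above can be written down as a small lookup structure. The following is a minimal sketch only: the labels are taken from the text, but the dictionary layout and the function name `main_type_of` are illustrative assumptions and do not reflect how BMDB stores its annotations.

```python
# Annotation hierarchy of the human fetal BM niche atlas, as listed in the text.
# The dictionary layout and the helper name are illustrative assumptions.
HUMAN_FETAL_ATLAS = {
    "ECs": ["aECs", "ECs_prolif", "lyECs", "sECs"],
    "BMSCs": ["CARs", "fibroblasts"],
    "Chondrocytes": ["Chondrocytes", "Chondrocytes-prolif"],
    "OLCs": ["OBs", "OPs"],
}


def main_type_of(subtype: str) -> str:
    """Return the main cell type that a fine-grained label belongs to."""
    for main_type, subtypes in HUMAN_FETAL_ATLAS.items():
        if subtype in subtypes:
            return main_type
    raise KeyError(f"unknown subtype: {subtype}")


print(main_type_of("CARs"))  # BMSCs
```

A lookup of this kind makes it straightforward to collapse fine-grained labels back to the main annotation when only the coarse view is needed.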
Data curation and basic display. Datasets in BMDB are managed with an internally developed pipeline. BMDB offers a user-friendly interface through which users can efficiently retrieve individual datasets and their associated details. Selecting the "Dataset" option on the homepage opens the dataset catalog, which lists all currently collected datasets. The "Project Catalog" displays publication details (author, journal, and year), species, tissue, and cell count. Clicking the "Info" button in the last column (Fig. 2A) gives access to detailed publication information and sample information. The "Publication information" page displays the DOI, title, and abstract of the publication. The "Sample Information" page provides comprehensive, structured details about each sample, including Data ID, species, tissue, cell count, and library construction method. Clicking the "View" button in the last column of the "Sample Table" reveals the cell clustering and annotation.
Users can choose among three clustering types: main annotation, fine-tune annotation, and Louvain cluster (Fig. 3B). Compared with main annotation, fine-tune annotation identifies cell types more accurately. Cell clustering results are shown as UMAP plots, and the distribution of each cell type is shown as a pie chart. To reduce the computational burden on the system, BMDB can sample cells by a predetermined percentage, so that cell clusters are sampled proportionally for visualization. Selecting a gene from the drop-down menu displays its expression pattern across cell types (Fig. 3C).
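Proportional sampling of cell clusters can be illustrated with a short stratified-sampling sketch. This is not BMDB's implementation; the function name `sample_by_cluster`, the fixed random seed, and the rule of keeping at least one cell per cluster are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_by_cluster(labels, fraction, seed=0):
    """Return sorted cell indices, keeping roughly `fraction` of every cluster."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)
    kept = []
    for indices in by_cluster.values():
        # Assumption: keep at least one cell so that small clusters stay visible.
        n_keep = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy labels only; the cluster sizes are arbitrary.
labels = ["MSCs"] * 100 + ["ECs"] * 40 + ["pericytes"] * 10
subset = sample_by_cluster(labels, fraction=0.1)
print(len(subset))  # 15
```

Because each cluster is sampled separately, the relative proportions shown in the pie chart are approximately preserved while fewer points need to be drawn on the UMAP.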
Extended display of built-in data. The "Exploration" functional module allows users to conduct more complex analyses, such as cell differentiation trajectory, pathway enrichment, and cell-cell communication (Fig. 4). On the trajectory analysis page, users can customize several parameters through drop-down boxes to obtain the spatiotemporal developmental pattern of the chosen cell lineage (Fig. 4A). Selecting "State of Trajectory" in the "State Type" drop-down box displays the Monocle state of the selected cell lineage on the DDRTree. Selecting "Clusters" in the drop-down box instead projects the subclusters of the cell lineage onto the DDRTree. Pseudotime analysis is conducted with Monocle2 and Slingshot. As noted above, users can sample cells by a predetermined percentage for this analysis. To investigate the involvement of a gene of interest in the development of a specific cell type, users can choose the gene from the activated gene list. The DDRTree then displays the expression of the gene in each individual cell, together with its expression dynamics along the pseudotime trajectory.
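One simple way to summarize gene expression dynamics along pseudotime is to average expression within pseudotime bins. The sketch below illustrates that idea only; it is not the smoothing used by Monocle2 or Slingshot, and the function name, the equal-width binning, and the toy values are assumptions.

```python
def expression_along_pseudotime(pseudotime, expression, n_bins=4):
    """Average one gene's expression within equal-width pseudotime bins."""
    if len(pseudotime) != len(expression) or not pseudotime:
        raise ValueError("inputs must be non-empty and of equal length")
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, x in zip(pseudotime, expression):
        # The cell with maximal pseudotime is placed in the last bin.
        b = min(int((t - lo) / width), n_bins - 1)
        sums[b] += x
        counts[b] += 1
    # Empty bins are reported as None rather than as zero expression.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy values only: expression rising steadily along the trajectory.
pseudotime = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
expression = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(expression_along_pseudotime(pseudotime, expression))  # [0.0, 1.0, 2.0, 3.0]
```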
The "Pathway Enrichment" module supports pathway analysis on cell clusters. By selecting specific pathways, such as GO:0030953 and GO:0002062, from the drop-down box, users can visualize their expression patterns on UMAP plots (Fig. 4B). For instance, pathway GO:0030953 is expressed across all cell clusters, whereas pathway GO:0002062 is expressed predominantly in ECs.
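The text does not specify how a pathway's expression is scored per cell. As a minimal sketch under that caveat, one common simplification is to average the expression of the pathway's genes in each cell and then summarize by cluster. The function name, the treatment of unmeasured genes as zero, and the placeholder gene names below are all assumptions.

```python
from collections import defaultdict


def pathway_score_by_cluster(cells, clusters, gene_set):
    """Average the per-cell mean expression of `gene_set` within each cluster."""
    if len(cells) != len(clusters):
        raise ValueError("cells and clusters must have equal length")
    if not gene_set:
        raise ValueError("gene_set must not be empty")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expr, cluster in zip(cells, clusters):
        # Per-cell score: mean over pathway genes; unmeasured genes count as 0.
        score = sum(expr.get(g, 0.0) for g in gene_set) / len(gene_set)
        totals[cluster] += score
        counts[cluster] += 1
    return {c: totals[c] / counts[c] for c in totals}


# Placeholder genes and values; not real pathway members or BMDB data.
cells = [
    {"GeneA": 2.0, "GeneB": 4.0, "GeneC": 9.0},
    {"GeneA": 0.0, "GeneB": 2.0, "GeneC": 1.0},
    {"GeneA": 1.0, "GeneB": 1.0, "GeneC": 5.0},
]
clusters = ["ECs", "ECs", "MSCs"]
print(pathway_score_by_cluster(cells, clusters, {"GeneA", "GeneB"}))
# {'ECs': 2.0, 'MSCs': 1.0}
```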
The "Cell-cell Communication" module allows users to analyze cell-cell communication (Fig. 4C). As an example, we use the MK signaling pathway, which can be selected from the drop-down box. On the bubble chart page, the y-axis lists the pairs of Mdk and its receptors, while the x-axis lists pairs of cell types. The color of each bubble indicates the contribution of the Mdk-receptor pair on the y-axis to the communication between the cell pair on the x-axis. Mdk-Sdc4 plays a significant role in communication between MSC.5–19 and various cell types, such as CLCs.3, CLCs.4-14-27, fibroblasts.2, and fibroblasts.13–20. The overall impact of the integrated signaling pathway can also be displayed as a heatmap, network graph, and circle plot. These results suggest that the Mdk signaling network primarily mediates cell communication within the bone marrow niche by connecting MSCs or OLCs with other cell types. Violin plots depict the expression level of each gene in the pathway within each cell type. In particular, Mdk is predominantly expressed in MSCs and OLCs. The receptors of Mdk show diverse expression patterns among bone marrow cells. Notably, Sdc1 is primarily expressed in MSCs, fibroblasts, and OLCs, while Ncl and Itgb1 are widely expressed across all cell types within the bone marrow niche.
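The layout of the bubble chart, with ligand-receptor pairs on the y-axis and cell-type pairs on the x-axis, amounts to pivoting a list of scored records into a grid. The sketch below shows that pivot only. The function name `to_bubble_grid` is hypothetical, the cell-pair labels are simplified, and the scores are arbitrary placeholders rather than values from BMDB.

```python
def to_bubble_grid(records):
    """Pivot (lr_pair, cell_pair, score) records into a rows-by-columns grid.

    Rows are ligand-receptor pairs (y-axis) and columns are cell-type pairs
    (x-axis); combinations without a record are left as None.
    """
    lr_pairs = sorted({lr for lr, _, _ in records})
    cell_pairs = sorted({cp for _, cp, _ in records})
    lookup = {(lr, cp): score for lr, cp, score in records}
    grid = [[lookup.get((lr, cp)) for cp in cell_pairs] for lr in lr_pairs]
    return lr_pairs, cell_pairs, grid


# Placeholder scores for illustration; not results reported by BMDB.
records = [
    ("Mdk-Sdc4", "MSCs -> CLCs", 0.30),
    ("Mdk-Sdc4", "MSCs -> fibroblasts", 0.20),
    ("Mdk-Ncl", "MSCs -> CLCs", 0.10),
]
rows, cols, grid = to_bubble_grid(records)
print(rows)  # ['Mdk-Ncl', 'Mdk-Sdc4']
print(grid)  # [[0.1, None], [0.3, 0.2]]
```

Each non-empty grid entry corresponds to one bubble, and its value is what would be mapped to the bubble color.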
De novo analysis for users' data. The BMDB platform incorporates an online de novo analysis function for users' data, which strengthens it as a tool for exploring bone marrow scRNA-Seq data (Fig. 5). As noted above, the current version of BMDB offers high-quality mouse and human fetal single-cell reference atlases of the bone marrow niche. These reference atlases make online analysis of user data possible in BMDB. BMDB supports both H5AD and RDS file formats. After files are uploaded to the "De novo Analysis" section, cell mapping can be performed by selecting the appropriate reference atlas and mapping method from the drop-down box (Fig. 5A). Two mapping methods, WNN and scANVI, are available (Fig. 5B). Additional analyses can be conducted with the function modules in the "De novo Analysis" group, including cell clustering and annotation, pathway enrichment, and cell-cell communication.
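The choices a user makes before cell mapping (file format, reference atlas, mapping method) can be checked with a few lines of validation. The following sketch is not BMDB's server code: the file formats and mapping methods come from the text, whereas the atlas identifiers `mouse` and `human_fetal` and the function name `validate_mapping_request` are hypothetical.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".h5ad", ".rds"}        # file formats named in the text
MAPPING_METHODS = {"WNN", "scANVI"}           # mapping methods named in the text
REFERENCE_ATLASES = {"mouse", "human_fetal"}  # hypothetical identifiers


def validate_mapping_request(filename, atlas, method):
    """Check an upload's format, reference atlas, and mapping method."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file format: {suffix or 'none'}")
    if atlas not in REFERENCE_ATLASES:
        raise ValueError(f"unknown reference atlas: {atlas}")
    if method not in MAPPING_METHODS:
        raise ValueError(f"unknown mapping method: {method}")
    return {"file": filename, "atlas": atlas, "method": method}


request = validate_mapping_request("sample.h5ad", "mouse", "scANVI")
print(request["method"])  # scANVI
```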