Data acquisition and pre-processing
The publicly available dataset GSE113253, which contained 170 samples of various mesenchymal stem cells, was downloaded from Gene Expression Omnibus (GEO). Among all the samples, 33 total RNA-sequencing data of bone marrow-derived human mesenchymal stem cells (BMSCs) were selected for further analysis. First of all, the “edgeR” package in R 3.6.3 (R Core Team, 2020, https://www.r-project.org) was applied to filter the low-expression raw data and normalise the sequencing depth difference of all the samples. R is a free software environment for statistical computing and graphics . Furthermore, we conducted hierarchical clustering and principal component analysis (PCA) to eliminate outlier samples accordingly. Finally, 31 samples (14 osteogenic BMSCs, 14 adipogenic BMSCs, and 3 undifferentiated BMSCs) were included in our analysis.
Construction of Weighted Gene Co-expression Network
We constructed the co-expression network through “WGCNA” package under R environment (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/). Firstly, the gene expression file and trait file were transformed into an appropriate format and the soft thresholding power (β value) was filtered based on the calculation of scale-free topological fit index and mean connectivity. The best β value was confirmed with a scale-free fit index bigger than 0.85 as well as the highest mean connectivity by performing a gradient test from 1 to 30. After that, the topological overlap matrix (TOM) was constructed by calculating the topological overlap between pairwise genes, and hierarchical clustering analysis was performed. The co-expression relationships among different modules were analysed and modules with high similarity were merged at the threshold of 0.25.
Module-trait correlation analysis and identification of interesting modules
To excavate interested modules which are highly related to the differentiation of BMSCs, correlation analysis between each module and different time points of adipogenic or osteogenic differentiation were conducted. This relationship is determined by Spearman’s correlation coefficient between module eigengene (ME, the major component of gene expressions in a module) and differentiation traits. Modules which had significant correlations with differentiation traits were selected for further validation.
Subsequently, gene significance (GS) and module membership (MM) were used for intramodular analysis. Gene significance is the relationship between gene expression level and differentiation trait while module membership represents the association between gene expression profile and ME of a given module. The modules which contained genes with a significant correlation between GS and MM were considered meaningful.
Enrichment analysis for biological function and Gene Set Enrichment Analysis
To investigate the biological function of the differentiation-related modules, enrichment analysis such as Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, hallmark gene set, etc., was performed through Metascape (https://metascape.org/gp/), which is a comprehensive gene annotation and analysis resource (Y. Zhou et al., 2019). Terms with a P-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 were collected and grouped into clusters based on their membership similarities.
For further verification of the function of the modules during the respective differentiation process, we conducted the Gene Set Enrichment Analysis (GSEA) for each time point of differentiation. The gene expression pattern in certain differentiation point was compared to the undifferentiated BMSCs. The criteria for statistical significance were set as p < 0.01 and FDR < 0.25.
Construction of Protein-Protein Interactions network and Hub Gene identification
To study the potential mechanism under the differentiation of BMSCs, we constructed a protein-protein interactions (PPI) network through the Search Tool Retrieval of Interacting Genes / Proteins (STRING) database, which is a trustworthy online database that can predict the physical and functional association between known and predicted protein-protein interactions.
The MCODE package in Cytoscape (version 3.8, https://cytoscape.org/) was applied to identify the hub proteins which are high degree of connectivity in the whole network. Meanwhile, we calculated the intra-modular connectivity and significance for all genes in the interested module. The genes with absolute gene significance > 0.8 and absolute intramodular connectivity > 0.8 were screened to compare with the hub genes from the PPI network. Moreover, the ”edgeR“ package was used to determine the DEGs between differentiated cells and undifferentiated BMSCs at each time point. The overlapped genes between PPI hub protein, intramodular hub genes, and DEGs were deemed to be key candidate genes that may regulate the differentiation of BMSCs.
Transcription factors and target-miRNA interaction analysis for interested modules
To recognize the crucial regulation factors for the differentiation of BMSC, we performed the enrichment analysis of transcription factor binding motifs (TFBMs) through the RcisTarget package under the R environment. The databases, hg19-tss-centered-10kb-7species.mc9nr (species = Homo sapiens, genome = hg19, distance = 10kb around the transcription start site (TSS), number of orthologous species (nOrt) =7, motif collection = Version 9 (mc9nr): 24453 motifs), were utilized to analyse the gene list of each interested module. The significant motifs based on the Normalized Enrichment Score (NES) and the genes with the best enrichment for each motif were outputted. Additionally, the TFs-genes interaction network was constructed by NetworkAnalyst (https://www.networkanalyst.ca/), which is a comprehensive network visual analytics platform for gene expression analysis.
The Tarbase v.8, a decade-long collection of experimentally supported miRNA-gene interactions, was used to seek out the potential miRNAs that can target the mRNAs within interested modules(Karagkouni et al., 2018). For improving the reliability of the predicted miRNAs, only the experimentally validated mRNAs-miRNAs pairs were included. Based on the number of targeted mRNAs, the top 5 miRNAs were extracted and the miRNA-mRNA network was visualized by Cytoscape 3.8.
Cell culture and differentiation
BMSCs extracted from the human femoral head (H. Li et al., 2016) were cultivated in Dulbecco’s Modified Eagle Medium (DMEM; 21885-025; Gibco, Germany) supplemented with 10% fetal bovine serum (FBS; P30-3702; PAN BIOTECH, Germany) and 1% penicillin-streptomycin (15140-148; Gibco, Germany) at 37°C in an incubator with 5% CO2. The culture medium was changed every 3 days. To induce osteogenic differentiation, the culture medium was replaced by DMEM with 100nM dexamethasone (D2915-100MG; Sigma, Germany), 10nM sodium-β glycerophosphate (G5422-25G; Sigma, Germany), 0.05mM L-ascorbic acid (A8960-5G; Sigma, Germany), 10 % FBS and 1% penicillin-streptomycin. The adipogenic differentiation was induced by DMEM with 1µM dexamethasone, 0.5mM 3‑isobutyl‑1‑methylxanthine (IBMX; I5879; Sigma, Germany), 200µM indomethacin (I7378-5G; Sigma, Germany), 10µM insulin (I9278; Sigma, Germany), 10 % FBS and 1% penicillin-streptomycin. The medium was changed also every 3 days.
ALP staining, Alizarin Red S staining, and Oil Red O staining
After differentiation for a certain time, the BMSCs were fixed with 4% PFA for 30min at room temperature and washed with PBS twice. After that, cells were stained with ALP staining kit (ab242286; Abcam, Germany), Alizarin Red S (C.I. 58005; ROTH, Germany), and Oil red O (O0625-25G; Sigma, Germany) working solution for 15min, respectively. The stained cells were observed and recorded by microscopic analysis .
RNA extraction and Real-Time quantitative PCR (RT-qPCR)
Total RNA of BMSCs and differentiated cells was extracted with RNA-Solv Reagent (R6830-02; Omega, Germany) and was reverse transcribed into complementary DNA by High-Capacity RNA-to-cDNATM Kit (4387406; Thermo Fisher Scientific, Germany) according to the manufacture instruction. RT-qPCR was performed on a 7300 Real-Time PCR system (Applied Biosystems, Germany) using SYBRTM Green Mastermix (4385612; Thermo Fisher Scientific, Germany). The results were normalised to the expression level of GAPDH and the relative expression levels of each gene were calculated by the 2-ΔΔCt method. The sequences of primers are listed in a Supplementary table 1.