RNA-seq analysis
We constructed clustering maps using the t-distributed stochastic neighbor embedding (t-SNE) statistical method to visualize the transcriptomic similarities and differences between the samples. The clustering maps showed all human MSC samples clustering together in a distinct cluster apart from tissue-specific cell (TSCs) samples (Fig. 1a). Likewise, the mouse MSC samples clustered together, while TSC samples clustered independently (Fig. 1b). Next, we compared the gene expression of human-derived MSCs against their tissue-specific counterparts and identified 20,973, 24,365, 8,296, and 29,197 DEGs for h_AM_MSCs, h_BM_MSCs, h_AT_MSCs, and h_UC_MSCs, respectively. We constructed a Venn diagram to visualize the common DEGs shared by the MSCs derived from the four different tissue sources. The common DEGs made up 4.6% of the examined DEGs, equivalent to 2,181 common DEGs (Fig. 2a). Similarly, for the mouse datasets, we compared the gene expression of mouse-derived MSCs with their tissue-specific counterparts and identified 14,843 DEGs for m_AT_MSCs and 12,783 DEGs for the m_BM_MSCs. We found that the common DEGs made up 78.8% of the examined DEGs, equivalent to 12,178 common DEGs (Fig. 2b). Finally, we compared both lists of common DEGs to determine if the human and mouse MSCs shared any common DEGs. The Venn diagram showed that the two species had 1,583 (13.3%) DEGs in common (Fig. 2c). Interestingly, the heatmap of the 1,583 common DEGs showed three main clusters: a cluster that included all the human MSC samples, another cluster that included all the mouse-derived MSC samples plus m_BM_TSCs, and, finally, the last cluster included the rest of the TSC samples. Each of these clusters included subclusters that grouped the triplicate of each tissue type (Fig. 3).
DEGs ontology and enrichment analysis
In the gene ontology analysis, We produced a list of 157 enriched gene ontology (GO) terms with a threshold P-value of 10−3. ReViGO generated a scatter plot of the enriched GO terms organized according to their significance (p-value) and uniqueness. Following visualization, we identified a unique GO term (GO:2000736) to regulate stem cell differentiation with a significant p-value of 8.69E-4, a q value of 4.64E-2, and an enrichment score of 1.48. Moreover, this GO term had a frequency of 0.010%, a log10 p-value of –3.0610, and a uniqueness score of 0.70 (Fig. 4). Other GO terms were more general, less unique, and not explicitly specific to stem cell function. The GO term GO:2000736 included 23 genes that belonged to the proteasomal degradation pathway (Fig. 5) (Table1).
Table 1
List of DEGs belonging to the proteasomal degradation pathway identified in the GO enrichment analysis.
HUGO ID
|
Systemic ID
|
Gene
|
PSMA1
|
α6
|
proteasome (prosome, macropain) subunit, alpha type, 1
|
PSMA5
|
α1
|
proteasome (prosome, macropain) subunit, alpha type, 5
|
PSMA7
|
α4
|
proteasome (prosome, macropain) subunit, alpha type, 7
|
PSMB1
|
ß6
|
proteasome (prosome, macropain) subunit, beta type, 1
|
PSMB2
|
ß4
|
proteasome (prosome, macropain) subunit, beta type, 2
|
PSMB3
|
ß3
|
proteasome (prosome, macropain) subunit, beta type, 3
|
PSMB4
|
ß7
|
proteasome (prosome, macropain) subunit, beta type, 4
|
PSMB5
|
ß5
|
proteasome (prosome, macropain) subunit, beta type, 5
|
PSMB6
|
ß1
|
proteasome (prosome, macropain) subunit, beta type, 6
|
PSMB7
|
ß2
|
proteasome (prosome, macropain) subunit, beta type, 7
|
PSMC1
|
Rpt2
|
proteasome (prosome, macropain) 26s subunit, atpase, 1
|
PSMC2
|
Rpt1
|
proteasome (prosome, macropain) 26s subunit, atpase, 2
|
PSMC4
|
Rpt3
|
proteasome (prosome, macropain) 26s subunit, atpase, 4
|
PSMC5
|
Rpt6
|
proteasome (prosome, macropain) 26s subunit, atpase, 5
|
PSMD1
|
Rpn2
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 1
|
PSMD2
|
Rpn1
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 2
|
PSMD3
|
Rpn3
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 3
|
PSMD4
|
Rpn10
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 4
|
PSMD5
|
Rpn4
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 5
|
PSMD7
|
Rpn8
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 7
|
PSMD8
|
Rpn12
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 8
|
PSMD13
|
Rpn9
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 13
|
PSMD14
|
Rpn11
|
proteasome (prosome, macropain) 26s subunit, non-atpase, 14
|
Gene expression analysis
Now that our attention was drawn to these 23 genes, we wanted to take a closer look at their behavior. To begin with, we inspected their expression patterns in both species. We found that the 23 genes appeared to be upregulated in both the human and mouse MSC samples, except for PSMD4, which was downregulated in the mouse MSC samples compared to the TSC samples (Fig. 6a and b). PSMD4 had a P-value of 6.27E-03 and a fold change of 0.136934557 in m_AT_MSCS, while in m_BM_MSCs, it had a P-value of 9.79E-03 and a fold change of –0.224675507. Since most genes were upregulated, we attempted to explore further the interplay between these genes and other systems that assist the proteasome in antioxidant defense, namely, antioxidant enzymes.
Consequently, members of the superoxide dismutase, glutathione peroxidase, peroxiredoxin, thioredoxin, and peroxidasin families were cross-referenced against the 1,583 common DEGs identified earlier. We found SOD3, GPX7, GPX8, PRDX2, PRDX4, TXN2, and PXDN werepresent in our list of common DEGs. We found that the majority of the transcripts of these genes was upregulated but not all. Out of the seven genes, four (GPX7, GPX8, PXDN, TXN2) were upregulated in all analyzed MSC samples. Each of the other three (SOD3, PRDX2, and PRDX4) was upregulated in all the MSC samples except human AT-MSCs and BM-MSCs and mouse BM-MSCs, respectively (Fig. 6c and d).
Gene interaction network
Next, we wanted to shed more light on the interaction between these antioxidant enzymes and the 23 members of the proteasomal degradation system identified earlier. We employed the help of Cytoscape and the Genemania database to understand the interplay between these antioxidant enzymes and the proteasomal genes. A gene interaction network was generated, and it showed all 23 genes of the proteasomal degradation pathway to be co-expressed together and co-expressed with the antioxidant genes. Specifically, it showed PSMA7 to be co-expressed with GPX7, which in turn is co-expressed with GPX8, SOD3, and PXDN. Also, it showed that PRDX2 is co-expressed with PSMA7, PSMB3, PSMB6, PSMB7, PSMC4, PSMD3, and PSMD8. Furthermore, TXN2 was co-expressed with PSMA1, PSMA7, PSMB3, and PSMB6. Finally, PRDX4 was co-expressed with PSMA5, PSMB1, PSMB2, PSMB5, PSMB6, PSMC1, PSMC2, PSMC5, PSMD1, PSMD8, and PSMD14 (Fig. 7).
Predictive model
Finally, we wanted to test the genes’ efficiency in predicting the identity of MSCs across the different tissue sources in both human and mouse species. To test this, AutoWEKAClassifier performed 486 evaluations of available classifiers and found Random Forest to be the best classifier with the best error rate. The Random Forest tree classifier was used to train the model with 10-fold cross-validation, and the trained model was finalized. The final model was loaded to test its performance in predicting stem cell type on the testing data. We used the upregulated proteasomal genes as attributes, and we removed PSMD4 from the list since it had inconsistent expression across both species. We proceeded with the other 22 genes and ran the Random Forest model. The model tested the data 40 times and showed that MSCs were correctly classified in all 40 instances. To test if all 22 genes contributed equally to the classification process, we ran the gain ratio attribute selection evaluator in WEKA. We found six genes to be the top contributors in the classification: PSMB5, PSMB1, PSMD14, PSMC4, PSMA1, and PSMD8 (Supplementary Table 3). We repeated the Random Forest model using these six genes and these six genes were enough to correctly classify the MSCs all 40 instances (Supplementary Table 4).