Hemoglobin electrophoresis and thalassemia gene detection
The hemoglobin electrophoretic patterns of the six cord blood samples used for RNA sequencing (RNA-Seq) analysis are shown in Additional File 1. The average HbF levels of the full-term female (FF) group and premature female (PF) group statistically differed (84.5 ± 8.1% and 57.8 ± 8.1%, respectively; P = 0.016), the gestational age was 34.33 ± 0.92 weeks and 40.43 ± 0.14 weeks, respectively. The gene detection results showed that none of the six newborns carried any of the three α-globin gene deletion mutations (--SEA, -α 3.7, or -α 4.2), three non-deletion mutations (Hb CS, Hb QS, or Hb WS), or 17 β-globin gene mutations most commonly seen in the Chinese population (Additional File 2).
Isolation of NRBCs
NRBCs were successfully isolated from each cord blood sample using magnetic bead technology based on the CD71 and CD45 cell surface markers. All isolated cells (CD71+CD45-) were analyzed by flow cytometry to determine purity. The average purity of all samples was >92% (Fig. 1a, b). In addition, we used flow cytometry to analyze the CD235a expression of CD71+CD45- cells, and we found that almost all cells were CD235a positive (Additional file 3).
The isolated cells were stained with Wright’s stain and observed under a microscope. Most of the cells in the field of vision were NRBCs; the NRBCs were scattered, and mostly comprised orthochromatic normoblasts with rare polychromatic erythroblasts, basophilic erythroblasts, and proerythroblasts (Fig. 1c). The enucleated red blood cells were further identified as reticulocytes by Brilliant cresyl blue staining (Fig. 1d). NRBCs and reticulocytes accounted for >99% of the sorted cells.
Differentially expressed small RNAs, mRNA and lncRNA
More than 23 million clean 50-bp single-end reads were generated for each sample. A Poisson distribution method was used to screen the differentially expressed small RNAs between samples, and NOIseq software was used to screen the differentially expressed small RNAs between the two groups of newborns. We detected 1,767 differentially expressed miRNAs between the FF and the PF group, including 1,033 miRNAs that were upregulated and 734 miRNAs that were downregulated in the FF group compared with the PF group.
With a fold change threshold of ≥2 and a deviation probability value of ≥0.6 as our screening criteria, we identified 79 differentially expressed miRNAs in the FF group compared with the PF group, including 53 that were upregulated and 26 that were downregulated (Additional file 4,8). The volcano map (Fig. 2a) and hierarchical clustering heat map (Fig. 2b) were constructed to illustrate the distribution of the differentially expressed miRNAs; some of the significantly differentially expressed miRNAs are marked on the volcano map. Several members of the let-7 miRNA family— let-7a-3p, let-7b-3p, let-7b-5p, let-7f-5p, and let-7g-3p—were more highly expressed in the FF group, whereas miR-675-3p and miR-675-5p were expressed at significantly lower levels (Table 1).
Table 1 Differential expression of the let-7 family and miR-675
GENE ID
|
TPM (PF)
|
TPM(FF)
|
Expression in FF
|
Fold change
|
Probability
|
let-7a-3p
|
415.00
|
3144.71
|
Upregulated
|
7.58
|
0.85
|
let-7b-3p
|
15.82
|
354.43
|
Upregulated
|
22.41
|
0.95
|
let-7b-5p
|
18.81
|
197.24
|
Upregulated
|
10.49
|
0.88
|
let-7d-3p
|
388.96
|
977.82
|
Upregulated
|
2.51
|
0.56
|
let-7f-1-3p
|
1005.86
|
1244.35
|
Upregulated
|
1.24
|
0.27
|
let-7f-5p
|
104.48
|
687.31
|
Upregulated
|
6.58
|
0.83
|
let-7g-3p
|
229.79
|
3285.48
|
Upregulated
|
14.30
|
0.94
|
let-7i-3p
|
1111.79
|
2918.72
|
Upregulated
|
2.63
|
0.57
|
miR-675-3p
|
9.78
|
0.26
|
Downregulated
|
38.12
|
0.88
|
miR-675-5p
|
2.69
|
0.10
|
Downregulated
|
26.03
|
0.82
|
A total of 114–164 million clean 150-bp paired-end reads were generated per sample. Principal Component Analysis (PCA) analysis showed that the samples in FF group distinguish from PF group. (Additional File 5). Between the FF and PF groups, there were 12,793 differentially expressed mRNAs (11,464 known mRNAs and 1,329 novel mRNAs), and 9,121 differentially expressed lncRNAs (8,417 known lncRNAs and 704 novel lncRNAs). The analysis parameters to determine differential expression with the DEGseq software were set to a fold change of ≥ 2 and a P-value of ≤0.001. We found 7,166 differentially expressed mRNAs (6,549 known mRNAs and 617 novel mRNAs) and 3,243 differentially expressed lncRNAs (3,009 known lncRNAs and 234 novel lncRNAs) that met these criteria (Additional File 4,9,10). Volcano maps illustrating the distribution of the differentially expressed mRNAs and lncRNAs are shown in Fig. 3.
To evaluate whether there was a difference in erythroid differentiation between NRBCs from preterm and full-term samples, the top 25 genes expressed by orthochromatic erythroblasts were selected as the reference gene set [10], and Spearman correlation coefficient analysis was performed on 6 samples, the results indicated that the correlation between replicates consistent with correlation of samples from different groups (Additional File 5).
Table 2 Differential expression of globin genes and related genes
Gene ID
|
FPKM
(PF)
|
FPKM
(FF)
|
Expression in FF
|
Fold change (FF/PF)
|
P-value
|
Q-value
|
HBB
|
81.20
|
225.95
|
Upregulated
|
2.78
|
0
|
0
|
HBD
|
3.95E-02
|
0.85
|
Upregulated
|
21.53
|
0
|
0
|
HBE1
|
0.16
|
6.20E-03
|
Downregulated
|
26.45
|
0
|
0
|
HBG2
|
646.38
|
413.42
|
Downregulated
|
1.56
|
0
|
0
|
BCL11A
|
2.05E-05
|
2.31E-04
|
Upregulated
|
11.26
|
7.53E-29
|
1.19E-28
|
ZBTB7A
|
6.04E-04
|
1.66E-03
|
Upregulated
|
2.75
|
0
|
0
|
KLF1
|
6.35E-02
|
9.06E-02
|
Upregulated
|
1.43
|
0
|
0
|
SOX6
|
2.57E-03
|
3.08E-03
|
Upregulated
|
1.19
|
9.83E-16
|
1.03E-15
|
GATA1
|
1.12E-02
|
1.52E-02
|
Upregulated
|
1.36
|
1.70E-33
|
3.01E-33
|
HBS1L
|
4.76E-03
|
6.69E-03
|
Upregulated
|
1.41
|
9.57E-81
|
3.76E-80
|
MYB
|
1.06E-04
|
5.11E-04
|
Upregulated
|
4.81
|
2.01E-35
|
3.74E-35
|
LIN28B
|
1.06E-05
|
1.83E-06
|
Downregulated
|
5.83
|
1.35E-02
|
5.73E-03
|
HMGB2
|
4.29E-03
|
5.05E-02
|
Upregulated
|
11.79
|
0
|
0
|
SIRT6
|
1.72E-04
|
6.03E-04
|
Upregulated
|
3.50
|
2.76E-16
|
2.97E-16
|
RUNX1
|
2.05E-05
|
4.61E-05
|
Upregulated
|
2.25
|
1.01E-03
|
5.09E-04
|
RUNX1T1
|
9.43E-06
|
4.03E-05
|
Upregulated
|
4.28
|
8.39E-07
|
5.62E-07
|
JAZF1
|
2.02E-03
|
6.20E-03
|
Upregulated
|
3.07
|
0
|
0
|
CA1
|
7.62E-02
|
7.45E-01
|
Upregulated
|
9.78
|
0
|
0
|
Among the differentially expressed transcripts, HBB (encoding β-globin) expression was 2.78-fold higher, HBG2 (encoding Gγ- globin) expression was 1.56-fold lower, HBG1 (encoding Aγ- globin) expression was 1.47-fold lower, and HBD (encoding δ-globin) expression was 21.52-fold higher in the FF samples than in the PF samples (Table 2). The genes for the transcription factors BCL11A, ZBTB7A, KLF1, and SOX6, which are known to be involved in γ-globin repression, were 11.26-, 2.75-, 1.43-, and 1.19-fold higher, respectively, in the FF compared with the PF samples. Expression of LIN28B, which is associated with high HbF levels [11], was 5.83-fold lower in the FF samples. In addition, we found that H19 was expressed at much lower levels (85.28-fold) in the FF samples, and RN7SK, SNHG3 were expression higher (Table 3).
Table 3 Differential expression of lncRNAs
Gene ID
|
FPKM
(PF)
|
FPKM
(FF)
|
Expression in FF
|
Fold change (FF/PF)
|
P-value
|
Q-value
|
DANCR
|
3.37E-04
|
1.30E-03
|
Upregulated
|
3.87
|
3.33E-24
|
4.64E-24
|
RN7SK
|
9.84
|
36.35
|
Upregulated
|
3.69
|
0
|
0
|
SNHG3
|
5.62E-04
|
2.65E-03
|
Upregulated
|
4.71
|
0
|
0
|
H19
|
2.06E-01
|
2.42E-03
|
Downregulated
|
85.28
|
0
|
0
|
SBF2-AS1
|
5.13E-03
|
4.32E-04
|
Downregulated
|
11.87
|
0
|
0
|
Comparative analysis with published datasets
Because the previous study on the transcriptomic changes in globin switching was based on erythroblasts cultured in vitro, in this study, we analyzed those derived in vivo. We compared our transcriptome results with previously published datasets i.e., one miRNA dataset was derived from Lessard et al. and three mRNA datasets were from Lessard et al., Huang et al., Xu et al. [12-14] These three datasets are comparisons of the transcriptomic differences of erythroblasts from fetal livers and adult peripheral blood or bone marrow. Although the samples had different sources, we found some overlap between these datasets (Table 4). For miRNA, we found that the let-7 family were upregulated in both miRNA datasets, and miR-675 were downregulated. For miRNA, the adult globin HBB, HBD, transcription factor BCL11A were upregulated in 4 mRNA datasets and embryonic globin HBE1, HBZ were downregulated in 4 mRNA datasets. In addition, LIN28B, IFG2BP1, and IGF2BP3 were downregulated in 3 out of 4 mRNA datasets. The results were drawn as Venn diagrams (Fig.4).
In addition to the above genes, we observed that CA1, JAZF1, GCNT2, FOXO1, FAS were all upregulated in 4 mRNA datasets, suggesting that these genes may be involved in globin switching, CA1 and JAZF1 were then detected using qPCR.
Table 4 Overlap of differentially expressed genes in all datasets
Categories
|
Gene ID
|
Upregulated miRNA
|
miR-3200-5p, let-7a-3p, let-7b-3p, let-7g-3p, let-7b-5p, let-7f-5p, miR-1246, miR-26a-1-3p, miR-27a-3p, miR-182-5p, miR-16-2-3p
|
Downregulated miRNA
|
miR-675-3p, miR-675-5p, miR-379-5p, miR-493-5p, miR-409-5p, miR-431-5p, miR-412-5p, miR-1185-5p, miR-377-3p
|
Upregulated mRNA
|
HBB, CA1, HBD, CD36, JAZF1, TACC1, GCNT2, SLFN13, TSPO, RAB6B, LRP1, GCH1, RNASE2, FAS, MMD, MFSD6, ACSL1, PIK3AP1, EPSTI1, SPTBN2, BCL11A, ATP7B, SNTB1, VEGFA, IMPA2, STX11, PLSCR4, PEAR1, AKR1C3, KCNN4, EPDR1, MAP3K5, SLC26A2, MSRB3, SLC25A15, HOOK1, OSM, FGFRL1, LCA5, STXBP1, ACSL5, TPM1, CAPS2, FOXO1, ABCA5, ICA1, HOMER2, OCIAD2, THRB
|
Downregulated mRNA
|
HBE1, HBZ, GPNMB, C11orf95, PTPN13, STAR, CPA3
|
Hub gene of differentially expressed transcription factors
To identify the differentially expressed genes with transcriptional regulatory functions, we selected all 595 transcription factors from the differentially expressed mRNAs, and a PPI network was constructed to understand the interaction among differentially expressed transcription factors in globin switching. In addition, we calculated the node degree of the PPI network, identified the top 30 hub genes, and constructed a network which was visualized by Cytoscape (Fig. 5). In the hub gene network, ZBTB7A, MTA2, and GATAD2A have been reported to be related to globin transformation [15], whereas other transcription factors have not yet been reported. The network also contains proto-oncogenes, such as MYC, JUN, FOS, tumor-suppressor gene TP53, and the epigenetic regulator TET2. Whether these genes are involved in globin regulation is not known at present.
Target mRNA of miRNA and lncRNA
We predicted the target genes of 79 differentially expressed miRNAs, with 4759 transcripts in total. Because miRNA and mRNA are negatively regulated, we found that 23 downregulated mRNAs were targeted by upregulated miRNAs, and 270 upregulated mRNAs were targeted by downregulated miRNAs (Additional file 11,12).
The function of lncRNA is mainly realized by cis- or trans-acting on target genes. Cis-regulation considering that lncRNA is related to genes in close genomic proximity, whereas trans-regulation is predicted by calculating binding energy. The predicted results showed that differentially expressed lncRNAs targeted 4037 transcripts, indicating the presence of 5932 lncRNA–mRNA pairs, including 2527 cis-acting pairs and 3405 trans-acting pairs.
GO and KEGG enrichment analysis
GO enrichment analysis of the 6,549 known differentially expressed mRNAs, with a corrected P-value of ≤0.05 as the threshold, yielded 967 terms for enriched biological processes (Fig. 6a), including DNA replication, DNA conformation changes, and RNA splicing. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis revealed that the differentially expressed mRNAs were enriched in 57 pathways (Q-value ≤ 0.05) related to the spliceosome, RNA transport, cell cycle, systemic lupus erythematosus (SLE), the Fanconi anemia pathway, DNA replication, and other critical regulatory processes (Fig. 6b).
We also performed GO enrichment analysis of the target genes of the differentially expressed lncRNAs. For biological processes, 457 GO terms were enriched, such as DNA conformation changes, RNA splicing, DNA replication, and RNA transport (Fig. 6c). KEGG pathway enrichment analysis of the differentially expressed lncRNA target genes revealed enrichment of 11 pathways (Q-value≤ 0.05), including the spliceosome, ubiquitin mediated proteolysis, SLE, the mRNA surveillance pathway, and chronic myeloid leukemia (Fig. 6d).
GO and KEGG enrichment analysis was also performed for the target genes of the miRNAs that were differentially expressed between the two groups. We identified 49 significant enrichment GO terms at the biological process level, including histone modification, covalent chromatin modification, negative regulation of protein phosphorylation, positive regulation of neuron projection development, cell growth, transforming growth factor beta receptor signaling pathway, protein methylation, stem cell differentiation, and Wnt signaling pathway. KEGG analysis revealed enrichment for involvement in two pathways (Q-value ≤ 0.05): morphine addiction and nicotine addiction (Additional File 6).
Quantitative polymerase chain reaction analysis
We collected 75 umbilical cord blood samples from which we isolated NRBCs to verify the differential expression of 14 mRNAs and lncRNAs by quantitative polymerase chain reaction (qPCR). Compared with the preterm group, the full-term group had higher mRNA expression levels of HBB, CA1, HMGB2, JAZF1, BCL11A, ZBTB7A (3.52-, 5.07-, 1.59-, 2.45-, 3.24-, and 1.45-fold, respectively). The expression of Fanconi anemia related genes such as FANCA, FANCC, FANCD2 was 1.36-, 1.35-, and 1.35-fold higher in the full-term group. The expression of H19 and LIN28B was 69.98- and 2.38-fold lower in the full-term group. The results were consistent with those of the RNA-Seq. We observed no significant differences in the expression of DANCR and SIRT6, but FANCF was lower in the full-term group. In addition, we found that BCL11A expression significantly differed based on biological sex. Cord blood from preterm male neonates had 4.25-fold higher expression than cord blood from preterm female neonates, and cord blood from full-term male neonates had 2.64-fold higher expression than cord blood from full-term female neonates, the other 13 genes showed no significant gender differences. (Fig. 7)
H19‑related lncRNA/miRNA/mRNA network
RNA-seq and qPCR results showed that H19 was a significantly differentially expressed lncRNA in premature and full-term NRBCs. Thus, we constructed the H19-related network, comprising 1 lncRNA,6 miRNAs, and 20 protein-coding genes, among which there are three globin genes (Fig.8). As a lncRNA, H19 possesses sponge function to absorb miRNA, such as let-7, and interact with mRNAs or proteins, such as ZBTB7A and IGF2BP1. We speculate that H19 may be involved in the regulation of globin via various pathways/mechanism.