The transcriptomes of RBCs and RBC-derived EVs were analyzed using next generation sequencing. For transcriptomic analysis, the mean amounts of uniquely aligned reads produced for the four RBC samples and four EV samples were 39.6x106 and 21.7x106, respectively (Supplementary Table S1). The RNAseq data quality was good as judged by body read coverage and principal component analyses (Supplementary Fig. S1A and S1B). In general, according to the correlation plots, diversity of transcriptomes among the EV group was higher than among the RBC group (Supplementary Fig. S2).
An abundant repertoire of expressed genes was found in all RBC samples (Supplementary Table S2 and S4). ‘RBC transcriptome’ used for subsequent annotations, as well as functional category enrichment and pathway analyses, consists of 2070 (40%) expressed genes that were shared among all 4 samples (Fig. 2A). Corresponding EV samples contained fewer expressed genes (Supplementary Tables S3 and S5) and were more diverse in the RNA content, as only 686 (15%) of the expressed genes were shared in all samples (Fig. 2B). Most of the transcripts (555 out of 686) in EVs, were also found in RBCs. In addition, most highly expressed genes were shared between RBC and EVs, such as well-known erythrocyte biology-related gene hemoglobin subunit beta (HBB, vast majority of transcripts), ferritin (FTL), hemogen (HEMGN), and 5'-aminolevulinate synthase 2 (ALAS2). The remaining 131 transcripts in EVs (highlighted in yellow in Supplementary Table S3) that were not found in RBCs comprise a probable contamination of RNA from EVs of other cells, mostly platelets, possibly also leukocytes and endothelial cells due to a small residue of plasma in the blood unit production process (maximum 20 ml/RBC unit). In western blot analysis, however, we did not detect platelet marker CD61 (Supplementary Fig. S3). Instead of contamination markers, RBC marker CD235a and Hb were clearly visible in Western blot indicating that most of the EVs are of RBC origin. The lack of leukocyte or platelet transcripts in the RBC data confirmed that the RBC transcriptome is of RBC origin.
Functional category and canonical pathway analyses reveal processes related to RBC maturation and function
Among the most enriched canonical pathways in RBCs were eukaryotic initiation factor 2 (eIF2) signaling and its regulation, mitochondrial dysfunction, protein ubiquitination pathway, oxidative phosphorylation, and mechanistic target of rapamycin (mTOR) signaling (Fig. 3A and Table 1, Supplementary Table S6). The same pathways were also enriched in EVs, of which eIF2 signaling and mitochondria -related pathways had even lower p-values than RBCs (Fig. 3B, Table 1, Supplementary Table S7). Different enrichment in pathways was also evident between RBCs and EVs, such as heme biosynthesis II, NRF2-mediated oxidative stress response- and hypoxia signaling -pathways in RBCs with very significant p-values, while two latter were not statistically significantly enriched in EVs.
The top shared molecular and cellular functions in RBC and EV transcriptomes were related to gene expression, protein synthesis, RNA modification, and cell death and survival (Table 1, Supplementary Table S8 and S9). Under ´physiological system development and function´ –category, ´hematological system development´ and ´hematopoiesis´ were among the top five most enriched categories as expected, with several erythroid cell -related subcategories being most significant (Table 1, Supplementary Table S8 and S9). Other obvious RBC-functions included synthesis and homeostasis of different RBC–related molecules, such as porphyrin, heme and iron (category ´small molecule biochemistry´, p-values 7.61E-08 - 7.3E-05 in RBCs) and free radical scavenging, which was clearly more significant category in EVs compared to RBCs (Table 1, Supplementary Table S8 and S9). Furthermore, the most significant subcategory under ´cellular function and maintenance´ was autophagy (p-value 1.26E-11 in RBCs), strongly related to reticulocyte maturation.
Suspected contamination present in the EV transcriptome is reflected by the presence of ´inflammatory response´ and ´immune cell trafficking´ -categories and different leukocyte–related subcategories in, e.g., ´hematological system development´ (Supplementary Table S9). Due to contamination the differential expression analysis between RBC and EV transcriptomes was not performed. However, it was still evident that more mitochondrial transcripts were present in the EV transcriptome, as judged by expression intensity and diversity (Supplementary Table S3). That is most likely why mitochondria-related canonical pathways had considerably more significant p-values in EV transcriptome compared to that of RBCs (Table 1).
Sequencing on the single-cell level reveals separate RBC subpopulations defined by differential expression of hemoglobin and the long non-coding RNA, MALAT1
To study the cellular heterogeneity among RBCs, we analyzed fresh and blood unit RBCs using the single-cell transcriptomics method from 10x Genomics (23) aiming at a capture of 2000 cells per sample. A total of 396 RBCs in blood unit and 173 RBCs in fresh blood samples were initially judged as cells based on the total number of detected transcripts. Total number of detected genes per sample was 1142 for blood units and 992 for fresh RBCs. As expected, the number of captured transcripts, detected as unique molecular identifiers (UMIs), varied greatly between individual cells (4-4201 for blood unit and 12-2258 for fresh RBCs, mRNA capture efficiency of our single-cell instrument being 10-15%), and the median number of genes detected per cell was low (5 for blood unit and 15 for fresh RBCs).
A total of 554 RBCs from the blood unit and fresh blood was used for final analysis. The genes with 5 or more total transcripts across the cells (158 in total, Supplementary Table S10) included 122 (77%) shared genes with bulk RBC transcriptome (Supplementary Fig. S5). Among these were genes known to be associated with RBC biology such as hemoglobin subunit beta (HBB), hemoglobin subunit alpha 1 and 2 (HBA1/HBA2), ferritin light chain (FTL) and glycophorin C (GYPC). This finding demonstrated the consistency between the RNA sequencing analysis methods used.
The RBCs were further analyzed to identify subpopulations. The cells from the blood unit and fresh blood samples clustered in an overlapping manner, supporting our decision to use combined analysis with merged data (Fig. 4A). Clustering of the RBCs based on their transcriptome profiles resulted in three distinct cell clusters (Fig. 4B). Cells in cluster 1 had the highest amount of detected total transcripts (AvgUMI=110), while in clusters 0 and 2 the total transcript counts were clearly lower (AvgUMI=60 and 65, respectively) (Fig. 4C and 4D). This could mean that cells in different clusters may represent RBCs, more specifically reticulocytes, at various maturation stages. The variation measured by the number of genes detected was highest in cluster 1 and 0, while the percentage of mitochondrial genes was highest in cluster 0 (Fig. 4D). Cells in cluster 1 were characterized by a high expression of RBC-specific genes such as HBB, HBA1, HBA2, FTL and GYPC (p<0.0001 for all) (Fig. 5A-C, Supplementary Table S11), suggesting that these cells most likely represent the population of young reticulocytes, while the cells in clusters 0 and 2 could represent more mature reticulocytes or early erythrocytes.
The cells in cluster 0 were significantly enriched with long non-coding RNA MALAT1 transcripts (p<0.0001) (Fig. 5A-C, Supplementary Table S11). Cluster 0 was also characterized by the low or absent expression of HBB, HBA1 and HBA2, and it was enriched in the expression of mitochondrial and ribosomal genes such as mitochondrially encoded cytochrome c oxidase I and II (MT-CO1 and MT-CO2), ribosomal protein S19 (RPS19) and ribosomal protein S18 (RPS18) (p<0.0001 for all, Fig. 5A, Supplementary Table S11). Additionally, few genes typically expressed in platelets, including beta-2-microglobulin (B2M), thymosin beta 4, X-linked (TMSB4X) and thymosin beta 10 (TMSB10), were detected in cluster 0 cells.
Cells in cluster 2 were characterized by very low total UMI count and expression of HBB, HBA2, and FTL genes (p<0.0001 for all, Fig. 5A, Supplementary Table S11). The single cell analysis also provided further evidence for the purity of the RBC population. In case of contamination with leukocytes, they would have formed a separate cluster characterized by relevant markers.
All reads for MALAT1 obtained from single-cell sequencing map to short transcript forms
According to current annotations in Ensembl Genome Browser, 16 shorter transcripts of MALAT1 are predicted in addition to the well-annotated full length (8708 bp) transcript MALAT1-202: http://www.ensembl.org/Homo_sapiens/Gene/Splice?db=core;g=ENSG00000251562;r=11:65497762-65506516
We used IGV with Ensembl gene annotations to visualize short reads mapping to the MALAT1 gene. Based on the alignment, the sequencing reads did not map to the 3´ end of the full length transcript MALAT1-202. Instead, they mapped to the three short transcripts MALAT1-203, MALAT1-211 and MALAT1-217 (Fig. 6). This finding agreed with the bulk RNAseq data. According to our knowledge this is the first report of the predicted splice site shared by the transcripts MALAT1-203, MALAT1-211 and MALAT1-217 as an exon 1 start site.
Characterization of CD71-enriched reticulocytes
With imaging flow cytometry we wanted to study the dynamics of RNA content in enriched reticulocytes. Double positive events for CD71 and RNA-binding dye thiazole orange (TO) were gated as CD71high, CD71medium and CD71low-negative populations based on intensity values on each channel and images. Of those cells, 35% expressed CD71 at high level, 33 % at medium level and 17% had low or negative expression (Fig. 7A). Although high CD71 expression generally correlated with high TO intensity, especially in CD71high population, there were also cells with high CD71 expression and low TO intensity, and vice versa (Fig. 7B and C) indicating high diversity in the amount of RNA in reticulocyte subpopulations. CD71-enriched cells had varying morphologies and concave shaped cells having high CD71 expression and TO intensity were also observed (Fig. 7C). Also RBC marker CD235a expression was detected with varying levels of intensity (Fig. 7C), which, however, did not correlate with CD71 expression or TO intensity (data not shown).
The bulk expression of MALAT1 lncRNA in enriched reticulocytes was confirmed with quantitative real time PCR (qPCR). As expected, the expression of HBB was high in CD71-enriched cells with the Cq values ranging from 15.7 to 20.2 (mean 17.5, SD 1.9), while MALAT1 had lower but clearly detectable expression with Cq values (mean 32.9, SD 2.2) (Fig. 7D).
We did not detect any contamination from other blood cell types, i.e., the expression of CD41a indicating platelets and CD45 for leukocytes was not detected (data not shown), which confirms that MALAT1 expression originated from reticulocytes.