Numerous studies conducted by our group as well as others in the last decade showed the potential of utilizing peripheral blood samples from malaria-infected patients to enable studies of in vivo gene expression profiles of malaria parasites including the deadliest species, P. falciparum[10,11,13,15]. For that, it was essential to develop methods for optimal extraction, amplification, and detection of RNA in these samples. Here, we compiled all methodological aspects for studies of in vivo transcriptomes of P. falciparum and support these by extensive laboratory-based optimizations in order to provide a comprehensive methodological “manual” for conducting transcriptomic studies of malaria parasites from clinical samples. The results supporting this pipeline come from our optimizations using in vitro laboratory cultures of P. falciparum (3D7) and a large set of clinical samples (n=710) (data not shown) from patients with uncomplicated malaria infections collected across Greater Mekong Subregion during the Tracking Resistance to Artemisinin Collaboration study (i.e. TRAC2) conducted between 2016 to 2018[17].
RNA Preservation, Extraction and Quantification of Malaria Blood Samples
The main prerequisite for studying genome-wide transcriptional profiles (transcriptomes) is the ability to obtain total RNA extracts that are representative of the exact abundance of all messenger as well as other RNA species in the cell. These RNA isolations must be of sufficient quality to support subsequent transcriptome-wide amplifications (TWA) and readout platforms thus allowing genuine measurements of RNA abundance in a particular cellular/biological system. The vast majority of P. falciparum RNA studies conducted in the past two decades utilized a guanidinium thiocyanate-phenol-based reagent, TRIzol™, for cell lysis and subsequent preservation of all nucleic acid molecules (DNA/RNA). Hence, we investigated several RNA extraction methods that provide the most optimal procedure for RNA isolation from P. falciparum TRIzol™-based cell lysates. These included the Direct-zol-96 RNA kit (Zymo) further referred to as “Direct-zol”; the RNeasy 96 Universal Tissue Kit (Qiagen) subsequently referred to as “RNeasy” and the previously described isopropanol precipitation method further referred to as “STANDARD”[15]. Starting with 500 µl of iRBCs with 2 % ring-stage (12-14 hours post-invasion (hpi)) P. falciparum infection, we lysed cells in 10 volumes (i.e. 5 ml) of TRIzol™ and supplemented it with 5 volumes (i.e. 1 ml) of chloroform. Aqueous (top) phase was mixed 1:1 (v/v) with 100% ethanol and mixtures were applied on silica columns as described in manufacturer’s manuals, or precipitated in ice-cold isopropanol as described previously for STANDARD extraction[15]. Direct-zol kit generated the highest RNA yield of 3.8 µg (SD 0.13, n=3) followed by the STANDARD isopropanol precipitation and RNeasy generating, 2.74 µg (SD 0.11, n=3) and 1.97 µg (SD 0.33, n=3), respectively, as measured by fluorescence-based assay (Fig 1a). Spectrophotometric estimation of residual protein impurities in the RNA eluents measured by the signal ratio between 260 nm and 280 nm wavelength (260/280) was >1.8 which indicated sufficiently high RNA purity. However, the 260/230 signal ratio, indicating potential organic solvent contamination (residuals from TRIzolTM), indicated the optimal performance of the Direct-zol method as the ratio values consistently reached ~2. In this regard, the Direct-zol method outperformed the RNeasy and STANDARD extraction techniques (Fig 1a). We also compared Direct-zol-based extraction in the presence or absence of chloroform by extracting total RNA from 500 µl of iRBCs with 2% ring-stage parasites (12-14 hpi, n=8). Indeed, the inclusion of chloroform in the Direct-zol method improves the RNA yield by around 20% maintaining the same RNA purity (Fig 1b). Finally, the Direct-zol was highly efficient in extracting P. falciparum RNA from samples with low blood volumes, (useful in high-throughput drug assays, see Discussion) yielding 130 ng of total RNA (SD 0.6, n=8) from as little as 8 µl of iRBCs with rings-stage parasites (12-14 hpi, 4% parasitemia, without chloroform supplementation). Taken together, these results indicate that the tested Direct-zol is a method of choice for high yield and superior quality RNA extractions from TRIzolTM lysates of P. falciparum-infected blood samples (and possibly other Plasmodium spp.).
An important factor for RNA extraction from blood samples is the overall yield of P. falciparum RNA from varying parasite loads in individual malaria infections. It is necessary to estimate what parasitemia levels can generate sufficient parasite RNA quantities (purity and intactness) for downstream transcriptomics analysis. To test this, we determined RNA yields from 500 µl of iRBCs infected with P. falciparum ring-stage parasites (12-14 hpi), serially diluted from 5% to 0.001% parasitemia (Fig 1c). Here we used Direct-zol as an extraction method and quantified the total RNA yield using a fluorescence-based assay. The RNA yield from 5 % ring stage samples was 7.9 µg (the number approaching maximum binding capacity of the column) with a subsequent drop to 0.84 µg from 0.1% iRBCs. From 0.05% to 0.001% the RNA yield readings leveled between 0.75-0.54 µg, which also reflected the total RNA yield from uninfected red blood cells (RBCs). These results suggest that Direct-zol can purify P. falciparum total RNA along with the host RNA residing in the iRBCs and uninfected RBCs present within the samples.
Since the tested Direct-zol extraction method uses TRIzol™ as a tissue homogenizing solution, we assessed its capacity for RNA preservation in lysates of iRBCs. This would eliminate the need for other (preserving) reagents, thus reducing steps in the extraction protocol. We compared two reagents: TRIzol™ (Invitrogen) and dedicated commercially available nucleic acid protective solution DNA/RNA Shield™ (ZYMO) further referred to as DNA/RNA Shield™. It is important to note that DNA/RNA Shield™ is fully compatible with TRIzol™ reagent and Direct-zol method, as stated in manufacturer’s description. In the first step, we wished to investigate the effect of temperature and duration for which iRBCs lysates can be stored in these reagents without any significant RNA degradation. For these experiments, we homogenized 50 µl of RBCs infected with 8% young trophozoite-stage parasites (20 hpi) either with 500 µl of DNA/RNA Shield™ or with 500 µl of TRIzol™. Samples were incubated for 1, 3, 7, and 15 days at 4 °C, 21 °C, and 37 °C. The RNA was subsequently purified with the Direct-zol method described above without chloroform supplementation. For an assessment of RNA degradation, we have applied the RNA Integrity Number (RIN) assay from Agilent Bioanalyzer[18]. RIN algorithm estimates RNA quality by measurements of 28S:18S rRNA ratio, as well as several other factors such as size of 5S rRNA, area under the electropherogram curve, the ratio of rRNA peak heights to marker peak height etc.[19] Overall, we found that both TRIzol™ and DNA/RNA Shield™ performed similarly well in preserving the RNA at 4 °C for up to 15 days as shown by RIN numbers approaching the maximum value of 10 (Fig 1d). RNA/DNA Shield™ preserved RNA better at 21 °C for up to 3 days but neither of the reagents could support RNA preservation past that. None of the tested preserving agents could protect P. falciparum RNA at 37 °C over one day (Fig 1d and Supplementary Fig S1). Taken together, our results suggest that both TRIzol™ and RNA/DNA Shield™ could protect P. falciparum RNA for up to 15 days at 4 oC (e. g. refrigerator) and no more than 3 days at 21 oC, before the lysates must be processed or transferred to -80 oC to store indefinitely. Given the equal performance of the two preserving reagents tested here, we selected TRIzol™ as lysis and preserving agent for subsequent experiments due to its cost-effectiveness, practicality, and application universality.
To test whether the derived RNA extraction method performs equally across a wide range of parasitemia, we prepared seven dilutions of the identical 5% iRBCs sample (rings, 12-14 hpi) to obtain a range of parasitemia from 5% to 0.005% and uninfected RBCs. Samples were extracted using Direct-zol method and equal amounts of total RNA (100 ng) were analyzed on Agilent Bioanalyzer (for details see Protocol in supplementary section). Electropherograms and the RIN values for each sample were obtained. As shown in Fig 2a, electropherograms of RNA isolated from high parasitemia samples (i.e. 5% to 1%) contained unique distinct peaks that presumably correspond to the P. falciparum 28S, 18S and 5S rRNA transcripts. At lower parasitemia (i.e. 0.5% to 0.05%), the fluorescence intensity (manifested as peak height on electropherograms) of the parasite-specific rRNA peaks progressively diminish. This is particularly evident for the P. falciparum and human 18S rRNA peaks that migrate approximately 1 second apart of each other but are fully distinguishable in the electropherograms (Fig 2a, right panels). At the lowest parasitemia tested (e.g. 0.01-0.005%), P. falciparum 18S rRNA peak approaches and subsequently “sinks” below the detection limit while the human 18S rRNA transcript dominates the electropherogram profile. As a result, RIN values for the RNA extracts from high and low parasitemia are higher, than those observed for medium parasitemia (i.e. 0.1-0.05%), likely due to the apparent dual peaks (Fig 2b). Taken together, Direct-zol RNA isolation method is applicable to a range of parasitemia for purifying high-quality RNA originating from both, P. falciparum parasites and the host erythrocytes simultaneously.
The presence of two distinctive peaks of the parasite and human 18S rRNA on the Agilent Bioanalyzer electropherogram gives the possibility to use these to estimate the iRBCs concentrations within the RBCs sample. Indeed, the ratio of 18S Plasmodium peak height to 18S human peak height (i.e. 18S-Pf/18S-Hs) correlates linearly with parasitemia for in vitro cultured P. falciparum parasites in which the erythrocytes have been purified from other blood components (Pearson Correlation Coefficient (PCC)=0.99, n=5) (Fig 2c, red circles). In clinical samples derived from the TRAC2 study, weaker correlations were observed between the 18S-Pf/18S-Hs ratio and parasitemia (PCC=0.35, Spearman Correlation Coefficient (SCC)=0.48, n=171) (Fig 2c, black triangles). This could be due to parasitemia not being the unique contributor to 18S-Pf/18S-Hs in the field samples. For example, the presence of host RNA from remaining white blood cells (WBCs) that were not completely removed from the sample could significantly alter the 18S-Pf/18S-Hs peak height ratio. Therefore, predicting levels of iRBCs in peripheral blood samples based on the 18S-Pf/18S-Hs ratio gives only limited confidence presumably due to broad variation in WBC depletion that is objectively achievable in varying conditions of field laboratories.
Whole Transcriptome Amplification from P. falciparum Blood-Stages Total RNA
Once total RNA has been extracted from iRBCs, gene expression profiling requires synthesis of complementary DNA (cDNA) through in vitro reverse transcription. Here, we utilized a modified version of the Switching Mechanism At the 5’ end of the RNA Transcript (SMART-seq2) method which allows for amplification of full-length cDNA molecules from single-cell or low RNA inputs of as little as few picograms[20]. First, we applied modified SMART-seq2 protocol to a range of RNA inputs starting from as low as 10 ng up to 250 ng of total RNA isolated from RBCs infected with 1 % or 0.1 % identical P. falciparum 3D7 cultures (12-14 hpi) (for details see Protocol in Supplementary). To assess the usefulness of the amplified product to study parasite gene expression we performed microarray hybridization with the amplified cDNA and compared these to a high-resolution P. falciparum 3D7 strain reference transcriptome generated from 24 time points collected across the IDC (Microarray GEO Acc. No GSE149865; RNA-Seq GEO Acc. No GSE150484). Indeed, transcriptomes generated with various RNA inputs from P. falciparum iRBCs with 1 % parasitemia maintained consistently high correlation with the 3D7 IDC reference transcriptome (PCC~ 0.8) (Fig 3a) and to each other (Fig 3b) for as low as 10ng of total RNA input. This result indicates that when applying Direct-zol extraction combined with SMART-seq2 amplification using P. falciparum iRBCs samples with parasitemia >=1%, as little as 10 ng of total RNA is sufficient to generate reliable transcriptome profiles. On the other hand, when analyzing samples with a parasitemia of 0.1%, a drop in correlations to 3D7 IDC reference transcriptome was observed even for total RNA inputs of 250ng (PCC~ 0.7). Increasing initial RNA input proved to be beneficial for low, but not high, parasitemia samples (Fig 3a). RNA input of 50 ng in 0.1% ring stage parasitemia proved to be a minimum for generating genuine transcriptome profiles (PCC >0.68) as inputs below 50 ng reduced correlation to reference 3D7 transcriptome (Fig 3a) and to higher RNA input samples (Fig 3b) as evidenced in a drop of PCC values. These results indicate that the main factor affecting transcriptome amplification is not the initial RNA input but the percentage of uninfected RBCs and thus human RNA material in the starting RNA sample.
To explore this in greater detail we generated transcriptomes of serially diluted ring-stage parasites with parasitemia ranging from 1% to 0.001% (12-14 hpi) using a consistent RNA input of 250 ng. After correlating them to the reference 3D7 IDC transcriptome we observed a severe reduction in PCC-based correlations and erroneous parasite age estimations for samples below 0.05% parasitemia (Fig 3c). This is possibly due to the overwhelming amount of human cDNA present in the total amplicon mix derived from low parasitemia samples that interferes with both amplification and microarray hybridization. Indeed, in the Agilent Bioanalyzer electropherogram of cDNA amplified from the 0.1% or lower parasitemia samples (the microarray target material) we particularly noticed two sharp cDNA peaks (around 700 bp) that we determined by capillary sequencing to represent amplified cDNA from human haemoglobin A (HBA) and haemoglobin B (HBB) transcripts (SupplementaryFig S2a). The HBA and HBB peaks are also observed in the cDNA originating from 1% parasitemia samples; however, these are accompanied by a broad cDNA peak ranging from 800 bp to 5 kb that represents the amplified P. falciparum cDNA products (Supplementary Fig S2b). In highly parasitized samples (e.g. 10%), the entire electropherogram is composed solely of P. falciparum cDNA in the broad peak between 800 bp to 5 kb, and no dominant HBA and HBB peaks are seen (Supplementary Fig S2c). Given that 0.1% parasitemia iRBCs can yield RNA extracts that genuinely reproduce the parasite transcriptome, we suggest that the presence of the human material (e.g. HBA and HBB transcript) does not interfere with microarray hybridization if enough parasite cDNA is available.
At the sequencing step, we have analyzed exonic x-coverage and parasite unique to human unique reads proportion across the two parasite dilutions of iRBCs. We have estimated that for ring-stage parasites (12-14 hpi) and high parasitemia (1 %), 24 samples can be safely multiplexed per lane of Illumina HiSeq4000 in order to obtain 5x106 unique P. falciparum reads (approx. 1.5 GB) per sample (Supplementary Fig S3). This generated high correlations to our reference transcriptome (PCC~ 0.8) during comparative transcriptomics analyses (Supplementary Fig S4). When multiplexing 24 samples with 1% parasitemia, we obtained close to 200x exonic coverage on average (Supplementary Fig S5). Decreasing parasitemia to 0.1% resulted in non-linear drop of parasite-specific reads and average exonic x-coverage (~75%). That, in turn, has caused lower correlation values to the reference transcriptome (PCC~0.6) (Supplementary Fig S4). Taking all these data into account and based on our experience with RNA-seq data derived from field samples (unpublished data), we have created our internal standards when performing comparative transcriptomics where we advise a safe margin of 3x106 parasite unique paired-end reads and 100x average x-coverage in order to generate reliable conclusions.
We did not see much sequencing reads mapping to intergenic regions suggesting negligible DNA presence (Fig S5). However for assays/studies very sensitive to gDNA contamination, or if leucocyte depletion was not initially performed, we have included a fast and simple additional DNase treatment step to the pipeline (See Supplementary Protocol).
High-Throughput Extraction of Total RNA Using Magnetic Beads
The next step in this endeavor was to derive a high throughput method that could facilitate RNA isolation on a large scale using robotic platforms. This was particularly applicable to our population transcriptomic effort during the TRAC2 project when we isolated RNA from more than 1000 peripheral blood samples. Hence, we explored the utility of RNA extraction using TRIzolTM-compatible magnetic bead-based purification using Direct-zol-96 MagBead RNA (ZYMO) further referred as “Direct-zol MagBeads”. We extracted RNA from 50 µl of iRBCs (4% parasitemia, 6 hpi, in 300 µl of TRIzolTM) using Direct-zol MagBeads method (n=3), or, Direct-zol method (n=3) described above, with or without chloroform supplementation. Compared to the column-based purification method, the magnetic bead purification yielded 14-19 % lower amounts of RNA from the same starting material (Supplementary Fig S6). Adding chloroform to the homogenate slightly improved yields in both methods. RNA purity and RINs analysis confirmed a comparable quality of the resulting RNA isolations with 260/280 and 260/230 ratios in acceptable range (Supplementary Fig S6) and RINDirect-zol=7.7 (SD 0.6) and RINDirect-zol MagBeads=7.5 (SD 0.8). Finally, the bead-isolated RNA supports transcriptomic analysis (microarray) that is comparable to the column based approach with PCC~0.77 to the 3D7 IDC reference dataset (Supplementary Fig S7). These results suggest that the application of the magnetic bead purification approach is a viable option for isolation of P. falciparum RNA.
Assessing the Efficiency of Described RNA Processing Pipeline for Studying Malaria Clinical Isolates
Development of robust, high-throughput transcriptome analyses employing microarray and/or RNA-seq opens up new opportunities to study molecular mechanisms underlying multiple parasite phenotypes such as drug resistance, fitness associated with diverse drug selection pressures, virulence, transmissibility etc. In this section, we demonstrate that our RNA processing and amplification pipeline works equally well for malaria clinical samples. First, we extracted RNA from ten selected clinical isolates obtained from the TRAC2 study (iRBCs homogenized in TRIzol™) and found a wide range of total RNA yields that are not completely proportional to the parasitemia of the samples (Fig 4a). This is due to a variable presence of human RNA (e.g. sample no. 7 and 8) or higher median age of the parasites in the peripheral circulation of the patient during blood collection (e.g. sample no. 2). Both of these factors affect the overall RNA yield as measured by fluorescence-based assay. Regardless, for all ten samples, the RIN values fell between 5.9 and 8.3 (median 7.6) (SupplementaryFig S8 and Fig S9). Sample number 1 to 5 show P. falciparum unique reads ranging between 30 % to 97 % versus unique human reads (Supplementary Fig S10). The lower parasitemia samples 6 to 10, showed less than 15% reads mapping to P. falciparum genome uniquely (Supplementary Fig S10 and Fig S11). Once again, the proportion of parasite reads is not tightly correlated to parasitemia of the samples possibly due to the presence of excessive human RNA or older parasites (Supplementary Fig S12). For example, sample number 3 with parasitemia over 60000 parasites/µl has low unique parasites reads proportion and very visible haemoglobin (Hb) transcripts bands seen on the gel indicating the significant presence of human material (SupplementaryFig S9, bottom panel, orange box). Additionally, no clear parasite-specific 18S rRNA peak is seen on the gel (Supplementary Fig S9, top panel, red arrows). All that would support the possibility of excessive human nucleic acids present in the sample.
For seven of these samples (No. 1-7) the resulting transcriptome obtained from RNA-seq platform correlated well with the respective time points in the 3D7 IDC transcriptome reference dataset with PCC > 0.7 to early ring stage (except sample 2 that appears to be older) (Fig 4b). The three remaining samples (No. 8-10) showed low correlation to the reference dataset which sets a practical threshold of parasitemia for the RNA-seq based transcriptomics in field studies (for the particular sequencing conditions used in this experiment) at approximately 8000 parasites/µl of whole blood when successful CF11 column WBCs depletion was performed. A higher threshold of minimum parasitemia for clinical isolates as compared to 3D7 in vitro culture may be due to incomplete WBCs depletion and younger parasite age in patient peripheral blood samples. Interestingly, the same samples processed using DNA microarray platform, yielded stronger correlations to the reference 3D7 IDC transcriptome including those with low parasitemia (samples 8-10) (Fig 4c). This indicates that DNA microarrays are less sensitive to the presence of contaminating host RNA and supports genuine transcriptomic measurements in clinical samples with as little as 4176 parasites/µl of whole blood (Sample no. 9), only slightly higher than our lab culture-derived microarray-based threshold of 0.05% (approx.. 2000-3000 parasites/µl).
Furthermore, after sequencing 193 clinical samples from the TRAC2 study (unpublished data), we have found that our 18S-Pf/18S-Hs ribosomal peak-height ratio can be used to estimate the proportion of unique Plasmodium reads to total reads with good accuracy before sequencing experiments (Fig 4d). Notably, we obtained a strong exponential correlation of 18S-Pf/18S-Hs peak ratio to the proportion of parasite unique reads (measured as the percent of total reads) (PCC=0.75, SCC=0.85, n=171). This provides a key parameter for estimation of the expected proportion of unique parasite reads in the overall RNA-seq dataset. Correlations of unique parasite read proportions to parasitemia (by microscopy) were much less pronounced (PCC=0.4, SCC=0.42), which further highlights the confounding factors for RNA analysis including the presence of human host material and IDC stage representations (see above) (SupplementaryFig S12). Based on the analysis of data obtained from these 171 TRAC2 samples we can conclude that in order to obtain 3 million Plasmodium unique reads (a safe threshold we established for reliable analysis of differential expression of the whole transcriptome) approximately 16% of the total reads must uniquely map to Plasmodium genome (for sequencing conditions used in this experiment) (Fig 4d, blue dotted line). In other words, when the 18S-Pf/18S-Hs ribosomal peak-height ratio value reaches approximately 0.4 (Fig 4d, red dotted line) we can estimate that at least 16% of total reads will map uniquely to Plasmodium. This peak ratio indicator should be applied when estimating how many samples per lane can be multiplexed in order to obtain enough parasite transcriptome coverage.
CRISPR-Cas9 assisted depletion of haemoglobin transcripts
Detailed inspection of our RNA-seq data revealed that the majority of the human mapped reads belong to the Hb gene family (majorly HBA1, HBA2 and HBB). Hence, it is feasible to speculate that suppression of the HBA1, HBA2, and HBB transcript amplification during TWA can significantly improve the overall coverage of the P. falciparum transcriptome, particularly in samples with a high content of host material. To do so, we treated our cDNA libraries with sequence-specific guide RNA (gRNA) and Cas9 nuclease to perform in vitro selective depletion of targeted Hb sequences. This method has been originally reported as Depletion of Abundant Sequences by Hybridization (DASH) but here, we modified the original DASH protocol to suit our needs[21]. Specifically, in order to deplete Hb transcripts, we performed pre-amplification PCR in two steps. The first-strand cDNA was amplified for 5 cycles using KAPA HiFi HotStart ready mix. Subsequently, the purified amplicon was treated with gRNA-Cas9 complex (1:1000 ratio of cDNA:gRNA) followed by another 14 cycles of PCR (Fig 5a). This modification allowed us to start with higher cDNA amount for gRNA-Cas9 treatment thus improving cDNA recovery after bead-based purification of the depleted amplicon.
To assess the performance of CRISPR-Cas9 assisted Hb transcript depletion, we treated both high (1% rings, 12-14hpi, n=3) and low (0.1% rings, 12-14hpi, n=3) parasitemia samples derived from identical parasite population and analyzed the corresponding transcriptomes by RNA-seq. We observed a significant reduction in the read count (shown as transcripts per million, TPM) for HBA1, HBA2 and HBB transcripts in CRISPR-Cas9 treated 1% and 0.1% parasitemia cDNA samples when compared to untreated amplified cDNA (Fig 5b). Our data revealed more than 70% depletion of Hb transcripts after gRNA-Cas9 treatment. The total number of parasite transcripts detection and parasite unique read count improved for the lower (0.1%) parasitemia samples (Fig 5c and Fig 5d) with 276 new P. falciparum transcripts being detected after Hb transcript depletion. These new transcripts belong to the low-abundance group with averaged median TPM of 1.36 (SD 0.75) (Supplementary Fig S13). The “missing transcripts” in treated 1 % parasitemia samples also belong to the low-abundance group with averaged median TPM of 1.8 (SD 0.91) (Supplementary Fig S13). Taken together, the above observations suggest that CRISPR-Cas9 depletion of unwanted transcripts would be most applicable to low parasitemia samples.