Enrichment of bradyzoite cysts decreases host protein interference. T. gondii bradyzoite cysts were enriched from homogenized brains of host mice using dextran density centrifugation [9] for in vivo RNA-seq and proteomic analyses (Fig. 1A). Each replicate was a pool of several mouse brains (ranging from 3 to 13 mice per group, Fig. 1B). Bradyzoites were released from cysts by pepsin digestion. Approximately two-thirds of the parasites from each replicate were processed for protein analysis and one-third for RNA (Fig. 1C). Parasite burden was highest at 28 and 90 DPI and lowest at the earliest (21 DPI) and latest (150 DPI) timepoints (Fig. 1B). As the chronic infection progresses, mouse survival decreases [10], which limited the number of mice harvested at the latest timepoints. With few mice and fewer parasites per brain, the 150 DPI timepoint yielded the fewest total parasites (Fig. 1C).
Protein samples were analyzed by bottom-up LC-MS/MS to identify parasite and host proteins. The total number of unique peptides ranged from 11,548 to 24,833 (Additional File 1). Dextran purification of the infected brains increased the number of peptides that mapped to the T. gondii genome for many of the samples. Without dextran purification, 3.4% of the peptides from infected brain tissue mapped to T. gondii (black bar, Fig. 2A), whereas in many purified samples more than 25% of the peptides mapped to T. gondii. Notably, the 21 DPI samples had nearly the same T. gondii mapping percentage as unpurified tissue, possibly because the bradyzoite cysts were not mature enough to sediment under the dextran. The number of unique T. gondii peptides ranged from 462 at 21 DPI to 8,163 at 28 DPI and correlated with the number of bradyzoites in the sample (Fig. 2B). These peptides mapped to 1,683 unique T. gondii proteins (Additional File 2). For further analysis, we chose the two samples with the highest T. gondii peptide count and enrichment from each of the middle timepoints (28, 90 and 120 DPI; Fig. 2A arrows). Within these samples, we required that peptides be identified in both replicates of each timepoint. This resulted in 6,528 peptides identified at 28 DPI, 3,617 at 90 DPI and 3,486 at 120 DPI, with 2,040 peptides common to all three timepoints (Fig. 2C). The identified peptides represent 870 proteins at 28 DPI, 504 proteins at 90 DPI, and 502 at 120 DPI, for a total of 893 unique proteins, of which 366 were common to all timepoints (Fig. 2D, Additional File 3).
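The replicate filter and the cross-timepoint overlap described above reduce to set intersections. The following is a minimal sketch, assuming each replicate is represented as a set of peptide sequences; the function name `reproducible_peptides` and the toy peptide strings are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List, Set


def reproducible_peptides(replicates: List[Set[str]]) -> Set[str]:
    """Keep only peptides identified in every replicate of one timepoint."""
    if not replicates:
        return set()
    return set.intersection(*replicates)


# Toy data (hypothetical peptide sequences), two replicates per timepoint.
samples: Dict[int, List[Set[str]]] = {
    28: [{"AAK", "LLR", "GGK", "TTR"}, {"AAK", "LLR", "GGK"}],
    90: [{"AAK", "LLR", "VVK"}, {"AAK", "LLR"}],
    120: [{"AAK", "GGK", "LLR"}, {"AAK", "LLR", "PPK"}],
}

# Peptides seen in both replicates, per timepoint (DPI).
per_timepoint = {dpi: reproducible_peptides(reps) for dpi, reps in samples.items()}

# Peptides common to all timepoints (the center of a three-way Venn diagram).
common = set.intersection(*per_timepoint.values())
```

The same two steps apply unchanged at the protein level once each peptide has been replaced by the protein it maps to.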
StringTie program predicts novel T. gondii bradyzoite transcripts. RNA from the six samples with the highest percentage of T. gondii peptides (Fig. 2A arrows) was processed for Illumina sequencing. Samples were multiplexed, and sequencing yielded over 265 million total reads, averaging 44 million reads per sample (Table 1). These samples were highly enriched for T. gondii, as 52-94% of the reads aligned to the T. gondii genome (21-41 million T. gondii reads per sample). This coverage is comparable to that of tissue culture tachyzoites, for which 45-83% of reads aligned to T. gondii (approximately 20 million total reads), and is 265-fold greater than the number of reads acquired from infected whole brain tissue [6], for which only 0.1% of reads mapped to T. gondii.
To obtain higher transcript resolution for the bradyzoite datasets, we used the StringTie program [11] to predict novel transcripts (Fig. 3). The alignment files of all datasets (Table 1) were used to update the original T. gondii annotation, which changed the number of predicted genes from 8,920 to 8,805 and the number of predicted transcripts from 8,920 to 10,668. This updated annotation was used to remap the reads. Subsequently, gene (Additional File 4) and transcript (Additional File 5) abundances were quantified using RSEM and DESeq2 [12, 13].
The principal component analysis (PCA) calculated from the normalized DESeq2 values shows distinct clustering of samples by experimental group (Fig. 4A). Tissue culture tachyzoites group on the far right of the X-axis, while the purified bradyzoites group on the left. Whole brain tissue lies in the middle, with acutely infected samples skewed toward the tachyzoites and chronically infected samples skewed toward the bradyzoites. All timepoints show a similar number of highly expressed transcripts, with a normalized expression value (transcripts per million; TPM) >50, as well as of moderately expressed transcripts (TPM 11-50). The overall number of low-expression transcripts (TPM <10) is similar only between tachyzoites and purified bradyzoites (Fig. 4B). The lower depth of T. gondii reads in the unpurified brain samples likely contributed to the increase in non-expressed transcripts and thus to a shift away from the purified 28-day bradyzoite samples on the PCA plot.
Most identified proteins had high gene expression. We next analyzed normalized expression calculated from RSEM (TPM values). The ‘gene abundance’ output of RSEM is aggregated from all transcripts that map to each annotated gene. We focused on the 366 proteins identified by mass spectrometry that were common to all timepoints (Additional File 3). Of the 366 proteins, 266 were highly expressed in the purified and whole brain chronic infection samples, with an average TPM >100 (Fig. 5). Of these, 89 genes were >2-fold higher during chronic infection relative to acute infection (Fig. 5, Group I, average of all chronic TPM values relative to tachyzoite and acute whole brain samples). Genes in this subset include known bradyzoite markers such as BAG1, ENO1 and LDH2. Approximately half of the genes are highly expressed during both tachyzoite and bradyzoite stages (Fig. 5, Group II, >100 average TPM value and <2-fold difference between stages) and include many housekeeping genes, such as those for tubulin, actin, and GAPDH. Interestingly, 100 proteins that were identified in all 6 bradyzoite samples had low gene expression (Fig. 5, Group III, <100 TPM), and 25 of these genes were expressed at higher levels during acute infection (Fig. 5, Group IV).
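The grouping in Fig. 5 can be read as a small decision rule on two numbers per gene: the average chronic TPM and the average tachyzoite/acute TPM. The sketch below is illustrative only. The function name `classify_gene`, the pseudocount, and the handling of edge cases are assumptions: highly expressed genes that are not >2-fold higher in chronic infection are all placed in Group II, any low-expression gene with higher acute than chronic TPM is placed in Group IV, and Group IV is reported separately although the text counts it within Group III.

```python
def classify_gene(chronic_tpm: float, acute_tpm: float,
                  tpm_cutoff: float = 100.0, fold_cutoff: float = 2.0) -> str:
    """Assign a gene to Group I-IV from average chronic and acute-stage TPM.

    Thresholds follow the text (>100 TPM, >2-fold); tie-breaking and the
    Group IV criterion are assumptions of this sketch.
    """
    pseudo = 1e-9  # assumption: tiny pseudocount to avoid division by zero
    fold = (chronic_tpm + pseudo) / (acute_tpm + pseudo)
    if chronic_tpm > tpm_cutoff:
        # Highly expressed: bradyzoite-enriched (I) or shared between stages (II).
        return "I" if fold > fold_cutoff else "II"
    # Low expression: higher in acute infection (IV) or otherwise (III).
    return "IV" if fold < 1.0 else "III"


# Hypothetical TPM values, one example per group.
examples = {
    "bradyzoite_enriched": classify_gene(500.0, 100.0),
    "shared_high": classify_gene(500.0, 400.0),
    "low": classify_gene(50.0, 20.0),
    "low_acute_higher": classify_gene(50.0, 200.0),
}
```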
Analyzing purified bradyzoites finds more transcript changes. As described above, we used the StringTie program to predict novel transcripts (Fig. 3). Using this updated annotation, we remapped the sequencing files from our previous acute and chronic infection whole brain datasets to analyze differential expression with DESeq2. With the new annotation, we identified 643 transcripts that changed in abundance between acute and chronic whole brain tissues using a q-value (false discovery rate) threshold <0.05 and a 2-fold cut-off (Fig. 6A), compared to the 547 genes previously identified [6]. As mentioned, the dextran purification allowed greater depth of T. gondii sequencing; thus, a direct comparison of T. gondii genes from whole brain tissue at 28 DPI with the purified bradyzoites from any of the 28, 90 or 120 DPI timepoints resulted in approximately 2,000 differentially transcribed genes (Fig. 6A, tan colored boxes). Similarly, comparing each of the three purified brain bradyzoite timepoints to tissue culture tachyzoites showed that approximately 4,000 of the transcripts changed during chronic infection (Fig. 6A, purple colored boxes).
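The significance filter used for these comparisons (q-value <0.05 and a 2-fold cut-off) can be expressed as a short predicate over a results table. This is a minimal sketch, assuming each row carries a log2 fold change and an adjusted p-value, as DESeq2 results tables do; the `Result` class, the function name, the toy rows, and the choice to treat exactly 2-fold as passing are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    transcript: str
    log2_fold_change: float
    padj: Optional[float]  # None stands in for an NA adjusted p-value


def differentially_expressed(results: List[Result], q_max: float = 0.05,
                             min_fold: float = 2.0) -> List[Result]:
    """Keep rows with adjusted p-value < q_max and |fold change| >= min_fold."""
    min_log2 = math.log2(min_fold)  # a 2-fold change is 1.0 on the log2 scale
    return [
        r for r in results
        if r.padj is not None
        and r.padj < q_max
        and abs(r.log2_fold_change) >= min_log2
    ]


# Toy rows (hypothetical transcripts and values).
rows = [
    Result("t1", 1.5, 0.01),    # passes both thresholds
    Result("t2", -2.0, 0.001),  # passes; decreases count as changes
    Result("t3", 0.5, 0.001),   # significant but below 2-fold
    Result("t4", 3.0, 0.2),     # large change but not significant
    Result("t5", 1.2, None),    # no adjusted p-value
]
hits = [r.transcript for r in differentially_expressed(rows)]
```

Filtering on the absolute log2 fold change keeps increases and decreases symmetric, which matches how both directions are counted in Fig. 6.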
T. gondii transcription remains static after 3 months post-infection. Comparing purified bradyzoites throughout chronic infection, 59 transcripts (from 50 genes) change at 90 DPI and 109 transcripts (from 93 genes) change by 120 DPI relative to 28 DPI (Fig. 6A, Additional Files 6 and 7). No transcripts change significantly between 90 and 120 DPI. Of the differentially expressed transcripts, 51 were differentially expressed at both 90 and 120 DPI: 15 increasing in abundance and 36 decreasing (Fig. 6B). Only 8 transcripts change specifically at 90 DPI, whereas 58 transcripts are specific to 120 DPI: 18 increasing and 40 decreasing in expression. Thirty-three of the differentially expressed transcripts have an average TPM value <10 across all purified bradyzoite timepoints. This group of 33 includes the transcript for SAG1, which had an average TPM value of 16 at 28 DPI, decreasing 3.5-fold and 4.8-fold at 90 and 120 DPI, respectively. This low level of bradyzoite expression contrasts with average TPM values of 8,000 and 13,000 for tachyzoites and acute whole brain tissue, respectively.
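The transcript counts in this paragraph partition as a two-set Venn diagram, and the arithmetic can be checked directly. The short sketch below uses only the counts reported in the text; the variable names are illustrative, and the union is a derived value rather than a reported one.

```python
# Counts reported in the text, relative to 28 DPI.
changed_90 = 59    # transcripts changing at 90 DPI
changed_120 = 109  # transcripts changing at 120 DPI
shared = 51        # transcripts changing at both timepoints

only_90 = changed_90 - shared    # specific to 90 DPI
only_120 = changed_120 - shared  # specific to 120 DPI

# Direction of change within each region of the diagram.
shared_up, shared_down = 15, 36
only_120_up, only_120_down = 18, 40

# Derived here, not reported in the text: transcripts changing at either timepoint.
union = shared + only_90 + only_120
```

With these inputs, `only_90` and `only_120` evaluate to 8 and 58, matching the timepoint-specific counts given above.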
One of the largest transcript changes during chronic infection was for the gene sporoAMA1 [14]. Interestingly, while reads for most transcripts map to the entirety of the predicted full gene, as for conventional AMA1 (TgME49_255260, Fig. 7A), reads for sporoAMA1 mapped only to the 3’ region of the gene, and almost no reads mapped to the 5’ region (Fig. 7B). Coverage for all bradyzoite timepoints starts in the predicted intron region between exons 6 and 7. Compared to 28 DPI, both 90 and 120 DPI had an approximately 7.0-fold increase in transcript abundance. Furthermore, RNA-seq coverage from T. gondii at 7 DPI in the intestines of cats [15] shows a similar coverage profile for sporoAMA1, except that coverage starts at the predicted intron region between exons 4 and 5. These isoforms of sporoAMA1 are shorter than those seen in sporulated oocysts [16], which include all 9 exons (Additional File 8). This suggests that T. gondii has stage-specific alternative isoforms of sporoAMA1 that are expressed only during late-chronic infection or during the sexual stages in cats.