Study sample characteristics
The subjects for the study were recruited at Lwala hospital, Kaberamaido district, Eastern Uganda. The samples consisted of blood and cerebrospinal fluid (CSF) from patients with a microscopic diagnosis of T. b. rhodesiense parasites in the blood (early stage) and/or CSF (late stage). The presence of T. b. rhodesiense in the samples was confirmed by species-specific PCR of the serum resistance associated (SRA) gene; details of the infection characteristics can be found in Mulindwa et al. As controls, blood samples were obtained from uninfected healthy individuals from the same focus (Figure 1). Nine subjects were used for this analysis; note that one individual had both blood and CSF samples analysed (Table 1). These nine subjects were selected on the basis of good RNA yield and high parasitaemia, as previously described. The study subjects were all from the Kumam-speaking Luo ethnic group and comprised 4 females and 5 males aged 6–35 years. All the cases showed presence of T. b. rhodesiense parasites, with higher parasitaemia in blood (mean parasite count/ml, 3.2 × 10⁷) than in CSF (mean parasite count/ml, 3 × 10⁵). RNA was extracted from PAXgene blood, and rRNA and haemoglobin mRNA were depleted prior to cDNA library preparation. For CSF, RNA was extracted from the frozen cellular fractions in TRIzol and rRNA was depleted. For comparison purposes, we tried placing CSF in PAXgene tubes but did not succeed in recovering RNA. All samples were reverse transcribed and sequenced. We previously analysed the transcriptomes of the trypanosomes in the blood and CSF samples. Here, we have studied the transcriptomes of the human host, comparing them with blood RNA from three uninfected controls prepared in the same way as the HAT patient samples. The reads were aligned to the human reference genome build GRCh38 (Table 1) using TopHat.
The mean number of mapped reads per sample was 104 ± 71 (SD) million single-end reads.
Correlation of genome wide expression across samples
The mapped reads were normalized for sequencing depth and gene length to obtain Reads Per Kilobase per Million mapped reads (RPKM) values, which were used to assess sample sequence quality. The blood case and control samples had similar median values except for samples 81B, Control 2 and Control 3, and the CSF samples, except for 60C, also had similar median values (Figure S1A). Jensen-Shannon distances between samples showed that samples within the same category (CSF, blood cases or controls) had the least divergence between them, whereas the greatest distance was between the CSF and blood samples (Figure S1B). RPKM mean counts were evenly dispersed within each sample (Figure S2). Similarly, pairwise comparison of sample RPKM values showed a higher correlation (R > 0.8) between blood samples (cases and controls) than between blood and CSF samples (R < 0.7) (Figure S3), although significant genes were observed in all samples (Figure S4).
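The three sample-level quality measures above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: RPKM is taken as 10⁹ × C / (N × L) for C reads on a gene of length L bp in a sample with N mapped reads (N is approximated here by the summed gene counts); the Jensen-Shannon distance treats each sample's expression vector as a probability distribution and reports the square root of the base-2 divergence; and R is plain Pearson correlation. All function names are hypothetical.

```python
import math


def rpkm(counts, lengths_bp):
    """RPKM for one sample: 1e9 * C / (N * L).

    counts: reads per gene; lengths_bp: gene lengths in base pairs.
    Assumption: total mapped reads N is approximated by sum(counts).
    """
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def js_distance(p, q):
    """Jensen-Shannon distance (sqrt of base-2 JS divergence), in [0, 1]."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]  # normalize each sample to a distribution
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, mix):
        # Kullback-Leibler divergence in bits; zero entries contribute nothing.
        return sum(ai * math.log2(ai / mi) for ai, mi in zip(a, mix) if ai > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def pearson(x, y):
    """Pearson correlation coefficient R between two expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In practice, correlations are often computed on log-transformed RPKM values to reduce the influence of a few highly expressed genes; the sketch leaves that choice to the caller.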
Sample transcriptomes clustered by phenotype category
To determine the differences in gene expression in the circulating blood and CSF of patients that result from T. b. rhodesiense infection, we used DESeq2 to analyse the gene read count data output from Cuffdiff. Using the DESeq2 data normalized with the variance stabilizing transformation (VST), we determined sample stratification by principal component analysis (PCA) (Figure 2). The CSF and blood transcriptomes formed distinct clusters, with over 50% PC2 variance between them (Figure 2A). There was less variation (30%) between the blood cases and controls (Figure 2B, Figure S5A), with approximately 838 genes differentially expressed (padj < 0.05) between them (Figure S7Ai). A much larger variation (>70%) was observed between the stage 1 (blood) and stage 2 (CSF) individual transcriptomes (Figure 2C, Figure S5B), with approximately 4994 genes differentially expressed between them (Figure S7Aii). Downstream differential gene expression analysis was therefore carried out between stage 1 [blood] and uninfected controls (Figure 2B), and between stage 2 [CSF] and stage 1 [blood] (Figure 2C).
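How PCA assigns a percentage of variance to each component can be illustrated with a minimal pure-Python sketch. It assumes the input is an already variance-stabilized samples × genes matrix (DESeq2's VST itself is not reproduced here; any log-like transform could stand in for it), and it swaps a library SVD for power iteration on the sample Gram matrix. `pca_variance` is a hypothetical helper.

```python
import math


def pca_variance(samples, n_components=2, iters=1000):
    """PCA of a samples x genes matrix (rows = samples, already transformed).

    Returns (scores, percent): scores[c][i] is the coordinate of sample i on
    component c; percent[c] is the share of total variance (%) on component c.
    """
    n, g = len(samples), len(samples[0])
    # Centre each gene (column) across samples.
    means = [sum(row[j] for row in samples) / n for j in range(g)]
    x = [[row[j] - means[j] for j in range(g)] for row in samples]
    # Gram matrix X X^T (n x n); its eigenvalues are per-component sums of squares.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))
    scores, percent = [], []
    for _ in range(n_components):
        # Power iteration; a non-uniform start avoids the all-ones null vector.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break
            v = [wi / norm for wi in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = max(0.0, sum(v[i] * gram[i][k] * v[k] for i in range(n) for k in range(n)))
        scores.append([vi * math.sqrt(lam) for vi in v])
        percent.append(100.0 * lam / total if total else 0.0)
        # Deflate so the next pass finds the next component.
        gram = [[gram[i][k] - lam * v[i] * v[k] for k in range(n)] for i in range(n)]
    return scores, percent
```

With two well-separated toy groups, for example `pca_variance([[1, 1, 0], [1.2, 0.9, 0.1], [5, 5, 0], [5.1, 4.8, 0.2]])`, the first component captures almost all of the variance and places the two groups on opposite sides of zero.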
Enrichment of innate immune response transcripts during the early hemolymphatic stage of infection
To identify transcripts differentially expressed as a result of T. b. rhodesiense infection in the early hemolymphatic stage, we compared the blood transcriptomes of cases and controls, excluding sample 80B because of its low number of reads (Table 1). We identified genes differentially expressed with an absolute log2 fold change greater than 1.0 (|log2FC| > 1.0) using the apeglm estimator, which shrinks log2 fold-change estimates to reduce noise while preserving large differences (Figure S6A). From this dataset, we extracted significantly differentially expressed genes (adjusted p value, padj < 0.05) and annotated them using the Ensembl database (Additional Table S1). This identified 839 significant differentially expressed genes (DEGs), of which 55% (462/839) were protein-coding (Figure S7Ai). Of these protein-coding genes, 33% (154/462) were down-regulated (log2FC < -1) and 67% (308/462) were up-regulated (log2FC > 1) relative to the healthy controls. The DESeq2 rlog transformation of read counts (Figure S6B) was used to present these significant coding sequence (CDS) genes in a clustering heatmap using Euclidean distance with complete linkage (Figure S8). To determine which biological processes are most affected by T. b. rhodesiense infection, the CDS gene list was analysed for functional enrichment of biological process annotations using ToppCluster, with a Bonferroni-corrected p value cut-off of p < 0.05. This showed enrichment of genes annotated with immune response mechanisms, which accounted for 30% (139/462) of the CDS genes (Figure 3, Figure S7Bi, Table S2). We observed up-regulation (log2FC > 2.0) of classical complement pathway genes (C1QA, C1QB, C1QC, C3AR1, C4BPA, CR1); the C1q subunits initiate this pathway by binding antigen-antibody complexes, leading to formation of the C3 convertase.
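The filtering and up/down split described above can be sketched as follows. This is a minimal illustration assuming a hypothetical input of `(gene_id, log2FC, padj)` tuples, for example exported from a DESeq2 results table after apeglm shrinkage, with `None` standing in for the NA adjusted p values that DESeq2 reports for independently filtered genes. The helper names are hypothetical.

```python
def classify_degs(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split significant genes into up- and down-regulated lists.

    results: iterable of (gene_id, log2_fold_change, padj) tuples; padj may be
    None where DESeq2 reports NA.
    """
    up, down = [], []
    for gene, lfc, padj in results:
        if padj is None or padj >= padj_cutoff:
            continue  # not significant at the chosen FDR
        if lfc > lfc_cutoff:
            up.append(gene)
        elif lfc < -lfc_cutoff:
            down.append(gene)
    return up, down


def percent(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)
```

Applied to the counts reported above, `percent(308, 462)` gives 67 and `percent(154, 462)` gives 33.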
Furthermore, high levels of HLA-DRB5 (log2FC 3.0), involved in presenting peptides from extracellular proteins, and of immunoglobulin heavy chain variable transcripts (IGHVs, log2FC 3.0–6.0) were observed; these IGHVs are also involved in antigen presentation. Among cytokine-related transcripts, the pro-inflammatory TNF-α-induced proteins TNFAIP6 (log2FC 2.6) and TNFAIP8 (log2FC 5.3), involved in systemic inflammation, were up-regulated, as were interleukin IL21 (log2FC 3.7) and the IL1 receptor (log2FC 2.0), which respond to infection. Haptoglobin (HP, log2FC 3.1) was also up-regulated; it binds haemoglobin to form the HpHb complex, which is involved in innate immunity against Trypanosoma infection. Up-regulated surface markers included CD163, a macrophage marker and scavenger receptor for the haemoglobin-haptoglobin complex [30, 31], and CD177, involved in neutrophil activation. These results implicate innate immune response pathways during the hemolymphatic stage. Among KEGG pathways, the up-regulated genes were significantly enriched in the systemic lupus erythematosus (SLE) pathway (FDR 2.3E-31) (Figure S9). This suggests that the pathological immune response during T. b. rhodesiense infection could be related to that in SLE. A network analysis of the up-regulated genes indeed showed that they were involved in innate immune response pathways (Figure 5A). The key hubs, connected to multiple nodes in the network, were platelet factor 4 (PF4) and thrombospondin 1 (THBS1), which trigger innate immune responses and cellular motility/adhesion mechanisms, respectively [34, 35].
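The text does not state how hubs were scored, so the following is only a sketch under that caveat: it uses degree (the number of distinct interaction partners of each gene) as a simple hub criterion on an edge list. The edges in the example are hypothetical placeholders, not interactions from this study's network.

```python
def find_hubs(edges, min_degree=3):
    """Return nodes with at least min_degree distinct neighbours, highest degree first.

    edges: iterable of (node_a, node_b) pairs from an undirected network;
    self-loops and duplicate edges are ignored.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # a self-loop does not add a partner
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = {node: len(partners) for node, partners in neighbours.items()}
    hubs = [node for node, d in degree.items() if d >= min_degree]
    # Sort by descending degree, then by name for a stable order.
    return sorted(hubs, key=lambda node: (-degree[node], node))
```

Degree is the crudest centrality measure; network tools often also rank hubs by betweenness, which favours nodes that bridge otherwise separate modules.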
Anti-inflammation and neuro-activation during second stage CNS infection
Circulating activated lymphocytes migrate from the venous blood across the blood-brain barrier or choroid plexus into the cerebrospinal fluid. Therefore, to identify genes differentially expressed in blood and CSF lymphocytes during active T. b. rhodesiense infection, and possibly to identify mechanisms that distinguish the early and late stages of the disease, we compared the stage 1 blood (Table 1, HB73, HB71, HB81) and stage 2 CSF (Table 1, HC57, HC60, HC71) transcriptomes (Figure 2C, Table S3). CSF could not be obtained from uninfected people for ethical reasons; we chose to compare with stage 1 rather than stage 2 blood because we wanted to identify candidate biomarkers in blood that could be used to diagnose CSF infection. In total, 4234 genes were differentially expressed at padj < 0.05 (Table S3; Figure 4 shows the 1808 genes at padj < 0.005). Of these, 52% (2232/4234) were up-regulated (log2FC > 1) and 48% (2002/4234) down-regulated (log2FC < -1). There were over nine times more significant (padj < 0.05) CDS genes in the stage 2 [CSF] versus stage 1 [blood] comparison than in the stage 1 [blood] versus control comparison (Figure S7A). Functional analysis of these DEGs showed enrichment mainly for the biological processes of cellular organization, morphogenesis, motility and signalling (Figure S7Bii, Table S4). To determine which pathways were enriched among the up-regulated genes, we queried the innate immune response database. We observed high enrichment (p value < 8.0E-4) of gene clusters in Reactome pathways, namely neuronal system, retinoid metabolism, diseases associated with visual transduction, visual phototransduction and neurotransmitter receptor binding (Figure S10). Genes within these clusters, including ADCY2 and AKAP, have been associated with bipolar disorder and schizophrenia [37, 38].
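ToppCluster and the innate immune response database compute their own enrichment statistics, and the exact tests are not given here. As a hedged illustration of the general idea only, over-representation of a pathway among DEGs is often scored with a one-sided hypergeometric test, and multiple pathways can then be corrected with Bonferroni, as used for the ToppCluster analysis above. The function names and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_p(k, n_degs, pathway_size, universe):
    """One-sided hypergeometric p value P(X >= k).

    n_degs genes are drawn from a universe of `universe` genes, of which
    `pathway_size` belong to the pathway; k DEGs fall in the pathway.
    """
    upper = min(n_degs, pathway_size)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing.
    hits = sum(comb(pathway_size, i) * comb(universe - pathway_size, n_degs - i)
               for i in range(k, upper + 1))
    return hits / comb(universe, n_degs)


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, drawing 2 DEGs from a 10-gene universe containing a 5-gene pathway, the chance that both fall in the pathway is `hypergeom_upper_p(2, 2, 5, 10)`, which equals 10/45.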
Furthermore, genes encoding gamma-aminobutyric acid (GABA) receptor subunits (GABRA2, GABRB) have been implicated in sleep disorders, as has ApoB. IL-10, which has previously been observed to be up-regulated in the CSF of T. b. rhodesiense patients [41, 42], was indeed more highly expressed in the CSF (log2FC 3.1), as were other cytokine and cytokine receptor genes including IL12, IL17RD, IL20RA, IL21R, IL36B and IL32.
A network analysis of the up-regulated genes revealed a number of core signalling molecules and transcription factors involved in brain function (Figure 5B). A key up-regulated factor in the network was FOXP3, which plays a fundamental role in the development and function of regulatory T cells (Treg, FOXP3+CD4+) and in cellular proliferation and migration [43–45]. The CD4 receptor, which interacts with MHC class II molecules presenting peptides derived from extracellular pathogens (in this case, the trypanosomes in the CSF), was up-regulated (log2FC 2.7), as was IL10 (log2FC 3.1), a key anti-inflammatory cytokine linked to CD4+ T helper cells. The presence of FOXP3 and CD4 suggests elevated Tregs, consistent with an inflamed central nervous system. Furthermore, up-regulation of the chemokine receptor CXCR3 (log2FC 4.1) was indicative of CNS disease.
Peripheral blood signatures for CNS infection
We next examined the genes shared between the DEG sets for stage 1 and stage 2 individuals, in order to identify genes co-expressed in both blood and CSF (Figure 6A). A total of 184 genes were significantly differentially expressed (padj < 0.05) in both the blood (stage 1 vs controls) and CSF (stage 2 [CSF] vs stage 1 [blood]) comparisons (Table S5). Over 90% of these genes increased in blood and decreased in CSF, suggesting that they may play opposing roles during the course of infection. However, following hierarchical clustering, we identified 6 genes (C1QC, SOX5, METTL7A, SLCO4A1, MARCO and IGHD3-10) that were increased in both the blood and CSF of stage 2 patients (Figure 6B). Furthermore, C1QC, MARCO and IGHD3-10 were increased more than 5-fold in the blood of stage 1 patients (Figure 6C). If the corresponding polypeptides are similarly increased, they might in future be considered as possible diagnostic markers for CNS invasion. C1QC forms part of complement component 1q, a component of the innate immune system; MARCO is a scavenger receptor found on macrophages and involved in phagocytosis of pathogens; and IGHD3-10 encodes an immunoglobulin heavy chain diversity segment of the antigen receptors expressed by B cells, a major component of the adaptive immune response.
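The intersection step and the fold-change threshold can be sketched as below. This is a minimal illustration assuming each comparison has already been reduced to a dictionary of significant genes mapped to their log2FC, with consistent gene identifiers in both; the function names and the toy values are hypothetical. A 5-fold increase corresponds to log2FC > log2(5), roughly 2.32.

```python
import math

# A 5-fold change corresponds to log2FC > log2(5), roughly 2.32.
FIVE_FOLD_LOG2 = math.log2(5)


def compare_deg_sets(blood, csf):
    """Intersect two DEG sets and classify the shared genes by direction.

    blood: gene -> log2FC for significant genes, stage 1 vs controls.
    csf:   gene -> log2FC for significant genes, stage 2 [CSF] vs stage 1 [blood].
    Returns (shared, up_in_both, up_blood_down_csf), each sorted by gene id.
    """
    shared = sorted(blood.keys() & csf.keys())
    up_in_both = [g for g in shared if blood[g] > 0 and csf[g] > 0]
    up_blood_down_csf = [g for g in shared if blood[g] > 0 > csf[g]]
    return shared, up_in_both, up_blood_down_csf


def above_five_fold(log2fc):
    """True if a log2 fold change exceeds a 5-fold increase."""
    return log2fc > FIVE_FOLD_LOG2
```

Sorting the shared genes gives a deterministic order, which makes the output easy to compare against a table of intersecting genes.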