This study employed the WGCNA method [17] to construct a weighted gene co-expression network for T. brucei from RNA-Seq data. Analysis of the resulting network allowed identification of modules (gene clusters), and enrichment analysis against the GO [15] and KEGG [16] annotation databases associated the modules with their functions. Highly connected genes within a module, known as intra-modular hub genes [17], were also determined, as they are key drivers of a molecular process or act as representatives of the predominant biological function of the module. Here, we demonstrate the usefulness of the network for functional genomic analysis, using modules enriched for cell cycle and protein biosynthesis functions as examples.
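The notion of an intra-modular hub gene can be illustrated with a minimal sketch. It assumes an unsigned adjacency of the form |cor|^beta and takes the hub to be the gene with the highest summed within-module connectivity. The function names, the toy inputs and the value beta = 6 are illustrative assumptions, not the settings or code used in this study.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency |cor|^beta.

    expr: samples x genes expression matrix.
    beta: soft-thresholding power (6 is an illustrative value only).
    """
    cor = np.corrcoef(expr, rowvar=False)   # gene-by-gene Pearson correlation
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)              # ignore self-connections
    return adj

def intramodular_hub(adj, gene_ids, module_labels, module):
    """Return (hub gene id, within-module connectivity vector) for one module."""
    idx = [i for i, m in enumerate(module_labels) if m == module]
    sub = adj[np.ix_(idx, idx)]             # restrict adjacency to module members
    k_within = sub.sum(axis=1)              # intramodular connectivity per gene
    best = int(np.argmax(k_within))
    return gene_ids[idx[best]], k_within

# Toy usage with hypothetical gene ids and module labels
toy_adj = np.array([[0.0, 0.9, 0.8, 0.1],
                    [0.9, 0.0, 0.2, 0.1],
                    [0.8, 0.2, 0.0, 0.1],
                    [0.1, 0.1, 0.1, 0.0]])
toy_genes = ["g1", "g2", "g3", "g4"]
toy_modules = ["black", "black", "black", "red"]
hub, k = intramodular_hub(toy_adj, toy_genes, toy_modules, "black")
```

In this toy network the gene with the largest summed adjacency to the other "black" genes is reported as the hub. A real analysis would compute the adjacency from the full expression matrix and use the module assignments produced by the clustering step.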
The cell cycle in eukaryotes comprises four phases, namely G0/G1, S, G2, and M [36]. The cell prepares for division in the first gap phase (G0/G1), replicates its DNA during the S phase, passes through the second gap phase (G2), and then undergoes mitosis (M). In T. brucei, the cell cycle is tightly regulated to ensure that single-copy organelles and structures such as the Golgi body, mitochondrion, kinetoplast, nucleus, basal body, and flagellum are duplicated, maintained at precise positions in the cell, and segregated accurately [37]. Various GO terms related to organelles were over-represented in the black module (Figs. 2 and 3), including microbody, peroxisome, glycosome, and acidocalcisome (Additional file 10: Figure S4). The organelles duplicate in the first gap phase (G0/G1) [38]. This suggests that genes assigned to the black module (Figs. 2 and 3) could play a role in the cell cycle, particularly during the G0/G1 phase. Furthermore, some cyclins and cdc2-related kinases (CRKs) that are key regulators of the cell cycle, such as CYC2 (Tb927.11.14080), CYC5 (Tb927.10.11440), and CRK10 (Tb927.3.4670), were assigned to the black module (Additional file 6). CYC2 and CYC5 regulate the transition from G1 phase to S phase [39]. The co-expression of CRK10, whose regulatory role is presently unknown, with CYC2 and CYC5, together with its demonstrated interaction with CYC2 in a yeast two-hybrid assay [39], suggests a possible role in the G1 to S phase transition.
The hub gene for the black module was identified as adenine phosphoribosyltransferase (APRT) (Table 2), which plays a crucial role in the purine salvage pathway in T. brucei. This parasite lacks a de novo purine biosynthetic pathway [40]. Purine nucleotides are precursors of DNA and RNA and are also constituents of second messengers in signaling pathways, such as cyclic AMP [41]. In this regard, APRT may be important in enriched module functions such as cyclic nucleotide biosynthesis and synthesis of structural constituents of the ribosome, particularly ribosomal RNA, and consequently in signaling and protein biosynthesis. Signaling is depicted by the black module’s over-represented GO terms such as adenylate cyclase activity, while protein biosynthesis is depicted by GO terms such as translation, unfolded protein binding, protein folding, and structural constituent of ribosome (Additional file 10: Figure S4).
The red module (Figs. 2 and 3) was functionally enriched for GO terms such as DNA replication and chromosome organization, and for the KEGG pathway term homologous recombination, indicating that its genes are involved in the progression of the cell cycle (Additional file 11: Figure S5 and Table 1). Additionally, the red module contains genes involved in cytokinesis, such as BOH1 (Tb927.10.12720), which cooperates with TbPLK to initiate cytokinesis and flagellum inheritance [42], and Cytokinesis Initiation Factor 2 (CIF2) (Tb927.9.14290), which is involved in the initiation of cytokinesis [43] (Additional file 6). Other genes assigned to this module were in concordance with the enriched functions. These were nucleus- and spindle-associated protein 1 (NuSAP1) (Tb927.11.8370), which is required for chromosome segregation, and NuSAP2 (Tb927.9.6110), which promotes the G2/M transition [44]. The hub gene for the red module is a hypothetical gene (Tb927.7.6920), which may play a key role in the progression of the cell cycle.
The tan module (Figs. 2 and 3), whose enriched GO terms include spindle pole and microtubule cytoskeleton, contains genes such as CIF4 (Tb927.10.8240), TLK1 (Tb927.4.5180) and FPRC (Tb927.10.6360) that are involved in cytokinesis [45, 46]. The hub gene for the tan module is the RNA-binding protein RBP6 (Table 2). Interestingly, over-expression of RBP6 in vitro has been demonstrated to recapitulate the parasite’s tsetse fly stage developmental forms, which were previously elusive in culture [47]. However, the exact role of RBP6 during the parasite’s development in the tsetse fly is yet to be elucidated. Based on its assignment to the tan module, it is likely to be involved in regulating a key step during progression of the cell cycle.
The salmon module (Figs. 2 and 3) has enriched functions in RNA metabolic processing, as depicted by the module’s enriched GO terms: RNA metabolism, nucleic acid binding, and RNA binding (Additional file 12: Figure S6). RNA binding may involve either binding of the mRNA by RNA-binding proteins (RBPs) as a post-transcriptional gene regulation mechanism in T. brucei [48, 49], or binding by translation initiation factors for protein synthesis [50]. The salmon module contains the translation initiation factor eIF4E1 (Tb927.11.2260) and the poly(A) binding protein PABP2 (Tb927.9.10770), which have previously been shown to have similar co-localization in T. brucei [51]. An RNA-binding protein related to stress response, ZC3H30 (Tb927.10.1540), together with an associated stress response granule (Tb927.8.3820) [52], was assigned to the salmon module. The hub gene for the salmon module is the 2-oxoglutarate dehydrogenase E1 component (Table 2). 2-oxoglutarate dehydrogenase is an enzyme of the tricarboxylic acid (TCA) cycle in the mitochondrion and is implicated in the degradation of proline and glutamate to succinate, which can enter the gluconeogenesis pathway in procyclic trypanosomes [53]. This hub gene could be important for the role of the mitochondrion in responding to stress resulting from a change in energy source in insect-stage trypanosomes.
Regulation of gene expression in T. brucei occurs almost exclusively post-transcriptionally as a result of the polycistronic arrangement of its genes [50, 54, 55]. Post-transcriptional regulation of mRNA abundance mainly involves the interaction of a cis-regulatory element on the mRNA with a trans-acting element such as an RNA-binding protein [56]. Genes with similar functions are co-regulated, and their mRNAs are therefore hypothesized to share similar cis-regulatory elements [19]. Since the gene modules of a co-expression network are composed of genes with similar functions, they can be used as a basis for identifying potential regulatory elements in the untranslated regions of mRNA.
Two motifs ([AU]A[CGU]AUGUA[CGU] and [CGU][CU]AUAGA.[ACU]) that had consensus sequences similar to previously identified motifs were found to be over-represented (Fig. 4a). The motif [AU]A[CGU]AUGUA[CGU] contains the core sequence, UGUA, that is recognized by the PUF family of RNA-binding proteins [57] and has previously been identified in T. brucei as targeting transcripts involved in the cell cycle [58–60]. The motif was over-represented in the black, pink and darkturquoise modules (Fig. 4a). [AU]A[CGU]AUGUA[CGU] co-occurs with other motifs, namely [CGU]AAU.[AU]UA., .UUUUUUA., [AC]GGA[AG]U[AG]A. and [AGU]UUUGGUU[AGU] (lighter colors in Fig. 4b). Co-occurrence of motifs means that they co-localize within the same untranslated region (UTR), such that the presence of one motif implies the presence of its putative counterpart [19]. These co-occurring motifs may provide further information on post-transcriptional regulation. For instance, co-localization of two motifs close to each other on a transcript could imply physical interaction of their binding elements, and hence their functional interaction [19].
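The motif notation used above (bracketed character classes and "." as a wildcard) can be read directly as a regular expression, which makes the idea of co-occurrence within one UTR easy to sketch. The example below is a minimal illustration under stated assumptions: the helper names, the gene identifiers and the toy UTR sequences are hypothetical, and this is not the motif discovery or co-occurrence procedure used in the study.

```python
import re

PUF_LIKE = "[AU]A[CGU]AUGUA[CGU]"   # motif containing the UGUA core
CO_MOTIF = ".UUUUUUA."              # one of the motifs reported to co-occur with it

def to_rna(seq):
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")

def motif_positions(motif, utr):
    """0-based start positions of all (possibly overlapping) motif matches."""
    pattern = re.compile("(?=(" + motif + "))")   # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(to_rna(utr))]

def cooccurring(motif_a, motif_b, utrs):
    """Ids of UTRs that contain at least one match of both motifs."""
    return [gid for gid, seq in utrs.items()
            if motif_positions(motif_a, seq) and motif_positions(motif_b, seq)]

# Toy UTRs with hypothetical gene ids (not real T. brucei sequences)
toy_utrs = {
    "geneA": "GGAACAUGUAGCCCUUUUUUAG",  # carries both motifs
    "geneB": "GGAACATGTAGCC",           # PUF-like motif only, given as DNA
    "geneC": "CUUUUUUAG",               # co-motif only
}
both = cooccurring(PUF_LIKE, CO_MOTIF, toy_utrs)
```

The match positions returned by `motif_positions` could also be compared to ask whether two motifs lie close to each other on a transcript, which is the situation the text above relates to possible physical interaction of their binding elements.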
The other motif, [CGU][CU]AUAGA.[ACU], was over-represented in the red and greenyellow modules (Fig. 4a). This consensus motif contains the core AUAGA sequence, similar to CAUAGAA, which has been implicated in cell cycle regulation [61, 62] and was previously predicted in T. brucei [60]. Notably, genes in the red module were enriched for cell cycle functions, while those in the greenyellow module were enriched for microtubule-associated functions, including motility (Additional file 13: Figure S7). Motility in T. brucei is mediated through the flagellum [63]. Importantly, flagellum motility is essential for completion of cell division [64, 65], suggesting co-regulation of genes in the greenyellow module with those in the red module. The motif [CGU][CU]AUAGA.[ACU] does not co-occur with other motifs, which possibly suggests that its functions have opposing effects compared with the functions of the other motifs (Fig. 4b). Overall, characterization of these identified cis-regulatory elements will advance our knowledge of post-transcriptional gene regulation and provide potential chemotherapeutic targets against key regulatory functions in T. brucei for disease control.
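Over-representation of a motif (or a GO term) in a module is commonly assessed with a one-sided hypergeometric test. This section does not state which test was applied, so the sketch below is an assumption offered for illustration only, and the function name and the numbers in the usage example are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background set
    K: background genes carrying the motif
    n: genes in the module
    k: module genes carrying the motif
    """
    upper = min(K, n)                      # largest possible overlap
    total = comb(N, n)                     # number of ways to pick the module
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy usage: all 5 module genes carry a motif present in 5 of 10 background genes
p_toy = hypergeom_sf(5, 10, 5, 5)
```

A small p-value indicates that the motif occurs in the module more often than expected by chance. In practice such p-values would be corrected for multiple testing across motifs and modules.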