Our database provides multiple tools for analyzing the genomic and transcriptomic data. The data can be browsed through pages for species (www.protist.guru/species), genes (www.protist.guru/sequence/view/48072), gene families (www.protist.guru/family/view/186501), co-expression clusters (www.protist.guru/cluster/view/600), neighborhoods (www.protist.guru/network/graph/39535), family phylogenetic trees (www.protist.guru/tree/view/48179), Pfam domains (www.protist.guru/interpro/view/13405), and Gene Ontology terms (www.protist.guru/go/view/4133). Each page contains additional information relevant to the type of data being displayed. For instance, the gene pages contain information about cDNA and protein sequences, functional annotations, expression profiles, co-expression neighborhoods, and more. Gene Ontology pages, in turn, show GO annotations, the genes in the 15 protists that share the same GO term, and the co-expression clusters enriched for genes with that particular GO term. The features page (www.protist.guru/features) provides a complete description of the search functions and tools.
To exemplify how our tool can be used to uncover novel genes and conserved gene clusters in biosynthetic pathways, we analyzed the secondary carotenoid biosynthesis pathway in Haematococcus lacustris via co-expression analysis. Haematococcus lacustris is a unicellular freshwater microalga that is a rich source of astaxanthin, a highly valued red xanthophyll known for its potent antioxidant activity (Han et al., 2019).
Since phytoene is the first carotenoid precursor for astaxanthin biosynthesis, we queried our database with phytoene desaturase, one of the first two fundamental enzymes, which catalyzes the conversion of C40 phytoene to ζ-carotene, an essential precursor for β-carotene and hence astaxanthin. Entering phytoene desaturase (lcl|BLLF01000057.1_cds_GFH06801.1_1278), a key intermediary enzyme for astaxanthin production in Haematococcus lacustris, into our database leads to its gene page (https://protists.sbs.ntu.edu.sg/sequence/view/152450). Stress-inducing conditions have been shown to increase the yield of astaxanthin in Haematococcus lacustris cells by inducing a preferential morphological transformation of vegetative, green, motile cells into mature, non-motile cysts filled with red astaxanthin (Minhas et al., 2016). The gene page provides an overview of phytoene desaturase expression across different strains of Haematococcus lacustris, developmental stages, and physicochemical conditions, such as high-light illumination over different time periods and different acidic cultures.
The phytoene desaturase gene (lcl|BLLF01000057.1_cds_GFH06801.1_1278) is also found in a co-expression cluster (Cluster 8: https://protists.sbs.ntu.edu.sg/cluster/view/2936); such clusters represent groups of functionally related genes. The cluster page can be reached from the gene page and displays a cluster expression profile that takes into account the TPM values of the member genes across different conditions (Fig 1A). Information on significantly enriched Gene Ontology (GO) terms (corrected p-value < 0.05), InterPro domains, and gene families of the member genes is also available. Based on the expression profile, the genes in the cluster were most highly expressed after 24 hours of high light exposure, and their expression decreased after 48 hours. This suggests that transient stress induction via high light exposure could be ideal for obtaining the highest yield of astaxanthin. Thus, expression profile analysis can reveal the conditions under which a given gene and its associated pathway are highly expressed.
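The reasoning above, finding the condition in which a cluster is most highly expressed, can be illustrated with a minimal sketch. It is not the database's implementation: the gene names, condition labels, and TPM values are hypothetical, and the way the profile is aggregated (each gene scaled to its own maximum, then averaged per condition) is an assumption, since the text only states that the profile takes the TPM values of the member genes into account.

```python
from statistics import mean

# Hypothetical TPM values (gene -> condition -> TPM); NOT taken from the database.
tpm = {
    "gene_a": {"0 h": 10.0, "24 h high light": 80.0, "48 h high light": 30.0},
    "gene_b": {"0 h": 5.0, "24 h high light": 40.0, "48 h high light": 20.0},
    "gene_c": {"0 h": 20.0, "24 h high light": 60.0, "48 h high light": 10.0},
}


def cluster_profile(tpm_by_gene):
    """Return the mean max-normalized expression per condition.

    Assumption: each gene is scaled to its own maximum TPM so that highly
    expressed genes do not dominate the average. Genes that are never
    expressed (maximum TPM of 0) are skipped to avoid division by zero.
    """
    expressed = [v for v in tpm_by_gene.values() if max(v.values()) > 0]
    if not expressed:
        return {}
    conditions = list(expressed[0])
    return {
        cond: mean(values[cond] / max(values.values()) for values in expressed)
        for cond in conditions
    }


profile = cluster_profile(tpm)
# Condition with the highest average normalized expression in the cluster.
peak_condition = max(profile, key=profile.get)
print(peak_condition)
```

With real data, the `tpm` dictionary would be filled from the expression values shown on the gene or cluster pages; a different aggregation (for example, raw mean TPM) could shift which condition appears as the peak.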
The co-expression cluster consists of 76 genes involved in various processes, such as acetyl-CoA and pyruvate synthesis, the methylerythritol phosphate (MEP) pathway of isoprenoid biosynthesis, fatty acid biosynthesis, β-carotenoid biosynthesis, astaxanthin biosynthesis, transcriptional regulation, and starch synthesis (Fig 1B). The presence of these processes in the same cluster suggests that they are functionally related. Acetyl-CoA and pyruvate are essential first precursors for carotenoid biosynthesis and can be fed into the mevalonate (MVA) or MEP pathway for isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) synthesis. IPP and DMAPP are precursors for the synthesis of terpenoids such as β-carotene, which is an intermediate in astaxanthin production. Stoichiometric coordination and interdependence between the fatty acid biosynthesis and astaxanthin production pathways have also been reported in Haematococcus lacustris, with some fatty acid biosynthesis acyltransferases postulated to be involved in astaxanthin esterification (Chen et al., 2015). The genes identified through co-expression analysis with our tool could therefore provide greater insight into potential crosstalk between pathways that could affect astaxanthin biosynthesis in Haematococcus lacustris. This could be useful for the metabolic engineering of Haematococcus lacustris cells to increase astaxanthin production.
The gene pages provide links to the respective gene families and gene trees. For example, the phytoene desaturase gene (lcl|BLLF01000057.1_cds_GFH06801.1_1278) belongs to a gene family (https://protists.sbs.ntu.edu.sg/family/view/183185) comprising 23 genes found in 15 species (Fig 1C). The phylogenetic relationships between the genes can also be viewed by clicking on the phylogenetic tree link (https://protists.sbs.ntu.edu.sg/tree/view/44863), available on the gene page or the gene family page (Fig 1D).
Additionally, the cluster page lists similar clusters from all species, which can be compared using the “Compare” button. Here, we present a cluster similar to cluster 8 of Haematococcus lacustris from another astaxanthin-producing microalga, Chromochloris zofingiensis, with a Jaccard index of 0.094. Clicking “Compare” leads to a page showing the co-expression network comprising the genes in the conserved clusters of the two organisms (Fig 1E). The genes conserved between these clusters are involved in acetyl-CoA, pyruvate, and fatty acid biosynthesis (Fig 1E). As demonstrated in this example, our tool allows for the easy identification of novel genes in different biological pathways and of functional orthologs by comparing conserved neighbourhoods containing genes in the same orthogroup and with similar co-expression.
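The Jaccard index used to score cluster similarity is the size of the intersection of two sets divided by the size of their union. The sketch below shows the calculation on two made-up clusters. It assumes that clusters are compared by the gene family labels of their members, which the text does not state explicitly, and the labels `fam_1` to `fam_7` are hypothetical.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: defined here as 0.0 to avoid division by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical gene family labels for two clusters from different species.
cluster_x = {"fam_1", "fam_2", "fam_3", "fam_4"}
cluster_y = {"fam_3", "fam_4", "fam_5", "fam_6", "fam_7"}

score = jaccard_index(cluster_x, cluster_y)
shared = sorted(cluster_x & cluster_y)  # families present in both clusters
print(f"Jaccard index: {score:.3f}; shared families: {shared}")
```

A low index, such as the 0.094 reported above, still indicates shared families when the clusters are large, so the shared labels themselves are often more informative than the score alone.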