Background: The exponential increase in high-throughput sequencing data and the development of computational sciences and bioinformatics pipelines have advanced our understanding of microbial community composition and distribution in complex ecosystems. Despite these advances, the identification of microbial interactions from genomic data remains a major bottleneck. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on genomic content.
Results: OrtSuite combines ortholog clustering strategies with genome annotation based on a user-defined set of functions, allowing for hypothesis-driven data analysis. OrtSuite allows users to install and run all workflow components and analyze the generated outputs using a simple pipeline consisting of 23 bash commands and one R command. Annotation is a two-stage process. First, only a subset of sequences from each ortholog cluster is aligned to all sequences in the Ortholog-Reaction Association database (ORAdb). Next, all sequences from clusters that meet a user-defined identity threshold are aligned to all sequence sets in ORAdb to which they had a hit. This approach reduces the time needed for functional annotation. Further, OrtSuite identifies putative interspecies interactions from the genomic content of the individual species, subject to constraints given by the user. Additional control is afforded to the user at several stages of the workflow: 1) the construction of ORAdb only needs to be performed once for each specific process and allows manual curation; 2) the identity and sequence similarity thresholds used during the annotation stage can be adjusted; and 3) constraints related to pathway reaction composition and known species contributions to ecosystem processes can be defined.
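The two-stage annotation described above can be illustrated with a minimal sketch. This is not OrtSuite's implementation: the function names (`toy_identity`, `two_stage_annotation`), the parameters (`min_identity`, `subset_fraction`), and the data layout are assumptions made for illustration, and the position-wise identity score is a toy stand-in for a real sequence aligner.

```python
import math
import random
from typing import Callable, Dict, List, Set


def toy_identity(a: str, b: str) -> float:
    """Toy stand-in for an aligner (assumption): percentage of identical
    positions, normalised by the longer sequence. No gaps are modelled."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def two_stage_annotation(
    clusters: Dict[str, Dict[str, str]],   # cluster id -> {sequence id: sequence}
    oradb: Dict[str, List[str]],           # function id -> reference sequences
    min_identity: float = 40.0,            # hypothetical user-defined threshold
    subset_fraction: float = 0.5,          # hypothetical share sampled in stage 1
    identity: Callable[[str, str], float] = toy_identity,
    seed: int = 0,
) -> Dict[str, Set[str]]:
    """Return {sequence id: set of function ids} for annotated sequences."""
    rng = random.Random(seed)
    annotations: Dict[str, Set[str]] = {}
    for members in clusters.values():
        # Stage 1: compare only a subset of the cluster against every
        # reference set in the database.
        n_subset = max(1, math.ceil(len(members) * subset_fraction))
        subset_ids = rng.sample(sorted(members), n_subset)
        hit_functions = {
            func
            for func, refs in oradb.items()
            if any(
                identity(members[sid], ref) >= min_identity
                for sid in subset_ids
                for ref in refs
            )
        }
        # Stage 2: compare all sequences of the cluster, but only against
        # the reference sets that produced a hit in stage 1.
        for func in hit_functions:
            for sid, seq in members.items():
                if any(identity(seq, ref) >= min_identity for ref in oradb[func]):
                    annotations.setdefault(sid, set()).add(func)
    return annotations
```

In this sketch the time saving comes from stage 1: clusters whose sampled sequences hit nothing are never compared in full, and the remaining clusters are compared only against the reference sets they hit.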
Conclusions: OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database. Further, this novel workflow allows the identification of interspecies interactions through user-defined constraints. Due to its low computational demands, OrtSuite can run on a personal computer for small datasets (e.g. up to 100 genomes). For larger datasets (> 100 genomes), we suggest the use of computer clusters. OrtSuite is open-source software available at https://github.com/mdsufz/OrtSuite .