Sequencing and annotation
New sequencing technology has revolutionised the cost and pace of obtaining genome information, and there has been a drive to sequence the genomes of organisms with economic applications as well as those of environmental interest [74, 75]. This holds true for Rhodococcus: only two genomes had been sequenced by 2006, whereas 13 years later 353 genomes are available, mainly owing to Whole Genome Shotgun sequencing efforts (Supp. Info. Table S1). The impact of faster and improved sequencing techniques is evident in the sequencing history of the R. rhodochrous ATCC BAA-870 genome. An initial assembly, based on a 36-cycle, single-end Illumina library sequenced in 2009 together with a mate-pair library, yielded a 6 Mbp genome in 257 scaffolds. A more recent paired-end Illumina library, combined with the earlier mate-pair library, reduced this to only 6 scaffolds (5.88 Mbp), illustrating how much second-generation sequencing results improved within 10 years. The presence of four copies of 16S-like genes was the main reason the assembly broke into 6 scaffolds. Third-generation (Nanopore) sequencing overcame this problem, and the genome could be fully assembled. Hence, second-generation sequencing has evolved to produce higher-quality assemblies, but combining it with third-generation sequencing was necessary to obtain the complete, closed bacterial genome.
It is often assumed that the annotation of prokaryotic genomes is simpler than that of intron-containing eukaryotic genomes. However, annotation has been shown to be problematic, especially for small genes: these may be over- or under-predicted, and the minimum length used to define an open reading frame (ORF) can systematically exclude small proteins from annotation [76]. Warren et al. (2010) used high-performance computational methods to show that currently annotated prokaryotic genomes are missing 1153 candidate genes that had been excluded from annotations on the basis of their size [76]. These missing genes do not show strong similarity to gene sequences in public databases, indicating that they may belong to gene families that are not currently annotated in genomes. Furthermore, they uncovered ~38,895 intergenic ORFs that could be identified as putative genes by similarity to annotated genes, yet were absent from the annotations. Therefore, prokaryotic gene-finding and annotation programs do not accurately predict small genes, and their output is limited by the accuracy of existing database annotations. Hypothetical genes (genes without any functional assignment), genes whose assigned function is too general to be useful, misannotated genes and undetected real genes remain the biggest challenges in annotating new genome data [77-80]. As such, we may be underestimating the number of genes in this genome.
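To illustrate how a minimum-length criterion can silently drop small genes, the sketch below scans the forward strand of a DNA string for ATG-to-stop ORFs and keeps only those above a codon threshold. It is a toy example under simplifying assumptions: `find_orfs`, `min_codons` and the test sequence are hypothetical, only the forward strand and the ATG start codon are considered, and it does not reproduce GLIMMER, RAST or any other annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon (stop included). ORFs shorter than ``min_codons``
    codons are discarded, mimicking a length cutoff in gene prediction.
    Toy model only: reverse strand and alternative start codons are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


# Hypothetical test sequence: a 4-codon ORF followed by a 12-codon ORF.
SMALL_ORF = "ATGAAACCCTAA"
LONG_ORF = "ATG" + "GCT" * 10 + "TGA"
SEQ = SMALL_ORF + LONG_ORF

print(find_orfs(SEQ, min_codons=1))   # both ORFs reported
print(find_orfs(SEQ, min_codons=10))  # the small ORF is dropped
```

Lowering the cutoff recovers small ORFs but also admits more spurious ones, which is the trade-off behind the over- and under-prediction of small genes described above.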
Apart from possible misannotation, the algorithm or software used for annotation strongly influences the outcome. In this study, both BASys (Figure 2) and RAST (Figure 4) were used as annotation tools, predicting 7548 and 5535 genes, respectively. BASys may overpredict gene numbers because its sensitive GLIMMER-based ab initio gene prediction can give false positives in sequences with high GC content [81]. This shows how much the choice of bioinformatics tool matters, and it makes comparison with other genomes more difficult.
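One quick way to judge whether a gene count looks inflated is to normalise it by genome size. The sketch below does this arithmetic for the two predictions, assuming both counts cover the full 5.9 Mbp (chromosome plus plasmid) described in the next subsection; the function name is hypothetical. As a rough benchmark, bacterial genomes are often cited as carrying about one gene per kilobase (roughly 1000 genes per Mbp).

```python
GENOME_MBP = 5.9  # chromosome (5.37 Mbp) + plasmid (0.53 Mbp)
PREDICTED_GENES = {"BASys": 7548, "RAST": 5535}


def genes_per_mbp(n_genes: int, genome_mbp: float) -> float:
    """Gene density: predicted genes per megabase pair."""
    return n_genes / genome_mbp


for tool, n in PREDICTED_GENES.items():
    print(f"{tool}: {genes_per_mbp(n, GENOME_MBP):.0f} genes per Mbp")

# Relative excess of the BASys prediction over the RAST prediction.
excess = PREDICTED_GENES["BASys"] / PREDICTED_GENES["RAST"] - 1
print(f"BASys predicts {excess:.0%} more genes than RAST")
```

This gives roughly 1279 genes per Mbp for BASys versus 938 for RAST, with BASys predicting about 36% more genes. The RAST figure sits closer to the rule-of-thumb density, which is consistent with, though not proof of, BASys overprediction.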
Size and content of the genome
The genomic content of R. rhodochrous ATCC BAA-870 was outlined and compared with that of other rhodococcal genomes. Sequences of other Rhodococcus genomes, obtained from the Genome database at NCBI [82], show large variation in genome size, from 4 to 10 Mbp (Supp. Info. Table S1), with an average of 6.1 ± 1.6 Mbp. The apparent total genome size of R. rhodochrous ATCC BAA-870, 5.9 Mbp (a 5.37 Mbp chromosome and a 0.53 Mbp plasmid), is close to this average. Among the well-described rhodococci (Table 1), R. jostii RHA1 has the largest genome sequenced to date (9.7 Mbp), of which only 7.8 Mbp is chromosomal, while the genomes of the pathogenic R. hoagii strains are the smallest at ~5 Mbp. All rhodococcal genomes have a high GC content, ranging from 62 to 71%. The average GC content of the R. rhodochrous ATCC BAA-870 chromosome and plasmid is 68.2% and 63.8%, respectively. R. jostii RHA1 has the lowest percentage of coding DNA (87%), which is to be expected given its large overall genome size, while R. rhodochrous ATCC BAA-870 has a coding ratio of 90.6%, in line with its smaller total size. Interestingly, the length distribution of proteins encoded on the chromosome differs from that of proteins encoded on the plasmid. Together with the lower GC content of the plasmid, this suggests that the plasmid content was probably acquired on separate occasions [83].
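The GC percentages and coding ratio quoted above are simple sequence statistics. The sketch below shows how they might be computed, using hypothetical helper names: GC content over unambiguous bases, a length-weighted combination of per-replicon GC values, and the coding fraction as the share of positions covered by at least one annotated gene (overlapping genes are merged so no base is counted twice). Gene coordinates are assumed to be 0-based, half-open intervals.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(1 for base in seq if base in "ACGT")
    gc = sum(1 for base in seq if base in "GC")
    return gc / acgt if acgt else 0.0


def combined_gc(replicons: list[tuple[float, float]]) -> float:
    """Length-weighted GC fraction from (length, gc_fraction) pairs."""
    total_length = sum(length for length, _ in replicons)
    return sum(length * gc for length, gc in replicons) / total_length


def coding_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of genome positions covered by at least one gene.

    Genes are half-open (start, end) intervals; overlapping intervals are
    merged first so that shared bases are only counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Chromosome (5.37 Mbp, 68.2% GC) and plasmid (0.53 Mbp, 63.8% GC).
print(f"{combined_gc([(5.37, 0.682), (0.53, 0.638)]):.1%}")
print(coding_fraction(100, [(0, 40), (30, 60), (80, 90)]))
```

With the values given above, the length-weighted GC content of the whole genome comes out at about 67.8%; this is a derived figure, not one reported in this study.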
Fundamental and applicable biocatalytic properties of rhodococci
Catabolism typically involves oxidative enzymes. The presence of multiple homologs of catabolic genes in all Rhodococcus species suggests that they may provide a comprehensive biocatalytic profile [1]. R. rhodochrous ATCC BAA-870 combines this with multiple transport systems (44% of all COG-annotated genes), which highlights the metabolic versatility of this Rhodococcus and facilitates the use of whole cells in biotechnological applications.
McLeod et al. reported that R. jostii RHA1 contains genes for the Entner-Doudoroff pathway, which requires 6-phosphogluconate dehydratase and 2-keto-3-deoxyphosphogluconate aldolase to convert glucose to pyruvate [10]. The Entner-Doudoroff pathway is, however, rare in Gram-positive organisms, which preferentially use glycolysis for its higher ATP yield. There is no evidence of this pathway in R. rhodochrous ATCC BAA-870, which suggests that it is not a typical rhodococcal trait and that strain RHA1 probably acquired it relatively recently.
Analysis of the R. rhodochrous ATCC BAA-870 genome suggests that at least four major pathways exist for the catabolism of central aromatic intermediates, comparable to the well-defined aromatic metabolism of Pseudomonas putida strain KT2440 [84]. In R. rhodochrous ATCC BAA-870, the largest share of annotated enzymes is involved in oxidation and reduction. About 500 genes are oxidoreductase-related, a high number compared with other bacteria of similar genome size, but in line with most other sequenced rhodococci [85]. Rhodococcus genomes usually encode large numbers of oxygenases [1], which is also true for strain BAA-870 (71). Some of these are flavoproteins with diverse useful activities [86], including monooxygenases that catalyse Baeyer–Villiger oxidations, in which a ketone is converted to an ester [87, 88].
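For reference, the general stoichiometry of a flavin-dependent Baeyer–Villiger monooxygenase reaction is shown below, with the conversion of cyclohexanone to ε-caprolactone (relevant to the cyclohexanone monooxygenases discussed in this section) as an example. NAD(P)H is written generically in the first line because cofactor preference varies between enzymes.

```latex
\begin{align*}
\mathrm{R^{1}C(=O)R^{2}} + \mathrm{O_2} + \mathrm{NAD(P)H} + \mathrm{H^{+}}
  &\longrightarrow \mathrm{R^{1}C(=O)OR^{2}} + \mathrm{H_2O} + \mathrm{NAD(P)^{+}} \\
\underbrace{\mathrm{C_6H_{10}O}}_{\text{cyclohexanone}} + \mathrm{O_2} + \mathrm{NADPH} + \mathrm{H^{+}}
  &\longrightarrow \underbrace{\mathrm{C_6H_{10}O_2}}_{\varepsilon\text{-caprolactone}} + \mathrm{H_2O} + \mathrm{NADP^{+}}
\end{align*}
```

One oxygen atom from O2 is inserted next to the carbonyl carbon, and the other is reduced to water at the expense of the nicotinamide cofactor.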
The 14 cytochrome P450 genes in R. rhodochrous ATCC BAA-870 reflect a fundamental aspect of rhodococcal physiology. R. jostii RHA1 similarly has 25 cytochrome P450 genes, a number proportionate to its larger genome and typical of actinomycetes. Although it is unclear which oxygenases in R. rhodochrous ATCC BAA-870 are catabolic and which are involved in secondary metabolism, their abundance is consistent with a potential ability to degrade an exceptional range of aromatic compounds (oxygenases catalyse the hydroxylation and cleavage of these compounds). Rhodococci are well known for their capacity to catabolise hydrophobic compounds, including hydrocarbons and polychlorinated biphenyls (PCBs), mediated by a cytochrome P450 system [89-92]. The cytochrome P450 oxygenase is often found fused with a reductase, as in Rhodococcus sp. NCIMB 9784 [93]. Genes associated with biphenyl and PCB degradation are found at multiple sites in the R. jostii RHA1 genome, both on the chromosome and on linear plasmids [1]. R. jostii RHA1 was also found to show lignin-degrading activity, possibly based on the same oxidative capacity that it uses to degrade biphenyl compounds [94].
The oxygenases found in rhodococci include multiple alkane monooxygenases (genes alkB1–alkB4) [95], steroid monooxygenase [96], styrene monooxygenase [97], peroxidase [98] and alkane hydroxylase homologs [99]. R. rhodochrous ATCC BAA-870 has 87 oxygenase genes, while the PCB-degrading R. jostii RHA1 has 203 oxygenases, including 19 cyclohexanone monooxygenases (EC 1.14.13.22), suggesting that, of the two, strain BAA-870 is less adept at oxidative catabolism. Rhodococcal cyclohexanone monooxygenases can be used to synthesise industrially interesting compounds from cyclohexanol and cyclohexanone, including adipic acid, caprolactone (for polyol polymers) and 6-hydroxyhexanoic acid (for coating applications) [64]. Chiral lactones can also be used as intermediates in the production of prostaglandins [100]. The same oxidative pathway can be used to biotransform cyclododecanone to lauryl lactone or 12-hydroxydodecanoic acid [101, 102]. Cyclododecanone monooxygenase of Rhodococcus sp. SC1 was used in the kinetic resolution of 2-substituted cycloketones to synthesise aroma lactones in good yields and with high enantiomeric excess [103]. Like R. jostii RHA1, R. rhodochrous ATCC BAA-870 encodes several monooxygenases. All these redox enzymes could be of interest for synthetic purposes in industrial biotechnological applications.
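Because R. jostii RHA1 has a much larger genome, raw gene counts can mislead. The sketch below normalises the cytochrome P450 and oxygenase counts given in this section by total genome size (9.7 Mbp for RHA1 and 5.9 Mbp for BAA-870, plasmids included); the dictionary layout and the `density` function name are illustrative only.

```python
GENOME_MBP = {"BAA-870": 5.9, "RHA1": 9.7}  # total genome size incl. plasmids
COUNTS = {
    "cytochrome P450": {"BAA-870": 14, "RHA1": 25},
    "oxygenase": {"BAA-870": 87, "RHA1": 203},
}


def density(count: int, size_mbp: float) -> float:
    """Genes of a given family per megabase pair of genome."""
    return count / size_mbp


for family, per_strain in COUNTS.items():
    for strain, n in per_strain.items():
        per_mbp = density(n, GENOME_MBP[strain])
        print(f"{family:16s} {strain:8s} {per_mbp:5.1f} per Mbp")
```

This gives roughly 2.4 versus 2.6 P450 genes per Mbp, in line with the RHA1 count being proportionate to its genome, but about 14.7 versus 20.9 oxygenase genes per Mbp, so the oxygenase gap between the two strains persists after correcting for genome size.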
The presence of an ectoine biosynthesis cluster suggests that R. rhodochrous ATCC BAA-870 has effective osmoregulation and enzyme protection capabilities. R. rhodochrous ATCC BAA-870, like other Rhodococcus strains, can survive in diverse environments and tolerate harsh reaction conditions when used as a whole-cell biocatalyst, and ectoine biosynthesis likely plays a role in this. Regulating cytoplasmic solute concentration through compounds such as inorganic ions, sugars, amino acids and polyols is a versatile and effective osmo-adaptation strategy for bacteria in general. Ectoine and hydroxyectoine are common alternative osmoregulatory solutes, found especially in halophilic and halotolerant microorganisms [104, 105], and hydroxyectoine has been shown to confer heat stress protection in vivo [106]. Ectoines have a variety of useful biotechnological and biomedical applications [107], and strains engineered for improved ectoine synthesis have been used for the industrial production of hydroxyectoine as a solute and enzyme stabiliser [108, 109]. The special cell-wall structure of rhodococci might make these organisms a better choice as ectoine production hosts.
Terpenes and isoprenoids provide a rich pool of natural compounds with applications in the synthetic chemistry, pharmaceutical, flavour and even biofuel industries. The structures, functions and chemistries of the enzymes involved in terpene biosynthesis are well known, especially for plants and fungi [70, 110]. However, only recently have bacterial terpenoids been considered a possible source of new natural products [111, 112], largely facilitated by the explosion of available bacterial genome sequences. Interestingly, bacterial terpene synthases have low sequence similarity to one another and show no significant overall amino acid identity to their plant and fungal counterparts. Yamada et al. used a genome mining strategy to identify 262 bacterial terpene synthases; subsequent isolation and expression of the genes in a Streptomyces host confirmed the activities of these predicted enzymes and led to the identification of 13 previously unknown terpene structures [111]. The three terpene biosynthetic clusters annotated in strain BAA-870 may therefore underrepresent the possible pathways for these valuable compounds.
A total of five NRPS (non-ribosomal peptide synthetase) genes for secondary metabolite synthesis are found on the chromosome, which is few compared with R. jostii RHA1, which contains 24 NRPS and seven PKS (polyketide synthase) genes [10]. Like strain ATCC BAA-870, R. jostii RHA1 was also found to possess a pathway for the synthesis of a siderophore [113]. The multiple PKS and NRPS clusters suggest that R. rhodochrous ATCC BAA-870 may be a significant potential source of molecules with immunosuppressive, antifungal, antibiotic and siderophore activities [114].
Nitrile conversion
Many rhodococci can hydrolyse a wide range of nitriles [115-118]. The locations and numbers of nitrile-converting enzymes in the available Rhodococcus genomes were identified and compared with those of R. rhodochrous ATCC BAA-870 (Table 2). R. rhodochrous ATCC BAA-870 contains several nitrile-converting enzymes, which is consistent with previous activity assays using this strain [33, 34]. However, whereas in most R. rhodochrous strains these enzymes are chromosomal, in R. rhodochrous ATCC BAA-870 they are located on the plasmid. In R. rhodochrous ATCC BAA-870 the nitrile hydratase is expressed constitutively, which explains why this strain is an exceptional nitrile biocatalyst [36]. Selective pressure from exposure to nitriles may have led to the loss of regulation of the nitrile biocatalyst when it was transferred to the plasmid.
The 16S rRNA sequence of R. jostii RHA1 indicates that it is closely related to R. opacus [10] according to the taxonomy of Gürtler et al. (Figure 1) [119]. R. jostii RHA1 expresses a nitrile hydratase (an acetonitrile hydratase) and utilises nitriles such as acetonitrile, acrylonitrile, propionitrile and butyronitrile [120], while R. opacus shows nitrile hydrolysis activity [115]. R. erythropolis PR4 expresses an Fe-type nitrile hydratase [121], and R. erythropolis strains are well known for expressing this enzyme [115, 122, 123] as part of a nitrile metabolism gene cluster [119]. This enzyme has repeatedly been detected in isolates of this species from diverse locations [124], with broad substrate profiles that include acetonitrile, propionitrile, acrylonitrile, butyronitrile, succinonitrile, valeronitrile, isovaleronitrile and benzonitrile [115].
The nitrile hydratases of R. rhodochrous characterised to date are all of the Co-type [6, 123, 125], which are usually more stable than the Fe-type nitrile hydratases. They are active against a broad range of nitriles, including phenylacetonitrile, 2-phenylpropionitrile, 2-phenylglycinonitrile, mandelonitrile, 2-phenylbutyronitrile, 3-phenylpropionitrile, N-phenylglycinonitrile, p-toluonitrile and 3-hydroxy-3-phenylpropionitrile [32]. R. ruber CGMCC3090 and other strains express nitrile hydratases [115, 126], while the nitrile hydrolysis activity of R. hoagii [115] is also attributed to a nitrile hydratase [127].
The alternative nitrile hydrolysis enzyme, nitrilase, is also common in rhodococci (Table 2), including R. erythropolis [128], R. rhodochrous [129-132], R. opacus B4 [133] and R. ruber [134, 135]. The nitrilase from R. ruber can hydrolyse acetonitrile, acrylonitrile, succinonitrile, fumaronitrile, adiponitrile, 2-cyanopyridine, 3-cyanopyridine, indole-3-acetonitrile and mandelonitrile [135]. The nitrilases from multiple R. erythropolis strains were active towards phenylacetonitrile [136]. R. rhodochrous nitrilase substrates include (among many others) benzonitrile for R. rhodochrous J1 [137] and crotononitrile and acrylonitrile for R. rhodochrous K22 [138]. R. rhodochrous ATCC BAA-870 expresses a plasmid-encoded, enantioselective aliphatic nitrilase that is induced by dimethylformamide [36]. Another nitrilase/cyanide hydratase family protein is also annotated on the plasmid (this study) but has not been characterised. The diverse, yet sometimes highly specific and enantioselective, substrate specificities of these rhodococci give rise to an almost plug-and-play system for many different synthetic applications. Combined with their high solvent tolerance, this makes rhodococci very well suited as biocatalysts for producing amides for both bulk chemicals and pharmaceutical ingredients.
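The two enzymatic routes compared in this section differ in their products. Nitrile hydratases add one molecule of water to give an amide, which an amidase can further hydrolyse to the carboxylic acid, whereas nitrilases convert the nitrile directly to the acid and ammonia:

```latex
\begin{align*}
\text{nitrile hydratase:} \quad & \mathrm{R{-}C{\equiv}N} + \mathrm{H_2O} \longrightarrow \mathrm{R{-}C(=O)NH_2} \\
\text{amidase:} \quad & \mathrm{R{-}C(=O)NH_2} + \mathrm{H_2O} \longrightarrow \mathrm{R{-}COOH} + \mathrm{NH_3} \\
\text{nitrilase:} \quad & \mathrm{R{-}C{\equiv}N} + 2\,\mathrm{H_2O} \longrightarrow \mathrm{R{-}COOH} + \mathrm{NH_3}
\end{align*}
```

This is why amide production relies on nitrile hydratase activity, while nitrilase activity yields the corresponding acids.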
The large proportion of putative mobile genomic regions on the plasmid, together with the high number of transposon genes and the fact that the plasmid carries the machinery for nitrile degradation, strongly supports our hypothesis that R. rhodochrous ATCC BAA-870 has recently adapted its genome in response to the selective pressure of routine culturing in nitrile media in the laboratory. In comparison, the much larger chromosome of R. jostii RHA1, even though this strain was isolated from contaminated soil, has undergone relatively little recent genetic flux, as indicated by the presence of only two intact insertion sequences, relatively few transposase genes and only one identified pseudogene [10]. The smaller R. rhodochrous ATCC BAA-870 genome still has the genetic space and tools to adapt relatively easily in response to environmental selection.
CRISPR
CRISPRs are rarely found in rhodococcal genomes. Based on literature searches to date, only two other sequenced Rhodococcus strains have been reported to contain potential CRISPRs. R. opacus strain M213, isolated from fuel-oil-contaminated soil, has one confirmed and 14 potential CRISPRs [139], identified using the CRISPRFinder tool [140]. Pathak et al. also surveyed several other Rhodococcus sequences and found no further CRISPRs. Zhao and co-workers state that Rhodococcus sp. strain DSSKP-R-001, of interest for its β-estradiol-degrading potential, contains 8 CRISPRs [141], but they do not state how these were identified. Pathak et al. suggest that the CRISPR in R. opacus strain M213 may have been recruited from R. opacus R7 (isolated from soil contaminated with polycyclic aromatic hydrocarbons [142]), based on BLAST matches of the flanking regions.
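CRISPR arrays consist of short direct repeats separated by spacers of fairly uniform length, and tools such as CRISPRFinder search for this pattern. The sketch below is a deliberately naive, hypothetical detector, not the CRISPRFinder algorithm: it indexes every k-mer of a fixed repeat length and reports k-mers that recur at least `min_copies` times with spacer gaps inside a chosen window. The parameter defaults and the synthetic test array are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def find_direct_repeat_arrays(seq, repeat_len=23, min_copies=3,
                              min_spacer=20, max_spacer=60):
    """Report exact k-mer repeats spaced like a CRISPR array.

    Returns (repeat, [start positions]) for every run of at least
    ``min_copies`` occurrences whose consecutive gaps (spacers) fall within
    [min_spacer, max_spacer]. Exact matching only; real tools tolerate
    mismatches and merge overlapping, shifted hits.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        if len(starts) < min_copies:
            continue
        run = [starts[0]]
        for p in starts[1:]:
            spacer = p - (run[-1] + repeat_len)
            if min_spacer <= spacer <= max_spacer:
                run.append(p)
            else:  # gap outside the window: close the current run
                if len(run) >= min_copies:
                    arrays.append((kmer, run))
                run = [p]
        if len(run) >= min_copies:
            arrays.append((kmer, run))
    return sorted(arrays, key=lambda hit: hit[1][0])


# Synthetic example: flank + (repeat + spacer) x 3 + repeat + flank.
rng = random.Random(870)


def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


REPEAT = random_dna(23)
SEQ = random_dna(50)
for _ in range(3):
    SEQ += REPEAT + random_dna(30)
SEQ += REPEAT + random_dna(50)

hits = find_direct_repeat_arrays(SEQ)
print([(kmer == REPEAT, starts) for kmer, starts in hits])
```

Shifted k-mers that straddle a repeat and identical spacer starts can also be reported, which is one reason dedicated tools cluster overlapping hits and allow for degenerate repeats.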
The regions upstream and downstream of the R. rhodochrous ATCC BAA-870 CRISPR (BLAST queries of 270 and 718 nucleotides, respectively) showed significant, but not identical, alignments with several other Rhodococcus strains. The upstream region showed a maximum of 95% identity with that of R. rhodochrous strains EP4 and NCTC10210, while the downstream region showed 97% identity with R. pyridinovorans strains GF3 and SB3094, R. biphenylivorans strain TG9, and Rhodococcus sp. P52 and 2G. Analysis with the PHAST phage search tool [143] identified 6 potential, but incomplete, prophage regions on the chromosome and one prophage region on the plasmid, suggesting that the CRISPR in R. rhodochrous ATCC BAA-870 could also have been acquired through bacteriophage infection during its evolutionary history.
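The identity percentages above come from BLAST alignments, where identity is reported as the number of identical alignment columns divided by the alignment length, with gap columns counted in the length. The sketch below computes that quantity for two already-aligned sequences; the function name and example alignment are hypothetical, and producing the alignment itself (the hard part) is left to BLAST or another aligner.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical, non-gap residues.

    Both strings must come from the same pairwise alignment ('-' = gap),
    so they have equal length; gap columns count towards the length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# 8 identical columns out of 10 (one gap column, one mismatch).
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Because identity is measured only over the aligned region, a high value for a partial alignment says little about any unaligned remainder of the 270- and 718-nucleotide queries, which is why query coverage is usually reported alongside identity.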
Identification of target genes for future biotechnology applications
An estimated 150 biocatalytic processes are currently applied in industry [144-146]. The generally large and complex genomes of Rhodococcus species encode a wide range of genes attributed to extensive secondary metabolic pathways, which are presumably responsible for an array of biotransformations and bioremediation capabilities. These secondary metabolic pathways have yet to be characterised and offer numerous targets for drug design as well as synthetic chemistry applications, especially since enzymes in secondary pathways are usually more promiscuous than those in primary pathways.
A number of genes that could potentially be used for further biocatalytic applications have been identified in the genome of R. rhodochrous ATCC BAA-870. A substantial fraction of the genes have unknown functions, and these could be important reservoirs for novel gene and protein discovery. Most of the biocatalytically useful enzyme classes suggested by Pollard and Woodley [147] are encoded in the genome: proteases, lipases, esterases, reductases, nitrilases/cyanohydrolases/nitrile hydratases and amidases, transaminases, epoxide hydrolases, monooxygenases and cytochrome P450s. Only oxynitrilases (hydroxynitrile lyases) and halohydrin dehalogenases were not detected, although a haloacid dehalogenase is present. Rhodococci are robust industrial biocatalysts, and the metabolic abilities of the genus will continue to attract industrial attention as further biodegradative [6] and biopharmaceutical [148] applications are identified. Preventative and remediative biotechnologies will become increasingly popular as demand grows for alternative means of curbing pollution and as new antimicrobial compounds and pharmaceuticals become a priority.