Overview. A total of 108 Binatota MAGs with >70% completeness and <10% contamination were used in this study, comprising 86 medium-quality and 22 high-quality genomes based on MIMAG standards 18. Binatota genomes clustered into seven orders, designated Bin18 (n=2), Binatales (n=48), HRBin30 (n=7), UBA1149 (n=9), UBA9968 (n=34), UBA12105 (n=1), and UTPRO1 (n=7), encompassing 12 families and 24 genera (Figure 1, Table S1). In SILVA (release 138) 19, 16S rRNA gene sequences extracted from genomes of orders Bin18 and UBA9968 were classified as members of class bacteriap25 in the phylum Myxococcota, those from orders Binatales and HRBin30 as the uncultured phylum RCP2-54, and those from orders UBA1149 and UTPRO1 as uncultured Desulfobacterota classes (Table S1). RDP II classification (July 2017 release, accessed July 2020) assigned all Binatota sequences to unclassified Deltaproteobacteria (Table S1).
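The genome selection above combines MIMAG quality tiers with a study-specific completeness cutoff. The sketch below shows one way such a filter could be expressed. It assumes the MIMAG thresholds (high quality: >90% completeness, <5% contamination, 23S, 16S, and 5S rRNA genes present, and at least 18 tRNAs; medium quality: ≥50% completeness and <10% contamination). The `Mag` record, its field names, and the example values are hypothetical and are not taken from Table S1.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical per-genome quality record (field names are illustrative)."""
    name: str
    completeness: float    # percent
    contamination: float   # percent
    has_rrna_genes: bool   # 23S, 16S and 5S rRNA genes all recovered
    n_trna: int            # number of distinct tRNAs detected


def mimag_tier(mag: Mag) -> str:
    """Assign a MIMAG-style quality tier: 'high', 'medium' or 'low'."""
    if (mag.completeness > 90 and mag.contamination < 5
            and mag.has_rrna_genes and mag.n_trna >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    return "low"  # simplification: everything else, including >=10% contamination


def passes_study_filter(mag: Mag) -> bool:
    """Study-specific cutoff: >70% completeness and <10% contamination."""
    return mag.completeness > 70 and mag.contamination < 10


mags = [
    Mag("mag_A", 95.2, 1.3, True, 20),   # meets all high-quality criteria
    Mag("mag_B", 93.0, 2.0, False, 19),  # rRNA genes missing -> medium
    Mag("mag_C", 65.0, 4.0, True, 18),   # medium tier, but below the 70% cutoff
]
kept = [m for m in mags if passes_study_filter(m)]
print([(m.name, mimag_tier(m)) for m in kept])
# prints [('mag_A', 'high'), ('mag_B', 'medium')]
```

Genomes that fail both tier definitions are returned as "low" here for simplicity, which lumps highly contaminated bins together with MIMAG low-quality drafts.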
Methylotrophy in the Binatota.
1. C1 substrate oxidation to formaldehyde.
Methanol: With the exception of HRBin30, all orders encoded at least one type of methanol dehydrogenase (Figure 2a). Three distinct types of methanol dehydrogenase were identified (Figure 2a, b). 1. The NAD(P)-binding MDO/MNO-type methanol dehydrogenase (mno), typically associated with Gram-positive methylotrophic bacteria (Actinobacteria and Bacillus methanolicus) 20, was the only type identified in orders UBA9968, UBA12105, and UTPRO1 (Figure 2a, Extended data 1), and was also present in some UBA1149 and Binatales genomes. 2. The MDH2-type methanol dehydrogenase, previously discovered in members of the Burkholderiales and Rhodocyclales 21, was encountered in the majority of order UBA1149 genomes and in two Binatales genomes. 3. The lanthanide-dependent pyrroloquinoline quinone (PQQ) methanol dehydrogenase of the XoxF type was encountered in nine genomes from orders Bin18 and Binatales, together with the accessory XoxG (c-type cytochrome) and XoxJ (periplasmic binding) proteins (Figure 2a). All nine of these genomes also encoded PQQ biosynthesis. Surprisingly, none of the genomes encoded the MxaF1-type (MDH1) methanol dehydrogenase typically encountered in model methylotrophs 22.
Methylamine: All Binatota orders except UBA9968 encoded methylamine degradation capacity. The direct periplasmic route (methylamine dehydrogenase; mau) was more common, with the mauA and mauB enzyme subunits encoded in orders Binatales, HRBin30, UBA1149, UBA12105, and UTPRO1 (Figure 2a, Extended data 1). Amicyanin (encoded by mauC) is the most probable electron acceptor for methylamine dehydrogenase 22 (Figure 2a). In contrast, one Bin18 genome and two Binatales genomes (the latter also encoding the mau cluster) encoded the full complement of genes for methylamine oxidation via the indirect glutamate pathway (Figure 2a, Extended data 1).
Methylated sulfur compounds: Binatota genomes encoded several enzymes involved in the degradation of dimethyl sulfone, methanesulfonic acid (MSA), and dimethyl sulfide (DMS). Nine genomes (two Bin18 and seven Binatales) encoded dimethyl sulfone monooxygenase (sfnG), which degrades dimethyl sulfone to MSA with the concomitant release of formaldehyde. Three of these nine genomes also encoded alkanesulfonate monooxygenase (ssuD), which would further degrade MSA to formaldehyde and sulfite. Degradation of DMS to formaldehyde and sulfide via DMS monooxygenase (dmoA) was encountered in 13 genomes (two Bin18, nine Binatales, and two UBA9968). Further, one Binatales genome encoded the dso system (EC: 1.14.13.245) for DMS oxidation to dimethyl sulfone, which could then be degraded to MSA as described above (Figure 2a, Extended data 1).
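The methylated sulfur routes above form a small reaction network, so whether a genome can channel a given substrate to formaldehyde depends on which steps it encodes. The sketch below encodes the four steps as described in the text and walks the network from a starting substrate. The data layout, function names, and example gene sets are illustrative assumptions, not the annotation pipeline used in this study.

```python
from __future__ import annotations

# Reaction steps as described in the text: (substrate, gene, products).
STEPS = [
    ("dimethyl sulfone", "sfnG", {"MSA", "formaldehyde"}),
    ("MSA", "ssuD", {"formaldehyde", "sulfite"}),
    ("DMS", "dmoA", {"formaldehyde", "sulfide"}),
    ("DMS", "dso", {"dimethyl sulfone"}),
]


def reachable(substrate: str, genes: set[str]) -> set[str]:
    """Return all compounds producible from `substrate` using only `genes`."""
    seen = {substrate}
    frontier = [substrate]
    while frontier:
        compound = frontier.pop()
        for source, gene, products in STEPS:
            if source == compound and gene in genes:
                for product in products - seen:
                    seen.add(product)
                    frontier.append(product)
    return seen


def yields_formaldehyde(substrate: str, genes: set[str]) -> bool:
    """True if the gene set can route `substrate` to formaldehyde."""
    return "formaldehyde" in reachable(substrate, genes)


# Hypothetical gene sets for illustration.
print(yields_formaldehyde("DMS", {"dso", "sfnG"}))  # True
print(yields_formaldehyde("DMS", {"dso"}))          # False
```

For example, a genome encoding only dso would oxidize DMS to dimethyl sulfone but, lacking sfnG, could not release formaldehyde from it.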
Dihalogenated methane: One Bin18 genome encoded the dichloromethane-specific dehalogenase/glutathione S-transferase (dcmA), capable of converting dichloromethane to formaldehyde.
Methane: Genes encoding particulate methane monooxygenase (pMMO) were identified in orders Bin18 (2/2 genomes) and Binatales (9/48 genomes) (Figure 2a, Extended data 1), while genes encoding soluble methane monooxygenase (sMMO) were not found. A single copy of all three pMMO subunits (A, B, and C) was encountered in 9 of the 11 genomes, while two copies were identified in the remaining two genomes. The pMMO subunit genes (A, B, and C) occurred as a contiguous unit in all genomes, with a CAB (5 genomes) and/or CAxB or CAxxB (8 genomes, where x is a hypothetical protein) organization, similar to the pMMO operon structure in methanotrophic Proteobacteria, Verrucomicrobia, and Candidatus Methylomirabilis (NC10) 23, 24, 25, 26 (Figure 2c). In addition, five of these eleven genomes also encoded a pmoD subunit, recently suggested to facilitate enzyme complex assembly and/or electron transfer to the enzyme's active site 27, 28. Phylogenetic analysis placed Binatota pmoA sequences in two distinct clades: the yet-uncultured Cluster 2 TUSC (Tropical Upland Soil Cluster) methanotrophs 29 (2 Binatales genomes), and a clade encompassing pmoA sequences from Actinobacteria (Nocardioides sp. strain CF8, Mycolicibacterium, and Rhodococcus) and SAR324 (Candidatus Lambdaproteobacteria) 30, 31 (Figure 2d). Previous studies have linked Cluster 2 TUSC pMMO-harboring organisms to methane oxidation based on their selective enrichment on methane in microcosms derived from Lake Washington sediments 32. All Binatota genomes encoding TUSC-affiliated pMMO also encoded genes for downstream methanol and formaldehyde oxidation as well as formaldehyde assimilation (see below), providing further evidence for their putative involvement in methane oxidation. On the other hand, studies on Nocardioides sp. strain CF8 demonstrated its capacity to oxidize short-chain (C2-C4) hydrocarbons, but not methane, via its pMMO, and its genome lacked methanol dehydrogenase homologues 33. Such data favor a putative short-chain hydrocarbon degradation function for organisms encoding this type of pMMO, although we note that five of the nine Binatota genomes encoding SAR324/Actinobacteria-affiliated pmoA sequences also encoded at least one methanol dehydrogenase homologue. Modeling the pMMO subunits from both TUSC-type and Actinobacteria/SAR324-type Binatota genomes on the Methylococcus capsulatus (Bath) 3D model (PDB ID: 3rbg) revealed a heterotrimeric (α3β3γ3) structure with the 7, 2, and 5 alpha helices of the PmoA, PmoB, and PmoC subunits, respectively, as well as the beta sheets characteristic of the PmoA and PmoB subunits (Figure 2e). Modeling also predicted binding pockets for the dinuclear Cu ions and Zn ligands (Figure 2e).
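The pMMO operon layouts described above (CAB, CAxB, CAxxB) can be tallied mechanically once each locus is reduced to a gene-order string. This sketch assumes a hypothetical encoding in which C, A, and B stand for pmoC, pmoA, and pmoB, x marks an intervening hypothetical protein, and loci are read in the coding direction. It is not the method used to produce Figure 2c.

```python
import re
from collections import Counter

# C, A, B = pmoC, pmoA, pmoB; x = an intervening hypothetical protein.
# Accept CAB with zero to two intervening x genes between pmoA and pmoB.
LAYOUT = re.compile(r"CAx{0,2}B")


def classify_layout(gene_order: str) -> str:
    """Return the layout string if it is CAB, CAxB or CAxxB, else 'other'."""
    return gene_order if LAYOUT.fullmatch(gene_order) else "other"


# Hypothetical loci; a genome carrying two pmoCAB copies contributes two entries.
loci = ["CAB", "CAxB", "CAxxB", "CAB", "CAxxxB"]
counts = Counter(classify_layout(locus) for locus in loci)
print(counts["CAB"], counts["CAxB"], counts["CAxxB"], counts["other"])  # 2 1 1 1
```

Using `fullmatch` keeps longer insertions (e.g. CAxxxB) or rearranged loci from being counted as canonical. Counting loci rather than genomes is also one way layout totals can exceed the number of pMMO-encoding genomes when some genomes carry two copies.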
2. Formaldehyde oxidation to CO2: Three different routes for formaldehyde oxidation to formate were identified (Figure 3). First, the Actinobacteria-specific thiol-dependent formaldehyde dehydrogenase (fadh/mscR) (EC: 1.1.1.306) was, surprisingly, detected in the majority (96 out of 108) of genomes (Figure 3a, Extended data 1). The enzyme requires a specific thiol (mycothiol 34), and the mycothiol biosynthesis genes (mshABC gene cluster) were also encoded in Binatota genomes (Figure 3a). Second, the tetrahydrofolate (H4F)-linked pathway, comprising folD (encoding the bifunctional methylene-H4F dehydrogenase and methenyl-H4F cyclohydrolase) and either ftfL (the reversible formyl-H4F ligase) or purU (the irreversible formyl-H4F hydrolase), was also widespread (98/108 genomes). Finally, 40 genomes (Bin18, Binatales, HRBin30, and UTPRO1) also encoded fdhA, a single-gene, NAD-linked, glutathione-independent formaldehyde dehydrogenase. Surprisingly, no evidence of the most common formaldehyde oxidation pathway, the tetrahydromethanopterin (H4MPT)-linked pathway, was detected in any of the Binatota genomes. The NAD- and glutathione-dependent formaldehyde oxidation pathway was incomplete: while homologues of formaldehyde dehydrogenase (frmA) were detected in almost all Binatota genomes, S-formylglutathione hydrolase (frmB) was absent. The resulting formate is then oxidized to CO2 by formate dehydrogenases; the majority of Binatota genomes (103/108) encoded at least one copy of the NAD-dependent formate dehydrogenase (EC: 1.17.1.9) (Figure 3a, Extended data 1).
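Whether a genome carries a complete formaldehyde oxidation route reduces to a presence/absence check in which some steps can be fulfilled by alternative genes (e.g. ftfL or purU). The sketch below encodes the four routes discussed above with a hypothetical data layout. Gene names follow the text, and the example gene set is made up for illustration.

```python
from __future__ import annotations

# Each route is a list of "any-of" groups; every group must be satisfied.
# Gene names follow the text; fadh and mscR are alternative names for one gene.
ROUTES = {
    "mycothiol-dependent (fadh/mscR)": [("fadh", "mscR"), ("mshA",), ("mshB",), ("mshC",)],
    "H4F-linked": [("folD",), ("ftfL", "purU")],
    "fdhA": [("fdhA",)],
    "glutathione-dependent (frmA/frmB)": [("frmA",), ("frmB",)],
}


def complete_routes(genes: set[str]) -> list[str]:
    """List the formaldehyde oxidation routes fully encoded by `genes`."""
    return [
        route
        for route, groups in ROUTES.items()
        if all(any(gene in genes for gene in group) for group in groups)
    ]


# Hypothetical genome: frmA without frmB leaves the glutathione route incomplete.
genome = {"mscR", "mshA", "mshB", "mshC", "folD", "purU", "frmA"}
print(complete_routes(genome))
```

Here the glutathione-dependent route is scored as incomplete because frmB is absent, mirroring the pattern observed in the Binatota genomes.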
3. Formaldehyde assimilation. Two pathways for formaldehyde assimilation by methylotrophs have been described: the serine cycle, which assimilates 2 formaldehyde molecules and 1 CO2 molecule, and the ribulose monophosphate (RuMP) cycle, which assimilates 3 formaldehyde molecules and no CO2. In addition, some methylotrophs assimilate carbon at the level of CO2 via the Calvin-Benson-Bassham (CBB) cycle 22. Homologues of the RuMP cycle-specific enzymes were missing from all Binatota genomes, and only three genomes belonging to the order Binatales encoded the CBB cycle enzymes phosphoribulokinase and ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). In contrast, genes encoding enzymes of the serine cycle (Figure 3b) were identified in all genomes (Figure 3c, Extended data 1), with the key enzymes that synthesize and cleave malyl-CoA (mtkA/B [EC 6.2.1.9] malate-CoA ligase and mcl [EC 4.1.3.24] malyl-CoA lyase, respectively) encountered in 98 and 86 Binatota genomes, respectively (Figure 3c, Extended data 1). CO2 enters the serine cycle at the phosphoenolpyruvate (PEP) carboxylase (ppc) step, which carboxylates PEP to oxaloacetate (Figure 3b). Homologues of ppc were missing from most Binatota genomes. Instead, all genomes encoded PEP carboxykinase (pckA), which can replace ppc function, as shown in methylotrophic mycobacteria 35 (Figure 3b-c, Extended data 1).
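The stoichiometries above determine how much of the assimilated carbon enters at the level of CO2. As a worked illustration, assuming three carbon atoms assimilated per C3 unit (2 formaldehyde + 1 CO2 for the serine cycle and 3 formaldehyde for the RuMP cycle, as stated above, plus the textbook 3 CO2 per triose phosphate for the CBB cycle), the CO2-derived fraction can be computed as follows. The dictionary layout is purely illustrative.

```python
from fractions import Fraction

# Carbon inputs per three carbon atoms assimilated.
# Serine and RuMP values are taken from the text; the CBB entry assumes the
# textbook 3 CO2 -> 1 triose phosphate stoichiometry.
INPUTS = {
    "serine cycle": {"formaldehyde": 2, "CO2": 1},
    "RuMP cycle": {"formaldehyde": 3, "CO2": 0},
    "CBB cycle": {"formaldehyde": 0, "CO2": 3},
}


def co2_fraction(pathway: str) -> Fraction:
    """Fraction of assimilated carbon that enters as CO2."""
    inputs = INPUTS[pathway]
    return Fraction(inputs["CO2"], inputs["formaldehyde"] + inputs["CO2"])


for pathway in INPUTS:
    print(f"{pathway}: {co2_fraction(pathway)}")
# serine cycle: 1/3
# RuMP cycle: 0
# CBB cycle: 1
```

Under these assumptions, one third of the carbon assimilated through the serine cycle derives from CO2, which is why the carboxylation step (ppc, or pckA in its place) is essential to that cycle.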
During the serine cycle, glyoxylate must be regenerated from acetyl-CoA to restore glycine and close the cycle. Glyoxylate regeneration can be achieved through either the classic glyoxylate shunt 36 or the ethylmalonyl-CoA pathway (EMCP) 37 (Figure 3b). All Binatota genomes exhibited the capacity for glyoxylate regeneration, but the pathway employed appears to be order-specific. Genes encoding all EMCP enzymes, including the two EMCP-specific enzymes ethylmalonyl-CoA mutase (ecm) and crotonyl-CoA reductase/carboxylase (ccr), were identified in genomes belonging to orders Bin18, Binatales, HRBin30, UBA1149, and UBA12105 (Figure 3c, Extended data 1). In contrast, order UBA9968 genomes lacked EMCP-specific enzymes but encoded the classic glyoxylate shunt enzymes isocitrate lyase (aceA) and malate synthase (aceB) (Figure 3c, Extended data 1).
Alkane degradation
Besides methylotrophy and methanotrophy, Binatota genomes exhibited extensive short-, medium-, and long-chain alkane degradation capabilities. In addition to the putative capacity of Actinobacteria/SAR324-affiliated pMMO to oxidize C1-C5 alkanes and C1-C4 alkenes as described above, some Binatota genomes encoded propane 2-monooxygenase (prmABC), an enzyme mediating propane hydroxylation at the 2-position to yield isopropanol. Several genomes also encoded medium-chain-specific alkane hydroxylases, e.g. homologues of the nonheme iron alkB 38 and Cyp153-class alkane hydroxylases 39. The genomes also encoded multiple long-chain-specific alkane monooxygenases, e.g. ladA homologues (EC:1.14.14.28) 40 (Figure 4a, Extended data 1). Finally, Binatota genomes encoded the capacity to metabolize medium-chain haloalkane substrates. All genomes encoded dhaA (haloalkane dehalogenase [EC:3.8.1.5]), known to have a broad substrate specificity for medium-chain-length (C3 to C10) mono- and dihaloalkanes, producing the corresponding primary alcohols and haloalcohols, respectively 41 (Figure 4a, Extended data 1).
Alcohol and aldehyde dehydrogenases sequentially oxidize the resulting alcohols, via the corresponding aldehydes, to fatty acids or fatty acyl-CoA. Binatota genomes encode a plethora of alcohol and aldehyde dehydrogenases. These include the broad-substrate-range alcohol (EC:1.1.1.1) and aldehyde (EC:1.2.1.3) dehydrogenases encoded by the majority of Binatota genomes; a bifunctional alcohol/aldehyde dehydrogenase (EC:1.2.1.10/1.1.1.1) encoded by seven genomes; and some highly specific enzymes, e.g. the short-chain isopropanol dehydrogenase (EC:1.1.1.80) for converting isopropanol and other secondary alcohols to the corresponding ketones (20 genomes), and acetone monooxygenase (acmA, EC:1.14.13.226) and methyl acetate hydrolase (acmB, EC:3.1.1.114), which sequentially convert acetone to methyl acetate and then to methanol and acetate (6 genomes) (Figure 4a, Extended data 1).
A complete fatty acid degradation machinery, enabling all orders of the Binatota to degrade short-, medium-, and long-chain fatty acids to acetyl-CoA and propionyl-CoA, was identified (Figure 4b, Extended data 1). Acetyl-CoA produced by the beta-oxidation pathway could be assimilated via the ethylmalonyl-CoA pathway (EMCP) or the glyoxylate shunt, as discussed above. Further, two pathways for the assimilation of propionyl-CoA, generated from the degradation of odd-chain fatty acids, were identified (Figure 4c). Orders Bin18, Binatales, UBA1149, UBA12105, and UTPRO1 all encode enzymes of the methylmalonyl-CoA (MMCoA) pathway, which carboxylates propionyl-CoA to methylmalonyl-CoA that is then converted to succinyl-CoA (a TCA cycle intermediate). In contrast, the majority of order UBA9968 genomes encode enzymes of the 2-methylcitrate cycle (prpBCD), in which propionyl-CoA is degraded to pyruvate and succinate via a 2-methylcitrate intermediate (Figures 4b-c, Extended data 1).
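The split between acetyl-CoA and propionyl-CoA follows directly from chain length. The sketch below assumes a saturated, unbranched fatty acid fully degraded by beta-oxidation (unsaturated or branched substrates need auxiliary enzymes not modeled here): even-chain substrates yield only acetyl-CoA, whereas odd-chain substrates yield one propionyl-CoA, which would then enter the MMCoA pathway or the 2-methylcitrate cycle depending on the order. The function name is hypothetical.

```python
from __future__ import annotations


def beta_oxidation_products(n_carbons: int) -> dict[str, int]:
    """Acetyl-CoA and propionyl-CoA from complete beta-oxidation of a
    saturated, unbranched fatty acid with `n_carbons` carbon atoms."""
    if n_carbons < 2:
        raise ValueError("a fatty acid needs at least two carbons")
    if n_carbons % 2 == 0:
        acetyl, propionyl = n_carbons // 2, 0
    else:
        # For odd chains, the last thiolysis (of a C5 acyl-CoA) releases
        # acetyl-CoA plus propionyl-CoA.
        acetyl, propionyl = (n_carbons - 3) // 2, 1
    # Each round ends in one thiolytic cleavage that adds one CoA product,
    # so the number of rounds is one less than the number of products.
    rounds = acetyl + propionyl - 1
    return {"acetyl-CoA": acetyl, "propionyl-CoA": propionyl, "rounds": rounds}


print(beta_oxidation_products(16))
print(beta_oxidation_products(17))
```

For palmitate (C16) this gives 8 acetyl-CoA over 7 rounds, whereas a C17 acid gives 7 acetyl-CoA and 1 propionyl-CoA over 7 rounds.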
Electron transport chain
All Binatota genomes encode an aerobic respiratory chain comprising complexes I, II, and IV, as well as an F-type H+-translocating ATP synthase (Figures 5a-b, Extended data 1). Interestingly, genes encoding complex III (cytochrome bc1 complex) were sparse in Binatota genomes: some orders lacked genes for all subunits (e.g. HRBin30), while others encoded only the Fe-S (ISP) and cytochrome b (cytB) subunits but not the cytochrome c1 (cyt1) subunit (e.g. Binatales, UBA1149). Instead, genes encoding an alternative complex III (ACIII, encoded by actABCDEFG) were identified in 76 genomes, with 12 genomes (in orders Bin18, UBA9968, and UTPRO1) encoding both complete complexes. Complex III and ACIII transfer electrons from reduced quinones (all genomes encode menaquinone biosynthesis) to cytochrome c, which in turn reduces cytochrome c oxidase (complex IV). Homologues of electron transfer proteins belonging to cytochrome c families were rare in Binatota genomes, especially those encoding ACIII (Figure 5a, Extended data 1). However, the recent structure of ACIII from Flavobacterium johnsoniae 42 in a supercomplex with cytochrome c oxidase aa3 suggests that electrons could flow from ACIII to complex IV without the need for cytochrome c, which might explain the paucity of cytochrome c homologues in ACIII-harboring genomes.
Based on the predicted ETC structure, the flow of electrons under different growth conditions in the Binatota can be envisaged (Figure 5b). During growth on methane, pMMO would be coupled to the electron transport chain at the level of complex III via the quinone pool, with reduced quinones acting as the physiological reductant of the enzyme 43 (Figure 5b). pMMO has also been reported to receive electrons donated by NADH 44. During methanol oxidation by periplasmic enzymes (e.g. xoxF-type methanol dehydrogenases) and methylamine oxidation by the periplasmic methylamine dehydrogenase (mauAB), electrons would be shuttled to complex IV via their respective electron carriers, the c-type cytochrome XoxG (xoxG) and amicyanin (mauC). In the cytosol, methanol oxidation by the mno/mdo-type or mdh2-type methanol dehydrogenases, as well as formaldehyde and formate oxidation by cytoplasmic formaldehyde and formate dehydrogenases, would contribute NADH to the aerobic respiratory chain through complex I. Similarly, during heterotrophic growth on alkanes and/or fatty acids, reducing equivalents in the form of NAD(P)H and FADH2 serve as electron donors for aerobic respiration through complexes I and II, respectively (Figure 5b).
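The electron-flow scheme above can be summarized as a lookup from electron donor to the point where it enters the chain. The sketch below is a simplified, hypothetical encoding of that scheme: "III" stands for either complex III or ACIII, the labels are ours, and pMMO is left out because it consumes quinol rather than donating electrons.

```python
from __future__ import annotations

# Respiratory complexes traversed downstream of each entry point.
# "III" stands for either the cytochrome bc1 complex or ACIII.
DOWNSTREAM = {
    "complex I": ["I", "III", "IV"],
    "complex II": ["II", "III", "IV"],
    "complex IV": ["IV"],
}

# Entry point per electron donor, simplified from the scheme in the text.
ENTRY = {
    "methanol (XoxF, periplasm)": "complex IV",      # via XoxG
    "methylamine (MauAB, periplasm)": "complex IV",  # via amicyanin (MauC)
    "methanol (Mno/Mdh2, cytosol)": "complex I",     # via NADH
    "formaldehyde": "complex I",                     # via NADH
    "formate": "complex I",                          # via NADH
    "NAD(P)H from alkanes/fatty acids": "complex I",
    "FADH2 from fatty acids": "complex II",
}

ORDER = ["I", "II", "III", "IV"]


def complexes_used(donors: list[str]) -> list[str]:
    """Return the respiratory complexes engaged by a set of electron donors."""
    used = set()
    for donor in donors:
        used.update(DOWNSTREAM[ENTRY[donor]])
    return sorted(used, key=ORDER.index)


print(complexes_used(["methanol (XoxF, periplasm)"]))         # ['IV']
print(complexes_used(["formate", "FADH2 from fatty acids"]))  # ['I', 'II', 'III', 'IV']
```

In this toy model, periplasmic methanol oxidation via XoxF engages only complex IV, whereas cytoplasmic formate oxidation routes electrons through complexes I, III, and IV.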
Binatota genomes also encode respiratory O2-tolerant H2-uptake [NiFe] hydrogenases belonging to groups 1c (6 sequences), 1f (22 sequences), 1i (1 sequence), and 1h (4 sequences) (Figure 5c). In E. coli, these membrane-bound, periplasmically oriented hydrogenases transfer electrons (through their cytochrome b subunit) from molecular hydrogen to the quinone pool, and cytochrome bd oxidase (complex IV) then completes this short respiratory electron transport chain between H2 and O2 45. In E. coli, the enzyme functions under anaerobic conditions 46 and may serve as an O2-protecting mechanism 47. Further, simultaneous oxidation of hydrogen (via group 1 respiratory O2-tolerant hydrogenases) and methane (via pMMO) has been shown to occur in methanotrophic Verrucomicrobia to maximize proton-motive force generation and subsequent ATP production 48. In addition, some of the reduced quinones generated through H2 oxidation are thought to provide reducing power for catalysis by pMMO 48 (Figure 5b).
Pigment production genes in the Binatota.
Carotenoids. Analysis of the Binatota genomes revealed a wide range of hydrocarbon (carotene) and oxygenated (xanthophyll) carotenoid biosynthesis capabilities. The carotenoid biosynthetic machinery in the Binatota included crtB for 15-cis-phytoene synthesis from geranylgeranyl-PP; crtI, crtP, crtQ, and crtH for neurosporene and all-trans lycopene formation from 15-cis-phytoene; crtY or crtL for γ- and β-carotene formation from all-trans lycopene; and a wide range of genes encoding enzymes for the conversion of neurosporene to spheroidene and 7,8-dihydro-β-carotene, as well as the conversion of all-trans lycopene to spirilloxanthin, γ-carotene to hydroxy-chlorobactene glucoside ester and hydroxy-γ-carotene glucoside ester, and β-carotene to isorenieratene and zeaxanthins (Figures 6a-b, Extended data 1). The gene distribution pattern (Figure 6a, Extended data 1) predicts that all Binatota orders are capable of neurosporene and all-trans lycopene biosynthesis, that all orders except HRBin30 are capable of isorenieratene, zeaxanthin, β-carotene, and dihydro-β-carotene biosynthesis, and that order UTPRO1 is specialized in spirilloxanthin, spheroidene, hydroxy-chlorobactene, and hydroxy-γ-carotene biosynthesis.
Bacteriochlorophylls. Surprisingly, homologues of multiple genes involved in bacteriochlorophyll biosynthesis were ubiquitous in Binatota genomes (Figure 7a-c). Bacteriochlorophyll biosynthesis starts with the formation of chlorophyllide a from protoporphyrin IX (Figure 7b). Within this pathway, genes encoding the first (bchI, Mg-chelatase [EC:6.6.1.1]), third (bchE, magnesium-protoporphyrin IX monomethyl ester cyclase [EC:1.21.98.3]), and fourth (bchLNB, 3,8-divinyl protochlorophyllide reductase [EC:1.3.7.7]) steps were identified in the Binatota genomes (Figures 7a, 7b, Extended data 1). However, homologues of genes encoding the second (bchM, magnesium-protoporphyrin O-methyltransferase [EC:2.1.1.11]) and fifth (bciA or bciB, 3,8-divinyl protochlorophyllide a 8-vinyl-reductase; or bchXYZ, chlorophyllide a reductase [EC:1.3.7.15]) steps were absent (Figure 7a-b). A similar patchy distribution was observed in the pathway for bacteriochlorophyll a (Bchl a) formation from chlorophyllide a (Figure 7b), where genes encoding bchXYZ (chlorophyllide a reductase [EC:1.3.7.15]) and bchF (chlorophyllide a 3¹-hydratase [EC:4.2.1.165]) were not identified, while genes encoding bchC (bacteriochlorophyllide a dehydrogenase [EC:1.1.1.396]), bchG (bacteriochlorophyll a synthase [EC:2.5.1.133]), and bchP (geranylgeranyl-bacteriochlorophyllide a reductase [EC:1.3.1.111]) were present in most genomes (Figure 7a, Extended data 1).
Finally, within the pathway for bacteriochlorophyll c (Bchl c) and d (Bchl d) formation from chlorophyllide a (Figure 7b), genes for bciC (chlorophyllide a hydrolase [EC:3.1.1.100]) and bchF (chlorophyllide a 3¹-hydratase [EC:4.2.1.165]) or bchV (3-vinyl bacteriochlorophyllide hydratase [EC:4.2.1.169]) were not identified, while genes for bchR (bacteriochlorophyllide d C-12(1)-methyltransferase [EC:2.1.1.331]), bchQ (bacteriochlorophyllide d C-8(2)-methyltransferase [EC:2.1.1.332]), bchU (bacteriochlorophyllide d C-20 methyltransferase [EC:2.1.1.333]), and bchK (bacteriochlorophyll c synthase [EC:2.5.1.-]) were identified (Figure 7b, Extended data 1).
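The patchy distributions described for the bacteriochlorophyll pathways amount to checking each step for at least one fully encoded gene set. The sketch below lists the chlorophyllide a and Bchl a steps named in the text (step order in the Bchl a branch is simplified) and reports the steps with no encoded alternative. The data structure and the example gene set, which mirrors the reported presence/absence pattern, are illustrative assumptions.

```python
from __future__ import annotations

# Each step: (name, alternatives); a step is satisfied if any alternative
# gene set is fully present. Steps and genes follow the text.
CHLIDE_A_STEPS = [
    ("1 Mg-chelatase", [{"bchI"}]),
    ("2 Mg-protoporphyrin O-methyltransferase", [{"bchM"}]),
    ("3 MgPME cyclase", [{"bchE"}]),
    ("4 divinyl protochlorophyllide reductase", [{"bchL", "bchN", "bchB"}]),
    ("5 8-vinyl reduction", [{"bciA"}, {"bciB"}, {"bchX", "bchY", "bchZ"}]),
]
BCHL_A_STEPS = [
    ("chlorophyllide a reductase", [{"bchX", "bchY", "bchZ"}]),
    ("chlorophyllide a hydratase", [{"bchF"}]),
    ("bacteriochlorophyllide a dehydrogenase", [{"bchC"}]),
    ("bacteriochlorophyll a synthase", [{"bchG"}]),
    ("geranylgeranyl reductase", [{"bchP"}]),
]


def missing_steps(steps, genes: set[str]) -> list[str]:
    """Names of steps for which no alternative gene set is fully encoded."""
    return [name for name, alternatives in steps
            if not any(alt <= genes for alt in alternatives)]


# Hypothetical gene set mirroring the presence/absence pattern in the text.
genome = {"bchI", "bchE", "bchL", "bchN", "bchB", "bchC", "bchG", "bchP"}
print(missing_steps(CHLIDE_A_STEPS, genome))
print(missing_steps(BCHL_A_STEPS, genome))
```

With this gene set, the function flags steps 2 and 5 of the chlorophyllide a branch and the reductase and hydratase steps of the Bchl a branch, matching the gaps described in the text.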
Ecological distribution of the Binatota.
A total of 1,889 (GenBank nt) and 1,213 (IMG/M) 16S rRNA gene sequences affiliated with the Binatota orders were identified (Extended data 2 and 3, Figures 8, S1a). Their environmental distribution showed a preference of the Binatota for terrestrial soil habitats (39.5-83.0% of GenBank and 31.7-91.6% of IMG/M 16S rRNA gene sequences across orders), as well as for plant-associated (particularly rhizosphere) environments, although this could partly reflect sampling bias toward these globally distributed and immensely important ecosystems (Figure 8a). In contrast, Binatota-affiliated sequences were scarce in marine settings, being absent or minimally present in the Binatales, HRBin30, UBA9968, and UTPRO1 datasets (Figure 8a). The majority of sequences of marine origin were sediment-associated, coming from hydrothermal vents, deep marine sediments, and coastal sediments; only the Bin18 sequences sampled from IMG/M showed representation in the vast, relatively well-sampled pelagic waters (Figure 8d).
In addition to phylum-wide patterns, order-specific environmental preferences were also observed. For example, in order Bin18, one of the two available genomes originated from the Mediterranean sponge Aplysina aerophoba. Analysis of the 16S rRNA dataset suggests a notable association between Bin18 and sponges, with a relatively high proportion of host-associated sequences (Figure 8a), the majority of which (58.3% NCBI-nt, 25.0% IMG/M) were recovered from the Porifera microbiome (Figures 8e, S1f). Bin18-affiliated 16S rRNA gene sequences were identified in a wide range of sponges from ten genera and five global habitat ranges (the Mediterranean genera Ircinia, Petrosia, Chondrosia, and Aplysina; the Caribbean genera Agelas, Xestospongia, and Aaptos; the Indo-West Pacific genus Theonella; the Pacific family Dysideidae; and the Great Barrier Reef genus Rhopaloeides), suggesting that Bin18 is widely distributed beyond a single sponge species. The vast majority of order Binatales sequences (83.0% NCBI-nt, 91.6% IMG/M) were of terrestrial origin (Figures 8a, S1c), with additional rhizosphere-associated samples (7.5% NCBI-nt and 2.8% IMG/M) (Figures 8a, S1f). Notably, a relatively large proportion of Binatales soil sequences originated from either wetlands (peats, bogs) or forest soils (Figures 8b, S1c), strongly suggesting a preference of the order Binatales for acidic and organic/methane-rich terrestrial habitats. This is consistent with the fact that 42 of the 48 Binatales genomes were recovered from soil, 38 of them from acidic wetland or forest soils (Figure 1, Table S1). Genomes of UBA9968 were recovered from a wide range of terrestrial and non-marine aquatic environments, and the observed 16S rRNA gene distribution reinforces their ubiquity in all but marine habitats (Figures 8a, S1b-g).
Finally, while genomes from orders HRBin30, UBA1149, and UTPRO1 were recovered from limited environmental settings (thermal springs for HRBin30; gaseous hydrocarbon-impacted habitats, e.g. marine hydrothermal vents and the gas-saturated Lake Kivu, for UBA1149; and soil and hydrothermal environments for UTPRO1) (Figure 1, Table S1), 16S rRNA gene analysis suggested their presence in a wide range of environments spanning every macro-scale environmental classification (Figures 8a, S1b-g).