Population genomes (PGs) from the Amazon River
Our dataset contained 106 Amazon River metagenomes from 30 different stations divided into 5 sections: upstream, downstream, estuary, plume and ocean (Fig. 1a). We generated 30 co-assemblies (Supplementary Table S1) and, after binning, selected 54 high-quality PGs featuring ≤500 contigs with half of them having >10 Kbp, completeness >50%, contamination <10%, and quality ≥50% (quality = completeness − 5 × contamination) (Supplementary Fig. S1). PGs featuring >99.5% Average Amino-Acid Identity (AAI) were considered redundant and removed (Supplementary Table S3). A total of 51 non-redundant PGs were kept in our dataset, including 49% (25) high-quality and 51% (26) medium-quality PGs (Table 1 and Supplementary Table S4) according to previously used criteria. PGs ranged in size between 0.5 and 7.9 Mbp; the maximum contamination was 6.8%, and ~53% of the PGs displayed a contamination <1% (Supplementary Table S4). In total, we recovered 25 PGs from the upstream section, 9 from downstream, 6 from the estuary, 9 from the plume and 2 from the ocean (Fig. 1a-b). Proteobacteria (39% of PGs, considering the Alpha-, Beta- and Gamma- subdivisions as well as PGs classified only as Proteobacteria), Bacteroidetes (15.7%) and non-classified bacteria (15.7%) predominated (Fig. 1b). In contrast, Cyanobacteria represented only 4% of the recovered PGs (Fig. 1b). Only two archaeal genomes were retrieved: one belonging to Thaumarchaeota (from the upstream section) and another to Euryarchaeota (from the plume). PG distribution and abundance along the river were heterogeneous (Fig. 2).
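The selection thresholds above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Bin` record, its field names and the toy values are hypothetical, and the field `frac_contigs_gt_10kbp` encodes one possible reading of "half of them having >10 Kbp" (at least half of the contigs longer than 10 Kbp).

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical summary record for one genome bin."""
    name: str
    completeness: float           # percent
    contamination: float          # percent
    n_contigs: int
    frac_contigs_gt_10kbp: float  # assumed reading of the contig-length criterion


def quality_score(completeness: float, contamination: float) -> float:
    """Quality as defined in the text: completeness - 5 x contamination."""
    return completeness - 5.0 * contamination


def passes_selection(b: Bin) -> bool:
    """Apply the selection thresholds stated in the text to one bin."""
    return (
        b.n_contigs <= 500
        and b.frac_contigs_gt_10kbp >= 0.5
        and b.completeness > 50.0
        and b.contamination < 10.0
        and quality_score(b.completeness, b.contamination) >= 50.0
    )
```

For instance, a toy bin with 60% completeness and 3% contamination satisfies both individual thresholds but has a quality of 45 and would therefore be rejected by the combined criterion.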
We identified 10 PGs that had a high similarity (>97% ANI) and non-significant (p > 0.05) differences from other known genomes (Supplementary Table S5): Richelia intracellularis-A with AM_2804, Trueperella pyogenes with AM_0546, Acinetobacter junii with AM_0608, Methylopumilus sp1 with AM_0507 / AM_0219, Coccinistipes sp. with AM_2208, Sphingobium sp2 with AM_1603, the taxonomically unannotated UBA11236 with AM_1606, Xanthomonas fuscans with AM_0519 as well as a few characterized species from Rokubacteria, GWA2-73-35 sp1 with AM_2207. Other PGs from the TARA-Oceans expedition or freshwater environments [24, 30, 31] did not display high similarity to the Amazon PGs. Thus, ~80% of the Amazon PGs had no close genomic relative in databases or published datasets. The lowest taxonomic rank assigned to these PGs was: Kingdom, 14% of PGs; Phylum, 14%; Class, 8%; Order, 16%; Family, 26%; and Genus, 4%. The remaining 18% of PGs could not be taxonomically assigned.
Cellulose and lignin oxidation
Terrestrial organic matter (TeOM) degradation is a fundamental process in the Amazon River and happens in two steps that are modulated by microbes: first, lignin oxidation mediated by laccases, and second, cellulose degradation mediated by specific glycosyl hydrolase (GH) families. Only 24 of the 51 recovered PGs appeared able to degrade TeOM (Fig. 3). Laccases were present in all taxa except Bacteroidetes. All PGs having laccases also had GHs, suggesting that the systems of hemi-/cellulose degradation and lignin oxidation are coupled. Furthermore, a few cellulolytic PGs (~20%) did not have lignin oxidation potential, pointing to two assemblages: one that is both cellulolytic and lignolytic, and another that performs only cellulose degradation. Overall, the PGs with the highest potential for TeOM degradation were AM_0519 (Xanthomonas fuscans), AM_0876 / AM_0936 (both unclassified bacteria), and AM_1603 (Sphingobium sp2), according to our criterion of having a minimum of two protein families related to TeOM degradation, with at least two different genes.
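The criterion for "highest potential" can be sketched as follows. This is an illustration under a stated assumption, not the authors' implementation: the criterion is read here as requiring at least two TeOM-related protein families, each represented by at least two distinct genes. The function name, the input layout and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def high_teom_potential(hits, min_families=2, min_genes=2):
    """Return True if a PG meets the assumed reading of the criterion.

    hits: iterable of (protein_family, gene_id) pairs annotated in one PG.
    A family qualifies when it has at least `min_genes` distinct genes;
    the PG qualifies when at least `min_families` families do.
    """
    genes_by_family = defaultdict(set)
    for family, gene_id in hits:
        genes_by_family[family].add(gene_id)  # sets ignore duplicate hits
    qualifying = [f for f, genes in genes_by_family.items() if len(genes) >= min_genes]
    return len(qualifying) >= min_families
```

Under the alternative reading (two families and two genes in total), the per-family threshold would simply be dropped; the text does not settle which reading applies.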
Decoupling lignin-oxidation by-products from TeOM degradation
After lignin oxidation, small aromatic compounds are formed and need to be internalized into the cell via transmembrane transporters to complete lignin degradation. Among the PGs having transporters for lignin oxidation by-products (Supplementary Table S6) only two of them (AM_0630 and AM_0902) were also lignin oxidizers. Thus, the oxidation of lignin performed by lignolytic assemblages seems to be completed by cellulolytic microbes that degrade aromatic by-products.
PGs were also analysed for genes required to process aromatic compounds produced after lignin oxidation (Supplementary Table S7). Only two PGs (AM_0519 and AM_1603) seemed able to both degrade lignin-derived aromatic compounds and oxidize lignin. The other PGs potentially able to degrade mono-/di-aryls derived from lignin did not possess genes for cellulose degradation or lignin oxidation. Therefore, there is an apparent decoupling of functions related to the oxidation of cellulose, lignin and the processing of by-products of lignin oxidation. This suggests that different microbial assemblages specialize in each step of the TeOM degradation process (that is, lignin oxidation, degradation of by-products generated by lignin oxidation, and cellulose oxidation).
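The decoupling argument rests on how little the sets of PGs assigned to each function overlap. A minimal sketch of that comparison is given below; the function name, the role labels and the toy PG identifiers are hypothetical and do not reproduce the assignments in Supplementary Tables S6 and S7.

```python
from itertools import combinations


def role_overlap(roles):
    """Return the PGs shared by every pair of functional roles.

    roles: dict mapping a role name to the set of PG identifiers
    annotated with that role. Keys of the result are (role_a, role_b)
    tuples in alphabetical order; small or empty intersections are
    what one would expect under functional decoupling.
    """
    return {
        (a, b): roles[a] & roles[b]
        for a, b in combinations(sorted(roles), 2)
    }


# Toy example with made-up identifiers
toy_roles = {
    "lignin_oxidation": {"PG_a", "PG_b"},
    "cellulose_degradation": {"PG_a", "PG_c"},
    "byproduct_degradation": {"PG_d"},
}
toy_overlap = role_overlap(toy_roles)
```

In this toy case only the cellulose and lignin roles share a PG, while the by-product degraders overlap with neither.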
Alternative carbon sources and carbon storage
TeOM degradation involves the formation of glucose (from cellulose hydrolysis) and various aromatic compounds (from oxidation of lignin and its derivatives), all of which are viable carbon sources. Microorganisms tend to prefer specific carbon sources, like sugars, and in their absence, they metabolize other compounds, such as citrate, to obtain energy and structural carbon. Compounds that are metabolized only in the absence of the preferred carbon sources, such as glucose, are called alternative carbon sources. For an effective carbon flux in aquatic environments, the transporter systems present in microbes are crucial to ensure that alternative carbon sources, such as tricarboxylates and the mono- and di-aryls generated during lignin oxidation, can be used. In the Amazon River, there are two main carbon contributors: TeOM and less complex compounds, such as humic acids and tricarboxylates. Tricarboxylates, molecules containing three carboxyl functional groups (-COOH) such as citrate, are good examples of alternative carbon sources. Tripartite tricarboxylate transporters (TTT) use substrate-binding proteins to sequester their ligands from the extracellular milieu and import them into the cytoplasm (Fig. 4a).
Only seven PGs appeared to use tricarboxylates via the TTT system (Fig. 4b). The PGs containing the complete TTT system included Alphaproteobacteria (AM_0275) as well as Betaproteobacteria, mainly from the order Burkholderiales. One important characteristic of the TTT system is the specificity of each substrate-binding protein for a certain substrate (Fig. 4a). This promotes a high diversity of tctC genes, which were found to range from tens to hundreds across PGs (Fig. 4b). In contrast, <10 genes appeared to be needed for the membrane-attached parts (tctA and tctB) of this system (Fig. 4b). PGs containing a complete TTT system seem incapable of TeOM degradation, except for AM_0630, a Burkholderiales member containing laccase and GH8 genes. Interestingly, all PGs containing the TTT system (except AM_0630 and AM_0233) also had the biochemical machinery to process aromatic compounds derived from lignin oxidation.
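As an illustration of how a "complete TTT system" might be scored from per-PG gene counts, consider the sketch below. It assumes that completeness means at least one copy each of tctA, tctB and tctC; the function names, the input dictionary and the toy counts are hypothetical and are not taken from Fig. 4b.

```python
TTT_GENES = ("tctA", "tctB", "tctC")


def has_complete_ttt(gene_counts):
    """True if every TTT component has at least one annotated gene.

    gene_counts: dict mapping a gene name to its copy number in one PG.
    """
    return all(gene_counts.get(gene, 0) > 0 for gene in TTT_GENES)


def binding_to_membrane_ratio(gene_counts):
    """Ratio of substrate-binding (tctC) to membrane (tctA + tctB) genes.

    Returns None when no membrane component is annotated, so that the
    ratio is never computed by dividing by zero.
    """
    membrane = gene_counts.get("tctA", 0) + gene_counts.get("tctB", 0)
    if membrane == 0:
        return None
    return gene_counts.get("tctC", 0) / membrane
```

A ratio well above 1 would reflect the pattern described in the text, with many tctC variants served by few membrane components.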
Bacteria have developed impressive mechanisms to cope with adversity. Fluctuations in the water level, changes in nutrient concentrations and seasonality are common disturbances in the Amazon River. The production and intracellular accumulation of nutritive polymers, later used to prevent starvation during unfavourable conditions, represent an important trait in multiple microbes. Specific mechanisms, such as carbon storage, are also relevant to understanding the flux of carbon in ecosystems. One of the most important carbon storage systems is the polyhydroxy-butyrate (PHB) metabolism, performed by a few enzymes (Fig. 4c). PHB biosynthesis enzymes were searched for in the PGs to evaluate their potential to store carbon via this polymer (Fig. 4d). Almost all PGs displaying the complete PHB pathway (phaA-C) (Fig. 4d) also included the TTT system, except for AM_0528 and AM_1603, which were found to be TeOM degraders and did not have the TTT system. Yet, the largest number of genes related to the PHB pathway was found in the TeOM degrader PG AM_1603, a Sphingobium. The largest gene diversity was related to the initial steps of PHB biosynthesis (genes phaA and phaB), which are not crucial for PHB production as they perform non-specific transformations, but ensure monomer availability. In contrast, only a few gene variants encoded the last and crucial step of PHB formation, performed by the product of the phaC gene (Fig. 4d). The gene phaR, which encodes a transcriptional regulator also related to the accumulation of PHB, was present in 7 out of 13 PGs presumed to produce PHB (Fig. 4d). Only AM_1111 seemed to produce polymers other than PHB, the polyhydroxy-alkanoate/butyrate, as it contains the phaE gene that allows this species to produce alternative monomers (Fig. 4d).