Genome-scale Co-occurrence and Co-transcription Networks Reveal Key Microbial Taxa in Mangrove Sediments


 BackgroundMangroves are highly productive ecosystems, with one of the highest microbial diversities among all ecosystems. The concerted activity of microbial community in mangrove sediment mediates element cycling, but the underpinning mechanism of microbial synergy remains unknown. ResultsHere, we reconstructed 671 strain-resolved metagenome-assembled genomes (MAGs) from three mangrove and two mudflat sediments in Mai Po Nature Reserve. We then inferred the genome-scale co-occurrence and co-transcription networks based on metabolic capacity and transcriptional activity of the carbon, nitrogen, and sulfur cycles. We observed that the centrality was significantly higher in co-transcription networks than in co-occurrence networks, indicating that MAGs had stronger interrelationships when transcriptionally active. Further, we classified 57 microbes with low relative abundance (0.01–0.79%) as keystone taxa, which play key roles in the maintenance of co-transcription network structure, and participate in carbon transformations, denitrification, and sulfate reduction processes. One of the keystone taxa is a newly proposed deltaproteobacterial order, Candidatus Mangrovidesulfobacterales, capable of dissimilatory sulfate reduction and an anaerobic mixotrophic lifestyle. These findings highlight the ecological importance of rare species. ConclusionsCollectively, this first screening of the potential keystone taxa in mangrove ecosystem based on genome-scale transcriptomic analysis revealed unique microbial functional assemblages, shedding light on microbial synergism in this ecosystem.

function [15], providing systematic insight into the microbial community structure and assembly patterns [16,17]. Computational removal of a keystone taxon from a network of co-occurrence of community members causes a dramatic shift in the composition of human and plant microbiome [18,19], highlighting its importance to the ecosystem function. While keystone taxa have been characterized in diverse terrestrial, aquatic, and human microbiomes [14,20,21], they have not yet been identi ed in mangroves.
The keystone taxa can be identi ed by OTU-based network analysis [14]; however, the resolution of the current OTU-based microbial co-occurrence network analysis is limited [22]. Consequently, it is not possible to distinguish strains with nearly identical 16S rRNA sequences but different functions in an ecosystem [22]. Genome-scale resolution of the metabolic potential of microorganisms in an ecosystem, not requiring laboratory isolation, could help to understand their relationships better and establish accurate strain-function linkages [23][24][25]. Further, the integration of a genome-scale network analysis based on metagenomics with metatranscriptomics could yield detailed insights into microbial synergism. To the best of our knowledge, no comprehensive reports on genome-resolved metatranscriptomics for microbial community in mangrove wetlands have been published to date, and the transcriptionally active interrelationships between different microbial functional groups remain elusive.
To address the above gaps, we performed in-depth shotgun DNA and RNA sequencing of ve sediment samples from two sites of mangrove wetlands sites (vegetated mangrove and unvegetated mud at) in the Mai Po Nature Reserve (Hong Kong SAR, China). We then used genome-scale metagenomics and metatranscriptomics analyses to recover high-quality genomes, reconstruct metabolic activities of major microbes in the mangrove sediments, and establish genome-scale co-occurrence and co-transcription networks to capture the keystone taxa. The presented investigation probes the microbial synergy mechanism from a genome-scale community-level view, and supports the notion that inter-microorganism interactions are vital modulators of the biogeochemical cycles of C, nitrogen (N), and sulfur (S) in mangrove ecosystems.

Results And Discussion
Reconstruction of metagenome-assembled genomes (MAGs) and key active metabolic modules in the mangrove sediments.To enable a genome-scale cooccurrence and co-transcription network analysis, we sampled the mangrove and intertidal mud at sediments in Mai Po Nature Reserve (Hong Kong) at depth intervals of 0-2 cm, 10-15 cm, and 20-25 cm for the former (metagenomes MP7, MP8, and MP9), and at 0-5 cm and 13-16 cm for the latter (metagenomes MP10 and MP11). We reconstructed 671 curated, non-redundant MAGs, 67 archaeal and 604 bacterial MAGs, representing 7 archaeal and 43 bacterial phylum-level lineages from these ve metagenomes. Overall, we mapped 0.24 billion reads (14.77% of the ve metagenomes) to the MAGs; 338MAGs were estimated to be ≥ 70% complete, 303 were estimated to be ≥ 80% complete, and 119were estimated to be ≥ 90% complete, with minimal domain-speci c single-copy marker gene duplications (SupplementaryTable 1).We also assembled the reads mapped to the SILVA132 database and calculated the abundance of these assembled 16S rRNA gene sequences. We compared the normalized genome abundance (genome copies per million reads) with normalized abundances of the assembled 16S rRNA gene sequences at the phylum level. The R 2 (linear regression) for the correlation in each sample was higher than 0.84 (SupplementaryFig. 1), showing that the two parameters were highly correlated. This indicated that the reconstructed MAGs could represent the microbial community in these samples.
The majority of bacterial MAGs belonged to the lineages Gammaproteobacteria (n = 97), Chloro exota (n = 78), Desulfobacterota (n = 77), Acidobacteriota (n = 63), and Bacteroidota (n = 55). The archaeal phyla of Asgard archaea (n = 18), Crenarchaeota (n = 15), Nanoarchaeota (n = 13), Euryarchaeota (n = 12), and Micrarchaeota (n = 7) were also identi ed. We pro led all Prodigal-predicted gene sequences from the MAGs using various databases (KEGG, TIGRfam, Pfam, and custom HMM databases) to determine their genome-scale metabolic capacities and transcript activities associated with C, N, and S metabolic pathways ( Fig. 1, Supplementary Fig. 2). We considered the genome-resolved phylum/class level to establish accurate linkages between the microbial lineages and their biogeochemical activities ( Supplementary Fig. 3). Gammaproteobacteria and Desulfobacterota were the two dominant lineages with the most transcripts in mangrove and mud at sediments. Gammaproteobacteria were substantially active in various cycles of C, N (e.g. denitri cation), and S (e.g. sulfate reduction) in the surface sediment. They were also prominently active in C xation, denitri cation, and methane oxidation in the subsurface layer. Desulfobacterota were mainly responsible for sul te reduction to sul de, both in the surface and subsurface sediments. Similarly, a recent functional gene-based metagenomics study revealed Gammaproteobacteria and Desulfobacterota as predominant microbial groups, implicated in the coupling of C, N, and S cycles [26].
Genome-scale co-occurrence and co-transcription networks of the C, N, and S metabolic modules in the mangrove sediments.We inferred genome-scale cotranscription networks based on signi cant correlations (Spearman's rank correlation, P < 0.001 and R 2 > 0.8) between the abundances of C, N, and S functional transcripts in pairwise MAGs. We then compared them with genome-scale co-occurrence networks inthe ve sediment samples (Fig. 2a, 2b). The undirected networks captured 32,305 co-occurrence and 23,888 co-transcription associations among 664 microbial MAGs (Fig. 2, Table 1). We then merged and collapsed the networks at phylum/class level to determine whether these microbial lineages were interconnected ( Supplementary Fig. 4). Consequently, 55.02% of network edges were identi ed as two microbial groups with signi cant associations of both metabolic capacity and transcriptional activity, while 31.39% and 13.59% of edges indicated paired microorganisms with signi cant connections of either type, accordingly. This indicated that more than half of the microbial lineages in the sampled ecosystem are metabolically and transcriptionally interconnected.
All generated co-occurrence networks exhibited scale-free characteristics, as indicated by the R 2 of power-law distribution of degree centrality, which ranged from 0.67 to 0.75 (Table 1); by contrast, the co-transcription networks were not scale-free (R 2 of power-law from 0.06 to 0.26, Table 1). These metrics indicated that the co-occurrence networks were nonrandomly structured, implying that the C, N, and S metabolic capacities of the microbiome responded differentially to the changing environment [21]. However, all the co-transcription network structures appeared to be random. It is reasonable to expect that the co-transcription network is a snapshot of microbiome activity, illustrating a response to changes in the environment. The shape of the degree distribution may affect the stability of architecture among different types of networks (e.g. biological, physical, and social networks) [27,28]; however, the universality of the scale-free network remains controversial [29,30]. Implementation of this analytical framework to time-series data could provide additional insight for understanding the unique characteristics, origins, evolutionary mechanisms, and dynamics of genome-scale co-occurrence and co-transcription networks.
We then examined whether the co-occurrence and co-transcription networks had unique topological features at MAG level. We focused on the betweenness centrality, which determines the number of shortest paths that pass through a given MAG, as a proxy for the location of the MAG in relation to others [28,31,32].
High betweenness centrality values indicate a core location of the MAG in the network, whereas low betweenness centrality values indicate a peripheral location. We observed signi cantly higher betweenness centrality values in the co-transcription networks than in the co-occurrence networks (P = 4.7e-5, Wilcox rank-sum test, Fig. 3, Supplementary Fig. 5a). The degree centrality value, an additional topological feature that takes into account only the immediate MAG neighbourhood, exhibited a similar signi cant tendency (Fig. 3, Supplementary Fig. 5b). Partitioning the MAGs using different grouping strategies did not result in signi cant differences in the betweenness centrality values of the co-occurrence networks in mangrove and mud at sediments (P = 0.3, Wilcox ranksum test, Supplementary Fig. 5c, 5d); however, the betweenness centrality value of the co-transcription network in mangrove sediments was signi cantly higher than that in mud at (P = 0.0087, Wilcox rank-sum test, Supplementary Fig. 5e). Within mangrove sediments, the betweenness centrality value of cotranscription networks was signi cantly higher in the subsurface layer than in surface sediments (P = 1.1e-13, Wilcox rank-sum test, Supplementary Fig. 5i), with no signi cant difference observed in the co-occurrence networks (P = 0.93, Wilcox rank-sum test, Supplementary Fig. 5g). On the other hand, there was no signi cant difference between surface and subsurface mud at sediments in terms of betweenness centrality values of the co-occurrence (P = 0.054, Wilcox rank-sum test, Supplementary Fig. 5k) and co-transcription networks (P = 0.036, Wilcox rank-sum test, Supplementary Fig. 5m). These ndings indicated that microorganisms in the subsurface mangrove sediments had stronger interrelationships and were more actively interconnected than those in the mangrove surface and mud at sediments.
Putative keystone taxa and their transcriptional abundances of C, N, and S metabolic modules in the mangrove sediments.To identify putative keystone taxa in the networks, we rst separated the co-occurrence and co-transcription networks into modules, i.e. groups of MAGs that are highly connected within a group, with few connections outside the group [33], and are usually considered functional units in a biological system [34]. The ve co-occurrence networks and ve co-transcription networks inspected were modular, with a signi cantly higher modularity (M) than those of the corresponding randomly rewired networks (Table 1). We then classi ed MAGs into four categories based on their within-module connectivity (Zi) and among-module connectivity (Pi) values [28,31,32], as peripherals, connectors, module hubs, and network hubs ( Supplementary Fig. 6). We considered both hubs and connectors as the putative keystone taxa in both co-occurrence and co-transcription networks. Overall, 443 MAGs were peripherals in the networks (Fig. 2). Myxobacterium MP_1.1789 was the only identi ed network hub in the co-occurrence network of surface mud at sediment (MP10); it acted as a connector in the co-occurrence network of surface mangrove sediment (MP7). We identi ed more module hubs in the co-occurrence networks of subsurface sediments (12 ± 1) than in those of surface sediments (4 ± 1). Further, except for six MAGs ( ve Desulfobacterota and one Methylomirabilota) in a surface mangrove sediment (MP7), we did not identify any module hubs in the co-transcription networks. These six module hubs were likely anaerobic autotrophs, based on the highly transcribed pathways of C xation and sulfate reduction to sul de (Fig. 2c).
We identi ed 57 MAGs as active keystone taxa across the ve co-transcription networks, among which Desulfobacterota (n = 17), Chloro exota (n = 9), and Acidobacteriota (n = 7) were the dominant microbial phyla. To the best of our knowledge, to date, Desulfobacteria had not been identi ed as a keystone taxon, either by computational inference or based on empirical evidence [14]. Our nding may be explained by the uniqueness of the mangrove ecosystem, since no other studies have focused on the keystone taxa in this biome. By contrast, Chloro exota was reported as a keystone taxon in a groundwater hyporheic zone [35], whereas Acidobacteriota is commonly identi ed as a keystone taxon in various terrestrial biomes, e.g. woodlands [36,37], agricultural lands [37], grasslands [31,38], and plant-associated microbiota [39,40], but not in aquatic environments [14]. We also identi ed other bacterial groups, e.g. Burkholderiales and Rhizobiales, all reportedly prevalent in various terrestrial and aquatic ecosystems, as active keystone taxa in the analyzed samples (three Burkholderiales MAGs and one Rhizobiales MAG). Interestingly, we identi ed two Asgard archaeal MAGs (Thorarchaeotal MP_init.2226 and MP_5.633 in MP8) and one Euryarchaeotal MAG (Thermoplasmata, MP_5.1569 in MP9) as connectors in the co-transcription networks of subsurface mangrove sediments. The two Asgard archaea were transcriptionally active with respect to C xation and fermentation, and Thermoplasmata were active in C xation and sulfate reduction to sul te. This was in agreement with our earlier studies where we reinforced the importance of these archaeal lineages in mangrove sediments [7,9,41]. It also highlighted the ecological roles of archaeal groups in biogeochemical cycles, which are frequently underestimated, in comparison with bacteria, in metacommunity level investigations.
The relative genome abundances of the identi ed active keystone taxa ranged from less than 0.01% to 0.79% (Supplementary Table 2). Considering such low abundance, they thus could represent the 'rare biosphere' [42], which has a disproportionally high role in biogeochemical cycles [43]. The fundamental premise of keystone taxa is congruent with the rare taxa concept in that the abundance of a species is not the best determinant of its contribution to the community [42]. The roles of rare microorganisms in biogeochemical cycles have been investigated in the past [36]. As an example, simultaneous phylogenetic identi cation and quantitation of metabolic activities of single microbial cells in an oligotrophic lake revealed that Chromatium okenii, representing 0.3% of the total cell number, contributes more than 70% of the total uptake of C in the system [44]. Therefore, the relatively less abundant taxa can be as important as or more important than the abundant taxa in maintaining the microbial networks [32].
The analysis presented above adds another dimension to the commonly used gene-based co-occurrence network analysis and can be integrated with empirical evidence to identify microbial keystone taxa. Nevertheless, ecophysiological data, e.g. generated by culturomics [45] or microbiome-on-a-chip [46], are needed to verify the roles of the predicted keystone taxa in these ecosystems.
We identi ed the keystone MAGs that were most active in the co-transcription networks involved in sequential redox transformations, e.g. sulfate reduction to sul te and sul te reduction to sul de, in surface sediments (both mangrove and mud at sediments, Fig. 2c). On the other hand, in the subsurface sediments, keystone MAGs tended to cooperate by ful lling individual steps of redox transformations of sulfate reduction to sul de, although, based on the annotated gene sets, many MAGs were capable of performing sequential reactions. Meanwhile, no single keystone MAG was capable of performing the entire denitri cation processes. These observations indicate that microorganisms have developed vertically strati ed strategies of cooperation to ful l sequential redox processes, in agreement with the notion that a highly interconnected co-transcription network in the mangrove subsurface sediments follows a scenario of metabolic handoffs [23].
Candidatus Mangrovidesulfobacterales, a new order in the class Deltaproteobacteria.Among the putative keystone taxa in the co-transcription networks, one Desulfobacterota MAG (MP_5.128) formed a distinct clade with two peripheral MAGs (MP_5.1535 and MP_5.2156) within the class Deltaproteobacteria in a phylogenetic tree based on 122 concatenated bacterial marker genes (Fig. 4). The three MAGs branched with a recently proposed order Ca. Acidulodesulfobacterales (formerly Sva0485) [47] in the 16S rRNA gene phylogenetic tree. The averaged 16S rRNA gene sequence identity value between the two close groups was 85.19%, and the averaged nucleotide identity (OrthoANI) values between the three MAGs and four Ca. Acidulodesulfobacterales ranged from 59.64 to 61.27. We therefore propose that the three MAGs represent a new order in class Deltaproteobacteria and designate it as Ca.
The relative abundance of Ca. Mangrovidesulfobacterales was highest in subsurface mangrove and mud at sediments, where pH value exceeds 7.31 (Supplementary Table 3), ranging from 0.44% to 0.81%. It was lowest in surface mangrove sediments (pH = 6.6), at 0.09%. Therefore, Ca.
Mangrovidesulfobacterales are most likely anaerobes that do not preferentially dwell in the acidic environment, unlike the close relatives Ca. Acidulodesulfobacterales, which are mainly found in arti cial acid mine drainage [47]. We investigated the global distribution pattern of the proposed new order by searching NCBI SRA databases for 16S rRNA sequences that were 97% identical with that of Ca. Mangrovidesulfobacterales MP_5.128. The analysis revealed that Ca. Mangrovidesulfobacterales are widely distributed in various habitats, e.g. terrestrial, freshwater, coastal, estuary, and deep-sea sediment ecosystems ( Supplementary Fig. 8).
We also reconstructed the metabolic pathways of the three MAGs representing Ca. Mangrovidesulfobacterales. The analysis revealed the genetic potentials of C xation via the reverse tricarboxylic acid cycle (rTCA) for autotrophs, glycolysis pathway, and non-oxidative pentose phosphate pathway for heterotrophs, indicating that they could be mixotrophic deltaproteobacteria. The transcriptomic analysis revealed that in the surface mud at sediment, Ca.
Mangrovidesulfobacterales actively x C and subsequently produce acetate for ethanol fermentation. However, we did not detect transcripts for C xation and glycolysis pathways in Ca. Mangrovidesulfobacterales genomes from subsurface mangrove sediments (Fig. 5). We also identi ed a complete dissimilatory sulfate reduction pathway in Ca. Mangrovidesulfobacterales genomes, suggesting that they are potential sulfate-reducing bacteria. Nevertheless, the dsrAB genes were not transcribed, and hence, Ca. Mangrovidesulfobacterales are only able to transform sulfate to sul te (Fig. 5). This indicated that Ca. Mangrovidesulfobacterales might pass the end product, sul te, to other sulfate-reducing bacteria, e.g. Desulfobacterota MP_init.1551 and Krumholzibacteriota MP_5.1718, which were not transcriptionally active in the transformation of sulfate to sul te but were active in the sul te to sul de transformation. Collectively, these observations demonstrated the metabolic versatility and transcriptional activity of Ca. Mangrovidesulfobacterales in the mangrove and mud at sediments, revealing that these microorganisms are important participants in the biogeochemical cycles of C and S.

Conclusions
In the current study, in an effort to understand the mechanism of microbial synergy, we reconstructed hundreds of MAGs representing 50 distinct microbial lineages, from a typical coastal wetland ecosystem, and de ned their metabolic capacities in the biogeochemical cycles of C, N, and S. We built co-occurrence and co-transcription networks of key metabolic processes of C, N, and S cycles for signi cantly correlated MAGs. Subsequently, we demonstrated that genome-scale network analysis can be used to describe the interrelationships between the microorganisms, and capture putative keystone taxa that actively participate in the biogeochemical cycles and maintain the topological structures of networks. We propose that one such keystone taxon represents a new deltaproteobacterial order, Ca. Mangrovidesulfobacterales. It is widely distributed in various habitats, and is likely anaerobic, mixotrophic, and capable of dissimilatory sulfate reduction. As a limitation, the keystone taxa inferred from network topology have not been con rmed by classical experimental validation; however, the study is the rst initial screening of the potential keystone taxa in the enormously diverse and complex microbial community in the mangrove ecosystem. We are only beginning to ll a gap in knowledge on the unique microbial key functional assemblages of mangrove ecosystems. Understandings of their interrelationships will allow comprehensive documentation of microbial synergisms to underpin the roles of potentially underestimated microorganisms in biogeochemical cycles and mitigating climate change.  [7] and is stated here brie y. RNALater (Ambion, Life Technologies) was added immediately after collection and stored in a pre-cooled sampling box with ice cubes, then transported to the laboratory within < 4 h. We took two portions of 5 g of wet sediments stored at -20 ℃ for DNA and RNA extraction with the PowerSoil DNA Isolation Kit (MO BIO) and RNA Powersoil Total RNA Isolation Kit (QIAGEN), respectively, following the manufacturer's instructions. The ribosomal RNA genes were removed from the total RNA using the Ribo-Zero rRNA removal kit (Illumina, Inc., San Diego, CA, USA), and the remaining mRNA was reverse-transcribed. DNA and cDNA were sequenced using an Illumina HiSeq sequencer (Illumina) with 150-bp paired-end reads at BerryGenomics (Beijing, China). For each sample, the rest wet sediments were taken for physicochemical parameters measurement, including redox potential, pH, organic matter, and the concentrations of ammonium, nitrate, and nitrite in pore water via centrifugation from the sediments.
Metagenomic assembly and binning. We used "read_qc" module from metaWRAP (v.1.1) [48] to trim the raw shotgun sequencing metagenomic reads. All clean reads from ve samples were pooled prior to de novo assemble to obtain one co-assembly. A total of 3,265,768,294 paired-end reads (150bp) were sent out to MEGAHIT (v1.1.2) [49] with ag "--presets meta-large" for co-assembling job. Sequencing coverage was determined for each assembled scaffold by mapping reads from each sample to the co-assembly using Bowtie2 [50]. The binning analysis was carried out eight times by eight different combinations of speci city and sensitivity parameters using MetaBAT2 [51] ("--maxP 60 or 95" AND "--minS 60 or 95" AND "--maxEdges 200 or 500") on the assembly with a minimum length of 2000 bp. DAS Tool [52] was used as a dereplication and aggregation strategy on those eight binning results to construct accurate bins. Manual curation was adapted for reducing the genome contamination based on differential coverages, GC contents, and the presence of duplicate genes. The completeness, contamination, and strain heterogeneity of the genomes were estimated by using CheckM (v.1.0.12) [53] and DAS Tool under the taxonomic scope of domain (i.e., Bacteria and Archaea).
Gene calling and metabolic annotation. We performed open reading frames (ORFs) prediction on genomic scaffolds using the single genome mode of Prodigal [54]. The Pfam [55], TIGRfam [56], KOfam [57], and a published in-house HMM database [23] were used to assign all predicted ORFs protein functions by using hmmsearch (sequence e-value < 1e -5 and domain e-value < 1e -4 ) [58]. The identi ers for genes involving speci c metabolic pathways are listed in Supplementary Table 4. All the genes from 671 MAGs with such annotations are listed in Supplementary Table 5.
Metagenomic and metatranscriptomic reads were mapped against scaffolds from all MAGs using "quant_bins" module from metaWRAP, which uses Salmon [60] to estimate the abundance of each scaffold in each sample. The genome abundances are expressed as "genome copies per million reads" normalizing for differences in read counts between samples (Supplementary Table 2). The genomic and transcriptomic abundances of genes were expressed as "Transcript Per Million (TPM)" extracted from Salmon output. The abundances were averaged if a number of genes in one microorganism were annotated to multiple HMMs belonging to one metabolic module (Supplementary Table 6).
16S ribosomal RNA assembly. We used MATAM to reconstruct the full-length 16S rRNA sequences from reads and provide a taxonomic classi cation by RDP classi er against the SILVA132 database. The script "matam_compare_samples.py" was used to generate an abundance table of the assembled 16S rRNA sequences. The correlation coe cient (R 2 ) was calculated as a measure of linear association between the relative abundance of assembled 16S rRNA sequences and genome copies per million reads. The bin MP_5.128 as type material ofCa. Mangrovidesulfobacterium sediminis initially contained a partial 16S rRNA sequence with length of 532 bp, which had been extended by MATAM assembling the mapped reads to a 1272 bp length. The global alignment showed their similarity is 99.52% wit 2 bp mismatches.
Phylogenetic analyses. Phylogenetic analysis of the 671 MAGs was performed using a syntenic block of 15 single copy universal ribosomal proteins (L2, L3, L5, L6, L10, L11, L14b_L23e, L16_L10E, S2, S7, S10, S12_S23, S15P_S13e, S19). Each ribosomal protein was identi ed by using GraftM [61] and aligned using mafft-linsi [62] with default parameters. Individual ribosomal protein alignments were trimmed by BMGE (v.1.12) with "-m BLOSUM30 -g 0.2 -b 3" and concatenated before further analyses. In total, the concatenated alignment spanned 2,110 columns. Phylogenetic analysis of ribosomal protein was inferred by IQ-TREE [63] with the best-t model of "LG4X" followed by 1000 times ultra-fast bootstrap approximation, and 1000 replicated SH-like approximate likelihood ratio tests. The phylogenetic tree was rooted with archaea, and visualized with gtree v.1.4.4 (http://tree.bio.ed.ac.uk/software/ gtree/). Taxonomic assignments for bacterial and archaeal MAGs were given by running the genomes against GTDB-Tk [64] (GTDB r89) using FastANI [65]. The assignments were manually veri ed by phylogenetic trees of 120 bacterial/122 archaeal marker genes inferred by FastTree [66] with "-wag -gamma" ag on the alignments spanned 5040 columns and 5124 columns respectively. The 16S rRNA sequences were aligned using mafft-linsi and trimmed by trimAl [67] with "-gt 0.95 -cons 50", and associated phylogenetic analysis of Ca. Mangrovidesulfobacterales was inferred by IQ-TREE with best-t model of "TIM3e+I+G4" followed by 1000 times ultra-fast bootstrap approximation, and 1000 replicated SH-like approximate likelihood ratio tests. The concatenation of 120 bacterial marker gene alignments was used to infer a phylogenetic tree of these deltaproteobacterial genomes by IQ-TREE with the best-t model of "LG4X" followed by 1000 times ultra-fast bootstrap approximation, and 1000 replicated SH-like approximate likelihood ratio tests. The above two trees were rerooted by Nesterenkonia.
Network construction, module detection and identi cation of node roles. Networks were constructed for co-occurrence and co-transcription patterns of averaged TPM values for each of 22 metabolic modules of carbon, nitrogen and sulfur cycles for paired MAGs, yielding a total of ten networks. Random networks were generated using the Maslov-Sneppen procedure [68], which is to keep the numbers of nodes and edges unchanged but rewire all of the edges' positions in the networks, so the sizes of networks are the same and the random rewired networks are comparable to the original ones. A total of 100 permutations were generated for each empirical network. Network properties analysis and visualization was performed using {igraph} R packages [69]. We characterized network modularity for each network using seven cluster methods based on edge betweenness, fast greedy optimization, propagating labels, leading eigenvector, multi-level optimization, short random walks, and infomap algorithms from {igraph} R package, selecting the method with highest modularity value for nal clustering. Infomap algorithm was used for MP7 co-occurrence network and MP8 co-transcription network, whereas fast greedy optimization algorithm was used for MP9 co-occurrence and co-transcription networks. Multi-level optimization was used for all the rest networks. The connectivity of each node was determined based on its within-module connectivity (Zi) and among-module connectivity (Pi) [34], which were then used to classify the nodes based on the topological roles they play in the network (Supplementary Table 7 Figure 1 Phylogeny of bacterial and archaeal genomes inferred by maximum likelihood, and their metabolic modules and relative abundance across ve metagenomes. The phylogenetic tree is based on 15 concatenated ribosomal proteins and was collapsed at the phylum/class level according to GTDB taxonomy with numbers of the genome in the bracket. Only genomes from this study are shown on this tree. The metabolic modules are shown in rainbow colors, and circles/dot represent presence/absence, respectively. The relative abundance of the sum of genomes from a branch is shown in different sizes of black circles.  Betweenness (a) and degree (b) centralization associated with different co-occurrence and co-transcription networks. Asterisk (***) indicates signi cant difference P < 0.001 from Wilcox rank-sum test.