Arsenic metabolism genes were pervasively presented in the paddy soils either with high (67.3–105 mg kg− 1) or low (2.5–10.8 mg kg− 1) As levels, and their relative DNA abundance was mostly comparable between these soils (Fig. 4). Microbial capacity of As metabolism has been shown to be an ancient trait developed in response to the pervasive presence of As in early Earth . Therefore, it is likely that the presence and relative abundances of As metabolism genes in the paddy soils are possibly a legacy effect of an early life characteristic, at least to some extent . Further, the As resistant mechanisms, including the most well studied detoxification As(V) reduction system (ArsC) followed by the efflux of reduced As(III) (ACR3 and ArsB) , and the recently identified organoarsenical detoxification systems (ArsM, ArsI, ArsH and ArsP) [10, 59] have been detected in various microbes . The respiratory reduction of As(V) or oxidation of As(III), which is thought to have been developed in the Archaean era [16, 61], have also been described in a broad diversity of microbes . Consistently, the prevalent As metabolism genes have been previously reported in various wetlands with low As levels, such as paddy soils and estuarine sediments (< 15 mg kg− 1) [21, 25, 26], corroborating the wide prevalence As metabolism genes in our paddy soils.
Notably, the As concentration of ~ 100 mg kg − 1 in high As paddy soils did not significantly correlated to a higher relative abundance of As metabolism genes in the resident As-associated microbial community. Previous study of As metabolism genes in mining impacted fields showed a significant (p < 0.05) linear correlation of arsC and aioA gene abundance with the increasing of As concentration from 34 to 821 mg kg − 1 . We studied the correlation of ars genes abundances with As concentrations using our own and the previously determined samples by Luo and colleagues and found, consistent with the previous study, a significant correlation (p < 0.05) between As metabolism gene abundance and high As concentration (especially in the 410–812 mg kg − 1 range; Supplementary Figure S14). These findings indicated that the 12 times fold higher As concentration in high (average of 92 mg kg − 1) vs low (average of 8 mg kg − 1) paddy soils of our study was not strong enough to significantly differentiate As genes abundances, and that higher difference in As level and/or high absolute As concentrations (e.g., > 410 mg kg − 1) is probably necessary in order to elicit more clear gene content and abundance differences. The possibilities that i) some of the As we measured in our high As soils is not bioavailable (and thus, not selecting for more As-metabolism genes), especially considering that the water soluble or exchangeable (bioavailable) As only accounts for about 0.4% – 24% of the total As in paddy soils , or that ii) our measurements are not representative of the average As level in the soils due to soil heterogeneity (e.g., As levels can vary substantially at the millimeter scale) and relative low number of samples analyzed, cannot be excluded at present. To better understand, however, the selection pressure exercised by different As levels, more samples and As gradients ranging from low (i.e. < 15 kg mg− 1) to higher than (at least) 400 kg mg− 1 should be included in future studies.
In any case, the increased gradient of As in the paddy soils did upregulated the ars (arsR, acr3, arsB, arsC, arsM, arsI, arsP and arsH) and aioA genes expression level. The co-occurrence patterns of arsR, acr3, and arsM genes revealed by our network analysis (Fig. 5B) further corroborated that the acr3, and arsM genes are usually organized in a single ars operon, which is nearly always under the control of an ArsR transcriptional repressor [12, 13]. Both the ArsR and AioA proteins are induced by As(III), which is the dominant As species in the paddy soils during the flooded period [18, 19], and most likely contributed to the significantly increased transcriptional activities of ars and aioA genes in soils with higher As levels (Fig. 4). The positively association of arsI with acr3 and arsM genes (Fig. 5B and 5C) could be attributed to that the increased transcriptional activity of arsI introduced more As(III) in the soil and consequently, induced the As(III) efflux and As(III) methylation processes, considering that arsI gene is responsible for the demethylation of MMAs(III) to As(III), and acr3 is responsible for As(III) efflux while arsM is responsible for As(III) methylation to MMAs(III) (Fig. 5A). Nevertheless, it cannot be excluded that these gene respond to the same external stimuli (e.g. higher overall As concentrations or the concentration of specific species of As) as opposed to interacting among each other. Related to this, the total As concentration did not significantly correlate with the transcription activity of any As metabolism gene except one arsM gene (arsM_SKSAsHig_gene_id_443484), indicating that the Spracc network analysis might have revealed several true biological interactions among these As genes. Further, the overall transcriptional activity of arrA gene, which is responsible for dissimilarity As(V) reduction  was not significantly different between high vs low As paddy soils (Fig. 4). Because ArrA is induced by As(V), it is possible that the dominant As species, i.e. As(III) in paddy soils [18, 19] did not significantly upregulated its transcriptional activity directly. To the best of our knowledge, there are no studies of the transcriptional activity of As metabolism genes in the environment, with the possible exception of a metaproteomic study of an As rich ecosystem, which detected As metabolism proteins, consistent with the findings reported here . Moreover, our study showed the increased transcriptional activity of As metabolism genes was due to several distinct clades (community-wide) in the high As paddy soils (Fig. 6 and Supplementary Figure S4-S12), and this result again corroborated the pervasive presence of As metabolism mechanisms in microbes of the paddy ecosystem.
It’s also noticeable that the relative abundance of aioA genes (DNA level) were about 3.5 times higher in high than low As paddy soils, although not statistically significant (p = 0.114). In contrast, the relative abundance of ars genes at the DNA level was comparable between high and low As levels (Fig. 4). Considering that the As respiratory As(III) oxidation catalyzed by Aio in microbes can couple As(III) oxidation to ATP production, while the Ars is only responsible for As detoxification [14, 60], our finding is consistent with the expectation that energy-production genes (e.g., aioA) will be at higher demand at both the DNA and transcriptional (RNA) levels with increased availability of the substrate, while for the detoxification metabolism genes (e.g., ars) transcriptional response might be adequate with increased substrate and the substrate might not be similarly toxic to all species in order to select for the corresponding genes.
In terms of whole microbial communities, geographic distances, rather than As levels, were found to shape microbial community differences among the paddy soil sites studied here (Fig. 7). The geographic patterns of soil microbial communities have been widely studied previously, and are mostly attributed to varied physicochemical and carbon sources in soil environments [65–67]. It is possible that the As levels in these paddy soils did not possess a strong-enough selective pressure on the whole microbial communities (discussed above). In contrast to As, the concentrations of TP (0.31–0.46 g kg− 1) and Fe (14.8–27.0 mg kg− 1), which were about 1.5–1.8 times different between high and low paddy soils, were significant in determining microbial community diversity among our samples (Fig. 7). Previous studies have shown that available organic soil P has an important influence on microbial community composition, shifting both the composition and function of soils microbes [68, 69]. Especially for rhizosphere-associated microbes such as the dominant Proteobacteria in paddy soils  their community structure can be altered by soil P availability [71, 72]. The Fe in paddy soils is likely to impact the microbially mediated redox processes , and its presence in the wetland environments has been demonstrated to correlate with microbial biogeographic patterns. It is also important to mention that the effect of Fe and TP on As microbes and genes may be interlinked due to the significant (p < 0.01) interaction of P and Fe since P is often the limiting nutrient in many terrestrial ecosystems due to strong sorption on Fe(III) hydroxides . A significant positive correlation between P and Fe in porewaters as well as soil matrix has been previously observed in flooded soils with rice growth . Moreover, their interaction could also affect As behavior in paddy soils  because iron oxyhydroxides could sequester As(V)  and reduce As bioavailability to microbes.
The dominant microbes in these paddy soils included Proteobacteria, Acidobacteria, Chloroflexi, Nitrospirae, Verrucomicrobia and Bacteroidetes, which have been all documented in paddy soils previously [23, 24]. It should be noted that this result was based on the classification of 16S rRNA gene fragments extracted from metagenomic datasets (Fig. 2). Because of the relatively low coverage as well as the high diversity of microbes in paddy soils, only five metagenomic assembled genomes (MAGs) of adequate quality were obtained from the high As paddy soils after population genome binning (completeness > 80% and contamination < 5%; Supplementary Table S7). These MAGs were assignable to the class of Deltaproteobacteria, Thermomicrobia, Clostridia, Acidobacteriaceae and Solibacteres based on MiGA . Only a couple of the As metabolism genes including arsR, acr3, arsC, arsM and arsP were detected in these MAGs, indicating that these MAGs were capable of As metabolism. However, the remaining As metabolism genes, i.e., arsH, arsI, arsB, aioA and arrA, were not detected in any of the MAGs, possibly due to the incompleteness of these MAGs, and also the relative long length of these genes compared to the other As metabolism genes, especially for AioA and ArrA (800 vs 100–400 amino acids, respectively), which might have prevented their robust assembly into long enough contigs for binning. To better understand, however, higher sequencing efforts should be applied in order to robustly assemble more MAGs, especially those representing the As-transforming taxa, and understand the distributions of As metabolism pathways in different taxa.