Comparative analysis of bacterial community diversity between soil and water of Dongzhai Harbor Mangrove reserve using 16S rRNA gene sequencing and shotgun metagenomic sequencing

The mangrove ecosystem has rich biological resources and ranks fourth among the world's ecosystems in service value. The Dongzhai Harbor Mangroves (DHM) reserve is located in an intertidal zone where freshwater and seawater dynamically mix, and its unique habitat may harbor special microbial resources. In this study, we analyzed and compared bacterial community composition and diversity between DHM soil and water by 16S rRNA gene sequencing and shotgun metagenomic sequencing. We found that the dominant phyla in both the soil and the water of the DHM were Proteobacteria and Actinobacteria, while the most differentially abundant phyla were Chloroflexi and Bacteroidetes. However, shotgun metagenomic sequencing identified more highly abundant species in the water than in the soil, and identified more species with significant differences between the soil and the water (P ≤ 0.0001). Network analysis of the shotgun data also identified more species that co-occurred in the soil and water. Kyoto Encyclopedia of Genes and Genomes (KEGG) functional analysis identified three relatively abundant pathway categories: metabolism (accounting for more than 50%), genetic information processing, and environmental information processing. Our results increase our understanding of bacterial community diversity in the water and soil of the DHM. Additional information hidden in the environment may be obtained by shotgun metagenomic sequencing; this technique can be used to mine more microbial resources from various environments.


Introduction
Mangrove forests are special coastal ecosystems found in tropical and subtropical intertidal zones. They act as a transitional zone among land, sea, and rivers, and are the final terrestrial environment that land-based pollutants pass through before entering the sea. Mangroves have been shown to purify pollutants 1 and are periodically submerged by the tide under certain conditions. The properties of these environments determine the strength and characteristics of microbial ecological functions 2. High tide brings in salt water, and when the tide recedes, solar evaporation of the seawater further increases salinity.
In addition, mangrove trees shed a great deal of organic matter into the water, particularly fallen leaves covered with bacteria, protozoans, and fungi. As a result, the microbial community in mangrove waters is more pronounced than that in the nearshore coastal environment. Mangrove soils have an acidic pH, are high in organic matter and salt content, and are mainly anaerobic (although surface soils are aerobic) 3,4. The unique characteristics of this habitat have led to the development of a rich and unique microbial community 5.
Microbes play an important role in material circulation, energy flow, and ecosystem stability 6. In recent years, microorganisms, as important members of mangrove ecosystems, have received increasing attention 7,8. Microbial diversity is an important part of biodiversity, and understanding it is the basis of microbial research in mangrove wetlands and is of great value for developing microbial resources. However, mangrove microbial communities are still poorly understood, primarily because of the limitations of research methods 9.
The Dongzhai Harbor Mangroves (DHM) reserve, which is currently the largest coastal mudflat in China, is located in Yanfeng Town, Haikou City, China, in the northeast of Hainan Island (110°32'-110°37' E and 19°51'-20°1' N). There are three economically important tree species in the mangrove family, including Kandelia candel, Bruguiera gymnorrhiza, and Rhizophora stylosa 10. In recent years, numerous studies addressing the protection of plant and animal biodiversity in the DHM reserve have been published 11,12. To our knowledge, no study has comprehensively compared soil and water microbial diversity or explored microbial resources in the DHM. It is therefore necessary to uncover more of the information hidden in the microorganisms of the DHM. In this study, we combined 16S rRNA gene sequencing and shotgun metagenomic sequencing to comprehensively characterize microbial community composition and microbial diversity in the soil and water of the DHM.

Results
Sequencing statistics. The 16S rRNA gene sequencing results for the soil and water samples are shown in Table S1. Across all of the samples, the total number of sequences was 321,131; the total number of bases was 134,242,228 bp; and the average sequence length was 418.04 bp. The quality control results for the shotgun metagenomic sequences are shown in Table S2. The average numbers of raw reads in the soil and water samples were 99,317,796.6667 and 98,984,089.3333, respectively, while the average numbers of clean reads in the soil and water samples were 98,645,909 and 97,975,913, respectively. After quality control, an average of 99% of raw reads and 98% of raw bases were retained for both the soil samples and the water samples.
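The summary figures above can be cross-checked with simple arithmetic. The sketch below uses only the totals and averages quoted in the text (Tables S1 and S2 are not reproduced here), and the variable names are illustrative. The pooled mean length computed this way falls within a few hundredths of a base pair of the reported 418.04 bp; the small gap may reflect averaging over per-sample means, which is an assumption on our part.

```python
# Figures quoted in the text; variable names are illustrative only.
total_sequences = 321_131
total_bases = 134_242_228

# Pooled mean read length in bp (total bases divided by total sequences).
mean_length = total_bases / total_sequences

# Average raw and clean read counts per sample group, as reported.
raw_reads = {"soil": 99_317_796.6667, "water": 98_984_089.3333}
clean_reads = {"soil": 98_645_909, "water": 97_975_913}

# Fraction of raw reads that survive quality control in each group.
retained = {group: clean_reads[group] / raw_reads[group] for group in raw_reads}

print(f"pooled mean length: {mean_length:.2f} bp")
for group, fraction in retained.items():
    print(f"{group}: {fraction:.1%} of raw reads retained")
```

Both retained fractions round to the 99% stated in the text.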
The Shannon curve (Fig. S2) was one of the indices used to estimate the microbial diversity of each sample. The curves showed that our sequencing data were sufficient to reflect the vast majority of the bacterial diversity in the soil and water samples. The larger Shannon index for the soil samples indicated that bacterial diversity was higher in the soil than in the water of the DHM. The soil in the DHM wetland is a highly heterogeneous medium formed from marine alluvium, which is rich in organic matter. It provides a variety of suitable habitats and environmental conditions for microorganisms, supporting high soil microbial diversity and a variety of different microorganisms 13.
Species composition and difference analysis. As shown in the species community bar chart (Fig. 1a, b), Archaea and bacteria remained after the removal of Eukaryota and viruses. Most of the microorganisms detected in the soil and water samples from the DHM were bacteria (98.21% and 98.61%, respectively). Only a small proportion were Archaea (1.60% and 1.18%, respectively). The 16S rRNA gene analysis identified the dominant phyla in the DHM water samples as Proteobacteria (54.1%), Actinobacteria (13.0%), and Bacteroidetes (26.7%), whereas the dominant phyla in the DHM soil samples were Proteobacteria (47.9%), Actinobacteria (7.04%), and Chloroflexi (16.5%) (Fig. 1a). The shotgun metagenomic analysis identified the dominant phyla in the water samples as Proteobacteria (65.55%), Actinobacteria (11.99%), and Bacteroidetes (16.44%), and identified the dominant phyla in the soil samples as Proteobacteria (62.32%), Actinobacteria (9.31%), and Chloroflexi (7.65%) (Fig. 1b). Bacterial community composition and structure in soil and in water are affected by different environmental factors. As a result, the abundances of bacterial communities in the water differed from those in the soil, and the major reason for this discrepancy might be the difference in habitat.
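The Shannon index used to compare soil and water diversity can be computed directly from per-taxon read counts. The following is a minimal sketch, not the pipeline used in this study: it assumes the natural logarithm (the text does not state the base), and the two count vectors are hypothetical values chosen to contrast an even community with a skewed one.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical OTU read counts, not data from this study.
even_community = [25, 25, 25, 25]   # four equally abundant taxa
skewed_community = [97, 1, 1, 1]    # one taxon dominates

print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

For a fixed number of taxa the index is largest when reads are spread evenly, so a community dominated by a single taxon scores lower than an even one with the same richness.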
Species heatmap clustering is based on similarities in relative abundance among species and samples, and aggregates species with high abundance and low abundance into separate blocks 14. The bacterial community heatmaps of the 30 most abundant species based on 16S rRNA gene sequences and shotgun metagenomic sequences showed that certain species that were highly abundant in the water samples, such as Pseudarcicella, were moderately abundant or uncommon in the soil samples. However, other species had similar levels of relative abundance in both soil and water, such as the highly abundant unclassified Gemmatimonadetes (Fig. 1c,d). The phylum Bacteroidetes differed significantly between soil and water (P ≤ 0.0001; Fig. 2a), as shown in the bar chart of species differences based on 16S rRNA gene analysis. However, shotgun metagenomic analysis identified additional taxa that differed significantly between soil and water (P ≤ 0.0001), as shown in the bar plot of Welch's t-test (Fig. 2b), including Bacteroidetes, Firmicutes, Cyanobacteria, Planctomycetes, Acidobacteria, Spirochaetes, Thaumarchaeota, and unclassified bacteria. Because 16S rRNA gene sequences are amplified unequally across species, 16S-based results may be biased. Shotgun metagenomes, by contrast, cover a broad microbial community and generate a huge number of reads of various lengths, and therefore capture more significantly different species and more genetic information.
Network analysis. The co-occurrence network based on 16S rRNA gene analysis indicated that the phyla Chloroflexi, Acidobacteria, Planctomycetes, Proteobacteria, Gemmatimonadetes, Cyanobacteria, Ignavibacteriae, Bacteroidetes, Actinobacteria, Firmicutes, and Verrucomicrobia co-occurred in the soil and water samples (Fig. 3a). Shotgun metagenomic analysis identified additional taxa that co-occurred in the soil and water (Fig. 3b). The species correlation network of the top 35 phyla indicated that most phyla were correlated with others (Fig. 4), including Euryarchaeota, Firmicutes, Verrucomicrobia, Chlorobi, and Tenericutes. Based on the clustering coefficient, these taxa play an important role in the species correlation network. Shotgun metagenomic sequencing not only elucidates species structure and systematic community evolution, but also supports the genetic analysis of functional metabolic networks within the microbial community.
Functional diversity analysis. 16S rDNA functional predictions indicated that three main categories (metabolism, genetic information processing, and environmental information processing) had relatively high abundance in both the soil and the water (Fig. 5a). In the shotgun metagenomic functional analysis, we annotated six types of functional genes, which were likewise enriched in the three main categories of metabolism, environmental information processing, and genetic information processing (Fig. 5b). Of these, metabolism represented more than 50% of all functional classifications at KEGG pathway level 1. This suggests that the microbiome of the DHM displays a relatively high level of metabolic activity and genetic stability.

Discussion
Research on microbial diversity is of great value in several respects, including 1) the development of biological resources, 2) the accelerated discovery of new functional genes and bioactive substances, 3) the clarification of the relationship between microbial communities and habitat, and 4) the elucidation of the association between community structure and function 15. Previous studies have suggested that bacteria are the most dominant group of microorganisms in this environment 7. Bacterial communities play an important role in material transformation in mangrove ecosystems 16. A previous study found that Proteobacteria and Chloroflexi were ubiquitous and dominant in the DHM soil 17, and our results are consistent with this finding (Fig. 1a,b).
In this study, we compared bacterial community structure and diversity between the soil and water of the DHM using 16S rRNA gene sequencing and shotgun metagenomic sequencing. Together, the two approaches better reveal the species composition, the levels of species abundance, and the bacterial functions in the soil and water samples of the DHM. However, the two methods generated slightly different results in the analyses of microbial community composition and structure. Shotgun metagenomic sequencing identified more species that were similar, different, or associated between soil and water, and retrieved more hidden information than 16S rDNA sequencing. The shotgun metagenomic sequencing technique can thus be used to mine additional microbial resources from various environments.
The wetland ecosystem in the DHM is open, and the microorganisms in the soil and water are in constant contact and exchange. The soil and water samples shared certain microbial community structures. Among the bacteria, Proteobacteria was the dominant phylum, followed by Bacteroidetes, Chloroflexi, and Actinobacteria. Similar phylum-level dominance has been observed in Brazilian mangrove metagenomes and in the soil microbiome of a managed mangrove in Malaysia 18. It has also been reported that members of the Proteobacteria are the most abundant group of bacteria in soil and are known to harbor a diverse group of metabolic enzymes 19.
The co-occurrence and intercorrelation of species in the soil and water samples suggest that the habitats in the DHM wetland ecosystem are not completely closed or isolated from each other. Functional profiles derived from the 16S rRNA gene and shotgun metagenomic sequences revealed similar basic functional categories in the soil and water of the DHM. In particular, a high abundance of sequences was assigned to metabolic pathways, which is also commonly found in other mangrove metagenomes 19.
Our results provide a more comprehensive understanding of microbial community diversity in DHM water and soil. However, the relationship between mangrove microbial diversity and the maintenance of environmental stability, as well as the connection between mangrove microbial diversity and microbial functionality, remain to be clarified. Future studies are necessary to investigate how microbial diversity responds to human activities and environmental change. In particular, it is important to mine additional genetic resources using metagenomic sequencing.

Methods
Sample collection and DNA extraction. Soil and water samples were collected in the DHM and the Yanfeng River basin during ebbing tides in July 2018. We set up five sampling quadrats (A, B, C, D, and E) in the upper, middle, and lower reaches of the Yanfeng River. The sampling map and the latitude and longitude of each site are shown in Fig. S1. A composite sample was made by combining five subsamples from the same area within a site. The soil samples were collected from the topsoil layer (5-10 cm deep) after the leaves and grass covering the topsoil at each sampling point had been cleared. Four replicate soil samples were taken in each quadrat, giving twenty soil samples from the five quadrats. We fully mixed all of the samples, removed debris, and divided the mixed sample into three equal parts, which were then stored in sterile plastic bags.
The water samples were collected at the same time in each sampling quadrat. All of the water samples (20 L in total) were fully mixed and then divided into three equal subsamples. Microorganisms in each subsample were collected into sterile tubes by centrifugation. Total DNA was extracted by Zhou's In-situ Pyrolysis 20. The extracted DNA samples were quantified using a NanoDrop 2000 (Thermo Fisher Scientific, USA), and DNA quality was determined by 1% agarose gel electrophoresis.
16S rRNA gene and shotgun metagenomic sequencing. Extracted DNA was sent to Shanghai Majorbio Bio-pharm Technology Co., Ltd for 16S rRNA gene sequencing and shotgun metagenomic sequencing.
For shotgun sequencing, the extracted DNA was fragmented to an average size of about 300 bp using a Covaris M220 (Gene Company Limited, China), and paired-end libraries were constructed using TruSeq™ DNA Sample Prep Kits (Illumina, San Diego, CA, USA). Adapters containing the full complement of sequencing primer hybridization sites were ligated to the blunt ends of all fragments. The mixed PCR products of 16S rRNA genes and the paired-end sequencing library were sequenced on an Illumina HiSeq PE 4000 platform at Shanghai Majorbio Bio-pharm Technology Co., Ltd. (Shanghai, China) with paired-end sequencing technology, following the manufacturer's instructions (www.illumina.com).
Comparative analyses of the soil and water bacterial communities. The 16S rDNA and metagenomic data were analyzed using the Majorbio I-Sanger Cloud Platform (www.majorbio.com). Sequences with ≥97% similarity were clustered into operational taxonomic units (OTUs) using USEARCH (version 7.0; http://drive5.com/uparse/). To identify the species corresponding to each OTU, the Bayesian algorithm of the Ribosomal Database Project (RDP) classifier was used to carry out taxonomic analysis against the Silva database. Based on the species identification corresponding to each OTU cluster, all samples were flattened (subsampled) to the minimum number of sequences found in any sample and then analyzed further.
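Flattening samples to the minimum sequence count is commonly done by random subsampling without replacement (rarefaction). The sketch below illustrates that idea under stated assumptions: it is not the platform's implementation, the function names `rarefy` and `rarefy_to_minimum` are ours, and the OTU tables are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the table into one label per read, then subsample the labels.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def rarefy_to_minimum(samples, seed=0):
    """Rarefy every sample to the read depth of the shallowest sample."""
    depth = min(sum(counts.values()) for counts in samples.values())
    return {name: rarefy(counts, depth, seed) for name, counts in samples.items()}


# Hypothetical OTU tables, not data from this study.
samples = {
    "soil_1": {"OTU1": 500, "OTU2": 300, "OTU3": 200},
    "water_1": {"OTU1": 400, "OTU2": 100},
}
rarefied = rarefy_to_minimum(samples)
for name, counts in rarefied.items():
    print(name, sum(counts.values()))
```

Expanding the table into one label per read keeps the sketch short but is memory-hungry at real sequencing depths; a production tool would sample from the counts directly.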
Representative sequences from the non-redundant gene catalog were aligned to the NCBI NR database using BLASTP (Version 2.2.28+, http://blast.ncbi.nlm.nih.gov/Blast.cgi) with an e-value cutoff of 1e-5 to obtain taxonomic annotation information for each sampled species. We visualized the composition of the bacterial community at the phylum level using a bar chart.
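The e-value cutoff can be illustrated with a small filter over tabular BLAST output. This sketch assumes the default 12-column tabular layout (as produced by `-outfmt 6`), in which the e-value is the 11th column; the text does not say which output format was used, nor whether only the best hit per gene was kept, so both are assumptions. The function name `best_hits` and the example lines are hypothetical.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Keep, for each query, the lowest e-value hit that passes the cutoff."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # 11th column of the default tabular layout
        if evalue > max_evalue:
            continue  # hit fails the e-value cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical tabular hits: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
example = [
    "gene1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "gene1\tprotB\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-10\t90",
    "gene2\tprotC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20",
]
print(best_hits(example))
```

Here `gene2` receives no annotation because its only hit has an e-value above the cutoff.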
Species abundance and composition at the genus level (for the 30 most abundant genera) were compared between the soil and water communities using community heatmaps. Both species-level and sample-level hierarchical clustering used the average-linkage method. Welch's t-tests were used to identify species with significantly different numbers of reads between the soil and water communities.
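Welch's t-test compares two group means without assuming equal variances. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; the function name `welch_t` and the read counts are hypothetical, and the p-value step is omitted because it needs the Student t distribution.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical read counts for one taxon in three soil and three water samples.
soil = [1, 2, 3]
water = [2, 4, 6]
t_stat, dof = welch_t(soil, water)
print(t_stat, dof)
```

If SciPy is available, `scipy.stats.ttest_ind(soil, water, equal_var=False)` performs the same test and also returns the p-value.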
Network analysis. The phylum co-occurrence networks were evaluated (for the 30 most abundant phyla) to explore the co-existence relationships among taxa in the soil and water samples. To evaluate these networks, we used 16S rDNA network analysis and metagenomic-based species distribution network analysis. The phylum correlation networks were also evaluated to explore the interactions among phyla, based on the metagenomic univariate correlation network of the 35 most abundant phyla. For these analyses, the Spearman correlation coefficient was used with a correlation cutoff of 0.5 and a p-value threshold of 0.05.
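A correlation network of this kind can be sketched by computing pairwise Spearman coefficients and keeping the pairs that reach the cutoff. The example below is a simplified illustration, not the platform's implementation: it applies the 0.5 cutoff to the absolute coefficient (the text does not say whether negative correlations were kept), it omits the p-value filter, and the taxon labels and abundance vectors are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when one vector is constant
    return cov / (sx * sy)


def correlation_edges(abundance, cutoff=0.5):
    """List taxon pairs whose absolute Spearman rho reaches the cutoff."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= cutoff:  # a NaN rho fails this test and is skipped
            edges.append((a, b, rho))
    return edges


# Hypothetical abundances of four taxa across six samples.
abundance = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],
    "C": [6, 5, 4, 3, 2, 1],
    "D": [2, 6, 1, 5, 3, 4],
}
edges = correlation_edges(abundance)
for a, b, rho in edges:
    print(a, b, round(rho, 3))
```

In this toy input, taxon D is only weakly correlated with the others and therefore ends up without any edges.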