Quantifying environmental biodiversity from morphological and behavioral characters alone is a major challenge for biologists, because traditional identification is often biased, time consuming, and dependent on a declining pool of taxonomic experts, especially for microorganisms1–6. Next-generation sequencing (NGS) offers an alternative that enhances biodiversity monitoring across a wide range of environmental samples by overcoming some of the challenges of labor-intensive morphological identification1,7–14. In eDNA metabarcoding, one or more short DNA regions are amplified and sequenced from environmental DNA (eDNA); the advantages of NGS make this approach fast, effective, and cost efficient for monitoring biodiversity without significant damage to the target species or their habitats11,14–23. eDNA metabarcoding also has high detection probabilities for rare, cryptic, and elusive species6,24–26. A notable opportunity it offers for understanding biodiversity is the ability to monitor the dynamics of species, populations, and communities over long time periods and across large spatial scales27–28.
However, amplicon-based eDNA metabarcoding also has challenges and limitations, notably incomplete reference databases and PCR primer bias. First, assigning abundant NGS reads at the species level is important for biodiversity monitoring23,28–29, and this mainly relies on matching the reads to reference sequences available in public databases such as GenBank or EMBL. However, most species are represented in reference databases by only one or a few genes, or by none at all. As a result, some reads can only be assigned to higher taxonomic levels, which makes it difficult to associate eDNA data with existing biological and ecological knowledge6. The way the reference database is built also affects identification accuracy, for example searching the NCBI database directly with BLAST, downloading a local copy of EMBL, or building a local reference28. ecoPCR can obtain relatively complete and accurate reference sequences for a given primer pair by in silico PCR28,30. Second, successful amplification of molecular markers from eDNA depends heavily on primer specificity, sensitivity, and efficiency31. It is almost impossible to amplify all expected taxa with one gene marker, even with conserved markers such as 16S, ITS, and 18S, so special primers need to be designed for specific groups to obtain comprehensive taxonomic identification. Additionally, a single short gene region usually provides insufficient information for identifying all species. Moreover, primers in conserved regions must be designed so that the amplified fragments are short enough to fit the read-length limits of sequencing platforms10,32, which often complicates metabarcoding. Researchers have therefore suggested that multiple genetic markers should be considered in eDNA metabarcoding for accurate molecular species identification, especially for eukaryotes33–34.
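The idea of in silico PCR mentioned above can be illustrated with a minimal Python sketch. It is not the ecoPCR implementation: the function names, the mismatch model (a simple count of non-matching positions, with IUPAC degenerate bases in the primers), and the default length limits are all assumptions made for illustration. Sequences are assumed to be uppercase DNA, and only the forward strand of each template is searched.

```python
# Minimal in silico PCR sketch (hypothetical helper names, not ecoPCR itself).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of an uppercase (possibly degenerate) DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def find_sites(primer, template, max_mismatches):
    """Start positions where the primer matches within the mismatch budget."""
    k = len(primer)
    return [
        i for i in range(len(template) - k + 1)
        if mismatches(primer, template[i:i + k]) <= max_mismatches
    ]


def in_silico_pcr(template, forward, reverse,
                  max_mismatches=0, min_len=1, max_len=1000):
    """Return predicted amplicons (primer sites included) on the forward strand.

    `reverse` is given 5'->3' as it would be ordered, so its binding site on
    the forward strand is its reverse complement.
    """
    rev_site = reverse_complement(reverse)
    starts = find_sites(forward, template, max_mismatches)
    ends = find_sites(rev_site, template, max_mismatches)
    amplicons = []
    for s in starts:
        for e in ends:
            stop = e + len(rev_site)
            length = stop - s
            # The reverse site must lie downstream of the forward primer.
            if e >= s + len(forward) and min_len <= length <= max_len:
                amplicons.append(template[s:stop])
    return amplicons
```

Applied to every sequence in a local copy of a public database, a routine of this kind would yield a primer-specific reference set; the toy template and primers used to exercise it are placeholders, not sequences from this study.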
The method of amplicon-based NGS library preparation is also critical in reducing PCR bias. Methods differ mainly in whether the sequencing adaptors are added by ligation35–38, single-step PCR39–43, or multi-step PCR44–46. A two-step PCR library preparation is recommended because it avoids potential bias due to the use of different indices in each primary amplification, and it offers increased versatility by minimizing the number of primers that must be synthesized1. When multiple gene barcodes are employed for amplicon-based metabarcoding, the cost of library preparation should also be considered, because the costs of PCR, cleanup, and DNA quantification multiply with the number of barcodes, especially for large sample sets. Multiplex PCR is a widespread molecular biology technique for amplifying multiple gene targets in a single PCR experiment, and it could serve as a cost-effective method for metabarcoding if the challenges of suitable primers, effective PCR conditions, and good library preparation are addressed47.
Biodiversity monitoring by eDNA is increasingly being conducted in freshwater and marine ecosystems29,48–50. However, most of these studies have focused on fishes and amphibians. Although cyanobacteria, eukaryotic microalgae, and zooplankton all play key roles in aquatic ecosystems (for example in blooms), studies on these taxa are scarce. Aquatic ecosystems have gradually been altered by climate change, increasing nutrient pollution, and human activities51–52, which often lead to eutrophication, algal growth, and biomass accumulation in the upper photic zone and thus produce blooms across the globe53–56. Cyanobacterial blooms are currently increasing globally55, causing the death of other aquatic animals and damaging aquatic ecosystems. Blooms of other photosynthetic algae, such as green algae and diatoms, are also serious27,57. While phytoplankton metabarcoding in marine environments has been initiated with 16S or 18S NGS27,57, phytoplankton metabarcoding in inland lakes with blooms is lagging behind. Blooms can also cause a cascade of changes in planktonic microbial communities, and there is increasing evidence of a co-evolutionary arms race between bloom-forming cyanobacteria and their grazers55.
Lake Tai (meaning ‘Great lake’ in Chinese) is China’s third-largest freshwater lake and one of the most important water resources, supplying drinking water to the surrounding cities and supporting the development of agriculture and industry. However, rapid economic and population growth has led to eutrophication. Blooms have occurred frequently in Lake Tai since 1997, bringing its ecosystems to the brink and causing serious environmental problems despite major government efforts to clean up the lake55,58–60. Blooms in Lake Tai originally occurred in only one location, Meiliang Bay, and were dominated by cyanobacteria54,61–64. In recent years, the blooms have often changed with the seasons and have gradually spread across the whole lake34,56,64–68. Monitoring plankton diversity across the whole lake is therefore important for understanding the lake ecosystem and controlling the blooms. Currently, phytoplankton and zooplankton are identified mainly from morphological characters under the microscope64,67–68, which biases taxonomic identification as mentioned above, especially for cryptic species69–70.
Taking Lake Tai as an example, we developed a simple and cost-effective multiplex PCR-based eDNA sequencing method and multiple new gene barcodes, aiming to: 1) compare the detection efficiency of 9 gene loci from 18S, 28S, 16S, 23S, and rbcL for cyanobacteria, eukaryotic phytoplankton, and zooplankton; 2) validate the efficiency of multiplex PCR-based eDNA sequencing; 3) reveal the diversity patterns and seasonal community dynamics of cyanobacteria, eukaryotic phytoplankton (each phylum separately), and zooplankton between spring and summer; 4) construct the molecular association network of phytoplankton and zooplankton; 5) uncover the impact of various environmental factors on the diversity variation of phytoplankton and zooplankton; and 6) finally infer the reasons for algal blooms in the lake. Three multiplex PCR systems were used to amplify the 9 gene loci: three regions of 18S (18S-v1-3, 18S-v4-5, 18S-v9), two regions of 16S rDNA, one region of 23S rDNA, one region of 28S rDNA, and two regions of rbcL. Five gene barcodes (18S-v4-5, 28S, 16S-new, rbcL-Chlo, and rbcL-Cry) were newly designed in this study. Forty-eight samples from two seasons, identified by microscopy, were used as mock communities to compare the taxonomic assignments of the multiple gene barcodes.
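One way to compare the taxonomic assignments of several barcodes against a microscopy-based list can be sketched as follows. This is only an illustrative calculation under stated assumptions, not the analysis performed in this study: the function names are hypothetical, taxa are treated as simple name sets at a single taxonomic rank, and the inputs shown are placeholders rather than real results.

```python
# Hypothetical sketch: compare taxa detected by each barcode with a
# microscopy-based reference list (all names below are placeholders).

def detection_summary(reference, detected_by_barcode):
    """Per-barcode counts of recovered, missed, and extra taxa, plus recall."""
    summary = {}
    for barcode, detected in detected_by_barcode.items():
        shared = reference & detected
        summary[barcode] = {
            "recovered": len(shared),               # seen by both methods
            "missed": len(reference - detected),    # microscopy only
            "extra": len(detected - reference),     # barcode only
            "recall": len(shared) / len(reference) if reference else 0.0,
        }
    return summary


def combined_recall(reference, detected_by_barcode):
    """Fraction of reference taxa recovered by at least one barcode."""
    union = set().union(*detected_by_barcode.values())
    return len(reference & union) / len(reference) if reference else 0.0


if __name__ == "__main__":
    microscopy = {"taxon_A", "taxon_B", "taxon_C", "taxon_D"}
    barcodes = {
        "barcode_1": {"taxon_A", "taxon_B", "taxon_X"},
        "barcode_2": {"taxon_C"},
    }
    for name, stats in detection_summary(microscopy, barcodes).items():
        print(name, stats)
    print("combined recall:", combined_recall(microscopy, barcodes))
```

The combined recall expresses the rationale for using multiple markers: taxa missed by one barcode may still be recovered by another.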