Uncovering the Competences of Commonly Used Nutrient Media, Multi-primer Approach and Reference Databases in the Identification of Microalgae Diversity from Environmental Samples

Microalgae are highly diverse microorganisms with a variety of benefits and uses across different fields. On the other hand, their overgrowth can be extremely dangerous to the environment, making it particularly important to continuously manage and track their abundance and diversity in order to detect any potential extinction or overgrowth. The vast diversity of microalgae makes their identification challenging, both through the most common and economical method, morphological identification, and through the more recent molecular-level identification tools. To enhance the identification of microalgae, we targeted enrichment of the total microalgae diversity present in an environmental sample using four different enrichment media (BG-11, BBM, Modified media (MM), and half-strength Murashige and Skoog medium (MS)). Morphological identification of the enriched microalgae diversity was conducted every 4 days to monitor the population dynamics. After 14 days, DNA was extracted from the enriched population for molecular-level identification using the 16S rRNA gene regions V1-V3 and V4-V5 and the 18S rRNA gene V4 region. To further enhance microalgae identification at the molecular level, we evaluated three reference databases (SILVA, Greengenes, and Protist Ribosomal Reference (PR2)) to reveal their competence in identifying microalgae diversity. A total of 38 microalgae were identified morphologically to the genus level, and the highest number of microalgae was identified through MM media (36), followed by BG-11 and BBM with 31 and 26, respectively. Sequencing with the three primer sets and annotating with the three databases identified 87 microalgae to the genus level. The highest diversity was identified using the MM media (71 genera), followed by BG-11 (69 genera) and BBM (67 genera). Our multiple-media, multi-primer, and multi-database approach enabled us to identify a high microalgae diversity that would have been missed if a single approach had been used.


Introduction
The wide distribution of microalgae arises mainly from their ability to adapt and survive in different aquatic and terrestrial environments [1]. Their rich diversity provides us with various benefits that include but are not limited to the following: pigments (carotenoids), antioxidants, toxins, and sterols [2,3]; they also act as environmental bioindicators for the management of aquatic ecosystems. Since the presence of certain algal species in an ecosystem is directly correlated with the nutrient availability and concentration in that habitat [4,5], it is essential to constantly study the algal diversity present in various water resources.
One of the approaches to identifying microalgae from environmental samples is the traditional morphological identification method. This can be done directly by analysing the sample microscopically; however, the drawback of such an approach is the limiting detection threshold for microalgae present in a collected sample. An alternative approach is the enrichment of the environmental sample prior to morphological identification. The media commonly used for microalgae culture in the literature include Blue-Green medium (BG-11) and Bold's Basal medium (BBM), which have been widely used to identify and maintain species diversity as they can sustain a wide range of species [6].
Another means of microalgae identification is molecular-level analysis of conserved regions of the DNA, such as the 16S and 18S ribosomal RNA (rRNA) genes used for profiling prokaryotic and eukaryotic microalgae, respectively. The reference database used to align the sequencing results is as important as the DNA barcodes sequenced. Hence, it is crucial to identify the most suitable database for microalgae identification. The SILVA database is one of the predominantly used databases and is based on the small subunit (SSU) rRNAs, that is, the 16S and 18S rRNA in prokaryotes and eukaryotes, respectively. It therefore covers the three domains of life: Bacteria, Archaea and Eukarya. The SILVA database can assign taxonomic ranks to the genus level [7]. Greengenes (GG) is another database, comprising bacterial/archaeal 16S rRNA genes. The GG database is constructed from public databases that are aligned and checked for chimeras [8,9]. The Protist Ribosomal Reference (PR2) database, in turn, contains sequences of the SSU rRNA and rDNA of the kingdom Eukaryota. The PR2 database is used to identify eukaryotes through nuclear-encoded sequences and through the 16S rRNA present in the mitochondria or plastids of eukaryotic organisms [10]. Both GG and PR2 provide taxonomic annotation down to the species level.
Several studies have relied on selective enrichment or enrichment using a single common nutrient medium to improve microalgae identification. The influence of culture media on microalgae population growth and diversity remains elusive in the literature, yet such media are still widely used [11][12][13]. The capabilities of these different enrichment media in enriching particular microalgae have not been adequately investigated.
Additionally, the capabilities and limitations of the molecular identification approach regarding the use of different primer sets and reference databases have not been extensively studied for microalgae. In contrast, similar approaches have been extensively studied for bacteria [14][15][16][17][18]. We have previously investigated the efficiency of the commonly used BG-11 and BBM in microalgae enrichment for diversity identification, as well as the effect of vitamin enrichment and reduced media nutrients on diversity enrichment. The results obtained using the reduced media gave us an insight into the limitations of both media and the need to investigate other media compositions [19]. Hence, the current study investigates the effect of a modified medium (MM), which we consider our intermediate medium, and of a nitrogen-rich medium, half-strength Murashige & Skoog Basal Salt Mixture (MS), on the enrichment of microalgae diversity from an environmental sample. This study therefore aims to improve the detection and quantification of microalgae through (1) increasing algal abundance above the detection threshold using four different enrichment media; (2) the utilisation of multiple primer sets targeting small subunit (SSU) rRNAs (16S and 18S rRNA); and (3) the use of different reference databases (SILVA, Greengenes (GG), and the Protist Ribosomal Reference database (PR2)) to evaluate their competence in the identification of microalgae.

Results
Overview of the freshwater microalgae taxa identified across different nutrient media
The morphological assessment of the samples using the four different nutrient media aided in identifying 38 microalgae to the genus level. The highest diversity belonged to Chlorella, Scenedesmus, Ankistrodesmus, and Selenastrum species. Figure 1A shows that the highest diversity was identified when the sample was cultured on MM media (36 genera), followed by BG-11 (31 genera), BBM (26 genera), and MS media (20 genera). The half-strength MS media yielded the fewest identifications, and all the genera identified through the MS media were also identified through the other media.
Analysing the samples with the three primer sets, sequencing on the Illumina HiSeq platform, and annotating with the three databases enabled us to identify 87 microalgae to the genus level. Similar to the results of the morphological analysis, the highest diversity was identified using the MM media (71 genera) as the sample enrichment media, followed by BG-11 (69 genera), BBM (67 genera), and MS media (45 genera) (Fig. 1B). However, unlike in the morphological analysis, the molecular-level identification revealed one genus, Raphidonema, that was enriched and identified only in the half-strength MS media.
Combining the two identification methods, molecular and morphological, with the enrichment of the samples on the four enrichment media enabled us to uncover a total of 104 microalgae to the genus level (Fig. 1C). The highest diversity was identified using the three primer sets that sequence sections of the hypervariable regions of the 16S and 18S rRNA genes. When the morphological and molecular-level results are viewed side by side, 17 genera were identified only through the morphological analysis and not through the sequencing of the hypervariable regions, whereas the molecular-based approach uniquely identified 67 microalgae (Fig. 1C).
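The genus counts reported for Fig. 1C can be cross-checked with simple inclusion-exclusion. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original analysis, and only the three counts stated in the text are used as inputs.

```python
# Genus counts reported in the text for Fig. 1C.
total_genera = 104        # union of morphological and molecular identifications
only_morphological = 17   # genera identified only by microscopy
only_molecular = 67       # genera identified only by sequencing

# Inclusion-exclusion: union = only_A + only_B + shared.
shared = total_genera - only_morphological - only_molecular

# Total molecular genera implied by the unique and shared counts.
molecular_total = only_molecular + shared

print(f"genera found by both methods: {shared}")
print(f"implied molecular total: {molecular_total}")
```

The implied molecular total agrees with the 87 genera reported for the sequencing-based identification.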

Taxonomic composition and diversity comparison from different databases
The diversity of the microalgae cultured on the four media was analysed through two regions of the 16S rRNA gene and one region of the 18S rRNA gene. The two 16S rRNA regions sequenced (V1-V3 and V4-V5) yielded a combined total of 852,517 sequences, while the 18S rRNA V4 region yielded 423,623 sequences. After filtering and processing, the two 16S rRNA regions combined had 271,511 unique sequences, and the 18S rRNA region had 344,887 sequences. The 16S rRNA OTUs were assigned by aligning them to three reference databases: SILVA, Greengenes (GG), and the Protist Ribosomal Reference (PR2). The 18S rRNA sequences were aligned to the SILVA database.
A total of 5298 taxa were identified using the 16S V1-V3 region with SILVA as the reference database, while 5498 and 5942 taxa were classified using the GG and PR2 databases, respectively. Moreover, 397 taxa were classified using the 18S rRNA through the SILVA database. Further filtering using RStudio packages was performed to eliminate unclassified phyla and focus on the classified microalgae phyla/genera.

Cyanobacteria Identification
Through the 16S rRNA genes, 19 Cyanobacteria genera were revealed. The primer sets used for the 16S rRNA hypervariable regions V1-V3 and V4-V5, together with the three databases, aided in identifying 18 and 17 genera, respectively (Fig. 2A & B). Across the two 16S rRNA gene regions sequenced, the SILVA database identified a higher diversity of cyanobacteria than the other tested databases (Fig. 3A & D), and the two 16S rRNA gene regions identified the same 14 cyanobacteria genera.
Yet, certain cyanobacteria were identified only through the GG and PR2 databases. The 16S rRNA genes aligned to the GG database identified 11 genera through the V1-V3 region and 10 genera through the V4-V5 region (Fig. 3B & E). Using the GG database, two genera of cyanobacteria, Aphanizomenon and Synechococcus, were identified through both the V1-V3 and V4-V5 regions of the 16S rRNA gene but were not identified using SILVA. At the same time, Planktothrix was picked up by the V1-V3 but not the V4-V5 primers in the GG database. We did not expect the PR2 database to identify many cyanobacteria (Fig. 3C & F); however, two cyanobacteria genera were identified only through the PR2 database. Using the V1-V3 region, Gloeothece was identified in the BG-11 media, and the V4-V5 region revealed Nostoc in the nutrient media BG-11, BBM, and MM. Nonetheless, using the primers for the V1-V3 and V4-V5 regions helped uncover more cyanobacteria through the SILVA and GG databases.

Eukaryotic Microalgae Identification
Both the 16S and the 18S rRNA sequenced regions were used to identify a total of 68 eukaryotic microalgae (Fig. 2C). The SILVA database and the 18S rRNA V4 region revealed the highest diversity, 43 eukaryotic microalgae, with 27 genera belonging to the phylum Chlorophyta and 12 genera belonging to the Ochrophyta. The remaining identified genera belonged to the phyla Streptophyta (3 genera) and Cryptophyta (1 genus) (Fig. 4A-D). The PR2 database and the two sequenced 16S rRNA regions together resulted in the classification of 37 eukaryotic microalgae genera. The 16S V1-V3 fragment revealed 28 genera, of which 10 were uniquely identified by the PR2 database and the V1-V3 fragment. In contrast, the V4-V5 fragment alignment identified 22 microalgae genera (Fig. 4E-H). Using only the SILVA database and the 18S rRNA V4 region, 25 eukaryotic microalgae would have been missed from the study, as these genera were revealed only through the sequencing of the V1-V3 and V4-V5 regions of the 16S rRNA gene and annotation using PR2.

Alpha Diversity Indices of Total Microbial Community based on OTUs
The alpha diversity was calculated for each location, based on the total phyla identified through each region. For the 16S V1-V3 region, Good's coverage index was similar across the three databases, ranging from 94-95% across the samples with the GG database and from 91-93% with the PR2 database (Fig. 5A-C). The observed richness (Sobs) in terms of OTUs was highest in the Nile2 sample, followed by the Nile3 and Nile1 samples, across the three databases (Fig. 5A-C). The V4-V5 region of the 16S rDNA had a higher Good's coverage (< 96%) when calculated across the three databases. Furthermore, Sobs was higher when analysed through the V4-V5 region than through the V1-V3 region.
Nile2 still had the highest Sobs, followed by Nile1 and then Nile3, using either of the 16S regions (Fig. 5D-F). The Shannon index for Nile2 was the highest among the Nile samples when analysing the sequences for the V4-V5 region (Fig. 5D-F), while the highest Shannon index was observed in Nile3 when examining the V1-V3 region (Fig. 5A-C), followed by the Nile2 and Nile1 samples.
A high InvSimpson index reflects an increase in diversity. The highest InvSimpson values were observed in the Nile3 samples across all the sequenced regions, followed by the Nile2 and Nile1 samples. However, the 16S V4-V5 and the 18S V4 regions of Nile1 and Nile2 have a similar median InvSimpson (Fig. 5D-F & G). The final alpha diversity index evaluated with respect to OTU abundance is the Berger-Parker index. A decrease in this index indicates an increase in diversity. The Nile3 sample had the lowest Berger-Parker index, indicating the presence of high diversity, followed by Nile2 and Nile1 across all the databases and regions (Fig. 5). This is in line with the other two calculated indices, Shannon and InvSimpson, for the three locations. The 18S V4 region for eukaryotic microorganisms analysed through the SILVA database had a Good's coverage > 99%, while the highest Sobs was observed in the Nile1 sample, followed by Nile3.

The relationship between identified phyla, media, and location was examined through the Bray-Curtis similarity index based on the microbial composition identified through the SILVA 18S V4 results, as this dataset identified the highest overall diversity. An unexpected clustering pattern was observed: samples clustered with samples enriched on the same media rather than by their corresponding location (Fig. 6).
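To make the direction of each index concrete, the following minimal Python sketch computes the five indices discussed above from a vector of OTU read counts. It is an illustration only: the function name and the toy count vectors are assumptions, and the simple proportion-based formulas used here may differ slightly from the estimators implemented in MOTHUR (for example, bias-corrected forms of the Simpson index).

```python
import math

def alpha_diversity(counts):
    """Return common alpha diversity indices for one sample's OTU counts."""
    counts = [c for c in counts if c > 0]      # ignore absent OTUs
    n = sum(counts)                            # total reads in the sample
    props = [c / n for c in counts]            # relative abundances
    singletons = sum(1 for c in counts if c == 1)
    return {
        "sobs": len(counts),                               # observed richness
        "goods_coverage": 1 - singletons / n,              # Good's coverage
        "shannon": -sum(p * math.log(p) for p in props),   # Shannon index
        "inv_simpson": 1 / sum(p * p for p in props),      # inverse Simpson
        "berger_parker": max(props),                       # dominance
    }

# Hypothetical toy samples: one perfectly even, one dominated by a single OTU.
even = alpha_diversity([25, 25, 25, 25])
skewed = alpha_diversity([97, 1, 1, 1])
print(even)
print(skewed)
```

Both toy samples have the same observed richness, but the even sample has the higher Shannon and inverse Simpson values and the lower Berger-Parker value, which is the pattern the text uses to rank the Nile samples.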

Discussion
This study aims to fully evaluate the culture-based morphological identification method and the multi-primer/multi-database molecular approach in identifying microalgae diversity from an enriched environmental sample, in our case, the River Nile. The challenge was to determine how well the currently used media enrich the diversity of culturable microalgae for identification.
Additionally, we pursued the possibility of improving the media for better enrichment of the microalgae population. This could be achieved by finding a medium that provides evenness across the present diversity and thereby enables the enrichment of rare species in the community. Growth analysis confirmed that none of the media hindered the growth of the microalgae (data not shown).
Through morphological identification of microalgae in the four tested media (BG-11, BBM, MM and MS) used to enrich the River Nile environmental samples, a total of 37 microalgae were identified across the three samples combined. MM media alone identified 35 microalgae, while the commonly used media BG-11 and BBM identified 30 and 27 microalgae, respectively. On closer inspection, each of the four tested nutrient media contained all the necessary macronutrients and micronutrients (such as N, P, K, Fe, Cu, Mn, Zn, Co and Mo) to support rather than hinder microalgal growth. However, the media differed in macro- and micronutrient composition and concentration. These differences may have favoured the growth of certain microalgae over others. One of the main differences in media composition is the source of nitrogen. BG-11, BBM, and MM media contain sodium nitrate as the main source of nitrogen, whereas MS media, which has the highest nitrogen content, contains nitrogen in two forms, ammonium nitrate and potassium nitrate. Microalgae can use nitrogen in different forms such as nitrate, nitrite, ammonium, and urea, an organic nitrogen source. The different nitrogen sources are first reduced to ammonium, which is assimilated into amino acids through different pathways in microalgae [20]. While ammonium is a preferred nitrogen source because its assimilation is a low-energy process, sodium and potassium nitrates are still preferred in media over ammonium. This is attributed to ammonium ions (NH4+) being converted to ammonia gas (NH3) under aeration in alkaline conditions, which constitutes a loss of the nitrogen source [21]. This may be one reason why MS media had the lowest identified diversity even though it was rich in nitrogen.
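The pH dependence of this ammonium loss can be illustrated with the acid-base equilibrium NH4+ ⇌ NH3 + H+. The sketch below is not part of the original study: it assumes a pKa of about 9.25 for ammonium at 25 °C and ignores temperature, ionic strength, and gas-transfer kinetics, so it only indicates the trend.

```python
PKA_AMMONIUM = 9.25  # assumed approximate pKa of NH4+ at 25 °C

def free_ammonia_fraction(ph, pka=PKA_AMMONIUM):
    """Fraction of total ammoniacal nitrogen present as volatile NH3."""
    # Henderson-Hasselbalch: [NH3] / ([NH3] + [NH4+]) = 1 / (1 + 10^(pKa - pH))
    return 1.0 / (1.0 + 10 ** (pka - ph))

# Illustrative pH values spanning neutral to alkaline culture conditions.
for ph in (7.0, 8.0, 9.25, 10.0):
    print(f"pH {ph}: {free_ammonia_fraction(ph):.1%} of ammoniacal N as NH3")
```

Under these assumptions the NH3 fraction is small near neutral pH and rises steeply as the culture becomes alkaline, which is consistent with the explanation that aerated, alkaline cultures lose ammonium-derived nitrogen.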
Citric acid and EDTA are other critical media components. The citric acid present in BG-11 and MM media is responsible for solubilising salt components in the media and preventing iron precipitation, making iron readily available to the microalgae and thus enhancing the growth rate, as iron is considered an essential limiting micronutrient [21]. It may therefore be a leading cause of the higher diversity present in MM and BG-11 media. BG-11, BBM, and MM media also contain EDTA disodium salt, which acts as a chelating agent. In contrast, MS medium only has its iron chelated in the form of ferric sodium EDTA, which has been reported to be a better source of iron [22,23]. Yet, MS medium did not uniquely enrich any microalgae over the other tested media when morphologically analysed. This suggests that the proportion of iron in MS medium is relatively low compared with the other media and that increasing it could have resulted in a higher diversity enrichment.
Nevertheless, we decided to continue with the molecular analysis, including the MS medium, because morphological analysis can be limited by the volume analysed, the number of microscopic fields visualised from a single sample, and the possibility of human error. The morphological analysis was conducted every seven days, and the highest diversity was observed on day 14. Hence, DNA extraction for the molecular-level analysis was conducted on the samples on the 14th day. Even though nutrient content was not measured during the experiment, we assume that after day 14, nutrient depletion and the accumulation of waste products from the enriched microalgae biomass can drastically increase pH. pH can also increase due to the microalgal uptake of inorganic carbon, and photosynthesis increases O2 production, which raises the pH of the culture [24,25]. Exceeding 14 days may also reduce the intermediate and rare (low-abundance) microalgae and further boost the dominant and most tolerant microalgae such as Ankistrodesmus, Chlorella, and Scenedesmus [26,27]. This is common, as green algae are known to dominate and suppress the growth of cyanobacteria [28].
The combination of the hypervariable regions sequenced using universal primers specific to the 16S V1-V3 and V4-V5 rRNA regions and the 18S V4 rRNA region provided us with broad coverage of the algal diversity in the enriched population. Each primer set aligned to the three reference databases contributed substantially to profiling the different eukaryotic and prokaryotic microalgae present in the River Nile samples studied. Eighty-seven microalgae were characterised from the River Nile samples collected. Each individual reference database, along with each primer set, had a limited capacity to profile the microalgae diversity in the samples analysed. Combining the results of the three primer sets gave superior coverage and captured diversity that would have been missed if only a single set of primers and one reference database had been used.
Yet, several microalgae (17 genera) were identified only by their morphological characteristics and were missing from the list of genera identified using molecular methods. This can be explained by the possibility that some microalgae were incorrectly classified morphologically due to their close resemblance to other microalgae in the same phylum. For example, Tetraspora (Chlorophyta) may be confused with Chlamydomonas or Gloeococcus. Other genera such as Leptolyngbya and Spirulina (Cyanobacteria) can also be confused with other filamentous Cyanobacteria [29]. Furthermore, high nutrient levels and asexual reproduction can hinder the morphological identification of unicellular microalgae due to their phenotypic plasticity under nutrient and environmental alteration [30,31].
The presence of synonymous genus names across databases can cause double counting between morphological and molecular-level identification, or result in a "hit and miss" between one database and another [32]. For example, Selenastrum, identified morphologically, has the synonym Monoraphidium, which was profiled through the SILVA 18S V4 region. Moreover, the SILVA database contains a complete 18S rRNA gene sequence entry for Selenastrum along with a sequence for Monoraphidium [33]. This limitation can be ascribed to SILVA being a curated database that contains several entries under different synonyms of the same genus [34]. However, some of the microalgae identified only at the morphological level, such as Actinastrum, Coelastrum, Dictyosphaerium, Lagerheimia, Tetraedron, Pediastrum, and Selenastrum, have characteristic features that are easily distinguished microscopically and are not mistaken for other genera [35,36]. Nevertheless, they were not identified at the molecular level through any of the regions analysed or through the three reference databases used. The complete 18S rRNA genes of the genera Actinastrum, Coelastrum, Dictyosphaerium, Tetraedron, Pediastrum, and Selenastrum are present in the SILVA database, while the Lagerheimia 18S rRNA gene is not available in SILVA but is present in a different database, ENA (European Nucleotide Archive), which was not used in this study for the analysis of the 18S rRNA sequences [33]. These results suggest that the organisms observed could be new variants of the genera listed above, which calls for their isolation and characterisation. This also points out that databases that are not specific to microalgae are not updated with microalgal sequences as regularly as the continuously examined and updated microbial taxonomies used in studies on the gut and oral microbiomes or on general microbial communities in environmental samples [37][38][39][40][41].
Eukaryotic microalgae were best identified through the SILVA database and the 18S V4 rRNA gene region, which accounted for ~63.24% (n = 43 of 68) of the identified genera, even though 25 eukaryotic genera were identified using the 16S V1-V3 region and the PR2 database. SILVA identified a greater diversity; this can be attributed to the fact that SILVA is a bigger database [7]. Nevertheless, the number of microalgae specifically identified using the PR2 reference database [10] and the 16S rRNA gene can still be considered high and of added value to the profiling of microalgae diversity from an environmental sample.
Among the alpha diversity indices calculated for the OTUs analysed through the different databases, Good's coverage index was generally higher in the V4-V5 region. Moreover, in the V4-V5 region annotated by the three databases, the Nile2 sample had a high Shannon index, indicating the presence of high diversity in terms of abundance, and the Sobs index confirms that. However, the InvSimpson index, a dominance index, is lower for Nile2 than for Nile3, which indicates that the Nile3 sample has a higher diversity with more evenness between the genera. This is also confirmed by the lower Berger-Parker index of the Nile3 sample across all the databases and regions sequenced. The diversity indices indicate that Nile3 has the highest diversity, with an even distribution of genera and no dominant genus. This is followed by the Nile2 and Nile1 samples.
Regarding the 18S V4 region annotated by SILVA, the diversity indices indicate that Nile1 has the richest diversity with a high evenness between the different genera identified. This is followed by Nile3 and then Nile2.
The biplot constructed using the SILVA 18S V4 region dataset confirms that the choice of media enriches a specific diversity and that the enriched diversity is biased towards the media. This further confirms that using one medium can lead to the enrichment of certain microorganisms over others. Hence, the usage of a single enrichment medium will fail to identify certain microalgae and will provide misleading data concerning the diversity of the studied environmental samples. BG-11, BBM and MM were the most suitable media for culturing and enriching the highest diversity from an environmental sample, compared with MS media, which did not identify any unique species. This enrichment culture-based method can also be considered an alternative to the more expensive approach of increasing the number of sequencing reads in order to investigate low-abundance species that fall below the detection threshold. The culture-based method followed by molecular-level analysis of several different hypervariable regions reduces the biases that can result from using one medium or one sequencing region. Moreover, it is critical to recognise that different enrichment media and sequencing regions complement each other and provide completeness to the diversity profiled.
We may therefore conclude that the combined use of the culture-based method and sequence analysis of multiple rRNA regions/genes (targeting both nuclear and plastid genomes [42]) provides a more reliable and comprehensive approach to identifying total microalgae biodiversity. This approach can be considered a more efficient methodology, as it targets microalgae identification from an environmental sample, than the commonly used metagenomic sequencing that utilises the environmental sample directly, which is usually high in microbial biomass and low in microalgae biomass.

Study areas and sample enrichment
To investigate the competence of the different databases and the multi-primers studied, three water samples were collected from the Nile River in the Cairo governorate (Supplementary Table S1).

Physicochemical analysis
The physicochemical analysis was performed to assess the composition of the natural habitat of the microalgae identified in each location's water samples (Supplementary Table S3). The physical and chemical characteristics, including salinity, pH and electrical conductivity (EC), were measured using a multimeter probe (330i, WTW, Germany). The presence of cations and anions, such as sodium, chloride, sulfate, potassium, and calcium, was measured according to standard methods [45]. Total ammonia, nitrate and phosphorus were evaluated based on the standard protocol [46].

Morphological identification of microalgae
The identification of microalgae was based on the morphological characteristics observed under a bright-field microscope following the standard procedures [35,36]. The strains were examined under a light microscope (Leica) using the software LAS EZ (Leica DM500). Genus-level identification of the microalgae was performed.

DNA extraction and sequencing
Total genomic DNA was extracted after 14 days of media enrichment from a volume of 200 ml (~4.7 × 10^6 cells ml^-1) of each culture, which was centrifuged at 4000 rpm for 30 minutes. The wet pellets collected were then weighed, and aliquots of 150-250 mg were transferred into 1.5 ml Eppendorf tubes for total DNA extraction. Each Eppendorf tube was then dropped in liquid nitrogen and the pellet was ground to a powder to extract DNA. Genomic DNA was extracted from the frozen wet pellet using the PowerMax Soil DNA Isolation kit (MoBio, Carlsbad, CA) following the manufacturer's instructions. The concentration and purity of the extracted DNA were monitored using an LVis Plate SPECTROstar® Nano (BMG LABTECH, UK) and a 1% agarose gel.

Raw sequence processing and data analysis
Trimmed fastq files were filtered using MOTHUR v1.42.6 following the pipeline from the MiSeq standard operating procedure (SOP) available on the MOTHUR website (www.mothur.org/wiki/MiSeq_SOP) [51]. Briefly, the fastq files were converted to fasta, and the files were screened using the MOTHUR command screen.seqs. Reads shorter or longer than 490, 420 and 690 bp were discarded from the 16S rDNA V1-V3, 16S rDNA V4-V5 and 18S rDNA sequences, respectively, along with reads containing homopolymers longer than 8 bp. The remaining sequences were de-replicated (unique.seqs) to merge duplicates and reduce the number of sequences to analyse, and were then aligned to a database. The sequences of the 16S rDNA were aligned to the Greengenes database (Gg_13_8_99) and the Protist Ribosomal Reference database (PR2) [10], and the 18S rDNA was aligned to the SILVA database (Silva.nr_v132). The lowest level of the taxonomic hierarchy present in SILVA is the genus, while in Greengenes the lowest taxonomic level that can be identified is the species [7,38]. Chimeras in the sequences were also filtered and removed (chimera.uchime and remove.seqs).
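The screening and de-replication steps can be pictured with a short Python sketch. This is an illustrative re-implementation of the logic only, not the MOTHUR code: the function names, the toy reads, and the length window used in the example are hypothetical, and only the homopolymer limit of 8 bp is taken from the text.

```python
import re
from collections import Counter

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base in a read."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)

def screen(seqs, min_len, max_len, max_homop=8):
    """Keep reads inside the length window and within the homopolymer limit."""
    return [s for s in seqs
            if min_len <= len(s) <= max_len
            and longest_homopolymer(s) <= max_homop]

def dereplicate(seqs):
    """Collapse identical reads, keeping a count per unique sequence."""
    return Counter(seqs)

# Hypothetical toy reads; real amplicon reads are hundreds of bases long.
reads = ["ACGTACGT", "ACGTACGT", "A" * 10 + "CG", "ACG"]
kept = screen(reads, min_len=5, max_len=12)   # toy window, not the study's
unique = dereplicate(kept)
print(unique)
```

In this toy run the read with a 10-base homopolymer and the 3-base read are discarded, and the two identical reads collapse into one unique sequence with a count of two.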
Sequences were split into groups based on their taxonomy at the order level, and OTUs (operational taxonomic units) were assigned using the dist.seqs command. Alpha and beta diversities were calculated based on normalised OTU abundance information obtained using the sample with the fewest sequences as a standard. The diversity indices calculated include Good's coverage, species observed (Sobs), Shannon's diversity index, and the inverse Simpson's diversity index (InvSimpson), and they were calculated using the MOTHUR SOP [52].
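Normalising to the smallest sample and comparing communities can be sketched as follows. This minimal Python example assumes random subsampling without replacement as the normalisation and uses hypothetical sample names and counts; it computes Bray-Curtis as a dissimilarity, so the corresponding similarity is one minus the returned value.

```python
import random

def subsample(counts, depth, seed=0):
    """Draw `depth` reads without replacement from an OTU count vector."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    picked = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for otu in picked:
        out[otu] += 1
    return out

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

# Hypothetical OTU tables (one count per OTU) for two enriched samples.
samples = {
    "sample_a": [120, 30, 0, 50],
    "sample_b": [40, 40, 10, 10],
}

# Use the sample with the fewest sequences as the common depth.
depth = min(sum(v) for v in samples.values())
rarefied = {name: subsample(v, depth) for name, v in samples.items()}

distance = bray_curtis(rarefied["sample_a"], rarefied["sample_b"])
print(f"depth = {depth}, Bray-Curtis dissimilarity = {distance:.3f}")
```

Fixing the random seed keeps the subsample reproducible; in practice the subsampling is often repeated and the resulting distances averaged.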
The molecular data were further analysed and visualised with R 3.6.1 packages executed via RStudio ("Open source and enterprise-ready professional software for data science-RStudio"; "R: The R Project for Statistical Computing").

Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.