Diversity Patterns of Protists are Highly Affected by Methods Disentangling Inter-specic Variants: A Case Study in Oligotrich (s.l.) Ciliates

An enormous amount of environmental sequences produced by high-throughput sequencing (HTS) is popular for inferring diversity and distribution patterns of protists, which are widely distributed and playing important roles in energy ux and nutrient cycling. However, study testing whether methods disentangling inter-specic variants affect the diversity and distribution patterns of protists using eld samples is insucient. Using oligotrich (s.l.) ciliates, one group of abundant and dominate planktonic protists, in eld samples as an example, the present study indicates that DADA2 performs better than UCLUST and UPARSE for inferring diversity patterns of oligotrich (s.l.) ciliates in Pearl River Estuary and surrounding regions. UPARSE, as an OTU-construction method might underestimate species richness and produce less reliable beta diversity pattern than DADA2. UCLUST with 97% and 99% clustering thresholds overestimate species richness, and the beta diversity pattern inferred by the former one is unreasonable. Salinity is shown to be one of the key factors responsible for variations in community distribution of ciliates, but infrequent marine-freshwater transitions occurred during evolutionary terms of this group.


Introduction
Protists, single-celled eukaryotes, are widely distributed in soil, marine and freshwater worldwide and play key roles in energy ux as well as trophic interactions and nutrient cycling [1][2][3]. However, protists with low abundance in a region are easily to be ignored in observation in vivo due to small size and di culty in identi cation and culture [4,5]. Foissner [6] assumed that probably more than a half of the diversity of many protist groups has not been revealed. Consequently, insu cient sampling dense of protists in studies using isolated cells might halt revealing their accurate patterns of diversity, phylogeny, transition and so on. Fortunately, a great number of sequences produced by high-throughput sequencing (HTS) without limitations of culturing and ignoring in observation that provide us a chance to investigate more accurate diversity and distribution patterns of protists [7][8][9][10][11].
Another is correcting amplicon errors by generating Amplicon Sequence Variants (ASVs) at single nucleotide resolution, e.g., DADA2 [21]. To our knowledge, a huge amount of references focusing on protist diversity based on HTS data are published every year, but only few studies comparing effects of different HTS data processing methods on analyses [15,22,23]. Some studies suggested that ASVs could more accurately reproduce a known alpha diversity than OTUs [23], USEARCH and QIIME strongly affected the number of predicted OTUs but not biogeography patterns [22]. Additionally, the numbers of OTUs varied with change of clustering thresholds [24]. Consequently, protists diversity inferred from HTS data are highly affected not only by sequence grouping approaches but also softwares and clustering thresholds producing OTUs. However, some questions are still unclear. For instance, testing whether ASVs and OTUs affect beta diversity patterns of protists is lack in real case scenarios using eld samples [23].
In the present study, we will explore whether methods disentangling biological variants will affect the diversity patterns of protists by using oligotrich (s.l.) ciliates, a major group of the abundant and dominate planktonic protist communities [1,[25][26][27][28][29], in eld samples as an example. Our aim is to provide suggestions for data analyses in future studies. What's more, we investigate the community distribution and transition patterns of oligotrich (s.l.) ciliates using both released sequences of identi ed species in GenBank and environmental sequences from the Pearl River Estuary and surrounding regions, which includes various salinity gradients.
Totally, four oligotrich (s.l.) datasets, viz., UPARSE-97E, UCLUST-97E, UCLUST-99E and DADA2-100E (the number of OTUs/ASVs and reads number of OTUs/ASVs are shown in Table 1), were produced from environmental sequences. The geographic locations of sampling sites and the salinity of each sites in this study are shown in Fig. 1.
In order to explore the diversity patterns of oligotrich (s.l.) ciliates among different datasets, Bray-Curtis distance matrix were used for quantifying the community dissimilarity. Hierarchical Clustering and Principal Co-ordinates Analysis (PCoA) were performed in the R platform [31] using the package 'ape' [32] and 'vegan' [33]. Hierarchical clustering were performed using the "ward.D2" algorithm of the hclust function, since the OTUs/ASVs numbers of different analyses methods were quite different, logarithmic calculations were performed on the data before performing the distance matrix calculation. In the PCoA, using "stat_ellipse (level = 0.95)" command to add ellipses by group except DY (too few points to calculate an ellipse), the differences between prede ned groups based on sampling areas were statistically tested by permutational multivariate analysis of variance (ADONIS) [34] using 1,000 permutations.

Sequence Alignment and Phylogenetic Analyses
All available SSU rDNA sequences (>1,000 bp) of identi ed oligotrich (s.l.) species as of August 2021, as well as 6 sequences of the subclasses Hypotrichia used for outgroup, were downloaded from GenBank (accession numbers in Table S1). The phylogenetic dataset DATA2-100P contained identi ed sequences in Table S1 and representative sequences from dataset DATA2-100E were produced and aligned using MAFFT [35]. Finally, Mesquite v3.6 [40] was used to infer the most parsimonious pattern of marine-freshwater transitions of oligotrich (s.l.) ciliates using the ML tree above.
PCoA results show that oligotrich (s.l.) samples basically group depending on six sampling areas (DY, GZ, PRE, SZ, ZH, ZJ), and PRE is divided into two groups (PRE1-PRE4, PRE5-PRE10). ADONIS results support signi cant (P < 0.001) differentiations of community structures among these seven groups based on all four datasets (Fig. 3). Consistent with the hierarchical clustering results (Fig. 2), samples of PRE1-PRE4 are clearly separated from those of PRE5-PRE10, ZJ, SZ and DY in all four datasets (Fig. 3). However, for DADA2-100E, samples of PRE1-PRE4, ZH and GZ tend to group together and are separated from other sampling sites (Fig. 3a).

Phylogeny and Transition Patterns of Oligotrich (s.l.) Ciliates
To investigate the ecological transition patterns of oligotrich (s.l.) ciliates, the phylogenetic tree including 197 sequences from identi ed species (Table S1) and 103 ASVs from dataset DADA2-100E are constructed (Fig. 4). Monophyly of the subclass Choreotrichia is supported by high supports (98% ML, 1.00 BI), but that of subclass Oligotrichia is not. In the phylogenetic trees, crown clades usually contain identi ed oligotrich (s.l.) species and ASVs from various habits. Though most ASVs group with identi ed species, some ASVs form isolated clades (Clade 1, Clade 2 in Fig. 4) without identi ed species. This indicates that oligotrichous diversity in low-brackish habitats might be underestimated in previous morphological investigations. Transition pattern analyses show that there is a high-brackish ancestor for oligotrich (s.l.) ciliates (Fig. 4). The subclass Choreotrichia appears to be evolved from the high-brackish ancestor with high support values (98% ML, 1.00 BI), then some transitions to freshwater, low-brackish and marine, and even back to high-brackish for some species (e.g., Tintinnopsis radix and Leprotintinnus nordqvisti) (Fig. 4). By contrast, the subclasses Oligotrichia seems to be derived from an equivocal ancestor with poor support values (ML < 50%, BI < 0.50), and some transitions to freshwater, low-brackish, high-brackish and marine, and possible additional transitions back to low-brackish for some species (e.g., Parallelostrombidium paraellipticum and P. obesum) (Fig. 4).  (Table 1 and 2, Table S2 and S3). This is consistent with previous investigations that the alpha diversity of protists is highly variable depending on sequence grouping approaches, as well as softwares and clustering thresholds producing OTUs [22][23][24]41]. Recent investigations revealed that compared to OTUs, ASVs could more accurately reproduce a known alpha diversity from mock communities of various groups, e.g., bacteria [12,42,43], fungi [42], ciliated protists [23]. In this study, the numbers of oligotrich (s.l.) OTUs/ASVs inferred by UPARSE-97E (63) and DADA2-100E (103) are more reliable than those inferred by UCLUST-97E (521) and UCLUST-99E (2154) ( Table 1). That is because only 592 morphological ciliate species were reported in South China Sea from 1991 to 2018 [44]. Surprisingly, the proportion of oligotrich (s.l.) OTUs/ASVs in each sampling area/site ( Table 2 and Table S2) as well as shared OTUs/ASVs in two or more sampling sites (Table S3) is much higher in UPARSE-97E than in the other three datasets. One possible explanation is that UPARSE, as an OTU-construction method with the least false-positive results [14], might underestimate oligotrich (s.l.) species richness. Therefore, our study using eld samples draws the same conclusion as studies using mock communities that the estimate of alpha diversity reproduced by ASVs in DADA2-100E might be better than OTUs.

Methods Disentangling Biological Variants Highly Affect
In this study, a general beta diversity pattern of oligotrich (s.l.) ciliates is revealed in four datasets, but methods disentangling inter-speci c variants also show some impact on inferring beta diversity patterns (Figs. 2 and 3). Clearly, community variations are observed between PRE1-PRE4 and PRE5-PRE10 in DADA2-100E, UCLUST-97E and UCLUST-100E, and PRE5 groups with PRE1-PRE4 instead of PRE6-PRE10 in UPARSE-97E. This might be explained by sharp increase of salinity between sampling sites PRE1-PRE4 (0.3-0.9 PSU) and PRE5-PRE10 (4.7-12.2 PSU). It is possible that the mixture of freshwater and seawater form a low-salinity front between sampling sites PRE4 and PRE5 [45]. Numerous studies have proved that salinity appears to be the factor that correlates best with distributions of phytoplankton and bacterioplankton in estuaries (e.g., [46][47][48][49]), and hence community compositions of oligotrich (s.l.) ciliates with phytoplankton and bacterioplankton as food also change heavily between PRE1-PRE4 and PRE5-PRE10. Sampling areas ZJ and DY are located in nearshore of South China Sea (surrounding regions of Pearl River Estuary), and most samples from these sampling areas cluster with PRE5-PRE10 in all four datasets (Figs. 2 and 3). Possibly, salinity also play an important role in this cluster pattern, since salinity value in sampling areas ZJ, DY and sampling sites PRE5-PRE10 are generally higher than other ones (Fig. 1). Similar to PRE1-PRE4, sampling sites GZ1-GZ5 are in upper estuary of the Pearl River Estuary, and are highly in uenced by large freshwater discharge from Pearl River. Theoretically, community structures of oligotrich (s.l.) ciliates in upper Pearl River Estuary with lower salinity and higher nutrient content should be much different from those in lower Pearl River Estuary. In present study, GZ samples group with PRE1-PRE4 in DADA2-100E, UCLUST-97E and UCLUST-99E, but not in UPARSE-97E (Figs. 2 and 3). It seems that beta diversity pattern of GZ samples are more reasonable in the former three datasets than later ones. Hierarchical clustering analyses show that most ZH samples group with PRE1-PRE4 in Clade A (Fig. 2), and most SZ samples fall into the Clade B including PRE5-PRE10 (Fig. 2a, b and  d), though sampling areas ZH and SZ are located at the similar latitude of PRE5. Different community structures of oligotrich (s.l.) ciliates in ZH and SZ samples might be due to two reasons. Firstly, the surface ow velocity on the west side (ZH) is usually greater than the east side (SZ) of Pearl River Estuary [50]. This indicates that community structures of oligotrich (s.l.) ciliates in ZH might be more highly in uenced by river runoff than in SZ. Thus, cluster pattern of SZ and ZH samples are less reliable in UCLUST-97E (Fig. 2c) than in other three datasets (Fig. 2a, b and d). Secondly, both sampling areas ZH and SZ are located in nearshore of Pearl River Estuary, where environmental factors and community structures of ciliates are highly in uenced by human activities [44,51,52]. All these indicate that beta diversity patterns of DADA2-100E and UCLUST-99E are more reliable than those of UPARSE-97E and UCLUST-97E.
In conclusion, among four datasets compared in this study, DADA2-100E performs best for inferring diversity patterns of oligotrich (s.l.) ciliates in Pearl River Estuary and surrounding regions.
Community Distribution and Ecological Transitions of Oligotrich (s.l.) Ciliates in Environments with Various Salinity Gradients As revealed in various groups of archaea, bacteria and protist (e.g., [46,47,49,53,54]), the present study also revealed that salinity gradients play a vital role in structuring patterns for community distribution of oligotrich (s.l.) ciliates in Pearl River Estuary and surrounding regions where encompass an entire freshwater-marine salinity gradient. Firstly, communities of oligotrich (s.l.) ciliates in sampling sites PRE1-PRE10 along the ow direction of Pearl River Estuary are divided into two distinct groups PRE1-PRE4 (Clade A) and PRE5-PRE10 (Clade B) (Fig. 2a), which is consistent with the sharp change of salinity gradients between sampling sites PRE1-PRE4 (0.3-0.9 PSU) and PRE5-PRE10 (4.7-12.2 PSU). Additionally, all samples from high-brackish and marine habits fall into Clade B, though these samples were collected from two sampling areas (ZJ and DY) with around 500 km distance. It is believed that both physicochemical barrier of salinity gradients and the presence of locally adapted taxa limit the colonization success of microbes in different habits with salinity gradients [55]. Within Clade B, several samples (DY2, ZJ1, PRE5-PRE10, SZ2-SZ4) are from freshwater and low-brackish habits, which indicates that except for salinity, other environmental factors and geographical distance also shape community distribution of oligotrich (s.l.) ciliates. This is also revealed in various prokaryotic and eukaryotic microbial groups in estuaries (e.g., [44,54,56]).
As above mentioned, salinity gradients have been proven to be important physicochemical factor in structuring community distribution and limiting transitions of microbes including protists. Ancestors of different ciliate groups seem originate in various habits. Oligotrich (s.l.) and hypotrich (present study, [57]) ciliates are ancestrally high-brackish, and peritrich and colpod ciliates have freshwater/terrestrial ancestors [58,59]. However, species in various habitats are reported for each of these ciliate groups. That is because infrequent marine-freshwater transitions always occurred during their evolutionary terms (present study, [57][58][59][60]), due to the ability of microbes to rapidly adapt and highly colonize to new environments [55,61]. In our phylogenetic trees (Fig. 4), some ASVs formed separated clades (Clade 1, Clade2) and might be new taxa. It is consistent with previous report that the oligotrichous morphospecies diversity is underestimated [28]. Additionally, these ASVs representing new taxa are from low-brackish habits (Fig. 4). By contrast, among all SSU rDNA sequences of identi ed oligotrich (s.l.) species deposited into GenBank, only 4% (8 out of 191) and 8% (15 out of 191) ones are from freshwater and low-brackish habits (Table S1). In future, more detailed transition and evolutionary patterns of oligotrich (s.l.) ciliates are expected be outlined, with the broader sampling, especially from freshwater and low-brackish habits. Figure 1 The location of sampling regions, modi ed from Zhu et al [11].   and PRE5-10 are indicated by different colors, respectively. The ellipses represent 95% con dence intervals except for DY (too few samples to calculate an ellipse). P represents global signi cance among oligotrichous communities from different sampling areas based on ADONIS analysis. P values < 0.05 are considered as signi cant differences among community structures.

Figure 4
Maximum likelihood (ML) tree based on SSU rDNA of 103 environmental sequences (ASVs in DADA2-100E) and 197 sequences of identi ed species from GenBank (accession numbers in Table S1). The tree shows the topology and transition pattern, support values (ML/BI) > 50%/0.50 are labeled. The number of