High-throughput DNA barcoding provides insight into the factors shaping spider diversity in the biodiversity hotspot of Wallacea

doi:10.21203/rs.3.rs-3997841/v1





This work is licensed under a CC BY 4.0 License

Version 1

Biodiversity hotspots, which house an immense portion of Earth's terrestrial biodiversity, are under severe threat from anthropogenic change. The prevalence of undescribed taxa in these locations poses a significant challenge to community-level biodiversity monitoring and protection. In this study, we use a DNA mini-barcode to test predictions of biodiversity patterns in Sulawesi, Indonesia, part of the Wallacean biodiversity hotspot. We tested for the elevational diversity gradient, the presence of areas of endemism (AOEs), and patterns of diversification in a diverse taxonomic group, the spiders (order Araneae). Our approach paired individually sequenced adult specimens with juvenile specimens sequenced using a metabarcoding approach to assess diversity patterns within and between three mountains. In total, 2,357 spiders were sequenced, producing 926 amplicon sequence variants that were clustered into 508 operational taxonomic units (OTUs). Alpha diversity was significantly associated with elevation, peaking in the 1,000-1,500m elevation band, which supports the elevational diversity gradient. Distinct OTUs were found on each mountain, indicating areas of endemism for Sulawesi spider biodiversity. Additionally, communities in neighboring elevation bands within a mountain shared few OTUs; this high elevational turnover demonstrates how mountains act as biodiversity hotspots in and of themselves. Using a coarser sequence clustering approach, we found evidence for extensive in-situ diversification, reflected by a high number of 97% clusters that collapse into a low number of 95% clusters. Patterns of shared clusters between communities showed elevational structuring consistent with niche conservatism, with limited local diversification occurring between sites within a narrower elevational range.
The approach we adopted greatly expanded our knowledge of Sulawesi spider biodiversity while also providing insights into the relative roles of pre-adapted lineages and local diversification in shaping overall diversity patterns across the island.

Biological sciences/Ecology/Biodiversity

Biological sciences/Ecology/Biogeography

Biological sciences/Ecology/Community ecology

Biological sciences/Evolution/Speciation

Global anthropogenic change has created a biodiversity crisis, leading many to believe that a sixth mass extinction event is underway [1,2]. A major challenge in studying the effects of anthropogenic disturbance on biodiversity is the high number of undescribed taxa, especially in biodiversity hotspots. Biodiversity hotspots, which support around half of terrestrial biodiversity on just 1.4% of Earth’s land area, contain most of the globe’s undocumented diversity [3–6]. Endemic species in these hyper-diverse locations are at particular risk of extinction because of their limited ranges and dependence on remnant, shrinking habitats [7,8]. These taxa are being lost faster than species are being discovered, making it extremely difficult to measure the extent of global extinctions [9]. It is paramount that we develop holistic, community-level approaches that allow rapid biodiversity documentation and enable the detection of broad-scale shifts in species composition that may lead to ecosystem collapse.

High-throughput methods using next-generation sequencing (NGS) have emerged as tools to rapidly assess unknown diversity at the community level [10–12]. In our study, we use DNA barcoding to document novel biodiversity and test predictions of biodiversity patterns across mountain ranges in Sulawesi, Indonesia. Sulawesi, the largest island in the biodiversity hotspot of Wallacea, is a composite island consisting of continental fragments that collided in the early Miocene [13]. This collision created a chain of islands that remained largely disconnected until roughly 3 Myr ago, when changes in sea level connected paleoislands and rapid uplift produced multiple high-elevation mountain ranges [14]. The island exhibits remarkable endemic biodiversity, with patterns of in-situ diversification tied to its complex geological history, resulting in areas of endemism that mirror the paleoislands [15,16].

In our study, we focused on community assessment of the spiders (order Araneae), a hyperdiverse lineage with over 51,000 described species [17]. Spiders are integral both as predators and as prey in every terrestrial biome and can serve as bioindicators of ecosystem health [18,19]. Our goal was to characterize Sulawesi spider communities as a whole. To this end, we used a COI mini-barcode to sequence thousands of spiders, both individually and using a pooled metabarcoding approach, to assess patterns of diversity across three mountains in Sulawesi (Figure 1). A mini-barcode region was chosen to increase amplification and sequencing success across many diverse specimens; despite their shorter length, mini-barcodes serve as reliable species identifiers [20,21]. We paired this approach with non-destructive whole-body DNA extractions to retain voucher specimens and increase throughput.

Our first objective was to test the utility of the COI mini-barcode for delimiting species-like operational taxonomic units (OTUs) for the spiders of Sulawesi. Our second objective was to use the resulting data to test for classic biodiversity patterns in Sulawesi, including the elevational diversity gradient, the presence of areas of endemism (AOEs), and in-situ diversification. We hypothesized that (1) species diversity will be highest at mid-elevations, as expected under the elevational diversity gradient [22]; (2) mountains will act as AOEs and contain distinct pools of diversity [23]; and (3) patterns that emerge at differing genetic clustering thresholds will reflect the processes involved in biodiversity formation. Under (3), more shared clusters between mountains within the same elevation band may indicate divergence associated with colonization by pre-adapted propagules and niche conservatism (Figure 2a), whereas more shared clusters within mountains between elevation bands may indicate divergence associated with niche lability (Figure 2b). Finally, both signals may appear together: within-mountain diversification between neighboring elevations, reflecting minor niche shifts, nested within coarser pre-adapted lineages that are shared between mountains and structured by elevation (Figure 2c).

Sequencing results

A total of 2,357 specimens were sequenced: 913 adult specimens that were individually barcoded and 1,444 juvenile specimens that were sequenced using a pooled approach. Following sequence denoising with DADA2 [24], decontamination with the decontam package [25], and further curation with LULU [26], 3,173,164 reads remained. These were grouped into 926 amplicon sequence variants (ASVs) and further clustered using swarmv2 [27] into 637 OTUs, hereafter referred to as sOTUs (swarm OTUs) to distinguish them from clusters produced by other methods. Juvenile samples were pooled by site and collection method and then processed as bulk community samples; each pool produced a mean and a median of 11 sOTUs. The inclusion of juveniles added 121 sOTUs not found in adult samples. Individual adult samples produced a mean of 4 and a median of 3 sOTUs per sample (Supplementary Materials, Figure 1), with 343 sOTUs not detected in juvenile samples.

Individually sequenced samples should yield a single representative COI barcode, but we identified multiple sOTUs per adult sample; multiple sequence variants may result from pseudogenes [28], mitochondrial heteroplasmy [29], index hopping [30], or contamination during library preparation. To determine the representative COI sequence, we morphologically identified 629 adult samples and compared the verified family identifications to the 425 confident sOTU family assignments. When selecting the most abundant sOTU in a sample, the family identity associated with that sOTU matched the morphological family identification for 353 (83.1%) adult specimens. Examining cases of mismatched taxonomic identities, we found that incorrect matches were associated with low read abundance in a sample, with a median of 48 reads for mismatches and 685 reads for matches (t = 15.6, p-value = 3.6e-48; Supplementary Figure 2). Accordingly, we filtered out sOTUs that occurred with 25 or fewer reads in a sample. Next, we examined the association between the number of sOTU reads found in a single sample and the total reads for that sOTU across the library. Correct taxonomic matches made up a median of 4.3% of total sOTU reads across the library, while incorrect matches made up only 0.3% (t = 14.72, p-value = 1.84e-43; Supplementary Figure 3). We removed an sOTU from a sample when the reads detected in that sample made up less than 0.5% of the sOTU's total reads across the library; this step reduced the mean and median number of sOTUs per adult sample to 2.194 and 1, respectively. For any remaining adult samples with more than one sOTU, we used the standard method of selecting the sOTU with the highest read count. Following filtering, we constructed a maximum likelihood phylogeny using IQTREE [31] with all remaining sequence variants (Supplementary Figure 4).
From this, we identified one highly divergent sequence variant found in 121 samples across all mountains and elevations; this was likely an erroneous sequence and was removed. After filtering, 789 adult samples remained, giving a final dataset of 2,233 individuals.
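The three read-based rules above can be expressed compactly. The Python below is a minimal sketch, not the authors' code; it assumes the read counts for one library are held in a dictionary keyed by (sample, sOTU), and that the 0.5% share is computed against unfiltered library-wide totals. The function name `assign_adult_sotu` is hypothetical.

```python
from collections import defaultdict

MIN_READS = 25             # sOTUs with 25 or fewer reads in a sample are dropped
MIN_LIBRARY_SHARE = 0.005  # ...as are sOTUs whose in-sample reads are < 0.5% of their library total


def assign_adult_sotu(counts):
    """Return the representative sOTU per sample, or None if every sOTU was filtered.

    counts: dict mapping (sample_id, sotu_id) -> read count within one library.
    """
    # Library-wide read total for each sOTU (assumed to be computed before filtering)
    sotu_totals = defaultdict(int)
    for (_, sotu), n in counts.items():
        sotu_totals[sotu] += n

    kept = defaultdict(dict)
    samples = set()
    for (sample, sotu), n in counts.items():
        samples.add(sample)
        if n <= MIN_READS:
            continue  # rule 1: too few reads in this sample
        if n / sotu_totals[sotu] < MIN_LIBRARY_SHARE:
            continue  # rule 2: tiny share of the sOTU's reads (possible index hopping or contamination)
        kept[sample][sotu] = n

    # Rule 3: keep the most abundant surviving sOTU in each sample
    return {
        sample: max(kept[sample], key=kept[sample].get) if kept[sample] else None
        for sample in sorted(samples)
    }
```

Applied in this order, the rules can leave a sample with no representative sOTU; the sketch returns `None` for such samples rather than guessing how they were handled.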

Patterns of diversity across elevational gradients and geographic locations

Alpha diversity

Following all filtering and cleaning procedures, 508 sOTUs consisting of 660 ASVs remained. Of these, 24 sOTUs (4.72%) were shared across all mountains (Supplementary Materials, Figure 5), and 98 consisted of multiple sequence variants, which can be considered haplotypes. The number of ASVs clustered within an sOTU was significantly associated with the number of mountains on which the sOTU was found (p-value = 2.7e-49): sOTUs found on one mountain contained a mean of 1.12 ASVs, compared to 3.38 ASVs for sOTUs found on all three mountains (Figure 3).

Alpha diversity was calculated using Hill numbers based on sOTUs. All Hill numbers decreased with elevation; the decrease was significant for q = 1 and q = 2, which both account for differences in abundance (Supplementary Materials, Figure 6). The highest diversity values were detected at mid-elevation sites between 1,000-1,500m (Figure 4). Alpha diversity also varied by mountain, again significantly for q = 1 and q = 2 (Supplementary Materials, Figure 7). Torompupu had the highest number of unique sOTUs, followed by Dako and Ilomata (Table 1). Although Dako had higher overall sOTU richness than Ilomata, Dako showed the lowest mean diversity when accounting for differences in the number of individuals collected (Supplementary Materials, Figure 7).
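For reference, the Hill number of order q for a community with relative sOTU abundances p_i is D_q = (Σ p_i^q)^(1/(1−q)); q = 0 gives richness, q = 1 is defined as the limit exp(−Σ p_i ln p_i), and q = 2 is the inverse Simpson index. The Python sketch below computes observed values only. It does not perform the rarefaction or extrapolation needed to compare sites with different numbers of individuals, the software used in the study is not named here, and the example counts are made up.

```python
import math


def hill_number(abundances, q):
    """Observed Hill number (effective number of sOTUs) of order q for one community."""
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    p = [a / total for a in counts]
    if q == 0:
        return float(len(p))  # sOTU richness
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # General case; q = 2 gives the inverse Simpson index
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


site = [10, 1]  # hypothetical numbers of individuals in two sOTUs
print([round(hill_number(site, q), 3) for q in (0, 1, 2)])
```

Because higher orders weight common sOTUs more heavily, D_0 ≥ D_1 ≥ D_2 for any community, which is why q = 1 and q = 2 respond more to abundance differences than raw richness does.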

Table 1. Number of clusters found on each mountain at the 95% and 97% clustering thresholds and with swarmv2 (sOTU). The row labeled “All” gives the total number of distinct clusters across all mountains.

Mountain	95%	97%	sOTU
Torompupu	119	202	257
Ilomata	99	143	166
Dako	107	183	220
All	145	332	508

Community analysis

Because few sOTUs were shared between mountains (4.72%, Figure 6), community composition was assessed using phylogenetic beta diversity based on the previously described maximum likelihood tree (Supplementary Materials, Figure 3). Communities clustered in ordination space by mountain and by elevation (Figure 5). PERMANOVA showed that both mountain and elevation were significant in explaining community composition: mountain was the strongest predictor (R² = 0.125, p-value = 0.001), followed by the interaction between elevation and mountain (R² = 0.10, p-value = 0.016). No differences in group dispersion were identified for mountain (F-value = 1.25, p-value = 0.308) or elevation band (F-value = 1.53, p-value = 0.235).
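To illustrate how a PERMANOVA R² and permutation p-value arise from a distance matrix, here is a one-factor sketch in Python. It assumes a precomputed symmetric distance matrix (for example, pairwise phylogenetic beta diversity between communities) and a single grouping factor. The study's model also included elevation and an elevation-by-mountain interaction, which this sketch does not handle, and dedicated software (such as `adonis2` in the R package vegan) would normally be used; the dispersion test is not included either.

```python
import math
import random


def _sum_sq(dist, members):
    """Sum of squared pairwise distances among `members`, divided by their number."""
    total = 0.0
    for i in range(len(members)):
        for j in range(i + 1, len(members)):
            total += dist[members[i]][members[j]] ** 2
    return total / len(members)


def _ss_total_within(dist, groups):
    """Total and within-group sums of squares for one labelling of the objects."""
    ss_total = _sum_sq(dist, list(range(len(groups))))
    ss_within = sum(
        _sum_sq(dist, [i for i, g in enumerate(groups) if g == label])
        for label in sorted(set(groups))
    )
    return ss_total, ss_within


def permanova(dist, groups, permutations=999, seed=1):
    """One-factor PERMANOVA; returns (pseudo-F, R2, permutation p-value)."""
    n, a = len(groups), len(set(groups))

    def pseudo_f(labels):
        ss_total, ss_within = _ss_total_within(dist, labels)
        return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

    ss_total, ss_within = _ss_total_within(dist, groups)
    f_obs = pseudo_f(groups)
    r2 = (ss_total - ss_within) / ss_total  # share of variation explained by the factor

    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # break any link between labels and distances
        f_perm = pseudo_f(labels)
        if f_perm >= f_obs or math.isclose(f_perm, f_obs, rel_tol=1e-9):
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)
```

The R² here is the among-group sum of squares divided by the total, the same quantity reported for mountain (0.125) in the text, although a multi-term model partitions it sequentially.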

Clustering patterns between mountains and between elevation bands

Patterns between mountains

To gain insight into the processes shaping genetic diversity within and between mountains, we compared different clustering levels. When clustered at 97%, which may be considered a more generous species-like grouping than sOTUs, 41 of 332 clusters (18.4%) were shared across all mountains, compared to 24 of 508 (4.7%) for sOTUs (Figure 6; Supplementary Materials, Figure 8). Clustering at 95%, which could be considered a genus-level grouping, greatly condenses sequences, producing 145 clusters in total. At this threshold, 42% of clusters (61 of 145) were shared across all mountains, and only 26 clusters were unique to single mountains (Figure 6; Supplementary Materials, Figure 8).
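The text does not say which tool produced the 97% and 95% clusters. As an illustration only, the sketch below uses greedy centroid clustering (sequences processed in decreasing abundance, each joining the first centroid within the identity threshold), which is one common way to build fixed-threshold clusters, and then finds the clusters recorded on every mountain. Identity is computed on aligned, equal-length sequences, and all function names are hypothetical.

```python
from collections import defaultdict


def identity(a, b):
    """Proportion of identical sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy centroid clustering; `seqs` must be sorted by decreasing abundance.

    Returns a dict mapping each sequence to the centroid of its cluster.
    """
    centroids, assignment = [], {}
    for s in seqs:
        for c in centroids:
            if identity(s, c) >= threshold:
                assignment[s] = c  # join the first (most abundant) matching centroid
                break
        else:
            centroids.append(s)  # no centroid close enough: start a new cluster
            assignment[s] = s
    return assignment


def shared_by_all(assignment, occurrences):
    """Centroids of clusters recorded on every mountain.

    occurrences: dict mapping sequence -> set of mountains where it was found.
    """
    mountains = defaultdict(set)
    for seq, where in occurrences.items():
        mountains[assignment[seq]] |= where
    everywhere = set().union(*occurrences.values())
    return {c for c, found in mountains.items() if found == everywhere}
```

Lowering the threshold merges sequences found on different mountains into one cluster, which is how the share of clusters common to all mountains rises from the sOTU level to the 95% level.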

Patterns across elevation groups

Dendrograms were used to visualize differences in community relationships between mountains and elevations when clustered at 97% versus 95% (Figure 7). At 97% clustering, sister communities were generally at neighboring elevations within the same mountain. The highest number of shared 97% clusters was between the lower-elevation sites on Torompupu, with a total of 40 clusters shared between sites at 500 – 1,000m and sites at 1,000 – 1,500m. The fewest shared clusters were between sites below 500m and sites above 2,000m, both within and between mountains. Dako sites below 500m were distant from all other communities, including the sites below 500m on Ilomata, which were more similar to higher-elevation communities on Ilomata. At 97%, branch lengths between sister communities were long, indicating few shared clusters even between the most similar communities.

When clustering more coarsely at 95%, sites were instead grouped by elevation and showed shorter branch lengths, reflecting the increase in shared clusters between communities at this coarser threshold. The mid-elevation sites on Torompupu (500 – 1,000m and 1,000 – 1,500m) remained most similar to one another and shared the most clusters. These sites clustered with the 500 – 1,000m sites of Dako and Ilomata, a pattern also seen in the 97% dendrogram. No other sites within a mountain were sister to one another at the 95% threshold. At this threshold, the Ilomata sites between 1,000 – 1,500m, rather than the Dako communities below 500m, emerged as the most dissimilar from all other sites; the Dako lowland communities instead grouped with the Ilomata sites below 500m.
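The distance measure and linkage behind the dendrograms are not stated above. One plausible construction, shown here purely as an assumption, is Jaccard distance on shared cluster membership followed by average-linkage (UPGMA) clustering. Communities that share more clusters then join lower in the tree, as the 95% dendrogram's shorter branches reflect. The function names and example communities are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of cluster IDs."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma(communities):
    """Average-linkage (UPGMA) tree from Jaccard distances.

    communities: dict mapping community name -> set of cluster IDs.
    Returns nested tuples; child order within a tuple carries no meaning.
    """
    sizes = {name: 1 for name in communities}  # active node -> number of leaves
    dist = {}
    names = list(communities)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            dist[frozenset((x, y))] = jaccard_distance(communities[x], communities[y])

    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of active nodes
        x, y = tuple(pair)
        nx, ny = sizes.pop(x), sizes.pop(y)
        del dist[pair]
        merged = (x, y)
        for other in sizes:
            # Distance to the new node is the size-weighted mean of the old distances
            dxo = dist.pop(frozenset((x, other)))
            dyo = dist.pop(frozenset((y, other)))
            dist[frozenset((merged, other))] = (dxo * nx + dyo * ny) / (nx + ny)
        sizes[merged] = nx + ny
    return next(iter(sizes))


tree = upgma({
    "Torompupu_500": {1, 2, 3, 4},
    "Torompupu_1000": {1, 2, 3, 5},
    "Dako_500": {7, 8},
    "Dako_1000": {7, 8, 9},
})
print(tree)
```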

Biodiversity discovery

Only 65 of 508 sOTUs returned BLAST matches above 95%, leaving most family identities uncertain. Only five sOTUs produced a BLAST match above 99%, suitable for species assignment; all were documented in specimens collected from lowland sites below 500m. These were identified as Steatoda cingulata, Leucauge decorata, Leucauge celebesiana, Cyclosa bifida and Herennia sp. In total, 26 spider families were found, including three not currently documented in Sulawesi according to the World Spider Catalog: Theridiosomatidae, Hersiliidae, and Scytodidae (Supplementary Materials, Figure 9). Scytodidae was detected only in the juvenile samples, but morphological assessment of the sample following sequencing confirmed that scytodids were indeed present in the bulk juvenile samples.
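The BLAST thresholds above (matches above 99% for species, above 95% for confident family assignment) can be applied to tabular BLAST output. The sketch assumes the default 12-column `-outfmt 6` layout, in which column 3 is percent identity and column 12 is bit score, and keeps the highest-scoring hit per sOTU. Mapping subject IDs to taxonomy is left out, and percent identity over a short alignment can overstate similarity, so alignment length deserves a check in practice. The function name `classify_blast` is hypothetical.

```python
SPECIES_MIN = 99.0  # % identity above which a species name was considered reliable
FAMILY_MIN = 95.0   # % identity above which the family was considered reliable


def classify_blast(lines):
    """Best hit per query from BLAST tabular output (-outfmt 6, default 12 columns).

    Columns used: 1 qseqid, 2 sseqid, 3 pident, 12 bitscore.
    Returns dict qseqid -> (confidence level, best subject ID, percent identity).
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject, pident, bitscore = f[0], f[1], float(f[2]), float(f[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)

    calls = {}
    for query, (subject, pident, _) in best.items():
        if pident > SPECIES_MIN:
            level = "species"
        elif pident > FAMILY_MIN:
            level = "family"
        else:
            level = "unassigned"  # family can instead be inferred from clade placement
        calls[query] = (level, subject, pident)
    return calls
```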

Using a mini-barcode region, we demonstrated the remarkably high diversity of Sulawesi spider taxa and showed that (1) species diversity, as estimated using sOTUs, is structured along elevational gradients; (2) mountains support distinct pools of diversity; and (3) extensive diversification has occurred within Sulawesi, with lineages structured by elevation, reflecting both the dispersal of pre-adapted lineages between mountains and local diversification between sites at neighboring elevations, which indicates limited niche shifts. The mini-barcode region successfully produced data across a diverse taxonomic group and allowed us to assess patterns of biodiversity and community structure without the immediate need for morphological identifications. Including juveniles also increased biodiversity discovery. By morphologically assessing a subset of the samples, we were able to inform taxonomic identities for juveniles and other sequences and to establish filtering “rules” that produced a more accurate data set.

The elevational diversity gradient

Mountains are known to be biodiversity hotspots, owing to complex topography that creates habitat diversity and distinct climatic zones [32,33]. In line with this, we found high elevational turnover within the mountains in our study. Sites between 500 – 1,000m had the highest richness at all clustering thresholds, indicating the highest diversity across different taxonomic levels. This reflects the well-documented mid-elevation peak in diversity known as the elevational diversity gradient [33,34]. The highest-elevation sites had the lowest sOTU richness and also shared the fewest sOTUs with other elevational groupings. Sites below 500m were sampled only on Dako and Ilomata; despite their similar elevation, these sites sit far from one another in ordination space and share similarities only when clustered at a 95% threshold. This may reflect the roles of environmental filtering and disturbance in community composition: the lowland habitat on Dako has been converted to mixed agroforestry (Supplementary Materials, Figure 10), while the lowland sites of Ilomata, located near a protected reserve, contain some of the only intact lowland forest on Sulawesi.

Areas of endemism

Distinct species or haplotypes across Sulawesi have been documented for many vertebrate lineages, leading to the definition of areas of endemism across the island. These were first defined using the macaques of Sulawesi [16], and more recent work directly incorporating paleogeology has further supported these AOEs [15]. A similar pattern was detectable for spiders using the mini-barcode region: each mountain acted as a distinct AOE, with mountain being the strongest grouping variable (Figure 5). We detected distinct sOTUs on each mountain and, where an sOTU was shared between mountains, distinct sequence variants belonging to each mountain (Figure 4). This indicates population divergence between mountains for shared species. The diversity on each mountain remained largely unique until sequences were clustered at a 95% threshold, which may be considered a genus- or subfamily-like grouping level.

Patterns of diversification

Using different clustering thresholds, we identified extensive in-situ diversification within Sulawesi and between the three mountains in our study, reflected in a low number of shared 97% clusters between communities but a high number of shared 95% clusters (Table 1, Figure 6). The higher similarity of communities between mountains at 95% indicates the dispersal and diversification of lineages between the former island fragments, rather than individual lineages colonizing and remaining isolated on each paleoisland. The higher similarity of communities within each mountain paired with the longer branches at 97% shows that, within these higher lineages, local diversification has occurred. Each mountain did contain unique 95% clusters (Figure 6); this is likely a result of undersampling as our collection efforts were not sufficient to document all spider diversity. Unique 95% clusters, however, could also be examples of taxa that colonized and remained highly isolated within a paleoisland region.

Previous work on mountain diversity has found a combination of “centric” and “eccentric” taxa: eccentric endemics are sister to taxa in similar environments outside the mountain, whereas centric taxa are sister to locally occurring species [35]. Most diversity appears to be derived from pre-adapted lineages, and this is reflected at the 95% clustering threshold: the same elevation bands on different mountains share more clusters and are most closely situated to one another on the dendrogram (Figure 7). This connectivity points to a pattern associated with niche conservatism rather than niche lability. At 97%, the most shared clusters were between neighboring elevations within a mountain, and these communities were sister to one another in the dendrogram (Figure 7). However, communities in distant elevational bands did not cluster by mountain, again indicating that major niche shifts were not prevalent and that the ancestral niche was crucial in how endemism was formed.

Biodiversity documentation

The curated sequence data produced by the mini-barcode approach greatly expanded our knowledge of Sulawesi spider biodiversity. We defined 508 sOTUs using swarmv2, and juvenile samples added 121 unique sOTUs, confirming the usefulness of juvenile specimens in biodiversity discovery. This high number of sOTUs expands the currently known spider diversity of Sulawesi: only 278 species are known from the island, of which 197 are endemic [17].

Similarly impressive was the high diversity of families we detected in this relatively small sampling. We detected 26 families, three of which are currently not recorded from Sulawesi: Theridiosomatidae, Scytodidae, and Hersiliidae (Supplementary Materials, Figure 9). Only three theridiosomatid species are known from Indonesia as a whole, all documented from Sumatra; we detected 16 theridiosomatid sOTUs on Sulawesi alone. Hersiliids are well documented across Indonesia and were therefore expected in Sulawesi, but our detection still represents the first official record. Only three endemic scytodid species are known in Indonesia: two species of Scytodes from the Aru Islands [36] and one new species of Dictis from Sumatra [37].

We could classify very few sequences to species using BLAST; only six sOTUs produced matches above 99%, identified as Steatoda cingulata, Cyclosa bifida, Leucauge decorata, Leucauge celebesiana, Herennia sp. and Neoscona sp. Steatoda cingulata is known from Java, Sumatra, and other parts of South and Southeast Asia, but not from Sulawesi. Cyclosa bifida is documented in the Philippines and New Guinea but, again, not Sulawesi. Only two Cyclosa species are recorded from Indonesia, Cyclosa caligata and Cyclosa seriata, endemic to Sumatra and Java, respectively. No members of Cyclosa or Steatoda are currently known from Sulawesi. The other species detected (Leucauge decorata, Leucauge celebesiana, Herennia sp. and Neoscona sp.) are known from Sulawesi. The very small number of BLAST matches reflects the paucity of information available not only for the spiders of Sulawesi but for spiders of Indonesia and the Indomalayan region more broadly, and again indicates the extent of biodiversity left to be discovered.

In the face of a biodiversity crisis, rapid assessment and monitoring of community-level diversity is essential to maximize conservation efforts. The use of a mini-barcode to sequence whole communities of spiders allowed us to both explore patterns of biodiversity and build knowledge about spiders in a biodiversity hotspot. We found high species turnover both between mountains, supporting the presence of areas of endemism, and high turnover across elevations, showing mountains on Sulawesi serve as biodiversity hotspots themselves. Diversity peaks on each mountain reflected the classic pattern of an elevational diversity gradient. Clustering at different thresholds allowed us to detect differences across taxonomic-like levels and tease out patterns consistent with specific processes of diversification. We were able to observe shared genus-like groups between mountains by using a 95% clustering threshold while also observing extensive in-situ diversification, resulting in many distinct 97% clusters. Comparison of different clustering thresholds provided a simple way to begin documenting evolutionary patterns, with evidence for niche conservatism structuring spider biodiversity between mountains.

Adopting an approach such as ours can begin to build important biodiversity knowledge in understudied localities in a cost-effective and time-efficient manner without the need for specialized taxonomic expertise upfront. This can build a foundation that guides future studies by identifying highly diverse or intriguing lineages. As studies grow and methods improve, the problems noted in this paper, such as low BLAST matches and multiple sequence variants per individual, will continue to be reduced. While the short COI region used is not suitable for robust phylogenetic or biogeographic analyses, it was successful in quantifying patterns of biodiversity and providing insights into the processes that may have shaped biodiversity while allowing us to build foundational knowledge in a hyper-diverse group located in an understudied system.

Study region

Three mountains in Sulawesi (Gunung Dako, Gunung Torompupu, and Gunung Ilomata; Figure 1) were sampled in coordination with a biotic survey of the Sulawesi mountains. This large collaborative project involved the National Research and Innovation Agency (BRIN; formerly the Indonesian Institute of Sciences, LIPI), the Museum Zoologicum Bogoriense (MZB) in Bogor, Indonesia, and Tadulako University in Palu, Indonesia. Permits were granted by LIPI and RISTEKDIKTI (now BRIN) (23//SI/MZB/VIII/2018). Gunung Dako, located on the northern peninsula, has a summit at 2,260m; most sites were located in the Gunung Dako Forest Reserve, but low-elevation sites (< 500m) were located outside the reserve in areas that have been converted to mixed agroforestry (Supplementary Materials, Figure 10). Gunung Torompupu, located in the central core, has a summit at 2,495m; this mountain borders Lore Lindu National Park and has more intact lowland forest than Gunung Dako. The mountain we call Gunung Ilomata is also located on the northern peninsula, east of Gunung Dako; it lies near Bogani Nani Wartabone National Park and is part of a larger mountain range. Higher elevations deeper in the park could not be accessed, so the highest elevation sampled was 1,211m. The sites below 500m were in largely intact forest, in contrast to the converted lowlands of Dako.

Sample collection

Sites of 20 × 20m were chosen at 400m intervals across the elevation gradient on each mountain, allowing thorough collection from distinct habitat types (Supplementary Materials, Table 2; Supplementary Materials, Figure 10). Elevation bands were defined as < 500m, 500-1,000m, 1,000-1,500m, 1,500-2,000m and > 2,000m. Sites below 1,500m have lowland forest habitats, with transitional forest at the higher of these elevations and greater anthropogenic disturbance at the lower sites. Above 1,500m, the habitat transitions to mossy forest. Two sites were chosen in each elevation band when possible; however, the number of sites per band and the elevational range we could sample depended on local field conditions, accessibility, and time limitations.
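A small helper makes the elevation-band scheme explicit. Which band a site lying exactly on a boundary belongs to is not stated above; this sketch assumes boundary values go to the higher band, and the names are hypothetical.

```python
# Upper bounds (m) of the elevation bands defined above; the top band is open-ended
BANDS = [
    (500, "< 500m"),
    (1000, "500-1,000m"),
    (1500, "1,000-1,500m"),
    (2000, "1,500-2,000m"),
]


def elevation_band(elevation_m):
    """Assign a site elevation in metres to its band (boundary values go to the higher band)."""
    for upper, label in BANDS:
        if elevation_m < upper:
            return label
    return "> 2,000m"
```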

Community sampling followed standardized methods and was conducted by a single collector. Each site was sampled using one hour of hand collection during the day, two hours of hand collection at night, leaf litter sorting of a standardized volume of pre-sifted soil, timed beat sampling (20 seconds of beating on each of seven distinct plant types), two pitfall traps, and one hour of targeted web documentation with subsequent specimen collection. No plant material was collected in this study. Specimens were stored in 95% EtOH and, after import to the USA, at -20°C until DNA extraction, after which they were transferred to 70% EtOH and stored at room temperature.

Molecular Procedures

Juvenile specimens from each collection unit (sampling method and site) were combined into pools, while adults were grouped into morphospecies within each unit. A female and/or male of each morphospecies was sequenced individually for each site. DNA was extracted from adult specimens either from pulverized leg tissue or non-destructively by soaking the entire specimen in a solution of cell lysis buffer and Proteinase K for three hours, with volumes adjusted to specimen size. Pooled juvenile samples were extracted non-destructively by soaking in 600 µL of cell lysis buffer and Proteinase K for three hours. Incubation was limited to three hours to preserve morphological characters (Supplementary Materials, Figure 11). Following lysis, DNA was isolated following the Qiagen PureGene protocol (Qiagen, Hilden, Germany) and eluted in 20µL of Elution Buffer. To increase throughput, all extractions were performed in 96-well plates. Extraction success was spot-checked across each plate via NanoDrop (ThermoFisher Scientific, Waltham, MA, USA).

Amplification was performed using the Qiagen multiplex kit (Qiagen, Hilden, Germany). The mini-barcode region was amplified using the forward primer LCO1490 (5′‐GGTCAACAAATCATAAAGATATTGG‐3′) [38] and the reverse primer COI-CFMRb (5′‐GGNACTAATCAATTHCCAAATCC‐3′) [39], each of which carried an additional TruSeq tail for Illumina flow cell binding. Each PCR reaction consisted of 5µL of Qiagen PCR MasterMix (Qiagen, Hilden, Germany), 3µL of H₂O, 0.5µL of each primer, and 1µL of template DNA. Amplification used an annealing temperature of 46°C for 30 cycles. A negative PCR control consisting of Qiagen PCR MasterMix, H₂O, and both primers was included on each plate. PCR products were visualized by gel electrophoresis on a 3% agarose gel run at 140V. For specimens lacking bands, PCR products and the associated DNA were checked for quality using NanoDrop and Qubit and re-amplified when deemed appropriate.
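The reaction recipe above sums to 10 µL per well (9 µL of shared mix plus 1 µL of template). The sketch below scales the shared components to a plate; the 10% overage is an assumption added to cover pipetting loss, not part of the described protocol, and the negative control counts as one reaction. Reagent labels and function names are illustrative.

```python
# Per-reaction volumes (µL) from the protocol above
PER_REACTION_UL = {
    "Qiagen PCR MasterMix": 5.0,
    "H2O": 3.0,
    "forward primer (LCO1490 + TruSeq tail)": 0.5,
    "reverse primer (COI-CFMRb + TruSeq tail)": 0.5,
}
TEMPLATE_UL = 1.0  # template DNA, added to each well separately


def master_mix(n_reactions, overage=0.10):
    """Reagent volumes (µL) for a shared master mix covering n_reactions wells.

    `overage` is an assumed pipetting surplus, not a value from the protocol.
    """
    scale = n_reactions * (1 + overage)
    return {reagent: round(vol * scale, 1) for reagent, vol in PER_REACTION_UL.items()}


# A full 96-well plate: 95 specimens plus the negative control
for reagent, vol in master_mix(96).items():
    print(f"{reagent}: {vol} µL")
```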

A dual indexing strategy was implemented using a second round of PCR to attach 8bp indexes to the TruSeq tails. The indexing PCR used an annealing temperature of 55°C for 6 cycles. PCR products were again visualized on a 3% agarose gel, run at a lower voltage (100V) to resolve fragment length clearly and confirm the addition of indexes to each amplicon. Final libraries were constructed by pooling PCR products in volumes of 1µL, 2µL, or 3µL, adjusted according to band strength. Separate libraries were constructed for individually extracted adult specimens and bulk-extracted juvenile samples. Final library preparation was conducted at QB3 Genomics (QB3, Berkeley, CA, USA). Libraries were quantified using both qPCR and a Qubit Fluorometer (ThermoFisher Scientific, Waltham, MA, USA). Size selection was performed using Pippin Prep (Sage Science Inc, Beverly, MA, USA), followed by Fragment Analyzer (Agilent, Santa Clara, CA, USA) to confirm amplicon sizes. Pools were sequenced on an Illumina MiSeq v3 300PE (Illumina Inc, San Diego, CA, USA) across two lanes, alongside other libraries unrelated to this project.

Bioinformatics

Samples were demultiplexed by QB3 staff (QB3 Genomics, Berkeley, CA, RRID:SCR_022170) using custom methods. Primers were removed with Cutadapt in paired-end mode [40]. Following primer removal, sequence data were denoised using DADA2 as implemented in R [24]; DADA2 learns error models from the sequencing data itself and uses them to infer amplicon sequence variants that retain fine-scale nucleotide variation. Following DADA2 processing, sequences were filtered to the expected length of 181bp, and chimeras were removed using the removeBimeraDenovo function of the DADA2 package. The isContaminant function of the decontam package was used to remove potential widespread contamination based on sequences detected in negative controls [25]. This method compares the prevalence of each sequence in negative controls versus true samples to identify contaminants. The threshold parameter was set to 0.5, which flags as contaminants all sequences that are more prevalent in negative controls than in true samples. The LULU curation algorithm was then applied to the resulting sequence data to remove additional erroneous sequences [26]. Each of the above steps was performed separately for each sequencing lane to account for run-specific errors, and the results were subsequently combined.
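The length filter and the prevalence rule behind the decontam threshold of 0.5 can be sketched as follows. This is a simplified stand-in, not decontam itself: `isContaminant` derives a score from a statistical comparison of presence/absence in controls versus samples, whereas this sketch compares raw detection rates directly, approximating the behavior the 0.5 threshold is described as producing. The function names are hypothetical.

```python
EXPECTED_LENGTH = 181  # bp, expected mini-barcode length after primer removal


def length_filter(asvs):
    """Keep ASVs (id -> sequence) whose length equals the expected amplicon length."""
    return {asv: seq for asv, seq in asvs.items() if len(seq) == EXPECTED_LENGTH}


def prevalence_contaminants(presence, is_control):
    """Flag ASVs detected in a larger fraction of negative controls than of true samples.

    presence: dict ASV id -> set of sample ids in which the ASV was detected.
    is_control: dict sample id -> True for negative controls, False otherwise.
    """
    controls = {s for s, neg in is_control.items() if neg}
    samples = {s for s, neg in is_control.items() if not neg}
    flagged = set()
    for asv, detected in presence.items():
        prev_controls = len(detected & controls) / len(controls)
        prev_samples = len(detected & samples) / len(samples)
        if prev_controls > prev_samples:
            flagged.add(asv)
    return flagged
```

Running such steps per sequencing lane, as described above, keeps lane-specific contaminants from being diluted by samples from the other lane.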

The retained sequences were aligned in Geneious Prime v. 2022.0.2 and manually inspected for reading-frame interruptions and stop codons indicative of pseudogenes; such sequences were removed. Taxonomy was then assigned using megablast, and any non-spider sequences were removed. A maximum likelihood phylogeny was inferred with IQ-TREE from the alignment of spider sequences. The best-fit substitution model was selected using ModelFinder Plus [41]. This was the general time reversible model with empirical base frequencies and the FreeRate model of rate heterogeneity with seven rate categories (GTR+F+R7). A total of 440 iterations were run to reach parameter optimization. Sequences with low BLAST matches, and therefore no reliable family identification, were assigned to a family based on their clade placement in the tree. Targeted morphological identifications were performed to inform the taxonomy of uncertain clades.
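The pseudogene screen can be illustrated with a short Python sketch that translates each ASV and checks for stop codons. This sketch assumes two things. First, translation uses NCBI genetic code 5 (invertebrate mitochondrial), in which TGA codes for tryptophan and only TAA and TAG are stops. Second, the reading-frame offset is known from the alignment. The study itself performed this check manually in Geneious Prime. The sequences below are toy examples.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order
# with the first base varying slowest, matching itertools.product.
TABLE5_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), TABLE5_AA)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq starting at offset `frame`; unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def has_stop_codon(seq: str, frame: int = 0) -> bool:
    """True if the translation in the given frame contains a stop ('*')."""
    return "*" in translate(seq, frame)


def screen(asvs: Dict[str, str], frame: int = 0) -> Dict[str, str]:
    """Keep ASVs (id -> sequence) without stop codons in the given frame."""
    return {k: s for k, s in asvs.items() if not has_stop_codon(s, frame)}


print(translate("ATGTGAAAA"))  # MWK: TGA is Trp under code 5, not a stop
print(screen({"ok": "ATGTGAAAA", "numt": "ATGTAGAAA"}))  # drops the TAG-containing ASV
```

Using the mitochondrial code matters here. Under the standard code, TGA would be read as a stop, and genuine COI sequences would be wrongly discarded as pseudogenes.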

Individual sequence variants were clustered into operational taxonomic units using Swarm v2 [27] with the --fastidious option. Unlike threshold-based clustering methods, Swarm does not rely on a universal similarity threshold. Instead, it uses an iterative single-linkage algorithm with a small local linking distance to build units. The fastidious option reduces under-grouping by grafting low-abundance OTUs onto larger "parent" OTUs. This method was used to create species-like units (hereafter sOTUs), while ASVs, with their fine-scale nucleotide variability, were treated as haplotypes. Despite the multiple well-supported filtering steps we applied, many samples still contained more than one sOTU. Multiple low-abundance variants within a single sample are a well-known problem in next-generation sequencing (NGS) approaches. To derive quantitatively informed cutoffs for universal filtering, we compared our morphological family assignments with the family identity of each sOTU found in the identified sample. We then assessed which variables best predicted a matching sOTU classification. Two variables showed strong relationships with correct versus incorrect identities (Supplementary Materials, Figures 1 and 2). These were the number of reads for an sOTU in a sample, and the proportion of that sOTU's total reads across all libraries that occurred in that sample. Cutoffs were chosen based on the interquartile ranges of these variables, calculated separately for incorrect and correct family matches. The cutoffs were then applied to all samples, including juvenile pools.
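Swarm's iterative single-linkage approach can be illustrated with a simplified Python sketch. Each OTU grows from the most abundant unclustered sequence. The OTU repeatedly recruits unclustered sequences within d edits (Swarm's default is d = 1) of any member. As in Swarm's default behaviour, a link is made only towards a sequence that is not more abundant than the member it links from, so OTUs break at abundance valleys. This is an approximation for illustration only. It uses a naive edit-distance search rather than Swarm's optimised algorithms, and it omits the --fastidious grafting step used in the study. The toy sequences and abundances are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def swarm_like(abundance: Dict[str, int], d: int = 1) -> List[List[str]]:
    """Cluster sequences by iterative single linkage with a local distance d."""
    unclustered = set(abundance)
    # Seeds are visited from most to least abundant (ties broken alphabetically).
    order = sorted(abundance, key=lambda s: (-abundance[s], s))
    otus: List[List[str]] = []
    for seed in order:
        if seed not in unclustered:
            continue
        unclustered.discard(seed)
        otu, frontier = [seed], [seed]
        while frontier:  # expand layer by layer until no new members are found
            next_frontier = []
            for member in frontier:
                for cand in sorted(unclustered):  # iterate over a snapshot
                    if (abundance[cand] <= abundance[member]
                            and edit_distance(member, cand) <= d):
                        unclustered.discard(cand)
                        otu.append(cand)
                        next_frontier.append(cand)
            frontier = next_frontier
        otus.append(otu)
    return otus


# AAAT (abundance 1) sits in an abundance valley between AAAA and AATT,
# so the chain does not continue into the more abundant AATT.
print(swarm_like({"AAAA": 10, "AAAT": 1, "AATT": 9}))
```

Because linking is local, two sequences several edits apart can still share an OTU if intermediate variants connect them. No single global percentage cutoff is applied.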

Statistical Analysis

To test whether sOTUs shared across mountains were more likely to consist of multiple haplotypes (ASVs), we performed Pearson's chi-squared test. To examine alpha diversity, we used three Hill numbers. These were q = 0, which represents pure richness; q = 1, the exponential of Shannon's entropy index; and q = 2, the inverse of Simpson's concentration index. To examine the relationship between alpha diversity and elevation, we constructed three linear models, one for each Hill number, with elevation as the predictor. To examine beta diversity, we calculated phylogenetic beta diversity using the package BAT [42], with the maximum likelihood tree as input. Rarefaction was set to 20 to account for differences in abundance. We applied rarefaction because sites at similar elevations but with different sample sizes gave skewed results. On inspection, this skew appeared to be an artifact of abundance differences rather than a genuine lack of similarity. The resulting mean beta diversity values were used in subsequent analyses.
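The three Hill numbers and the per-number linear models can be sketched in Python as follows. For relative abundances p_i, the Hill number of order q is (Σ p_i^q)^(1/(1−q)), with the q = 1 case defined as the limit exp(−Σ p_i ln p_i). The study's R code may compute these differently. The site elevations and abundance vectors below are hypothetical, and the choice of abundance unit (e.g. individuals per sOTU) is an assumption.

```python
import math
from typing import Sequence, Tuple


def hill_number(abundances: Sequence[float], q: float) -> float:
    """Hill number of order q from raw abundances (zero entries ignored)."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 0:
        return float(len(p))                                  # richness
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))  # exp(Shannon entropy)
    return sum(pi ** q for pi in p) ** (1 / (1 - q))          # q = 2: inverse Simpson


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return intercept, slope, r2


# Hypothetical sites: elevation (m) -> sOTU abundances observed there.
sites = {500: [5, 3, 1], 1000: [4, 4, 2, 2, 1], 1500: [3, 3, 2, 1], 2000: [6, 1]}
elevations = sorted(sites)
for q in (0, 1, 2):
    diversity = [hill_number(sites[e], q) for e in elevations]
    intercept, slope, r2 = linear_fit(elevations, diversity)
    print(f"q={q}: slope={slope:.5f} per m, R^2={r2:.2f}")
```

For a perfectly even community, all three orders give the same value. As dominance increases, q = 1 and q = 2 fall progressively further below richness. This is why the three models can respond differently to elevation.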

We used non-metric multidimensional scaling (NMDS) with k = 2 dimensions, based on the mean total beta diversity, to visualize group differences. We then tested for significant compositional differences among the categorical groupings of mountain and elevation using PERMANOVA, as implemented in the adonis2 function of vegan [43]. Because differences in group dispersions can make PERMANOVA unreliable, we additionally tested for dispersion differences with PERMDISP using the function betadisper (Supplementary Materials, Table 2). Coarser clustering was performed with the function otu in the package kmer [44] at similarity thresholds of 0.97 and 0.95 to represent broader taxonomic units. Throughout this paper, these are referred to as 97% clusters and 95% clusters, while the Swarm v2 species-like clustering results are referred to as sOTUs. Venn diagrams were constructed using the package ggvenn [45].
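The logic of a one-factor PERMANOVA on a dissimilarity matrix can be sketched in Python. The sketch computes the standard pseudo-F from partitioned sums of squared distances and a permutation p-value from shuffled group labels. It is not vegan's adonis2, which also supports multiple terms, strata, and sequential or marginal sums of squares. The toy distances and group labels are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pseudo_f(d: List[List[float]], groups: Sequence[str]) -> float:
    """One-way PERMANOVA pseudo-F from a symmetric distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    g = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity within each group, divided by group size.
    ss_within = 0.0
    for lab in labels:
        idx = [i for i, x in enumerate(groups) if x == lab]
        ss_within += sum(d[i][j] ** 2
                         for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (g - 1)) / (ss_within / (n - g))


def permanova(d: List[List[float]], groups: Sequence[str],
              permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(d, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # break any link between labels and composition
        if pseudo_f(d, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Four hypothetical sites on a one-dimensional gradient, two per group.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in positions] for a in positions]
f_obs, p_value = permanova(dist, ["A", "A", "B", "B"])
print(f_obs, p_value)  # pseudo-F is 200.0; p is near 1/3 with only four sites
```

The toy example shows why dispersion is checked separately. The permutation test only asks whether labels matter for the distance partition, so a large pseudo-F can arise from differing spread as well as from shifted centroids. Even perfectly separated groups cannot reach a small p-value when only three distinct groupings of four sites exist.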

Analyses were run in R v4.2.2 and RStudio v2022.12.0. Figures were made using ggplot2 and ggpubr, and tables were made using stargazer. Code was written using the tidyverse. Further figure edits were made in Inkscape v1.2.

Acknowledgments

We thank the National Science Foundation for the funding that enabled this project (NSF DEB 1457845). We are grateful for the support of LIPI and BRIN and for the permits granted. This project was conducted in close collaboration with many scientists at the Museum Zoologicum Bogoriense and at UNTAD. We would like to thank Anang Achmadi, Mohammad Irham, and Dede Avandi in particular for their support. We also express our deep gratitude to the numerous local field guides and coordinators who were instrumental in the success of this project. We thank undergraduate research assistants Sarah Helmueller and Emilee Easter for their assistance in sorting and plating specimens. Lastly, we thank the other principal investigators who made the project possible, especially lead PI Jim McGuire and co-PI Rauri Bowie.

Data Availability Statement

All relevant data and associated code are hosted on GitHub at https://github.com/ajholmqu/COI-MiniBarcode. Associated sequences will be deposited in GenBank upon acceptance for publication, and accession numbers will be provided in the Supplementary Materials.

Competing Interests

The authors declare no competing financial interests.

Author Contributions

A.H. and R.G. conceived the project and obtained funding. A.H. collected and maintained specimens, conducted molecular procedures, processed sequencing data, performed statistical analyses, and wrote the manuscript. P.L. and F.F. obtained permits for the project, organized logistics for field seasons, and assisted in translating the abstract. All authors were involved in revising the manuscript and approved the final version.

Cowie, R. H., Bouchet, P. & Fontaine, B. The Sixth Mass Extinction: fact, fiction or speculation? Biol. Rev. 97, 640–663 (2022).
Kolbert, E. The Sixth Extinction: An Unnatural History. (A&C Black, 2014).
Joppa, L. N., Roberts, D. L., Myers, N. & Pimm, S. L. Biodiversity hotspots house most undiscovered plant species. Proc. Natl. Acad. Sci. 108, 13171–13176 (2011).
Riedel, A. & Narakusumo, R. P. One hundred and three new species of Trigonopterus weevils from Sulawesi. ZooKeys 828, 1–153 (2019).
Scheffrahn, R. H. et al. Taxonomy, Distribution, and Notes on the Termites (Isoptera: Kalotermitidae, Rhinotermitidae, Termitidae) of Puerto Rico and the U.S. Virgin Islands. Ann. Entomol. Soc. Am. 96, 181–201 (2003).
Tixier, M.-S. & Kreiter, S. Arthropods in biodiversity hotspots: the case of the Phytoseiidae (Acari: Mesostigmata). Biodivers. Conserv. 18, 507–527 (2009).
Bellard, C. et al. Vulnerability of biodiversity hotspots to global change. Glob. Ecol. Biogeogr. 23, 1376–1386 (2014).
Trew, B. T. & Maclean, I. M. D. Vulnerability of global biodiversity hotspots to climate change. Glob. Ecol. Biogeogr. 30, 768–783 (2021).
Costello, M. J., May, R. M. & Stork, N. E. Can We Name Earth’s Species Before They Go Extinct? Science 339, 413–416 (2013).
de Kerdrel, G. A., Andersen, J. C., Kennedy, S. R., Gillespie, R. & Krehenwinkel, H. Rapid and cost-effective generation of single specimen multilocus barcoding data from whole arthropod communities by multiple levels of multiplexing. Sci. Rep. 10, 78 (2020).
Shokralla, S. et al. Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Sci. Rep. 5, 9687 (2015).
Valentini, A., Pompanon, F. & Taberlet, P. DNA barcoding for ecologists. Trends Ecol. Evol. 24, 110–117 (2009).
Hall, R. Southeast Asia: New Views of the Geology of the Malay Archipelago. Annu. Rev. Earth Planet. Sci. 45, 331–358 (2017).
Nugraha, A. M. S. & Hall, R. Late Cenozoic palaeogeography of Sulawesi, Indonesia. Palaeogeogr. Palaeoclimatol. Palaeoecol. 490, 191–209 (2018).
McGuire, J. A. et al. Species Delimitation, Phylogenomics, and Biogeography of Sulawesi Flying Lizards: A Diversification History Complicated by Ancient Hybridization, Cryptic Species, and Arrested Speciation. Syst. Biol. syad020 (2023) doi:10.1093/sysbio/syad020.
Evans, B. J., Supriatna, J., Andayani, N. & Melnick, D. J. Diversification of Sulawesi Macaque Monkeys: Decoupled Evolution of Mitochondrial and Autosomal DNA. Evolution 57, 1931–1946 (2003).
World Spider Catalog. Natural History Museum Bern https://doi.org/10.24436/2 (2023).
Gerlach, J., Samways, M. & Pryke, J. Terrestrial invertebrates as bioindicators: an overview of available taxonomic groups. J. Insect Conserv. 17, 831–850 (2013).
Marc, P., Canard, A. & Ysnel, F. Spiders (Araneae) useful for pest limitation and bioindication. Agric. Ecosyst. Environ. 74, 229–273 (1999).
Hsieh, C.-H., Huang, C.-G., Wu, W.-J. & Wang, H.-Y. A rapid insect species identification system using mini-barcode pyrosequencing. Pest Manag. Sci. 76, 1222–1227 (2020).
Meusnier, I. et al. A universal DNA mini-barcode for biodiversity analysis. BMC Genomics 9, 214 (2008).
Sanders, N. J. & Rahbek, C. The patterns and causes of elevational diversity gradients. Ecography 35, 1–3 (2012).
Rahbek, C. et al. Humboldt’s enigma: What causes global patterns of mountain biodiversity? Science 365, 1108–1113 (2019).
Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
Davis, N. M., Proctor, D. M., Holmes, S. P., Relman, D. A. & Callahan, B. J. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome 6, 226 (2018).
Frøslev, T. G. et al. Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nat. Commun. 8, 1188 (2017).
Mahé, F., Rognes, T., Quince, C., Vargas, C. de & Dunthorn, M. Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ 3, e1420 (2015).
Song, H., Buhay, J. E., Whiting, M. F. & Crandall, K. A. Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proc. Natl. Acad. Sci. 105, 13486–13491 (2008).
Berthier, K., Chapuis, M.-P., Moosavi, S. M., Tohidi-Esfahani, D. & Sword, G. A. Nuclear insertions and heteroplasmy of mitochondrial DNA as two sources of intra-individual genomic variation in grasshoppers. Syst. Entomol. 36, 285–299 (2011).
Wright, E. S. & Vetsigian, K. H. Quality filtering of Illumina index reads mitigates sample cross-talk. BMC Genomics 17, 876 (2016).
Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Perrigo, A., Hoorn, C. & Antonelli, A. Why mountains matter for biodiversity. J. Biogeogr. 47, 315–325 (2020).
Rahbek, C. The Elevational Gradient of Species Richness: A Uniform Pattern? Ecography 18, 200–205 (1995).
Guo, Q. et al. Global variation in elevational diversity patterns. Sci. Rep. 3, 3007 (2013).
Merckx, V. S. F. T. et al. Evolution of endemism on a young tropical mountain. Nature 524, 347–350 (2015).
Strand, E. Araneae von den Aru- und Kei-Inseln. (1911).
Fomichev, A. A. & Omelko, M. M. New data on the spitting spiders (Araneae: Scytodidae) of Southeast Asia. Acta Biol. Sib. 9, 975–986 (2023).
Folmer, O., Black, M., Hoeh, W., Lutz, R. & Vrijenhoek, R. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol. Mar. Biol. Biotechnol. 3, 294–299 (1994).
Jusino, M. A. et al. An improved method for utilizing high‐throughput amplicon sequencing to determine the diets of insectivorous animals. Mol. Ecol. Resour. 19, 176–190 (2019).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Cardoso, P., Rigal, F. & Carvalho, J. C. BAT – Biodiversity Assessment Tools, an R package for the measurement and estimation of alpha and beta taxon, phylogenetic and functional diversity. Methods Ecol. Evol. 6, 232–236 (2015).
Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).
Wilkinson, S. kmer: an R package for fast alignment-free clustering of biological sequences. (2018).
Yan, L. Package ‘ggvenn’. CRAN (2021).

No competing interests reported.
