In this study we assessed the ability of metabarcoding to detect low-abundance pest insects within mock communities of aphid and psyllid species, and then validated the approach on field-trapped insects collected from potato and vegetable crops. Metabarcoding of mock communities indicated that while all species were usually detected when all three loci were used, an increase in the number of individuals in a pool led to a decrease in detection of single-specimen species (Fig. 3; Tables S1 and S4). The rate of missed detections increased when only 18S or 12S data were used but remained the same with only COI data, which is likely due to COI having a favourable bias profile towards the targets. Inability to recover low-frequency taxa is a common finding in metabarcoding studies [11, 21, 76] and can be exacerbated by non-destructive DNA extraction for certain taxa [39]. Nevertheless, sequencing reads for all taxa were present in the raw data but were under the detection threshold required to remove index switching.
We found the use of unique dual indexes dramatically reduced the rate of index switching compared to combinatorial indexing, thereby enabling a lower detection threshold and increasing sensitivity of the metabarcoding assay. However, even with unique dual indexes, low rates of index switching can still be seen due to rare occurrences of switching at both ends of the molecule [32]. The use of the Free Adapter Blocking Reagent has been recommended to further reduce index switching caused by free adapters on Illumina ExAmp chemistry (i.e. HiSeq, NovaSeq) [77]; however, it is unclear how this would affect the bridge amplification-based cluster generation of the Illumina MiSeq platform used in this study. Choosing an appropriate filtering threshold to remove cross-contamination remains a challenge for metabarcoding studies [2], as our filtering threshold derived from the mock communities did not enable detection of TPP in the field trap samples. Furthermore, while our threshold was based on index switching rate, our approach did not account for well-to-well contamination during library preparation. This has recently been raised as a major source of contamination, especially when libraries are prepared in microtiter plates or in automated liquid handling systems [78]. A more robust method of estimating cross-contamination may be to include a positive spike-in control during DNA extraction in the form of a taxon alien to the target environment [79] or a synthetic sequence [80].
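As an illustration of how a detection threshold can follow from an index switching rate, the sketch below estimates the rate as the fraction of reads assigned to index pairs that were never loaded on the run, then drops any taxon whose share of a sample's reads does not exceed that rate. This is a simplified assumption rather than the exact procedure used in this study; the function names, index labels, and read counts are hypothetical.

```python
def switch_rate(read_counts, used_pairs):
    """Fraction of all reads found in index pairs that were never loaded.

    Assumption: reads in unused index combinations arise only from
    index switching, so their share approximates the switching rate.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    switched = sum(n for pair, n in read_counts.items() if pair not in used_pairs)
    return switched / total


def filter_detections(sample_counts, threshold):
    """Keep taxa whose share of the sample's reads exceeds the threshold."""
    total = sum(sample_counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in sample_counts.items() if n / total > threshold}


# Hypothetical read counts per (i7, i5) index pair, for illustration only.
reads_by_index_pair = {
    ("i7_01", "i5_01"): 49_500,  # loaded pair
    ("i7_02", "i5_02"): 49_500,  # loaded pair
    ("i7_01", "i5_02"): 600,     # unused pair: presumed switched reads
    ("i7_02", "i5_01"): 400,     # unused pair: presumed switched reads
}
used = {("i7_01", "i5_01"), ("i7_02", "i5_02")}
rate = switch_rate(reads_by_index_pair, used)

# Hypothetical per-taxon read counts within one sample.
sample = {"taxon_A": 9_000, "taxon_B": 950, "taxon_C": 50}
kept = filter_detections(sample, rate)
print(rate, sorted(kept))
```

A threshold of this kind does not capture well-to-well contamination, which is why a spike-in control may give a more complete estimate.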
While metabarcoding was not able to detect all the species present in the mock communities, it did reveal the presence of a pest insect that was missed by morphological identification. The presence of RWA in the 1000 Pool 1 mock community was initially thought to be a false positive (Fig. 3); however, re-examination of the preserved mock community specimens revealed an RWA nymph (Fig. 4), highlighting the value of non-destructive DNA extraction. COI barcoding of the nymph demonstrated that non-destructive DNA extraction preserves specimens adequately for both morphological identification (Fig. 1) and/or individual barcoding. Metabarcoding also detected RWA in field traps 2 and 3, despite this species not being recorded in the initial morphological identification. Re-examination revealed an aphid nymph in Trap 2 and an RWA adult in Trap 3. However, PCR amplification of the COI barcode was not possible for the nymph specimen, which could be due to field trap specimens having more degraded DNA compared to the mock community specimens. While the COI primers were designed to amplify a relatively short region of COI (337 bp; Table S3), perhaps an even shorter region could help to improve amplification from the field trap specimens, with barcodes as small as 100 bp successfully used for species identification [81].
Unlike RWA, no specimens were found to confirm the TPP detections in Traps 8 and 10. The erroneous TPP detections could have been caused by cross-contamination during library preparation, and future studies should include negative controls to provide a cumulative measure of physical contamination [82]. While we cannot rule out physical contamination, the strong PCR bias toward TPP could have led to increased index switching [83]. This bias was present in all three loci in the field trap samples and not present in the mock community results (Fig. 3), suggesting that degraded DNA in trapped specimens could be flooded by well-preserved DNA from the spiked TPP specimens. Further study using field-trapped TPP is required to determine the suitability of metabarcoding for surveillance of this pest species.
Quantitative estimations of the three multiplexed loci were impacted by the overall ratio of aphids to psyllids within the pools (Table S5) and PCR batch effects (Figure S4). The impact of PCR batch effects on quantitative estimates was greatest in 250 Pool 3, where the abundance estimates differed considerably from those of the smaller and larger communities with the same composition (100 Pool 3, 500 Pool 3, 1000 Pool 3), which were run in a different PCR batch (Fig. 3). The large difference may be due to the confounding factors of PCR batch effects, primer biases associated with community composition [26], and PCR competition between loci. This indicates that tandem rather than multiplexed PCR reactions or microfluidic multiplexing [29] may be more appropriate for quantitative estimates in multi-locus assays. Furthermore, we suggest future studies include identical mock communities across each library preparation and sequencing run to allow estimation and correction of these batch effects [84, 85]. These calibration communities could also be used to derive correction factors to account for taxon-specific quantitative bias [25, 86, 87]. However, assembling appropriate calibration communities may be difficult for the diverse range of species captured by the wind-based surveillance traps used in this study. Therefore, if accurate abundance estimates are necessary then an approach that does not utilise PCR, such as hybridisation probes/capture baits [70, 88, 89] or whole mitochondrial genomes [90], may help to improve quantitative estimates. Nevertheless, these techniques still possess their own individual biases [86] and do not currently have wide acceptance in validated diagnostic protocols [8].
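To make the idea of taxon-specific correction factors concrete, the sketch below derives a factor for each taxon as the ratio of its known proportion in a calibration community to its observed read proportion, then applies those factors to another sample and renormalises. It assumes the bias is multiplicative and constant between the calibration community and the sample, which is a strong simplification and not necessarily the model used in the cited studies; all names and values are hypothetical.

```python
def correction_factors(expected_props, observed_reads):
    """Per-taxon factor = known proportion / observed read proportion.

    Taxa with zero observed reads are skipped because no finite
    factor can be derived for them.
    """
    total = sum(observed_reads.values())
    if total == 0:
        raise ValueError("calibration community has no reads")
    factors = {}
    for taxon, expected in expected_props.items():
        observed = observed_reads.get(taxon, 0) / total
        if observed > 0:
            factors[taxon] = expected / observed
    return factors


def corrected_proportions(reads, factors):
    """Scale read counts by their factors and renormalise to sum to 1.

    Taxa without a factor are left unscaled (factor of 1.0).
    """
    adjusted = {taxon: n * factors.get(taxon, 1.0) for taxon, n in reads.items()}
    total = sum(adjusted.values())
    if total == 0:
        return {}
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical calibration community: equal input, unequal read recovery.
known = {"aphid": 0.5, "psyllid": 0.5}
calibration_reads = {"aphid": 2_000, "psyllid": 8_000}
factors = correction_factors(known, calibration_reads)

# Hypothetical sample run in the same batch as the calibration community.
sample_reads = {"aphid": 1_000, "psyllid": 4_000}
estimate = corrected_proportions(sample_reads, factors)
print(factors, estimate)
```

Because the factors are only defined for taxa present in the calibration community, this approach inherits the difficulty of assembling representative calibration communities for diverse trap catches.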
Despite a Hemiptera-based primer design (Table S2), the sequences from the field survey traps revealed a broad diversity of arthropod species (Fig. 5), including three tentative first detections for Australia. The most notable was the detection of the aphid species A. varians in Trap 10, identified to species level through exact matching of COI to reference sequences in public databases and confirmed via COI barcoding of an aphid abdomen found upon re-examination of Trap 10 (Figure S3). Aphis varians is a Nearctic species that belongs to a complex group associated with wild and cultivated Ribes spp. (Grossulariaceae) as primary hosts and Epilobium spp. (Onagraceae) as secondary hosts [91]. In Australia, there are no records of aphids causing damage to commercial Ribes spp. (currants and gooseberries), including the recently introduced and closely related A. oenotherae. Nevertheless, the detection of a new aphid associated with Ribes spp. warrants further investigation and surveys from hosts such as Epilobium spp. that are common in Australia. Importantly, the detection of A. varians from a lone abdomen represents a situation that would not have occurred when following a conventional diagnostic approach, and indeed this taxon was overlooked during the initial sorting of the traps. Laboratories conducting insect diagnostics are unlikely to individually barcode every incomplete specimen in a trap sample due to the significant costs involved. This demonstrates the effectiveness of a non-destructive metabarcoding approach for flagging samples that contain unexpected non-target taxa, which can then be more thoroughly inspected and confirmed using conventional diagnostic methods.
In contrast to the detection of A. varians, the other tentative first detections for Australia, M. rhois and C. sexpustulatus, were indicated only by the 18S locus. While the M. rhois detection was based on an exact match to 18S, this reference sequence was the only representative for its genus. Singletons such as these present problems for taxonomic classification because there is no way to calibrate the assignment confidence with the taxonomy and sequence similarity of closely related sequences [92], and therefore they are often removed from reference databases [93]. However, in this case, removal of singletons would have resulted in loss of a large proportion of the 18S and 12S references, which already have marginal representation in the database (Fig. 2A). This issue is further compounded by the highly conserved nature of 18S, which while useful for detecting a broad diversity of taxa at higher taxonomic ranks (Fig. 2C), can struggle to differentiate many taxa at the species level [94]. For example, in the case of the C. sexpustulatus detection, all reference 18S sequences for this genus showed less than 1% variation, so while this was assigned to species with an exact match, it likely represents a closely related native Carpophilus spp. for which an 18S reference sequence does not currently exist [95]. Furthermore, while the RDP classifier used in this study has previously performed well with COI where there is a broad diversity of reference sequences [96, 97], it can suffer from over-classification in the case of sparse reference data [98]. Therefore, for loci other than COI to be effective, a greater emphasis needs to be placed on conducting baseline surveys and improving the taxonomic coverage of reference databases for endemic species at the beginning of a surveillance program. This results in improved ASV assignment at every taxonomic rank, as seen in the mock community data (Fig. 2B).
As DNA metabarcoding begins to be applied in diagnostic situations, increasing regulatory confidence will be critical for widespread uptake. While the multi-locus assay did not perform effectively in providing validation of detections, this was primarily due to insufficient availability of reference sequences for loci other than COI. Due to the already widespread availability of reference databases and high resolution for species-level discrimination, we recommend the use of COI with additional PCR replicates for metabarcoding studies aiming to detect insect pests. On the other hand, non-destructive DNA extraction proved extremely useful for validating detections, enabling confirmation of both target insects and the off-target species A. varians, which has not been previously recorded in Australia. While in this case A. varians is not a serious pest, the ability to detect off-target insects may help prevent situations similar to the initial establishment of RWA in Australia, where a surveillance program was not initiated until after the first detection, revealing an already widespread distribution beyond the hope of eradication [46]. Furthermore, non-destructive DNA extraction could be used to continuously build up local databases over the course of a surveillance program [44]. For example, in this study many of the trap samples contained ASVs that were unable to be assigned to genus or species (Fig. 2C) and these could be revisited to locate and generate reference information for previously unbarcoded taxa. This circular workflow would greatly aid the timely implementation of metabarcoding in surveillance and partially alleviate the need to generate high-quality databases prior to commencing a surveillance program.
Rather than metabarcoding replacing the role of traditional diagnostics, this study highlights the importance of maintaining taxonomic expertise that can follow up detections and place the results of high-throughput methods in a broader systematic context. In fact, availability of taxonomic expertise may remain a limiting factor for surveillance, as in this study confirming detections took significantly longer than any other part of the metabarcoding pipeline. Conventional DNA barcoding and morphological taxonomy currently benefit from a close and reciprocal interaction, and we believe that integration of non-destructive DNA extraction into metabarcoding protocols will lay the foundation of a robust quality assurance framework for high-throughput insect surveillance.