Estimation of Geographic Origin from Dust Using Plant DNA Metabarcoding


Information obtained from the analysis of dust, particularly biological particles such as pollen, plant parts, and fungal spores, has great utility in forensic geolocation. As an alternative to manual microscopic analysis, we developed a pipeline that utilizes the environmental DNA (eDNA) from plants in dust samples to estimate previous sample location(s). The species of plant-derived eDNA within dust samples were identified using metabarcoding and their geographic distributions were then derived from occurrence records in the USGS Biodiversity in Service of Our Nation (BISON) database. The distributions for all plant species identified in a sample were used to generate a probabilistic estimate of the sample source. With settled dust collected at four U.S. sites over a 15-month period, we demonstrated positive regional geolocation (within 600 km of the collection point) with 47.6% (20 of 42) of the samples analyzed. Attribution accuracy and resolution were dependent on the number of plant species identified in a dust sample, which was greatly affected by the season of collection. In dust samples that yielded a minimum of 20 identified plant species, positive regional attribution improved to 66.7% (16 of 24 samples). Using dust samples collected from 31 different U.S. sites, trace plant eDNA provided relevant regional attribution information on provenance in 32.2%. This demonstrated that analysis of plant eDNA in dust can provide an accurate estimate of regional provenance within the U.S., and relevant forensic information, for a substantial fraction of samples analyzed.


INTRODUCTION
The ability to match a person or object with a particular location (geographic attribution) can be a crucial part of a forensic investigation. Geographic attribution often relies on the analysis of dust because of its ubiquity, abundance, adherence to most surfaces, and wide variety of component particles that includes bacterial and fungal cells, spores, minerals, soil, plant and animal components, and products of combustion and other human activities (1). Each of these particle types can be indicative of the local exposure history of the sampled object or person, and their analysis and interpretation has been used for almost a century as a tool in forensic and investigative geographic attribution (2)(3)(4).
Biological components of dust, and pollen in particular, have proven to be valuable investigative tools.
Pollen is ubiquitous, stable, taxonomically diverse in structure, and able to discriminate geographically due to plant biogeography (5)(6)(7). Since its release varies seasonally, pollen can also provide information on the timing of dust accumulation (8). Historically, the characterization of pollen has relied heavily on analysis of the exine structure using microscopy (9). Proper pollen preparation for microscopy and analysis is time consuming, which limits sample testing capacity (10). In addition, accurate taxonomic identification and geographic inference from pollen images requires significant technical expertise and, even with that, is difficult for many pollen types. This has limited widespread application of the technique for forensic investigation as well as other applications (11). A number of approaches have been taken in an attempt to address this, including the Pollen Identification and Geolocation Technology (PIGLT) system. PIGLT is a standardized digital database of pollen images and software to augment taxonomic geographic distribution information to reduce the amount of expertise required (12,13) and an automated system for pollen microscopic imaging and characterization (14). DNA barcoding has been used for taxonomic identification for over two decades in environmental tracking, biodiversity studies, and product authentication (15)(16)(17), and has more recently been applied to pollen classification (18). Barcoding targets a genomic DNA region that is common among taxa so it can be readily amplified, but whose sequence differs sufficiently to enable discrimination. Taxonomic identification is typically performed by matching a generated barcode sequence to a database of barcode sequences from identified taxa.
Barcodes can be specific to a particular family or genus, but "universal" DNA barcodes have been implemented to help identify taxa in broad phylogenetic categories such as bacteria/archaea, fungi, and animals, including those from mixed samples in a process called metabarcoding. Compared to DNA metabarcoding of other taxa, plant DNA metabarcoding is challenged by the lack of a truly universal barcode that can effectively discriminate most plant species (19)(20)(21)(22). For this reason, several different plant barcodes are utilized, depending on the application. These include the chloroplast gene encoding the large subunit of ribulose 1,5-bisphosphate carboxylase (rbcL), the maturase K gene (matK), and the intron region of the chloroplast tRNA gene (trnL), as well as the nuclear genome-encoded internal transcribed spacer region (ITS2) and others (21).
Though initially used to characterize separated pollens (23), the combination of DNA metabarcoding with next generation sequencing has enabled characterization of pollen mixtures for allergen analysis in airborne samples (24,25), characterization of honey (26-29), and has been further developed for forensic geolocation (18) and even pollen quantitation (30). Though not without limitations, DNA metabarcoding has the potential to rapidly advance many pollen application areas (31). In this study, we demonstrated that DNA metabarcoding of pollen and plant eDNA in dust samples can be used for rapid, high throughput estimation of sample origin. We intentionally focused on available datasets, National Center for Biotechnology Information (NCBI) Genbank for species identification of plant DNA barcodes and the Biodiversity in Service of Our Nation (BISON) data repository for estimation of the geographic species distribution, to demonstrate the utility of these resources. Our hypothesis was that plant remnants in dust were sufficient to provide valuable information on sample provenance rapidly using DNA metabarcoding and automated analysis using currently available taxonomic and species distribution datasets.

Characterization of Plant DNA from Dust
Plants were the focus because of the extensive, available barcode sequence and species distribution data.
Though a greater diversity of bacterial and fungal DNA barcodes could be generated with most dust samples, the lack of matching sequence and species distribution data limited their utility using our methods. Our pipeline for estimating geographic origin (summarized in Figure 1) was tested using total eDNA isolated from settled environmental dust collected from four U.S. locations.
Using a relatively high stringency for OTUs (> 2 reads, present in 3 of the 3 sequencing replicates, relative abundance > 10⁻⁴), there was an average of 42.2 ITS2 and 40.4 rbcL-3A OTUs per MA dust sample, with a large variance in the OTU number that depended on the date (time of year) of dust accumulation and collection (Figure 2). There was a significant difference (p = 1.6 × 10⁻⁵) in the number of OTUs found in each season. Over 200 total ITS2 and rbcL-3A OTUs present in triplicate samples were generated in mid to late spring (April and May 2016). The number of OTUs decreased in samples over the progression from summer to fall and reached a minimum in winter, when samples collected between November 2016 and February 2017 yielded 20 or fewer total OTUs, including two samples with zero OTUs. Overall, 2 to 6 times fewer OTUs were recovered from winter samples. In 2017, the OTU number per sample started to increase with the onset of spring. As expected, there was significant seasonal variation in the presence of most individual OTUs, with the date of maximum abundance dependent on taxon. This is illustrated in Figure 3, which shows the abundance of detected OTUs with the highest read counts according to time of year. There appeared to be three main groups of OTUs: those most prevalent in early spring, those most prevalent in late spring/early summer, and those most prevalent in late summer.
Surprisingly, the length of time that slides were exposed to the environment had only a minor impact on the number of OTUs per sample, with no significant difference (p = 0.51) in the number of OTUs in dust collected after coincident 14-, 28-, or 56-day exposure (Figure 2). This indicated that substantial plant material was deposited on a slide in 14 days or less. As expected from these data, the most prevalent OTUs at any one collection time were found in coincident 14-, 28-, and 56-day samples with few exceptions (Supplementary

Estimation of Geographic Origin
The taxonomy of the plant OTUs derived from DNA extracted from dust was assigned by matching to plant barcode records in NCBI Genbank using a 100% identity threshold. Only 38.3 ± 19.2% of the total OTUs per sample were defined at the species level; the remainder were discarded. Of the ITS2 OTUs, 43.6% matched a single species and 69.2% matched 5 or fewer species, while 24.7% and 60.3% of the rbcL-3A OTUs matched one and less than five species, respectively (Supplementary Figure 3). This showed that ITS2 performed better than rbcL-3A in assigning single plant species using the defined parameters.

A majority of the point-to-grid maps, which showed the percentage of OTUs from that sample that had at least two occurrence records within each 250 km² grid, indicated the region of sample origin within the U.S., meaning that grids with the highest percentage of plant OTUs from that sample co-located within the region of the collection site. Examples of point-to-grid maps generated from each collection site are shown in Figure 4.
Gaussian mixture modeling (GMM) was applied to the point-to-grid map generated from 14-day samples from the four sites to enable quantitation of accuracy (TP) and resolution (AT5PE) (Figure 4B). TP indicated the percentage of grids that contained fewer OTUs than the grid with the actual sample location and was used with a cutoff of 90%. At that threshold, the grid containing the truth point contained more mapped OTUs than 90% of all grids. AT5PE indicated the mean distance (in km) of the truth site from the five highest probability peaks as determined by our analysis. We utilized an AT5PE of 600 km as a threshold, and considered a sample positive for regional attribution only when it met both thresholds, >90% TP and <600 km AT5PE. When 14-day dust samples across all four sites were analyzed, the TP was greater than 90% in 20 of 42 (47.6%), the AT5PE was < 600 km in 24 of 42 (57.1%), and 20 of 42 (47.6%) were deemed to produce positive regional geolocation (Table 1).
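The two-threshold rule described above can be written out as a short sketch. The thresholds (TP > 90%, AT5PE < 600 km) come from the text; the function names and input layout are illustrative assumptions and are not taken from the authors' pipeline.

```python
# Thresholds stated in the text; names below are hypothetical.
TP_MIN_PERCENT = 90.0   # truth percentage must exceed 90%
AT5PE_MAX_KM = 600.0    # average top 5 peaks error must be below 600 km


def is_positive(tp_percent, at5pe_km):
    """Return True when a sample meets both attribution thresholds."""
    return tp_percent > TP_MIN_PERCENT and at5pe_km < AT5PE_MAX_KM


def positive_rate(samples):
    """Summarize a collection of (tp_percent, at5pe_km) pairs.

    Returns (number positive, number of samples, percent positive).
    """
    samples = list(samples)
    n_positive = sum(is_positive(tp, err) for tp, err in samples)
    percent = 100.0 * n_positive / len(samples) if samples else 0.0
    return n_positive, len(samples), percent
```

Applied to the reported counts, 20 positives among 42 samples corresponds to 100 × 20 / 42 ≈ 47.6%.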
The geolocation accuracy varied significantly by collection site, with MA (53.5%), NM (60.0%), and SC (50.0%) showing a similar percentage of positives, while FL samples yielded none. The number of mapped OTUs found in a dust sample appeared to significantly impact the TP and AT5PE (Figure 5). With both metrics, samples that yielded fewer than 20 mapped OTUs showed greatly reduced attribution accuracy (TP and AT5PE) compared to those with 20 or more OTUs (Table 1). Though dependent on the location of dust collection, these data demonstrated that plant DNA from dust combined with available species distribution information could accurately estimate the region of origin from a significant proportion of samples. With sufficient OTUs, two thirds of the dust samples yielded positive regional attribution. Samples that did not produce correct regional attribution generally had a dispersed OTU distribution, i.e. poor resolution, and did not define an incorrect possible region of origin. These would have been indicated by < 90% TP and 600 km or less AT5PE.

Dust Samples from Other U.S. Locations
To characterize the achievable geographic attribution accuracy and resolution with dust samples collected from a broader set of locations, metabarcoding was performed on 31 environmental dust samples from different U.S. locations that were collected as part of the Wild Life of Our Homes (WLOH) citizen science project (32). When these samples were analyzed using our pipeline, 10 of the 31 dust samples (32%) generated minibarcodes that resulted in positive attribution (TP > 90%, AT5PE < 600 km) (Figure 6A). The number of OTUs appeared to have less of an impact on the attribution accuracy with this sample set, and the percentage of positive attribution improved only slightly when samples with more than 20 mapped OTUs were considered.
Mapping the 31 samples to assess the impact of the region of sample origin on the attribution accuracy and resolution showed that the highest attribution accuracy and precision were achieved with samples from Montana, Texas, and the Middle Atlantic Region. Samples from the west coast of the U.S. and Midwest produced reasonable accuracy (75% or higher truth percentage) but generally low resolution (Figure 6B). This suggested that OTUs derived from samples from these locations may be less informative, though many more samples would have to be processed to establish statistical significance. It is worth noting that the WLOH sampling method differed from that used for the louvered shelters: dust accumulation in the WLOH samples was not standardized, so accumulation occurred over a variable duration longer than 14 days. In addition, exposure to environmental factors, which can affect DNA stability, was less controlled.

DISCUSSION
We have utilized currently available plant sequence and species distribution data to demonstrate a streamlined system for exploiting plant eDNA in dust for forensic attribution. Plant barcodes generated by standard metabarcoding methods were fed into a data processing pipeline that demonstrated trace plant DNA in dust can provide an accurate estimate of regional geographic attribution (within 600 km or less) from nearly half of samples collected.
One unique aspect of our pipeline was the use of publicly available biogeographic data from BISON for determination of the geographic distribution of the species found in the samples. Our objective was to determine if currently available reference data like that in BISON could be applied to attribution determination to avoid the cost and time needed for sample collection and analysis to create a new reference dataset. The BISON dataset has over 400 million species observation records from across the U.S., and thus can provide widely applicable georeferenced information. While this study focused on attribution of samples gathered within the U.S. using BISON, the technique could be extended to other global regions with a high density of plant species occurrence information through the use of the Global Biodiversity Information Facility (GBIF) (33) or other datasets. For our pipeline, the total area covered by observation records of a plant species was used as an indicator of its geographic distribution, then individual species distributions were normalized and merged into OTU distributions, which were then overlaid using a geographic information system (GIS). This generated a map that indicated the geographic areas with the largest OTU distribution overlap to provide an estimate of the sample origin. Using this attribution system, plant minibarcodes provided more provenance information than animal minibarcodes, which generated fewer OTUs of less diversity, or fungal or bacterial barcodes, which generated numerous OTUs that were less able to match the NCBI Genbank reference database or lacked available species distribution information in BISON.
Fungal DNA analysis has been demonstrated to be able to delineate soil samples (34,35) as well as be informative in unguided geographic attribution. Fungal ITS1 minibarcode OTUs from over 900 WLOH dust samples from different U.S. locations, a subset of which were used in this study, enabled the estimation of geographic provenance with a median prediction margin of 230 km (36). A similar approach was used to determine the worldwide country of origin from dust samples (37,38). These approaches avoid many of the pitfalls of taxonomic identification and species distribution estimation but, unlike our approach, require generation of a new reference dataset. The analysis of bacterial barcode sequences has also been used for forensic attribution, particularly to link soil or other samples to a source location (39,40). Their use for unguided geographic attribution is challenged by the tremendous local and worldwide diversity (41, 42) as well as a lack of reference data. Using our attribution system, roughly 40% of the 80 dust samples collected at four different U.S. locations and 32% of the 31 dust samples collected from different U.S. locations provided accurate regional attribution estimates. This percentage increased if samples with less than a threshold of 20 detected OTUs were excluded. 18 of the 29 samples that had fewer than 20 OTUs were collected between the months of November and February, which showed that dust collected in the winter was less able to support accurate attribution. This confirmed our expectation that the amount of available airborne plant eDNA from plant-derived particles such as pollen, seeds, and spores released by plants is reduced in winter. Snow cover in some sites may also have prevented aerosol dispersion of ground particles. It should be noted that, for all samples, there was no evidence of erroneous attribution, where the site of origin was estimated to be in an incorrect regional location. 
In samples that did not yield positive attribution, the estimated area of attribution was broad and undefined, meaning there were no samples with a low TP and an AT5PE < 600 km.
The data also indicated that 14 days was a sufficient time for dust collection to capture the most prevalent OTU, which implied that an object needed to reside at a location for only 14 days or less (depending on the season) to accumulate an identifiable signature. Further investigation into samples with outdoor environmental exposure shorter than 14 days is needed to determine the minimum amount of time required for attributable signature accumulation, but preliminary studies have indicated sufficient OTUs can be generated with dust that accumulates in as little as 3 days.
The geographic attribution accuracy and resolution were significantly impacted by the number of mapped OTUs (Figures 4 and 5). This is partly due to system design, where all OTUs are mapped but the subset with narrow geographic distributions is most informative. The OTU number is mainly affected by the amount and variety of plant eDNA deposited on the slide surface, which is dependent on the local plant abundance and diversity, air flow in the shelter, the length of time of dust accumulation, the season of dust accumulation, and exposure to environmental factors such as light and precipitation that could impact DNA stability. Our method ensured collection of dust that was fresh and relatively protected from sunlight and precipitation. Attribution capability would likely be degraded with samples exposed to the environment, where the degradation rate of eDNA could vary by several orders of magnitude depending on the matrix and environmental conditions (43).
Dust from more exposed environmental surfaces, such as the door tops in the WLOH samples or from exposed surfaces in the same locations at the louvered shelters, confirmed this. Our slide preparation and transport methods also ensured that only dust originating from the collection site accumulated on slides. Objects of forensic interest are more likely to have traveled to, or been used in, more than one location where dust can accumulate. Plant OTUs from more than one location could be indicative of multiple source locations, though the ability to discriminate more than one site would be complicated by the duration of residence in each location, the exposure to environmental elements, season of exposure, the distance between locations, and the time since the object was in the previous location(s). The effects of these parameters on attribution accuracy need to be better characterized to determine the applicability of the current pipeline to dust from objects that have resided in more than one location.
The number of mapped OTUs was also significantly impacted by inefficiencies in matching to the NCBI Genbank and BISON databases. Recent estimates are that barcode sequences from 25 to 40% of the roughly 390,000 plant species in the world (44) are represented in NCBI Genbank, with the estimated coverage of the 51,000 U.S. plant species higher (45). However, these estimates include entries representing all plant barcodes, meaning that for any one barcode there are significant gaps in taxonomic coverage. In addition, the short sequence length of minibarcodes, necessary for compatibility with next generation sequencing, limited their ability to discriminate among plant species records in the NCBI Genbank database. One OTU matched a single species record 50 to 60% of the time, with the majority of the remainder matching 2 to 10 species. In fact, we used a similarity threshold of 100% because a lower threshold increased the number of OTUs assigned to the same species while not substantially increasing the number of new species identified. Recently developed sequence alignment algorithms that are alternatives to nBLAST may enable improved plant taxonomic assignment using minibarcodes (46), as may use of longer barcode sequences generated through the use of improved sequence chemistry, amplification and sequencing of long barcode amplicons using nanopore-type sequencing, chloroplast genome sequencing, or genome skimming (31,47).
The ITS2 and rbcL-3A plant minibarcode primer pairs were selected for OTU yield and taxonomic identification after comparison to other primer sets targeting the chloroplast loci trnL, rbcL, or matK or the nuclear ITS region due to their representation in NCBI Genbank. These had been previously utilized in studies involving metabarcoding, taxonomic identification, or in silico studies (19,48-52). Chloroplast barcodes typically amplify better due to multiple genome copies but have more limited discrimination of related species, while nuclear-based genome barcodes often have more difficulty in amplification and recovery. Using our protocol, the primer sets for the ITS and matK minibarcode regions did not amplify well, while the trnL minibarcode regularly produced more reads and OTUs when compared directly to other primer sets but was less able to define OTUs to the species level. Two different ITS2 minibarcode primer sets, including the pair used in this study, and rbcL minibarcode primer pairs targeting both the 5' and 3' regions of the gene amplified most consistently and produced the most mapped OTUs (data not shown). Inclusion of additional minibarcode primer sets would likely improve plant OTU detection at the cost of having multiple OTUs representing the same species.
The quality and availability of plant biogeographic reference data is perhaps the most important factor affecting attribution applicability, accuracy, and resolution. BISON provided a tremendous wealth of information for determining species distributions to enable a proof-of-concept demonstration, but limitations in taxonomic, geographic, and temporal coverage (53) impacted the achievable attribution resolution and accuracy of our pipeline. Incomplete taxonomic coverage affected the ability to fully characterize the biodiversity distributions of sample OTUs. This was exacerbated by the difficulty in harmonizing the different taxonomic nomenclature systems used by NCBI Genbank and BISON, which likely resulted in an inability to retrieve occurrence records for some species. Geographic coverage, or how well the actual distribution of a species is documented by the occurrence records, and uneven spatial distribution of occurrence records also likely impacted the accuracy of attribution predictions (54). The species occurrence records may also be temporally skewed if a species' range has significantly changed in response to environmental shifts. To mitigate these issues, occurrence records can be augmented with data from other available sources, or with species distribution modeling, to enhance both taxonomic and geographic coverage. Lastly, though BISON data is significantly curated, there are possible data quality issues due to incorrect taxonomic assignment or duplicate or erroneous entries. The same is true of NCBI Genbank, which is known to have sequence and taxonomic errors. Curation of these reference datasets could provide a significant improvement in attribution accuracy. This analysis demonstrated that plant eDNA in dust from a significant percentage of samples is capable of reproducibly defining the U.S. region of sample origin within a radius of 600 km or less. 
The capability to acquire a regional estimate of provenance in many trace samples rapidly, without specialized expertise, can have value in many types of forensic investigation. We believe that, by streamlining metabarcoding protocols (using multiplexing, for instance) and automating data analysis, attribution information could be gained from hundreds of dust samples in days.

Dust Collection
Dust was collected on standard 72x25x1 mm glass microscope slides (e.g., VWR VistaVision 16004-422) that were cleaned with glass cleaner, twice rinsed with distilled water, and dried with compressed air. Nine slides were secured with magnets onto three platforms of a louvered shelter (SRS100LX radiation shield, Ambient Weather, Chandler, AZ) mounted on a tripod one meter off the ground and at least 10 meters from buildings or other structures that could impede airflow (Supplementary Figure 4). The louvered shelter enabled
Pools were sequenced in triplicate on an Illumina MiSeq using the Reagent Kit v2 300-cycle kit (catalog # MS-102-2002). Sequences were demultiplexed using Golay barcodes (56) via QIIME v1.9.1 (57), and merging of paired end reads and trimming were performed with USEARCH (58). CUTADAPT v1.8.1 was then used to identify and remove remaining primer and adapter regions (59). Sequences were quality trimmed to have a maximum expected number of errors per read of less than 0.5. General quality filtering and OTU construction were completed as per the UPARSE pipeline (60) with de novo clustering at 99% sequence similarity. These parameters help to ensure that individual reads are correctly mapped to their respective OTU. Merged reads from ITS2 and single reads from rbcL-3A were clustered into OTUs (99% similarity using a de novo method).
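The expected-error criterion mentioned above can be illustrated with a minimal sketch. It uses the standard definition of expected errors, the sum over bases of 10^(-Q/10) for Phred quality Q, and the 0.5 cutoff from the text. The function names are hypothetical, the Phred+33 encoding is an assumption, and the exact USEARCH options used in the study are not stated in the text.

```python
def phred_from_fastq(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_expected_errors=0.5):
    """Keep a read only if its expected error count is below the cutoff."""
    return expected_errors(phred_scores) < max_expected_errors
```

For example, a 10-base read with Q20 at every position has 10 × 0.01 = 0.1 expected errors and would be kept, while the same read at Q10 has 1.0 expected errors and would be discarded.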
OTUs that had fewer than 3 reads, those that were not present in 3 of the 3 sequencing replicates, or those that had a relative abundance less than 10⁻⁴ were culled. This eliminated most of the OTUs (60.9% to 92.4%) from each sample, particularly those of lower abundance. The blank (buffer only) and negative control (clean slide) samples yielded no OTUs post-stringency filtering. Statistical analysis on the number of OTUs was performed using analysis of variance (ANOVA).
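The three culling rules above can be sketched as a single filter. This is an illustration under stated assumptions: read counts are summed across the three replicates before applying the 3-read minimum, and relative abundance is computed against all reads in the sample before filtering. The text does not specify either detail, and the function name and input layout are hypothetical.

```python
def filter_otus(replicate_counts, min_reads=3, min_relative_abundance=1e-4):
    """Apply the stringency rules to one sample.

    replicate_counts maps an OTU identifier to a list of read counts,
    one per sequencing replicate (three replicates in the study).
    """
    total_reads = sum(sum(counts) for counts in replicate_counts.values())
    kept = {}
    for otu_id, counts in replicate_counts.items():
        otu_reads = sum(counts)
        if otu_reads < min_reads:
            continue  # fewer than 3 reads (assumed: summed over replicates)
        if any(count == 0 for count in counts):
            continue  # not present in every replicate
        if otu_reads / total_reads < min_relative_abundance:
            continue  # relative abundance below 10^-4
        kept[otu_id] = counts
    return kept
```

The order of the checks does not affect the result, since an OTU is retained only if it passes all three.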

Taxonomic Assignment and Species Distribution Determination
Taxa were assigned using the Genbank nBLAST homology inquiry tool using a query threshold of 100%. 40-50% of OTUs matched more than one species using a 100% homology threshold and, in this case, all species were retained. However, a plant species was only represented once per sample, even if it matched more than one OTU. Genbank taxonomic nomenclature was not completely consistent with that of BISON, so assigned taxa were edited by trimming to only their genus and species, i.e. removing subspecies and variety names, then processed using the R package taxize (61,62) to better align species names to those present in BISON. As a final step, manual curation corrected misspellings and removed unassigned taxa (uncultured, environmental sample, etc.).
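The name-trimming and de-duplication steps described above can be sketched as follows. This covers only the trimming to genus and species, removal of unassigned taxa, and the rule that a species is represented once per sample; it does not reproduce the taxize alignment or the manual curation. The function names, the input layout, and the rejection of open-nomenclature names such as "sp." are assumptions made for illustration.

```python
# Markers of unassigned taxa named in the text.
UNASSIGNED_MARKERS = ("uncultured", "environmental sample")
# Assumption: names without a species epithet are also treated as unassigned.
OPEN_NOMENCLATURE = ("sp.", "cf.", "aff.")


def to_binomial(name):
    """Trim a taxon name to 'Genus species'; return None if unassigned."""
    lowered = name.lower()
    if any(marker in lowered for marker in UNASSIGNED_MARKERS):
        return None
    tokens = name.split()
    if len(tokens) < 2 or tokens[1].lower() in OPEN_NOMENCLATURE:
        return None
    # Dropping everything after the second token removes subspecies and variety names.
    return f"{tokens[0].capitalize()} {tokens[1].lower()}"


def sample_species(otu_hits):
    """Collect the unique species for a sample.

    otu_hits maps an OTU identifier to the list of taxon names it matched.
    All species matched by an OTU are retained, but each species appears
    only once per sample even if several OTUs matched it.
    """
    species = set()
    for names in otu_hits.values():
        for name in names:
            binomial = to_binomial(name)
            if binomial is not None:
                species.add(binomial)
    return sorted(species)
```

With this sketch, "Quercus rubra var. borealis" is reduced to "Quercus rubra", and a species matched by two OTUs is listed once.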
Occurrence data were retrieved from the BISON database by using the taxonomic serial number (TSN), since ~98% of entries in BISON had an associated TSN, then by genus and species. Record retrieval from the BISON database was initially performed using a custom R script that retrieved records from the BISON application program interface (API) using the rbison package (63). The records retrieved were filtered to exclude groups based on certain data fields, for instance records that had been flagged for having apparently invalid or mismatched latitude/longitude coordinates, countries, or continents. Up to 10⁵ species occurrence records were retrieved for a single query. Some species produced no recorded occurrences with a geographic reference due to the lack of complete taxonomic coverage of the BISON biogeography database and the inability to resolve all nomenclature inconsistencies.

Mapping
The ArcGIS 10 geographic information system (GIS) package (Environmental Systems Research Institute (ESRI), Redlands, CA) was used to estimate the geographic attribution achieved from the species distributions associated with each sample. The primary output was a point-to-grid U.S. map that, within each grid, displayed the percentage of OTUs from a sample with 2 or more occurrence records in that grid. To do this, the total observation records for each species assigned to an OTU were merged to generate an OTU-based geographic distribution. This step improved the attribution accuracy of the pipeline compared to consideration of each species within a sample independently. OTU-based geographic distributions were converted to an analytical layer that was intersected with a global 250-km grid map displaying the number of occurrences per grid. Species prevalence in each grid was normalized by positive (two or more occurrences) or negative (fewer than two occurrences) designation. The maps of the normalized OTU-based geographic distributions for every OTU in a sample were overlaid and, for each grid, the proportion of the total OTUs was calculated. This method minimized the impact of OTUs with wide geographic distributions (with occurrence records in many grids) that were less informative for geographic attribution, and enabled detection of OTUs with a more localized distribution, which were more informative for geographic attribution. These steps were merged into a custom Python script that utilized the ArcGIS library ArcPy (2014, ESRI) to enable automated analysis of multiple data sets and high throughput mapping.
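The point-to-grid logic described above can be sketched without a GIS package. This is a simplified illustration, not the authors' ArcPy script: grid cells are defined here in degrees of latitude and longitude as a stand-in for the 250-km grid, and the function names and input layout are hypothetical. The rule of two or more occurrence records per grid comes from the text.

```python
import math
from collections import Counter


def cell_of(lat, lon, cell_degrees=2.0):
    """Index of the grid cell containing a point (degree-based cells assumed)."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def otu_presence(species_records, min_records=2, cell_degrees=2.0):
    """Cells where one OTU is designated positive.

    species_records is a list with one entry per species assigned to the
    OTU; each entry is a list of (lat, lon) occurrence records. Records
    from all species are merged before counting, as described in the text.
    """
    counts = Counter(
        cell_of(lat, lon, cell_degrees)
        for records in species_records
        for lat, lon in records
    )
    return {cell for cell, n in counts.items() if n >= min_records}


def point_to_grid(otus, min_records=2, cell_degrees=2.0):
    """Percentage of a sample's OTUs that are positive in each cell.

    otus maps an OTU identifier to its species_records list.
    """
    presence = [otu_presence(recs, min_records, cell_degrees) for recs in otus.values()]
    tally = Counter(cell for cells in presence for cell in cells)
    return {cell: 100.0 * n / len(presence) for cell, n in tally.items()}
```

Because each OTU contributes at most one count per cell, a widespread OTU raises many cells equally, while a localized OTU raises only a few, which is the normalization effect described in the text.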
To better compare point-to-grid (PtG) maps, geographic attribution metrics were generated from a Gaussian Mixture Model (GMM) fitted to the point-to-grid map. The GMM was fitted with Scikit-learn using the variational inference extension to the expectation maximization (EM) algorithm with the Dirichlet process to determine the number of OTUs in the mixture (64). This incorporated the analytical advantages of having a probabilistic model for the PtG data while retaining the robustness to low quality data (normalization) provided by the PtG method. The primary metrics utilized truth data, in this case the actual collection location, to determine the accuracy and resolution of the geographic attribution estimated from the plant OTUs. Accuracy, designated truth percentage (TP), indicated how closely an attribution map came to identifying the location of sample origin by measuring the likelihood percentile of the truth point in the data, i.e. the percentage of grids with an OTU count lower than that of the grid containing the location of sample collection. A higher TP reflected better accuracy. Attribution resolution described the spatial precision of the data, with the primary metric calculated by determining the distance(s) between the truth point (location of the sample origin) and the top 5 points (map grids) with the highest likelihood/OTU number. This was referred to as the average top 5 peaks error (AT5PE), and was the standard indicator of attribution resolution.
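The two metrics defined above can be sketched as follows. This is a simplified illustration that scores grid cells directly rather than peaks of a fitted GMM, so it is not the authors' implementation. The function names, the use of cell-center coordinates as dictionary keys, and the haversine great-circle distance with a mean Earth radius of 6371 km are assumptions.

```python
import math


def haversine_km(point_a, point_b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*point_a, *point_b))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def truth_percentage(grid, truth_cell):
    """Percentage of grid cells scoring below the cell that holds the truth point.

    grid maps a cell center (lat, lon) to its score (non-empty grid assumed).
    """
    truth_score = grid[truth_cell]
    below = sum(1 for score in grid.values() if score < truth_score)
    return 100.0 * below / len(grid)


def at5pe_km(grid, truth_point, n_peaks=5):
    """Mean distance from the truth point to the n highest-scoring cells."""
    top_cells = sorted(grid, key=grid.get, reverse=True)[:n_peaks]
    return sum(haversine_km(cell, truth_point) for cell in top_cells) / len(top_cells)
```

A sample would then be scored by computing both values against the known collection site and comparing them to the 90% and 600 km thresholds used in this study.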

DATA AVAILABILITY
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Table 1 Total number of 14-day dust samples, or the subset of 14-day dust samples with ≥ 20 OTU, that yielded a ≥ 90% TP, less than 600 km AT5PE, or true positive geographic attribution from analysis of constituent plant eDNA from samples collected at the sites indicated. A true positive attribution was defined as ≥ 90% TP with < 600 km AT5PE.

Figure 1 Diagram summarizing the plant eDNA geographic attribution pipeline used in this study. Starting from settled dust, DNA is extracted then subjected to metabarcoding with ITS2 and rbcL-3A, sequencing, and data processing to obtain an estimate of the site of origin.