ddRAD Sequencing: A Novel Arsenal Added to the Biosecurity Toolbox to Trace the Origin of Brown Marmorated Stink Bug, Halyomorpha Halys

Brown marmorated stink bug (BMSB), Halyomorpha halys (Hemiptera: Pentatomidae) is native to East Asia but has invaded many countries in the world. It is a polyphagous insect pest and causing signicant economic losses to agriculture worldwide. Knowledge on the genetic diversity among BMSB populations is scarce but is essential to understand the patterns of colonization and invasion history of local populations. Efforts have been made to assess the genetic diversity of BMSB using partial mitochondrial DNA sequences but genetic divergence on mitochondria is not high enough to precisely identify and distinguish various BMSB populations. Therefore, in this study, we applied a ddRAD (double digest restriction-site associated DNA) sequencing approach to ascertain the genetic diversity of BMSB populations collected from 12 countries (2 native and 10 invaded) across four continents with the ultimate aim to trace the origin of BMSBs intercepted during border inspections and post-border surveillance. A total of 1775 high condence single nucleotide polymorphisms (SNPs) were identied from ddRAD sequencing data collected from 389 BMSB individuals. Principal component analysis (PCA) of the identied SNPs indicated the existence of two main distinct genetic clusters representing individuals sampled from regions where BMSB is native to, China and Japan, respectively, and one broad cluster comprised individuals sampled from countries which have been invaded by BMSB. The population genetic structure analysis further discriminated the genetic diversity among the BMSB populations at a higher resolution and distinguished them into ve potential genetic clusters. Similarly, separated the Japanese and other Further genetic structure analysis the presence at three genetic in the possibly from multiple novel enhance knowledge of diversity among trace individuals for future invasion events. the large size of the BMSB genome (~1 Gbp) (Hhal_2.0). Therefore, cost effective approach such as the RAD-based approach with restriction enzymes (ddRAD) used in this study can show a higher resolution in revealing genomic diversity and population structure among BMSB populations. Taken together, we recommend applying genome-wide SNP markers-based study using ddRAD to explore the genomic diversity in insects and other eukaryotic organisms in future applications. Most importantly, our study will help to trace the geographic origin of the BMSB samples, irrespective of their life stages, and sex, that are intercepted at the New Zealand border. The advancement and ndings resulted from our study will not only help in claiming the BMSB freedom status to New Zealand but also will greatly assist in the decision making during an incursion and response situation.

The detection of genetic markers is crucial for genetic diversity study. Recently, a high-throughput sequencing-based method (HTS) has replaced traditional gel-based experiment to discover genetic marker [27].RADseq (Restriction-site associated DNA sequencing), is often applied for genome-wide SNP (single nucleotide polymorphism) identi cation in large genomes because of its relatively low cost and high-throughput [28]. The RADseq technique utilises one (or more) restriction enzyme to digest the whole genome into short genomic fragments that are then subjected to high-throughput DNA sequencing [28]. Restriction site-associated DNA markers provide a well-established basis for population genetics, as they are sensitive to both SNPs and insertion or deletion events (indels) in genomes [29]. So far, RADseq has been widely used in population genetic studies for many taxa including plants [30], and animals [28]. Double digest restriction-site associated DNA (ddRAD) sequencing is using two restriction enzymes to allow greater control of the genomic regions sampled for sequencing and allowing more reproducible recovery of sequenced regions [31]. Therefore, in this study, we applied ddRADseq to explore the genetic diversity among BMSB specimens collected from 41 populations across 12 countries.

Results
EcoR I-Msp I restriction enzyme pair was suitable for ddRAD sequencing To select the most suitable restriction enzyme (RE) pairs for digesting BMSB genomes, in silico test using 15 combinations of RE against the BMSB scaffold were conducted. The prediction revealed that more than 100K fragments produced from most of the RE pairs selected except the pairs, MseI-MluCI, MseI-MluCI and EcoRI-PstI (Table 1). Since the genome used for the test was a scaffold, thus the simulation results might not re ect the real situation. A pilot ddRADseq in vitro experiment was conducted with genomic DNA samples derived from two BMSB individuals. Of the 15 pairs of REs used for the in-silico test, nine different pairs of RE were selected for ddRADseq. After the Hiseq run, approximately, 2 Gb of raw RADseq sequencing data were generated for each individual. EcoR I-Msp I restriction enzyme pair recovered the highest number of genetic variances (i.e. SNPs) after SNP high standard quality control (QC) ltering, thus selected as the most suitable pairs of restriction enzymes for digesting the BMSB genome via ddRAD sequencing (Table 1).

ddRAD sequencing statistics and SNPs estimation
In total, 399 ddRAD sequencing datasets were obtained from the BMSB individuals, which yielded 3.6 billion of raw paired end reads (2x150 bp) (min: 4 million, max: 40 million and median: 7.6 million). On average, 9 million reads were generated for each individual. Using quality-trimming of the sequence data, 387,629 SNPs were estimated from 399 BMSB individuals. A high standard QC criterion was applied for ltering the SNPs, and only those loci that were shared by all the individuals were retained. As a result, the sequences from 389 individuals passed QC with 1775 high con dence biallelic SNPs that were used for the subsequent analysis of genomic diversity and population structure.
Genetic clusters were observed among the BMSB populations At least three genetic clusters comprising China, Japan, and the invaded countries (Austria, Chile, Georgia, Hungary, Italy, Romania, Serbia, Slovenia, Turkey, and the USA) were revealed by Principal component analysis (PCA) ( Figure 1). All BMSB individuals from Japan formed an isolated cluster, whereas BMSBs collected from the invaded countries were genetically closer to those of China. Interestingly, one BMSB individual from Chile was clustered together with the Chinese populations whereas the rest of the individuals were mixed into the individuals from Europe/USA populations ( Figure 1).

Individuals from the same geographical region were genetically linked
To further emphasise the outcome of genetic clustering pattern via principal component analysis, minimum spanning networks (MSN) were constructed using SNPs pro le of each individual, and their genetic variability were visualised among the population lineages ( Figure 2). The MSN showed that all the individuals from China are genetically linked together in the network, which also applies for the individuals from Japan ( Figure 2). There was a genetic divergence among the BMSB individuals from native regions of China and Japan, while those of invaded countries were more closely related in the network. Similar to the PCA results, one individual from Chile was found in the same clade of the Chinese samples, suggesting that this BMSB specimen might have originated from a recent invasion from China. The rest of the Chilean sample were distantly related from those in China and Japan but were more closely related to the samples from the European/USA groups, indicating that those possibly originated from secondary invasions from European/USA regions. The MSN also showed that one individuals from Italy and three from Slovenia were genetically linked to the Chinese populations, whereas the rest from these two countries were more closely related to those from European and the USA, suggesting multiple invasion might have been occurred ( Figure 2).
Genetic distance between native populations of China and Japan was relatively higher Population genetic divergence in the form of pairwise F ST revealed signi cant (p<0.05) genetic difference (except for that between China and Serbia) among 12 geographical groups or countries, with F ST value ranging from 0.00055 between BMSB populations from Hungary and Serbia, to 0.20842 between BMSB populations from Japan and Romania (Table 2). We also observed that the genetic distance between native populations of China and Japan was moderately higher (F ST =0.08) than that between the populations of China and many other BMSB-invaded countries, such as Slovenia (F ST =0.03792). Similarly, the genetic distance between the invaded populations in the USA and Chile was relatively low (F ST =0.0393) compared to genetic distance between BMSB populations in Chile and the native regions, China (F ST =0.09844) and Japan (F ST =0.17654). Moreover, the F ST between the BMSB populations from the neighbouring countries was very small, for example, Turkey and Georgia (F ST =0.01651), Austria and Slovenia (F ST =0.02033), Hungary and Serbia (F ST =0.00055) ( Table 2). A Neighbour-Joining (NJ) tree constructed using the F ST pairwise values among the individuals from the 12 countries revealed the similar relationships among the BMSB populations from the 12 countries ( Figure 4). The tree depicted the overall relationships of the populations and showed that Chinese and Japanese populations were clustered together while the others were formed into a clade ( Figure 4).
Five genetic clusters exist in the BMSB populations Furthermore, insights into the BMSB genetic diversity were achieved by population genetic structure analysis using fastSTRUCTURE. This analysis expanded the results of PCA ( Figure 1) and provided more in-depth clustering for the BMSB populations from the invaded countries. It predicts the presence of at least three genetic clusters within the BMSB-invaded countries ( Figure 3). The rst is of those populations from the USA, Italy, Chile, Turkey, Georgia, and Hungary, the second is formed by Romania, and the third cluster is formed by Slovenia, Austria, and Serbia. The BMSB populations from China and Japan were clearly separated from the invasive populations ( Figure 3).

Discussion
To the best of our knowledge, this is the most comprehensive population genomic study so far to unravel the genetic diversity and population structure of BMSB. The study utilised ddRAD sequencing to enhance the knowledge of global BMSB diversity and invasion history. We identi ed a suitable restriction enzyme pair for genomic digestion of BMSB genome for ddRAD sequencing study, which will be useful in future application. The ddRAD data were analysed using a combination of approaches, including principal component analysis (PCA), phylogenetic analysis and population structure analysis to elucidate the population structure and genetic diversity among the BMSB populations. The present study unambiguously proved that the BMSB populations in the two native regions of China and Japan were genetically distinct. Many BMSB populations from the invaded countries were genetically closer to those of China. Conversely, the Japanese BMSB populations were isolated and showed genetically less similarity to those from the invaded countries. Overall, this study has provided a remarkable resolution in unravelling the population structure and estimating the genetic relatedness among the BMSB populations from a set of native and invaded regions of the world.

Genetics isolation between the BMSB populations from China and Japan
The PCA and MSN analyses showed that the Japanese BMSB populations were isolated from the other populations studied here. The fastSTRUCTURE analysis showed that population structures of the BMSB samples from China and Japan were derived predominantly from different genetic components ( Figure 3). These results suggested that there are genetic isolations for the BMSB populations between the two countries. Xu et al. [22] also observed genetic divergence between the BMSB populations from China and Japan via haplotype analysis of COII and 12S/CR gene regions. One possible explanation for such phenomenon could be the geographical distance between the isolated populations. The width of Tsushima Strait is from 41.6 km to 222 km which is the distance between mainland and Japan [34]. However, the average non-stop ight distance of BMSB is less than 1 km [35]. Therefore, active spread of BMSB individuals from China to Japan is unlikely.
Estimation of the BMSB invasion pathway BMSB populations from the invaded countries are more closely related to China than that of Japan. For example, the USA populations with Fst of 0.084 to the Chinese populations and 0.16 to the Japanese populations, indicating that the USA populations originated more likely from China, but Japan. The results are consistent with the ndings of a previous study [36]. Moreover, there were large genetic differences (0.15 < F ST < 0.25) between the BMSB populations in Japan and the invaded countries of Chile, Georgia, Italy, Romania, Turkey, and the USA (Table 2). Therefore, we can conclude that the BMSB populations in the six invaded countries might not have been originated from Japanese populations.
Moreover, PCA plot ( Figure 1, Additional le 3) indicated that the established populations in Chile, Hungary, Georgia, Italy, Turkey and the USA belonged to one genetic cluster. The fastSTRUCTURE analysis supported this point that those populations had similar genetic components ( Figure 3). The neutrality test (Fu's Fs) in a previous study (Yan et al. unpub.) for these populations was negative (p < 0.05), indicating that these established populations have been under population expansion stage [37].
One individual from the Italy cohort was genetically closely related to the Chinese population (Figure 2), suggesting the possible recent arrival of that specimen from China. Besides, the rest of the BMSB samples from Italy showed their close relatedness to the USA and other European populations, suggesting that multiple BMSB invasions have occurred in Italy in the past [16,38]. Such phenomenon was also observed in Chile [17]. Most of the BMSB specimens from Chile, except one individual, were genetically close to each other (Figure 2), implying that there were at least two separated invasions of BMSB in Chile.

Predication of the BMSB genetic cluster
The population structure analysis of BMSB indicated that there were at least ve genetic clusters identi ed among the BMSB populations studied here. Besides the Chinese and Japanese BMSB populations, there were three genetic clusters in Europe, the USA and Chile. The analysis showed that the BMSB populations from the USA, Georgia, Turkey, Italy, Hungary, and Chile were more likely belong to one genetic cluster due to their similar genetic structure. Since rst detected in the USA in 1996 [39], BMSB has invaded many countries. In Europe, BMSB was rst recorded in Switzerland in 2004 [14,21,40], then spread to many countries such as Germany, France, Italy and Hungary [41,42]. Therefore, based on the timeline of reports, BMSB invasion might have started in the USA (1996) [2], then Europe (2004) and Chile (2017) [17], though a more transparent history of invasion remains unknown and needs further investigation.
Another interesting phenomenon is that the genetic distances between BMSB populations originating from Slovenia, Austria, and Serbia, where the species had been reported more recently, and the Chinese populations were smaller than those between the Chinese and the other European populations. This suggests that BMSB found in these three European countries could have recently been invaded directly from China rather than from one of their neighbouring countries in Europe. In saying that, the population genetic structure analysis also showed that the BMSB populations from Slovenia, Austria, and Serbia contained the components of the Europe-established BMSB populations ( Figure 3). This analysis indicated that there were inbreeding between the between the BMSB individual from the adjoined established European BMSB populations and the Chinese populations. The principal component analysis also supported this as these populations were located between Chinese populations and other established BMSB populations from Europe. This result indicated that these BMSB populations were still under early stage of invasion, which supports the ndings of haplotype analysis using COI and COII of BMSB samples from the same populations (Yan et al. unpub.).
BMSB was rst reported in Romania in 2015 [10] and have spread around the country [43]. The F ST values between the BMSB samples from Romania and the two native countries, China, and Japan, were 0.13334 and 0.20842, respectively, which were larger than the F ST values from other invaded populations. The population structure analysis showed that Romanian BMSB samples had a unique genetic structure, which indicates that this group could have been invaded from a place of origin that was not covered in this study. Therefore, this study further emphasised the importance of sampling. Overall, this study covered a wide range of areas including wide sampling from the BMSB native regions and the invaded countries, although we still miss samples from some countries, such as Korea, Greece, Canada and Switzerland. Therefore, further study of ddRADseq of BMSB samples from those countries will provide more insights into BMSB genetic diversity of the BMSB populations.
Novel genetic analysis for the BMSB populations from the invade countries Nevertheless, this study also provided novel genetic information on the BMSB populations for some of the European countries, such as Austria, Serbia, Slovenia, and Georgia. Since BMSB was detected in Austria [34] and Serbia [35] in 2015 and Slovenia [36] in 2017, Turkey [37] in 2019 and Georgia [38] in 2016, to the best of our knowledge, no genetic studies have been reported on those populations, and this study shows the rst effort to characterise the BMSB populations for these countries. The results showed that the BMSB populations from Georgia and Turkey were more likely the next generation of the BMSB from the USA whereas those populations from Austria, Serbia and Slovenia were the hybrids of Chinese and USA populations (Figure 3). The results obtained here will be useful for future monitoring of the pest and assessing the potential pathways of invasion in those countries.
Finally, in compassion with the mitochondrial based (i.e. COI and COII) haplotypes analyses, here we show that ddRADseq can provide high resolution in identifying BMSB genetic populations. As an alternative, analysis of genome-wide SNPs via whole genome sequencing can be applied for deeper understanding the BMSB genomes and genetic diversity, however, it is still more expensive due to the large size of the BMSB genome (~1 Gbp) (Hhal_2.0). Therefore, cost effective approach such as the RAD-based approach with restriction enzymes (ddRAD) used in this study can show a higher resolution in revealing genomic diversity and population structure among BMSB populations. Taken together, we recommend applying genome-wide SNP markers-based study using ddRAD to explore the genomic diversity in insects and other eukaryotic organisms in future applications. Most importantly, our study will help to trace the geographic origin of the BMSB samples, irrespective of their life stages, and sex, that are intercepted at the New Zealand border. The advancement and ndings resulted from our study will not only help in claiming the BMSB freedom status to New Zealand but also will greatly assist in the decision making during an incursion and response situation.

Conclusions
This study demonstrates that the restriction enzymes pair of EcoRI and MspI is suitable for generating restriction-associated DNA fragments for identifying SNPs within the BMSB genome through ddRAD sequencing. Via population analyses using ddRADseq data we detected genetic differentiations among different BMSB populations studied in both the native and invaded countries. We believe that this study could also assist in studying the invasion history of BMSB populations and tracing the potential geographical origin of unknown BMSB intercepted at the border or in post-border scenarios in biosecurity settings using the population structure model. Thus, ddRAD sequencing going to be an invaluable arsenal added in to the biosecurity toolbox for tracing the geographic origin of insects at the border/post-border in the near future.

Methods
BMSB specimen collection and DNA extraction BMSB samples were collected from 41 regions in 12 countries, namely Austria, Chile, China, Georgia, Hungary, Italy, Japan, Romania, Serbia, Slovenia, Turkey, and the USA (Table 3, Additional le: 1). The eld-collected samples were preserved in plastic vials containing 95% ethanol and imported into New Zealand in accordance to the Import Health Standard, Section 22 of the Biosecurity Act 1993. All the specimens were identi ed by the collectors and con rmed by the Entomologists at PHEL. No speci c permits were required for all the sample collected from all the countries covered in this study. Ethics approval was not required as insects are not classi ed as animals for the purposes of the Animal Welfare Act, 1999, New Zealand Legislation.
Total genomic DNA was extracted from each individual using QIAGEN DNeasy® Blood & Tissue Kit with QIAGEN RNase A treatment (Qiagen, Valencia, CA, USA). Brie y, head and thorax or abdomen was ground using sterile plastic pestle in 1.5 mL tube with ATL buffer and Proteinase K added, and then incubated at 56 °C overnight. The DNA extraction followed the manufacturer's instructions and the DNA was eluted in 150 µl AE buffer. DNA quality was assessed using NanoDrop™ spectrophotometer (ThermoFisher Scienti c, USA) and quanti ed using QuantiFluor™ system (ThermoFisher Scienti c, USA). It was followed by an assessment for DNA shearing on a 1% agarose gel against 1Kb Plus DNA ladder (Invitrogen™,CA, USA) in TAE buffer stained with SYBR safe (Life Technologies, CA, USA) and visualised using a Gel Doc Software system (BioRad, Hercules, CA, USA).

Selection of restriction enzymes in silico for RAD sequencing
The selection of restriction enzyme (RE) is the crucial step for experimental ddRAD-seq analysis and discovery of the SNP across the genome for genetic diversity study, thus initial tests on the RE used for the library was conducted. To select the best RE, a simulation based on the BMSB reference genome (submitted as Hhal_2.0 by The i5k Initiative on 15/12/2017) and 15 combinations of 9 enzymes (AvaII, EcoRI, MSeI, MspI, NlaIII, PstI, SbfI, SphI, MluCI). A Python script was developed to calculate the number of segments after using different combination of restriction enzymes to cut reference genome.

Selection of restriction enzymes in vitro: RAD sequencing and bioinformatics analysis
Nine pairs of enzymes which had higher numbers of segments in in silico analysis or had been used in other published studies [31] were selected. To further con rm the suitability of the RE pairs, two DNA samples were selected to be used for ddRADseq as parallels to eliminate the effects from individual difference. Eighteen RADSeq libraries (9 pairs of enzymes for 2 samples) were prepared using Illumina® TruSeq DNA Nano following the manufacturer's instructions and ToBo lab ezRAD Protocol-v3.2 [49]. Before sequencing, all RADSeq libraries were assessed using a Fragment Analyzer for quality control. The sequencing was conducted by Annoroad Gene Technology Co., Ltd (Beijing, China) on a HiSeqX ten sequencing platform (paired end, 2x150 bp). The 3'end adaptors of raw reads were trimmed using AdaptorRemoval v2 [50] and reads with phread quality score of less than 20 and length of less than 50 bp were discarded using an in-house script. Then, quality-trimmed sequence datasets, were mapped to the BMSB reference genome, Hhal_2.0, using BWA version 0.7.17 [51] default setting and les with the mapping information (i.e. in SAM format) were converted to BAM format, sorted and indexed using SAMtools version 1.9 [52] before identifying the SNPs. SNPs were called using GATK v4.1 [53]. A hard ltering was conducted to discard the false positive SNPs as recommended by GATK developers (QD (QualByDepth) < 2.0, FS (FisherStrand) > 200.0, SOR (StrandOddsRatio) > 10.0, MQRankSum (MappingQualityRankSumTest) < -12.5, ReadPosRankSum (ReadPosRankSumTest) < -8.0). Population level quality control was done using Plink 1.9 (parameters used: call rate 0.8) [54] for subsequent analysis. All bioinformatics analysis was performed on the high-performance computing platform of the New Zealand eScience Infrastructure (NeSI), Auckland, New Zealand.

Library preparation and ddRAD sequencing
After selecting the most suitable pairs of restriction enzymes (EcoRI and MspI) for digestion of BMSB genome via RADseq approach, 399 BMSB individuals were further subjected to double-digest restriction-associated DNA (ddRAD) sequencing. ddRAD libraries were prepared using 500 ng of genomic DNA from each sample following the protocol described by Peterson et al [31] with some modi cations. Brie y, genomic DNA was digested at 37 °C for 5 h using 10 Units of the two selected restriction enzymes, namely EcoRI and MspI (NEB, MA, USA), and deactivated at 65 for 20 mins, then followed by the ligation of Illumina adapter sequences and unique 8 bp barcodes, which varied by at least three bases. Sets of 24 differentially barcoded individuals were pooled and run on a 1% agarose gel, where 220~450 bp fragments were manually excised and puri ed using a Zymoclean TM Gel DNA recovery kit (Zymo Research, CA, USA). Each pool was ampli ed by PCR reactions, which were carried out at a total volume of 25 μl, each containing 5 μl 5×Reaction buffer, 5 μl 5×High GC enhancer, 0.25 μl Q5 polymerase, 5nM of library DNA and a unique indexing primer for each pool that corresponds to the standard Illumina multiplexed sequencing protocol. Temperature cycling for PCR comprised one initial denaturation at 98 °C for 30 s, followed by 14 cycles of denaturation at 98 °C for 15 s, annealing at 65 °C for 30 s, and extension at 72 °C for 30 s, followed by a nal extension at 72 °C for 5 mins. The PCRs were carried out in a Veriti 96-well thermal cycler (Life Technologies). DNA libraries were quanti ed using the Agilent high-sensitivity DNA kit in a 2100 Bioanalyser (Agilent Technologies, CA, USA). Library pools were combined in equimolar concentration to form a single genomic library and sequenced in one lane of a HiSeqX ten Illumina sequencer (paired-end, 2×150 bp) by Personalbio co. (Shanghai, China) Sequencing data pre-processing and Quality control of ddRAD data To pre-process the raw sequencing data generated via ddRAD, a similar pipeline and data quality ltering approach was applied as described in the section of processing of RADSeq data. In order to call SNPs from ddRADseq data, the raw fastq les were mapped to reference genome, Hhal_2.0, to produce a mapped read le (sam le) using Burrows-Wheeler Aligner v 0.7 [51]. The result was converted into binary format (BAM les) and sorted to increase calculation speed for further analysis using Samtools v1.9 [54]. SNPs were called from BAM les using GATK4 [53].
The quality control parameter used here was the same as that used for the selection of restriction enzymes above. In addition, the obtained SNPs were further ltered using SNP call rate of 1 and MAF (minor allele frequency) of 0.03 using Plink 1.9 [54]. As a result, 10 samples were discarded due to lower SNP call rate.

Population Genomic Analysis
Using the SNPs identi ed from the 389 BMSB individuals, we conducted principal component analysis (PCA), neighbour-joining tree (NJ tree) and population structure analysis using fastSTRUCTURE to elucidate the population structure and genetic diversity among the BMSB populations. PCA was conducted based on with 1775 SNPs obtained from 389 individuals using Plink 1.9 [54] (parameters used: --allow-extrachr --pca) and plotted using Tableau 2019 [55]. To further explore the genetic relatedness among BMSB individuals, a Minimum Spanning Network based on the SNP results of each individual was also reconstructed using R package poppr [56].
To test for the presence of population structure, pairwise F ST values were generated from the ddRAD data by combining the individual data from the same country as one geographic group. This analysis was implemented in Arlequin 3.5 [57] by converting the SNP pro le (in Plink format) to Arlequin format using PGDSpider 2.1.1.5 [58]. The obtained population average pairwise F ST values were further used to construct a Neighbour-Joining tree in MEGA X 10.1 [32] to reveal the population level's genetic distance.
To provide additional insight into the genetic variation and population differences, population genetic structure analysis was conducted. The population genetic structure was inferred using the SNPs pro le of individual samples. In this process, fastSTRUCTURE 1.0 [59] was used to conduct model-based clustering of all individuals. The best K value (i.e. the number of populations or clusters that the samples are best divided into) was determined as 5 using a python script, chooseK.py within the fastSTRUCTURE software. No speci c permits were required for all the sample collected for this study. No samples were collected in national parks and no endangered or threatened insects were included in this study, thus collection permits were not required for all the collections. All specimens imported into New Zealand were in accordance to the Import Health Standard, Section 22 of the Biosecurity Act 1993. Ethics approval was not required as insects are not classi ed as animals for the purposes of the Animal Welfare Act, 1999, New Zealand Legislation.

Consent for publication
Not applicable

Availability of data and materials
The SNP data can be found using the link below https://1drv.ms/u/s!AvRFcQuxR5sIgpcYfFCZ3C3pSAa1Xg?e=L7PmSe

Competing interests
The authors declared that they have no competing interests.

Funding
The research project (number 405731) was funded by the Operational Research programme from the Ministry for Primary Industries (MPI), New Zealand. The funding body provided the nancial supports for conducting the research, in which the results were produced. The costs for publishing this paper is also covered by the research fund. The funding body played no role in the design of the study and collection, analysis, and interpretation of data and in the preparation of the manuscript.     geographical groups comprising Austria, Chile, China, Georgia, Hungary, Italy, Japan, Romania, Serbia, Slovenia, Turkey, and the USA. Each node represents an individual specimen and the edge indicates the genetic distance (dissimilarity: fast pairwise distances) between the individuals. The colour in each circle represents the countries where the samples were collected from. The picture was created using R package poppr [56].