Haplotype Network
We reconstructed a haplotype network from the analysis of 677 sequences spanning 658 nucleotides of the CO1 mitochondrial gene. A total of 52 distinct haplotypes were delineated from the analysis of this CO1 fragment of one to 230 fly samples (Figure 1).
We found 73 segregating sites with a Tajima-D of 45.1413, suggesting a lack of rare alleles, which could come from a contraction of the population.
There were 31 haplotypes made of a single sequence. We observed 11 haplotypes made of between two and four sequences derived from the same continent; that total of 42 haplotypes will be referred to as minor haplotypes. We observed ten haplotypes made of between five and 230 sequences, including six multi-continental haplotypes that we will refer to as major haplotypes. In conclusion, we found an important diversity both within and between continents using the CO1 phylogenetic mitochondrial gene. Minor haplotypes represent 8.5% of all sequences (58 sequences) and 80.8% of haplotypes. Major haplotypes represent 91.5% of the sequences (619 sequences) and 19.2% of the haplotypes.
In Europe, we found ten distinct haplotypes from 118 sequences, nine for Asia from 277 sequences, five for Oceania from 106 sequences, 30 for South America from 55 sequences, four in Africa from 78 sequences, and seven in North America from 43 sequences. Some major haplotypes are represented in a more restricted area. For example, haplotype B is mainly present in Asia (figure 1). We observed a high haplotype diversity in South America with 28 minor haplotypes that contain between one and four sequences. The haplotype C contains all the sequences derived from farmed strains of the industry working on BSF.
Global distribution analysis
We visualized the actual distribution of the different haplotypes found previously on a distribution map (figure 2).
Each haplotype from A to J is now represented with a color; haplotypes that are numbered in the haplotype network (figure 1) are now represented in grey.
We observed great variability in South America highlighted on the map by the grey pie charts representing mostly minor haplotypes. We observe 22 singletons in South America out of the 52 haplotypes found, more than 40% of the total diversity. On the other hand, we found six out of ten major haplotypes in Europe.
Haplotype A and C were the most represented haplotypes in our sampling: Haplotype A was present on all continents and represented 34% of all sequences, and haplotype C was only absent from South America and represented 18% of all sequences. Some haplotypes seemed to be restricted to some world areas, such as haplotype G, which was present in all the localities in West Africa and surprisingly in a Portuguese sample, or the haplotype H in South America. Haplotype F was present in Europe, Uganda, and South Korea. Haplotype E was more present in Asia and Oceania as well as in North America. Finally, some haplotypes such as B, I, and J present rare sequences found in more restricted areas such as B in Korea or H and J in France (figure 2).
CO1 Phylogeny
During our CO1 gene analyses, some BSF-like larvae were identified as Exaireta spinigera larvae. The sequences were amplified and sequenced with the same primers as for BSF.
By taking only one sequence for each haplotype, we have reconstructed a phylogeny of the CO1 gene sequence. The CO1 phylogeny was rooted with the Exaireta spinigera (figure 3). A sample from Cameroon was revealed to belong to a closely unidentified species of BSF and was termed Hermetia sp. Cameroon in our subsequent analysis (Blastn score between Hermetia sp. from Cameroon and BSF gives us 85.01% identity with an E Value of 2e-174).
The CO1 gene phylogeny supports the analyses performed with the haplotype network (Figure 1), showing remarkable sequence diversity among the CO1 gene. Interestingly, all the commercial strains are found in the C haplotype (a detailed phylogenetic tree is given as a supplementary with an asterisk placed on individuals from farms (Figure S03 - Supplementary).
A group containing haplotypes 35 to 42 and haplotypes B and C forms a divergent group from the others, with strong statistical support (bootstrap value of 0.99). Some individuals captured in the wild have been found near BSF commercial breeding sites. For example, we have captured C haplotype individuals in Avignon close by companies working with the BSF.
Although the global resolution of the deep branches of the tree is low, it is also possible to better understand the relationship between the haplotypes: haplotype B and C are close to each other and well separated from other haplotypes. The other haplotypes are not very well resolved with the CO1 sequences. Finally, the samples coming from Latin America are scattered all over the tree, showing their great diversity and supporting the hypothesis of the geographical origin of the BSF.
The global BSF diversity is mirrored at the local level in France
Among the ten major haplotypes, our sampling effort in France showed that five are present in France in addition to a unique haplotype only found in France (Figure 4) in La Rochelle. Haplotype A remains the dominant haplotype in France (71.1% of all sequences) in seven of the eight localities (excluding Beaufort en Vallée, where the flies come from a company). Regarding the haplotype C, BSF belonging to this group have been found in four locations in the regions of Poitiers, Bordeaux, Beaufort-En-Vallée and Montpellier. Except for Beaufort-En-Vallée, where the flies come from a company, the other BSF come from natural environments. BSF industries exist in all these localities. More surprisingly, different haplotypes coexist in some identical localities. For example, we found three different haplotypes in the region of Paris and four in the region of Montpellier. Finally, we found a singular haplotype in Colomiers (J) and La Rochelle (1).
Complete mitochondrial genome analyses
To better understand BSF phylogeny and estimate the timing of haplotype divergence, we reconstructed and analyzed the mitochondrial genomes of 60 individuals, including 3 Exaireta spinigera (used as an outgroup). The 60 reconstructed mitochondrial genomes have the typical structure of insects mtDNA: 13 protein-coding genes, 22 tRNA, two rRNA-coding genes, and non-coding regions and control regions (an example is given as a supplementary (Figure S04 - Supplementary). The genetic differences are homogeneously distributed overall mtDNA and are not localized on specific genes (Figure 5A). A great genetic diversity at the mitochondrial genome level is observed. Two very homogeneous groups, A and C, are well represented, and sequences of the C haplotype are wildly divergent all along the genomes from the others. The two sequences with the most differences are 55-F-Angouleme and 21-Ghana, with a similarity percentage of 96.373% (all results are in supplementary (Figure S05 - Supplementary).
We also wanted to know if the mutations are equally (or not) distributed between the different genes. Therefore, we compared the level of similarity of the 13 mitochondrial genes of the 57 BSFs (fig 5B) to those of Exaireta spinigera. The results are presented as percent sequence similarity.
We grouped the individuals into three different categories:
-Haplotype A (Figure 5 B1), all genes show the same differences with those of Exaireta spinigera; only the NAD6 gene shows slight differences.
Haplotype C (Figure 5 B2) shows that most of the mitochondrial genes are identical except for the ATP8 gene, which is different for four individuals (11 Angouleme; 14 Kenya; 19 Kenya, and 15 Kenya). Slight differences are also present in the NAD3 and NAD6 genes.
-Haplotype D-F-G-H-I (Figure 5 B3), the profiles are slightly different, with a more substantial disparity for the ATP8, NAD1, and NAD5 genes, especially the Mexican one (57). The profiles show the same pattern except for the ATP8 gene of the individual from Mexico (57), which is more conserved.
Complete mitochondrial genome phylogeny
The complete mitochondrial genome phylogeny demonstrates the very high genetic divergence among BSF populations found with the CO1 gene analysis (Fig 6A). The increased sensitivity of the analysis, due to the size of the sequences (13.96 kb, 13 genes), robustly resolves the relationships between different haplotypes identified previously (nodes >95%). In addition, it allows us to distinguish subgroups within the previously determined haplotypes. Two major divisions emerge within the tree: group C (which includes all commercial strains) and others. Two subgroups emerge within the C group (11 – F – Angouleme; 14 Kenya; 19 Kenya; 15 Kenya and the others).
The other large group contains the sequences of the A D I G and H haplotypes. The I G and H haplotypes are related and close to each other. The very cosmopolitan A haplotype is well conserved with limited mitochondrial sequence divergences.
Moreover, although phylogenetically close, the I G and H haplotypes are found in very disparate areas (the G haplotype is mainly in Africa, the I in Europe, and the H in Latin America), supporting multiple worldwide introductions.
Strikingly the high level of the conservation of the ATP8 sequence described above in the Mexican sample (haplotype D) does not explain the basal position of the sequence in the phylogeny; indeed, we tested the phylogeny by removing the ATP8 gene, and no change was found (Figure S06 - Supplementary).
The ATP8 gene alone is not sufficient to explain the appearance of the observed subgroups. However, unlike the CO1 gene, which varies only slightly within this haplotype, the NAD2, NAD3, NAD5, and NAD6 genes are more variable.
The average percentage of identity between haplotypes A and C is about 96.39%. Between haplotype C and D 96,54%. Between haplotype C and H 96.40% and 96.43% between C and I. Between A and D 97.82% (all results are in Figure S03 - Supplementary).
The divergence time of the different groups
The time tree (fig 6B) gives us information about the divergence time of the different mitochondrial genomes. We obtained identical Bayesian and ML trees. We noted a relatively long divergence time between haplotype C and the others (2.2 million years). The others have diverged more recently (about 1.2 million years). Within the C haplotype, a divergence has been about 300.000 years, while in the other groups, the divergence times are much lower, about 20.000 years for the A haplotype.
The time of divergence with Exaireta spinigera has been estimated at 110 million years, confirmed by the literature that gives us 112 million years with the Timetree tool 59.