Genome sequencing and assembly
The genome of D. hydei from EC I (Fig. 2A) was sequenced using single-molecule real-time (SMRT) sequencing (PacBio Sequel) and generated 550,241 reads, about 10 Gb data. The sequencing depth is around 61.7× in total with reads contig N50 = 21,326 bp, and the average reads length is 10,899 bp, reads length distribution was shown in Fig. S1. Finally, we assembled these reads into 520 contigs and the estimated genome size was 141.52 Mb, with the contig N50 length of 4.72 Mb and the longest contig of 10.45 Mb (Table S1). Illumina pair-end (PE) reads were used for error correction. To evaluate the completeness of the genome assembly, our assembled genome was compared to the established core gene sets with Benchmarking universal single-copy orthologs (BUSCO) and showed 91.9% of the eukaryotic BUSCO conserved gene set were complete (Table S2, Fig. S2). The draft assembly was then evaluated by aligning short PE reads from 39 individuals to the assembled draft genome and the mapping rate ranged from 89% to 98% (Table S13).
The reference genome was annotated with 12,006 protein coding genes and the gene completeness assessed by BUSCO was 90.53% (Fig S3, 4A and 4B). We identified 296 tRNA, 25 rRNA and 186 non-coding RNA and 21 regulatory elements (Table S3, and 4). In total, 69,563 short tandem repeats including 2-mer, 3-mer and 4-mer were identified from the assembled genome, and the total length is 908,596bp, which covers 0.64% of the whole genome (Table S5, 6 and 7). There are 309,511 transposable elements were annotated in Table S8. The number of genes annotated to each species was listed in Table S9. Functional annotation and annotation summary was listed Table S10, 11, 12, and Fig S4C, D, E, F.
The evolution of gene families was investigated by identifying the gene family expansion and contraction. To investigate the phylogenetic relationship and divergence time between D. hydei in Israel and other closely related species, a phylogenetic tree with seven species was constructed based on 6,865 orthologous single copy genes. The D. hydei split with the ancestor of D. buzzatii ~16.9 Mya, and D. mojavensis split from D. buzzatii later ~12.51 Mya (Fig. 1).
Sample collection and genome resequencing
We collected 39 individuals of D. hydei from Evolution canyon I (EC I), including 19 and 20 genotypes from African slope (AS), and European Slope (ES), respectively, in April, 2018. The ES and AS are also called North Facing Slope (NFS) and South Facing Slope (SFS), respectively. Whole genome resequencing of the 39 individuals were performed on Illumina HiSeq ×10 and generated 330 Gb data in total (Table S13). The average sequencing depth for each individual was 16.29× after alignment to its reference genome we assembled above. In total, we identified 202,963 and 695,805 high-quality population-specific SNPs from AS and ES populations, respectively, of which 1,762,522 were shared between the two populations (Fig. S5).
Population genetic structure
A neighbor joining tree was constructed to infer the genetic relationship of the two abutting D. hydei fly populations from EC I. All the individuals from AS are on one clade marked in red, and the individuals from ES were marked in blue (Fig. 2B). Secondly, principal component analysis (PCA) on population differentiation was performed showing clear-cut separated two clusters, AS and ES. The two abutting AS and ES populations were separated by the first component (Fig. 2C). To better understand the genetic background and population admixture between the two diverging populations, genetic structure analysis was performed. When the K value was set to 2, all the individuals were separated to 2 clusters and those from AS were assigned to one cluster while individuals from ES were assigned to the other cluster, with several individuals having some admixture (Fig. 2D). Several recombinants were found with two genetic backgrounds from both AS and ES populations, suggesting gene flow between the two slopes. When K was set to 3, there are several recombinants in AS, but not sub population structure, there are 2 sub-population structure in ES; When the K value increased from 3 to 4, we found sub-population structure and recombinants from both the AS and ES populations. The best K value was selected by calculating the cross-validation value, and the lowest one is when K=2 (Fig. S9).
Genetic diversity and genetic differentiation between the two abutting populations
The genetic diversity, measured byπ, for AS and ES populations is 3.86×10-3 and 4.47×10-3, respectively. Genetic diversity is significantly higher in ES than that of AS (Mann-Whitney U test, P=2.2E-16) (Fig. 4E, Fig. S7), which is consistent with the higher recombination rate in ES population, indicating different stresses in the new environment. Linkage disequilibrium (LD) decay was estimated for both the AS and ES population of D. hydei from hot and dry AS and cool, and humid ES populations, respectively. The LD curve dramatically decreased below 0.2 within 10 kb in AS and ES populations (Fig. 3A). The LD for AS population is higher than that of ES population, reflecting the higher abiotic stresses on AS. Identity by descent (IBD) among individuals within each population and between populations are estimated, and we could see that the length of shared haplotype within each population is bigger than that between the populations (Fig. 3B, Fig. 3F). The average length of shared haplotype is 27.30 Mb and 18.3 Mb within AS and ES populations, respectively. It is significantly longer than that of between the two populations with the length of 7.95Mb (P=1.54E-11 and P=1.25E-10). Tajima's D of AS (mean = 1.178) population was significantly higher than that of ES (0.671 , P = 2.2×10-16, Mann-Whitney U test) indicating balancing selection, i.e, reflecting non-randomality, and rejecting neutrality (Fig. 3D). Recombination rate (ρ = 4Ne×r) in ES was 131.11, significantly higher than in AS of 119.00 (P < 0.01), enabling richer genetic diversity and adaptive evolution, for tolerating the new stresses of humid, cold and pathogenic ES. The genetic relationship between individuals within AS and ES populations can be seen in Fig. 3C, E. Clearly, the genetic relationship is much closer within population than between the populations, reflecting sharp inter-slope ecological divergence between tropical AS and temperate ES.
Genome divergence between the AS and ES populations and genomic Island evolution
The inter-slope genetic divergence between the ancestor species on AS and new sympatric derivative species on ES is 0.06 measured by FST, and absolute divergence DXY (0.257). To analyze the differences between genomic island and background, 137 10-kb windows were identified across the genome (top 1% of the FST value and FDR < 0.01). The adjacent windows were combined as one genomic island and finally we identified 104 islands (Fig. 4A). FST distribution across the genome showed L-shape (Fig. 4B), suggesting most loci are homogenized by gene flow and only a few loci showed large divergence. The absolute genetic divergence (dxy) between the two populations was mainly between 0.2 to 0.4 (Fig. 4C, D). We identified 117 dxy-islands. The number of islands with both high FST and dxy is 35. The differences in genetic diversity between the ES and AS populations was shown in Fig. 4E and Fig. S7, and it is higher at most of the loci in ES than that in AS population. When we compared the high FST-islands and the background genome, genetic diversity (π) was significantly lower in islands regions (Fig. 4F). Tajima's D was significantly higher in island of ES population (Fig. 4G), but not significant in AS population. The dxy (Fig. 4H) and FST (Fig. 4I) was significantly higher in genome islands than that of background genome. The recombination rate (ρ) were highly reduced in island regions(Fig. 4J) in both AS and ES populations. The relationship between π, DXY, FST, Tajima's D, LD and ρ of AS and ES populations showed that FST is positively correlated with DXY, but negatively correlated with linkage disequilibrium and genetic diversity, π (Fig. 4K).
Putatively selected genes
To understand how the two populations of D. hydei adapted to the divergent contrasting microclimatic slopes of Evolution Canyon I, selective sweep across the genome were screened on each population. There are 194 and 155 genes were identified as PSGs from AS and ES respectively. Gene Ontology (GO) enrichment (Fig. 5A) were conducted on the two gene sets and we found some shared GO items like neurogenetics, growth (GO:0040007), response to nutrient levels (GO:0031667), response to starvation (GO:0042594) and autophagy (GO:0006914), which may be related to scarce food resources. There are also adaptive GO items related to their population specific stresses. Like aging (GO:0007568), cell death (GO:0008219) and programmed cell death (GO:0012501) in AS population. ES is new and challenging environment for the arid-originated cactophilic species, there are several items related to responses to environmental (GO:0104004) or abiotic (GO:0071214) stimulus in ES population. Remarkably, GO items related to reproduction (GO:0048477, GO:0007292, GO:0048608, GO:0061458, GO:0008406, GO:0007297, GO:1905879, GO:0009994), related to defense to bacterial and fungus (GO:0006955, GO:0002376, GO:0045087, GO:0050832, GO:0009620, GO:0050776, GO:0002682, GO:0042742, GO:0009617), related to fighting and defense (GO:0098542, GO:0006952, GO:0031347, GO:0042060, GO:0009611), related to eye development and photoreceptor (GO:0048592, GO:0150063, GO:0001654, GO:0001745, GO:0001751, GO:0048749, GO:0001754, GO:0009416), and responses to toxic substance and drug (GO:0009636, GO:0042493) and temperature (GO:0009266) helped to adapt to the cool and humid, temperate European slope. Hmr and Nup96 genes were reported to be related to postzygotic isolation by multiple hybrid incompatibilities in flies [70-72]. The FST of these two genes between the AS and ES populations was 0.08 and 0.13, respectively, which is higher than the average of 0.06, suggesting possible reproductive isolation, and this should be examined in future experiments.
Copy number variations between the AS and ES populations
Estimates of gene copy number variations (CNVs) were conducted using both Control-FREEC and a read-depth-based approach. We identified 1,439 and 1,567 CNVs related to genes from the AS and ES populations, respectively, accounting for 234,530,201 Mb and 211,636,179 Mb of the genome. There are significantly more genes related with CNV gain in ES than AS population (Fig. 5B). Gene copy number differentiation between the AS and ES populations was measured by VST (Fig. 5C), with an average VST of 0.03. Permutation test was performed to help identify significant inter-specific differentiation of copy numbers, and those with a VST larger than 95% permutated value was classified as differentiated CNVs. We identified 199 genes as differentiated CNV related genes between the AS and ES populations. ES population has more average CNVs than that in AS. The gain and absence of CNV genes from the two populations were shown in Fig. 5D, where we can see absence mainly existed in AS population, and CNV gain was popular in ES population.
Demographic history
To further explore the demographic divergence, we used a composite likelihood approach by fastsimocoal2 to evaluate models of speciation. The best-fit model indicated that two populations of D. hydei diverged with a constant gene flow around 22.6 kya (Fig. 6A), which suggests sympatric speciation of D. hydei, driven by dramatic inter-slope biome divergence. Additionally, we inferred the effective population size (Ne) for each population with the SMC++ method (Fig. 6B). The result revealed that AS population and ES population experienced both a severe decline after last glacial maximum (LGM) and diverged subsequently, suggesting the differentiation the two populations of D.hydei is driven by LGM. ES population has a higher Ne than AS population, corresponding with a higher genetic diversity of ES population.