Genome sequencing and de novo assembly
We obtained 62.39 Gb (150×) of nanopore long-reads and 49.65 Gb (120×) of Illumina paired-end reads to construct the contig-level genome for this butterfly. We first used nanopore long-reads for de novo assembly. Due to the high error rate of the nanopore reads, the assembly was then polished three times using Illumina short reads. Our result indicate that 95.27% Illumina reads were mapped to the initially assembled genome with a near normal distribution. For completeness assessment, the Nanopore reads were mapped back to the assembly. After above steps, a 423.87 Mb assembly (scaffold:53, N50: 14.32 Mb) was yielded, which is consistent with the genome size estimated based on the k-mer analysis (421.41 Mb) (Fig. S1). The BUSCO completeness value in assembly was 99.1%.
Chromosome-level assembly and synteny analysis
Combined with 24.7 Gb (50×) of Hi-C fragment libraries sequencing, 96.2% scaffolds from H. assimilis were anchored onto 30 chromosomes, ranging from 1.48 Mb to 34.8 Mb in length, and 2 small unplaced scaffolds (Fig. S2). The karyotype of H. assimilis (n=30) was suggested. Finally, a 423.87 Mb (scaffold N50: 15.84 Mb, GC content: 35.4%) of haplotype genome assembly without heterozygous sequence was yielded. The BUSCO completeness value in the assembly was 99.3%. We also applied BUSCO analysis to other published Nymphalidae genomes using the same parameters and subsequently compared the completeness of each genome assembly. The results showed that genome assembly integrity of H. assimilis is comparable to, or higher than, the others and is therefore adequate for further analysis (Table. S1, Fig. S3).
Using D. plexippus as reference, which is a kind of nectarivorous butterfly, synteny analysis revealed that H. assimilis and D. plexippus have highly consistent and extensive gene collinearity, which suggests that the close relationship between these two species at the gene level and the similarity of chromosome structure (Fig. 1). Compared with D. plexippus, H. assimilis has no large-scale chromosome rearrangement. Our results preliminarily excluded the influence of the chromosome rearrangement on the feeding habits of butterflies.
Synteny alignment of the H. assimilis and D. plexippus. The tagged HA01 to HA30 refer to the chromosomes of the sequenced H. assimilis and the black columns refers to the chromosomes of D. plexippus. The densities of GC contents, DNA variations in H. assimilis are shown in the outer rims in orders.
Gene prediction and functional annotation
Using EDTA, we identified ~ 46.37% of the assembled genome as repetitive sequences (Table S2). Among them, the repetitive sequences were dominated by DNA transposons (25.75%) and long terminal repeats (LTRs, 15.68%). By integrating the methods of ab initio, homology-based and transcriptome-based prediction, a total of 16,815 protein-coding genes were identified and the BUSCO value was 98.0% in the H. assimilis genome. In total, 13,747 (81.8%), 15,084 (89.7%) and 9,807 (58.3%) predicted genes were supported by functional annotation from the eggNOG-mapper, Interproscan and UniProt/SwissProt databases (Table S3-5). The average lengths of gene and CDS were 1,447 bp and 8,678 bp, which was comprehensive in the comparison of 14 other lepidoptera species (Fig. S4). Pfam motif results show that the most abundant three classes of motif were RVT_1, Exo_endo_phos_2 and rve (Fig. S5), which were important proteins closely related to transcription.
Demographic history reconstruction
H. assimilis is highly adaptable and widely distributed. This species adapts to the present distribution environment, showing rich genetic polymorphism. PSMC analysis showed that after the last interglacial period of pleistocene, the effective population size of H. assimilis increased dramatically and reached to millions (Fig. 2). During this period, the climate changed warm and wet, with the widely distributed tree residues or pollen, which provided an indispensable material basis for saprophagous butterfly.
Comparative genomic analysis
Comparative genomic analyses were performed between H. assimilis and 14 other lepidopteran species (Aricia agestis, Danaus plexippus, Heliconius erato, Leptidea sinapis, Manduca sexta, Papilio polytes, Papilio xuthus, Papilio bianor, Pararge aegeria, Parnassius apollo, Pieris rapae, Spodoptera frugiperda, Vanessa tameamea and Zerene cesonia). We found 112 unique gene families in H. assimilis (Fig. 3A). Among the common gene families, H. assimilis has 1000 expansion and 513 contraction gene families.
By reconstructing the phylogenetic tree, H. assimilis and the same saprophagous butterfly V. tameamea formed a branch, and diverged approximately 55.69 million years ago (Mya). They diverged from nectarivorous butterfly H. erato about 78.4 Mya, suggesting that the change of feeding habits was caused by early evolutionary events (Fig. 3B). In addition, 93 gene families were expanded and 426 were contracted at H. assimilis and V. tameamea’s common nodes, among which 11 and 8 were significant (p<0.01). The significant expansion gene families were involved in the activation of immune cells and process for oligosaccharide metabolic, but this branch also showed the contraction events were engaged in regulation of gonad development (Table S6).
A Venn diagram of H. assimilis and 14 other lepidopteran species gene family. The petals represent the number of unique gene families to each species. B Phylogenetic analysis of H. assimilis. The number beside each node denotes the estimated divergence time (million years ago) of two connected clades. The arrow indicates the gene family that significantly expanded (+) and contracted (-) at this node. The red pentagram indicates the gene families: immune cells activation (leukocyte GO:0045321, B cell GO:0042113 and lymphocyte GO:0046649), oligosaccharide catabolic process (GO:0009313), mannose metabolic process (GO:0006013) and RNA splicing (GO:0008380), etc., expand at this node.
Positive selection analysis
The values of Ka, Ks and Ka/Ks reveals patterns of functional gene evolution. Compared with V. tameamea and H. erato, we searched for the rapid evolution of genes associated with dietary habits in H. assimilis. Positive selection analysis showed that the rapidly evolving genes could not be found (Ka/Ks ratios< 1), due to too long differentiation time in these three species (Fig. 4). The result also indicated that the genetic background of the three butterfly species had been changed along with differentiation, and most of the gene functions were mainly related to the collateral origin specialized genes (specific genes and gene families) (Fig. S6).
Functional enrichment of target gene family
To understand and trace changes associated with saprophagous dietary, functional analysis was performed on the horizontal transfer gene family, unique gene family and expansion gene family of H. assimilis. Among them, 32 gene families were horizontal transfer gene family and the function were mainly associated with chromosome assembly, cellular component biogenesis and other structural functions (Fig. 5A, Table S7). For 112 unique gene families, the significant gene family enrichment functions were related to macromolecule metabolism, response to external stimulus, cell development and differentiation, locomotion and immune cell activation (Fig. 5B, Table S8). Furthermore, 186 gene families across H. assimilis exhibited expansions, and the significant gain on gene families was involved in process for organonitrogen compound, carbohydrate and lipid metabolic, the cellular response to external stimulus and detoxification, etc. And we also found expansion events involved in the response to acid chemical and regulation of alcohol biosynthetic process (Fig. 5C, Table. 1). In H. assimilis, most of the enzymes analysed involved in nitrogen compound and lipid metabolic pathways such as trypsin-like serine protease, acyltransferase and enoyl reductase might reflect the diversity of its diet (Table S9).
A GO enrichment of horizontal transfer gene family. B GO enrichment of unique gene family. C GO enrichment of expansion gene family.
Table. 1 GO enrichment for significant gene families of habit food.
Species and nodes
|
Function and metabolic pathway
|
GO
|
P-value < 0.01
|
H. assimilis
Expansions
|
+ response to extracellular stimulus
+ response to acid chemical
+ response to nutrient
+ response to starvation
+ fucose metabolic process
+ immune system process
|
GO:0009991
GO:0001101
GO:0007584
GO:0042594
GO:0006004
GO:0002376
|
0.0028
0.0032
4.34E-04
0.0025
6.64E-08
0.0021
|
|
|
|
|
H. assimilis
Unique
|
+ leukocyte activation
+ B cell activation
+ locomotion
+ nitrogen compound metabolic process
+ lymphocyte activation
+ heterocycle metabolic process
|
GO:0045321
GO:0042113
GO:0040011
GO:0006807
GO:0046649
GO:0046483
|
9.13E-04
8.53E-09
4.52E-04
0.0030
1.47E-06
0.0085
|
|
|
|
|
H. assimilis and V. tameamea node
Expansions
|
+ leukocyte activation
+ B cell activation
+ oligosaccharide catabolic process
+ mannose metabolic process
+ lymphocyte activation
|
GO:0045321
GO:0042113
GO:0009313
GO:0006013
GO:0046649
|
9.13E-04
0.0010
0.0035
0.0045
0.0059
|
|
|
|
|
Note: “+”: expansion.
Key gene screening
The successful capture of food is attributed to the ability of H. assimilis to effectively perceive information in the environment. The results of comparative genomic analysis showed that the OBP gene family was contracted. Here, we have comprehensively screened the genome of H. assimilis and determined 23 putative odorant binding proteins (OBPs), including 2 General Odorant-Binding Proteins (GOBPs), 2 Pheromone-Binding Proteins (PBPs) and 19 OBPs. These genes were distributed across 12 scaffolds and some of the H. assimilis OBP genes were located within the same cluster. For example, Hass-OBP10, Hass-OBP11, Hass-OBP12, Hass-OBP13 and Hass-OBP14 were located on the region of scaffold 17. Another four OBP genes (Hass-GOBP1, Hass-GOBP2, Hass-PBP-C and Hass-PBP-D) resided in the scaffold 16 (Table S8). Among them, 10 OBPs belonged to minus-C subfamily and others were classified as classic subfamily (Fig. 6A Table S10). We also found that GOBP1, GOBP2, PBP-C and PBP-D of H. assimilis defined a unique clade, which was species-specific and related to host location. Interestingly, compared with other species of lepidopteran, PBP-A and PBP-B were absent in H. assimilis (Fig. 6B, 6C). We speculate that the change of feeding habits of H. assimilis may be related to the loss of OBP gene. Future studies could evaluate the expression levels for OBP genes, and their regulation.