Genome assembly, organization, and base composition
Except for the variable control region (CR) and the Norgal assemblies, we recovered the same gene order and content with both of our mitogenome assembly strategies. Norgal failed to assemble mitogenome sequences under the de novo assembly strategy: no mitogenomic features (PCGs, tRNAs, or rRNAs) were found in either the assembly produced with default parameters (assembly size = 24,871 bp) or the one produced with adjusted parameters (assembly size = 29,520 bp, -m 500). However, when run under the baited de novo assembly strategy, Norgal recovered the same gene order and content as SPAdes and Geneious Prime®, with the exception of the large ribosomal RNA gene (rrnL), which differed in size by 1 bp and in sequence from that recovered by SPAdes and Geneious Prime® (pairwise p-distance = 0.01). The different mitogenome sizes were caused by difficulties associated with the properties of the CR, which include variation in the copy number of tandemly repeated sequences and extensive length variation of a variable domain [31, 32]. When SPAdes was used under the de novo assembly strategy, the nearly complete CR (1101 bp) was recovered. Under the baited de novo assembly strategy, SPAdes recovered a partial CR of 380 bp in which the repetitive sequences could not be assembled. We therefore present the complete mitogenome sequence of the apple fruit moth from the SPAdes de novo assembly, in which the mitogenome is a 16,044 bp closed circular molecule (GenBank accession: ON496993; Fig. 1). Interestingly, the mitogenome size of the apple fruit moth was similar to that of available yponomeutoid mitogenomes [33-35], which are on average longer than those of other superfamilies of Lepidoptera (n = 4, 16,092 ± 353 bp, Table 2).
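As an illustration of the pairwise p-distance quoted for rrnL, the following minimal sketch computes the proportion of differing sites between two aligned sequences. It assumes that alignment columns containing a gap are skipped (pairwise deletion), which may not match the settings used in the study, and the two toy fragments are hypothetical rather than the actual rrnL sequences.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared alignment columns at which two sequences differ.

    Assumption: columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (pairwise deletion)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned fragments, for illustration only.
toy_a = "ATTAAATTTAAT-ATTAGGA"
toy_b = "ATTAAGTTTAATTATTAGGA"
print(round(p_distance(toy_a, toy_b), 3))
```

A 1 bp length difference, as reported for rrnL, appears in such an alignment as a single gapped column and therefore does not contribute to the distance under this convention.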
The gene content of the apple fruit moth mitogenome is similar to that of other ditrysian insects studied previously, with 22 tRNA genes, 13 PCGs, 2 rRNAs and a noncoding control region. The low-strand codes for 9 PCGs (cob, cox1, cox2, cox3, atp6, atp8, nad2, nad3 and nad6) and 14 tRNAs (trnM, trnI, trnW, trnL2, trnK, trnD, trnG, trnA, trnR, trnN, trnS1, trnE, trnT and trnS2), whereas the opposite strand codes for 4 PCGs (nad1, nad4, nad4L and nad5), 8 tRNAs (trnC, trnF, trnH, trnL1, trnP, trnQ, trnV, trnY) and two mitochondrial rRNAs (rrnL and rrnS) (Fig. 1, Table 3). The lengths of the tRNA genes range from 64 to 75 bp (Table 3), which is well within the range of the corresponding tRNA genes of other lepidopterans: Plutella xylostella [34], Parnassius apollo [36], Leucoma salicis [7], Ephestia kuehniella [37] and Speiredonia retorta [6]. All 22 tRNAs had cloverleaf secondary structures, except trnS1, in which the dihydrouridine (DHU) arm is missing (Fig. 2). The loss of the DHU arm in tRNAs has been detected in various Lepidoptera species [6, 38, 39]. The lack of the DHU arm was hypothesized to have evolved in response to recognition signals for seryl-tRNA synthetases, reflecting potential differences in gene expression [40, 41]. The rrnL gene is located between trnV and trnL1, while rrnS is located between the control region and trnV. These are the same gene positions found in P. xylostella [34]. The lengths of rrnL and rrnS in A. conjugella are 1371 bp and 783 bp, while the lengths of these genes are 1371 bp and 783 bp in Speiredonia retorta, 1344 bp and 840 bp in Leucoma salicis, and 1413 bp and 781 bp in Plutella xylostella [6, 7, 34]. The rRNA genes were A + T rich (82%), falling within the range detected in other Lepidoptera species, including Agrotis segetum [42], Agrotis ipsilon [43], Spodoptera frugiperda [44], and Papilio machaon [45]. The rRNA AT and GC skewness values were found to be negative in most of the analyzed Lepidoptera mitogenomes in the study, including A. conjugella; however, in Tecia solanivora [46], Spilarctia subcarnea [47] and Speiredonia retorta [6], these values were positive. In A. conjugella, the cox1 gene starts with ATT, which differs from the start codon in the Yponomeutoidea members Plutella xylostella, Leucoptera malifoliella and Prays oleae, where the start codon is CGA. The start codon of the cox1 gene was found to be variable in other Lepidoptera species [48]. The size of this gene (1534 bp) in A. conjugella is 3 bp larger than that in these three species of the same superfamily. The cox2 gene (682 bp) is the same size as that of Leucoptera malifoliella but larger than that found in Plutella xylostella and Prays oleae (679 bp), while all these species have the same cox3 gene size (789 bp). The largest PCG found in the A. conjugella mitogenome is nad5 (1732 bp), and the smallest is atp8 (162 bp). Similar results have been widely reported in various insect mitogenomes [49, 50]. The overlap of the aligned sequences of atp6 and atp8 in A. conjugella (Fig. 3) showed the conserved nucleotide sequence ATG ATA A, which is detected in most lepidopteran species [34, 51].
We found that the location of the trnM gene follows the ditrysian-type arrangement trnM-trnI-trnQ [52], which differs from that of non-ditrysian groups in Lepidoptera and from the ancestral order trnI-trnQ-trnM, relative to which trnM is translocated [52-54]. The control region of A. conjugella is large (1101 bp), which is a common feature detected in the superfamily Yponomeutoidea [35]. In comparison, the CRs of the olive and diamondback moths were found to be ~1600 bp and ~1081 bp, respectively [34, 35]. We found that the CR comprises nonrepetitive sequences, including the motif ‘ATAGA’ followed by a 20 bp poly-T stretch, as well as the dinucleotide microsatellites (AT)18 and (AT)53, each flanked by ATTTA motifs, a (TAAA)4 repeat adjacent to trnM instead of the 11 bp poly-A adjacent to tRNAs, and several imperfect repeat elements, indicating that the sequence in the present study may be partial. The nucleotide composition of the CR was highly AT-rich, with an estimated AT content of 94.3% (A: 47.6%, T: 46.7%, G: 1.8%, C: 3.9%); the AT skew was slightly positive (0.010) and the GC skew was negative (−0.368). Overall, the nucleotide composition of the apple fruit moth mitogenome was also highly AT-rich, with an estimated AT content of 82% (A: 40.8%, T: 41.2%, G: 7.4%, C: 10.6%), and the AT and GC skews were both negative, −0.005 and −0.178, respectively (Table 2). These results are in agreement with results obtained in Plutella xylostella [34], Leucoma salicis [7], Ephestia kuehniella [37] and Speiredonia retorta [6].
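The composition statistics for the CR can be reproduced from the base percentages given above. The sketch below assumes the commonly used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), which are not stated explicitly in this section; the function name is illustrative.

```python
def composition_stats(a: float, t: float, g: float, c: float) -> dict:
    """AT content and strand skews from base percentages.

    Assumed definitions: AT skew = (A - T) / (A + T),
    GC skew = (G - C) / (G + C).
    """
    at_content = a + t
    gc_content = g + c
    return {
        "at_content": at_content,
        "at_skew": (a - t) / at_content,
        "gc_skew": (g - c) / gc_content,
    }


# Base percentages reported for the control region of A. conjugella.
cr = composition_stats(a=47.6, t=46.7, g=1.8, c=3.9)
print({name: round(value, 3) for name, value in cr.items()})
```

Because the skews are ratios, they can be computed equally from percentages or from raw base counts.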
The codon usage in A. conjugella was compared with that of twelve lepidopteran species from different families (Fig. 4). The comparison showed that the pattern of codon usage in the PCGs of the A. conjugella mitogenome is very similar to the patterns in these lepidopteran mitogenomes. Asn, Ile, Leu2, Met and Phe are the most commonly used codon families in all these species, while Cys codons are the rarest (Fig. 4, Fig. 5). The relative synonymous codon usage (RSCU) was analyzed for A. conjugella and compared with the same set of lepidopteran insects (Fig. 6). CTG, CTC, AGG and ACG were completely absent in the A. conjugella mitogenome PCGs. Codons with high G and C contents are also rare or absent in the PCGs of other lepidopteran mitogenomes. Moreover, TTA (Leu2), TCT (Ser2), CGT (Arg), GCT (Ala), and GGA (Gly) are the most frequently used codons and account for 36.41%. These five amino acids have also been detected in other Lepidoptera species, such as Manduca sexta [55], Helicoverpa armigera [56], Plutella xylostella [34], Tecia solanivora [46], Papilio machaon [45], and Ostrinia nubilalis [57]. In particular, Leu2 was found to be the most frequently detected amino acid in all Lepidoptera species in the study, and this result is supported by results found in Leucoma salicis [7] and Speiredonia retorta [6].
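To make the relative synonymous codon usage measure concrete, the sketch below computes it as the observed count of a codon divided by the mean count of all synonymous codons in its family, so that a value above 1 marks a preferred codon and 0 marks an absent one. The three codon families shown are only a small illustrative subset, the toy coding sequence is hypothetical, and the software actually used in the study may handle details such as stop codons differently.

```python
from collections import Counter

# Illustrative subset of synonymous codon families (not a full code table).
FAMILIES = {
    "Leu2": ["TTA", "TTG"],
    "Phe": ["TTT", "TTC"],
    "Cys": ["TGT", "TGC"],
}


def count_codons(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts: Counter, families: dict = FAMILIES) -> dict:
    """Observed codon count divided by the mean count within its family."""
    values = {}
    for codons in families.values():
        total = sum(counts[codon] for codon in codons)
        if total == 0:
            continue  # amino acid not used; RSCU undefined for this family
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical coding fragment: TTA x3, TTG x1, TTT x2, TTC x1, TGT x1.
toy_cds = "TTATTATTATTGTTTTTTTTCTGT"
values = rscu(count_codons(toy_cds))
for codon, value in sorted(values.items()):
    print(codon, round(value, 2))
```

In this toy example, TGC receives a value of 0 because it never occurs although its synonym TGT does, which is the same situation as the G- and C-rich codons reported as absent above.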
Phylogenetics
To obtain an overview of A. conjugella and its relationships with other Lepidoptera species, our study investigated 18 superfamilies representing 42 families and 507 Lepidoptera species (Table S1, Table S2 and Figure S2). This is the first phylogenetic study (using the mitogenome) of A. conjugella, a member of the family Argyresthiidae, which belongs to the superfamily Yponomeutoidea. Using the ML approach, analyses of the three datasets (specified in the Materials and Methods section) yielded three topologies. The analysis of the 507 Lepidoptera species showed that some families clustered together, such as Papilionidae & Pieridae, Pyralidae & Tortricidae, Geometridae & Sphingidae, Erebidae & Noctuidae and Gelechiidae & Sphingidae, while other families, such as Tortricidae and Crambidae, clustered alone and separately. Yponomeutoidea was recovered as a well-supported monophyletic group and as one of the earliest lepidopteran groups after Tineoidea and the basal Hepialoidea (Fig. 7, Figure S1, and Figure S2). However, the paraphyly of Tineoidea to some extent destabilized the monophyly of Yponomeutoidea in Datasets 1 and 2 (Fig. 7, Figure S1), an instability that was fully resolved with dense taxon sampling (Figure S2). Tineidae, represented by four species (Amorophaga japonica, Dahlica ochrostigma, Gibbovalva kobusi and Eudarcia gwangneungensis), was recovered with relatively high nodal support (Fig. 7, Figure S2). Wang et al. (2018) [60], Bao et al. (2019) [38], Jeong et al. (2022) [23] and Zhang et al. (2020) [61] all found similar results for these two superfamilies. Additionally, Bao et al. (2019) [38] and Jeong et al. (2022) [23] found that Yponomeutoidea, Tineoidea and Gracillarioidea in Ditrysia have strong phylogenetic relationships. We also detected strong relationships between Yponomeutoidea, Zygaenidae and Tortricoidea, findings that are in line with the results of Liu et al. (2016) [48], Zhang et al. (2020) [61], Wang et al. (2018) [60], and Kim et al. (2014) [62]. We detected only a weak phylogenetic relationship between the superfamilies Yponomeutoidea and Bombycoidea, a result that is supported by Liu et al. (2016) [63] and Liu et al. (2017) [64]. Nonetheless, we consistently recovered Argyresthiidae embedded in Yponomeutoidea with a sister-group relationship to Plutellidae (Dataset 1: SH-aLRT = 92, UFBoot2 = 100; Dataset 2: SH-aLRT = 88, UFBoot2 = 100; Dataset 3: SH-aLRT = 87, UFBoot2 = 99). Our phylogenetic tree hypothesis rejects the provisional ‘AL’ clade (Argyresthiidae + Lyonetiidae) recovered with nuclear gene datasets by Sohn et al. (2013) [16]. We found that the placement of Lyonetiidae was unstable, possibly due to its relatively long branch length. We recovered Lyonetiidae as basal to the Yponomeutoidea clade (Figure S1, Dataset 1: SH-aLRT = 99, UFBoot2 = 100), as a sister-group to Praydidae within Yponomeutoidea (Figure S2, Dataset 3: SH-aLRT = 84, UFBoot2 = 100), or as a sister-group to Gracillariidae of the superfamily Tineoidea, although with weak support (Fig. 7, Dataset 2: SH-aLRT = 43, UFBoot2 = 91). With increased taxon sampling, our phylogenetic tree hypotheses strongly supported the basal placement of Lyonetiidae within the Yponomeutoidea clade (Fig. 7, Figure S2, Dataset 2: SH-aLRT = 98, UFBoot2 = 99). Moreover, we consistently recovered the previously described pairing of Yponomeutoidea and Gracillariidae as internested subclades [16, 22]. At a higher level, our phylogenetic tree hypothesis recovers some fundamental and uncontroversial lepidopteran clades that agree with the majority of mitogenomic phylogenies as well as with those that included mitochondrial and/or nuclear markers. The analyses found that A. conjugella had the closest relationship with Plutella xylostella, Leucoptera malifoliella and Prays oleae, which belong to the families Plutellidae, Lyonetiidae and Praydidae, respectively (Fig. 7, Figure S2). Wei et al. (2013) [34], Sohn et al. (2013) [16], Liu et al. (2016) [48], Yang et al. (2020) [58], Jeong et al. (2021) [59] and Jeong et al. (2022) [23] all found that Plutella xylostella, Leucoptera malifoliella and Prays oleae are closely related.
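The paired support values reported above can be read against a joint threshold. The sketch below assumes the guideline given in the IQ-TREE documentation, under which a clade is typically relied upon when SH-aLRT >= 80 and UFBoot >= 95; the cutoffs actually applied in this study are not stated in this section, and the class and function names are illustrative.

```python
from typing import NamedTuple


class Support(NamedTuple):
    label: str
    sh_alrt: float
    ufboot: float


def is_reliable(s: Support, min_sh_alrt: float = 80.0,
                min_ufboot: float = 95.0) -> bool:
    """Both measures must meet their cutoff (assumed thresholds)."""
    return s.sh_alrt >= min_sh_alrt and s.ufboot >= min_ufboot


# Support values quoted in the text above.
reported = [
    Support("Argyresthiidae + Plutellidae, Dataset 1", 92, 100),
    Support("Argyresthiidae + Plutellidae, Dataset 2", 88, 100),
    Support("Argyresthiidae + Plutellidae, Dataset 3", 87, 99),
    Support("Lyonetiidae + Gracillariidae, Dataset 2", 43, 91),
]

for s in reported:
    print(s.label, "reliable" if is_reliable(s) else "weak")
```

Under these assumed cutoffs, the Argyresthiidae + Plutellidae node passes in all three datasets, whereas the Lyonetiidae + Gracillariidae pairing fails both criteria, consistent with its description as weakly supported.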
Our phylogenetic analysis supported the previous morphological characterization of the superfamily Yponomeutoidea [16, 66, 67]. This study provides a useful resource for studies on the genetic evolution of A. conjugella and underlines the potential importance of mitochondrial genomes in comparative genomic analyses of Lepidoptera species. Comprehensive analyses of insect mitogenomes provide important phylogenetic information and may help to identify potentially novel genes that could serve as valuable targets in future research efforts. Further investigations of the whole genome of A. conjugella, together with other genomes of Lepidoptera species, will facilitate the understanding of the taxonomy of, and the evolutionary processes acting on, the natural group Ditrysia.