Genome composition and base structure
The complete mitogenome of C. infuscatellus was 15,252 bp in length (GenBank accession number: OR288533), which falls within the range reported for sequenced lepidopteran mitogenomes (14,838 to 16,173 bp; Table S1). The mitogenome contained 37 genes: 13 PCGs (cox1-cox3, cob, nad1-nad6, nad4l, atp6, and atp8), 22 tRNA genes, and 2 rRNA genes (Table 1). Of these, 4 PCGs (nad1, nad4, nad4l, and nad5), 8 tRNA genes (trnF, trnH, trnP, trnL1, trnV, trnQ, trnC, and trnY), and 2 rRNA genes were encoded by the minor strand (N-strand), while the remaining 23 genes were encoded by the major strand (J-strand) (Fig. 1). The location and structure of the protein-coding, tRNA, and rRNA genes were found to be conserved.
Table 1
Characteristics of the mitochondrial genome of Chilo infuscatellus
Feature | Strand | Location | Size (bp) | Anticodon | Start/Stop codon | Intergenic nucleotides
trnM | + | 1–67 | 67 | CAT |  | 0
trnI | + | 68–134 | 67 | GAT |  | 3
trnQ | - | 138–206 | 69 | TTG |  | 53
nad2 | + | 260–1273 | 1014 |  | ATT/TAA | 1
trnW | + | 1275–1341 | 67 | TCA |  | -8
trnC | - | 1334–1403 | 70 | GCA |  | 2
trnY | - | 1406–1470 | 65 | GTA |  | 6
cox1 | + | 1477–3007 | 1531 |  | CGA/T | 0
trnL2 | + | 3008–3074 | 67 | TAA |  | 0
cox2 | + | 3075–3759 | 685 |  | ATG/T | -3
trnK | + | 3757–3826 | 70 | CTT |  | 0
trnD | + | 3827–3896 | 70 | GTC |  | 0
atp8 | + | 3897–4062 | 166 |  | ATT/T | -2
atp6 | + | 4061–4738 | 678 |  | ATG/TAA | -1
cox3 | + | 4738–5526 | 789 |  | ATG/TAA | 2
trnG | + | 5529–5594 | 66 | TCC |  | 0
nad3 | + | 5595–5948 | 354 |  | ATT/TAA | 8
trnA | + | 5957–6021 | 65 | TGC |  | 3
trnR | + | 6025–6090 | 66 | TCG |  | -1
trnN | + | 6090–6154 | 65 | GTT |  | -1
trnS1 | + | 6154–6219 | 66 | GCT |  | 3
trnE | + | 6223–6289 | 67 | TTC |  | 3
trnF | - | 6293–6360 | 68 | GAA |  | 0
nad5 | - | 6361–8098 | 1738 |  | ATT/T | 0
trnH | - | 8099–8163 | 65 | GTG |  | -1
nad4 | - | 8163–9503 | 1341 |  | ATG/TAA | -2
nad4l | - | 9502–9793 | 292 |  | ATG/T | 2
trnT | + | 9796–9859 | 64 | TGT |  | 0
trnP | - | 9860–9924 | 65 | TGG |  | 2
nad6 | + | 9927–10,454 | 528 |  | ATA/TAA | 13
cob | + | 10,468–11,613 | 1146 |  | ATG/TAA | -2
trnS2 | + | 11,612–11,680 | 69 | TGA |  | 12
nad1 | - | 11,693–12,634 | 942 |  | ATG/TAA | 1
trnL1 | - | 12,636–12,702 | 67 | TAG |  | -21
rrnL | - | 12,682–14,069 | 1388 |  |  | -2
trnV | - | 14,068–14,133 | 66 | TAC |  | 0
rrnS | - | 14,134–14,911 | 778 |  |  | 0
A + T-rich region |  | 14,912–15,252 | 341 |  |  | 0
Note: + represents the major strand; - represents the minor strand.
The mitogenome of C. infuscatellus exhibits 11 gene overlaps, ranging in size from 1 to 21 bp, and 15 intergenic spacers, ranging in length from 1 to 53 bp (Table 1). The largest gene overlap (21 bp) is observed between trnL1 and rrnL. The atp8 and atp6 genes overlap by 2 bp, which is shorter than the corresponding overlap in most lepidopteran mitogenomes (Campbell and Barker, 1999; Cao et al., 2014; Wang et al., 2022). The longest intergenic spacer (53 bp) is located between trnQ and nad2 (Table 1).
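The overlap and spacer counts can be reproduced from the coordinates in Table 1. The following is a minimal sketch, assuming that the spacing between two adjacent features is computed as the downstream start minus the upstream end minus 1 (negative values indicate an overlap). The function and variable names are illustrative and are not taken from the annotation pipeline used in this study.

```python
# Feature coordinates (name, start, end) transcribed from Table 1, in genome order.
FEATURES = [
    ("trnM", 1, 67), ("trnI", 68, 134), ("trnQ", 138, 206),
    ("nad2", 260, 1273), ("trnW", 1275, 1341), ("trnC", 1334, 1403),
    ("trnY", 1406, 1470), ("cox1", 1477, 3007), ("trnL2", 3008, 3074),
    ("cox2", 3075, 3759), ("trnK", 3757, 3826), ("trnD", 3827, 3896),
    ("atp8", 3897, 4062), ("atp6", 4061, 4738), ("cox3", 4738, 5526),
    ("trnG", 5529, 5594), ("nad3", 5595, 5948), ("trnA", 5957, 6021),
    ("trnR", 6025, 6090), ("trnN", 6090, 6154), ("trnS1", 6154, 6219),
    ("trnE", 6223, 6289), ("trnF", 6293, 6360), ("nad5", 6361, 8098),
    ("trnH", 8099, 8163), ("nad4", 8163, 9503), ("nad4l", 9502, 9793),
    ("trnT", 9796, 9859), ("trnP", 9860, 9924), ("nad6", 9927, 10454),
    ("cob", 10468, 11613), ("trnS2", 11612, 11680), ("nad1", 11693, 12634),
    ("trnL1", 12636, 12702), ("rrnL", 12682, 14069), ("trnV", 14068, 14133),
    ("rrnS", 14134, 14911), ("A+T-rich region", 14912, 15252),
]


def spacing(features):
    """Return (upstream, downstream, gap) for each adjacent pair of features.

    gap > 0 is an intergenic spacer, gap < 0 an overlap, gap == 0 means abutting.
    """
    return [(a[0], b[0], b[1] - a[2] - 1) for a, b in zip(features, features[1:])]


gaps = spacing(FEATURES)
overlaps = [g for g in gaps if g[2] < 0]
spacers = [g for g in gaps if g[2] > 0]
largest_overlap = min(gaps, key=lambda g: g[2])
longest_spacer = max(gaps, key=lambda g: g[2])

print(len(overlaps), "overlaps;", len(spacers), "spacers")
print("largest overlap:", largest_overlap)
print("longest spacer:", longest_spacer)
```

The list is treated as linear, so the junction between the A + T-rich region and trnM across the origin of the circular genome is not evaluated here.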
The nucleotide composition of the C. infuscatellus mitogenome was as follows: T (40.30%), A (39.83%), C (11.99%), and G (7.88%). The A + T content was 80.13% in the entire mitogenome, 78.38% in PCGs, 81.58% in tRNA genes, and 84.86% in rRNA genes (Table 2), which is consistent with the strong A + T bias characteristic of insect mitogenomes. Across the entire mitogenome, the AT skew and GC skew were -0.01 and -0.21, respectively, so the GC skew was considerably larger in magnitude than the AT skew. Both values were negative, indicating that the content of T was higher than that of A and the content of C was higher than that of G.
Table 2
Nucleotide composition of the C. infuscatellus mitogenome.
Feature | A% | T% | C% | G% | AT% | GC% | AT Skew | GC Skew
protein-coding genes | 33.73 | 44.65 | 10.43 | 11.18 | 78.38 | 21.62 | -0.14 | 0.03
1st codon position | 36.92 | 36.60 | 10.19 | 16.29 | 73.52 | 26.48 | 0.004 | 0.23
2nd codon position | 21.99 | 48.49 | 16.18 | 13.34 | 70.48 | 29.52 | -0.38 | -0.10
3rd codon position | 42.27 | 48.89 | 4.93 | 3.91 | 91.16 | 8.84 | -0.07 | -0.12
tRNAs | 40.86 | 40.72 | 7.89 | 10.54 | 81.58 | 18.42 | 0.00 | 0.14
rRNAs | 43.17 | 41.69 | 4.85 | 10.30 | 84.86 | 15.14 | 0.02 | 0.36
A + T-rich region | 45.45 | 50.73 | 2.93 | 0.88 | 96.19 | 3.81 | -0.05 | -0.54
whole mitogenome | 39.83 | 40.30 | 11.99 | 7.88 | 80.13 | 19.87 | -0.01 | -0.21
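The skew values in Table 2 can be recomputed from the base percentages. The sketch below assumes the conventional definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); these formulas are not stated in this section, and the function names are illustrative.

```python
def skew(x, y):
    """Return the normalised difference (x - y) / (x + y)."""
    return (x - y) / (x + y)


def composition_stats(a, t, c, g):
    """Summarise A+T content, G+C content, and both skews from base percentages."""
    return {
        "AT%": a + t,
        "GC%": c + g,
        "AT_skew": skew(a, t),  # negative when T exceeds A
        "GC_skew": skew(g, c),  # negative when C exceeds G
    }


# Base percentages taken from Table 2.
whole = composition_stats(a=39.83, t=40.30, c=11.99, g=7.88)
pcgs = composition_stats(a=33.73, t=44.65, c=10.43, g=11.18)

for label, stats in (("whole mitogenome", whole), ("PCGs", pcgs)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

With these definitions, the whole mitogenome gives skews of about -0.01 (AT) and -0.21 (GC), and the PCGs give about -0.14 (AT) and 0.03 (GC), in agreement with the tabulated values.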
PCGs
The A + T content in the PCGs of the C. infuscatellus mitogenome was 78.38%, which was markedly higher than the G + C content (21.62%). Notably, the A + T content at the third codon position was the highest (91.16%) (Table 2), which is consistent with other insect mitogenomes (Boore, 1999; Tang et al., 2014; Yang et al., 2021; Kim et al., 2022). For the 13 PCGs, the AT skew was -0.14 and the GC skew was 0.03. The GC skew was therefore smaller in magnitude than the AT skew, in contrast to the pattern observed for the entire mitogenome. In lepidopteran mitogenomes, the AT skew values of PCGs are all negative, while the GC skew values can be either negative or positive.
Among the 13 PCGs in the C. infuscatellus mitogenome, seven (cox2, atp6, cox3, nad4, nad4l, cob, and nad1) start with ATG, while four (nad2, atp8, nad3, and nad5) start with ATT. The remaining two PCGs, cox1 and nad6, start with CGA and ATA, respectively. The ATN and CGA start codons for PCGs are commonly found in insect mitogenomes (Sun et al., 2016; Zheng et al., 2018). Regarding the termination codons, eight PCGs (nad2, atp6, cox3, nad3, nad4, cob, nad1, and nad6) end with TAA, and five PCGs (cox1, cox2, atp8, nad5, and nad4l) have an incomplete termination codon consisting of a single T. Incomplete termination codons are widely observed in insect mitogenomes and are believed to be completed to TAA by post-transcriptional polyadenylation (Ojala et al., 1981; Jiang et al., 2009; Donath et al., 2019).
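The codon tallies above can be checked against Table 1 with a short script. This is a minimal sketch: the dictionary is transcribed from Table 1, and `complete_stop` is a hypothetical helper that only illustrates how appended A residues could turn a single T into TAA. It works in the DNA alphabet for consistency with the table and is not a model of the actual RNA processing machinery.

```python
from collections import Counter

# (start codon, stop codon) per PCG, transcribed from Table 1.
# "T" denotes an incomplete stop codon.
PCG_CODONS = {
    "nad2": ("ATT", "TAA"), "cox1": ("CGA", "T"), "cox2": ("ATG", "T"),
    "atp8": ("ATT", "T"), "atp6": ("ATG", "TAA"), "cox3": ("ATG", "TAA"),
    "nad3": ("ATT", "TAA"), "nad5": ("ATT", "T"), "nad4": ("ATG", "TAA"),
    "nad4l": ("ATG", "T"), "nad6": ("ATA", "TAA"), "cob": ("ATG", "TAA"),
    "nad1": ("ATG", "TAA"),
}

start_counts = Counter(start for start, _ in PCG_CODONS.values())
stop_counts = Counter(stop for _, stop in PCG_CODONS.values())


def complete_stop(stop, poly_a="AAAA"):
    """Append a poly-A tail and return the first three nucleotides."""
    return (stop + poly_a)[:3]


print(dict(start_counts))
print(dict(stop_counts))
print({gene: complete_stop(stop) for gene, (_, stop) in PCG_CODONS.items()})
```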
The relative synonymous codon usage (RSCU) was calculated using MEGA5.0. Amino acids with two synonymous codons showed a strong preference for A or U at the third codon position. The most frequently used codon was UUA (L), followed by AUU (I), UUU (F), AUA (I), AAU (N), and UAU (Y). This biased usage of A and U contributes to the high A + T content, as also reported in other insect mitogenomes (Ding et al., 2023; Huang et al., 2022, 2023). Additionally, the analysis of amino acid composition showed that Leu was the most abundant amino acid, followed by Ile, Phe, and Ser (Fig. 2).
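For readers unfamiliar with RSCU, the sketch below shows the usual calculation: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is an illustration only and does not reproduce the MEGA5.0 implementation. The codon counts are hypothetical, and only two-codon families that are shared by the standard and invertebrate mitochondrial genetic codes are used.

```python
def rscu(codon_counts, families):
    """Return RSCU per codon: observed count / mean count within its synonymous family."""
    result = {}
    for codons in families.values():
        total = sum(codon_counts.get(codon, 0) for codon in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(codons)
        for codon in codons:
            result[codon] = codon_counts.get(codon, 0) / expected
    return result


# Two-codon families (valid in both the standard and invertebrate mitochondrial codes).
FAMILIES = {"Phe": ["UUU", "UUC"], "Asn": ["AAU", "AAC"], "Tyr": ["UAU", "UAC"]}

# Hypothetical counts chosen to mimic an A/U-biased genome; NOT data from this study.
COUNTS = {"UUU": 90, "UUC": 10, "AAU": 80, "AAC": 20, "UAU": 75, "UAC": 25}

values = rscu(COUNTS, FAMILIES)
for codon, value in values.items():
    print(codon, round(value, 2))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage; in this toy example all three U-ending codons exceed 1.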
rRNA and tRNA genes
The two rRNA genes (rrnL and rrnS) of the C. infuscatellus mitogenome were 1388 and 778 bp in length, respectively. They were located between trnL1 and trnV, and between trnV and the A + T-rich region, respectively (Table 1). The A + T content of the rRNA genes was 84.86%, indicating an obvious A + T bias, and the AT skew and GC skew were 0.02 and 0.36, respectively. The length, location, and base composition of the two rRNA genes were similar to those of other lepidopteran insects (Boore, 1999; Taanman, 1999; Yang et al., 2021; Kim et al., 2022; Yukuhiro et al., 2002; Qin et al., 2015; Yang et al., 2019).
The total length of the 22 tRNA genes in the C. infuscatellus mitogenome was 1471 bp. Among these tRNAs, trnK, trnC, and trnD were the longest (70 bp each), while trnT was the shortest (64 bp). The “DHU” arm was absent in the trnS1 gene, and trnS2 exhibited two mismatched base pairs in the anticodon stem (Fig. 3). Similarly, trnL1 (CUN) and trnA contained a U-U mismatch in the acceptor stem. All tRNA genes, except for trnS1, folded into the typical cloverleaf structure. This is consistent with the results reported for other lepidopteran insects (Chai et al., 2012; Bian et al., 2020).
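The tRNA length summary can be verified directly from the sizes listed in Table 1. The snippet below is a simple consistency check; the variable names are illustrative.

```python
# tRNA sizes in bp, transcribed from Table 1.
TRNA_LENGTHS = {
    "trnM": 67, "trnI": 67, "trnQ": 69, "trnW": 67, "trnC": 70, "trnY": 65,
    "trnL2": 67, "trnK": 70, "trnD": 70, "trnG": 66, "trnA": 65, "trnR": 66,
    "trnN": 65, "trnS1": 66, "trnE": 67, "trnF": 68, "trnH": 65, "trnT": 64,
    "trnP": 65, "trnS2": 69, "trnL1": 67, "trnV": 66,
}

total_length = sum(TRNA_LENGTHS.values())
max_len = max(TRNA_LENGTHS.values())
min_len = min(TRNA_LENGTHS.values())
longest = sorted(gene for gene, size in TRNA_LENGTHS.items() if size == max_len)
shortest = sorted(gene for gene, size in TRNA_LENGTHS.items() if size == min_len)

print(len(TRNA_LENGTHS), "tRNAs, total", total_length, "bp")
print("longest:", longest, max_len, "bp; shortest:", shortest, min_len, "bp")
```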
A + T-rich region
The A + T-rich region of the C. infuscatellus mitogenome is 341 bp in length and is located between rrnS and trnM (Fig. 4 and Table 1). Its A + T content is 96.19%, while its G + C content is only 3.81%. The AT skew of this region was slightly negative (-0.05), indicating a higher occurrence of T than A nucleotides. The length and A + T content of this region are comparable to those of other species within the same genus, such as Chilo suppressalis (Chai et al., 2012), Chilo auricilius (Cao and Du, 2014), and C. sacchariphagus. However, the A + T-rich region is 1367 bp in the T. renzhiensis mitogenome (Cao et al., 2012) and 288 bp in the Cydalima perspectalis mitogenome (Gao et al., 2023). This suggests that the length of the A + T-rich region is not conserved among lepidopteran mitogenomes.
The A + T-rich region of the C. infuscatellus mitogenome contains an “ATAGA” motif and a 19 bp poly-T stretch, as in most lepidopteran mitogenomes (Chen et al., 2022; Shah et al., 2022). Ten copies of the “ATTTA” motif and microsatellite-like A/T repeat sequences were also found in the A + T-rich region; these are considered common features in insects (Kim et al., 2012; Bian et al., 2020; Riyaz et al., 2021). In addition, a 6 bp poly-T stretch was observed at the end of the A + T-rich region.
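Structural elements of this kind can be located with simple pattern matching. The sketch below uses regular expressions on a short, hypothetical toy sequence; it is not the C. infuscatellus A + T-rich region, and the minimum repeat number of three for the microsatellite pattern is an assumption made for illustration.

```python
import re

# Hypothetical toy sequence built from parts; NOT the real A+T-rich region.
toy = "ATAGA" + "T" * 19 + "AATTTAA" + "AT" * 5 + "AAATTTAC" + "T" * 6


def scan_control_region(seq):
    """Report the motifs discussed in the text for a given sequence."""
    atag = re.search(r"ATAGA(T+)", seq)               # ATAGA followed by a poly-T stretch
    attta = re.findall(r"(?=ATTTA)", seq)             # lookahead counts overlapping copies
    repeats = [m.group(0) for m in re.finditer(r"(?:AT){3,}", seq)]
    tail = re.search(r"T+$", seq)                     # poly-T run at the 3' end
    return {
        "poly_T_after_ATAGA": len(atag.group(1)) if atag else 0,
        "ATTTA_copies": len(attta),
        "AT_repeat_units": [len(r) // 2 for r in repeats],
        "terminal_poly_T": len(tail.group(0)) if tail else 0,
    }


report = scan_control_region(toy)
print(report)
```

Applied to a real sequence, the same function would need the sequence in the J-strand orientation used in Table 1, since the motifs are strand-specific.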
Gene rearrangement
Gene rearrangement is an important clue for exploring the evolution of insects. In contrast to the high frequency of rearrangement events found in Hymenoptera, lepidopteran mitogenomes have a relatively conserved gene arrangement (Zhu et al., 2018; Yi et al., 2022). CREx analysis demonstrated that the mitogenome of C. infuscatellus exhibits a trnM rearrangement relative to the ancestral mitogenome (Fig. 5). The order of the tRNA genes between the A + T-rich region and nad2 has changed from trnI-trnQ-trnM to trnM-trnI-trnQ. This rearrangement is found in most lepidopteran species, except for Hepialoidea species, which maintain the ancestral gene order (Cao et al., 2012). This implies that the rearrangement of trnM occurred during the divergence of Hepialoidea from other lepidopteran lineages. The rearrangement is commonly explained by the tandem duplication-random loss (TDRL) model, in which the gene block containing trnM was duplicated in tandem and the redundant copies were subsequently lost at random (Cameron, 2014).
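The TDRL explanation can be illustrated with a toy model. The sketch below duplicates the ancestral trnI-trnQ-trnM block in tandem and then keeps one copy of each gene. The retained positions are chosen by hand to show one loss pattern that yields the derived order; they do not represent an inferred evolutionary history, and the function names are illustrative.

```python
def tandem_duplicate(order):
    """Duplicate a gene block in tandem: [a, b, c] -> [a, b, c, a, b, c]."""
    return order + order


def apply_losses(duplicated, keep_indices):
    """Keep only the copies at the given positions, preserving their order."""
    kept = [duplicated[i] for i in sorted(keep_indices)]
    if sorted(kept) != sorted(set(duplicated)):
        raise ValueError("each gene must be retained exactly once")
    return kept


ancestral = ["trnI", "trnQ", "trnM"]
duplicated = tandem_duplicate(ancestral)   # trnI trnQ trnM trnI trnQ trnM
# Lose the first trnI and trnQ and the second trnM (positions 0, 1, and 5).
derived = apply_losses(duplicated, keep_indices=[2, 3, 4])

print("-".join(ancestral), "->", "-".join(derived))
```

In the model itself the losses are random; only those loss patterns that retain exactly one functional copy of each gene are expected to persist.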
Phylogenetic analysis
The phylogenetic trees were constructed based on the 13 PCGs of 54 lepidopteran mitogenomes using maximum likelihood (ML) and Bayesian inference (BI) methods (Fig. 6). The phylogeny of the superfamily Pyraloidea has been extensively discussed in numerous studies, and the phylogenetic relationships within this superfamily are relatively stable. Although some clades exhibited low support values, the topologies of the two trees were identical.
In the present study, both phylogenetic trees were divided into two branches. One branch comprised four subfamilies of Pyralidae, while the other comprised seven subfamilies of Crambidae, which strongly supports the monophyly of Pyralidae and Crambidae (Qi et al., 2021; Wu et al., 2023). Pyralidae contains five subfamilies, of which only Chrysauginae has no available mitogenome. The phylogenetic relationships among the four sampled subfamilies were (((Epipaschiinae + Pyralinae) + Phycitinae) + Galleriinae) (Fig. 6). These results are consistent with numerous recent reports (Zhu et al., 2018; Liu et al., 2021) and do not support the reversion of Phycitinae and Pyralinae based on a unique orientation of the uncus arms, as reported in an earlier study (Solis and Mitter, 1992). Within the family Crambidae, two lineages have been identified, namely the “PS clade” (Pyraustinae and Spilomelinae) and the “non-PS clade” (the other subfamilies) (Regier et al., 2012). Our results confirmed that Spilomelinae and Pyraustinae were grouped together and separated from the other subfamilies. It has been reported that the “non-PS clade” can be further divided into two branches, namely the “OG clade” (including Odontiinae and Glaphyriinae) and the “CAMMSS clade” (including Nymphulinae, Crambinae, Scopariinae, and Schoenobiinae) (Regier et al., 2012). Our BI and ML analyses supported the absence of two clades. However, the phylogenetic relationships of the subfamilies in the “CAMMSS clade” are highly controversial in the literature (Yang et al., 2018; Roh et al., 2020). Some researchers have suggested that the relationship can be represented as (Scopariinae + Crambinae) + (Nymphulinae + Schoenobiinae) (Regier et al., 2012; Jeong et al., 2021; Cheng et al., 2022).
However, other researchers have reported alternative relationships, including (Scopariinae + (Crambinae + (Schoenobiinae + Nymphulinae))) or (Schoenobiinae + (Scopariinae + (Nymphulinae + Crambinae))) (Gao et al., 2023; Wu et al., 2023). In the present study, the relationship among these four subfamilies appears to be ((Nymphulinae + Scopariinae) + (Crambinae + Schoenobiinae)), but with low bootstrap support. This topology is also supported by another recent study (Liu et al., 2021). The controversial phylogenetic relationships of these subfamilies may be attributed to the limited availability of mitochondrial data for the subfamily Schoenobiinae. More data are required to validate these findings.
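The disagreement among the competing hypotheses can be made explicit by listing the clades each one implies. The sketch below encodes the four topologies quoted above as nested tuples and compares their clade sets. It treats each topology as rooted exactly as written, which is a simplifying assumption, and the helper names are illustrative.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of tip names."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


# Topologies as quoted in the text.
this_study = (("Nymphulinae", "Scopariinae"), ("Crambinae", "Schoenobiinae"))
alternatives = {
    "Regier et al., 2012": (("Scopariinae", "Crambinae"), ("Nymphulinae", "Schoenobiinae")),
    "alternative 1": ("Scopariinae", ("Crambinae", ("Schoenobiinae", "Nymphulinae"))),
    "alternative 2": ("Schoenobiinae", ("Scopariinae", ("Nymphulinae", "Crambinae"))),
}

root = frozenset().union(*clades(this_study))  # the clade containing all four subfamilies
shared = {
    name: (clades(this_study) & clades(tree)) - {root}
    for name, tree in alternatives.items()
}

for name, common in shared.items():
    print(name, "shares", len(common), "internal clade(s) with this study")
```

Under this rooted comparison, none of the three published alternatives shares an internal clade with the topology recovered here, which is consistent with the low support and the unsettled state of these relationships.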