The first complete mitochondrial genome sequences for Ulidiidae and phylogenetic analysis of Diptera

Tetanops sintenisi is a pest that mainly damages the root of quinoa (Chenopodium quinoa) and it is first discovered in China in 2018. Here, the complete mitochondrial genome (mitogenome) of T. sintenisi was sequenced and compared with the mitogenomes of other Diptera species. The results revealed that the mitogenome of T. sintenisi is 15,763 bp in length (GenBank accession number: MT795181) and is comprised of 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, 2 ribosomal RNA genes, and a non-coding A + T-rich region (959 bp). The highly conserved gene arrangement of the mitogenome of T. sintenisi was identical to that of other Diptera insects. Twelve PCGs contained the typical insect start codon ATN, while cox1 had CGA as the start codon. The genes cox2, nad4, and nad1 contained an incomplete termination codon T; nad3, nad5, and cob contained the complete termination codon TAG; and the remaining seven PCGs contained the termination codon TAA. All tRNA genes were predicted to fold into the typical cloverleaf secondary structure. Phylogenetic analysis of 48 species based on the mitogenome sequence revealed that T. sintenisi clustered with the Tephritidae family, indicating that T. sintenisi and Tephritidae have a close phylogenetic relationship. The phylogenetic relationship of T. sintenisi based on the mitogenome is consistent with the traditional morphological taxonomy, according to which T. sintenisi belongs to the family Otitidae, which is closely related to the family Muscidae.

Tetanops sintenisi belongs to the family Ulidiidae of Diptera [15]. Its larvae mainly damage the root of Chenopodium plants, causing the above ground parts to turn yellow and wilt and leading toa decreased growth and eventually death of the plants [16]. In recent years, this pest has broken out on the quinoa (Chenopodium quinoa) fields in Shanxi Province in China, causing serious losses [16]. It was first captured from Calluna vulgaris in Bargerveen, central Holland in 1909 [15], and has been discovered in Ukraine, Finland, Latvia, Russia, and other regions in Eastern Europe [17]. Since 2000, it has been found in Poland, Germany, United Kingdom, and Belgium in Western Europe [18][19][20][21]. In China, it was first discovered in Jingle County, Shanxi Province in 2018 [16]. To our knowledge, the mitogenome of T. sintenisi or other species of the family Ulidiidae has not been previously determined.
Here, we characterized the mitogenome sequence of T. sintenisi and compared it with the mitogenome sequences of other insects, especially Diptera species. In addition, we used the mitogenome sequences to construct phylogenetic relationships among the 48 families of Diptera.

Specimens
On May 10, 2018, T. sintenisi larvae (approximately 50) were collected from a quinoa field (112.2°E, 38.4°N) in Jingle County, Shanxi Province, China. The samples were fixed in 99% anhydrous ethanol. The ethanol was replaced with fresh 99% anhydrous ethanol once after the samples were brought back to the laboratory, and the samples were stored in a refrigerator at -30°C.

Morphological characteristics and sequence analysis of CO1 gene
The taxonomic status of quinoa root maggots was determined following the morphological identification characteristics reported in the literature [15]. The egg is 1 mm long, white or milky white, fusiform, slightly curved. The larva is 8-10 mm long, white, maggot-shaped, and the head is cone-shaped ( Supplementary Fig. 1a); the surface of the mature larva is tough, with a pair of spikes at the end of the tail section ( Supplementary Fig. 1b). The pupa is 7-9 mm long, cocoon-shaped, yellow at the beginning, and brown later ( Supplementary Fig. 1c). Adult body length is 6-9 mm, wingspan is 11-13 mm, and surface appearance is similar to housefly, with black, shiny body, without obvious stripes or bristles. The wings are transparent, and there is a brown streak at 1/3rd of the body of the subfront vein of the forewing (the identifying feature of this species). The end of the abdomen of the male fly is black and round, and the end of the abdomen of the female fly is dark to dark orange and pointed, with a long ovipositor at the end ( Supplementary  Fig. 1d). The CO1 gene sequence of mitochondrial cytochrome C oxidase subunit was analyzed (Shanghai Suny Biotechnology CO., LTD., Project No. CMP201805014), and the results were identified as Tetanops Sintenisi. It belonged to Diptera, Ulidiidae and Tetanops.

Mitochondrial genome sequencing
Three samples were collected, each with ten larvae, and total DNA was extracted using a blood/cell/tissue genomic DNA extraction kit (Tiangen Biochemical Technology (Beijing) Co., Ltd.) according to the manufacturer's instructions. The concentration and purity of the total DNA were detected using a nucleic acid analyzer, and the integrity of the total DNA was evaluated through 1% agargel electrophoresis. After the quality test was completed, the DNA was sent to Shanghai Paisenol Biotechnology Co., Ltd. for sequencing. In this study, paired end (PE) sequencing was performed on these libraries using the Illumina MiSeq Sequencing platform and next generation sequencing (NGS).

Sequence annotation and feature analysis
A5-miseq V20150522 [22] and SPAdesv3.9.0 [23] were used to assemble high-quality second-generation sequencing data de novo and construct contig and scaffold sequences. The sequences were blasted (BLAST V2.2.31+) by comparing with the nt database in NCBI, and the mitochondrial sequences of each splicing result were selected.
Mitochondrial assembly results were integrated, and collinear analysis was performed using Mummer V3.1 [24] to determine the position relationship and fill the gap between contigs. Pilon V1.18 [25] was used to correct the results to obtain the final mitochondrial sequence. Then, we uploaded the complete mitogenome sequence to the MITOS (http:// mitos.bioinf.uni-leipzig.de/) and MFannot (http://megasun. bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl) web servers for functional annotation [26]. All parameters were set to the defaults, except Genetic Code, which was set as 05-Invertebrate. The tRNA genes were identified and their potential secondary structures were inferred using MITOS online software. CGView visualization software was used to generate the whole genome circle map [27].

Phylogenetic analysis
We used 48 complete mitogenome sequences from Diptera and one Hymenoptera species as outgroups to construct phylogenetic relationships among families of Diptera. The nucleotide sequences of their complete mitogenome sequences were obtained from NCBI, and they were aligned using ClustalW in MEGA-X [28]. Best partition schemes and substitution models, tree topology reconstruction, and node supports (1000 fast bootstrap replicates, alrt 0, and abayes), were performed under Maximum Likelihood (ML) in IQ-TREE v1.6.12 [29,30]. The Bayesian method was modeled as GTR + I + G, and MrBayes 3 software was used to run the system for 200,000 generations using a burn-in of 25% generations.

Mitochondrial genome structure
The mitogenome of T. sintenisi was 15763 bp long (Gen-Bank accession number ID: MT795181) and it consisted of 13 PCGs, 22 tRNA genes, 2 rRNA genes, and 1 noncoding control region. Minor coding strand encoded 23 genes, comprising 9 PCGs and 14 tRNA genes. The majority strand contained four PCGs, eight tRNA genes, and two rRNA genes ( Fig. 1 and Table 1). The annotated sequence has been deposited into GenBank under the accession number MT795181.
Gene spacing or overlap was found in 37 genes of the whole mitogenome except the control region. There were 16 intergenic spacers, 194 bp in total, among which the longest intergenic spacer was 32 bp and located between rrnL and trnV. There were 15 gene overlaps, a total of 75 bp, and the longest overlap (20 bp) was between trnF and nad5( Table 1). The number of gene spacer regions was larger than that of the overlap regions and most spacer regions were longer than the overlap regions ( Fig. 1, Table 1).
The nucleotide composition of the genome was A: T: C: G = 38.2%: 31.0%: 19.5%: 11.3%. The A + T content of Genes outside the map are transcribed in a clockwise direction, whereas those inside the map are transcribed counterclockwise. The second circle shows the GC content and the third shows the GC skew

Protein-coding genes
The total sequence length of 13 PCGs was 11145 bp, among which the longest gene was nad5 (1731 bp) and the shortest gene was atp8 (162 bp), located in the J-strand and N-strand, respectively. The start codon of cox2, atp8, atp6, cox3, nad4, nad4l, and cob was typically ATG. The start codon of nad2, nad3, nad5, and nad6 was ATT, and those of the remaining genes cox1 and nad1 were CGA and ATA, respectively ( Table 1). The stop codon of nad2, cox1, atp8, atp6, cox3, nad4l, and nad6 was TAA, while that of nad3, nad5, and cob was TAG. The cox2, nad4, and nad1 had an incomplete stop codon T (AA). Among the 13 PCGs, only three genes the whole genome was 69.2%, higher than that of G + C (30.8%), and the A + T content of all protein-codingand rRNA genes in the genome was higher than the G + C content (Supplementary Table 1). There was a high A + T content bias in the mitogenome sequence, and the highest A + T content was detected in the control area (80.6%), which was much higher than that in the whole sequence; this result was consistent with the characteristics of base nucleotide composition bias in the insect control area. trnY, trnS1, trnE, trnH (2), trnP, trnF, and trnA, five pairs located in the anticodon arm of trnE, trnH, trnT, trnF, and trnV, and two pairs located in the TψC arm of trnP and trnF (Fig. 2). The two rRNA genes rrnL and rrnS were separated by trnV. The length of the rrnL sequence was 1302 bp, the intergenic spacer between rrnL and trnV was 32 bp, and the A + T content was 77.8%. The length of the rrnS sequence was 789 bp, which overlapped with trnV by 1 bp, and the A + T content was 72.5% (Tables 1). The length of rrnL of T. sintenisi was the shortest among the five relative species and its A + T content was the lowest. The length of rrnS was the shortest in D.antiqua (784 bp), while the rrnS of T. sintenisi was only five bases longer than that of D. antiqua, and the A + T content of rrnS was still the lowest in T. sintenisi (Supplementary Table 2).

Control region
The control region was the main non-coding region in the mitogenome sequence of T. sintenisi. The control region of T. sintenisi was 959 bp, located between rrnS and trnI, with the A, T, C, and G base contents of 42.7%, 37.9%, 12.1%, and 7.3%, respectively, and the A + T content of 80.6%, thereby showing significant AT base bias (Tables 1). The control region included a total of nine (TA)n regions, of which five were (TA) 3 , two were (TA) 4 , one was (TA) 5 , and one was (TA) 6 . Moreover, there were six poly-T regions with a length of 5 bp or more, among which the longest was 17 bp. In addition, there were two repeat elements with a length of 13 bp at 14868 bp and 15037 bp, and an interval of 156 bp between the repeat elements. There were two repeat elements with a length of 17 bp at 15305 bp and 15357 bp, and an interval of 35 bp between the repeat elements (Fig. 3).
The control region length of these five related species ranged from 570 bp to 1266 bp, with that of T. sintenisi being intermediate. The A + T content in the control region ranged from 80.6-94.1%, and the A + T content in T. sintenisi was the lowest (Supplementary Table 2).

Phylogenetic relationships
In the traditional morphological classification, T. sintenisi is placed within Ulidiidae, but there are no reports on the complete mitogenome sequence of Ulidiidae species. Therefore, phylogenetic trees based on Maximum likelihood (ML) and Bayesian inference (BI) were constructed using complete mitogenome sequences of 48 species belonging to argonauts and other Diptera, with Apisdorsata (Hymenoptera) as the outgroup. The results showed that the topological structures of phylogenetic trees constructed with the two (atp8, nad6, and cob) showed slight A-bias, ranging from 0.008 to 0.024. The rest of the PCGs were T-skewed. There were four genes with G-bias, namely, nad5, nad4, nad4l, and nad1, which ranged from 0.301 to 0.361. The remaining PCGs showed C-bias (Supplementary Table 1). Compared with the mitogenomes of other related species in Diptera, the A + T content of the PCGs in T. sintenisi was the lowest at 66.3%; the A + T content in four other related species was more than 70%; and the highest A + T content was in allium fly (76.5%) (Supplementary Table 2).
MEGA-X software was used to analyze the relative synonymous codon usage (RSCU) and distribution of 13 PCGs in the mitogenome of T. sintenisi and its four related species (Supplementary Figs. 2 and 3). Among the five Diptera species, the most frequently employed amino acid was Ser2. The loss of synonymous codons was observed in the PCGs of A. funestus, D. antiqua, and C. capitata. The CUC encoding Leu1 was lost in all the three species, the CGC codon encoding Arg and the AGG encoding Ser1 was lost in both A. funestus and C. capitata. In addition, the CUG encoding Leu1 was lost in A. funestus and D. antiqua, and the CCG encoding Pro was lost in C. capitata. Based on these findings, it is evident that the lost codons always ended in C or G. Although no loss of synonymous codons was found in T. sintenisi and Bactrocera oleae, the content of the above codons was very low (Supplementary Fig. 2). Additionally, it is evident from Fig. 2 that the codons with high frequency mostly ended in A or U (the part shown in blue and green in the figure). Therefore, the use of synonymous codons confirmed the biased characteristics of A + T content in insect mitogenome. The distribution trend of the relative synonymous codons of 13 PCGs in T. sintenisi and its four related species was approximately the same ( Supplementary  Fig. 3). Although the mitochondrial PCGs of these five Diptera species had a preference in codon usage, the difference in preference was not significant, indicating that the mitogenome of Diptera species had a certain degree of conservation in the evolutionary process. This could indirectly reflect that more closely related species have more similar pattern of codon use.

Transfer RNA and ribosomal RNA genes
The 22 tRNA genes of the mitogenome had a sequence length of 62-72 bp. trnC was the shortest tRNA gene with 62 bp, while trnV was the longest with 72 bp. Among the 22 tRNA genes, 14 tRNA genes were located in the N-strand and 8 tRNA genes were located in the J-strand. Twentythree mismatched base pairs were identified in the tRNA genes, and these were all G-U pairs, comprising seven pairs located in the DHU arm of trnQ, trnY, trnG (2), trnH, trnP, and trnF, nine pairs located in the acceptor stem of trnC,

Discussion
The mitogenome sequence of T. sintenisi was consistent with that of D. antiqua, D.melanogaster, and B. oleae, which are other known Diptera species [4,31,32], and exhibited the typical structure observed in Diptera. In evolution, rearrangement in the mitogenome of Diptera is relatively rare [4], which indicates the evolutionary conservatism of the mitogenome in Diptera. In the mitogenome of T. sintenisi, all the 12 PCGs contained the typical start codon ATN, but that of cox1 was CGA ( Table 1). The start codon of cox1 is diverse in insects, it is a tetranucleotide (TTAG, ATAA, ATTA) [14,33,34], hexanucleotide (TATTAG, TTTTAG) [35], CGA [36], or TTG [37]. Although, it is commonly nearly identical among related insect groups, such as those of Lepidoptera [36] and Polyphaga [38], the present study found that start codon of cox1 in T. sintenisi was not consistent with that in the known Diptera species (Table 2). methods were basically the same, and the 48 Diptera species were grouped into three clusters (Aristocera, Nematocera, and Brachycera). Phylogenetic relationships were slightly different among the Aristocera. The phylogenetic relationship among Aristocera families constructed using ML method was: ((((T. sintenisi + Tephritidae) + Drosophilidae) + ((Calliphoridae + Anthomyiidae + Muscidae) + Agromyzidae)) + Syrphidae) (Fig. 4), and that constructed using BI method was: (((T. sintenisi + Tephritidae) + Drosophilidae) + (((Muscidae + (Calliphoridae + Anthomyiidae)) + Agromyzidae) + Syrphidae)) (Fig. 4). However, in both the phylogenetic trees, Ulidiidae and Tephritidae were clustered together, indicating that the two families have a close relationship.  [4]. The length of the control region in Liriomyza sativae is 741 bp, including poly-T and G(A) n T structures [40]. This could be attributed to the difference in the number of copies of tandem repeatsin the control region between related species [45].This feature of high A + T content and length difference in the control region is of great significance for the study of molecular evolution.
In the ML-and BI-based phylogenetic tree topologies, the 48 Diptera species were divided into three clusters, Aristocera, Nematocera, and Brachycera, and both topologies strongly supported the close relationship between T. sintenisi and Tephritidae (Fig. 4). This result is consistent with the traditional morphological taxonomy. However, to date, the complete mitogenome sequences of Ulidiidae have been limited; therefore, further mitogenome sequencing and in-depth analyses of Ulidiidae and other families is required in future studies to determine a more comprehensive evolutionary relationship among the families of Aristocera.
To the best of our knowledge, this is the first report of the complete mitochondrial genome in the family Ulidiidae. Comparative analysis showed that the gene size, gene order, base content, and base composition are comparatively conserved, similar to other dipteran mitochondrial genomes.
The cox2, nad4, and nad1 genes of T. sintenisi are terminated by a single T (Table 1). This partially coincides with other Diptera species ( Table 3), suggesting that the usage of incomplete codons as stop codons is a common phenomenon in the Diptera [11,14,41], and verifying the characteristics of A + T content bias in the stop codon. The incomplete codons may likely extend to TAA during the maturation of transcript, by the addition of a poly A tail during RNA processing, a phenomenon commonly observed in metazoan mitochondrial gene [14,44].
The location of the control region (A + T-rich region) in the mitogenome was relatively conserved but its length varied greatly from species to species. There were five common structures in the control region: a poly-thymidine stretch, a [TA(A)] n -like stretch, a stem and loop structure, a G(A) n T structure, and a G + A-rich sequence block [45]. In the present study, the length of the control region in T. sintenisi was 959 bp, and many (TA) n and poly-T structures were found in the control region. This was slightly different from that of some Diptera species, such as D. antiqua, whose control region is 1266 bp and comprised of three tandem replicates, namely (TA) n , and poly-T, which were speculated to be involved in the control of transcription or replication Fig. 3 Structure of the control region of the mitochondrial genome of Tetanops sintenisi. The (TA)nis marked with black box, the poly-T stretch is shown in red, and the repeat elements are shown in blue A + T content of the rRNAs was the lowest among the five related insect species. Two special structures were found in the control region, poly-T stretches and a (TA) n stretch, which are considered important elements related to replication and transcription. Both ML and BI analyses using nucleotide sequences of the whole mitogenome strongly suggest a closer relationship of T. sintenisi with Ulidiidae and Tephritidae. The whole mitogenome sequences have also been demonstrated as an effective method for resolving phylogenetic relationships [46]. The comparison of genomic features in related species should contribute to a comprehensive understanding of the evolutionary process of Diptera. The sequencing of the T. sintenisi mitogenome provides not only an important opportunity for ecological ATN is the initiation codon in all the 13 PCGs, except for cox1, which starts with CGA. All tRNAs have the typical cloverleaf structure, and mismatched base pairs, all of which were G-U pairs, were identified in some tRNA genes. The location of the two rRNAs were conservative and the  identification of Ulidiidae, but also valuable information for future evolutionary and molecular studies of Ulidiidae.

Conclusion
In our study, the mitochondrial genome (mitogenome) of Tetanops sintenisi is the only species in the family Ulidiidae, the whole genome has been sequenced and the genome structure and phylogenetic analysis have been conducted. The sequence of Tetanops sintenisi mitochondrial genes is consistent with the classical structure of Diptera mitochondria. The results of phylogenetic analysis showed that Tetanops sintenisi was more closely related to the family Tephritidae.