Structural features and phylogenetic implications of two new mitogenomes of Erythroneurine leafhoppers (Hemiptera: Cicadellidae: Typhlocybinae)

Background: Complete mitochondrial genome sequences facilitate species identification and analyses of phylogenetic relationships. However, the available data are limited for the diverse and widespread insect familiy Cicadellidae. In this study, we sequenced complete mitochondrial genomes of two species of Erythroneurini. Results: The mitogenomes of Empoascanara wengangensis and E. gracilis were 14,830 and 14,627 bp in length, respectively, and similar to other published leafhopper mitogenomes in terms of gene size, base composition, gene order, PCGs codon usage, and tRNA secondary structure. Most protein coding genes start with ATN and end with TAA or TAG. Two rRNA genes are highly conserved and encoded on the minority strand, and the AT content in 16S is higher than that of 12S . The control regions in the genus Empoascanara are highly variable and contain various numbers of repeat sequences. Conclusions: The mitogenomes of these two species closely resemble those of most other sequenced leafhoppers in various structural and compositional aspects. Phylogenetic analysis of 13 PCGs yielded a well-supported topology with most branches receiving maximum support and most relationships agreeing with those of other recent phylogenetic studies. Nucleotide diversity analysis show that nad4 and nad5 can be evaluated as potential DNA markers that define the Cicadellidae insect species. Like other studies, the main evolutionary event of leafhoppers occurred in the Tertiary, and its divergence time is estimated to be 10.03~122.48 Ma. This study confirms results of previous studies indicating that mitochondrial genome sequences are informative of leafhopper phylogeny.


Background
The insect mitochondrial genome (mtDNA) is the only extranuclear genetic information carrier in insects. It is usually a closed double-stranded DNA molecule with a measured molecular weight of about 15-20 kb. Usually, it contains 37 genes, including 13 protein coding genes (PCGs), NADH dehydrogenase 1-6 and 4L (nad1-6 and nad4L), cytochrome c oxidase subunits 1-3 (cox1-3), ATPase subunit 6 and 8 (atp6 and atp8), cytochrome b (cytb), two ribosomal RNAs genes (16s and 12s) and 22 transfer RNA (tRNA) genes. A region rich in A + T, the control region, is also present [1,2]. Due to the characteristics of simple structure, small molecular weight, stable composition, conservative arrangement, lack of recombination, maternal inheritance, relatively high evolutionary rate and easy detection, the mitochondrial genome has been widely used for species identification and population genetic research as well as in biogeography and phylogeny [3][4][5].
The hemipteran insect family Cicadellidae (leafhoppers) includes >2,600 genera and >22,000 species worldwide, including >2,000 species in China [6,7]. Erythroneurini, the largest tribe of the cicadellid subfamily Typhlocybinae, is widely distributed in the 6 major zoogeographic regions of the world and includes~2,000 species worldwide and >300 species in China [8,9]. All leafhoppers are phytophagous, different species feeding on a wide variety of plants, and the group includes important agricultural pests and vectors of plant pathogens [10][11][12].
Many previous attempts have been made to estimate phylogenetic relationships among leafhoppers mostly using either morphological data or sequence data from a few gene regions [13][14][15][16]. Recently partial genome sequences have improved resolution of relationships among major leafhopper lineages [17], but, despite these efforts, many relationships remain poorly resolved. Several recent attempts have also been made to use complete mitochondrial genomes to estimate phylogenetic relationshps among Cicadellidae. Until now, the National Center for Biotechnology Information (NCBI) included complete mitochondrial genome data for 51 leafhopper species, seven belonging to Erythroneurini, including four species of Empoascanara (Fig. 1). These datasets still represent only a tiny fraction of leafhopper diversity and mostly represent distantly related species. Therefore, to further enrich the available mitochondrial genome data for leafhoppers and provide comparative data for closely related species, we sequenced and analyzed the complete mitochondrial genomes of Empoascanara wengangensis (GenBank no. MT445764) and E. gracilis (GenBank no. MT576649), and analyzed their phylogenetic relationships to other Cicadellidae ( Table 1). The new molecular data obtained in this study will facilitate the identification of leafhopper species as well as future population genetic and evolutionary studies.

Genome Organization and Composition
Genome organization and nucleotide composition of the two new mitogenomes sequenced in this study are similar to those of other Erythroneurini reported previously [18][19][20]. The complete mitogenomes of E. wengangensis and E. gracilis are double-stranded plasmids with 14,830 and 14,627 bp respectively. Both of them contain the usual 13 PCGs, 22 tRNA genes, 2 rRNA genes, and a control region (Fig. 2). Fourteen genes encode in the minority strand (L-strand) while the others encode in the majority strand (H-strand). E. wengangensis has a total of 45bp intergenic space in 12 regions ranging from 1 to 8bp. Eleven genes were found to overlap by a total of 47bp. E. gracilis has a total of 84bp intergenic space in 14 regions ranging from 2 to 15bp, and 11 genes were found to overlap by a total of 32bp ( Table 2).  The AT contents and skew statistics are shown in Table 3. The mitochondrial genomes of E. wengangensis and E. gracilis exhibit heavy AT nucleotide bias, with A + T% for the whole sequence 76.6% and 77.0% respectively. Similar patterns of nucleotide composition are also found in other leafhopper species [24,33]. The control region (CR) has the strongest A + T% bias, while the PCGs shows the lowest A + T% among whole genes. The whole genome has positive AT-skews (0.015, 0.140) and negative GC-skews (-0.154, -0.157). Analysis of 37 individual genes of the two species show that AT-skews are mostly positive, while for GC-skews, the genes of E. wengangensis are mostly negative, but the genes of E. gracilis are mostly positive ( Fig. 3 and and supplementary table S1, Supplementary Material online). Positive AT-skews indicate that the content of base A is higher than that of base T. However, in a few genes, although the AT-skews are negative, the difference in absolute value was very small. For negative GC-skew a negative value indicates that the content of base G is lower than that of base C, while a positive value is the opposite. Overall, the base composition of these two species is skewed toward A and C. PCGs have the standard ATN as the start codon, while nad5 and atp8 genes have TTG, a pattern also observed in other leafhopper mitogenomes [18,19]. Conventional stop codons (TAA or TAG) appear in 11 PCGs, except that cox2 and nad5 use an incomplete codon (a single T--) as the stop codon (Table 3, 4). The relative synonymous codon usage (RSCU) was calculated and summarized in Table 4 and    Table 2). The average AT content values of tRNAs are 77.5% and 78.7% respectively, and the tRNA genes have negligible AT and GC-skews (Table 3). Compared to the ancestral insect mitochondrial gene order, no tRNA gene rearrangements were found. Such rearrangements have been reported in some Deltocephalinae but are, so far, unknown in other leafhoppers including Typhlocybinae [28]. All of the tRNA genes can be folded into typical clover-leaf secondary structure except for the trnS1 in both mitochondrial genomes, which lacks the dihydrouridine (DHU) stem and forms a simple loop. In the metazoan mt genome, lack of the DHU arm is very common in trnS1 [38]. Based on the secondary structure, a total of 20 and 21 G-U weak base pairs are found in E. wengangensis and E. gracilis of tRNAs respectively ( Fig. 5 and supplementary Fig. S1, Supplementary Material online), forming weak bonds and located in AA stems (11bp), T stems (3 and 2bp) and DHU stems (6 and 8 bp). Most mismatched nucleotides are G-U pairs, which form weak bonds in tRNA and non-classical pairs in tRNA secondary structure, similar to other Cicadellidae [26,32]. Leafhopper ribosomal RNA (rRNA) includes 16S RNA and 12S RNA. These two genes are highly conserved and are encoded on the minor strand (L-strand). Similar to other known insects, the content of A + T% in 16S is higher than that of 12S. The 16s genes of E. wengangensis and E. gracilis are 1188bp and 1202bp in length, with AT content of 81.90% and 82.90% respectively, and located between trnL2 and trnV. The 12S rRNA genes of both are 727bp and 735 bp in length, with AT contents of 79.80% and 80.30% respectively, and located after trnV. The rRNA genes showed a positive AT-skew and GC-skew (Table 3). These features are similar to those observed in other insects [34,39,40].

Control region
Like the typical insect mitochondrial genome, the mt genomes of E. wengangensis and E. gracilis have a large non-coding region, which was identified as the control region and located downstream of 12S. Control regions of both species are rich in AT, their lengths are 525bp and 212bp, and the AT contents are 83.1% and 91.1% respectively ( Table 3).
The control regions in the four available Empoascanara mitogenomes are various and not highly conserved, and their lengths range between 212 and 990 bp with variable numbers of repeat sequences (Fig. 6). No tandem repeat units were found in E. gracilis; E. sipra includes one type of repeat unit (R); two kinds of repeats (R1, R2) are found in E. dwalata and E. wengangensis with various lengths and copy numbers. At present, we are unable to find any correlation in repeating unit of different species, probably because of the limited number of species in this study. Further comparative study of additional leafhopper mitogenomes is needed. Fig. 6 Organization of the control region structure in the mitochondrial genomes of three Empoascanara species. R: repeat unit.
Nucleotide diversity analysis are primary for identifying the regions with large nucleotide divergence, especially useful for designing species-specific markers [42,43]. These are useful for taxa with highly variable morphological characteristics, especially Cicadellidae species. For a long time, gene cox1 has been considered as a universal barcode for identifying animal species [44], but in these 45 Cicadellidae species, it is the slowest evolving and least changing gene among 13 PCGs. If it is proved that the resolution of cox1 is very low in 13 PCGs, then other genes with sufficient large size, rapid evolution and high Ka/Ks ratio can be used as potential molecular markers in population genetics [43,45]. In this case, nad4 and nad5 can be evaluated as potential DNA markers that define the Cicadellidae insect species.

Phylogenetic relationships
The phylogenetic relationships were estimated based on the concatenated nucleotide sequences of 13 PCGs from the newly sequenced E. wengangensis and E. gracilis, combined with the mitogenome sequences of 43 previously sequenced Cicadellidae species and one outgroup ( Table 1). Maximum-Likelihood (ML) method was used with IQ-TREE using an ultrafast bootstrap approximation approach with 10,000 replicates [46]. The Bayesian (BI) analysis was performed using MrBayes 3.2.7, with the best fit model GTR + I + G [37]. BI and ML analyses generated the same tree topology and most relationships were highly supported (Fig.  9). A few branches pertaining to relationships within Deltocephalinae and among some subfamilies received lower support. As in other recent studies, most currently recognized leafhopper subfamilies were recovered as monophyletic but relationships among some subfamilies are not well resolved [17,21,32,47,48]. For example, Dietrich et al. (2017) recovered Eurymelinae (sensu lato), including Macropsini and Idiocerini, as monophyletic but our analysis placed the former tribe as sister to Megophthalinae and the latter as sister to Cicadellinae. Neither of these relationships received maximum branch support. E. wengangensis and E. gracilis grouped with the other seven included species of Typhlocybinae with strong support but Empoascanara was not recovered as monophyletic. Along with other recent studies, our results indicate that sequence data from leafhopper mitogenomes is informative of phylogenetic relationships at various levels in the taxonomic hierarchy of this group. Data are available for only a tiny fraction of species so addition of more species, representing other major lineages will be needed to determine the extent to which mitogenome sequence data can resolve leafhopper phylogeny.

Divergence time estimation
The phylogenetic age of leafhopper insects was analyzed, and the divergence time was estimated. The results are shown in Figure 10. Based on the fossil calibration scheme, the root node constrains the tree based on the oldest known leafhopper fossils, and uses the fossil data of the remaining four subfamilies to calibrate together ( This result is consistent with the findings of other scholars, and it is estimated that the main evolutionary events of leafhoppers all occurred in Tertiary [17,49]. Research based on evidence from fossil records believes that for herbivorous insects, Tertiary is the heyday of species evolution. Since late Cretaceous, the dominant angiosperms have become more prosperous, and the plant divisions are closer to modern. During this period, abundant food types provided good preconditions for the adaptation of leafhopper insects to the evolution of radiation [50,51].  Table 2.

Conclusion
This paper describes the complete mitochondrial genomes of E. wengangensis and E. gracilis, analyzes the basic composition, location, secondary structure and other characteristics of PCGs, tRNA genes, rRNA genes and control regions, and compares them to other published Cicadellidae mitochondrial genomes. The mitogenomes of these two species closely resemble those of most other sequenced leafhoppers in various structural and compositional aspects. Phylogenetic analysis of nucleotide sequences of 13 PCGs yielded a well-supported topology with most branches receiving maximum support and most relationships agreeing with those of other recent phylogenetic studies. A few areas of conflict were limited to parts of the tree with lower branch support. Nucleotide diversity analysis show that nad4 and nad5 can be evaluated as potential DNA markers that defining the Cicadellidae insect species. Like other studies, the main evolutionary events of leafhoppers occurred in Tertiary, and its divergence time is estimated to be 10.03~122. 48 Ma. This study confirms results of previous studies indicating that mitochondrial genome sequences are informative of leafhopper phylogeny. The new data provided here will facilitate future comparative studies of leafhopper mitogenomes but also accentuate the need for more comparative data.

Sample collection and DNA extraction
Samples of E. wengangensis and E. gracilis were collected from Duyun and Anshun, Guizhou province, China in 2018 and 2019. The whole body was preserved in absolute ethanol and then stored at -20℃ in the laboratory. After morphological identification, voucher specimens with male genitalia prepared were deposited in the insect specimen room of Guizhou Normal University. Total DNA was extracted from the entire body without the abdomen. Genome Sequencing, Assembly, and Annotation The mitochondrial gene sequence was obtained by sequencing. Primers were designed to amplify the mtDNA sequence in PCR reactions. The PCR reaction was performed using the LA Taq polymerase. The PCR conditions were as follows: initial denaturation 94°C for 2 min, then 35 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and extension at 72°C for 1 min/kb, followed by the final extension at 72°C for 10 min. The PCR products were sequenced directly, or if needed first cloned into a pMD18-T vector (Takara, JAP) and then sequenced, by the dideoxynucleotide procedure, using an ABI 3730 automatic sequencer (Sanger sequencing) using the same set of primers. After quality-proofing of the obtained fragments, the complete mt genome sequence was assembled manually using DNAStar [52], and homology search was performed by the Blast function in NCBI to verify the amplified sequence as the target sequence [53,54]. The nucleotide base composition, codon usage and A + T content values were analyzed with MEGA 6.06 [55]. The secondary structure of tRNA genes were annotated using online tools tRNAscan-SE 1.21 [56] and ARWEN [57]. The tandem repeat sequence in the control area was determined by the online search tool Tandem Repeats Finder [58]. The base skew values for a given strand were calculated using the formulae [59]:  [60]. And the ratio between the non-synonymous (Ka) and the synonymous substitution rate (Ks) of 13 PCGs was also estimated in DnaSP 5.0 . Phylogenetic Analysis Phylogenetic analysis was conducted using a dataset including the complete mitochondrial genomes of the two newly sequenced erythroneurine species plus 43 Cicadellidae species from GenBank, and one outgroup of Cicadoidea (Table 1). The Gblocks Server online platform was used to eliminate poorly aligned positions and divergent regions of DNA protein alignment, and all alignments were checked and corrected in MEGA 6.06 [55] prior to phylogenetic analysis. The trimmed alignment (PCG13 matrix, a total of 9963 bp) was used to estimate the phylogeny by Bayesian inference (BI) using MrBayes 3.2.7 [61] and maximum likelihood (ML) using IQ-TREE [46]. BI selected GTR + I + G as the optimal model, running 10 million generations twice, sampling once every 1000 generations, after the average standard deviation of the segmentation frequency drops below 0.01, with the first 25% of the samples are discarded burn-in, and the remaining trees used to generate a consensus tree and calculate the posterior probability (PP) of each branch. ML constructed with the IQ-TREE used an ultrafast bootstrap approximation approach with 10,000 replicates and calculate bootstrap scores for each node (BP).

Divergence Time Estimation
In the BEAST 2.6.3 software [62], the Bayesian relaxed clock uncorrelated lognormal method was used to construct the timetable of 10 subfamily differentiation events in the phylogenetic process of the leafhopper subfamily. Nine fossil records were selected as reference points during the analysis (Table 5) [17]. Under the software of BEAST 2.6.3, run the subroutine BEAUti, input the data file (nex format), based on the Bayesian tree topology, set substitution model: GTR; base freequences: estimated; sit heterogeneity mode: gamma + invariant sites; number of Gamma categores: 4; Partition into codon postion: off (non-partitioned); clock mode: lognormal relaxed clock (uncorrelated); length of chain: 100000000, generate an xml file, and import it into BEAST 2.6.3 to run. The results were tested by Tracer 1.7.1 [63] and the ESS (effective sample size) value must be more than 200, which is considered reliable. Then using software Tree Annotator 2.6.3 to generate the maximum confidence tree and open it with the Figtree 1.4.3 [64] to view the analysis results.

Supplementary information
Supplementary information accompanies this paper at ......

Additional file 1 Electronic Supplementary Materials.
Supplementary materials (pdf format), contain Supplementary Figure S1, and Supplementary table S1. Supplementary Figure S1: additional information additional data with legends to support the manuscript.
Supplementary table S1: additional information additional data with legends to support the manuscript.