Complete Mitochondrial Genomes of Three Nuthatches From the Genus Sitta (Aves: Passeriformes: Sittidae) and Mitogenome-Based Phylogenetic Analysis

Nuthatches (genus Sitta) comprise a group of Passeriformes. With the publication of more mitochondrial genome data, there has been considerable focus on the taxonomic status of the nuthatches. To understand the phylogenetic position of Sitta and the phylogenetic relationships within this genus, we sequenced and analyzed the complete mitochondrial genomes of three species, S. himalayensis, S. nagaensis and S. yunnanensis, making this the first account of complete mitochondrial genomes (mitogenomes) for this genus. The mitochondrial genomes of the three Sitta species are 16,822-16,830 bp in length and consist of 37 genes and a control region. This study recovered the same gene arrangement found in the mitogenome of Gallus gallus, which is considered the typical ancestral avian gene order. All tRNAs were predicted to form the typical cloverleaf secondary structure. Bayesian inference and maximum likelihood phylogenetic analyses of sequences of 18 species obtained a well-supported topology. The family Sittidae is the sister group of Troglodytidae, and the genus Sitta can be divided into 3 major clades. We recovered the following phylogenetic relationships within the genus Sitta: (S. carolinensis + (S. villosa + S. yunnanensis + (S. himalayensis + (S. europaea + S. nagaensis)))).


Genome organization
The complete mitogenomes of the three newly sequenced Sitta species are very similar to each other. All three mitogenomes are closed circular molecules ranging from 16,822 to 16,830 bp in length, each consisting of 13 PCGs, 22 tRNAs, 2 rRNAs and a control region (Fig. 1). The gene order is highly conserved across the three species (Fig. 1) and is identical to the gene order found in the mitogenome of Gallus gallus [16]. In each mitogenome, one PCG (nad6) and eight tRNAs (trnQ, trnA, trnN, trnC, trnY, trnS2(UCN), trnP, trnE) are encoded by the L-strand, while all other genes are encoded by the H-strand. Among the three Sitta species, the longest gene overlap (10 bp) lies between atp8 and atp6. The longest intergenic spacer (21 bp) is located between trnV and rrnL in the mitogenome of S. himalayensis (Table 1). Similar to other typical vertebrates [17], the mitogenomes of the three Sitta species show a significant bias towards A and T, with an A + T content of 53.1-55.7%. The AT-skew and the GC-skew of the whole mitogenomes of the three Sitta species are 0.10 to 0.13 and -0.39 to -0.35, respectively (Table 2).

Protein-coding genes and codon usage
In the three Sitta species, the A + T content of the PCGs ranges from 52.0% to 55.2% (Table 2). The start and stop codons of the 13 PCGs are mostly the same among the three species. cox1 uses GTG as the start codon, while the remaining twelve PCGs initiate strictly with the standard start codon ATG (Table 1) [20]. The A + T content in the PCGs of S. yunnanensis is slightly higher than that of the other two species, and NNA and NNU codons are also used more frequently in S. yunnanensis.
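The composition statistics reported above (A + T content, AT-skew and GC-skew) can be reproduced from a sequence with a few lines of code. The sketch below assumes the conventional definitions AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C); the text does not state the formula that was used, so this is an assumption. The function name `base_composition` and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return A+T fraction, AT-skew and GC-skew for a DNA sequence.

    Assumed definitions (not stated in the text):
    AT-skew = (A - T) / (A + T), GC-skew = (G - C) / (G + C).
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous bases such as N are ignored
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence must contain both A/T and G/C bases")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence for illustration only (4 A, 2 T, 1 G, 3 C).
stats = base_composition("AAAATTGCCC")
print(stats)
```

A positive AT-skew together with a negative GC-skew, as reported for the whole mitogenomes, means that A outnumbers T and C outnumbers G on the strand analyzed.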

The tRNA genes and rRNA genes
The 22 tRNAs of the three Sitta species are typical and cover all 20 amino acids, ranging from 66 to 75 bp in size. The total length of the 22 tRNAs of S. nagaensis is 1542 bp, which is the same as that of S. yunnanensis and differs by only one base from that of S. himalayensis. The A + T content of the tRNAs of S. nagaensis is 58.0%, which is lower than that of S. himalayensis (58.3%) and S. yunnanensis (58.3%) (Table 2). The tRNAs of the three Sitta species were all predicted to fold into typical cloverleaf secondary structures. Furthermore, mismatched base pairs were identified in the stems of the tRNAs, most of which were G-U pairs.
In the mitogenomes of the three Sitta species, the 16S rRNA is located between trnV and trnL2(UUR) and ranges from 1575 to 1592 bp in length, while the 12S rRNA is located between trnF and trnV and ranges from 977 to 980 bp. The longest 16S rRNA was found in S. nagaensis and the shortest in S. himalayensis, while the longest 12S rRNA was found in S. yunnanensis and the shortest in S. himalayensis. The A + T contents of 16S rRNA and 12S rRNA range from 55.6-56.4% and from 51.2-52.2%, respectively (Table 2).
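To make the tRNA stem mismatches noted above concrete, the following minimal sketch classifies a single stem position as a Watson-Crick pair, a G-U wobble pair, or some other mismatch. It is illustrative only: the function `classify_pair` is hypothetical, and the study's own structure prediction workflow is not described at this level of detail.

```python
# Canonical RNA base pairs; DNA-style T is read as U.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(base_5p: str, base_3p: str) -> str:
    """Classify the two bases facing each other at one stem position."""
    pair = (
        base_5p.upper().replace("T", "U"),
        base_3p.upper().replace("T", "U"),
    )
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "G-U wobble"
    return "other mismatch"


# Hypothetical stem positions, given as (5' base, 3' base).
for five, three in [("G", "C"), ("G", "T"), ("A", "C")]:
    print(five, three, classify_pair(five, three))
```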

The control region
The control region of the three species is located between the trnE and trnF genes (Fig. 1). The control region of S. yunnanensis is 975 bp long, which is longer than that of S. himalayensis (945 bp) and S. nagaensis (945 bp). The A + T content of the control region ranges from 53.3-55.5% (Table 2). The AT-skew is -0.15 to -0.12 and the GC-skew is -0.22, and the A + C content is higher than the T + G content. In this study, we analyzed the control region of the three Sitta species, and the predicted structures are shown in Figure 3. The entire control region contains three structural domains, namely Domain I, Domain II and Domain III. Domain II is relatively conserved, while Domain I and Domain III are heterogeneous across species in terms of nucleotide composition and size [21].
Domain I includes extended termination-associated sequences such as ETAS1 and ETAS2, as well as CSB1-like sequences. Domain II is the central conserved domain of the control region and includes six conserved sequence blocks (F-box, E-box, D-box, C-box, b-box and B-box). Domain III includes the CSB1 sequence and the light/heavy strand promoter (LSP/HSP), which are located at 911-929 bp in the control region (Figure 3).

Phylogenetic Analyses
Based on the concatenated nucleotide sequences of 13 PCGs, the phylogenetic analyses of 18 Passeriformes mitogenome sequences were performed, with one of the Regulus species as the outgroup (Table 3). BI and ML analyses generated similar tree topologies, so the topology of the BI tree is shown (Fig. 4).
The results of this study indicate that Sittidae is closely related to Troglodytidae (1.00 posterior probability and 95% bootstrap value). Muscicapidae and Turdidae were corroborated as sister groups (1.00 posterior probability and 100% bootstrap value). These results are consistent with the work of Barker on the sister-group relationship of Sittidae and Troglodytidae [14]. Within the genus Sitta, the clade S. nagaensis + S. europaea was recovered as sister to S. himalayensis (1.00 posterior probability and 100% bootstrap value). S. villosa is sister to S. yunnanensis (1.00 posterior probability and 100% bootstrap value). All datasets supported a monophyletic clade of S. carolinensis, which was placed at the basal position of the genus Sitta (1.00 posterior probability and 100% bootstrap value). These results are generally consistent with the previous study by Pasquet et al. [30]. Currently, published mitochondrial genome data for Sitta species are very limited, so the mitochondrial genomes of more Sitta species should be sequenced to better elucidate these phylogenetic relationships.

Genome Sequencing, Assembly and Annotation

Samples and DNA Extraction
The mitogenomes were amplified and sequenced as described in previous studies [31,32,33]. All products of this study were sequenced by Shanghai Personal Biotechnology Co., Ltd (Shanghai, China). The complete mitogenomes of S. himalayensis, S. nagaensis and S. yunnanensis have been deposited in GenBank (Accession Numbers: MK343426, MK343427 and MN052793). Sequences were checked and assembled with the SeqMan program of the DNASTAR software [31]. The two rRNAs and all PCGs were identified by BLAST searches in NCBI (Available online: http://www.ncbi.nlm.nih.gov) and then confirmed by alignment with homologous genes from other published Sitta mitogenomes. The mitogenomic map was drawn with OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) [34]. The Tandem Repeats Finder program (Available online: http://tandem.bu.edu/trf/trf.advanced.submit.html) was used to analyze the tandem repeats of the putative control region [39]. Moreover, the genome organization, base composition, intergenic spacers, overlapping regions, codon usage, PCGs, tRNAs, rRNAs and control regions of the mitochondrial genomes of the three Sitta species were compared.

Phylogenetic Analyses
The sequences of 15 published mitochondrial genomes were obtained from NCBI. These sequences, along with three new mitogenome sequences obtained in this study, were used to reconstruct the phylogenetic relationships within the genus Sitta, with Regulus regulus (Accession No. NC_029837) serving as an outgroup [14]. The sequence information is listed in Table 3.
The sequences of the 13 PCGs were aligned using Clustal X in MEGA v.7.0 with the default parameters [40]. The length of the final alignment was 11,380 nucleotides. Substitution saturation of the nucleotide sequences was analyzed with DAMBE 5.2.63 [41]. If Iss was significantly lower than Iss.c (p < 0.05), the whole PCG nucleotide sequences were passed to the next step. The Bayesian information criterion (BIC) in jModelTest v.0.1.1 [42] was used to determine the optimal nucleotide substitution model, which was GTR+G+I. Bayesian inference (BI) and maximum likelihood (ML) analyses were performed using MrBayes v.3.2.1 [43,44] and RAxMLGUI v.1.5b3 [45], respectively. BI analyses were initiated from a random tree, with four Markov chains running simultaneously for 200,000 generations, sampling every 100 generations and discarding the first 25% as burn-in. The average standard deviation of split frequencies was required to fall below 0.01 to ensure that stationarity was reached [46]. The confidence values of the BI tree are shown as Bayesian posterior probabilities. In the ML analyses, a total of 1000 replicates were performed with the GTR+GAMMA substitution model. Finally, FigTree v.1.2.2 was used to visualize the phylogenetic trees [47].
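As a quick check on the Bayesian settings described above, the short sketch below works out how many sampled trees remain after burn-in for a chain of 200,000 generations sampled every 100 generations with the first 25% discarded. The helper `retained_samples` is hypothetical, and the count ignores the starting state of the chain, which some programs also record; treat the result as an approximation under that assumption.

```python
def retained_samples(generations: int, sample_every: int, burnin_fraction: float):
    """Return (total, discarded, kept) sample counts for one MCMC run.

    Assumption: only sampled generations are counted; the starting
    state of the chain is not included.
    """
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Settings taken from the text: 200,000 generations, sampling every 100,
# first 25% discarded as burn-in.
total, discarded, kept = retained_samples(200_000, 100, 0.25)
print(total, discarded, kept)  # 2000 500 1500
```

Under these settings roughly 1,500 samples per run contribute to the posterior probabilities shown on the BI tree.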

Declarations
Ethics approval and consent to participate
All samples in this study were collected by non-invasive sampling. All animal experiments were approved by the Academic Committee of Southwest Forestry University, whose remit includes regulations on animal ethics, animal welfare and wildlife conservation. All methods were performed in accordance with the Guidelines for the ethical review of laboratory animal welfare, People's Republic of China National Standard (GB/T 35892-2018), and the study complies with the ARRIVE guidelines (https://arriveguidelines.org).

Consent for publication
Not applicable.

Availability of data and materials
The newly described mitogenome sequences have been deposited in the NCBI database under the Accession numbers MK343426, MK343427 and MN052793. The datasets generated and/or analysed during the current study are available in the NCBI (https://www.ncbi.nlm.nih.gov/).

Competing interests
The authors declare no conflict of interest.

Figure 3. Predicted structural elements in the control region of three Sitta species. Extended termination-associated sequences are indicated by orange boxes, and conserved sequence blocks are indicated by blue boxes.