Assembly and characterization analysis of the complete mitochondrial genome of Lithocarpus litseifolius (Hance) Chun

Lithocarpus litseifolius (Hance) Chun (L. litseifolius 1837) is an evergreen tree of Fagaceae, commonly known as sweet tea. L. litseifolius is a natural sweetener with high levels of dihydrochalcone. In addition, L. litseifolius is a precious medicinal material, and its phlorizin has a unique role in the treatment of diabetes. This investigation aimed to assemble and characterize the complete mitochondrial (mt) genome of L. litseifolius. The circular mt genome of L. litseifolius spans 573,177 bp and has a GC content of 45.61%. The mt genome of L. litseifolius comprises 61 genes, of which 21 are tRNA genes, 3 are rRNA genes, 36 are protein-coding genes (PCGs), and 1 is a pseudogene. Tetramer repeats made up 32.57% of all identified simple sequence repeats (SSRs), making them the most abundant type of SSR. In the L. litseifolius mt genome, 35 PCGs with a combined length of 32,208 bp were predicted to include a total of 461 RNA editing sites. In addition, nine homologous genes between the chloroplast and mt genomes of L. litseifolius were identified. Furthermore, our findings demonstrated that while plant mt genome sizes vary considerably, the GC content of these genomes has remained largely constant. Seven conserved genes were identified: atp6, rps1, ccmC, rpl2, nad4, nad7, and trnY-GTA. The phylogenetic analysis showed that L. litseifolius clustered most closely with Quercus variabilis. This study establishes the groundwork for investigations on the systematic evolution, genetic variability, and breeding of L. litseifolius.


Introduction
Lithocarpus litseifolius (Hance) Chun (L. litseifolius 1837) (Supplementary Figure 1) is an evergreen plant of Fagaceae, commonly referred to as "sweet tea". It has been consumed as a daily beverage for thousands of years (Cheng et al. 2016). L. litseifolius is a good source of natural sweetener compounds because it contains a large amount of dihydrochalcone (Cheng et al. 2018). Additionally, the medicinal values of L. litseifolius have been identified in recent years, encompassing antioxidant (Shang et al. 2020), antidiabetic (Wang et al. 2016), anti-inflammatory (Gao et al. 2018), and hepatoprotective effects (Li et al. 2013). L. litseifolius is presently regarded as a special natural resource with noteworthy commercial, medicinal, and breeding value.
The primary role of mitochondria within living cells is to transform biomass energy into chemical energy (Cheng et al. 2021). In addition, mitochondria participate in a multitude of life activities, including intracellular growth, division, differentiation, and apoptosis (Lu et al. 2022; Qiao et al. 2022).
Mitochondria are believed to have originated from ancient endosymbiotic events and possess relatively autonomous genetic material (Qiao et al. 2022). In the majority of seed plants, mitochondrial DNA is inherited maternally, thereby eliminating the effects of paternal inheritance (Ye et al. 2022; Ma et al. 2022). Plant mt genomes exhibit numerous distinctive characteristics in comparison to the relatively conserved and compact mt genomes of animals, particularly a wide range of genome sizes and intricate genomic arrangements (Wu et al. 2022). The sizes of completely sequenced plant mt genomes vary by more than 100-fold, from 66 kb in Viscum scurruloideum (Skippington et al. 2015) to 11,000 kb in Silene conica (Sloan et al. 2012). In addition to the typical circular structure, certain plant species have linear or even multichromosomal architectures (Daniel et al. 2013). Several factors underlie the size variability of plant mt genomes. The principal drivers of significant size differences are the amplification of repetitive elements, the acquisition of exogenous DNA, and the alteration of large intragenic segments (Wu et al. 2022). Simple sequence repeats (SSRs) in plant mitogenomes are often used as molecular markers to identify species (Bi et al. 2020; Ma et al. 2017). Furthermore, indels and single nucleotide polymorphisms (SNPs) occurring in mitogenomes have been utilized as effective tools for rapid species differentiation and phylogenetic analyses (Seok et al. 2019; Mwamuye et al. 2020). Plant mt genomes are often large but contain relatively few genes, including 24 core genes and 17 variable genes. This can be explained by the loss of many genes, or their relocation to the nucleus, during the evolution of angiosperms. However, the coding sequences of the remaining genes have remained highly stable (Li et al. 2023). Consequently, the mt genome holds significant importance in the fields of species identification, classification, and evolution.
The intricate characteristics of plant mt genomes, specifically the abundance of repetitive sequences and frequent recombination events, pose challenges to achieving precise genome assembly (Wu et al. 2022). As a result, the assembly of plant mt genomes lags significantly behind that of animal mt genomes and plant plastid genomes (Lai et al. 2022; Hong et al. 2021). The read lengths of Illumina sequencing often fail to span larger repetitive regions, resulting in the incomplete assembly of these areas (Wu et al. 2022). Third-generation sequencing (TGS) technologies, like Oxford Nanopore and PacBio sequencing, produce individual reads ranging from 10 to 100 kb or even longer (Michael et al. 2020), which holds promise for enhancing coverage and enabling scaffolding across previously unresolved genomic regions (Li et al. 2023). Because nuclear and chloroplast DNA substantially contaminate the mt DNA reads, employing whole genomic DNA remains problematic (Sloan and Wu 2014). There is not yet a convincing method for examining the complete structural information of plant mt genomes, so assembling them is still a very challenging task. It was recently reported that the pipeline SAG-BAC uses a graph-based approach to construct plant mt genomes (Fischer et al. 2022). However, such algorithms still tend to produce one or more circular sequences, making it impossible to confirm the mitogenome's complete structural information (He et al. 2023). The graph-based sequence assembly toolkit (GSAT) was recently created to assemble and generate high-quality mt master graphs (MMGs) that accurately depict the diverse range of structural conformations observed in plant mt genomes. Using both short and long high-throughput sequencing reads, well-supported MMGs were generated for two model plant species: rice (Oryza sativa) and thale cress (Arabidopsis thaliana) (He et al. 2023). The continuous progress in technology has led to a growing number of published plant mt genomes.
There are currently no reports on the L. litseifolius mt genome. Herein, the complete mt genome of L. litseifolius was sequenced and assembled, and its repeat sequences, codon preference, RNA editing, gene transfer between the chloroplast and mt genomes, and phylogenetic relationships were analyzed. It is anticipated that our results will serve as a theoretical basis for biological research and species identification, and will be crucial for ascertaining the provenance and evolutionary affiliations of various species.

Plant materials, DNA extraction and genome sequencing
The L. litseifolius plant was provided by Hunan Yaocha Engineering Research Center (Huaihua, China; 25°52′ N, 111°08′ E). The plant was kept in darkness for 14 days in advance to obtain etiolated L. litseifolius seedlings. Leaf material was collected and DNA was extracted with the hipure universal DNA kit (DP305-03, Tiangen Biotech, Beijing, China). DNA concentration was quantified with a Qubit 3.0 (Q33216, Invitrogen, Singapore), and DNA samples containing more than 1.5 micrograms were used for sequencing. Libraries were constructed and sequenced on the Oxford Nanopore PromethION and Illumina NovaSeq 6000 platforms. The raw data were filtered with Fastp (v0.20.0, https://github.com/OpenGene/fastp) to remove sequencing adapters and primer sequences. Sequence data from three separate sources were counted and filtered using Filtlong (v0.2.1, https://github.com/rrwick/Filtlong).

Mitogenome assembly and annotation
The raw third-generation data were aligned to reference gene sequences (plant mt core genes) using the third-generation alignment software Minimap2 (v2.1) (Li 2018), and sequences with alignment lengths longer than 50 bp were selected as candidate sequences. The sequence with the higher alignment quality (covering more complete core genes) and more aligned genes (a single read encompassing multiple core genes) was chosen as the seed sequence. The third-generation assembly program Canu (v1.4) was used to correct the third-generation data (Koren et al. 2017), and Bowtie2 (v2.3.5.1) was used to align the second-generation data to the corrected sequences. Using the default parameters of Unicycler (v0.4.8), the second-generation data and the corrected third-generation data were then aligned and assembled together.
Finally, the L. litseifolius mt genome was obtained.
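The seed-selection step described above can be illustrated with a minimal sketch. It assumes the alignments have already been parsed into (read, gene, aligned length) tuples; the function name, the tuple layout, and the tie-breaking rule are hypothetical and are not taken from the pipeline used in this study.

```python
from collections import defaultdict

MIN_ALIGNMENT_LENGTH = 50  # bp threshold stated in the text


def pick_seed_read(alignments, min_len=MIN_ALIGNMENT_LENGTH):
    """Return the read that covers the most distinct core genes.

    `alignments` is a hypothetical list of (read_id, gene, aligned_length)
    tuples. Ties are broken by total aligned length (an assumption).
    """
    genes_per_read = defaultdict(set)
    aligned_bp = defaultdict(int)
    for read_id, gene, aln_len in alignments:
        if aln_len > min_len:  # keep only candidate alignments
            genes_per_read[read_id].add(gene)
            aligned_bp[read_id] += aln_len
    if not genes_per_read:
        return None
    return max(genes_per_read,
               key=lambda r: (len(genes_per_read[r]), aligned_bp[r]))


example = [
    ("read_a", "nad4", 1200),
    ("read_a", "cox1", 900),
    ("read_b", "nad4", 1500),
    ("read_c", "atp6", 40),  # too short, discarded
]
print(pick_seed_read(example))  # read_a
```

Here read_a is preferred because it spans two core genes, even though read_b has the longer single alignment.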
ORFs were annotated using Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) with the minimum length set to 102 bp, and redundant sequences and sequences that overlapped with known genes were ignored. Sequence alignments longer than 300 bp were annotated against the NR database. The results were verified and manually revised to produce more precise annotations. The mt genome was then mapped using OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html), and the complete mt genome sequence of L. litseifolius was deposited in NCBI under accession number ON462106.

Analysis of homologous fragments of mitochondria and chloroplasts, and RNA editing analyses
The chloroplast genome of L. litseifolius (OM048987) was obtained from the NCBI Genome Resources Database. BLAST (v2.6, https://blast.ncbi.nlm.nih.gov/Blast.Cgi) was run with a matching rate threshold of 70% to identify homologous genes between the chloroplast and mt genomes. The Predictive RNA Editor for Plants (PREP) tool (http://prep.unl.edu/) was used to predict RNA editing events.

Genomic comparison of related species
The mt genome of L. litseifolius was compared with the mt genomes of three other Fagaceae species: Q. acutissima (MZ636519), Q. variabilis (MN199236), and F. sylvatica (NC_050960). To find collinear regions, whole-genome collinearity comparisons were made with LASTZ (v1.02.00). The following parameters were set: a step of 20 and a seed pattern of 12 of 19.

Genomic features of the L. litseifolius mt genome
Table S1 displays the output of the second-generation and third-generation sequencing platforms. The former provided 24,930,314 clean reads, whereas the latter delivered 1,109,068 clean reads.

Repeat sequence analysis of the L. litseifolius mt genome
Simple sequence repeats (SSRs) are DNA segments in which a motif of 1-6 bp is repeated numerous times (Li et al. 2021). As shown in Table 3, 175 SSRs were identified in the L. litseifolius mt genome, including mononucleotide, dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide repeats. Tetramer repeats were the most abundant SSR type and pentamer repeats the least abundant, constituting 32.57% and 5.14% of all identified SSRs, respectively. There were also 51 monomer repeats (29.14%), 40 dimer repeats (22.86%), and 18 trimer repeats (10.29%). The most frequently occurring motifs were A/T motifs (38), which made up 21.71% of all identified SSRs.
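As a quick consistency check on the SSR proportions reported above, the short sketch below recomputes each percentage from the counts. The monomer, dimer, and trimer counts are taken from the text; the tetramer and pentamer counts (57 and 9) are not stated explicitly and are inferred here from the reported percentages of 175 SSRs, so they should be read as an assumption.

```python
# SSR counts by motif length. Tetramer and pentamer counts are inferred
# from the reported percentages (32.57% and 5.14% of 175), not stated
# directly in the text.
ssr_counts = {
    "monomer": 51,
    "dimer": 40,
    "trimer": 18,
    "tetramer": 57,
    "pentamer": 9,
}

total = sum(ssr_counts.values())
ssr_percent = {kind: round(100 * n / total, 2) for kind, n in ssr_counts.items()}

for kind, pct in ssr_percent.items():
    print(f"{kind:9s} {ssr_counts[kind]:3d} {pct:6.2f}%")
print("total", total)
```

With these counts the five classes sum to 175 and reproduce the reported percentages.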
Tandem repeats are characterized by the contiguous repetition of short sequences with a length of 1 to 200 bases (Paco et al. 2019). In the L. litseifolius mt genome, 8 tandem repeats were identified, with lengths between 13 and 21 bp and a matching degree of more than 95% (Table S2). As shown in Fig. 2, 515 interspersed repeats with a length of 30 bp or more were identified in the mt genome of L. litseifolius. Of these, 271 (52.62%) were forward repeats and 244 (47.38%) were palindromic repeats. A majority of interspersed repeats (423 repeats, 82.14%) were 30-50 bp in length. The maximum length of a palindromic repeat was observed to be 118.1 bp, whereas the longest forward repeat measured 4,560 bp.

The prediction of RNA editing
In the mt genome of L. litseifolius, a total of 461 RNA editing sites within 35 PCGs were predicted, involving 30 types of codon transfer (Fig. 3). Among them, atp1, atp8, rps19, sdh3, and sdh4 had the fewest predicted editing sites (3), while ccmFn and nad4 had the most (38). Of the editing sites, 30.59% (141) and 66.38% (306) were located at the first and second positions of the triplet codon, respectively. In one particular editing type, both the first and second positions of the triplet were changed, causing the original proline (CCT) to be replaced with phenylalanine (TTT). The hydrophobicity of 43.82% of amino acids was predicted to be unaltered after RNA editing, whereas 8.03% of the amino acids were predicted to become hydrophilic and 47.72% hydrophobic (Table 4). In addition to changing the encoded amino acids, RNA editing can also cause premature termination of the coding sequence, which occurred in the coding genes ccmFc and atp6 in the L. litseifolius mt genome. The predictions also show that after RNA editing, 45.99% (212 sites) of the amino acids were converted to leucine, the highest conversion tendency, and 22.99% (106 sites) were converted to phenylalanine, the second highest.
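The effect of a C-to-T editing event on the encoded amino acid can be illustrated with a small sketch. It assumes the standard genetic code and treats editing as a C-to-T change in the DNA-sense codon; the helper name edit_codon and the CGA example are hypothetical and only serve to show how an edit can change an amino acid or introduce a stop codon.

```python
from itertools import product

# Standard genetic code (assumed here), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def edit_codon(codon, positions):
    """Apply C-to-T edits at the given 1-based codon positions."""
    bases = list(codon.upper())
    for pos in positions:
        if bases[pos - 1] != "C":
            raise ValueError(f"position {pos} of {codon} is not C")
        bases[pos - 1] = "T"
    return "".join(bases)


# Double edit at positions 1 and 2: proline (P) becomes phenylalanine (F).
edited = edit_codon("CCT", [1, 2])
print("CCT", CODON_TABLE["CCT"], "->", edited, CODON_TABLE[edited])

# Hypothetical example of an edit that creates a stop codon (*).
stop = edit_codon("CGA", [1])
print("CGA", CODON_TABLE["CGA"], "->", stop, CODON_TABLE[stop])
```

The first example corresponds to the CCT to TTT change mentioned above; the second shows how a single edit can terminate translation early.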

Codon usage analysis of PCGs
The total length of the L. litseifolius PCGs is 32,208 bp. ATG served as the start codon for all PCGs, while three types of stop codon were recognized, namely TGA (36.11%), TAG (19.44%), and TAA (44.45%) (Table S3). The codon usage analysis revealed that the most frequently occurring amino acids were leucine (Leu) (10.58%-11.16%), serine (Ser) (8.67%-9.32%), and isoleucine (Ile) (7.63%-7.70%) (Fig. 4). In contrast, tryptophan (Trp) (1.45%-1.63%) and cysteine (Cys) (1.46%-1.54%) occurred only infrequently. Additionally, we analyzed 10,226 codons from the 36 PCGs of the L. litseifolius mt genome, and 6717 codons (65.69%) had relative synonymous codon usage (RSCU) values over 1.0, which shows that these codons are used more frequently than other synonymous codons (Table S4). The four Fagaceae mt genomes contain between 5768 and 6717 codons with an RSCU value exceeding 1.0. The codons GCU (Ala) and UAU (Tyr) were the most prevalent among the four species, with RSCU values greater than 1.5, whereas CUG (Met) and UUG (Met) were the least used codons (RSCU<0.5). According to Table S5, there were 10,736 codons in all of the coding genes. Furthermore, the average GC content of the three codon positions in the L. litseifolius mt genome was lower than 50%, showing that codon usage was biased toward A and T bases. The effective number of codons (ENC) values of the mt genomes of the four Fagaceae plants were greater than 35; GC3s ranged from 0.297 to 0.597, and ENC values ranged from 36.90 to 58.44. These results showed that the impact of mutation pressure on the codon usage preference of the mt genomes of the four Fagaceae species was relatively low. Most of the scattered points of the mitochondrial coding genes of the four Fagaceae plants deviated from the standard curve, indicating that selection pressure has the greater impact on codon preference (Fig. 5).
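For readers unfamiliar with the two statistics used above, the following sketch shows how RSCU can be computed from coding sequences and how the standard curve of an ENC plot is usually drawn. It assumes the standard genetic code and uses the widely used expected-ENC formula ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s; the function names and the toy sequence are hypothetical, and this is not the software used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (an assumption), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds_sequences):
    """Relative synonymous codon usage for each sense codon.

    RSCU = observed count * number of synonymous codons / total count for
    that amino acid. Values above 1.0 mean the codon is used more often
    than expected under equal usage of synonymous codons.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are skipped in this sketch
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        family_total = sum(counts[c] for c in codons)
        if family_total:
            for c in codons:
                values[c] = counts[c] * len(codons) / family_total
    return values


def expected_enc(gc3s):
    """Expected ENC when only GC3s composition shapes codon usage."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


toy = rscu(["GCTGCTGCTGCA"])   # four alanine codons: GCT x3, GCA x1
print(toy["GCT"], toy["GCA"])  # 3.0 1.0
print(round(expected_enc(0.297), 2))
```

Genes whose observed ENC falls well below expected_enc(GC3s) lie under the standard curve, which is the pattern usually read as evidence that factors other than mutation pressure, such as selection, shape codon usage.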

Chloroplast to mitochondrion DNA transfer
As shown in Table 5 and Fig. 6, a total of 28 homologous fragments, spanning a combined length of 19,904 bp, were identified, making up 2.08% of the L. litseifolius mt genome. Four of them were longer than 1,000 bp, and fragments 1 and 2 were the longest, at 6,080 bp each. The following 9 mt genes were detected: trnI-GAT, trnV-GAC, trnP-TGG, trnW-CCA, trnH-GTG, trnN-GTT, trnD-GTC, trnM-CAT, and rrn18.

Analysis of synonymous and nonsynonymous substitution rates
The Ka/Ks ratio is important in evolutionary analysis because it indicates whether a PCG is subject to selection pressure. It is generally accepted that a Ka/Ks ratio greater than one indicates positive selection, a ratio equal to one indicates neutral evolution, and a ratio less than one indicates purifying selection. In this investigation, as depicted in Fig. 7, the Ka/Ks ratios of 35 PCGs in the mt genomes of L. litseifolius, Q. acutissima, Q. variabilis, F. sylvatica, and A. thaliana were analyzed. The Ka/Ks ratio of 26 PCGs between L. litseifolius and Q. variabilis (MN199236) was zero. For the 25 shared PCGs between L. litseifolius and Q. acutissima, the Ka/Ks ratio was also zero. These results show that the PCGs shared between L. litseifolius and Q. variabilis or Q. acutissima are close homologs. Additionally, the observation that the vast majority of Ka/Ks values were below 1.0 implies that most PCGs underwent purifying selection during their evolutionary history. In contrast, the Ka/Ks ratios of three genes (atp4, ccmB, and nad1) were greater than 1.0, indicating that these genes have experienced positive selection during evolution.
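The interpretation rule for Ka/Ks stated above can be written as a short sketch. The function name and the example Ka and Ks values are hypothetical; real values would come from a dedicated Ka/Ks calculator, and a ratio is treated as undefined here when Ks is zero.

```python
import math


def classify_selection(ka, ks):
    """Classify selection from nonsynonymous (Ka) and synonymous (Ks) rates."""
    if ks == 0:
        return "undefined"  # the ratio cannot be computed without Ks
    ratio = ka / ks
    if math.isclose(ratio, 1.0):
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Hypothetical values for illustration only.
print(classify_selection(0.00, 0.02))  # purifying (ratio of zero)
print(classify_selection(0.03, 0.02))  # positive
print(classify_selection(0.02, 0.02))  # neutral
```

A ratio of zero, as reported for most shared PCGs above, falls in the purifying class because no nonsynonymous substitutions were observed.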

Comparison of the mt genome size and GC content between L. litseifolius and other species
Genome size and GC content are the main features of plant mt genomes. In this investigation, we compared the mt genome size and GC content of 19 plant species, including 3 species each from the Fagaceae, Asteraceae, Cruciferae, and Fabaceae families, 2 species from the Theaceae family, 3 species from the Poaceae family, and 1 species each from the Grimmiaceae and Ginkgoaceae families, with those of the L. litseifolius mt genome (Table S6). These plant mt genomes range widely in size, from 107,186 bp (Racomitrium emersum) to 1,081,966 bp (Camellia duntsa), while the variation in GC content was small (typically about 45%) (Fig. 8).
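GC content, as compared above, is simply the share of G and C among the unambiguous bases of a sequence. The sketch below shows one way to compute it and checks the value reported for L. litseifolius against the base composition given in this paper (27.33% A, 22.94% C, 27.06% T, 22.67% G); the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def gc_content(sequence):
    """Return GC content in percent, ignoring ambiguous bases such as N."""
    counts = Counter(sequence.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100 * (counts["G"] + counts["C"]) / acgt


print(gc_content("GGCCAATT"))  # 50.0

# Base composition reported for the L. litseifolius mt genome (percent).
composition = {"A": 27.33, "C": 22.94, "T": 27.06, "G": 22.67}
gc_from_composition = round(composition["G"] + composition["C"], 2)
print(gc_from_composition)  # 45.61
```

The reported composition sums to 100% and gives the stated GC content of 45.61%.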

Mitochondrial genome comparison in four Fagaceae plants
The mt genome of L. litseifolius was used as the reference for a whole-genome collinearity analysis of the four Fagaceae plants, as depicted in Fig. 9. The genic regions of the related species showed a greater degree of similarity to those of L. litseifolius; the conserved genes were atp6, rps1, ccmC, rpl2, nad4, nad7, and trnY-GTA (Table S7). The mt genomes of L. litseifolius and Q. variabilis were more similar to one another than to those of the other two Fagaceae species.

Duplication and loss of mt genome in Fagaceae
With rapidly evolving sequencing technology, complete plant mt genomes have been assembled and reported, making it possible to compare mt genome characteristics among different plant species. Four mt genomes from the Fagaceae family can now be accessed: L. litseifolius, Q. acutissima, Q. variabilis, and F. sylvatica (Table S8). These four mt genomes contain 36, 32, 36, and 36 PCGs, respectively. Both gene duplication and gene loss were observed among the species. For example, atp1 was duplicated in the L. litseifolius mt genome, atp9 in the Q. acutissima mt genome, and atp1 in the Q. variabilis mt genome. The genes rps14 and rps3 were lost from the L. litseifolius mt genome; rps14 and rps4 from Q. acutissima; mttB, nad1, nad4L, rps16, rps1, rps14, and rps4 from Q. variabilis; and rpl2 and rps10 from F. sylvatica.

Phylogenetic analysis
The L. litseifolius mt genome and the other 19 plant mt genomes were downloaded from GenBank to construct phylogenetic trees using both the maximum likelihood and Bayesian methods (Table S6). The results showed that Poaceae, Theaceae, Asteraceae, Fagaceae, Cruciferae, and Fabaceae were well clustered (Fig. 10). Both methods yielded consistent tree topologies, and the clustering conformed to the family and genus affiliations of these species, supporting the validity of the genome-based clustering. L. litseifolius, which is closely related to Q. acutissima, belongs to the family Fagaceae of the order Fagales.

Discussion
Generally, the mt genomes of plants have many unique characteristics, including complex composition and a broad distribution of genome sizes (Zhou et al. 2022). The key characteristics of the L. litseifolius mt genome are described in this study. Many mt genomes have circular structures, but some have linear or even multichromosomal architectures; for example, the mt genome of Polytomella parva is linear (Daniel et al. 2013). The L. litseifolius mt genome reported here is circular, with a length of 573,177 bp and a GC content of 45.61%. This GC content is similar to that observed in previously sequenced plant mt genomes, for instance, Camellia sinensis var. assamica cv. Duntsa, 45.62% (Li et al. 2023), and Acer truncatum, 45.68% (Ma et al. 2022).
The mt genome contains numerous repeat sequences, which fall into two main categories: interspersed and tandem repeats (Gualberto et al. 2014). Repeated sequences are frequently crucial to intermolecular recombination in mitochondria, and usually the largest repeats within a species (frequently more than 1 kb in angiosperms) undergo structural recombination, resulting in isomerization (Wynn et al. 2018; Guo et al. 2016). Numerous repetitive sequences were found in the L. litseifolius mt genome, and the longest interspersed repeat exceeded 1 kb (4,560 bp), which might be evidence of such isomerization. In addition, the analysis shows that the number and sequence length of SSRs in the L. litseifolius mt genome are out of proportion between the coding and non-coding regions, which indicates that the distribution of SSR loci is uneven.
The sizes of the mt genomes of the four Fagaceae plants ranged from 412,886 to 573,177 bp. Analyzing the structure and size of the mt genome could potentially help assess phylogenetic relationships and the rate of species change within Fagaceae. Notably, closely related species such as R. pulchrum and R. simsii exhibit considerable similarities in the size and structure of their mt genomes (Xu et al. 2021). Therefore, the mt genome could serve as a valuable resource for comprehending plant evolution and establishing taxonomic classifications (Li et al. 2023).
The L. litseifolius mt genome contains thirty-six PCGs, twenty-one tRNA genes, three rRNA genes, and one pseudogene, and is the longest mt genome among all Fagaceae plant species identified to date. Comparative analysis of the mt genomes of four Fagaceae species revealed two copies of the atp1 gene in both L. litseifolius and Q. variabilis, and two copies of the atp9 gene in Q. acutissima.
Additionally, the L. litseifolius mt genome had more annotated genes than that of Q. variabilis, namely mttB, nad1, nad4L, rpl16, rps1, rps19, and rps4. These findings highlight the importance of plant mt genomes in future research, as they provide valuable insights into the evolution of genomes that have not been previously explored.
Codon usage bias is often used in phylogenetic and evolutionary analyses of mammals and insects, but less research has been done on plants. ENC values range from 20 to 61; values closer to 20 indicate a stronger preference for specific codons, while values closer to 61 indicate a weaker preference (Ikemura 1981). The L. litseifolius mt genome exhibited an ENC value of 53.64, indicating a low degree of codon preference. The mt genomes of the four Fagaceae plants had ENC values greater than 35, suggesting that natural or artificial selection, rather than mutation, may have shaped the codon usage bias of their genes (Li et al. 2023). These findings suggest that mt genome ENC analysis may be useful for tracing the evolutionary history of Fagaceae plant species.
The exchange of genetic material, specifically DNA, between nuclear genomes and plant organelles, and between different species, is common (Nguyen et al. 2020). Sequencing analysis has facilitated the identification of multiple DNA transfer events among mitochondrial (mt), nuclear, and chloroplast genomes across a wide range of plant species (Qiao et al. 2022). In most plants, the chloroplast DNA content in the mt genome ranges from 3% to 6%, and can sometimes reach up to 10% (Adams et al. 2002). In the L. litseifolius mt genome, a total of 28 chloroplast insertions were identified, with lengths ranging from 29 to 6,080 bp and a cumulative length of 11.90 kb, which accounts for 2.08% of the total length of the genome. This proportion of transferred fragments is similar to that reported in Acer truncatum (2.36%) (Ma et al. 2022) but lower than that of Vitis vinifera (8.8%) (Yin et al. 2021). In this study, 9 mt genes were annotated in these fragments, comprising 8 tRNA genes and 1 rRNA gene. The transfer of tRNA genes from chloroplast to mt DNA is prevalent among angiosperms (Bi et al. 2016), and tRNA genes play an indispensable role in the mt genome. The chloroplast genome is highly conserved in higher plants, and the transfer of mitochondrial fragments into the chloroplast is rarely reported. Furthermore, the interplay between plastomes and mitogenomes in plants has conventionally been perceived as unidirectional, with plastome sequences transferred to the mitogenome (Yue et al. 2022). However, several studies, such as those on Daucus carota (Iorizzo et al. 2012) and Asclepias syriaca (Straub et al. 2013), have documented the transfer of mt sequences to the plastome. In L. litseifolius, there is currently no empirical evidence of mt sequence transfer to the chloroplast genome.
The mt phylogenetic analysis of four Fagaceae species and sixteen additional plant species was in accordance with the taxonomic classification of these species, indicating the feasibility of using organelle genomes in plant phylogenetic investigations (Wu et al. 2012). The phylogenetic trees in this study show that L. litseifolius is closely related to Q. acutissima.
In this investigation, the L. litseifolius mt genome served as the reference. Whole-genome collinearity analysis showed that the genic regions of the three other Fagaceae species exhibit a higher degree of similarity to L. litseifolius than the intergenic regions do. This finding suggests that the gene-containing regions of the mt genome are more conserved among Fagaceae species than the intergenic regions (Li et al. 2023). Seven conserved genes were discovered in this study: atp6, rps1, ccmC, rpl2, nad4, nad7, and trnY-GTA. These genes will aid in the classification of Fagaceae plant species and further understanding of their evolution.
The number of RNA editing sites varies among species. In the mt genome of L. litseifolius, a total of 461 RNA editing sites within 35 PCGs were predicted, which is fewer than in gymnosperms with larger mt genomes, such as Ginkgo biloba (1,306) (Kan et al. 2020), and more than in Suaeda glauca (261) (Cheng et al. 2021). The proportion of RNA editing events located at the first codon position (141; 30.59%) is lower than that at the second codon position (306; 66.38%). This is in accordance with prior studies (Cheng et al. 2021; Robles and Quesada 2021). The absence of editing sites at the third position of triplet codons in the mt genome of L. litseifolius is in accordance with the infrequent occurrence of RNA editing at this position in the mt genomes of plants (Cheng et al. 2021; Verhage 2020). The mt genome of L. litseifolius contains numerous RNA editing sites but a limited number of editing types. Specifically, there were only 30 types of codon transfer, corresponding to 20 types of amino acid transfer, among the 461 RNA editing sites.
The number of transfer types is smaller than that observed in monocotyledonous and dicotyledonous plants, which typically exhibit 50-60 codon transfer types and approximately 30 amino acid transfer types (Verhage 2020; Pinard et al. 2019; Alverson et al. 2010). However, it is similar to that observed in most gymnosperms, which typically exhibit 30-40 codon transfer types and around 20 amino acid transfer types. These findings are in agreement with prior research (Kan et al. 2020; Edera and Sanchez-Puerta 2021). In this study, the hydrophobicity of 43.82% of the amino acids remained unaltered after RNA editing, while that of the remaining amino acids is expected to change, which may enable the proteins to fold and function better. Thirty codons are edited into new start codons by RNA editing, which makes the encoded proteins more conserved and more homologous to the corresponding proteins of other plants, thus helping genes to be expressed more completely.
The mt genomes of L. litseifolius, Q. acutissima, Q. variabilis, F. sylvatica, and A. thaliana were subjected to Ka/Ks analysis. The results indicated that negative selection dominated the evolution of most genes, suggesting that the PCGs of the L. litseifolius mt genome have been conserved. However, PCGs such as atp4, ccmB, and nad1 exhibited Ka/Ks values > 1, indicating that the evolutionary history of these genes has been influenced by positive selection. The identification of genes with high Ka/Ks ratios is a crucial aspect of gene selection and evolutionary research within the Fagaceae family. Additionally, we investigated the GC content of the mt genomes of L. litseifolius and nineteen other plant species.
Our results support the existing theory that higher plants' GC content maintains a high level of constancy throughout time (Cheng et al. 2021).

Conclusions
The complete mt genome of L. litseifolius was assembled and characterized in this investigation. The mt genome of L. litseifolius exhibits a circular structure spanning 573,177 bp. The nucleotide composition of the genome is 27.33% A, 22.94% C, 27.06% T, and 22.67% G, resulting in a GC content of 45.61%. Sixty-one genes in the mt genome of L. litseifolius have been annotated, including 36 PCGs, 21 tRNA genes, 3 rRNA genes, and 1 pseudogene. The repeat sequences, RNA editing, codon usage, and chloroplast to mitochondrion DNA transfer were then analyzed. The Ka/Ks analysis revealed that a large proportion of genes experienced negative selection during evolution, suggesting that the PCGs within the mt genome of L. litseifolius are conserved overall. In addition, our findings demonstrate that despite significant variation in the size of plant mt genomes, their GC content has remained relatively stable throughout evolution. Seven conserved genes were identified: atp6, rps1, ccmC, rpl2, nad4, nad7, and trnY-GTA. The phylogenetic analysis showed that L. litseifolius clustered most closely with Q. variabilis. Overall, this research offers extensive information about the L. litseifolius mt genome. Further study of the breeding, genetic variation, and systematic evolution of L. litseifolius using the mt genome would benefit from these findings.

Author Contributions
X.Q. and S.X. conceived and designed the study. X.Q. and Y.T. performed the experiments and wrote the paper. X.W., Y.W., and Z.X. contributed to resource investigation and sample collection. Z.L. and J.L. analyzed and interpreted the data.

Funding
This investigation received financial backing from the bioengineering Hunan provincial key laboratory of "double first-class" applied characteristic disciplines and ethnic medicinal plant resources in Hunan colleges and universities [2022SWGC05], the key project of Huaihua "Study on selection of excellent germplasm resources and key techniques of deep processing of Xupu Yao cha" [2022], Furong Plan "the key technology research and development of L. litseifolius and industrialization innovation and entrepreneurship team" [2022], the provincial natural science foundation [2022JJ50044], "the technical regulations for dwarfing cultivation of L. litseifolius" (Xiang Caihang Index [2022] 91), and the Hunan province traditional Chinese medicine industry technology system project ([2022] 67).

Fig. 1 The circular map of the mt genome of L. litseifolius. The gene map illustrates a total of 61 annotated genes belonging to various functional groups. Genes located on the outer and inner regions of the circle are transcribed in a clockwise and counterclockwise direction, respectively. The gray circle positioned inside the map represents the GC content.

Table 1
Genomic characteristics of the L. litseifolius mt genome

Table 2
Gene profile and organization of the L. litseifolius mt genome
*: number of introns; Gene(2): number of copies of multi-copy genes; # Gene: pseudogene.

Table 3
SSR distribution in the mt genome of L. litseifolius

Table 4
Prediction of RNA editing sites

Table 5
Fragments transferred from chloroplasts to mitochondria in L. litseifolius