Complete chloroplast genomes of Sorbus sensu stricto (Rosaceae): comparative analyses and phylogenetic relationship

DOI: https://doi.org/10.21203/rs.3.rs-1690837/v1

Abstract

Background: Sorbus sensu stricto (Sorbus s.s.), a genus of tribe Maleae in the family Rosaceae, has important economic value for its attractive leaves and flowers and especially its colorful fruits. It comprises about 90 species, mainly distributed in China. The infrageneric classification and species delimitation of the genus are highly disputed because of the morphological similarity among species. To shed light on the circumscription of taxa within the genus, phylogenetic analyses were performed on 29 Sorbus s.s. chloroplast (cp) genomes (16 newly sequenced) representing two subgenera and eight sections.

Results: The 16 newly sequenced cp genomes range from 159,646 bp to 160,178 bp in length. All samples examined, as well as 22 re-annotated taxa of Sorbus sensu lato (Sorbus s.l.), contain 113 unique genes, 19 of which are duplicated in the inverted repeat (IR). Gene deletions reported previously were mainly caused by the reference genomes selected for annotation. Six hypervariable regions (trnR-atpA, petN-psbM, rpl32-trnL, trnH-psbA, trnT-trnL and ndhC-trnV) were screened, and 44–53 SSRs and 58–130 dispersed repeats were identified as potential molecular markers. Phylogenetic analyses under maximum likelihood (ML) and Bayesian inference (BI) indicated that Sorbus s.l. is polyphyletic, whereas Sorbus s.s. and the other five segregate genera, Aria, Chamaemespilus, Cormus, Micromeles and Torminalis, are each monophyletic. Within Sorbus s.s., two major clades and four subclades are resolved with full support, and they are inconsistent with the existing infrageneric classification. The two subgenera, subg. Sorbus and subg. Albocarmesinae, are monophyletic if S. tianschanica is transferred from subg. Sorbus to subg. Albocarmesinae and S. hupehensis var. paucijuga is transferred from subg. Albocarmesinae to subg. Sorbus. The classification at the sectional level is not supported by the cp genome phylogeny.

Conclusion: Phylogenomic analyses of cp genomes are useful for inferring phylogenetic relationships in Sorbus s.s. Although genome structure is highly conserved in the genus, the hypervariable regions and repeat sequences detected here are the most promising potential molecular markers for population genetics, species delimitation and phylogenetic studies.

Introduction

The genus Sorbus L. (Maleae, Rosaceae), when established by Linnaeus [1], included only two pinnately leaved species, S. aucuparia L. and S. domestica L. The simple-leaved species of Sorbus sensu lato (Sorbus s.l.) known to Linnaeus were assigned to other genera in tribe Maleae. The taxonomy of Sorbus has historically been controversial. Taxonomists have either adopted a broad definition [2–6] or split it into six smaller genera, i.e., Aria (Pers.) Host, Chamaemespilus Medik., Cormus Spach, Micromeles Decne., Sorbus sensu stricto (Sorbus s.s.) and Torminalis Medik., with varying delimitations [1, 7–12].

Evidence from morphological [11, 13] and molecular analyses [14–19] suggests that Sorbus s.l. is polyphyletic and comprises five or six separate evolutionary lineages. Accordingly, Sorbus s.l. has been divided into five or six genera, and Sorbus s.s. is restricted to species with pinnately compound leaves and small fruits [12].

Currently, Sorbus s.s. consists of about 90 species, more than 60 of which are native to China [5, 6, 12]. The genus is distributed in the temperate regions of the Northern Hemisphere, with the greatest diversity in the mountains of south-western China and the adjacent areas of Upper Burma and the Eastern Himalaya [12]. Sorbus s.s. species have great horticultural potential for their autumn leaf color, white or red flowers and, especially, their attractive fruits in crimson, red, pink, orange, yellow and pure white. However, relationships within the genus remain unresolved because of interspecific hybridization, apomictic polyploidy and the limited phylogenetic data available.

Phylogenetic relationships among Sorbus s.s. species are a longstanding problem, and the infrageneric classifications proposed by previous taxonomists need to be tested. In the twentieth century, most authors adopted the broad concept of Sorbus, and Sorbus s.s. was usually treated as a subgenus or a section of Sorbus s.l. Koehne [20] classified subg. Aucuparia (equivalent to Sorbus s.s.) into five unnamed groups because it was "impossible to divide the genus into well characterized sections". Yü and Kuan [21] separated sect. Sorbus (equivalent to Sorbus s.s.) into eight series based on morphological characters such as trichomes on buds, number and shape of leaflets, and fruit color. Gabrielian [4] argued that some series proposed by Yü and Kuan [21] included distantly related taxa and assigned the species of sect. Sorbus in Western Asia and the Himalayas to nine subsections based on comparative morphological and anatomical data. Phipps et al. [5] divided subg. Sorbus (equivalent to Sorbus s.s.) into two sections, nine series and five informal groups based on morphological characters such as the number and size of leaflets, free or united carpels, and fruit color. To date, the only revision of Sorbus s.s. is that published by McAllister [12], who divided the genus into two subgenera and 11 sections based mainly on morphological characters (the color of hairs on the buds; leaflet number, size and shape; fruit color), combined with ploidy level, breeding system and geographical distribution.

Taxonomic inconsistency in species delimitation also remains a challenge in the genus, as illustrated by the identities of S. rehderiana Koehne and S. koehneana C.K.Schneid. S. hypoglauca (Cardot) Hand.-Mazz. was treated as a synonym of S. rehderiana by Yü and Lu [3] and Lu and Spongberg [6], and S. unguiculata Koehne was reduced to the synonymy of S. koehneana by McAllister [12], whereas S. hypoglauca and S. unguiculata were recognized as distinct species by McAllister [12] and Phipps et al. [5], respectively.

Previous molecular studies have mainly focused on the phylogeny of tribe Maleae, and few have concentrated specifically on infrageneric relationships within Sorbus s.s. Despite these efforts, relationships among the subgenera and sections have remained uncertain. Phylogenetic analyses using chloroplast markers [16–18, 22] or chloroplast (cp) genomes [19] supported the monophyly of the genus but did not support any existing infrageneric classification. However, significant conflicts were detected in nuclear DNA phylogenies. Based on ITS phylogeny, Wang and Zhang [23] suggested that Sorbus s.l. is monophyletic but that Sorbus s.s. and its infrageneric groups are not. In contrast, Li et al. [24], also using ITS phylogeny, supported the monophyly of Sorbus s.s. and of the other four genera segregated from Sorbus s.l., i.e., Aria (including Micromeles), Chamaemespilus, Cormus and Torminalis.

The cp genomes of most vascular plants range from 120 to 160 kb and have a conserved quadripartite structure, in which two copies of an inverted repeat (IR) separate a large and a small single-copy region (LSC and SSC) [25]. Chloroplast genomes are frequently used in systematics because of their simple circular structure, their predominantly clonal, maternal inheritance and their high variability even at low taxonomic levels [26]. Knowledge of the organization and evolution of cp genomes in Sorbus s.s., Sorbus s.l. and tribe Maleae has been expanding rapidly with the fast-growing number of completely sequenced genomes. Currently, cp genomes of more than 100 species in tribe Maleae, including 22 species of Sorbus s.l., have been reported and are publicly available (https://www.ncbi.nlm.nih.gov).

In this study, cp genomes of 15 Sorbus s.s. species and one unidentified sample were newly sequenced and compared with those of 22 other species in Sorbus s.l. and 24 species in other genera of tribe Maleae. The aims are: (1) to determine the structure of the cp genomes of the 16 Sorbus s.s. samples; (2) to compare structural variation, screen mutational hotspots, examine variation in simple sequence repeats (SSRs) and dispersed repeat sequences, and calculate nucleotide diversity in Sorbus s.s. cp genomes for future population genetic, species delimitation and phylogenetic studies; and (3) to reconstruct phylogenetic relationships among species of Sorbus s.s. and Sorbus s.l.

Results

Organization and features of the chloroplast genomes 

The chloroplast genomes of 15 species and one unidentified sample of Sorbus s.s. exhibit similar structure and organization (Table S1, Fig. 1). Genome size across the 16 samples ranges from 159,646 bp in S. wilsoniana C.K.Schneid. to 160,178 bp in S. hypoglauca. All 16 cp genomes consist of a large single-copy region (LSC) of 87,612 bp (S. sargentiana Koehne) to 88,125 bp (S. hypoglauca); a small single-copy region (SSC) of 19,219 bp (Sorbus sp.) to 19,359 bp (S. tianschanica Rupr.); and a pair of inverted repeats (IRs), each 26,378 bp (S. aestivalis Koehne and nine other taxa) to 26,405 bp (S. amabilis Cheng ex T.T.Yu) long (Table S1). The total GC content is nearly identical: 36.5% in five samples and 36.6% in the other 11 (Table S1).
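The quadripartite layout implies a simple consistency check: because the IR occurs twice, genome size should equal LSC + SSC + 2 × IR, and GC content is the share of G and C among the bases. The Python sketch below is a hypothetical helper, not part of the authors' pipeline; the names `PlastomeRegions` and `gc_content` are illustrative, and excluding ambiguous bases (e.g. N) from the denominator is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PlastomeRegions:
    """Lengths (bp) of the regions of a quadripartite cp genome (hypothetical helper)."""
    lsc: int
    ssc: int
    ir: int  # length of ONE inverted-repeat copy; IRa and IRb are assumed equal

    @property
    def total(self) -> int:
        # Genome size = LSC + SSC + two copies of the IR.
        return self.lsc + self.ssc + 2 * self.ir


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    toy = PlastomeRegions(lsc=10, ssc=5, ir=3)  # toy lengths, not real data
    print(toy.total)                         # 21
    print(round(gc_content("ATGCGCNN"), 2))  # 66.67 (the two Ns are excluded)
```

Applied to a real assembly, a mismatch between the reported total and LSC + SSC + 2 × IR would point to a boundary-annotation error.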

All 16 cp genomes assembled here encode 113 unique genes (79 protein-coding genes, 30 tRNA genes and four rRNA genes), 19 of which are duplicated in the IR, giving a total of 132 genes (Tables S1 and S2, Fig. 1). Eighteen genes contain one intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps12, rps16, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA and trnV-UAC) or two introns (clpP and ycf3); six of these are tRNA genes (Table S2, Fig. 1). Coding regions make up 56.5% or 56.6% of the cp genomes (49.1% or 49.2% protein-coding genes and 7.4% RNA genes), and non-coding regions, including both intergenic spacers and introns, make up 43.4% or 43.5% (Table S1).
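The gene totals follow from counting each gene once and then adding its second IR copy (113 + 19 = 132). A minimal Python sketch of that bookkeeping is shown below, assuming a hypothetical input of (gene name, region) pairs, one per annotated copy; real annotations need extra care for trans-spliced genes such as rps12 and for pseudogene fragments (rps19Ψ, ycf1Ψ), which this sketch assumes have already been excluded.

```python
from collections import Counter, defaultdict


def summarize_genes(features):
    """Summarize annotated gene copies (hypothetical helper).

    features: iterable of (gene_name, region) pairs, one per annotated copy,
    where region is one of "LSC", "SSC", "IRa" or "IRb".
    Returns (number of unique genes, sorted IR-duplicated genes, total copies).
    """
    features = list(features)  # accept any iterable; it is traversed twice below
    copies = Counter(name for name, _ in features)
    regions = defaultdict(set)
    for name, region in features:
        regions[name].add(region)
    # A gene counts as IR-duplicated when it has a copy in both IRa and IRb.
    ir_duplicated = sorted(n for n, r in regions.items() if {"IRa", "IRb"} <= r)
    return len(copies), ir_duplicated, sum(copies.values())


if __name__ == "__main__":
    toy = [("psbA", "LSC"), ("ndhB", "IRa"), ("ndhB", "IRb"), ("ndhF", "SSC")]
    print(summarize_genes(toy))  # (3, ['ndhB'], 4)
```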

The boundaries between the IR and LSC/SSC regions were compared among the 16 Sorbus s.s. cp genomes and eight species of other genera in Rosaceae (Fig. 2). In all cp genomes compared, the IRb/LSC boundary lies within the rps19 gene (the 5′ end of rps19 is in the IRb region and the 3′ end in the LSC), so a pseudogene copy of the 5′ end of this gene (rps19Ψ) is created in the IRa region. The length of rps19Ψ is 116 bp in Micromeles thibetica (Cardot) Mezhenskyj (Fig. 2 C), 182 bp in Prunus persica (L.) Batsch (Fig. 2 F), and 120 bp in the other 22 species (Fig. 2 A–B, D, E). The IRa/LSC border is adjacent to rps19Ψ in all species except Micromeles thibetica (Fig. 2 C), in which it lies within rps19Ψ. The IRa/SSC boundary lies within the ycf1 gene (the 5′ end of ycf1 is in the IRa region and the 3′ end in the SSC), creating a pseudogene copy of the 5′ end of this gene (ycf1Ψ) in the IRb region. The length of ycf1Ψ ranges from 1,003 bp (Prunus persica; Fig. 2 F) to 1,092 bp (Torminalis glaberrima (Gand.) Sennikov & Kurtto; Fig. 2 D), and is 1,083 bp in all Sorbus s.s. species (Fig. 2 A–C, E). The position of the IRb/SSC boundary varies slightly: in 21 species it lies within the region where ycf1Ψ overlaps ndhF, whereas in the other three species (Malus hupehensis (Pamp.) Rehder, Prunus persica and Pyrus pashia Buch.-Ham. ex D.Don) it lies within the ndhF gene (Fig. 2 E, F).
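The length of a boundary pseudogene such as rps19Ψ or ycf1Ψ equals the length of the parent gene that falls inside one IR copy, because only that part is duplicated at the opposite IR junction. The Python sketch below computes it as an interval overlap; the function names and the coordinates in the example are hypothetical, and coordinates are assumed to be 0-based, half-open positions on a linearized genome.

```python
def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap between half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def pseudogene_length(gene_start: int, gene_end: int, ir_start: int, ir_end: int) -> int:
    """Length of the part of a boundary-spanning gene that lies inside an IR copy.

    Because the two IRs are identical, exactly this part is duplicated at the
    opposite IR junction, where it forms a truncated pseudogene (e.g. rps19Ψ).
    Strand orientation does not affect the length.
    """
    return overlap_length(gene_start, gene_end, ir_start, ir_end)


if __name__ == "__main__":
    # Hypothetical coordinates: a 240-bp gene straddling an LSC/IRb junction at 88,000.
    print(pseudogene_length(87_880, 88_120, 88_000, 114_000))  # 120
```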

Codon preference analysis

According to the codon usage analysis, the protein-coding genes of the 16 Sorbus s.s. genomes total 78,570–78,588 bp and comprise 26,190–26,196 codons (Table S3). Leucine is the most frequently encoded amino acid (2,753–2,757 codons), followed by isoleucine (2,255–2,260 codons), whereas cysteine is the least frequent (297 or 298 codons). Relative synonymous codon usage (RSCU) values vary little among the 16 Sorbus s.s. sequences. Thirty codons are used frequently (RSCU > 1) and 32 less frequently (RSCU < 1). UUA shows a preference in all 16 cp genomes. The codons AUG and UGG, which encode methionine and tryptophan, respectively, show no bias (RSCU = 1), as each is the only codon for its amino acid. Codons ending in A (32.1%) or U (38.2%) together account for 70.3% of all codons, so codon usage is biased towards A or U at the third codon position.
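RSCU is the observed count of a codon divided by the mean count of all codons synonymous with it, so values above 1 mark preferred codons. The Python sketch below implements that definition together with the third-position A/U share; it is illustrative only (the paper does not name the software used), it uses the standard genetic code, whose amino-acid assignments are the same as those of the plastid translation table, and it treats stop codons as one synonymous group.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1). The plastid table 11 assigns
# the same amino acids and differs only in alternative start codons.
# Codons are enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...; "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds_list):
    """Count codons in in-frame coding sequences (RNA U is treated as T)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing ambiguous bases
                counts[codon] += 1
    return counts


def rscu(cds_list):
    """RSCU = observed codon count / mean count of all synonymous codons."""
    counts = codon_counts(cds_list)
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def third_position_au(cds_list):
    """Fraction of codons whose third position is A or U (T)."""
    counts = codon_counts(cds_list)
    total = sum(counts.values())
    at3 = sum(n for codon, n in counts.items() if codon[2] in "AT")
    return at3 / total if total else 0.0
```

The definition also explains why AUG and UGG always have RSCU = 1: each is the only codon for its amino acid, so the observed count equals the expected count.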

Repeat sequence analysis

The total number of SSRs in the 16 Sorbus s.s. genomes ranges from 44 to 53 (Fig. 3 A–C; Table S4). SSRs composed of A or T nucleotides are the most abundant, accounting for 88.2–96.3% of all SSRs (Table S4). Mononucleotide repeats are the most common SSR type (29–38 per genome), followed by tetranucleotides (5–7) and pentanucleotides (2–5). Every sample has four dinucleotide repeats except S. tianschanica, which has five. Trinucleotide repeats were found in only three species: S. filipes Hand.-Mazz., S. hypoglauca and S. rutilans McAll. There are three hexanucleotide repeats in S. cibagouensis H.Peng & Z.J.Yin, two in S. helenae Koehne, one each in S. aestivalis, S. albopilosa T.T.Yu & L.T.Lu, S. amabilis and S. rehderiana, and none in the remaining samples. SSRs are located mainly in intergenic regions (76.6–89.4%), with far fewer in introns (10.6–21.3%) and exons (0–2.1%; Fig. 3 B). Furthermore, SSRs occur mainly in the LSC (78.4–89.4%) and are markedly less frequent in the SSC (6.4–17.6%) and IR (3.8–8%) regions (Fig. 3 C).
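The SSR search tool and its thresholds are not given in this section, so the Python sketch below is only a MISA-style illustration under stated assumptions: the minimum repeat numbers (10 for mono-, 5 for di-, 4 for tri- and 3 for tetra- to hexanucleotides) are assumed values, only perfect repeats are found, and adjacent or compound SSRs are not merged. Motifs that are themselves repeats of a shorter motif (e.g. "AA" or "ATAT") are skipped so that one run is reported only under its shortest unit.

```python
import re

# Minimum number of tandem copies per motif length (mono- to hexanucleotide).
# These MISA-style thresholds are an assumption, not taken from the paper.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit: str) -> bool:
    """True if the motif is not a repeat of a shorter motif (e.g. "ATAT" is not primitive)."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str):
    """Return (start, end, motif, copies) for perfect SSRs; coordinates are 1-based, inclusive."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-bp motif followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_copies - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # do not report a poly-A run as an "AA" dinucleotide SSR
                hits.append((m.start() + 1, m.end(), unit, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GC"
    print(find_ssrs(toy))  # [(3, 14, 'A', 12), (17, 28, 'AT', 6)]
```

The share of A/T SSRs can then be obtained by checking whether each motif contains only A and T.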

REPuter screening identified 58 to 130 dispersed repeats of 20 bp or longer in the 16 Sorbus s.s. cp genomes examined (Fig. 3 D–E). Sorbus tianschanica has the most repeats (130) and S. sargentiana the fewest (58). Most repeats (69.4–87.7%) in all cp genomes are 20–25 bp long. The longest repeat, 123 bp, occurs only in S. foliolosa Spach. In six taxa, S. albopilosa, S. cibagouensis, S. helenae, S. pteridophylla Hand.-Mazz., S. tianschanica and Sorbus sp., the longest repeat is 44 bp. Only four taxa, S. foliolosa, S. hypoglauca, S. rehderiana and S. ursina S.Schauer, have repeats longer than 60 bp (Table S5, Fig. 3 E). Among repeat types, forward repeats (25–47) are the most common, followed by palindromic (19–35), reverse (12–38) and complement repeats (1–12; Fig. 3 D).
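REPuter classifies repeat pairs as forward (direct), reverse, complement or palindromic (reverse-complement). The simplified Python sketch below, which is not REPuter, seeds exact 20-bp matches of each type and extends runs of adjacent seeds into maximal exact repeats. Mismatches, which REPuter can allow, are not handled, and the function name is hypothetical. Note that in a full plastome the two IR copies themselves form one very long palindromic repeat, so a real analysis must decide how to treat them.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# For each type: how the partner copy is derived from a k-mer, and how the partner's
# start moves when the first copy is extended by one base to the right.
REPEAT_TYPES = {
    "forward": (lambda s: s, 1),
    "reverse": (lambda s: s[::-1], -1),
    "complement": (lambda s: s.translate(_COMPLEMENT), 1),
    "palindromic": (lambda s: s.translate(_COMPLEMENT)[::-1], -1),
}


def dispersed_repeats(seq: str, k: int = 20):
    """Find exact repeats of length >= k of the four REPuter-style types.

    Returns {type: [(start1, start2, length), ...]} with 0-based starts of both copies.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    results = {}
    for name, (transform, step) in REPEAT_TYPES.items():
        seeds = set()
        for kmer, starts in index.items():
            for j in index.get(transform(kmer), []):
                for i in starts:
                    if i < j:  # record each pair of k-mer positions once
                        seeds.add((i, j))
        repeats = []
        for i, j in sorted(seeds):
            if (i - 1, j - step) in seeds:
                continue  # not the first seed of a run of adjacent seeds
            n = 1
            while (i + n, j + step * n) in seeds:
                n += 1
            start2 = j if step == 1 else j - (n - 1)
            repeats.append((i, start2, k + n - 1))
        results[name] = repeats
    return results
```

Counting `len(results[name])` per type gives a per-genome tally comparable in spirit to Fig. 3 D, under the exact-match assumption.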

Comparative analysis of chloroplast genomes 

Comparative cp genome analysis reveals that non-coding regions are generally more divergent than coding regions, and that the LSC and SSC regions are more divergent than the IR regions (Fig. 4). The highest levels of divergence were found in 17 intergenic regions: 15 in the LSC (trnH-psbA, trnK-rps16, trnG-trnR, trnR-atpA, atpF-atpH, atpH-atpI, trnC-petN, petN-psbM, trnT-psbD, psbZ-trnG, trnT-trnL, ndhC-trnV, trnM-atpE, accD-psaI and rps8-rpl14) and two in the SSC (ndhF-rpl32 and rpl32-trnL). In addition, the introns of clpP and rpl16 also show high sequence variation.

To assess divergence at the sequence level, nucleotide diversity (Pi) values were calculated. Pi ranges from 0 to 0.00975, with a mean of 0.00098 (Fig. 5, Table S6). The SSC region shows the highest nucleotide diversity (Pi = 0.00173), and the IR boundary regions the lowest (Pi = 0.00016). Six hypervariable regions with Pi between 0.005 and 0.01 were identified: trnR-atpA (Pi = 0.00975), petN-psbM (Pi = 0.00932), rpl32-trnL (Pi = 0.00753), trnH-psbA (Pi = 0.00636), trnT-trnL (Pi = 0.00642) and ndhC-trnV (Pi = 0.00616).
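Nucleotide diversity is the average proportion of differing bases over all pairs of sequences, averaged over sites. The Python sketch below computes it per alignment window and ranks regions against the 0.005 cut-off used in the text. It is an illustration, not the software used by the authors; the window length (600 bp) and step (200 bp) are assumed defaults, and skipping alignment columns that contain gaps or ambiguous bases is also an assumption.

```python
from itertools import combinations


def nucleotide_diversity(columns):
    """Nei's nucleotide diversity (Pi) from alignment columns.

    Each column is a string with one character per sequence. Columns with fewer than
    two sequences, gaps or ambiguous bases are skipped (an assumption).
    """
    total, sites = 0.0, 0
    for col in columns:
        col = col.upper()
        if len(col) < 2 or any(c not in "ACGT" for c in col):
            continue
        n_pairs = len(col) * (len(col) - 1) / 2
        diffs = sum(a != b for a, b in combinations(col, 2))
        total += diffs / n_pairs
        sites += 1
    return total / sites if sites else 0.0


def sliding_window_pi(alignment, window=600, step=200):
    """Pi in windows along an alignment; returns (start, end, Pi), 1-based inclusive."""
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    out = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        out.append((start + 1, end, nucleotide_diversity(columns[start:end])))
    return out


def rank_regions(region_pi, threshold=0.005):
    """Rank regions by Pi (highest first) and flag those at or above the threshold."""
    ranked = sorted(region_pi.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, pi, pi >= threshold) for rank, (name, pi) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Pi values reported in the text for a subset of regions.
    reported = {"trnR-atpA": 0.00975, "petN-psbM": 0.00932, "rpl32-trnL": 0.00753,
                "trnH-psbA": 0.00636, "trnT-trnL": 0.00642, "ndhC-trnV": 0.00616,
                "rps16-trnK": 0.0032}
    for row in rank_regions(reported):
        print(row)
```

Within this subset, trnH-psbA ranks fifth, and only the six hypervariable regions pass the 0.005 threshold.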

Phylogenetic analysis

The ML and BI analyses of cp genomes result in highly congruent topologies. There are only slight differences in support values among the phylogenetic trees. Therefore, only the ML topology is shown here with the ML/BI support values added at each node (Fig. 6). 

Our analyses confirm that Sorbus s.l. is polyphyletic and that the six segregate genera, i.e., Aria, Chamaemespilus, Cormus, Micromeles, Sorbus s.s. and Torminalis, are each monophyletic. Aria, Chamaemespilus and Torminalis are resolved in one branch near the base of the cp genome phylogeny, together with Malus trilobata C.K. Schneid., Aronia arbutifolia (L.) Pers. and Cydonia oblonga Mill. Micromeles is sister to Sorbus s.s., and both are nested in a branch with Cormus and Pyrus L.

Within the monophyletic genus Sorbus s.s., two major clades are resolved. Clade I comprises two fully supported subclades (A and B). Subclade A corresponds to subg. Albocarmesinae McAll. Nevertheless, three sections within subg. Albocarmesinae, sect. Hypoglaucae McAll., sect. Insignes (T.T.Yu) McAll. and sect. Multijugae (T.T.Yu) McAll., are not monophyletic. Subclade B contains two samples of S. tianschanica, which belongs to subg. Sorbus sect. Tianshanicae (Kom. ex T.T.Yu) McAll.; however, it groups with subg. Albocarmesinae with full support. Clade II contains two fully supported subclades (C and D) and is sister to the rest of the genus. Subclade C includes five taxa from three different sections: S. aucuparia of sect. Sorbus McAll. and S. hupehensis var. paucijuga of sect. Discolores (T.T.Yu) McAll. are nested within sect. Commixtae McAll. Of these, sect. Sorbus and sect. Commixtae are classified in subg. Sorbus, whereas sect. Discolores is placed in subg. Albocarmesinae. Subclade D contains two species of subg. Sorbus sect. Wilsonianae McAll.

Discussion

Gene, structure and the potential molecular markers

Sorbus s.s. can be easily identified by its pinnate leaves and colorful fruits with persistent sepals, stamens and styles. Understanding its taxonomy and phylogenetic relationships has been particularly difficult because of the widespread occurrence of polyploidy associated with gametophytic apomixis [27–29]. In the present study, 29 Sorbus s.s. cp genomes (16 newly sequenced and 13 previously reported), representing 23 species, one variety and two unidentified taxa from both subgenera and eight of the 11 sections, were compared with all available cp genomes of Sorbus s.l. to clarify phylogenetic relationships and resolve taxonomic uncertainties.

The structure, gene order and GC content are highly conserved among the Sorbus s.s. samples analyzed here and are similar to those of other angiosperm cp genomes [30–35]. The size of the 29 cp genomes varies from 159,632 bp (S. ulleungensis Chin S. Chang; NC037022) to 160,178 bp (S. hypoglauca). The Sorbus s.s. cp genomes sequenced here all contain 113 unique genes, with a total GC content of 36.5% or 36.6% (Table S1). However, the absence of one to six of the following genes, infA, psaC, psbL, rpl16, rrn4.5, rrn5, rrn16, rrn23, trnG-GCC, trnG-UCC, trnI-CAU and trnS-GGA, was reported for 22 species of Sorbus s.l. (https://www.ncbi.nlm.nih.gov/, Table S9). Some species were reported to contain different numbers of genes in different individuals; for example, S. amabilis (MT357029) and S. helenae (KY419924) were reported to contain 109 and 111 genes, respectively, whereas 113 genes were annotated for both species in the samples examined here. To eliminate the influence of annotation software and references, all 22 samples were re-annotated using the Plastid Genome Annotator (PGA) program [36] and Geneious v.9.0.2 [37] with S. insignis (NC051947) and Malus hupehensis (NC040187) as references. Unexpectedly, no gene loss was found: all 22 re-annotated sequences contain 113 genes, identical to the samples examined in this study (Table S9).

Genome composition and natural selection are the two major factors affecting codon usage bias [38, 39]. All 64 codons occur in the Sorbus s.s. cp genomes, encoding the 20 amino acids and stop signals, and codon usage is biased towards A or U at the third codon position, consistent with other higher plants [40–43].

The contraction and expansion of IR regions are useful in evolutionary studies of some taxa [44]. However, the IR/SC boundaries are conserved in Sorbus s.s. and in most species of Sorbus s.l. In all species compared in this study, the IRb/LSC boundary lies within the rps19 gene, creating a pseudogene (rps19Ψ) in IRa, and the IRa/SSC boundary lies within the ycf1 gene, creating a pseudogene (ycf1Ψ) in IRb.

SSRs are useful markers for assessing genome organization and diversity at the species and population levels [45–47] and for analyzing phylogenetic relationships in plants [48]. In this study, the number of SSRs in Sorbus s.s. genomes ranges from 44 to 53, similar to previous reports for the genus [49, 50]. Consistent with those reports for other Sorbus s.s. species, mononucleotide repeats are the most common SSRs, and most SSRs are located in intergenic regions. SSRs are especially useful for estimating genetic diversity within and between populations [51] and for investigating the parentage of polyploids in Sorbus s.s. [52]. Dispersed repetitive sequences are a major component of genomes and play a major role in genomic rearrangement and sequence variation [53, 54]. Sorbus s.s. species contain a substantial number of dispersed repeats, which differ markedly in number among species (58 to 130), with most repeats 20–25 bp long.

Despite the high level of gene conservation observed, 17 intergenic regions and two introns were identified as highly divergent in Sorbus s.s. (Figs. 4 and 5). Some of these were shown in previous studies to be highly variable and of high phylogenetic utility, such as trnK-rps16, atpH-atpI, trnT-psbD, ndhC-trnV, ndhF-rpl32 and rpl32-trnL [34, 55]. Consistent with the pattern found in most angiosperms [56–58], sequence divergence in non-coding regions is higher than in coding regions. Variable chloroplast sequences have been widely used for plant phylogeny reconstruction [58, 59]. However, among the chloroplast sequences most frequently used in phylogeny reconstruction of tribe Maleae, such as trnL-trnF, trnG-trnS and rpl20-rps12 [15, 17, 18], only one intergenic region, trnH-psbA (Pi = 0.00636; ranked 5th), and one intron, rpl16 (Pi = 0.00339; ranked 15th), are highly variable in Sorbus s.s. The only chloroplast marker applied in a phylogenetic and historical biogeographic analysis of Sorbus s.s. [22], rps16-trnK, has a Pi value of 0.0032 (ranked 16th). Furthermore, the intergenic region trnR-atpA shows the highest Pi value (0.00975) across all Sorbus s.s. genomes, and it is also hypervariable in other Rosaceae cp genomes [18, 33, 60]. Thus, the highly variable sequences identified in this study, especially the six hypervariable regions trnR-atpA, petN-psbM, rpl32-trnL, trnT-trnL, trnH-psbA and ndhC-trnV, are the most promising potential molecular markers for phylogeny reconstruction and DNA barcoding of Sorbus s.s.

Phylogenetic analysis

Chloroplast genomes are effective for inferring phylogenetic relationships at various taxonomic levels because of their conserved structure and uniparental inheritance [61–63]. In this cp genome-based phylogenetic analysis, the monophyly and infrageneric classification of Sorbus s.s. were investigated, as well as its relationships with other genera in Maleae. The status of S. hypoglauca and S. unguiculata was also re-evaluated.

In congruence with previous molecular phylogenetic [15, 17–19] and morphological studies [11, 13], the generic circumscription of Sorbus s.l. is not supported by the phylogenetic analyses in this study. Six well-supported monophyletic lineages correspond to the six genera segregated from Sorbus s.l.: Aria, Chamaemespilus, Cormus, Micromeles, Sorbus s.s. and Torminalis. However, the delimitation of the three simple-leaved genera, Aria, Chamaemespilus and Micromeles, has been controversial. Aria was usually accepted in a broad sense, including Chamaemespilus and Micromeles in previous morphological studies [11, 64, 65] or only Micromeles in molecular studies [14, 15, 24]. Our analyses indicate that the Asiatic species with a persistent calyx formerly included in Aria are nested within Micromeles, which is sister to Sorbus s.s., and are distantly related to Aria edulis, the type species of Aria. This supports treating Micromeles as an independent genus; all Asiatic simple-leaved species formerly included in Aria have been transferred to Micromeles by Mezhenska et al. [66]. In our study, Chamaemespilus is sister to Aria, and the relationship between them needs further investigation.

The systematics of Sorbus s.s. has been discussed from morphological [4, 5, 12, 20, 21] and molecular [17, 22–24] perspectives. The phylogenetic trees obtained here are congruent with those of Li et al. [22], based on four nuclear markers (LEAFY-2, GBSSI-1, SBEI and WD) and one chloroplast marker (rps16-trnK). Consistent with Li et al. [22], the two monophyletic clades resolved are largely congruent with the two subgenera, subg. Albocarmesinae and subg. Sorbus, defined by McAllister [12]. However, neither the sections defined by McAllister [12] nor the infrageneric classifications proposed by Koehne [20], Yü and Kuan [21], Gabrielian [4] and Phipps et al. [5] are supported.

Species in clade I have white to crimson flowers and either pinkish-red fruits or white, pink or crimson fruits that gradually become almost pure white, with only an occasional crimson or pink fleck, when ripe. Two monophyletic subclades, A and B, are resolved in this clade. Subclade A consists of 16 species of subg. Albocarmesinae and two unidentified samples that are morphologically similar to species of this subgenus. Two species of sect. Insignes (S. helenae and S. insignis) and two of sect. Hypoglaucae (S. hypoglauca and S. pteridophylla) are nested within sect. Multijugae; thus, none of the three sections in subclade A, sect. Hypoglaucae, sect. Insignes and sect. Multijugae, is monophyletic. McAllister [12] distinguished the three sections by the color of hairs on buds and young shoots, whether the petiole bases are sheathing, whether the carpel apices are free or fused, and ploidy level. However, the two fully supported groups in subclade A lack a consistent morphological synapomorphy. Species in subclade A are sexual diploids or apomictic tetraploids. Four taxa, S. albopilosa (2C = 2.624 ± 0.047 pg), S. unguiculata (2C = 2.783 ± 0.103 pg), S. ursina (2C = 2.681 ± 0.028 pg) and the unidentified Sorbus sp. Chen et al. 0914 (2C = 2.765 ± 0.248 pg), are tetraploids (Chen et al. unpublished), and the other 13 species are diploids [12, 67, 68]. The ploidy level of Sorbus sp. SCZ-2017 is unknown. The tetraploid species S. albopilosa and S. unguiculata cluster together and form a fully supported group with the diploid species S. helenae and S. aestivalis; S. ursina groups with the diploid S. foliolosa; and Sorbus sp. Chen et al. 0914 groups with the diploid S. koehneana. However, the origin of the tetraploid taxa and their relationships with closely related diploids need further study. Subclade B contains two samples of S. tianschanica, a sexual diploid that McAllister [12] placed in sect. Tianshanicae of subg. Sorbus. In accordance with previous work [22], S. tianschanica is sister to the sampled species of subg. Albocarmesinae, which suggests that the species may be misplaced. Sorbus tianschanica can be distinguished from all other species of subg. Sorbus by its “very glossy twigs” [12]. Furthermore, McAllister [12] noted that sect. Tianshanicae has fruits of a distinctive pinkish-red color unknown elsewhere in subg. Sorbus, which he thought might indicate a relationship with species of subg. Albocarmesinae. Our results support transferring S. tianschanica from subg. Sorbus to subg. Albocarmesinae. However, samples of other species of sect. Tianshanicae need to be sequenced to confirm its affiliation at the sectional level.

Species in clade II can be easily distinguished from those in clade I by their white flowers and orange-red to bright red fruits without any trace of white or crimson [12, 24, 69]. All species in clade II are sexual diploids. Within clade II, two subclades, C and D, are fully supported. Morphologically, species in subclade C have much smaller inflorescences and relatively larger fruits than species in subclade D. Subclade C contains five taxa formerly assigned to three sections: sect. Commixtae, sect. Sorbus and sect. Discolores. Two taxa, S. aucuparia of sect. Sorbus and S. hupehensis var. paucijuga of sect. Discolores, are nested within sect. Commixtae. Morphologically, S. aucuparia can be easily distinguished from the other two species of sect. Sorbus, S. esserteauiana Koehne and S. scalaris Koehne, by its smaller stipules, whereas the latter two have large persistent stipules. Sorbus hupehensis var. paucijuga resembles S. amabilis and S. commixta in having white flowers and small red fruits, rather than S. hupehensis C.K. Schneid., which has white fruits. Therefore, S. aucuparia and S. hupehensis var. paucijuga might be transferred to sect. Commixtae. Subclade D includes two species, S. sargentiana and S. wilsoniana, of sect. Wilsonianae in subg. Sorbus, the only section whose monophyly is supported in the present study.

Taxonomic inconsistencies in species delimitation also remain a challenge in Sorbus s.s. Sorbus hypoglauca (Cardot) Hand.-Mazz. was treated as a synonym of S. rehderiana by Yü and Lu [3] and Lu and Spongberg [6], but was reinstated as a distinct species by McAllister [12]. In the present study, S. hypoglauca is sister to S. filipes rather than to S. rehderiana. Sorbus hypoglauca differs from both S. filipes and S. rehderiana in having large persistent stipules, which supports treating it as a distinct species. Sorbus unguiculata Koehne was reduced to the synonymy of S. koehneana by McAllister [12] but treated as a distinct species by Phipps et al. [5]. In our study, S. unguiculata does not cluster with S. koehneana but is sister to S. albopilosa. Morphologically, S. unguiculata differs from S. koehneana in its much larger number of leaflets, and from S. albopilosa, which has red fruits, in its white fruits. Therefore, S. unguiculata might be treated as a distinct species.

Conclusion

Twenty-nine complete chloroplast genomes of Sorbus s.s., including 16 newly sequenced ones, representing both subgenera and eight sections were compared. Although genome structure, organization and gene content are highly conserved in the genus, differences in the number and distribution of repeat sequences and the six hypervariable regions could be used in molecular systematic, phylogeographic and population genetic studies.

Sorbus s.s. and the other five genera segregated from Sorbus s.l. (i.e., Aria, Chamaemespilus, Cormus, Micromeles and Torminalis) are strongly supported as monophyletic, whereas Sorbus s.l. is confirmed to be polyphyletic. The two subgenera of Sorbus s.s. defined by McAllister [12], subg. Sorbus and subg. Albocarmesinae, are monophyletic if S. tianschanica is transferred to subg. Albocarmesinae and S. hupehensis var. paucijuga to subg. Sorbus. Nevertheless, apart from sect. Wilsonianae, the other seven sections of Sorbus s.s. defined by McAllister [12] and sampled here are not supported. To fully resolve relationships within Sorbus s.s., more cp genomes need to be sequenced, and phylogenetic analyses combining cp genome and nrDNA data with morphological comparisons will still be necessary.

Methods

Sampling, DNA extraction and sequencing

Leaf samples representing 15 Sorbus s.s. species and one unidentified sample were collected in the field between 2015 and 2018 in the provinces of Anhui, Hubei, Sichuan, Xinjiang, Xizang and Yunnan, China. Fresh leaves were immediately dried with silica gel for subsequent DNA extraction. Voucher specimens are deposited in the Herbarium of Nanjing Forestry University (NF), and collection information is listed in Table S7.

Total DNA was extracted following the CTAB protocol [70]. DNA was quantified fluorometrically using a Qubit Fluorometer or a microplate reader, and its quality was checked by 1% agarose gel electrophoresis. The extracted genomic DNA was randomly fragmented using a Covaris instrument, and fragments of about 270 bp were selected using the AxyPrep Mag PCR Clean-up Kit. After end repair, A-tailing and adaptor ligation, the selected fragments were amplified. The processed fragments were purified and heat-denatured into single strands. The single strands were circularized, and the resulting single-stranded circular DNA served as the final library. The final library was sequenced on an Illumina HiSeq 4000 platform at BGI (Shenzhen, China) to generate raw data (Table S7). The raw sequencing data were filtered with SOAPnuke [71] using default parameters to remove adapters and low-quality reads (quality value ≤ 20), yielding high-quality reads.
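The read-level quality filtering step can be illustrated with a minimal Python sketch. It is not SOAPnuke's implementation: it assumes Phred+33 quality encoding, counts bases with a quality value ≤ 20 as low quality (as stated above), and uses a hypothetical cutoff of 50% low-quality bases per read (`MAX_LOW_QUAL_FRACTION`), which should not be read as SOAPnuke's default. Adapter removal is not modelled.

```python
"""Illustrative read filter (hypothetical; not SOAPnuke's actual algorithm)."""

PHRED_OFFSET = 33            # assumption: Phred+33 (Illumina 1.8+) encoding
LOW_QUAL_CUTOFF = 20         # bases with quality value <= 20 count as low quality
MAX_LOW_QUAL_FRACTION = 0.5  # hypothetical per-read cutoff, chosen for illustration


def phred_scores(qual: str) -> list:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def is_high_quality(qual: str) -> bool:
    """Keep a read only if its share of low-quality bases stays within the cutoff."""
    scores = phred_scores(qual)
    if not scores:
        return False
    low = sum(1 for q in scores if q <= LOW_QUAL_CUTOFF)
    return low / len(scores) <= MAX_LOW_QUAL_FRACTION


def filter_reads(records):
    """Yield (name, sequence, quality) records that pass the quality filter."""
    for name, seq, qual in records:
        if is_high_quality(qual):
            yield name, seq, qual


if __name__ == "__main__":
    reads = [
        ("r1", "ACGT", "IIII"),  # all Q40: kept
        ("r2", "ACGT", "II##"),  # half Q2: kept at the 0.5 cutoff
        ("r3", "ACGT", "I###"),  # three of four Q2: discarded
    ]
    print([name for name, _, _ in filter_reads(reads)])  # ['r1', 'r2']
```

In practice, the real filter parameters should be taken from the SOAPnuke documentation rather than from this sketch.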

Genome assembly and annotation

The high-quality reads were used for de novo assembly of the Sorbus s.s. chloroplast genomes using GetOrganelle v.1.7.5.3 [72], with the cp genome sequence of Torminalis glaberrima (NC033975) as the reference, a word size of 103 and a k-mer size of 127. Bandage [73] was used to visualize the assembly results and obtain accurate cp genome sequences. The complete cp genomes were annotated using PGA [36] with S. insignis as the reference, and the annotations were then manually verified and corrected in Geneious v.9.0.2 [37] by comparison with five sequences from the same tribe Maleae: Aria edulis M. Roem. (NC045418), Malus hupehensis (NC040170), Micromeles thibetica (MK920287), Pyrus pashia (NC034909) and Torminalis glaberrima (NC033975). The cp genome maps were drawn using Organellar Genome DRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html). The complete cp genome sequences and annotations of the 16 newly assembled Sorbus s.s. samples were submitted to the NCBI database (https://www.ncbi.nlm.nih.gov) under the accession numbers listed in Table S7. In addition, all 22 previously reported cp genomes of Sorbus s.l. (13 of them in Sorbus s.s.) were re-annotated.

Genome structure and codon usage analyses

The structure, size, gene content and GC content of the cp genomes were determined using Geneious v.9.0.2. The boundaries between the LSC, SSC, IRa and IRb regions were plotted and compared using the IRscope online tool (https://irscope.shinyapps.io/irapp/) [74]. All CDSs were extracted using Geneious v.9.0.2. Codon counts and RSCU values were calculated using CodonW v.1.4.2 (http://codonw.sourceforge.net/) with default parameters.
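RSCU is the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally, RSCU_ij = x_ij / ((1/n_i) Σ_j x_ij), so values above 1 mark preferred codons. A minimal Python sketch of this calculation under the standard genetic code follows. It is not CodonW's code; it assumes in-frame CDSs, skips codons with ambiguous bases, and keeps stop codons as their own synonymous family, choices that may differ from CodonW's defaults.

```python
"""RSCU from codon counts (a minimal sketch, not CodonW's implementation)."""
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(cds_list):
    """Count in-frame codons over all CDSs, skipping codons with ambiguous bases."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed codon count / mean count of its synonymous codon family."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so it is left out
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    values = rscu(codon_counts(["CTTCTTCTGATG"]))
    print(values["CTT"], values["CTG"], values["ATG"])  # 4.0 2.0 1.0
```

In the toy input, leucine is encoded by three codons spread over six synonyms (expected 0.5 each), so CTT (used twice) gets RSCU 4.0, while single-codon amino acids such as Met always score 1.0.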

Repeat analyses

SSRs were identified using the MISA web tool (https://webblast.ipk-gatersleben.de/misa/), with the minimum numbers of repeat units set to 12, 6, 4, 3, 3 and 3 for mono-, di-, tri-, tetra-, penta- and hexanucleotide SSRs, respectively. The online REPuter program (https://bibiserv.cebitec.uni-bielefeld.de/reputer) was used to identify and locate forward, palindromic, reverse and complement repeats, with a minimum repeat size of 20 bp, a maximum of 200 reported repeats and an E-value below 0.01.
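The SSR thresholds above can be illustrated with a simplified Python scan for perfect repeats. This is a sketch, not the MISA implementation: it reports only uninterrupted repeats on the given strand, ignores MISA's merging of nearby SSRs into compound SSRs, and skips motifs that are themselves repeats of a shorter unit so that, for example, a poly-A run is not reported again as an (AA)n dinucleotide SSR.

```python
"""MISA-style perfect SSR scan (simplified sketch, not the MISA implementation)."""

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}
NUCLEOTIDES = set("ACGT")


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, copy number) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip motifs with ambiguous bases or that reduce to a shorter motif.
            if not NUCLEOTIDES.issuperset(motif) or not is_primitive(motif):
                i += 1
                continue
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if copies >= min_rep:
                hits.append((i + 1, motif, copies))
                i = j  # continue scanning after the reported repeat
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "G" + "A" * 12 + "G" + "AT" * 6 + "G"
    print(find_ssrs(demo))  # [(2, 'A', 12), (15, 'AT', 6)]
```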

Comparative analyses of chloroplast genomes 

To identify variable regions and intrageneric variation within Sorbus s.s., the cp genomes were compared and visualized with the mVISTA online tool (https://genome.lbl.gov/vista/index.shtml) in Shuffle-LAGAN mode, using the annotated cp genome of S. insignis as the reference. The 16 Sorbus s.s. cp genome sequences were aligned with MAFFT [75], and this alignment was used to calculate nucleotide diversity (Pi) within Sorbus s.s. The sliding-window analysis was performed in DnaSP v.5 [76] with a window length of 800 bp and a step size of 200 bp.
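Nucleotide diversity in each window is the average number of pairwise differences per site. The Python sketch below computes it with the 800 bp window and 200 bp step described above. It is not DnaSP's code: it assumes that sites with a gap or ambiguous base in any sequence are excluded and that an incomplete trailing window is dropped, conventions that may differ from DnaSP's settings.

```python
"""Sliding-window nucleotide diversity (Pi), a simplified sketch of the DnaSP analysis."""
from collections import Counter
from math import comb

VALID_BASES = set("ACGT")


def site_differences(column):
    """Number of pairwise differences among the bases at one alignment site."""
    n = len(column)
    return comb(n, 2) - sum(comb(c, 2) for c in Counter(column).values())


def sliding_window_pi(alignment, window=800, step=200):
    """Return (1-based start, end, Pi) for each full window of the alignment.

    Pi = total pairwise differences / (number of sequence pairs * valid sites).
    Sites with gaps or ambiguous bases in any sequence are skipped.
    """
    if len(alignment) < 2:
        raise ValueError("at least two aligned sequences are required")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    pairs = comb(len(alignment), 2)
    results = []
    for start in range(0, length - window + 1, step):
        diffs = valid_sites = 0
        for pos in range(start, start + window):
            column = [seq[pos].upper() for seq in alignment]
            if not VALID_BASES.issuperset(column):
                continue
            valid_sites += 1
            diffs += site_differences(column)
        pi = diffs / (pairs * valid_sites) if valid_sites else float("nan")
        results.append((start + 1, start + window, pi))
    return results


if __name__ == "__main__":
    seq_a = "A" * 1000
    seq_b = "A" * 100 + "G" + "A" * 899  # one difference at position 101
    for start, end, pi in sliding_window_pi([seq_a, seq_b]):
        print(start, end, pi)
```

Counting differences per site from base frequencies avoids looping over every pair of sequences, which matters when many genomes are aligned.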

Phylogenetic analysis

The complete cp genome sequences of the 16 newly sequenced Sorbus s.s. samples, together with 45 other cp genomes of tribe Maleae, one cp genome of Amygdaleae and the outgroup Barbeya oleoides (Table S8), were aligned with MAFFT, and alignment errors were corrected manually in Geneious v.9.0.2. Phylogenetic analyses were performed using both maximum likelihood (ML) and Bayesian inference (BI) based on the 63 complete cp genomes. ML analyses were implemented in RAxML v.8.0.0 [77] under the GTR+GAMMA model. The best-scoring likelihood tree was obtained from 100 starting trees using rapid bootstrap analysis with 1000 replicates. Multiparametric bootstrapping with 1000 replicates was conducted to estimate bootstrap support for each node. BI analyses were conducted using MrBayes v.3.2.2 [78]. The best-fit nucleotide substitution model for the BI analysis was selected with Modeltest v.3.7 [79] and PAUP v.4.0 [80]. The Markov chain Monte Carlo (MCMC) analysis was run for 6,000,000 generations, with trees sampled every 1000 generations and the first 25% discarded as burn-in. The resulting ML and BI trees were rooted with Barbeya oleoides and visualized with FigTree v.1.4.3 [81].
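The MCMC settings above translate into a MrBayes command block and a simple count of retained trees, sketched below in Python. This is illustrative only: the lset/prset lines for the Modeltest-selected model are omitted because the chosen model is not stated here, and the tree count is an approximation that ignores the extra sample MrBayes stores at generation 0. MrBayes 3.2 runs two independent runs by default, so the count applies per run.

```python
"""Sketch of the MCMC settings described above (model settings deliberately omitted)."""

NGEN = 6_000_000     # MCMC generations (from the text)
SAMPLEFREQ = 1000    # sample a tree every 1000 generations (from the text)
BURNIN_FRAC = 0.25   # first 25% of samples discarded as burn-in (from the text)


def mrbayes_block(ngen=NGEN, samplefreq=SAMPLEFREQ, burninfrac=BURNIN_FRAC):
    """Return a MrBayes command block with the run-length and burn-in settings.

    The lset/prset lines for the substitution model are not included,
    because the selected model is not stated in the text.
    """
    return "\n".join([
        "begin mrbayes;",
        f"  mcmc ngen={ngen} samplefreq={samplefreq};",
        f"  sump burninfrac={burninfrac};",
        f"  sumt burninfrac={burninfrac};",
        "end;",
    ])


def retained_samples(ngen=NGEN, samplefreq=SAMPLEFREQ, burninfrac=BURNIN_FRAC):
    """Approximate trees kept per run, ignoring the extra sample at generation 0."""
    sampled = ngen // samplefreq
    discarded = int(sampled * burninfrac)
    return sampled - discarded


if __name__ == "__main__":
    print(mrbayes_block())
    print(retained_samples())  # 4500
```

With these settings, each run samples about 6000 trees, of which roughly 1500 are discarded as burn-in and about 4500 are used to build the consensus tree.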

Abbreviations

AAs: Amino acids; BI: Bayesian inference; CNS: Non-coding sequence; cp: Chloroplast; IR: Inverted repeat region; IRs: Inverted repeats; LSC: Large single-copy region; MCMC: Markov chain Monte Carlo; ML: Maximum likelihood; nrDNA: Nuclear ribosomal DNA; Pi: Nucleotide diversity; RSCU: Relative synonymous codon usage; SSC: Small single-copy region; SSRs: Simple sequence repeats.

Declarations

Ethics approval and consent to participate

No specific permissions were required for the collection of plant material in this study. The fieldwork and molecular experiments were carried out in compliance with the relevant laws of China. All specimens were identified by Xin Chen.

Consent for publication

Not applicable. 

Availability of data and materials

The 16 newly sequenced cp genomes in this study are available from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov; accession numbers ON049650–ON049657, ON049659–ON049662 and ON049664–ON049667; see Additional file 7: Table S7). Information on the other cp genomes downloaded from NCBI (https://www.ncbi.nlm.nih.gov) for phylogenetic analysis is given in Additional file 9: Table S9. Voucher specimens are deposited in the Herbarium of Nanjing Forestry University (NF), and collection information is listed in Additional file 7: Table S7.

Competing interests 

The authors declare no competing interests. 

Funding

This work was supported by the Natural Science Foundation of Jiangsu Province (grant no. BK20141472) and the Priority Academic Program Development of Jiangsu Higher Education Institutions, Jiangsu Province, China (PAPD).

Authors’ contributions 

X.C. and Y.F.D. designed the experiments; C.Q.T., L.Y.G., X.Y.W. and J.H.M. assembled the genome sequences; C.Q.T. annotated the sequences, identified sequence variants, performed the phylogenetic analyses and prepared the figures; C.Q.T., X.C. and Y.F.D. wrote the manuscript; all authors read and approved the final manuscript.

Acknowledgements

We thank Zhongren Xiong, Yun Chen, Yang Zhao, Jing Qiu, Wan Du, Mingwei Geng and Qin Wang for their help during the fieldwork, and Zhengyang Niu, Qin Wang and Tianyi Jiang for their help with DNA extraction and data analyses.

References

  1. Linnaeus C. Species Plantarum. 1753. p. 1–477. https://www.biodiversitylibrary.org/page/358496.
  2. Hedlund T. Monographie der Gattung Sorbus. Kongliga Svenska Vetenskaps Akademiens Handlingar 35; 1901. p. 1–147.
  3. Yü TT, Lu LT. Spiraea, Dichotomanthes, Cotoneaster, Sorbus, Chaenomeles. In: Yü TT, editor. Flora Reipublicae Popularis Sinicae, vol. 36. Beijing: Science Press; 1974. p. 1–443. http://www.iplant.cn/info/Sorbus?t=z. (In Chinese)
  4. Gabrielian E. The genus Sorbus L. in Western Asia and the Himalayas. Yerevan: Armenian Academy of Sciences; 1978. p. 1–264.
  5. Phipps JB, Robertson KR, Smith PG, Rohrer JR. A checklist of the subfamily Maloideae (Rosaceae). Can J Bot. 1990;68:2209–2269. doi:10.1139/b90-288.
  6. Lu LT, Spongberg SA. Sorbus L. In: Wu ZY, Raven PH, Hong DY, editors. Flora of China. vol. 9. Beijing: Science Press; St. Louis: Missouri Botanical Garden Press; 2003. p. 144–170. http://foc.eflora.cn/content.aspx?TaxonId=130718.
  7. Roemer MJ. Familiarum naturalium regni vegetabilis synopses monographicae Ⅲ. Rosiflorae. Amygdalacearum et Pomacearum. Weimar: Landes-Industrie-Comptoir; 1847.
  8. Decaisne J. Mémoire sur la famille des Pomacées. Nouv Arch Mus Hist Nat. 1874;10:113–192.
  9. Koehne E. Die Gattungen der Pomaceen. Wissenschaftliche Beilage zum Programm des Falk-Realgymnasiums zu Berlin. Berlin: Verlagsbuchhandlung Hermann Heyfelder; 1890.
  10. Koehne E. Die Gattungen der Pomaceen. Gartenflora. 1891;40:4–7, 35–38, 59–61.
  11. Robertson KR, Phipps JB, Rohrer JR, Smith PG. A synopsis of genera in Maloideae (Rosaceae). Syst Bot. 1991;16(2):376–394. doi:10.2307/2419287.
  12. McAllister H. The genus Sorbus: Mountain Ash and other Rowans. Kew: Royal Botanic Gardens; 2005. p. 1–252.
  13. Phipps JB, Robertson KR, Rohrer JR, Smith PG. Origins and Evolution of Subfam. Maloideae (Rosaceae). Syst Bot. 1991;16(2):303–332. http://www.jstor.org/stable/2419283.
  14. Campbell CS, Donoghue MJ, Baldwin BG, Wojciechowski MF. Phylogenetic relationships in Maloideae (Rosaceae): evidence from sequences of the internal transcribed spacers of nuclear ribosomal DNA and its congruence with morphology. Amer J Bot. 1995;82(7):903–918. doi:10.2307/2445977.
  15. Campbell CS, Evans RC, Morgan DR, Dickinson TA, Arsenault MP. Phylogeny of subtribe Pyrinae (formerly the Maloideae, Rosaceae): limited resolution of a complex evolutionary history. Pl Syst Evol. 2007;266(1–2):119–145. doi:10.1007/s00606-007-0545-y.
  16. Potter D, Eriksson T, Evans RC, Oh S, Smedmark JEE, Morgan DR, et al. Phylogeny and classification of Rosaceae. Pl Syst Evol. 2007;266(1–2):5–43. doi:10.1007/s00606-007-0539-9.
  17. Lo EYY, Donoghue MJ. Expanded phylogenetic and dating analyses of the apples and their relatives (Pyreae, Rosaceae). Molec Phylogen Evol. 2012;63(2):230–243. doi:10.1016/j.ympev.2011.10.005.
  18. Sun JH, Shi S, Li JL, Yu J, Wang L, Yang XY, et al. Phylogeny of Maleae (Rosaceae) based on multiple chloroplast regions: implications to genera circumscription. BioMed Res Int. 2018;2018:1–10. doi:10.1155/2018/7627191.
  19. Ulaszewski B, Jankowska-Wróblewska S, Swiło K, Burczyk J. Phylogeny of Maleae (Rosaceae) based on complete chloroplast genomes supports the distinction of Aria, Chamaemespilus and Torminalis as separate genera, different from Sorbus sp. Plants. 2021;10(11):2534. doi:10.3390/plants10112534.
  20. Koehne E. Plantae Wilsonianae: an enumeration of the woody plants collected in western China for the Arnold Arboretum of Harvard University during the years 1907, 1908, and 1910. Cambridge: Cambridge University Press; 1913. p. 1–661. http://www.biodiversitylibrary.org/bibliography/191.
  21. Yü TT, Kuan KJ. Taxa nova Rosacearum Sinicarum (Ⅰ). Acta Phytotax Sin. 1963;8(3):202–234. https://www.plantsystematics.com/CN/abstract/abstract1327.shtml. (In Chinese)
  22. Li M, Ohi-Toma T, Gao YD, Xu B, Zhu ZM, Ju WB, et al. Molecular phylogenetics and historical biogeography of Sorbus sensu stricto (Rosaceae). Molec Phylogen Evol. 2017;111:76–86. doi:10.1016/j.ympev.2017.03.018.
  23. Wang GX, Zhang ML. A molecular phylogeny of Sorbus (Rosaceae) based on ITS sequence. Acta Hort Sin. 2011;38(12):2387–2394. doi:10.16420/j.issn.0513-353x.2011.12.001. (In Chinese)
  24. Li QY, Guo W, Liao WB, Macklin JA, Li JH. Generic limits of Pyrinae: Insights from nuclear ribosomal DNA sequences. Bot Stud. 2012;53:151–164.
  25. Smith DR, Keeling PJ. Mitochondrial and plastid genome architecture: reoccurring themes, but significant differences at the extremes. PNAS. 2015;112:10177–10184. doi:10.1073/pnas.1422049112.
  26. Wicke S, Schneeweiss GM, dePamphilis CW, Müller KF, Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol Biol. 2011;76:273–297. doi:10.1007/s11103-011-9762-4.
  27. Campbell CS, Dickinson TA. Apomixis, patterns of morphological variation, and species concept in subfam. Maloideae (Rosaceae). Syst Bot. 1990;15(1):124–135. http://www.jstor.org/stable/2419022.
  28. Ludwig S, Robertson A, Rich TCG, Djordjević M, Cerović R, Houston L, et al. Breeding systems, hybridization and continuing evolution in Avon Gorge Sorbus. Ann Bot. 2013;111(4):563–575. doi:10.1093/aob/mct013.
  29. Robertson A, Rich TCG, Allen AM, Houston L, Roberts C, Bridle JR, et al. Hybridization and polyploidy as drivers of continuing evolution and speciation in Sorbus. Molec Ecol. 2010;19:1675–1690. doi:10.1111/j.1365-294X.2010.04585.x.
  30. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815. doi:10.1038/35048692.
  31. Xing SC, Liu CJH. Progress in chloroplast genome analysis. Prog Biochem Biophys. 2008;35(1):21–28.
  32. Jeon JH, Kim SC. Comparative analysis of the complete chloroplast genome sequences of three closely related East-Asian wild Roses (Rosa sect. Synstylae; Rosaceae). Genes. 2019;10(1):23. doi:10.3390/genes10010023.
  33. Sun JH, Wang YH, Liu YL, Xu C, Yuan QJ, Guo LP, et al. Evolutionary and phylogenetic aspects of the chloroplast genome of Chaenomeles species. Sci Rep. 2020;10:11466. doi:10.1038/s41598-020-67943-1.
  34. Cho MS, Kim JH, Yamada T, Maki M, Kim SC. Plastome characterization and comparative analyses of wild crabapples (Malus baccata and M. toringo): insights into infraspecific plastome variation and phylogenetic relationships. Tree Genet Genomes. 2021;17:41. doi:10.1007/s11295-021-01520-z.
  35. Yan JW, Li JH, Yu L, Bai WF, Nie DL, Xiong Y, Wu SZ. Comparative chloroplast genomes of Prunus subgenus Cerasus (Rosaceae): insights into sequence variations and phylogenetic relationships. Tree Genet Genomes. 2021;17:50. doi:10.1007/s11295-021-01533-8.
  36. Qu X, Moore M, Li D, Yi T. PGA: a software package for rapid, accurate and flexible batch annotation of plastomes. Plant Methods. 2019;15(1):1–12. doi:10.1186/s13007-019-0435-7.
  37. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–1649. doi:10.1093/bioinformatics/bts199.
  38. Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985;2(1):13–34. doi:10.1093/oxfordjournals.molbev.a040335.
  39. Bernardi G, Bernardi G. Compositional constraints and genome evolution. J Mol Evol. 1986;24:1–11. doi:10.1007/BF02099946.
  40. Sablok G, Nayak KC, Vazquez F, Tatarinova TV. Synonymous codon usage, GC3, and evolutionary patterns across plastomes of three pooid model species: emerging grass genome models for monocots. Mol Biotechnol. 2011;49(2):116–128. doi:10.1007/s12033-011-9383-9.
  41. Lee SR, Kim K, Lee BY, Lim CE. Complete chloroplast genomes of all six Hosta species occurring in Korea: molecular structures, comparative, and phylogenetic analyses. BMC Genomics. 2019;20:833. doi:10.1186/s12864-019-6215-y.
  42. Ren T, Li ZX, Xie DF, Gui LJ, Peng C, Wen J, et al. Plastomes of eight Ligusticum species: characterization, genome evolution, and phylogenetic relationships. BMC Plant Biol. 2020;20:519. doi:10.1186/s12870-020-02696-7.
  43. Chi XF, Zhang FQ, Dong Q, Chen SL. Insights into comparative genomics, codon usage bias, and phylogenetic relationship of species from Biebersteiniaceae and Nitrariaceae based on complete chloroplast genomes. Plants. 2020;9:1605. doi:10.3390/plants9111605.
  44. Zhu AD, Guo WH, Gupta S, Fan WS, Mower JP. Evolutionary dynamics of the plastid inverted repeat: the effects of expansion, contraction, and loss on substitution rates. New Phytol. 2016;209:1747–1756. doi:10.1111/nph.13743.
  45. Yamamoto T. DNA markers and molecular breeding in pear and other Rosaceae fruit trees. Hort J. 2021;90(1):1–13. doi:10.2503/hortj.UTD-R014.
  46. Eken BU, Kirdok E, Velioglu E, Ciftci YO. Assessment of genetic variation of natural populations of wild cherry (Prunus avium L.) via SSR markers. Turk J Bot. 2022;46(1):14–25. doi:10.3906/bot-2111-16.
  47. Khan G, Zhang FQ, Gao QB, Fu PC, Zhang Y, Chen SL. Spiroides shrubs on Qinghai-Tibetan Plateau: multilocus phylogeography and palaeodistributional reconstruction of Spiraea alpina and S. mongolica (Rosaceae). Mol Phylogenet Evol. 2018;123:137–148. doi:10.1016/j.ympev.2018.02.009.
  48. Olmstead RG, Palmer JD. Chloroplast DNA systematics: a review of methods and data analysis. Amer J Bot. 1994;81(9):1205–1224. doi:10.2307/2445483.
  49. Wang Q, Niu Z, Li JB, Zhu KL, Chen X. The complete chloroplast genome sequence of the Chinese endemic species Sorbus setschwanensis (Rosaceae) and its phylogenetic analysis. Nordic J Bot. 2020;38(2):e02532. doi:10.1111/njb.02532.
  50. Tang CQ, Qiu ZX, Tan C, Qian YM, Chen X. Sorbus koehneana (Rosaceae): its complete chloroplast genome and phylogenetic relationship with S. unguiculata. Acta Hort Sin. 2022;49(3):641–654. doi:10.16420/j.issn.0513-353x.2021-0040. (In Chinese)
  51. Raspé O, Saumitou-Laprade P, Cuguen J, Jacquemart AL. Chloroplast DNA haplotype variation and population differentiation in Sorbus aucuparia L. (Rosaceae: Maloideae). Molec Ecol. 2000;9(8):1113–1122. doi:10.1046/j.1365-294x.2000.00977.x.
  52. Chester M, Cowan RS, Fay MF, Rich TCG. Parentage of endemic Sorbus L. (Rosaceae) species in the British Isles: evidence from plastid DNA. Bot J Linn Soc. 2007;154(3):291–304. doi:10.1111/j.1095-8339.2007.00669.x.
  53. Borsch T, Quandt D. Mutational dynamics and phylogenetic utility of noncoding chloroplast DNA. Plant Syst Evol. 2009;282:169–199. doi:10.1007/s00606-009-0210-8. 
  54. Nie XJ, Lv SZ, Zhang YX, Du XH, Wang L, Biradar SS, et al. Complete chloroplast genome sequence of a major invasive species, Crofton Weed (Ageratina adenophora). PLoS ONE. 2012;7(5):e36869. doi:10.1371/journal.pone.0036869.
  55. Shaw J, Lickey EB, Schilling EE, Small RL. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Amer J Bot. 2007;94(3):275–288. doi:10.3732/ajb.94.3.275. 
  56. Huang H, Shi C, Liu Y, Mao SY, Gao LZ. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol Biol. 2014;14:151. doi:10.1186/1471-2148-14-151.
  57. Barrett CF, Baker WJ, Comer JR, Conran JG, Lahmeyer SC, Leebens-Mack JH, et al. Plastid genomes reveal support for deep phylogenetic relationships and extensive rate variation among palms and other commelinid monocots. New Phytol. 2015;209:855–870. doi:10.1111/nph.13617. 
  58. Kartonegoro A, Veranso‐Libalah MC, Kadereit G, Frenger A, Penneys DS, Mota de Oliveira S, et al. Molecular phylogenetics of the Dissochaeta alliance (Melastomataceae): Redefining tribe Dissochaeteae. Taxon. 2021;70(4):793–825. doi:10.1002/tax.12508.
  59. Mapaya RJ, Cron GV. A phylogeny of Emilia (Senecioneae, Asteraceae) – implications for generic and sectional circumscription. Taxon. 2020;70(1):127–138. doi:10.1002/tax.12417.
  60. Korotkova N, Nauheimer L, Ter-Voskanyan H, Allgaier M, Borsch T. Variability among the most rapidly evolving plastid genomic regions is lineage-specific: implications of pairwise genome comparisons in Pyrus (Rosaceae) and other angiosperms for marker choice. PLoS ONE. 2014;9(11):e112998. doi:10.1371/journal.pone.0112998.
  61. Rokas A, Holland PWH. Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol. 2000;15(11):454–459. doi:10.1016/s0169-5347(00)01967-4.
  62. Li X, Yang Y, Henry RJ, Rossetto M, Wang Y, Chen S. Plant DNA barcoding: from gene to genome. Biol Rev. 2014;90(1):157–166. doi:10.1111/brv.12104.
  63. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016;17:134. doi:10.1186/s13059-016-1004-2. 
  64. Aldasoro JJ, Aedo C, Navarro C, Garmendia FM. The genus Sorbus (Maloideae, Rosaceae) in Europe and in North Africa: morphological analysis and systematics. Syst Bot. 1998;23(2):189–212. doi:10.2307/2419588.
  65. Robertson KR, Phipps JB, Rohrer JR. Summary of leaves in the genera of Maloideae (Rosaceae). Ann Missouri Bot Gard. 1992;79(1):81–94. http://www.jstor.org/stable/2399811.
  66. Mezhenska LO, Mezhenskyj VM, Yakubenko BY. NULESU collections of fruit and ornamental plants [КОЛЕКЦІЯ НУБІП УКРАЇНИ ПЛОДОВИХ І ДЕКОРАТИВНИХ РОСЛИН]. Kiev: Lira-K; 2018. p. 1–107.
  67. Xi LL, Li JB, Zhu KL, Qi Q, Chen X. Variation in genome size and stomatal traits among three Sorbus species. Pl Sci J. 2020;38(1):32–38. doi:10.11913/PSJ.2095-0837.2020.10032. (In Chinese)
  68. Li JB, Zhu KL, Wang Q, Chen X. Genome size variation and karyotype diversity in eight taxa of Sorbus sensu stricto (Rosaceae) from China. Comp Cytogenet. 2021;15(2):137–148. doi:10.3897/compcytogen.v15.i2.58278.
  69. Chang CS, Gil HY. Sorbus ulleungensis, a New Endemic Species on Ulleung Island, Korea. Harvard Pap Bot. 2014;19(2):247–255. doi:10.3100/hpib.v19iss2.2014.n11.
  70. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull. 1987;19:11–15.
  71. Chen YX, Chen YS, Shi CM, Huang ZB, Zhang Y, Li S, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018;7(1):1–6. doi:10.1093/gigascience/gix120.
  72. Jin JJ, Yu WB, Yang JB, Song Y, dePamphilis CW, Yi TS, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020;21(1):241. doi:10.1186/s13059-020-02154-5.
  73. Wick RR, Schultz MB, Zobel J, Holt KE. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 2015;31(20):3350–3352. doi:10.1093/bioinformatics/btv383.
  74. Amiryousefi A, Hyvönen J, Poczai P. IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 2018;34(17):3030–3031. doi:10.1093/bioinformatics/bty220.
  75. Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molec Biol Evol. 2013;30(4):772–780. doi:10.1093/molbev/mst010.
  76. Librado P, Rozas J. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25(11):1451–1452. doi:10.1093/bioinformatics/btp187.
  77. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. doi:10.1093/bioinformatics/btu033.
  78. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539–542. doi:10.1093/sysbio/sys029.
  79. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14(9):817–818. doi:10.1093/bioinformatics/14.9.817.
  80. Matthews LJ, Rosenberger AL. Taxon combinations, parsimony analysis (PAUP*) and the taxonomy of the yellow-tailed woolly monkey, Lagothrix flavicauda. Am J Phys Anthropol. 2008;137:245–255. doi:10.1002/ajpa.20859.
  81. Rambaut A. FigTree, a Graphical Viewer of Phylogenetic Trees. Edinburgh: Institute of Evolutionary Biology University of Edinburgh; 2007. http://tree.bio.ed.ac.uk/software/figtree.