DOI: https://doi.org/10.21203/rs.3.rs-1471846/v1
Background: Saussurea is one of the most species-rich genera in the Cardueae, Asteraceae. There are approximately 40 Saussurea species distributed in Korea, with nearly 40% of them endemics. Infrageneric relationships remain uncertain due to insufficient resolutions and low statistical support. In this study, we sequenced the plastid genomes of five Korean endemic Saussurea (S. albifolia, S. calcicola, S. diamantica, S. grandicapitula, and S. seoulensis), and comparative analyses including two other endemics (S. chabyoungsanica and S. polylepis) were conducted.
Results: The plastomes of Korean endemics were highly conserved in gene content, order, and numbers. Exceptionally, S. diamantica had mitochondrial DNA sequences including two tRNAs in SSC region. There were no significant differences of the type and numbers of SSRs among the seven Korean endemics except in S. seoulensis. Nine mutation hotspots with high nucleotide diversity value (Pi > 0.0033) were identified, and phylogenetic analysis suggested that those Korean endemic species most likely evolved several times from diverse lineages within the genus. Moreover, molecular dating estimated that the Korean endemic species diverged since the late Miocene.
Conclusions: This study provides insight into understanding the plastome evolution and evolutionary relationships of highly complex species of Saussurea in Korean peninsula.
Saussurea DC. (ca. 400 species) is one of the most abundant genera of the tribe Cardueae (Asteraceae) and is adapted to cool temperate and arctic regions of Asia, Europe, and North America [1, 2]. Lipschitz (1979) classified Saussurea into six subgenera according to morphological characteristics: Amphilaena (Stschegl.) Lipsch., Eriocoryne (DC.) Hook. f., Frolovia (DC.) Lipsch, Jurinocera (Baill.) Lipsch., Saussurea DC., and Theodorea (Cass.) Lipsch. However, several phylogenetic studies based on morphological traits and molecular markers have provided evidence to designate subg. Jurinocera, subg. Frolovia, subg. Saussurea sect. Elatae, subg. Saussurea sect. Aucklandia, and subg. Saussurea sect Jacea as new genera [3–6]. Subsequently, Saussurea has recently been recognized as four subgenera (Amphilaena, Eriocoryne, Saussurea, and Theodorea) [6]. Although several phylogenetic studies have been conducted using nuclear and plastid loci as molecular markers [3–5, 7], relationships within Saussurea were poorly resolved due to rapid adaptive radiation and convergent evolution [5, 7, 8].
Recently, the advent of next generation sequencing technologies has led to rapidly accumulation of genomic data. Because of conserved structure, non-recombinant traits, and greater variability than the mitochondrion genome, plastid genome (plastome) regions have consistently been used as a robust tool in phylogenetic studies [9–11]. Furthermore, studies using complete plastomes have offered new insights into phylogenetic relationships and the diversification histories of species [12–14]. Although studies on phylogenetic relationships, origins, and evolution using plastomes of Saussurea have been reported [15, 16], the limited number of Korean species was used in the previous studies, and there is a need for a better understanding the relationship among Korean species.
Saussurea is one of the rich species groups in the flora of Korea. The approximately 40 Saussurea species distributed in Korea comprise 2 subgenera, Theodorea and Saussurea, and nearly 40% are endemic species belonging to subg. Saussurea: S. calcicola Nakai, S. chabyoungsanica Im, S. chinnampoensis H. Lév. & Vaniot, S. conandrifolia Nakai, S. diamantica Nakai, S. eriophylla Nakai, S. grandicapitula W. Lee et H. T. Im, S. koidzumiana Kitam., S. macrolepis (Nakai) Kitam., S. myokoensis Kitam., S. polylepis Nakai, S. rorinsanensis Nakai, S. seoulensis Nakai, and S. uchiyamana Nakai [17]. Obtaining a rigorous phylogenetic framework for Saussurea species in Korea has been exceptionally challenging due to sampling difficulties and insufficient levels of resolutions and degree of statistical support. In particular, the phylogenetic tree based on chloroplast (cp) DNA markers (e.g., trnL–trnF and trnH–psbA) commonly used as barcodes has not yet been clearly resolved, showing multiple polytomies (Yun and Kim, unpublished data).
In this study, plastomes of five Korean endemic species, S. albifolia, S. calcicola, S. diamantica, S. grandicapitula, and S. seoulensis, were sequenced and comparative analyses were conducted, including two previously reported species S. chabyoungsanica [18] and S. polylepis [19]. Saussurea albifolia, which is described recently as a new species, has cordate or deltoid-cordate leaves with white or yellowish hair on abaxial surface. In addition, the campanulate involucre has brown-cobwebby hair and the tips of phyllaries do not recurve [20]. Saussurea calcicola has large leaves with cobwebby hair on the underside, and wings on the petiole. It grows to approximately a height of 1 m in limestone regions. Saussurea grandicapitula has cobwebby hairs on the petioles of the radical and lower cauline leaves, big globose involucres with brown-cobwebby hairs, and recurved phyllaries [21]. Saussurea albifolia, S. diamantica, and S. seoulensis share common traits including basal rosette leaves, cobwebby hair on abaxial leaf surfaces, and white or yellowish hairs on the involucre. However, S. diamantica has recurved involucres, and S. albifolia has larger involucral width than S. diamantica and yellowish cobwebby hair on the abaxial surface of leaf. Saussurea seoulensis has a distinctive bell-shaped involucre, the largest such structure among the species [17]. The most notable differences between S. chabyoungsanica and other Korean Saussurea species are long lanceolate leaves with short petioles and a compact corymb [22]. Saussurea polylepis is distinguishable by its glossy and reniform leaves. Based on several diagnostic morphlogical features of Saussurea, congeneric species are distinguished, but variable molecular markers through comparison of plastomes of endemic species can overcome low resolution shown in the previous phylogenetic study (Yun and Kim, unpublished data). In addition, identifying the structure and characteristics of plastomes of the Korean endemic Saussurea species will provide insights into understanding the plastome evolution of Saussurea.
The aims of this study were (1) to determine five complete plastomes of Korean endemic species, (2) to identify divergent sequence hotspots for development of informative cpDNA markers, (3) to gain insight into the evolution of Saussurea plastomes, including structural differences and molecular evolutionary patterns, and (4) to reconstruct phylogenetic relationships among the Korean endemic Saussurea species and estimate their divergence times using plastomes.
Newly sequenced plastomes of the five species had a total length of 152,435 (S. albifolia) – 173,114 bp (S. diamantica) (Fig. 1 and Table 1). Because some parts of mitochondrial DNA sequences including two tRNAs were inserted in the small single copy (SSC) region (ndhF-pseudo ycf1) of plastome of S. diamantica, the total length of S. diamantica was longer than that of other Saussurea species by 20,550 bp (Fig. 1b and Table 1). BLAST searches were performed to determine the characteristics of the insertion. The result demonstrated that the inserted sequences were highly matched to Chrysanthemum, Diplostephium, Lactuca, Helianthus, and Paraprenanthes mitochondrial DNA sequences, ranging from 5,159 bp (Paraprenanthes diversifolia, MN661146) to 7,090 bp (Diplostephium hartwegii, KX063855), but this does not mean the continuous consistency of the whole 20,550 bp on the mitochondrial genome. The transfer of mitochondrial DNA sequences to plastoms has been reported in the families Apiaceae, Apocynaceae, and Poaceae [23–25]. The previous studies indicated that genes of less than 3 kb of mitochondrial DNA are inserted into the IR or LSC regions. Given that a plastome is highly conserved, the large insertion of mitochondrial DNA sequences is an unusual event. Thus, confirmation is needed that the insertion was not merely a product of assembling error. By comparing sequencing depth of before and after the insertion, Ma et al. [25] confirmed that it is not a product of misassembly. Because the plastome occupies the smallest portion of the genomic DNA, it can be easily distinguished from mitochondrial and nuclear DNA sequences by sequencing depth. Ma et al. [25] also inferred that the nuclear and mitochondrial genomes are larger than the plastome and would have a lower sequencing depth. The sequencing coverage of S. diamantica was evaluated instead of the sequencing depth suggested by Ma et al. [25]. The average sequencing coverage was 322.3 and that of the inserted regions was 356.8. The average sequencing coverage of the surrounding regions, which was 349.1 and 346.9 before and after the insertion with 200 bp, is similar to that of the inserted region. These results indicated that misassembly is not a cause of mitochondrial DNA insertion into the plastome of S. diamantica. In addition, PCR amplification and Sanger sequencing were conducted to confirm the insertion using designed two primer sets (SD1f: GTAGGGGGTGGGCGTATTTC, SD1r: GATGTCGAGTGCCGCTTTTC and SD2f: AGGGTGATGCTTGGCTTCT, SD2r: TTTTCGTGGTTAGAGCGGCT), and amplifications were successful but not for other species (data not shown). It also supported that there was no error in the assembly process.
Seven Saussurea species have a typical quadripartite structure that comprised a pair of inverted repeats (IR: IRA and IRB) of 25,185–25,893 bp, which were separated by SSC of 18,671–37,846 bp and large single copy (LSC) of 83,387–83,482 bp. The largest insertion (62 bp) was identified in the trnE–rpoB intergenic region of LSC of S. grandicapitula, S. polylepis, and S. seoulensis. Other than S. diamantica, length of IR, SSC, and LSC of each species was similar. The overall GC content was 37.7% and the GC content in the IR, SSC, and LSC was respectively 43.1%, 31.4%, and 35.8%, except for S. diamantica (overall GC: 38.6%, IR: 42.8%, SSC: 39.1%, LSC: 35.8%) (Table 1).
Seven Saussurea plastomes possessed the two inversions in LSC region like other Asteraceae. The seven plastomes depicted high similarity at the LSC, IR, and SSC boundaries (Fig. S1). Rps19 was across the LSC–IRB boundary without any change in sequence length, and trnH–GUG was located three base pairs away from the LSC–IRA boundary in all species. The ycf1 gene crossed SSC–IRB, with 4,022–4,740 bp within the SSC region and 561–1,240 bp within the IRB region. Ycf1 was a pseudogene, located at the SSC–IRA boundary, with 6–752 bp within the SSC region and with 561–1,240 bp within the IRA. In particular, S. diamantica had longer ycf1 (pseudogene) than others.
The seven plastomes contained 114 identical genes, including 80 protein-coding genes, 30 tRNA, and four rRNA genes. Eighteen genes including 12 protein–coding genes (atpF, clpP, ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps12, rps16, and ycf3) and 6 tRNA genes (trnA–UGC, trnG–UCC, trnI–GAU, trnK–UUU, trnL–UAA, and trnV–UAC) had intron, and 17 genes (ndhB, rrn4.5, rrn5, rrn16, rrn23, trnA–UGC, trnI–CAU, trnI–GAU, trnL–CAA, trnN–GUU, trnR–ACG, trnV–GAC, rpl2, rpl23, rps7, ycf2, and ycf15) were duplicated in IR regions (Table S1). However, S. diamantica had two additional tRNA genes (trnC–GCA and trnM–CAU) from the mitochondrial genome in the SSC region. The formation of tertiary structure of two tRNAs was confirmed by simulation through the tRNAscan-SE 2.0.
S. albifolia |
S. calcicola |
S. chabyoungsanica |
S. diamantica |
S. grandicapitula |
S. polylepis |
S. seoulensis |
|
---|---|---|---|---|---|---|---|
Reference |
This study |
This study |
Cheon et al. [18] |
This study |
This study |
Yun et al. [19] |
This study |
Total length (bp) |
152,435 |
152,453 |
152,446 |
173,114 |
152,484 |
152,488 |
152,578 |
LSC length (bp) |
83,387 |
83,399 |
83,397 |
83,482 |
83,431 |
83,417 |
83,441 |
SSC length (bp) |
18,678 |
18,672 |
18,679 |
37,846 |
18,671 |
18,689 |
18,727 |
IRa length (bp) |
25,185 |
25,191 |
25,185 |
25,893 |
25,191 |
25,191 |
25,205 |
Total GC content (%) |
37.7 |
37.7 |
37.7 |
38.6 |
37.7 |
37.7 |
37.7 |
GC content of LSC |
35.8 |
35.8 |
35.8 |
35.8 |
35.8 |
35.8 |
35.8 |
GC content of SSC |
31.4 |
31.4 |
31.4 |
39.1 |
31.4 |
31.4 |
31.4 |
GC content of IR |
43.1 |
43.1 |
43.1 |
42.8 |
43.1 |
43.1 |
43.1 |
Protein-coding gene number |
80 |
80 |
80 |
80 |
80 |
80 |
80 |
rRNA gene number |
4 |
4 |
4 |
4 |
4 |
4 |
4 |
tRNA gene number |
30 |
30 |
30 |
32* |
30 |
30 |
30 |
Total gene number |
114 |
114 |
114 |
116 |
114 |
114 |
114 |
SNP patterns that can be divided into 2 groups were found in 42 regions (Table S2). Of them, 34 regions were in LSC, followed by SSC and IR. Based on S. involucrata as a reference, there were no large differences among the seven Korean endemics. The LSC and SSC regions were more divergent than IR regions, and the coding regions were more conserved than the non-coding regions (Fig. S2). However, coding regions such as rbcL in the LSC region, ycf1 in the SSC region, and ycf2 in the IR regions showed variability.
The nucleotide diversity (Pi) ranged from 0 to 0.0053 (Fig. 2). The IR region had a relatively low nucleotide diversity value, ranging from 0 to 0.00286. We detected nine divergence hotspots with Pi values over 0.0033. Among them, seven were located in the LSC region, and two were located in the SSC region. Other than ycf1, the variable regions were concentrated in intergenic spaces. The hotspot with the highest Pi was ycf4–cemA (0.0051), followed by seven intergenic regions (psbC–trnS, rbcL–accD–psaI, rpl32–ndhF, trnT–trnD, psbE–petL, rps4–trnT–trnL, and rpl16–trnQ–psbK) and one gene region (ycf1).
Five categories (mononucleotide, dinucleotide, trinucleotide, pentanucleotide, and hexanucleotide) of SSRs were detected, and the types and numbers of SSRs were similar across the seven Saussurea (Fig. 3). The total number of SSRs was 72 in S. albifolia, 75 in S. calcicola, 74 in S. chabyoungsanica, 76 in S. diamantica, 77 in S. grandicapitula, 78 in S. polylepis, and 82 in S. seoulensis. The detected SSRs were mainly located in the LSC region (67.1–77%) and distributed in the IR and SSC regions ranging from 9.8–11.1% and from 12.2–22.4%, respectively. Twenty-three of the SSRs detected from the seven Saussurea were located in 15 genes (cemA, ndhB, petA, psaA, psbC, rbcL, rpoA, rpoB, rpoC1, rpoC2, rps15, rrn23, trnS–UGA, ycf1, and ycf2) with 3–10 repeat numbers (Table S3). The most abundant type was mononucleotides A/T and species–specific SSR was identified from S. seoulensis as hexanucleotides TACAAA/TTTGTA.
The non-synonymous (Ka) to synonymous (Ks) substitution rate ratio (Ka/Ks) has been used to determine whether protein-coding genes are subjected to selective pressure. If Ka/Ks is greater than 1, it could indicate that it is under positive selection [26]. We calculated synonymous and nonsynonymous substitution rates between S. involucrata and Korean endemic Saussurea (Fig. S3). Approximately 90% of protein coding genes were below 1 in Ka/Ks values. In seven Korean species, the Ka/Ks value was close to zero at 12 protein coding genes (clpP, ndhB, ndhH, petD, psaC, psbA, psbB, psbC, psbD, rpoB, rps11, and rps15) while that of five protein coding genes (ndhI, psaJ, psbL, rpl33, and ycf2) were 50, indicating positive selection influenced the differentiation of Saussurea.
We detected similar patterns in the frequency of codon usage of seven Korean endemics. The 80 annotated protein-coding genes are encoded by 22,739 codons in S. albifolia, 22,831 in S. calcicola and S. polylepis, 22,826 in S. chabyoungsanica, 22,821 in S. diamantica, and 22,835 in S. grandicapitula, and 22,834 in S. seoulensis (Table S4). Leucine was the most abundant amino acid (10.6%), whereas cysteine was the least (1.1%). The most used synonymous codon was ATT, encoding isoleucine, and the least used was TGC, encoding cysteine. Usage of the start codon methionine (ATG) and tryptophan (TGG) had no biases (relative synonymous codon usage, RSCU = 1). All preferred relative synonymous codons (RSCU > 1) ended with an A or a T, other than TTG (leucine) (Fig. 4). The tendency for codon preference was similar among species. Of 61 codons (except for stop codon), 14 (Ala–GCT, Arg–AGA, Asn–AAT, Asp–GAT, Gln–CAA, Gly–GGA, His–CAT, Leu–TTA, Lys–AAA, Pro–CCT, Ser–TCT, Thr–ACT, Tyr–TAT, and Val–GTA) were highly preferred (RSCU > 1.5).
In this study, 32 plastomes were used to determine the phylogenetic relationships among Korean endemic Saussurea. As a result, the higher resolution phylogenetic tree showed that Saussurea based on the current sampling is not monophyletic (Fig. 5). Of the seven endemic species, S. diamantica diverged first. The morphologically similar S. albifolia and S. seoulensis did not form a sister relationship; S. albifolia formed a sister with the group including S. odontolepis, S. bullockii, S. tianshuiensis, and S. chabyoungsanica. Limestone endemic, S. calcicola, shared its common ancestor with the group consisting of S. brachycephala, S. amurensis, S. polylepis, S. grandicapitula, S. seoulensis. S. kuschakewiczii, S. leucophylla, S. tomentosa, S. komaroviana, and S. subtriangulata. However, relatively low bootstrap values hindered us to determine precisely their phylogenetic relationships. Also, S. chabyoungsanica, which is narrow limestone endemic to central Korea, is sister to S. tianshuiensis, which occurs narrowly in high montane meadows (1800–2500 m) in three provinces of northwestern China (i.e., SE Gansu, Shaanxi, and Ningxia).
The molecular age estimation suggested that endemic Korean Saussurea originated in the late Miocene (Tortonian), with the estimated crown age of approximately 9 million years ago (95% HPD, 3.03–18.8 million years ago, MYA) (Fig. 6). The clade containing all but S. diamantica, which has unusual mitochondrial DNA sequences insertion, was estimated to be 6.18 MYA (95% HPD, 2.14–13.28 MYA). Two major lineages of the Korean endemics, i.e., S. albifolia–S. chabyoungsanica and S. calcicola–S. seoulensis–S. polylepis–S. grandicapitula, appear to be speciated even more recently, during the Pleistocene.
Accumulation of plastome data from various land plants has improved our understanding of plant evolution. In general, the plastome has a highly conserved structure; a single circular DNA molecule is composed of a large single copy, a small single copy, and two copies of inverted repeats. However, structural rearrangements, gene loss, IR expansion and contraction, inversion, and gene transfer occur in certain species or lineages [27]. In this study, we sequenced the complete plastomes of five Korean endemic Saussurea species and are compared their characteristics including two previous sequenced endemic species, S. chabyougnsanica and S. polylepis. Like other angiosperms, they have a quadripartite structure and are highly conserved in gene order, gene content, and gene number. The expansion and contraction of the IR regions have been affected by length variation in plastomes [28–29]. In addition, smaller expansions and contractions of the IR regions are a common phenomenon in angiosperms [30–31]. The IR lengths between the seven endemic species are slightly different due to insertions and deletions in ycf2 and intergenic spaces of ycf2-trnL-CAA, trnL-CAA-ndhB, ycf15-trnV-GAC, and trnR-ACG-trnN-GUU. In particular, insertion or deletion in ycf2 sequences is well reported in a previous study [32]. Many insertions and deletions can be shared by certain groups (e.g., Brassicales, Saxifragales). Moreover, multiple insertions and deletions that are highly conserved in ycf2 can be observed to be shared by more distantly related taxa [32]. The distribution of those insertions and deletions may relate to multiple evolutionary histories, such as differential gene losses and horizontal gene transfer. Concerning that there are differently shared patterns of insertions and deletions as well as SNPs among Korean endemic species, it can support those Korean endemics are originated from multiple lineages.
Interestingly, we found that S. diamantica has mitochondrial DNA sequences including two tRNAs in SSC region. DNA transfer between nuclear and organellar (plastid and mitochondria) genomes has been reported in several taxa. Most prominent transfer is from organellar genomes into the nuclear genomes [33, 34]. Also, there are previous studies reporting gene transfer between organelle genomes [23, 25, 35]. In particular, the transfer of mitochondrial DNA sequences to plastomes has been reported in the families Apiaceae, Apocynaceae, and Poaceae [23–25]. However, the evidence of DNA transfer between organellar genomes have been not reported in Asteraceae. In this study, our result of BLAST search demonstrated that the inserted sequences were highly matched to Chrysanthemum, Diplostephium, Lactuca, Helianthus, and Paraprenanthes mitochondrial DNA sequences, but this does not mean the continuous consistency of the whole 20,550bp on the mitochondrial genome. Even though it is difficult to reveal whether the exact mechanism is the result of deletion after insertion of long DNA sequences or the result of multiple transfers. The insertion of mitochondrial DNA into the plastome is not common at the species level, so these conclusions require verification through the further studies. Information on the mitochondrial genome in Saussurea will improve understanding of the insertion event and genome evolution overall.
The Pi values can provide useful information for marker development for phylogenetic analysis and DNA barcoding. The effectiveness of newly developed markers based on nucleotide diversity values has been verified through the discrimination of closely related species [36, 37]. In seven Korean species, Pi ranged from 0 to 0.0053 (Fig. 2), indicating high similarity of sequences among seven Saussurea species. These low values were reported in Sonchus (Asteraceae) ranged from 0 to 0.006 [38] and Meconopsis (Papaveraceae) ranged from 0 to 0.007 [39]. The high similarity might be related to the recent speciation.
The hotspot with the highest Pi was ycf4–cemA (0.0051), followed by seven intergenic regions (psbC–trnS, rbcL–accD–psaI, rpl32–ndhF, trnT–trnD, psbE–petL, rps4–trnT–trnL, and rpl16–trnQ–psbK) and one gene region (ycf1). Of the nine variable regions detected in this study, rpl16–trnQ, trnT–psbD, rps4–trnT–trnL, accD–psaI, psbE–petL, and rpl32–ndhF coincide with the variable cp regions identified by Shaw et al. [41, 41] and trnT–psbD, rps4–trnT–trnL, accD–psaI, psbE–petL, and rpl32–ndhF regions have been utilized in conducting phylogenetic studies in many taxa [42–46]. Previous studies used in cp molecular markers, such as trnL–trnF, psbA–trnH, and matK, have been poorly resolved in Saussurea [3–5]. The low resolution of phylogenetic studies can be interpreted as the low diversity values of trnL-trnF, psbA-trnH, and matK markers. Therefore, using these identified variable regions will be helpful for further clarifying phylogenetic relationships.
As microsatellites or SSR markers have hyper-mutation rates and polymorphism, they are suitable for population genetic analyses such as population genetic structures and gene flow patterns. In particular, Powell et al. [47] suggested that the SSR markers from the chloroplast are useful for acquiring insight into gene flow related to seed and pollen dispersal, genetic structure, nuclear-chloroplast interaction, and the origin of polyploidy. Many studies have reported the use of cpSSR markers with high polymorphisms. For example, Vendramin et al. [48] assessed the genetic variation among Abies alba populations using 2 cpSSRs, while Cubas et al. [49] evaluated the genetic variability and relationships among Ulex species using 6 cpSSRs. The detected SSRs were mainly located in the LSC region (67.1–77%), which is consistent with the characteristics found in other angiosperms [38, 50]. The most abundant type was mononucleotides A/T, which is consistent with the results obtained in previous studies [51, 52]. As the plastomes of seven Saussurea were conservative, SSR primers can be transferable across species and genera. Therefore, information involving SSRs in this study could be useful for studies at the population level and provide complementary data to the SSR markers of Saussurea identified from the nuclear genome [53].
If Ka/Ks is greater than 1, it could indicate that it is under positive selection [26]. Approximately 90% of protein coding genes were below 1 in Ka/Ks values. These results indicate that the protein-coding genes may have undergone purifying selection pressure during their evolution, and it is consistent with the typical tendency shown in the plastid genes [54].
Codon usage bias can also improve our understanding of the effects of natural selection during the evolution [55, 56]. If selective pressure or mutation preferences are absent, synonymous codons prefer equally, and the nucleotide mutations at each amino acid site occur randomly [57]. The tendency for codon preference was similar among species, indicating relatively conserved characteristics of plastome. There were more codons with the RSCU value less than one ended with base C or G and there was high A/T preference in the third codon. These are a common phenomenon in plastomes of vascular plants.
Although Korean endemic Saussurea species can be distinguished by morphological characteristics, the low resolution of the previous phylogenetic study (Yun and Kim, unpublished data) was insufficient to understand the relationships between species. In the phylogenetic tree using complete plastomes, Korean endemics were not monophyletic. It is plausible that few independent lineages of Saussurea involved in the origin of several endemic species in Korea. In addition, we found that the morphologically similar S. albifolia and S. seoulensis did not form a sister relationship; S. albifolia formed a sister with the group including S. odontolepis, S. bullockii, S. tianshuiensis, and S. chabyoungsanica. This may suggest that the morphological similarities between S. albifolia and S. seoulensis could be due to convergent evolution or parallelism. Nevertheless, when the current Korean phylogenetic framework was compared to the previous broader phylogenomic study [16], our result also showed the same relationships between S. polylepis and S. amurensis, and S. chabyoungsanica and S. tianshuiensis. However, the clade including the Korean endemics has still low support values (bootstrap support value < 50). Although the complete plastome sequences were utilized to build baseline phylogenetic framework among the Korean endemics, recent explosive speciation of this group perhaps contributed to insufficient resolutions in species relationships.
As for the molecular age estimation based on much broader phylogenomic study [16], the MRCA of S. polylepis and S. amurensis was estimated to be 4.8 MYA, while the MRCA of S. chabyoungsanica and S. tianshuiensis was 0.32 MYA. Therefore, these age estimates are concordant with the current study. Like precise phylogenetic relationships among species of Saussurea in East Asia require further study, accurate molecular age estimation of Korean endemic lineages is needed. In addition, it is yet to be determined whether climatic oscillations during the Pleistocene were major evolutionary drivers in speciation of Saussurea [58–60].
In this study, the complete plastomes of five Korean endemic Saussurea were reported and comparatively analyzed these data including previously reported species, S. chabyoungsanica and S. polylepis. The structures of the plastomes were generally conserved, sharing most genomic features despite the different morphological diversity, but we found the mitochondrial DNA sequences insertion in S. diamantica. Through the comparative analyses including variable regions, SSRs, Ka/Ks value, and codon usage, we identified the species-specific SSRs, different patterns of Ka/Ks among species, and nine hotspot regions. These resources will provide insight into the evolution of Korean Saussurea and plastome molecular markers for the identification among Saussurea species. The phylogenetic tree indicated Korean endemic species did not originate from one lineage.
Fresh leaves of Saussurea albifolia, S. calcicola, S. dimantica, S. grandicapitula, and S. seoulensis were sampled from natural populations in South Korea and dried with silica gel. All voucher specimens were deposited in the Ha Eun Herbarium, at Sungkyunkwan University (SKK) (Table S5). Total genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Carlsbad, California, USA) according to the manufacturer’s instructions.
After conducting quality control, qualified samples proceeded to library construction. Paired-end sequencing was performed on the Illumina Hiseq 4000 (Illumina Inc., San Diego, California, USA) at the Macrogen Corporation (Seoul, Korea). For each species, approximately 3.0 GB raw data were generated. The raw reads were assembled de novo into whole plastomes using Velvet v. 1.2.10 with multiple k-mers [61]. Dual Organellar GenoMe Annotator (DOGMA) software [62] and tRNAscan-SE [63] were used to annotate the protein coding genes and transfer RNA genes, respectively. The graphical maps of plastomes were drawn in the Organellar Genome DRAW (OGDRAW) program [64]. The five annotated complete plastome sequences were submitted to GenBank (Table S5).
The complete plastomes of the five new species and two previously reported Korean endemic S. chabyoungsanica (NC036677) and S. polylepis (NC036490) were aligned and adjusted manually using Geneious v.10.2.2. (Biomatters Ltd., Auckland, New Zealand). A large insertion (20,550 bp) found in S. diamantica was excluded for analysis. The complete plastomes of the seven Korean endemic Saussurea were compared using mVISTA [65] with Shuffle-LAGAN mode [66] and default parameters. The plastome of S. involucrata (NC029465) was used as a reference. Sliding window analysis was carried out to calculate the nucleotide diversity (Pi) using DnaSP v. 6 [67]. The step size was set to 300 bp, with a 900 bp window length. SSRs were detected using MISA [68]. The minimum repeat thresholds were set to ten for mononucleotide repeats, four for dinucleotide to tetranucleotide repeats, and three for pentanucleotide and hexanucleotide repeats. Sequences of 80 protein-coding regions without stop-codons were extracted from the plastomes of seven Saussurea and S. involucrata as a reference. KaKs_Calculator 2.0 [69] was used for calculating Ka/Ks values with genetic code 11 (bacterial and plant plastid code) and GY as a calculation mode. The codon usage frequencies and RSCU values for 80 protein-coding genes were determined with DnaSP v.6. The RSCU was divided into four models, including lack of preference (RSCU ≤ 1.0), low preference (1.0 < RSCU < 1.3), moderate preference (1.30 ≤ RSCU ≤ 1.50), and high preference (RSCU > 1.5) [57].
For the phylogenetic analysis, the plastomes of 28 Saussurea species including 7 Korean endemics and 4 representative species from 4 genera (Arctium, Carthamus, Centaurea, and Hemistepta) as an outgroup were used. Twenty-one additional Saussurea species were selected based on the previous study [16]. Sequences were aligned using MAFFT v.7.149 [70] and removed the gap or poorly aligned position using Gblocks v.0.91b, using default settings [71]. A maximum likelihood (ML) analysis was performed in IQ-TREE v. 1.4.2 [72] with 1000 replicates. By using jModelTest v.2.1.10 [73], GTR + I + G was selected as the optimal model based on the Bayesian information criterion. For the bayesian inference (BI) phylogenetic tree, the analysis was performed until the standard deviation of split frequencies was below 0.01 using MrBayes v3.1.2 [74]. Each chain was sampled every 100 generations. The first 25% of the sample was discarded as burn-in, and the rest was used to construct a consensus tree.
Divergence times of Korean endemic Saussurea species were estimated from the same dataset used for phylogenetic analysis using BEAST v2.6.2 [75] under lognormal relaxed clock. The GTR model was chosen to generate the tree. As a calibration point, the pairwise divergence time of 11.8 MYA for Carthamus and Centaurea was applied according to the data deposited in TIMETREE [76]. The Markov chain Monte Carlo chains were set to run for 16 million generations, sampling one every 2,500 generations. Tracer v.1.7 [77] was used for checking the convergence of the chains through adequate effective sample sizes (ESS). Finally, maximum clade credibility trees were calculated in TreeAnnotator v1.8.4 [78] and the summary trees with 95% highest posterior density (HPD) intervals of divergence time were visualized using Figtree v1.4.
Bayesian inference
Chloroplast DNA
DNA sequence polymorphism
Effective sample size
Highest posterior density
Inverted repeat
Non–synonymous substitution rate
Synonymous substitution rate
Large single copy
Maximum likelihood
Million years ago
Nucleotide diversity
Plastid genome
Relative synonymous codon usage
Small single copy
Simple sequence repeat
Ethics approval and consent to participated
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
Sequence data that support the findings of this study have been deposited in GenBank (https://www.ncbi.nlm.nih.gov) with the accession codes of MT478053, MN509431, MT536932, MN530094, and MN530095. The sequences will be released after the article has been published.
Competing interests
The authors declare that they have no competing interests.
Funding
This project was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) (NRF-2019R1A2C2009841).
Authors’ contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by SAY. The first draft of the manuscript was written by SAY
and SCK revised it. All authors read and approved the final manuscript.