Features of Complete Chloroplast Genome and Comparative Analyses
In the current study, we report the first complete cp genomes for two Polygonatum species, Polygonatum campanulatum and P. franchetii. In addition, the complete cp genomes of four other species (P. cyrtonema, P. filipes, P. zanlanscianense and P. sibiricum) were newly sequenced using Illumina sequencing technology. Comparative cp genomic analyses were then carried out among these six species and three related species (P. kingianum, Heteropolygonatum alternicirrhosum and H. ginfushanicum) to explore the genetic information of Polygonatum. All cp genomes showed a typical quadripartite structure, with lengths between 155,361 bp and 155,962 bp in Polygonatum, and 155,50–155,944 bp in Heteropolygonatum. The range of chloroplast genome length variation in these two genera was similar to that reported previously for other Asparagaceae and higher plants [48–52]. The size changes are partially caused by expansion or contraction of the inverted repeat regions.
Our study revealed that gene content and gene order in the cp genomes of Polygonatum and Heteropolygonatum were highly conserved; only slight variations in gene size and position were found. This result is similar to that for other species of Asparagaceae. All plastomes contained 132 genes, comprising 86 protein-coding genes, 38 tRNA genes and eight rRNA genes. Among these genes, 18 contained introns and 19 were duplicated in the IR regions. One interesting finding was that one of the rps19 copies in P. sibiricum appeared to be a pseudogene. Owing to its location at the IR/LSC boundary, the gene was not duplicated completely. This phenomenon has also been observed in Polygonatum verticillatum (NC_028523). The status of the rps19 gene is relatively variable among species of Asparagaceae: pseudogenization of rps19 has also been reported in Behnia reticulata, Hesperaloe parviflora and Hosta ventricosa, while Camassia scilloides and Chlorophytum rhizopendulum lack this gene completely. The rps2, infA and other pseudogenes reported previously in Asparagaceae were not detected in this study [54, 55]. In addition, although there were no remarkable differences in GC content among species, the distribution of GC content within each genome was asymmetrical. The higher GC content in the IRs implies a more stable structure, because GC pairs form three hydrogen bonds whereas AT pairs form two. This higher GC content may be attributed to the presence of the four rRNA genes, which have high GC percentages. Similar results have been found in the chloroplast genomes of other angiosperms [57–59].
The pattern of codon usage is an important genetic characteristic of an organism and is related to mutation, selection and other molecular evolutionary phenomena. Our results demonstrated that leucine (Leu) was the most frequent amino acid in Polygonatum campanulatum, P. filipes, P. franchetii, P. zanlanscianense, P. cyrtonema and P. sibiricum. In contrast, cysteine (Cys) was the least abundant amino acid apart from stop codons, which has also been found in other angiosperm taxa [23, 61]. Furthermore, the RSCU analysis showed that most codons with an RSCU value greater than one ended with A or U, whereas most codons with an RSCU value less than one ended with C or G. This indicates that codon usage is biased towards A and U at the third codon position in Polygonatum, which agrees with previous studies [53, 58, 62].
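For readers less familiar with the measure, RSCU is the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The following Python sketch illustrates the calculation only; it is not the pipeline used in this study. The function name `rscu` and the toy sequence are hypothetical, and the sketch assumes the standard genetic code and an in-frame coding sequence.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard codon-to-amino-acid assignments apply to the input CDS).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    # Count only codons made of unambiguous bases.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid (or stop) they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        expected = total / len(family)  # count under equal synonymous usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    toy = "GCTGCTGCTGCA"  # made-up sequence: four alanine codons
    for codon, value in sorted(rscu(toy).items()):
        print(codon, round(value, 2))
```

In this toy input, GCT is used three times out of four alanine codons, so its RSCU is above one, while the unused GCC and GCG score zero. Values above one therefore mark codons used more often than expected under equal usage.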
Long dispersed repeats are essential for the rearrangement and stability of the chloroplast genome, and are relevant to copy number differences among species. Identifying their number and distribution plays a key role in genomic studies. The current study found that palindromic repeats were the most common repeat type, followed by forward repeats. In contrast, complementary repeats were identified only in P. campanulatum, and P. franchetii and H. ginfushanicum did not harbor any reverse repeats. In the plastomes of the nine species reported here, repeats of 30–39 bp were dominant, which is commonly observed in other angiosperm lineages [49, 65, 66]. Our study also revealed that the repetitive sequences were not randomly distributed in the cp genomes of the nine Polygonatum and Heteropolygonatum species; they were mainly located in the LSC region (48.7%) and in CDS (51.9%).
SSRs (simple sequence repeats) are important codominant DNA molecular markers, with the advantages of high abundance, random distribution throughout the genome and rich polymorphism information [67, 68]. They therefore provide important insights in many fields, such as species identification, phylogeography and population genetics [69, 70]. A total of 507 SSRs were detected in the current study, with H. alternicirrhosum containing the most. Among the nine cp genomes of Polygonatum and its related species, six categories of SSRs were observed in total. Mononucleotide SSRs showed the highest frequency in each genome, with A/T as the predominant motif type. Similar results have been reported in numerous taxa [50, 58, 71]. By contrast, hexanucleotide SSRs were the rarest type, with only one such element observed in each of P. cyrtonema and P. filipes. In addition, SSRs lying within the LSC region accounted for the majority (72.4%), which is in agreement with previous studies [62, 66]. In summary, the microsatellites identified in this study can be developed as markers for Polygonatum and may contribute to species identification and evolutionary studies of this genus in the future.
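As an illustration of how SSRs of the six motif classes (mono- to hexanucleotide) can be located in a sequence, a minimal Python sketch follows. It is not the tool used in this study. The function names are hypothetical, and the minimum repeat thresholds (10, 5, 4, 3, 3 and 3 repeats for motifs of 1 to 6 bp) are assumed values chosen for the example, not the settings of this work.

```python
# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for size in sorted(min_repeats):
            motif = seq[i:i + size]
            if len(motif) < size or not set(motif) <= set("ACGT"):
                continue
            if not is_primitive(motif):
                continue  # e.g. "AA" is already covered by the motif "A"
            # Extend the run for as long as the motif keeps repeating.
            j = i
            while seq[j:j + size] == motif:
                j += size
            repeats = (j - i) // size
            if repeats >= min_repeats[size]:
                found = (i, j, motif, repeats)
                break
        if found:
            hits.append(found)
            i = found[1]  # continue scanning after the reported SSR
        else:
            i += 1
    return hits


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"  # made-up sequence
    for start, end, motif, repeats in find_ssrs(toy):
        print(f"{start}-{end}\t({motif}){repeats}")
```

Counting hits per motif length and per genome region (LSC, SSC, IR) would then give summaries of the kind reported above. The thresholds strongly affect the totals, so counts are only comparable between studies that use the same settings.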
The results of multiple sequence alignment revealed similarities in cp genome structure, content and order among Polygonatum and its related species. Consistent with previous reports [72–74], we also found that non-coding regions harbored more variation than coding regions, and that the two single-copy regions exhibited higher sequence divergence than the IRs. Seven intergenic regions (rps16-trnQ, trnS-trnG, atpF-atpH, atpH-atpI, petA-psbJ, ndhF-rpl32 and rpl32-trnL) and two genes (ycf1 and rpl16) were detected as the most divergent. These regions can be adopted as potential molecular markers, which is a promising direction for future research. Comparative analysis among Polygonatum and its related species showed that the cp genomes were highly conserved, and no interspecific or intraspecific rearrangement was detected.
Contraction and expansion of the IR regions lead to variations in cp genome size, which have commonly been observed in the evolutionary history of terrestrial plants. The size of the IR regions was relatively similar in Polygonatum and Heteropolygonatum, ranging from 26,214 bp in H. ginfushanicum to 26,415 bp in P. zanlanscianense. Although all the cp genomes were similar in overall gene order and structure, several variations were identified at the IR/SC junctions. The current study demonstrated that the boundary genes in Polygonatum were mainly rpl22, rps19, trnN, ndhF, ycf1 and psbA, which is consistent with Heteropolygonatum and Hosta. This further confirms that boundary features are relatively stable across closely related species. The LSC/IRb boundary was traversed by the rps19 gene in P. sibiricum, whereas the junction was located between rpl22 and rps19 in the other species. Incomplete duplication of the normal copy resulted in pseudogenization of the rps19 copy located at the IRa/LSC boundary, and this phenomenon has also been reported in other taxa of Asparagaceae, such as Behnia reticulata, Hesperaloe parviflora and Hosta ventricosa. Excluding rps19, the genes situated at the SC/IR boundaries were relatively stable across the six Polygonatum and two Heteropolygonatum species studied in this work, and only ndhF and ycf1 showed slight variations in size. The high similarity of the SC/IR boundaries is also consistent with all the species sharing the same genes. Moreover, the total number of genes does not change as a result of IR contraction and expansion.
We detected that trnK-UUU-rps16, trnC-GCA-petN, trnT-UGU-trnL-UAA, ccsA-ndhD and ycf1 were the most divergent regions, with nucleotide diversity greater than 0.014. The results indicated that most divergent regions were located in the LSC, and that the IR regions displayed relatively low diversity, which agreed with the results of the multiple sequence alignment conducted with mVISTA. The same phenomenon has been observed in many taxa [23, 65]. The regions detected in the nucleotide diversity analysis might also provide additional genetic information for DNA barcodes in Polygonatum, but this requires the support of further experiments.
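Nucleotide diversity (Pi) is the average proportion of sites that differ between all pairs of aligned sequences, and divergent regions are usually located by computing it in sliding windows along the alignment. The Python sketch below illustrates this idea under stated assumptions: the function names, the toy alignment, and the window and step sizes are hypothetical, and gaps or ambiguous bases are handled by pairwise deletion, which may differ from the treatment in the software used in this study.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of differing sites, using only columns where both
    sequences carry an unambiguous base (pairwise deletion)."""
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def nucleotide_diversity(seqs):
    """Mean pairwise difference per site over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(aligned, window, step):
    """Return (window midpoint, Pi) for each window along the alignment."""
    length = len(aligned[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in aligned]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Made-up alignment of three sequences, eight columns each.
    toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
    for midpoint, pi in sliding_pi(toy, window=4, step=4):
        flag = "divergent" if pi > 0.014 else ""
        print(midpoint, round(pi, 4), flag)
```

Windows whose Pi exceeds a chosen cut-off, such as the 0.014 value mentioned above, would be flagged as candidate divergent regions. The window and step sizes determine how sharply such peaks are resolved.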
The non-synonymous (dN) and synonymous (dS) substitution rates are useful for inferring the adaptive evolution of genes [24, 77]. The dN/dS analysis was carried out owing to its popularity and reliability in quantifying selective pressure [78, 79]. The results indicated that several positively selected genes existed in Polygonatum, and that these genes are related to photosynthesis and self-replication (Table S11), which helps in understanding the mechanisms that generate selective pressure.
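For reference, the ratio is conventionally interpreted as follows. This is the standard textbook reading of dN/dS, stated here as background rather than as a result of this study:

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega > 1 & \text{positive (diversifying) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega < 1 & \text{purifying (negative) selection}
\end{cases}
```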
Phylogenetic analysis based on complete cp genomes demonstrated that both Polygonatum and Heteropolygonatum were monophyletic. Consistent with the results of previous studies [7, 13, 27], Polygonatum was composed of three major clades: sect. Verticillata, sect. Sibirica and its sister clade sect. Polygonatum. In the current study, we observed that sect. Sibirica contained only one species, P. sibiricum, which was consistent with the findings of Xia and Meng [7, 27], but data from Floden suggest that one sample of Polygonatum verticillatum was sister to Polygonatum sibiricum within sect. Sibirica. Moreover, previous studies indicated that P. verticillatum was paraphyletic, potentially as a result of its wide geographic distribution and diverse morphological variation [13, 27]. A similar result was obtained in this study. One sample of P. verticillatum was recovered as sister to P. zanlanscianense, while the other was sister to P. curvistylum + P. prattii + P. stewartianum. Similar to previous findings, P. cyrtonema was also recovered as paraphyletic in this study, given that four samples, including the newly sequenced one, appeared as sister to P. hunanense, while the other two samples were sister to P. hirtum. All of these clades were highly supported. This suggests that the delimitation of P. cyrtonema requires further study.
Another important finding was that Polygonatum franchetii was strongly supported as sister to P. stenophyllum. The cp genome of P. franchetii was reported for the first time in this work. Before this, only Meng's team had reported phylogenetic relationships including P. franchetii, using four chloroplast fragments (rbcL, psbA-trnH, trnK and trnC-petN). Regrettably, the branch to which P. franchetii belonged was ambiguous, making it difficult to recognize the relationship between P. franchetii and its close relatives. Furthermore, P. filipes was recovered as sister to P. yunnanense plus P. nodosum within sect. Polygonatum. This finding is contrary to that of Xia et al., who found with high support that P. filipes was sister to the clade consisting of P. inflatum + P. multiflorum + P. odoratum + P. macropodum + P. involucratum + P. acuminatifolium + P. arisanense + P. orientale + P. yunnanense + P. nodosum. However, in that study the clade composed of P. yunnanense + P. nodosum was only weakly supported as sister to the remaining species in the sister clade of P. filipes.
One unanticipated finding was that the phylogenetic tree strongly supported the placement of Polygonatum campanulatum in sect. Verticillata, although P. campanulatum bears alternate leaves and sect. Verticillata is characterized by whorled or opposite leaves. P. campanulatum was compared to P. gongshanense and P. franchetii when it was first published, but material of P. gongshanense was not available for this work. Furthermore, phylogenetic analysis indicated that P. franchetii and P. campanulatum were placed on separate branches, whereas P. tessellatum + P. oppositifolium were highly supported as sister to P. campanulatum (BS = 100, PP = 1.00). Although P. campanulatum, P. tessellatum and P. oppositifolium share similar lustrous and lanceolate leaves [2, 80], they differ in leaf arrangement, filament structure and flowering time, among other characters. In detail, P. campanulatum is characterized by alternate leaves, a retrorse spur at the filament apex and flowering in October, while P. tessellatum and P. oppositifolium have whorled or opposite leaves, lack a retrorse spur at the filament apex and flower in May [2, 80]. Moreover, previous studies have found that leaf arrangement is labile and that whorled leaves have arisen from the alternate state at least twice [7, 81]. We therefore infer that phyllotaxis is not a suitable basis for delimiting subgeneric groups in Polygonatum. Additionally, flower color and pollen exine sculpture were also used as characters for subgrouping Polygonatum in previous studies [7, 12, 82]. Section Polygonatum is characterized by perforate pollen exines and greenish-white or yellow perianths, whereas sect. Verticillata mostly has reticulate pollen exines and purple or pink perianths [7, 82]. However, P. campanulatum, which is placed in sect. Verticillata, has perforate-reticulate exine ornamentation and yellowish-green or greenish-white perianths. Such conflict in flower color has also been reported in the study of Xia and her team.
From this we can see that flower color and pollen exine sculpture may be unrelated to phylogeny and are not an ideal basis for the subgeneric classification of Polygonatum either. Moreover, further research on the base chromosome numbers and karyotypes of P. campanulatum needs to be undertaken. This work contributes to a better understanding of the infrageneric classification of Polygonatum and demonstrates that the cp genome is an efficient tool for resolving species-level phylogeny.