Comparative chloroplast genomics of Fritillaria (Liliaceae), inferences for phylogenetic relationships between Fritillaria and Lilium and genome evolution


 Background Fritillaria is a genus consisting of about 140 species that has important medicinal and horticultural values. The monophyly of Fritillaria and phylogenetic relationships with Lilium were previously not fully resolved. The study involved the most comprehensive chloroplast genomes samples to date referring to Old and New World clades of Fritillaria as identified in earlier studies.Results We reported and compared eleven newly sequenced whole-plastome sequences of Fritillaria as well as characterization of SSRs and repeat sequence. These 11 plastomes proved highly similar in overall size (151,652-152,434bp), genome structure, gene content and order; Comparing them with other species of Liliales (6 out of 10 families) indicated the same similarity but showed some structural variations due to the contraction or expansion of the IR regions out or into of adjacent single-copy regions. A/T mononucleotides, palindromic and forward repeats were the most common types. Six hypervariable regions ( rps16 - trnQ, rbcL - accD, accD - psaI, psaJ - rpl33, petD - rpoA and rpl32 - trnL ) were discovered based on 26 Fritillaria whole-plastomes to be potential molecular markers. 26 species of Fritillaria plastomes and 21 species of Lilium plastomes were combined in a phylogenomic study with 3 Cardiocrinum species as out groups. Fritillaria was monophyly with moderate support as sister to Lilium based on 64 protein-coding genes (CDS) and the interspecific relationship within subgenus Fritillaria has strong resolution. The topology recovered from the whole plastome and single-copy gene data sets was the same as for coding gene data, but weak support for monophyly of Fritillaria .Conclusions The phylogenomic analysis reconstructed a topology that had some incongruences with previous studies. The six hypervariable regions can be used as candidate DNA barcodes for global genetic diversity detection of Fritillaria . The phylogenomic framework from this study can guide extensive genomic sampling to further discern the relationships among the Old and New World clades of Fritillaria and Lilium .

discern the relationships among the Old and New World clades of Fritillaria and Lilium .

Background
Plastid genome of angiosperm have a typical quadripartite structure with a pair of inverted repeat (IR) regions separated by a small single-copy (SSC) region and a large single-copy (LSC) region [1,2]. In contrast to mitochondrial and nuclear genomes, plastid genome is highly conserved and generally contains 110-130 distinct genes, ranging in size from 120-160 kb [3]. Although plastid genomes are reported as highly conserved sequence and structure in most angiosperms, they have been showed considerable variation in many taxa [2]. These structural alterations most typically involve contraction or extension of the IR region out or into of adjacent SC regions [1], the presence of large inversions or deletions [4,5], and gene gain and loss events (including pseudogenization) [2,6]. Some hotspots regions with single nucleotide polymorphisms could be identified and these regions may be used for species identification in terms of enough information [7]. Due to low rates of nucleotide substitutions, lack of recombination and uniparental inheritance, many plastid DNA sequences have been used for inferring plant phylogenies and population genetic analyses [8,9], such as matK, rbcL and trnH-psbA [10].
Currently, whole chloroplast genomes have been increasingly used for phylogenetic analyses and inferring phylogeographic histories, as the advent and fast development of next-generation sequencing (NGS) technologies. They can provide plenty of variable sites among their entire size for phylogenetic analyses [3]. Thus, whole chloroplast genomes show the potential for resolving evolutionary relationships and have been employed to generate highly resolved phylogenies and genetic diversity, especially in unresolved relationship of some complex taxon or at low taxonomic levels [2,[11][12][13]. Because different regions of the whole chloroplast genomes differ with their evolutionary rates, it was preferable for phylogenomic analysis by partition the genome by regions or genes and the concatenated coding genes were by far most widely used [3,14,15].
The genus Fritillaria L. (Liliaceae) contains 130 to 140 species and is mostly distributed in the temperate regions of the Northern Hemisphere, with centers of speciation and diversity may in the Qinghai-Tiber Plateau (QTP), Irano-Turanian region and the Mediterranean Basin [16,17]. The bulbs of Fritillaria have long been used as herbs in Asian countries, especially in traditional Chinese medicine, and are called "Bei-mu" in Chinese. The bulbs of Fritillaria are a botanical source for various pharmaceutically active components and have relieving cough, clearing heat, eliminating phlegm, analgesic and detoxifying effect, moreover, they have been used in some Chinese herbal formulations for treating cancers [18,19]. As the chemical compounds and pharmacological effects greatly differ among Fritillaria species [18], it is crucial to identify accurately them based on medicinal purpose. For this purpose, different barcoding such as plastid DNA (matK, rpl16, trnL-trnF) and nuclear DNA internal transcribed spacer (ITS), single nucleotide polymorphisms (SNPs) and genomic inter-simple sequence repeats (ISSRs) have been performed for identifying Fritillaria species [18,20]. Fritillaria comprises eight subgenera which are supported by molecular phylogenetic analyses and these subgenera are all monophyletic except subgenus Fritillaria [17,21]. However, the combined sequence analyses of plastid (matK, rpl16 and rbcL) have identified Fritillaria was paraphyletic and included the New World and Old World clades; Lilium was nested within Fritillaria but only with moderate support [17]. Moreover, the ITS analysis with limited species sampling resolved the monophyly of Fritillaria but only moderate support [17]. Day et al. attempted to adopt low-copy nuclear gene regions as barcoding, but none of the target regions experimented could be amplified across all species tested16]. These uncertainties about phylogenetic relationships within Fritillaria and related genus are most likely due to few genetic barcoding used in previous studies, even though confounding influences such as hybridization in the evolutionary processes cannot be excluded. So far, a total of fifteen plastid genomes of Fritillaria have been sequenced and are available on GenBank. These comparative analyses with different Fritillaria plastid genomes have provided a basic understanding of the plastid genome characteristics of this genus [22][23][24][25].
In this study, we report 11 species whole-plastome sequence data of Fritillaria through next-generation sequencing (NGS) and performed a genomic comparative analysis with 15 plastid genome sequences of Fritillaria available from Genebank. The specific purposes of this study were to: (1) comparative analysis for the plastome structural patterns and evolution of the eleven Fritillaria species; (2) identify highly divergent regions of all 26 Fritillaria plastomes as DNA barcodes for future population genetics and species identification; and (3) resolve the evolutionary relationship between two clades of Fritillaria and Lilium. Overall, this study should be helpful to further understand the plastome evolution and phylogenetic genomics of Fritillaria.

Results
Plastome features of Fritillaria and comparison with other Liliales taxa 1.02~2.24 G clean base of the eleven species of Fritillaria were obtained from Next Generation Sequencing. And the eleven plastomes ranged in size from 151,652bp (Fritillaria yuzhongensis) to 152,434bp (Fritillaria maximowiczii) ( Table 1). All these plastomes showed the typical quadripartite structure akin to other seed plants, consisting of a pair of IRs (26,232-26,574bp) separated by the SSC regions (17,310-17,684bp) and the LSC (81,424-81,976bp). The whole GC content of these 11 plastomes was very similar (36.9-37.1%). The eleven Fritillaria plastid genomes contained about 131-136 predicted functional genes, of including 85-90 protein-coding genes, 38 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes ( Fig.1 and Table 1& S1). Six-eight protein-coding genes, eight tRNA genes, and all four rRNA genes were duplicated in the IR regions.
Among these genes, two CDS (clpP and ycf3) possessed two introns, while ten CDS genes and six tRNA genes contained a single intron (Table S1). The gene rps12 was a transspliced gene with the 5' end exon located in the LSC region and the 3' exon and intron located in the IR regions. Across the five Fritillaria plastomes ( F.crassicaulis, F.dajinensis, F.delavayi, F.sichuanica, F.unibracteata),, four pseudogenes (ψinfA, ψycf1, ψycf15 and ψycf68) were identified. In these species, ψycf15 and ψycf68 were duplicated and the two copies were located in the IR A and IR B regions, respectively. The rps19 gene at the LSC-IR A border in F.maximowiczii was also identified as a pseudogene. In addition, The F.przewalskii and F.yuzhongensis plastid genome contained two pseudogenes (ψinfA and ψycf1).. The genes ycf68, ycf15 and infA contained several internal stop codons and indicated that they may be identified as pseudogenes [9,22]. Pseudogene for ycf1 was found at the SSC-IR B junctions and has lost its protein-coding ability because of incomplete gene duplication. The similar phenomenon was also observed in the rps19 gene at the LSC-IR A border.
The IR/SC junction regions with full annotations were compared and had the almost same relative positions among the eleven Fritillaria chloroplast genomes (Fig.2). Except for F.anhuiensis, all the LSC-IR B junctions were located within the rps19 gene, resulting in the IR B region expanded by a part (28-140bp) toward the rps19 gene. In F.anhuiensis, the junction was located at the intergenic spacer region (IGS) between rps19 and trnH-GUG.
The SSC-IR A borders in the eleven plastomes were located in the ycf1 gene and part of this gene was duplicated from 1147-1230bp in the IR B . In addition, the ycf1 gene in four species (F.anhuiensis, F.monantha, F.davidii and F.maximowiczii) extended 24-122bp into the SSC region, which also had a 27-33bp overlap with ndhF. In five species (F.crassicaulis, F.dajinensis, F.delevayi, F.sichuanica, F.unibracteata),, the SSC-IR B boundary positions were located at the junction between the pseudogene ψycf1 and ndhF.
However, for the other two species (F.przewalskii and F.yuzhongensis),, the distance between ψycf1 and ndhF varied from 58bp (Fig.2). The LSC-IR A border was located at the IGS between trnH-GUG and psbA in all but one species (F.maximowiczii).. In F.maximowiczii, the junction was located in the ψrps19-psbA spacer.
Comparison of the 11 Fritillaria plastomes with 10 other species of Liliales showed obvious expansion and contraction of the IRs in several families (Fig.S1). The LSC-IR B boundary was located in the rps19 gene in most Liliales species. However, this boundary shifted into the IGS between the rpl22 and rps19 genes in Campynemataceae and it expanded a part toward the rpl22 gene in Smilacaceae. It further shifted into a part of rps3 in Melanthiaceae (Paris).. Long ψycf1 fragment located at the IR B varied from 2555 to 992bp, and the IR has expanded ~1.0-2.0kb at the SSC-IR S border in Colchicaceae and Campynemataceae compared to the most Liliales species (see Table S2).Meanwhile, the IR region had a relative contraction in Melanthiaceae (Veratrum) (Fig.S1). Notably, too, the SSC region of Campynemataceae had a different orientation (inversion) relative to the other Liliales species.

Comparative genomic analysis and divergence hotspot regions
We investigated the comprehensive sequence divergence of the 11 Fritillaria plastid genomes using mVISTA with F.cirrhosa as the reference. The aligned sequences revealed high sequence similarities, and several regions of high sequence length polymorphism were revealed (Fig.S2). As expected, IR regions exhibited comparatively fewer sequence divergence than LSC and SSC regions. We identified 130 regions (18 introns, 53 intergenic spacers, and 59 coding regions) with more than 200 bp in length from the 26 Fritillaria plastid genomes. Of the 59 protein-coding regions (CDS), nucleotide variability (Pi) for each locus ranged from 0.00036 (ndhB) to 0.0112 (rps19) and the average of 0.00453. Thereby seven regions (matK, rps16, accD, ycf4, rps19, psbH, ndhD) had remarkably high values (Pi > 0.007; Fig.3A and Table S3). For the 71 noncoding (intergenic spacer and intron) regions, Pi values ranged from 0.0000 (ycf4-cemA and trnV-rrn16) to 0.05996 (rps16-trnQ) and the average of 0.01163. Six of those regions also exhibited considerable high values (Pi > 0.02; i.e. rps16-trnQ, rbcL-accD, accD-psaI, psaJ-rpl33, petD-rpoA and rpl32-trnL; see Fig.3B and Table S4). The results proved that the IR regions had more sequence conservation than the SC regions, and the average value of Pi in the non-coding regions was more than twice as much as in the coding regions. Specifically, rbcL-accD, accD-psaI and rpl32-trnL had the highest resolution and similar topology according to the NJ trees. These six divergence hotspot regions in the noncoding regions should provide a useful resource for future phylogenetic and phylogeographic analyses as well as Fritillaria species identification.

SSRs analysis and repeat sequences
With MISA analysis, a total of 848 SSRs were identified across the 11 Fritillaria plastomes.
Moreover, most of the SSR loci were situated in the LSC region (75.8%), followed by the SSC region (19.2%) and a minimum in the IR regions (5.0%) (Table S5). In addition, we found 41 polymorphic SSRs between the eleven species of Fritillaria (excluding mononucleotide SSRs) (Table S6), which could be useful for further population genetic studies.
We identified the total of 471 repeats including 196 forward, 224 palindromic, 36 reverse and 15 complement in the eleven Fritillaria plastomes using REPUTER (Table S7). Repeats situated in homologous regions with the identical lengths were recognized as shared repeats [13,26]. Under this criterion, there were18 repeats shared by all eleven  (Table S7).

Phylogenetic analyses
Both BI and ML methods based on the 64 common CDS shared among the 52 plastomes (28 Fritillaria, 21 Lilium and 3 Cardiocrinum species) generated almost identical topologies (Fig.5). Lilium was strongly supported as a monophyletic group (ML bootstrap support, BS = 100%, posterior probability, PP = 1). However, the monophyly of Fritillaria was moderately supported (BS = 81%, PP = 0.71). Within the Old World clade of Fritillaria, relationships between five subgenera (except subgenus Korolkowia and Japonica) were fully supported with generally high support values. The phylogenetic trees in this study also indicated subgenus Fritillaria was not monophyletic which divided into two clades, and Fritillaria clade B formed three monophyletic subclades (F.pallidiflora, F. thunbergii and F.cirrhosa subclades). This was accord with our previous phylogenetic analysis based on three plastid markers [17]. Within Lilium, two strongly supported lineages formed the backbone of the phylogeny, however, the monophyly of four sections (section Sinomartagon, Martagon, Leucolirion and Pseudolirium) were not supported based on our limited samples. In addition, the topologies from the single-copy genes and the whole complete chloroplast genome sequences were similar to that from the CDS sequences, and most lineages possess high bootstrap values. Rather unexpectedly, however, the monophyly of Fritillaria was weakly supported (BS = 67%, PP = 0.52) based on 64 singlecopy genes (Fig.S3) and not recovered based on whole genome sequences (Fig.S4).

Discussion
Comparison of Fritillaria plastomes and phylogenetically informative markers Our results revealed that the whole plastome sequences newly obtained herein for 11 Fritillaria species only slightly vary in size (151,652bp-152,434bp) (Table1), and also largely similar in overall gene content, structure and order (Fig.1). The genome size in our study is accord with previously sequenced plastid genomes of Fritillaria [23][24][25]. Nonetheless, we have been able to document that the CDS genes vary in the plastomes of these eleven species, mainly there are particular (pseudo) genes: ψycf15, ψycf68 and ψinfA in some species. The pseudogenization of these genes are also detected in other angiosperm plastid genomes [7,9,26,27]. It is worth noting that there are only F.crassicaulis, F.dajinensi s, F.delavayi, F. sichuanica, F. unibracteata having ψycf15 and ψycf68 in our results, indicating a few common aspects in their evolutionary processes and functions. The absence or presence of ψycf15 and ψycf68 in other Fritillaria species has also been reported by Li et al. [25]. The same phenomenon has also been observed in Ipomoea [28]. Additionally, the functions of pseudogene ycf68 and ycf15 are equivocal in all kinds of land plants [28], for example in Podophylloideae, the ycf15 gene was detected as non-functional because of the presence of interrupted sequence where large amounts of stop codons were located [2]. The infA gene, which codes for translation initiation factor 1 and aids to assemble the translation initiation complex [29,30] was found to be a pseudogene in seven Fritillaria plastid genomes. However, it has been totally lost from the other four Fritillaria plastid genomes (Table S1). Similar phenomena have been also observed in other angiosperm plastid genomes, for instance those of Primula species, only the plastid genome of Primula poissonii contains the ψinfA [31]. Our study indicates that the pseudogenization of infA may be a synapomorphy for the F.cirrhosa subclade (Fig.5) The 11 plastomes sequenced in our study are also largely similar in overall gene content and structure when compared with some previously published plastomes in Liliales [9,26,[32][33][34][35][36][37]. However, the LSC-IR borders of Smilacaceae, Melanthiaceae (Paris) and Campynemataceae vary from those of Fritillaria (and other Liliales) by showing expansions (Fig.S1), which may assist to stabilize the structure of the entire plastome as well as the prevention of gene gain and gene loss phenomenon [2,[38][39][40]. By contrast, the SSC-IR boundaries of Melanthiaceae (Veratrum) feature contractions when compared to Fritillaria and other Liliales (Fig.S1). Our results included 6 out of 10 families in the order Liliales (Table S2) and are in accordance with those of Do et al. [33], who used 4 families in Liliaeles and Doet al.41], who used 5 families in Liliales. Wang et al. [42] suggested that trnH-rps19 clusters were located in the IR/LSC junctions within Liliales taxa, but different patterns of junction were found in Liliales from the present study (Fig.S1). Furthermore, more studies covering all of 10 families should be performed to investigate the whole evolutionary trends of plastomes in Liliales. In general, the contraction and expansion of IR regions are relatively common evolutionary events in plants and have been used as evolutionary loci for phylogenetic relationships [1,31,[43][44][45]. Li et al. reported that IR/LSC junctions expanded into rps19 and suggested the events seem to be an ancenstral symplesiomorphy of Liliaceae [26]. Here, we also found the feature but there was no obvious phylogenetic implication and further more evidences were needed using sufficient genera of Liliaceae.
The cpDNA sequences from a variety of intergenic spacers (trnL-trnF) and genes (matK, rbcL, rpl16 and atpB) have often been used to infer the phylogeny of Fritillaria [16,17,[46][47][48]. From our results, these frequently used plastid barcodes, such as atpB, rpl16 and rbcL, are among the relatively least informative genes (Fig.3). Therefore, the previous phylogenetic analysis based on these barcodes generated low resolution, especially for deep phylogenetic relationships with short internodes and fast rates, such as subgenus Fritillaria. By contrast, our sliding window analyses find six intergenic spacer regions with the highest Pi (>0.02) values from 26 Fritillaria species (Fig.3). We suggest these hotspot regions are valuable loci to understand the phylogenetic relationships for Fritillaria lineages (subgeneric levels) which have experienced rapid radiation.

Phylogenetic relationships and implications
Phylogenetic analyses of the 64 common plastid protein-coding genes generated a wellresolved phylogenetic tree (Fig.5). Lilium sisters to Fritillaria, and the monophyly of Fritillaria was recovered but only moderate support. Our study differs from the previous results which showed Fritillaira was paraphyletic and Lilium was nested within Fritillaria with moderate support based on three cpDNA fragments [17]. This difference may be attributed to our smaller sampling size. Notably, too, in our single-copy genes and the whole genome-derived phylogeny, the monophyly of Fritillaria was weakly supported (BS = 67%, PP = 0.52) (Fig.S3) and not recovered (Fig.S4). Thus, the relationship between the New World and the Old world clades of Fritillaria and Lilium still remains unsolved. But our results provide further evidence for relationships between the two clades of Fritillaria and Lilium from phylogenomic analyses based on the most complete plastid genomes sampling by far compared with those of the earlier study by Bi et al. [24] and Li et al. [25]. More extensive plastid genomic sampling should be needed for further resolution. The topologies of five subgenera of the Old World clade within Fritillaria were strongly confirmed and subgenus Fritillaria still was polyphyletic which has high resolution among species than in previous studies (Fig.5) [16,17]. Thereby, our results also prove the plastid genome to be a good tool in resolving phylogenetically difficult taxa [13].

Conclusion
Our study is the first to report the eleven whole plastomes of Fritillaria, of which Chloroplast genome assembly, annotation and structural analyses FastQC v0.11.7 was used to assess the quality of sequenced raw reads [49]. Then, we sieved the chloroplast genome related reads by mapping all raw reads to the chloroplast genome sequences downloaded from NCBI of the genera Fritillaria and Lilium (38 species).
Contigs, assembled from all related reads using SOAPdenovo2 [50], were sorted and joined into a single-draft sequence with Fritillaria cirrhosa (KF769143), Fritillaria thunbergii (NC034368), Fritillaria taipaiense (KC543997) and Lilium leucanhum (NC035590) as reference species in the software Geneious v11.0.4 [51]. Some gaps in the assembled plastomes were corrected using Sanger sequencing. The primers were designed by Primer 5.0 (Premier, Canada). Primer synthesis and the sequencing of the polymerase chain reaction products were performed by Sangon Biotech (Shanghai, China). The amplifications and primers are shown (Table S9). Annotations of the complete plastomes were conducted using Geneious v11.0.4. The draft annotation was checked and edited manually following the reference genome to accurately confirm the start/stop codons and the exon/intron borders of genes. Furthermore, the schematic diagram of the circular plastome map was generated utilizing OGDRAW [52]. The annotated 11 plastome sequences are deposited in GenBank (accession numbers MK258138-MK258148), which were reported here for the first time. The gene order and structure of 11 Fritillaria plastomes were compared using Geneious v11.0.4. We used the plastome of Fritillaira crassicaulis as a representation further to compare with plastomes of other 10 Liliales species (see Table S2).

Genome comparative analysis and identification of hypervariable regions
Chloroplast genome comparisons across the eleven Fritillaria species was performed by mVISTA program ( http://genome.lbl.gov/vista/mvista/submit.shtml) [53], using F.cirrhosa (KF769143) as the reference. For identifying hypervariable regions in Fritillaria and facilitate its utilization for future population genetic and species identification studies, multiple sequence alignments of 26 Fritillaria species (15 species were downloaded from NCBI; Table S10) were performed in MAFFT v.7 [54], and the software MEGA v.6 was used to adjust manually where necessary [55]. A sliding window analysis was conducted for the sequence alignment was subjected using DNASP v5.10 [56]  Fritillaria plastid genomes. The minimum numbers (thresholds) of repeats were 10, 5, 4, 3, 3 and 3 for mononucleotide, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides, respectively. The size and position of repeat sequences are assessed by REPuter [58], including inverted (palindromic), direct (forward), reverse and complement repeats. The constraints set for repeat identification were used: 1) 90% greater sequence identity; 2) hamming distance equal to 3; and 2) a minimum repeat size of 30bp.

Phylogenetic analyses
Phylogenetic analyses were performed for the 28 Fritillaria species (11 species sequenced here) and 21 Lilium species, using three Cardiocrinum species as outgroups based on previous studies [17] (Table S10). The analysis was performed base on an alignment of 64 protein-coding genes from the plastid genomes of the 52 species. The sequences of the 64 common CDS were extracted and aligned using MAFFT v.7 [54]. MEGA v.6 was used to adjust manually where necessary [55]. Topologies were constructed using both Bayesian inference (BI) and maximum likelihood (ML) methods. We used the Akaike Information Criterion (AIC) in JModeltest v.2.1.7 to determine the best-fitting models of nucleotide substitutions [59]. The GTR+I+G model were most suitable for both datasets. ML analyses were performed using RAxML-HPC2 [60] on XSEDE of the CIPRES Science Gateway [61], and statistical node supports were estimated via a bootstrap analysis. Bayesian inference (BI) analyses were conducted in MrBayes v3.2 [62]. Two independent Markov Chain Monte Carlo chains were run simultaneously for five million generations. The trees were sampled every 100 generations with the first 25% of calculated trees discarded as burn-in. The consensus tree was constructed from the remaining trees to estimate posterior probabilities (PPs). Additionally, the complete chloroplast genome sequences and the single-copy genes of the 52 species also were used to perform the phylogenetic analyses, respectively.