Organization of the Chloroplast Genomes of Chrysosplenium Species
The chloroplast genomes of the Chrysosplenium species have the typical quadripartite structure (Fig. 1), comprising a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeats (IRs). The seven Chrysosplenium chloroplast genomes ranged from 151,679 bp to 153,460 bp in length (see Table 1 for details), with IRs of 25,974–26,224 bp, LSCs of 82,771–83,753 bp, and SSCs of 16,960–17,342 bp. Each Chrysosplenium chloroplast genome encoded 30 transfer RNAs (tRNAs) and 4 ribosomal RNAs (rRNAs). Each genome also contained 79 functional protein-coding genes (Table 2), except those of C. macrophyllum, C. flagelliferum, C. alternifolium, and C. ramosum, which lacked rpl32. Homology and expression analyses of the chloroplast rpl32 of C. sinicum (Cp_rpl32) identified a second homolog (Nu_rpl32) in the nuclear genome of C. sinicum, whose expression level (4.86 FPKM) was much lower than that of Cp_rpl32 (20,835.9 FPKM) (Additional file 1: Supplementary Figure S1). In total, each chloroplast genome contains 113 genes (rpl32 present) or 112 genes (rpl32 absent). The rps12 gene in Chrysosplenium is trans-spliced, with the first exon located in the LSC region and the remaining one or two exons in the IR regions. In addition, 17 intron-containing genes were detected (Additional file 2: Supplementary Table S4). Neither chloroplast genome size nor gene content differed significantly between subgenera Oppositifolia and Alternifolia (Table 1), or between Chrysosplenium and other genera of Saxifragaceae.
Table 1 General information and comparison of chloroplast genomes of Saxifragaceae species
| Characteristic | C. macrophyllum | C. flagelliferum | C. alternifolium | C. kamtschaticum | C. ramosum | C. sinicum | C. aureobracteatum |
|---|---|---|---|---|---|---|---|
| Size (bp) | 152,837 | 151,679 | 152,619 | 152,561 | 153,460 | 153,427 | 153,102 |
| LSC length (bp) | 83,583 | 82,771 | 83,524 | 83,175 | 83,670 | 83,745 | 83,753 |
| SSC length (bp) | 17,264 | 16,960 | 17,111 | 16,986 | 17,342 | 17,236 | 17,317 |
| IR length (bp) | 25,995 | 25,974 | 25,992 | 26,200 | 26,224 | 26,223 | 26,016 |
| Number of genes | 112 | 112 | 112 | 113 | 112 | 113 | 113 |
| Protein-coding genes | 78 | 78 | 78 | 79 | 78 | 79 | 79 |
| rRNA genes | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| tRNA genes | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| LSC GC (%) | 35.33 | 35.26 | 35.35 | 35.28 | 35.24 | 35.05 | 35.20 |
| SSC GC (%) | 31.42 | 31.37 | 31.40 | 31.46 | 31.64 | 31.27 | 31.16 |
| IR GC (%) | 42.89 | 42.87 | 42.86 | 42.71 | 42.69 | 42.75 | 42.85 |
| Lacking gene | rpl32 | rpl32 | rpl32 | none | rpl32 | none | none |
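Because each genome consists of one LSC, one SSC, and two IR copies, Table 1 can be cross-checked arithmetically: for C. macrophyllum, 83,583 + 17,264 + 2 × 25,995 = 152,837 bp, and 78 + 4 + 30 = 112 genes. The Python sketch below repeats this check for every row; the dictionary layout and the function name `row_is_consistent` are illustrative only. All seven rows pass, and the LSC lengths span 82,771–83,753 bp.

```python
# Rows transcribed from Table 1:
# (LSC, SSC, IR, total size, protein-coding, rRNA, tRNA, total genes)
TABLE_1 = {
    "C. macrophyllum": (83_583, 17_264, 25_995, 152_837, 78, 4, 30, 112),
    "C. flagelliferum": (82_771, 16_960, 25_974, 151_679, 78, 4, 30, 112),
    "C. alternifolium": (83_524, 17_111, 25_992, 152_619, 78, 4, 30, 112),
    "C. kamtschaticum": (83_175, 16_986, 26_200, 152_561, 79, 4, 30, 113),
    "C. ramosum": (83_670, 17_342, 26_224, 153_460, 78, 4, 30, 112),
    "C. sinicum": (83_745, 17_236, 26_223, 153_427, 79, 4, 30, 113),
    "C. aureobracteatum": (83_753, 17_317, 26_016, 153_102, 79, 4, 30, 113),
}


def row_is_consistent(lsc, ssc, ir, total, protein, rrna, trna, genes):
    """Check the quadripartite length sum and the gene tally for one species."""
    # One LSC, one SSC and two identical IR copies make up the circular genome.
    size_ok = lsc + ssc + 2 * ir == total
    # Gene totals count each IR-duplicated gene only once.
    genes_ok = protein + rrna + trna == genes
    return size_ok and genes_ok


for species, row in TABLE_1.items():
    print(f"{species}: {row_is_consistent(*row)}")  # every row prints True
```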
|
Table 2 Genes encoded in the C. macrophyllum chloroplast genome.
| Group of genes | Gene names |
|---|---|
| tRNA genes | trnH-GUG trnK-UUU* trnQ-UUG trnS-GCU trnG-GCC* trnR-UCU trnC-GCA trnD-GUC trnY-GUA trnE-UUC trnT-GGU trnS-UGA trnG-GCC trnfM-CAU trnS-GGA trnT-UGU trnL-UAA* trnF-GAA trnV-UAC* trnM-CAU trnW-CCA trnP-UGG trnI-CAU(×2) trnL-CAA(×2) trnV-GAC(×2) trnI-GAU*(×2) trnA-UGC*(×2) trnR-ACG(×2) trnN-GUU(×2) trnL-UAG |
| rRNA genes | rrn16(×2) rrn23(×2) rrn4.5(×2) rrn5(×2) |
| Ribosomal small subunit | rps16* rps2 rps14 rps4 rps18 rps12**(×2) rps11 rps8 rps3 rps19 rps7(×2) rps15 |
| Ribosomal large subunit | rpl33 rpl20 rpl36 rpl14 rpl16* rpl22 rpl2(×2) rpl23(×2) |
| DNA-dependent RNA polymerase | rpoC2 rpoC1* rpoB rpoA |
| Photosystem I | psaB psaA psaI psaJ psaC |
| Large subunit of rubisco | rbcL |
| Photosystem II | psbA psbK psbI psbM psbC psbZ psbG psbJ psbL psbF psbE psbB psbT psbN psbH |
| NADH dehydrogenase | ndhJ ndhK ndhC ndhB*(×2) ndhF ndhD ndhE ndhG ndhI ndhA* ndhH |
| Cytochrome b/f complex | petN petA petL petG petB* petD* |
| ATP synthase | atpA atpF* atpH atpI atpE atpB |
| Maturase | matK |
| Subunit of acetyl-CoA carboxylase | accD |
| Envelope membrane protein | cemA |
| Protease | clpP** |
| Translational initiation factor | infA |
| C-type cytochrome synthesis | ccsA |
| Conserved open reading frames (ycf) | ycf3** ycf4 ycf2(×2) ycf1(×2) |
Genes with one or two introns are indicated by one (*) or two asterisks (**), respectively. Genes in the IR regions are followed by the (×2) symbol.
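The footnote convention of Table 2 is regular enough to tally mechanically. The minimal Python sketch below transcribes the Table 2 gene lists, reads `*`/`**` as one or two introns and `(×2)` as a gene duplicated in the IRs, and recovers the 112 genes and 17 intron-containing genes reported for an rpl32-lacking genome; it also counts 18 IR-duplicated genes. The list layout and the helper `parse_gene_token` are hypothetical, for illustration only.

```python
# Gene lists transcribed from Table 2 (C. macrophyllum).
TABLE_2 = [
    # tRNA genes
    "trnH-GUG trnK-UUU* trnQ-UUG trnS-GCU trnG-GCC* trnR-UCU trnC-GCA trnD-GUC",
    "trnY-GUA trnE-UUC trnT-GGU trnS-UGA trnG-GCC trnfM-CAU trnS-GGA trnT-UGU",
    "trnL-UAA* trnF-GAA trnV-UAC* trnM-CAU trnW-CCA trnP-UGG trnI-CAU(×2)",
    "trnL-CAA(×2) trnV-GAC(×2) trnI-GAU*(×2) trnA-UGC*(×2) trnR-ACG(×2)",
    "trnN-GUU(×2) trnL-UAG",
    # rRNA genes
    "rrn16(×2) rrn23(×2) rrn4.5(×2) rrn5(×2)",
    # Protein-coding genes
    "rps16* rps2 rps14 rps4 rps18 rps12**(×2) rps11 rps8 rps3 rps19 rps7(×2) rps15",
    "rpl33 rpl20 rpl36 rpl14 rpl16* rpl22 rpl2(×2) rpl23(×2)",
    "rpoC2 rpoC1* rpoB rpoA",
    "psaB psaA psaI psaJ psaC",
    "rbcL",
    "psbA psbK psbI psbM psbC psbZ psbG psbJ psbL psbF psbE psbB psbT psbN psbH",
    "ndhJ ndhK ndhC ndhB*(×2) ndhF ndhD ndhE ndhG ndhI ndhA* ndhH",
    "petN petA petL petG petB* petD*",
    "atpA atpF* atpH atpI atpE atpB",
    "matK accD cemA clpP** infA ccsA",
    "ycf3** ycf4 ycf2(×2) ycf1(×2)",
]


def parse_gene_token(token):
    """Split a Table 2 token into (name, number_of_introns, duplicated_in_IRs)."""
    in_both_irs = token.endswith("(×2)")
    if in_both_irs:
        token = token[: -len("(×2)")]
    name = token.rstrip("*")
    # Each trailing asterisk marks one intron.
    return name, len(token) - len(name), in_both_irs


genes = [parse_gene_token(t) for line in TABLE_2 for t in line.split()]
n_genes = len(genes)  # each IR-duplicated gene is listed once
n_with_introns = sum(1 for _, introns, _ in genes if introns > 0)
n_ir_duplicated = sum(1 for _, _, dup in genes if dup)
print(n_genes, n_with_introns, n_ir_duplicated)  # 112 17 18
```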
GC Content, Nucleotide Diversity, and Repeat Analysis
Compared with the chloroplast genomes of three non-Chrysosplenium Saxifragaceae species (S. stolonifera, B. scopulosa, and O. rupifraga), the Chrysosplenium chloroplast genomes had the lowest total GC content (<37.5%) (Fig. 2 and Additional file 2: Supplementary Table S5) and the lowest GC content at the third codon position (GC3; <29.7%). Within Chrysosplenium, both total GC content and GC3 were slightly lower in subgenus Oppositifolia than in subgenus Alternifolia.
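Total GC content and GC3 are simple base-composition ratios. The Python sketch below shows one way to compute them; it assumes coding sequences are supplied 5'→3' and in frame, and it ignores ambiguous bases and gaps. The function names are illustrative, and the program actually used in the study is not stated in this section.

```python
def gc_content(seq):
    """Percent G+C among unambiguous A/C/G/T bases (N and gaps are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def gc3(cds_list):
    """GC content at third codon positions, pooled over all coding sequences.

    Assumes every CDS is given 5'->3' and in frame from its first codon;
    a trailing partial codon contributes nothing.
    """
    third_positions = "".join(cds[2::3] for cds in cds_list)
    return gc_content(third_positions)


print(round(gc_content("ATGCGC"), 2))       # 66.67
print(round(gc3(["ATGAAACCG", "TTC"]), 2))  # third positions G, A, G, C -> 75.0
```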
The IR regions were more conserved than the LSC and SSC regions, with average nucleotide diversity (Pi) values of 0.00586 in the IRs, 0.01760 in the LSC, and 0.01900 in the SSC (Additional file 2: Supplementary Table S6 and Additional file 3: Supplementary Figure S2). In the LSC region, psbT had the highest Pi value (0.22159), followed by trnG-GCC (0.10369).
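Nucleotide diversity (Pi) is the mean number of pairwise differences per site among aligned sequences, usually reported along the genome in sliding windows. The Python sketch below is a minimal illustration: the complete-deletion handling of gaps and the 600-bp window with a 200-bp step are assumed for illustration and are not the study's stated settings.

```python
from itertools import combinations

ACGT = set("ACGT")


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences per site in an alignment.

    Columns with a gap or ambiguous base in any sequence are dropped
    (complete deletion), one common convention among several.
    """
    if len(seqs) < 2:
        return 0.0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    columns = [col for col in zip(*(s.upper() for s in seqs)) if set(col) <= ACGT]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    differences = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return differences / (len(pairs) * len(columns))


def sliding_pi(seqs, window=600, step=200):
    """Pi in overlapping windows, reported by 1-based window start (assumed window/step)."""
    length = len(seqs[0])
    starts = range(0, max(length - window, 0) + 1, step)
    return [(start + 1, nucleotide_diversity([s[start:start + window] for s in seqs]))
            for start in starts]


# 4 differences over 3 pairs and 4 sites
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```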
Among the mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs in the chloroplast genomes of the Chrysosplenium species, mononucleotide repeats were the most common (Additional file 4: Supplementary Table S7 and Additional file 5: Supplementary Figure S3A), accounting for 42.42% (C. sinicum) to 61.29% (C. flagelliferum) of SSRs. Hexanucleotide repeats accounted for the lowest proportion of SSRs in C. ramosum, C. sinicum, and C. flagelliferum. The Chrysosplenium species contained fewer SSRs than B. scopulosa and O. rupifraga. Among the four repeat types (forward, reverse, complement, and palindromic), palindromic repeats were the most common, ranging from 42.42% in C. macrophyllum to 53.13% in C. aureobracteatum (Additional file 4: Supplementary Table S8 and Additional file 5: Supplementary Figure S3B).
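SSR detection amounts to scanning for short motifs repeated in tandem above a minimum copy number, and the four long-repeat categories (as reported by tools such as REPuter) differ only in how the second copy relates to the first. The Python sketch below illustrates both steps. The MISA-style thresholds in `MIN_COPIES` are an assumption, since the study's settings are not given here, and all function names are hypothetical.

```python
# Minimum number of tandem copies per motif length. These MISA-style
# thresholds are an assumption; the study's exact settings are not given here.
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_compound_motif(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return (1-based start, motif, copies) for perfect SSRs with 1-6 bp motifs."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        step = 1
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= set("ACGT") or is_compound_motif(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= MIN_COPIES[k]:
                hits.append((i + 1, motif, copies))
                step = copies * k  # resume scanning after this SSR
                break
        i += step
    return hits


def repeat_type(a, b):
    """Classify two equal-length repeat copies into the four repeat categories."""
    if b == a:
        return "forward"
    if b == a[::-1].translate(COMPLEMENT):
        return "palindromic"  # reverse complement
    if b == a[::-1]:
        return "reverse"
    if b == a.translate(COMPLEMENT):
        return "complement"
    return None


print(find_ssrs("GC" + "A" * 12 + "GC"))  # [(3, 'A', 12)]
print(repeat_type("AACG", "CGTT"))        # palindromic
```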
Boundary Regions and Comparative Analysis
Comparison of the Chrysosplenium chloroplast genomes showed that the IRb/LSC junction lies between rpl2 and rps19 in most species (Fig. 3). The overlap between the ycf1 pseudogene and ndhF occurred at different positions among the Chrysosplenium species: within the SSC region in C. ramosum, and at the IRb/SSC border in the other five species. C. alternifolium lacked the ycf1 pseudogene. The ycf1 gene spanned the SSC/IRa boundary, with lengths of 5,402–5,546 bp. In all seven Chrysosplenium species, trnH was located in the LSC region, 2–19 bp from the IRa/LSC border.
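Junction analyses like those in Fig. 3 reduce to interval overlap once region coordinates are known. The Python sketch below derives LSC/IRb/SSC/IRa coordinates from the C. macrophyllum lengths in Table 1 and reports which regions a gene overlaps. The gene coordinates in the example are hypothetical, and genes spanning the IRa/LSC junction at the origin of the circular genome are not handled.

```python
def region_bounds(lsc, ir, ssc):
    """1-based inclusive coordinates of LSC, IRb, SSC and IRa in the usual order."""
    bounds, start = [], 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        bounds.append((name, start, start + length - 1))
        start += length
    return bounds


def regions_spanned(gene_start, gene_end, bounds):
    """Regions overlapped by a gene; more than one means it crosses a junction.

    Linear coordinates only: a gene wrapping past the end of IRa into the
    start of the LSC (circular junction) is not handled here.
    """
    return [name for name, lo, hi in bounds if gene_start <= hi and gene_end >= lo]


# Region lengths of C. macrophyllum from Table 1; the gene coordinates
# below are hypothetical and only illustrate the overlap test.
bounds = region_bounds(lsc=83_583, ir=25_995, ssc=17_264)
print(bounds[-1])                                 # ('IRa', 126843, 152837)
print(regions_spanned(121_000, 126_900, bounds))  # ['SSC', 'IRa']
```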
When the genome boundaries of the Chrysosplenium species were compared with those of the three non-Chrysosplenium species of Saxifragaceae, ndhF was found at the IRb/SSC boundary in most Chrysosplenium species, except C. ramosum, which showed contraction of the SSC and expansion of IRb. S. stolonifera also differed slightly from the other two non-Chrysosplenium species of Saxifragaceae: contraction of its LSC region placed rpl22 at the IRb/LSC junction and the entire rps19 gene within IRb. Pseudogenes of rps19 were also found in the IRa region of S. stolonifera and B. scopulosa. Combining these data with the phylogenetic tree of the three clades (S. stolonifera; B. scopulosa and O. rupifraga; and Chrysosplenium) inferred from whole-chloroplast protein-coding genes (Fig. 3) showed that chloroplast genome structure is not strongly conserved within Chrysosplenium, although gene content is.
LAGAN and Shuffle-LAGAN alignments yielded very similar patterns of divergence among the chloroplast genomes of Saxifragaceae species (Fig. 4 and Additional file 6: Supplementary Figure S4). Compared with the three non-Chrysosplenium species of Saxifragaceae, the chloroplast genomes of the Chrysosplenium species were more conserved. Intergenic spacer (IGS) regions showed the highest levels of divergence: trnK–rps16, rps16–trnQ, rpoB–trnC, petN–psbM, trnT–psbD, psbZ–trnG, trnT–trnL, accD–psaI, ycf4–cemA, ndhF–rpl32, and rps15–ycf1. We also found several highly variable coding sequences (ndhD, ycf2, ndhA, and ycf1), and the IR regions were more conserved than the LSC and SSC regions in all species examined. Slight differences in rpoC2, ycf2, and ycf1 corresponded to the division between subgenera Alternifolia and Oppositifolia.
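VISTA-style divergence plots built from pairwise alignments (such as LAGAN or Shuffle-LAGAN output) summarise percent identity to a reference genome in sliding windows and flag windows above a conservation cut-off. The Python sketch below illustrates the idea on a single aligned pair. The 100-bp window, 50-bp step, and 70% cut-off are assumed values for illustration, not settings taken from the study, and the function names are hypothetical.

```python
def windowed_identity(ref, query, window=100, step=50):
    """Percent identity of query vs ref in sliding windows over a pairwise alignment.

    Columns where ref has a gap are dropped, so windows follow reference
    coordinates; a gap in query counts as a mismatch.
    """
    if len(ref) != len(query):
        raise ValueError("input must be a pairwise alignment of equal length")
    cols = [(r.upper(), q.upper()) for r, q in zip(ref, query) if r != "-"]
    if not cols:
        return []
    profile = []
    for start in range(0, max(len(cols) - window, 0) + 1, step):
        chunk = cols[start:start + window]
        identical = sum(r == q for r, q in chunk)
        profile.append((start + 1, 100.0 * identical / len(chunk)))
    return profile


def conserved_windows(profile, cutoff=70.0):
    """Window starts whose identity meets the (assumed) conservation cut-off."""
    return [start for start, identity in profile if identity >= cutoff]


profile = windowed_identity("A" * 200, "A" * 150 + "T" * 50)
print(profile)                     # [(1, 100.0), (51, 100.0), (101, 50.0)]
print(conserved_windows(profile))  # [1, 51]
```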
Selective Pressure Analyses
We calculated Ka/Ks, the ratio of the rate of non-synonymous substitutions (Ka) to the rate of synonymous substitutions (Ks), at the species level by concatenating all 79 protein-coding genes into a supermatrix. In the Chrysosplenium species, Ka/Ks ratios were around 0.2, well below 1, suggesting that at the whole-chloroplast protein level the Chrysosplenium species have been subject to strong purifying selection (Fig. 5 and Additional file 4: Supplementary Tables S9 and S10).
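This section does not name the program used for Ka/Ks. As one standard approach, the Python sketch below implements Nei–Gojobori (1986) counting with a Jukes–Cantor correction for two aligned, in-frame coding sequences; a concatenated supermatrix can be passed as a single pair of strings. Simplifications are assumed: mutations to stop codons count as non-synonymous sites, codons with gaps or ambiguities are skipped, and no transition/transversion bias is modelled. All function names are illustrative.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (amino-acid assignments also used by plastids); '*' = stop.
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Expected synonymous sites of a codon; mutations to stops count as non-synonymous."""
    aa = CODE[codon]
    return sum(CODE[codon[:p] + b + codon[p + 1:]] == aa
               for p in range(3) for b in BASES if b != codon[p]) / 3


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences averaged over stop-free mutational paths."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for p in order:
            nxt = current[:p] + c2[p] + current[p + 1:]
            if CODE[nxt] == "*":
                break  # paths through a stop codon are discarded
            syn += CODE[nxt] == CODE[current]
            non += CODE[nxt] != CODE[current]
            current = nxt
        else:
            syn_total, non_total, n_paths = syn_total + syn, non_total + non, n_paths + 1
    if n_paths == 0:
        return 0.0, float(len(positions))  # fallback when every path hits a stop
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Nei-Gojobori Ka, Ks and Ka/Ks for two aligned in-frame CDSs (or a supermatrix)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps/ambiguities and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sites_s += s
        sites_n += 3 - s
        ds, dn = codon_differences(c1, c2)
        diff_s += ds
        diff_n += dn
    if sites_s == 0 or sites_n == 0:
        raise ValueError("no comparable codons")
    ka, ks = jukes_cantor(diff_n / sites_n), jukes_cantor(diff_s / sites_s)
    return ka, ks, (ka / ks if ks > 0 else math.nan)
```

A ratio well below 1, such as the ~0.2 reported above, is the usual signature of purifying selection, whereas values above 1 suggest positive selection.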
Ka/Ks ratios were also calculated separately for each of the 79 protein-coding genes of the ten chloroplast genomes of Chrysosplenium (Fig. 6 and Additional file 4: Supplementary Table S7). Two genes (matK and ycf2) had Ka/Ks ratios close to 1.0 in most species, suggesting relaxed selective constraint or possible positive selection. In particular, matK showed an average Ka/Ks ratio of 0.74 in comparisons with C. ramosum, and among the Chrysosplenium species ycf2 often had a ratio higher than 0.8. Most other genes had Ka/Ks ratios of 0.1–0.3, implying strong purifying selection (Additional file 4: Supplementary Table S10).
Sixty-six single-copy genes were used to estimate selective pressure with the branch-site model (Additional file 4: Supplementary Table S9). The matK gene was positively selected in Chrysosplenium (p = 0.022), with a Bayes Empirical Bayes (BEB) posterior probability greater than 0.972 for one amino acid site (117S, from polar Ser to non-polar Val). The ycf2 gene was also positively selected in Chrysosplenium (p = 0.00003), with a BEB posterior probability of 0.953 at site 1,028K (from Lys to Leu). In addition, positively selected sites were detected in 18 genes (atpB, atpE, atpF, atpI, cemA, clpP, matK, ndhC, ndhE, ndhF, ndhH, ndhK, petA, psaB, psbH, psbJ, psbN, rps14, rps16) (Fig. 7 and Additional file 4: Supplementary Table S9).
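The p-values of a branch-site test come from a likelihood-ratio test between a null model, in which ω on the foreground branch is fixed at 1, and an alternative model in which it may exceed 1. This is typically run in PAML codeml, but the program is not named in this section, so that is an assumption. The Python sketch below converts the two log-likelihoods into a p-value using χ² with 1 degree of freedom, or optionally the 50:50 mixture of a point mass at 0 and χ²(1). The log-likelihood inputs are hypothetical; BEB posterior probabilities for individual sites come from the fitting program's output and are not computed here.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt, mixture=False):
    """Likelihood-ratio test for the branch-site model of positive selection.

    lnl_null: log-likelihood of the null model (foreground omega fixed at 1).
    lnl_alt: log-likelihood of the alternative model (foreground omega free).
    The statistic 2*(lnl_alt - lnl_null) is compared with chi-square (df = 1);
    mixture=True uses the 50:50 mixture of a point mass at 0 and chi-square(1),
    which halves the p-value for a positive statistic.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p = math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square, df = 1
    if mixture and stat > 0:
        p /= 2.0
    return stat, p


# Hypothetical log-likelihoods, for illustration only.
stat, p = branch_site_lrt(lnl_null=-5000.0, lnl_alt=-4997.5)
print(stat, p)
```

The plain χ²(1) reference is the more conservative choice, since it gives twice the mixture p-value.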
Phylogenetic Analysis
Phylogenetic analyses yielded a well-supported phylogeny of Saxifragales, with most nodes receiving maximum likelihood (ML) bootstrap support values >95 and Bayesian inference (BI) posterior probabilities of 1 (Fig. 8). The ML and BI topologies were identical. The topology of Saxifragales recovered here was similar to that of the APG IV system [6], with Saxifragaceae phylogenetically close to Iteaceae. Chrysosplenium was divided into two clades corresponding to the two subgenera (Alternifolia and Oppositifolia).