Structural features of Subtribe Swertiinae chloroplast genomes
In this study, we analyzed the features and gene content of the chloroplast genomes of 34 species from 9 genera of Subtribe Swertiinae (Table 1 and Table S1). All 34 chloroplast genomes had the typical quadripartite structure found in most angiosperm chloroplast genomes (Fig. 1). Each consisted of a large single-copy (LSC) region (80,432–84,153 bp), a small single-copy (SSC) region (17,887–18,476 bp) and two inverted repeat (IR) regions (25,069–26,126 bp). Genome length varied among genera and species, from 149,036 bp (S. pubescens) to 154,365 bp (Pterygocalyx volubilis), with an average of 152,274 bp (Table 1). The longest genome (154,365 bp) was 0.614–5.329 kb longer than the other Subtribe Swertiinae chloroplast genomes. GC content was very similar across the 34 species, both for the whole genome (37.50–38.26%) and for each region (LSC 32.18–36.36%, SSC 30.39–33.66%, IR 42.16–43.38%). The IR regions had the highest GC content (Table 1).
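The GC percentages above are base-composition ratios calculated for the whole genome and for each region. The following is a minimal sketch that assumes region boundaries are given as 1-based inclusive coordinates. The region names and the toy sequence in the usage line are illustrative only and are not values from Table 1.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100 * (seq.count("G") + seq.count("C")) / acgt


def region_gc(genome: str, regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """GC% for each named region; coordinates are assumed 1-based and inclusive."""
    return {name: gc_content(genome[start - 1:end]) for name, (start, end) in regions.items()}


if __name__ == "__main__":
    toy = "ATATGCGCAT"  # hypothetical 10-bp sequence, not real data
    print(region_gc(toy, {"LSC": (1, 4), "IRb": (5, 8)}))  # {'LSC': 0.0, 'IRb': 100.0}
```

Ambiguous bases such as N are excluded from the denominator. This choice is an assumption, and other tools may count them differently.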
Gene content varied slightly among the 34 chloroplast genomes. It ranged from 129 genes (Gentianopsis paludosa) to 134 genes (G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana) (Table 1). Accordingly, the number of protein-coding genes ranged from 84 to 89. In contrast, the numbers of tRNA genes (37) and rRNA genes were relatively conserved among species (Table S1). Four pseudogenes (rps16, infA, ycf1 and rps19) were found among the protein-coding genes. Some differences in gene content came from gene loss: rpl33 was absent in S. dilatata, S. hispidicalyx, P. volubilis and C. pulmonarium, rpl2 in C. falcatum, and ycf15 in G. paludosa. All other differences came from the presence or absence of these four pseudogenes. For example, the chloroplast genome of Lomatogoniopsis alpina contained 131 genes because it lacked the rps16, ycf1 and rps19 pseudogenes (Table S1). In H. elliptica, Veratrilla baillonii and S. punicea, 18 genes contained introns (trnK-UUU, rps16, trnG-UCC, atpF, rpoC1, ycf3, trnL-UAA, trnV-UAC, rps12, clpP, petB, petD, rpl16, rpl2, ndhB, trnI-GAU, trnA-UGC and ndhA). In the remaining 31 species, only 17 genes contained introns, because rps16 was either absent or lacked an intron. Two protein-coding genes, ycf3 and clpP, contained two introns in all 34 chloroplast genomes (Table S2).
The functions of major genes in the chloroplast genome of Subtribe Swertiinae could be roughly divided into three categories (Table 2): photosynthesis-related genes, chloroplast self-replication-related genes and other genes. Genes associated with photosynthesis and self-replication made up the majority of the chloroplast genome.
Table 2
Gene composition of the chloroplast genomes of 33 species from 8 genera in Subtribe Swertiinae.
Category | Group of genes | Name of genes |
Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ |
Photosynthesis | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
Photosynthesis | NADH dehydrogenase | ndhA*, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
Photosynthesis | Cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
Photosynthesis | ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
Self-replication | Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7, rps8, rps11, rps12#, rps14, rps15, rps16*, rps18, rps19 |
Self-replication | Ribosomal proteins (LSU) | rpl2*, rpl14, rpl16*, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
Self-replication | Ribosomal RNAs | rrn4.5¹, rrn5¹, rrn16¹, rrn23¹ |
Self-replication | Transfer RNAs | tRNA-Lys*, tRNA-Gln, tRNA-Ser, tRNA-Gly*, tRNA-Arg, tRNA-Cys, tRNA-Asp, tRNA-Tyr, tRNA-Glu, tRNA-Thr, tRNA-Ser, tRNA-Gly, tRNA-Met, tRNA-Ser, tRNA-Thr, tRNA-Leu, tRNA-Phe, tRNA-Val, tRNA-Gly, tRNA-Met, tRNA-Trp, tRNA-Pro, tRNA-Ile, tRNA-Leu*, tRNA-Val*, tRNA-His, tRNA-Ile*¹, tRNA-Ala*¹, tRNA-Arg¹, tRNA-Asn¹, tRNA-Leu, tRNA-Asn, tRNA-Arg, tRNA-Ala, tRNA-Ile, tRNA-His |
Self-replication | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
Other genes | Maturase | matK |
Other genes | Protease | clpP** |
Other genes | Envelope membrane protein | cemA |
Other genes | Subunit of acetyl-CoA carboxylase | accD |
Other genes | c-Type cytochrome synthesis gene | ccsA |
Genes of unknown function | Conserved open reading frames | ycf1, ycf2ᵃ, ycf3**, ycf4, ycf15 |
Note: * indicates a gene with one intron, ** a gene with two introns, and # a trans-spliced gene.
SSR and Codon usage analysis
The number of SSRs identified in the 34 Subtribe Swertiinae chloroplast genomes ranged from 36 (S. bifolia and S. erythrosticta) to 63 (S. cordata) (Fig. 2). Six types of SSRs were found, from mononucleotide to hexanucleotide repeats. Their numbers and types differed among the 34 chloroplast genomes. Among mononucleotide repeats, A/T repeats were dominant (50–82.22%), whereas C/G repeats were rare (0–10.53%). Dinucleotide (1.89–11.63%), trinucleotide (4.35–19.44%) and pentanucleotide (3.92–20.00%) repeats were found in all samples. Tetranucleotide repeats were identified in only eighteen samples, and hexanucleotide repeats in only nine (Fig. 3 and Table S3).
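SSR counts like these depend on the minimum number of repeat units required for each motif length, and the section does not report these thresholds. The following is a minimal sketch that finds perfect mono- to hexanucleotide repeats in a sequence and tallies them by type. The values in `MIN_REPEATS` are hypothetical. Compound and imperfect SSRs are ignored, and where hits overlap, the shorter motif wins.

```python
import re
from collections import Counter

# Hypothetical minimum repeat numbers per motif length (bp); not taken from the study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
TYPE_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(motif: str) -> bool:
    """True unless the motif is a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for non-overlapping perfect SSRs."""
    seq = seq.upper()
    taken = [False] * len(seq)
    hits = []
    for k, n in MIN_REPEATS.items():  # shorter motifs are searched first
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif) or any(taken[m.start():m.end()]):
                continue  # e.g. 'AA' x6 is already reported as an 'A' mononucleotide run
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
            for i in range(m.start(), m.end()):
                taken[i] = True
    return sorted(hits)


def ssr_type_counts(hits) -> Counter:
    """Tally SSRs by repeat type (mono, di, ..., hexa)."""
    return Counter(TYPE_NAMES[len(motif)] for _, motif, _ in hits)
```

For example, a toy sequence with a 12-bp A run followed by six AT units yields one mononucleotide and one dinucleotide SSR.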
Codon usage frequency in the 34 Subtribe Swertiinae chloroplast genomes was analyzed using the protein-coding sequences (CDSs). The number of codons in protein-coding genes ranged from 20,531 (S. tetraptera) to 26,402 (H. elliptica). In all species, serine (Ser; 1,075–2,268 instances) was the most abundant amino acid, followed by arginine (Arg; 1,137–2,244 instances). Both amino acids are encoded by six codons (Table S4). In contrast, methionine and tryptophan are each encoded by a single codon, with 219–610 and 387–605 instances, respectively. They therefore showed no codon usage bias (RSCU = 1). Across the 34 genomes, the arginine codon AGA had the largest RSCU values (1.70–2.11), and the leucine codon CUG had the smallest (0.31–0.80). Of the 64 codons, 26 had RSCU values greater than 1. Twenty-three of these 26 codons ended with A or U, indicating a preference for A/U-ending codons in these chloroplast genomes (Fig. 3, Table S4).
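RSCU compares the observed count of a codon with the count expected if all synonymous codons for its amino acid were used equally. It is calculated as the observed count divided by (total codons for that amino acid / number of synonymous codons). Values above 1 mark preferred codons, and single-codon amino acids such as Met and Trp score exactly 1 whenever they occur.

The following is a minimal sketch. It assumes in-frame CDS strings; U is converted to T so the DNA alphabet can be used. It also assumes the standard codon assignments, which the plastid genetic code (NCBI table 11) shares.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, ...); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Relative synonymous codon usage over a collection of in-frame CDSs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODE:  # skips codons with gaps or ambiguous bases
                counts[codon] += 1
    synonyms = defaultdict(list)
    for codon, aa in CODE.items():
        synonyms[aa].append(codon)
    result = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            result[c] = counts[c] / expected
    return result
```

Pooling counts across all CDSs of a genome before dividing gives one RSCU value per codon per genome. This matches the genome-level ranges reported above.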
Comparative genome analysis
We used the online tool mVISTA, with the V. baillonii chloroplast genome as the reference, to identify potentially divergent sequences among the 34 Subtribe Swertiinae chloroplast genomes. The structures and sequences of the Subtribe Swertiinae chloroplast genomes were conserved, especially in the IR regions (Fig. 4). We also used DnaSP to calculate the variation rates of coding and noncoding regions. Variation rates were generally higher in noncoding regions than in coding regions (Fig. 5). They ranged from 11.11% to 99.28% (mean 63.98%) in noncoding regions and from 5.78% to 88.97% (mean 25.39%) in coding regions. Both coding and noncoding variation rates were lower in the IR regions than in the single-copy regions. The noncoding intergenic regions were highly divergent, especially trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, accD-psaI, psbH-petB, rpl36-infA and rps15-ycf1. Highly divergent regions were also found within protein-coding genes, such as ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1. No genomic rearrangements were detected in the alignment of the 34 Subtribe Swertiinae chloroplast genomes.
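The section reports "variation rates" from DnaSP but does not define them. The following is a minimal sketch that assumes the rate is the proportion of variable (polymorphic) sites in an aligned region. It also computes nucleotide diversity (π, the mean number of pairwise differences per site), which DnaSP also reports. Columns with gaps or ambiguous bases are skipped, which is another assumption.

```python
from itertools import combinations


def variable_site_stats(alignment: list[str]) -> tuple[float, float]:
    """Return (proportion of variable sites, nucleotide diversity pi) for an alignment."""
    if len(alignment) < 2 or len({len(s) for s in alignment}) != 1:
        raise ValueError("need at least two sequences of equal aligned length")
    n_pairs = len(alignment) * (len(alignment) - 1) / 2
    used = variable = 0
    pi_sum = 0.0
    for column in zip(*alignment):
        col = [b.upper() for b in column]
        if any(b not in "ACGT" for b in col):
            continue  # skip gapped or ambiguous columns
        used += 1
        if len(set(col)) > 1:
            variable += 1
        # mean pairwise difference at this site
        pi_sum += sum(a != b for a, b in combinations(col, 2)) / n_pairs
    if used == 0:
        raise ValueError("no usable alignment columns")
    return variable / used, pi_sum / used
```

In practice, these statistics would be computed per region (each CDS or intergenic spacer) so that the coding and noncoding distributions can be compared.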
Gene Selective Pressure Analysis
To estimate the selection pressure on chloroplast genes, we calculated the nonsynonymous (Ka) and synonymous (Ks) substitution rates of 80 protein-coding genes by comparing L. alpina with the 33 other species in Subtribe Swertiinae. For 63 of these genes, Ka/Ks could not be calculated because Ka or Ks was 0, meaning that no nonsynonymous or no synonymous substitutions were detected. For the remaining 17 genes, the mean Ka/Ks ratio between L. alpina and the 33 other species ranged from 0.01 (rpl14) to 2.34 (psbB) (Fig. 6). Most genes had Ka/Ks ratios below 1, indicating purifying (negative) selection. The exceptions were ccsA and psbB, whose ratios above 1 (Ka/Ks > 1) suggest positive selection.
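The interpretation above (Ka/Ks < 1 for purifying selection, > 1 for positive selection) rests on per-gene estimates of Ka and Ks. The section does not state which estimator was used. The following is a minimal sketch of one classic method, Nei–Gojobori counting with a Jukes–Cantor correction, applied to two aligned, gap-free CDSs. It returns `None` for the ratio when Ks is 0, because the ratio is then undefined. Counting mutations to stop codons as nonsynonymous sites and discarding pathways through stop codons are simplifying choices made here. Dedicated tools may use other models and give different values.

```python
import itertools
import math

BASES = "TCAG"
# Codon table in TCAG order; plastid code (NCBI table 11) has the same amino-acid assignments.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def codon_sites(codon):
    """(synonymous, nonsynonymous) sites of one codon; changes to stop count as nonsynonymous."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """(syn, nonsyn) differences averaged over all mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    paths = 0
    for order in itertools.permutations(positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only if the pathway was not discarded
            syn_total += syn
            non_total += non
            paths += 1
    if paths == 0:
        return 0.0, float(len(positions))
    return syn_total / paths, non_total / paths


def jukes_cantor(p):
    arg = 1 - 4 * p / 3
    if arg <= 0:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(arg)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned CDSs; the ratio is None when Ks == 0."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned CDSs of equal length (multiple of 3)")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gaps, ambiguous bases and stop codons
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else None)
```

A single synonymous change (TTA to TTG, both Leu) gives Ka = 0 and a ratio of 0. A single nonsynonymous change (ATG to ATA) gives Ks = 0, so the ratio is reported as undefined.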
Phylogenetic Analysis
We constructed phylogenetic trees from the complete chloroplast genome sequences and from 80 shared protein-coding sequences (CDSs) of the 34 Subtribe Swertiinae species, using G. straminea, G. ovata and A. microlobus as outgroups. Trees built from the whole chloroplast genomes and from the CDSs had the same topology (Figure S1). The Bayesian trees showed that all species of Subtribe Swertiinae formed a monophyletic clade with maximal support (Bayesian posterior probability, PP = 1; Fig. 7). This well-supported clade was divided into two major clades (A and B). Clade A was located at the base of the tree and was divided into two subclades (A1 and A2). Subclade A1 (P. volubilis) was sister to subclade A2, which consisted of three Gentianopsis species and V. baillonii. Interestingly, G. paludosa did not cluster with the other two Gentianopsis species but with V. baillonii, indicating that G. paludosa is closely related to V. baillonii. Clade B contained 29 species from the remaining 6 genera of Subtribe Swertiinae. These formed three main branches in the tree (B1, B2 and B4). B1 comprised subgen. Swertia. B2 comprised genus Halenia, Swertia dichotoma, genus Sinoswertia and Swertia bimaculata. B4 comprised subgen. Ophelia, genus Comastoma, L. alpina and L. perenne.
IR Contraction and Expansion
We used the IRscope online tool (https://irscope.shinyapps.io/irapp/) to visualize differences at the four junctions between the LSC, SSC and IR regions. Comparison of all Subtribe Swertiinae plastomes with the three outgroups revealed relatively stable IRs, with little expansion or contraction (Fig. 8). In these 37 plastomes, the LSC/IRb border was located within the rps19 gene, except in L. perenne, Halenia elliptica and G. ovata. In the outgroup G. ovata, this border was located within the ndhB gene. In L. perenne, it was located within the rpl22 gene, a shift of 59 bp. In H. elliptica, the LSC/IRb border was also located within the rpl22 gene, which had undergone contraction. The IRb/SSC border was positioned within the ndhF gene, within the ycf1 pseudogene, or in the intergenic spacer between the ycf1 pseudogene and ndhF. The exact position of the IRb/SSC border shifted by 10 bp in C. falcatum, 8 bp in S. cincta, 4 bp in S. mussotii, 9 bp in S. dichotoma, 5 bp in S. przewalskii, 15 bp in S. erythrosticta, 10 bp in S. cordata and 3 bp in the outgroup A. microlobus. With a few exceptions, the SSC/IRa border in all Subtribe Swertiinae plastomes was located inside the ycf1 gene, and the length of the ycf1 sequence varied among species. In the chloroplast genomes of most Subtribe Swertiinae species, the IRa/LSC border was located at the junction of the trnH gene and the rps19 pseudogene. In the L. perenne chloroplast genome, trnH lay well inside the LSC region, and the rps19 pseudogene was positioned at the IRa/LSC border. In the chloroplast genomes of V. baillonii, L. alpina, G. paludosa, G. barbata, C. pulmonarium, S. przewalskii, S. nervosa, S. multicaulis and S. cordata, the rps19 pseudogene was lost, and the IRa/LSC border was positioned at the trnH gene.
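IRscope draws each junction relative to the genes that span or flank it, reporting how many bases of a spanning gene fall on each side. The following is a minimal sketch of that bookkeeping; it is not IRscope's own code. It assumes genes are given as 1-based inclusive intervals (strand ignored) and that a junction is identified by the last base of the upstream region. `EXAMPLE_GENES` uses hypothetical coordinates that are not taken from any genome in this study.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def junction_context(last_upstream_base: int, genes: list[Gene]) -> dict:
    """Genes spanning a junction (bp on each side) plus nearest flanking genes and gaps."""
    j = last_upstream_base
    spanning = [(g.name, j - g.start + 1, g.end - j) for g in genes if g.start <= j < g.end]
    upstream = [g for g in genes if g.end <= j]
    downstream = [g for g in genes if g.start > j]
    up = max(upstream, key=lambda g: g.end, default=None)
    down = min(downstream, key=lambda g: g.start, default=None)
    return {
        "spanning": spanning,
        # intergenic bases between the nearest gene and the junction
        "upstream": (up.name, j - up.end) if up else None,
        "downstream": (down.name, down.start - j - 1) if down else None,
    }


# Hypothetical layout around an LSC/IRb junction (illustrative coordinates only).
EXAMPLE_GENES = [Gene("rpl22", 83500, 83950), Gene("rps19", 84001, 84279), Gene("rpl2", 84330, 85800)]
```

With a junction after base 84200, `junction_context(84200, EXAMPLE_GENES)` reports rps19 as spanning the border, with 200 bp on the LSC side and 79 bp on the IR side. A shift of the border, like the 59 bp reported for L. perenne, would appear as a change in these numbers.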