DOI: https://doi.org/10.21203/rs.3.rs-2403178/v1
Subtribe Swertiinae, a member of Gentianaceae, is one of the most taxonomically difficult groups in the family. Its intergeneric and infrageneric classification and phylogenetic relationships remain controversial and unresolved.
With the aim of clarifying the circumscription of taxa within the Subtribe Swertiinae, comparative and phylogenetic analyses were conducted using 34 Subtribe Swertiinae chloroplast genomes (4 newly sequenced) representing 9 genera.
The 34 chloroplast genomes of Subtribe Swertiinae ranged in size from 149,036 to 154,365 bp. Each comprised two inverted repeat regions (25,069–26,126 bp) that separated a large single-copy region (80,432–84,153 bp) from a small single-copy region (17,887–18,476 bp), and all chloroplast genomes showed similar gene order, content, and structure. These chloroplast genomes contained 129–134 genes each, including 84–89 protein-coding genes, 30 tRNAs, and 4 rRNAs. Some genes, such as rpl33, rpl2 and ycf15, appeared to have been lost from some chloroplast genomes of Subtribe Swertiinae. Nineteen hypervariable regions, including trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, trnC-GCA-petN, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, ndhC-trnV-UAC, accD-psaI, psbH-petB, rpl36-infA, rps15-ycf1, ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1, were screened, and 36–63 SSRs were identified as potential molecular markers. Positive selection analyses showed that two genes (ccsA and psbB) had high Ka/Ks ratios, indicating that these chloroplast genes may have undergone positive selection during their evolutionary history. Phylogenetic analysis showed that the 34 Subtribe Swertiinae species formed a monophyletic clade comprising two evident subbranches, and that Swertia was paraphyletic with respect to other related genera, with its species distributed in different clades.
These results provide valuable information to elucidate the phylogeny, divergence time and evolution process of Subtribe Swertiinae.
Subtribe Swertiinae belongs to Gentianaceae and comprises approximately 539–565 species. It is widely distributed in temperate and alpine regions around the world but is rare in tropical and subtropical regions at low latitudes. East Asia and North America are the centers of diversification of this subtribe, with 137 species of 11 genera in China [1]. Many species of Subtribe Swertiinae, such as Halenia elliptica, Comastoma pedunculatum, Gentianopsis paludosa, Lomatogonium carinthiacum, Swertia mussotii and S. franchetiana, are source plants of the Tibetan medicine "Dida" (Zangyinchen). "Dida" is one of the most representative and commonly used medicinal materials in Tibetan medicine and has various effects, such as clearing the liver and gallbladder, diuresis, strengthening muscles and bones, and hemostasis. Clinically, it is widely used in the treatment of acute jaundice hepatitis, viral hepatitis, cholecystitis, urinary tract infection, blood disease, fall injury, dysentery, edema, influenza and other diseases. According to preliminary statistics, approximately 15% of Tibetan medicine prescriptions include "Dida", such as the 25 flavours of coral pill, the 25 flavours of Swertia pill and the Ganlu ling pill. "Dida" is also the main or an accompanying ingredient in modern Tibetan medicine preparations, such as Zangyin Chen tablet (capsule), Gantaishu capsule, Zangjiangzhi capsule and fluan pill. Therefore, increasing attention has been given to the plants of Subtribe Swertiinae due to their extensive pharmacological effects. However, the relationships within Subtribe Swertiinae remain poorly understood, especially between genera [2–5]. Struwe et al. (2002) [1] divided Subtribe Swertiinae into 14 genera based on morphological characters, a treatment that was accepted by later researchers [3, 6]. Subsequently, Ho and Liu (2015) [7] added two newly published genera, Lomatogoniopsis and Sinoswertia, to Subtribe Swertiinae.
Subtribe Swertiinae thus contains 16 genera, of which 13 are native to China, including three Chinese endemic genera. Several recent phylogenetic studies have tried but failed to resolve the relationships among the 16 genera in Subtribe Swertiinae [4, 5, 8]. Moreover, current taxonomic hypotheses regarding the relationships within and between genera of Subtribe Swertiinae rely on morphological characters and a few fragments of chloroplast DNA (cpDNA) sequences [4–5]. Therefore, additional molecular markers are needed for phylogenetic analysis to resolve the interspecific relationships and evolutionary history of Subtribe Swertiinae.
Because the chloroplast genome is the second largest genome after the nuclear genome and its nucleotide substitution rates are moderate, it has a significant advantage in phylogenetic studies at both higher taxonomic levels and the species level [9–13]. In addition, comparative analysis of chloroplast genomes provides essential insights into the organization and evolutionary history of taxonomically related species [14–17]. Herein, we conducted comparative analyses of the chloroplast genomes of 34 selected Subtribe Swertiinae species, representing 9 genera, for which complete chloroplast sequences were available (Table 1). The study objectives were to (1) identify the structure and characteristics of chloroplast genomes within Subtribe Swertiinae; (2) explore the intergeneric and interspecific relationships of Subtribe Swertiinae; and (3) identify genes that are potentially under positive selection, negative selection or neutral evolution and that could be targeted for evolutionary studies in Gentianaceae.
Species | Total length (bp) | Total GC (%) | LSC length (bp) | LSC GC (%) | SSC length (bp) | SSC GC (%) | IR length (bp) | IR GC (%) | GenBank accession number | Total genes | tRNA genes | rRNA genes | Protein-coding genes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Comastoma falcatum | 151,423 | 38.26 | 81,721 | 36.34 | 18,248 | 31.78 | 25,727 | 43.59 | MK331815 | 132 | 37 | 8 | 87 |
Comastoma pulmonarium | 151,595 | 38.25 | 81,919 | 36.30 | 18,280 | 31.79 | 25,698 | 43.69 | MW324577 | 130 | 37 | 8 | 85 |
Gentianopsis barbata | 151,123 | 37.85 | 82,690 | 35.80 | 17,887 | 31.77 | 25,273 | 43.34 | MZ579704 | 131 | 37 | 8 | 86 |
Gentianopsis grandis | 151,271 | 37.87 | 82,572 | 35.81 | 17,907 | 31.76 | 25,396 | 43.27 | NC_049879 | 134 | 37 | 8 | 89 |
Gentianopsis paludosa | 151,568 | 37.84 | 82,834 | 35.76 | 17,928 | 31.77 | 25,403 | 43.35 | MT921831 | 129 | 37 | 8 | 84 |
Lomatogoniopsis alpina | 150,986 | 38.13 | 81,302 | 36.22 | 18,180 | 31.35 | 25,752 | 43.54 | NC_050658 | 131 | 37 | 8 | 86 |
Lomatogonium perenne | 151,678 | 38.16 | 81,979 | 36.28 | 18,237 | 31.46 | 25,731 | 43.52 | NC_050659 | 131 | 37 | 8 | 86 |
Pterygocalyx volubilis | 154,365 | 37.87 | 84,033 | 35.87 | 18,476 | 31.65 | 25,928 | 43.34 | NC_056992 | 131 | 37 | 8 | 86 |
Veratrilla baillonii | 151,962 | 38.24 | 82,475 | 36.35 | 17,983 | 30.39 | 25,752 | 43.44 | MW872006 | 132 | 37 | 8 | 87 |
Halenia coreana | 153,198 | 38.22 | 83,252 | 36.36 | 18,372 | 32.16 | 25,787 | 43.39 | MK606372 | 134 | 37 | 8 | 89 |
Halenia elliptica | 153,305 | 38.15 | 82,767 | 36.26 | 18,286 | 32.02 | 26,126 | 43.29 | NC_050657 | 133 | 37 | 8 | 88 |
Swertia bifolia | 153,242 | 38.06 | 83,496 | 36.16 | 18,200 | 31.89 | 25,773 | 43.33 | SUB11740174 | 133 | 37 | 8 | 88 |
Swertia bimaculata | 153,751 | 38.03 | 84,156 | 36.02 | 18,089 | 32.07 | 25,753 | 43.39 | MW344296 | 134 | 37 | 8 | 89 |
Swertia cincta | 149,089 | 38.20 | 80,481 | 36.34 | 17,946 | 31.79 | 25,331 | 43.42 | MZ261898 | 133 | 37 | 8 | 88 |
Swertia cordata | 153,429 | 38.05 | 83,612 | 36.16 | 18,037 | 31.75 | 25,890 | 43.3 | NC_054359 | 133 | 37 | 8 | 88 |
Swertia dichotoma | 152,977 | 37.50 | 83,044 | 35.55 | 18,303 | 31.25 | 25,815 | 43.02 | MZ261899.1 | 132 | 37 | 8 | 87 |
Swertia dilatata | 150,057 | 38.17 | 81,310 | 36.28 | 17,887 | 31.79 | 25,430 | 43.42 | MW344298 | 132 | 37 | 8 | 87 |
Swertia diluta | 153,691 | 38.10 | 83,859 | 36.20 | 18,300 | 31.9 | 25,766 | 43.5 | NC_057681.1 | 134 | 37 | 8 | 89 |
Swertia erythrosticta | 153,039 | 38.10 | 83,372 | 36.18 | 18,249 | 31.89 | 25,709 | 43.33 | MW344299 | 133 | 37 | 8 | 88 |
Swertia franchetiana | 153,428 | 38.20 | 83,564 | 34.66 | 18,342 | 33.22 | 25,749 | 43.28 | NC_056357 | 133 | 37 | 8 | 88 |
Swertia hispidicalyx | 149,488 | 38.19 | 80,727 | 36.30 | 17,903 | 31.81 | 25,429 | 43.42 | NC_044474 | 133 | 37 | 8 | 88 |
Swertia kouitchensis | 153,475 | 38.15 | 83,595 | 36.23 | 18,348 | 31.93 | 25,766 | 43.47 | MZ261902 | 133 | 37 | 8 | 88 |
Swertia leducii | 153,015 | 38.17 | 83,048 | 36.35 | 18,395 | 31.90 | 25,785 | 43.44 | NC_045301 | 134 | 37 | 8 | 89 |
Swertia macrosperma | 152,737 | 38.22 | 83,046 | 36.31 | 18,231 | 31.99 | 25,730 | 43.50 | MZ261903 | 133 | 37 | 8 | 88 |
Swertia multicaulis | 152,190 | 38.10 | 82,893 | 36.25 | 18,343 | 31.82 | 25,477 | 43.35 | NC_050660 | 131 | 37 | 8 | 86 |
Swertia mussotii | 153,499 | 38.16 | 83,591 | 36.23 | 18,336 | 31.95 | 25,761 | 43.50 | KU641021 | 134 | 37 | 8 | 89 |
Swertia nervosa | 153,690 | 38.12 | 83,864 | 36.25 | 18,254 | 31.82 | 25,786 | 43.37 | NC_057596 | 131 | 37 | 8 | 86 |
Swertia przewalskii | 151,079 | 38.1 | 81,780 | 33.22 | 18,193 | 33.66 | 25,553 | 42.16 | ON017794 | 133 | 37 | 8 | 88 |
Swertia pubescens | 149,036 | 38.19 | 80,432 | 36.33 | 17,936 | 31.81 | 25,334 | 43.42 | MZ261905 | 133 | 37 | 8 | 88 |
Swertia punicea | 153,448 | 38.15 | 83,535 | 36.25 | 18,345 | 31.88 | 25,784 | 43.47 | MZ261896 | 133 | 37 | 8 | 88 |
Swertia souliei | 152,804 | 38.08 | 83,195 | 36.17 | 18,105 | 31.89 | 25,752 | 43.33 | NC_052874 | 134 | 37 | 8 | 89 |
Swertia tetraptera | 152,787 | 38.1 | 83,177 | 32.18 | 18,305 | 32.18 | 25,679 | 44.38 | ON164641 | 134 | 37 | 8 | 89 |
Swertia verticillifolia | 151,682 | 38.14 | 82,623 | 36.26 | 18,335 | 31.83 | 25,362 | 43.48 | MF795137 | 134 | 37 | 8 | 89 |
Swertia wolfgangiana | 153,225 | 38.06 | 83,528 | 36.17 | 18,219 | 31.88 | 25,739 | 43.34 | MW344307 | 134 | 37 | 8 | 89 |
We collected fresh young leaves of S. tetraptera, S. franchetiana, S. przewalskii and S. bifolia from Mengyuan County of Qinghai Province (101.32°E, 37.62°N, 3,208 m), Huangzhong County of Qinghai Province (101.63°E, 36.57°N, 2,510 m), Qilian County of Qinghai Province (99.61°E, 38.83°N, 3,234 m), and Qilian County of Qinghai Province (102.22°E, 37.45°N, 3,135 m), respectively. The leaves were rapidly dried and stored in silica gel. Voucher specimens of these four species were deposited in the Qinghai-Tibetan Plateau Museum of Biology (QTPMB) with voucher numbers QHGC-2011, QHGC20190821, QHGC-2013, and QHGC-2014, respectively.
Total genomic DNA of the four Swertia L. plants was extracted from dried leaves using an improved CTAB method [18], and its purity and concentration were measured using a NanoDrop 2000 microspectrophotometer. Each genomic DNA sample was sheared into fragments of different lengths by sonication. The DNA fragments were then purified, end-repaired, A-tailed at the 3' end, and ligated to sequencing adapters. After that, agarose gel electrophoresis was used to select suitably sized DNA fragments, and PCR amplification was performed to complete the preparation of the sequencing library. After the libraries passed quality inspection, the Illumina HiSeq platform (Beijing Biomarker Technologies Co., Ltd.) was used for 150 bp paired-end sequencing.
Raw image files were converted into sequence reads (raw data) by base calling. SQCToolkit_v2.3.3 software [19] was used to filter the raw reads, removing low-quality regions to obtain clean reads. The results were then stored in FASTQ format. We used the iterative organelle genome assembly pipeline to assemble the chloroplast genomes, with S. mussotii (NC_031155) serving as a reference [20]. Then, SPAdes v3.6.1 was employed for de novo assembly under default parameters to generate a series of contigs [21]. Contigs larger than 1,000 bp were used for chloroplast genome assembly. Complete chloroplast genome sequences were constructed by matching and linking contigs [22] and by filling the remaining gaps after assembly using second-generation sequencing technology.
The chloroplast genomes of the four Swertia L. species were annotated using the online program GeSeq [23] and PGA software [24]. We compared the annotations from the two methods and made final manual adjustments in Geneious version 11.0.2 [22]. We then checked the initial annotation, putative start and stop codons, and intron positions by comparison with homologous genes in the congeneric species S. mussotii. Circular plastid genome maps of the four Swertia L. species were drawn with OGDRAW [25]. Finally, the sequence data and gene annotation information of the four Swertia L. species were uploaded to the NCBI database with accession numbers NC_056357 (S. franchetiana), ON164641 (S. tetraptera), ON017794 (S. przewalskii), and SUB11740174 (S. bifolia).
We used the online MISA program [26] to detect SSRs in the chloroplast genomes of the 34 species in Subtribe Swertiinae using the following minimum repeat numbers: mononucleotide units ≥ 10; dinucleotide units ≥ 5; trinucleotide units ≥ 4; and tetranucleotide, pentanucleotide, and hexanucleotide units ≥ 3 (Beier et al. 2017). CodonW 1.4.2 software was also employed to calculate the amino acid usage frequency and relative synonymous codon usage (RSCU) [27].
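The repeat thresholds above can be illustrated with a minimal Python sketch. This is not MISA itself and does not reproduce its handling of compound or interrupted repeats; the function names `find_ssrs` and `is_primitive` are hypothetical, and the sketch only reports perfect repeats that meet the stated minimum repeat numbers.

```python
import re

# Minimum repeat numbers per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A lookahead lets every position be tested as a possible repeat start.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (unit_len, min_rep - 1))
        last_end = 0
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip shifted copies of a repeat already reported and non-primitive motifs.
            if match.start() < last_end or not is_primitive(motif):
                continue
            repeat_len = len(match.group(1))
            hits.append((match.start(), motif, repeat_len // unit_len))
            last_end = match.start() + repeat_len
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (an assumption for illustration, not real plastome data).
    for start, motif, count in find_ssrs("A" * 10 + "TATATATAT" + "GGC"):
        print(start, motif, count)
```

Motifs that are themselves repeats of a shorter unit are skipped so that, for example, a poly-A tract is reported once as a mononucleotide repeat rather than again as an "AA" dinucleotide repeat.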
We used IRscope software to visualize the boundaries among the four main chloroplast regions (LSC/IRb/SSC/IRa) of the 34 species in Subtribe Swertiinae [28]. Moreover, Mauve software was used to detect chloroplast DNA rearrangements among the 34 species. The online software mVISTA was used to compare the 34 chloroplast genomes in Shuffle-LAGAN mode [29], with Veratrilla baillonii as the reference genome. The method developed by Zhang et al. (2011) [30] was used to calculate the percentages of variable characters in the coding and noncoding regions of the chloroplast genomes.
We computed the selective pressures on protein-coding genes located in three regions of the chloroplast genomes (LSC, SSC and one IR). Protein-coding genes shared by all 34 species were extracted from the complete chloroplast genomes for analysis of synonymous (Ks) and nonsynonymous (Ka) substitution rates. The mode of selection on each gene was inferred from its Ka/Ks ratio: Ka/Ks < 1 indicates purifying selection, Ka/Ks = 1 neutral evolution, and Ka/Ks > 1 positive selection [31]. Ka and Ks were calculated using KaKs_Calculator 2.0 software [32] with the following settings: genetic code table 11 (bacterial and plant plastid code); calculation method: NG.
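The interpretation of the Ka/Ks ratio can be written as a small Python sketch. The function name `classify_selection`, the optional tolerance `tol` and the example rates are assumptions made for illustration; they are not part of KaKs_Calculator, and a ratio is treated as undetermined when Ka or Ks is zero.

```python
def classify_selection(ka, ks, tol=0.0):
    """Classify the mode of selection from Ka and Ks.

    Returns 'undetermined' when Ka or Ks is zero, because the ratio is then
    uninformative or undefined. `tol` is a hypothetical tolerance around 1.
    """
    if ka == 0 or ks == 0:
        return "undetermined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


if __name__ == "__main__":
    # Illustrative (made-up) rates, not values from this study.
    for ka, ks in [(0.02, 0.20), (0.30, 0.10), (0.10, 0.10), (0.0, 0.05)]:
        print(ka, ks, classify_selection(ka, ks))
```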
To examine the phylogenetic relationships of the 34 species of 9 genera within Subtribe Swertiinae, an evolutionary tree was constructed using Gentiana straminea (KJ657732), Gardneria ovata (NC_065470) and Amalocalyx microlobus (NC_067035) as outgroups. We also used 80 protein-coding genes shared by the 34 chloroplast genomes to construct a molecular phylogenetic tree. All chloroplast genome sequences and shared protein-coding gene sequences were aligned with MAFFT (version 7) [33], and phylogenetic analyses were performed by Bayesian inference (BI) in MrBayes v3.2.1 [35] under the best-fit substitution model GTR + I + G, selected by AIC in MrModeltest 2.3 [34]. Each BI analysis was run independently with four Markov chain Monte Carlo (MCMC) chains (three heated chains and one cold chain), starting from a random tree; each chain was run for 2 × 10⁷ generations and sampled every 2,000 generations, and the first 25% of sampled trees were discarded as burn-in. We assessed convergence using an average standard deviation of split frequencies (ASDSF) < 0.01 and used Tracer v1.7.1 [36] to check for an effective sample size (ESS) > 200. Nodes of the phylogenetic tree were considered well supported when their Bayesian posterior probability (PP) was ≥ 0.95.
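As a quick check of what these MCMC settings imply, the following Python sketch works out how many trees are sampled and retained per run. The function name `mcmc_sample_summary` is hypothetical, and the count ignores the initial state at generation 0 that some programs also write out.

```python
def mcmc_sample_summary(ngen, samplefreq, burnin_frac):
    """Return (sampled, discarded, retained) tree counts for one run."""
    sampled = ngen // samplefreq
    discarded = int(sampled * burnin_frac)
    return sampled, discarded, sampled - discarded


if __name__ == "__main__":
    # Settings taken from the text: 2 x 10^7 generations, sampling every
    # 2,000 generations, first 25% of samples discarded as burn-in.
    print(mcmc_sample_summary(20_000_000, 2_000, 0.25))
```

Under these settings each run yields 10,000 sampled trees, of which 2,500 are discarded and 7,500 are kept for summarizing posterior probabilities.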
In this study, we analyzed the chloroplast genome features and gene contents of 34 species in 9 genera from Subtribe Swertiinae (Table 1 and Table S1). All 34 chloroplast genomes demonstrated a typical quadripartite structure similar to that of the majority of angiosperm chloroplast genomes (Fig. 1). Chloroplast genome length varied between genera and species, ranging from 149,036 bp (S. pubescens) to 154,365 bp (Pterygocalyx volubilis), with an average length of 152,274 bp (Table 1). The longest chloroplast genome (154,365 bp) differed from the other chloroplast genomes in Subtribe Swertiinae by 0.614–5.329 kb. All complete chloroplast genomes consisted of four parts: an LSC region (80,432–84,153 bp), an SSC region (17,887–18,476 bp), and two IR regions (25,069–26,126 bp). The GC content of the 34 species was very similar in both the whole chloroplast genome (37.5%–38.26%) and the corresponding regions (LSC [32.18%–36.36%], SSC [30.39%–33.66%], and IR [42.16%–43.38%]), with the IR regions having the highest GC contents (Table 1).
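Because the quadripartite structure consists of one LSC, one SSC and two IR copies, the total length should equal LSC + SSC + 2 × IR. The Python sketch below checks this for three rows copied from Table 1 and shows one simple way to compute GC content; the function names are hypothetical, and counting only unambiguous bases is an assumption of this sketch.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length for one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def gc_percent(seq):
    """GC content (%) over unambiguous bases; 0.0 if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Region lengths (bp) copied from Table 1: (LSC, SSC, IR, reported total).
TABLE1_ROWS = {
    "Swertia pubescens": (80432, 17936, 25334, 149036),
    "Pterygocalyx volubilis": (84033, 18476, 25928, 154365),
    "Comastoma falcatum": (81721, 18248, 25727, 151423),
}

if __name__ == "__main__":
    for species, (lsc, ssc, ir, total) in TABLE1_ROWS.items():
        print(species, quadripartite_length(lsc, ssc, ir) == total)
```

For these three rows the region lengths add up exactly to the reported total length.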
The gene content of the 34 chloroplast genomes varied slightly, ranging from 129 genes (Gentianopsis paludosa) to 134 genes (G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana) (Table 1). Accordingly, the number of protein-coding genes also varied, ranging from 84 to 89. However, the numbers of tRNA genes (37) and rRNA genes (8) were conserved among species (Table 1 and Table S1). Among the protein-coding genes, four pseudogenes (rps16, infA, ycf1 and rps19) were found. Apart from the absence of the rpl33 gene in the chloroplast genomes of S. dilatata, S. hispidicalyx, P. volubilis and C. pulmonarium, of the rpl2 gene in C. falcatum, and of the ycf15 gene in G. paludosa, differences in gene content were caused by these four pseudogenes. For example, because it lacked the rps16, ycf1 and rps19 pseudogenes, the chloroplast genome of Lomatogoniopsis alpina contained 131 genes (Table S1). In H. elliptica, Veratrilla baillonii and S. punicea, 18 genes (trnK-UUU, rps16, trnG-UCC, atpF, rpoC1, ycf3, trnL-UAA, trnV-UAC, rps12, clpP, petB, petD, rpl16, rpl2, ndhB, trnI-GAU, trnA-UGC, ndhA) contained introns, whereas in the remaining 31 species of Subtribe Swertiinae 17 genes contained introns because rps16 was absent or lacked an intron. Of these, two protein-coding genes (ycf3 and clpP) contained two introns in all 34 chloroplast genomes (Table S2).
The functions of major genes in the chloroplast genome of Subtribe Swertiinae could be roughly divided into three categories (Table 2): photosynthesis-related genes, chloroplast self-replication-related genes and other genes. Genes associated with photosynthesis and self-replication made up the majority of the chloroplast genome.
Category | Group of genes | Name of genes |
---|---|---|
Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ |
Photosynthesis | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
Photosynthesis | NADH dehydrogenase | ndhA*, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
Photosynthesis | Cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
Photosynthesis | ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
Self-replication | Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7, rps8, rps11, rps12#, rps14, rps15, rps16*, rps18, rps19 |
Self-replication | Ribosomal proteins (LSU) | rpl2*, rpl14, rpl16*, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
Self-replication | Ribosomal RNAs | rrn4.5¹, rrn5¹, rrn16¹, rrn23¹ |
Self-replication | Transfer RNAs | tRNA-Lys*, tRNA-Gln, tRNA-Ser, tRNA-Gly*, tRNA-Arg, tRNA-Cys, tRNA-Asp, tRNA-Tyr, tRNA-Glu, tRNA-Thr, tRNA-Ser, tRNA-Gly, tRNA-Met, tRNA-Ser, tRNA-Thr, tRNA-Leu, tRNA-Phe, tRNA-Val, tRNA-Gly, tRNA-Met, tRNA-Trp, tRNA-Pro, tRNA-Ile, tRNA-Leu*, tRNA-Val*, tRNA-His, tRNA-Ile*¹, tRNA-Ala*¹, tRNA-Arg¹, tRNA-Asn¹, tRNA-Leu, tRNA-Asn, tRNA-Arg, tRNA-Ala, tRNA-Ile, tRNA-His |
Self-replication | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
Other genes | Maturase | matK |
Other genes | Protease | clpP** |
Other genes | Envelope membrane protein | cemA |
Other genes | Subunit of acetyl-CoA carboxylase | accD |
Other genes | c-Type cytochrome synthesis gene | ccsA |
Genes of unknown function | Conserved open reading frames | ycf1, ycf2ᵃ, ycf3**, ycf4, ycf15 |
Note: * represents a gene with one intron, ** represents a gene with two introns, and # represents a trans-spliced gene.
The number of SSRs identified in the 34 Subtribe Swertiinae chloroplast genomes ranged from 36 (S. bifolia and S. erythrosticta) to 63 (S. cordata) (Fig. 2). Six types of repeat motif (mono- to hexanucleotide) were found, and their numbers and types differed among the 34 chloroplast genomes. Among the mononucleotide repeats, A/T was dominant (50–82.22%), while C/G was rare (0–10.53%). Dinucleotides (1.89–11.63%), trinucleotides (4.35–19.44%) and pentanucleotides (3.92–20.00%) were found in all samples. Tetranucleotides and hexanucleotides were identified in eighteen and nine samples, respectively (Fig. 3 and Table S3).
Codon usage frequency in the 34 Subtribe Swertiinae chloroplast genomes was determined from the sequences of protein-coding genes (CDS). The number of codons in the protein-coding genes of the 34 chloroplast genomes ranged from 20,531 (S. tetraptera) to 26,402 (H. elliptica). In all species, serine (Ser; 1075–2268 instances) was the most abundant amino acid encoded by four codons, followed by arginine (Arg; 1137–2244 instances), encoded by six codons (Table S4). In contrast, methionine and tryptophan were each encoded by only one codon, with instances ranging from 219 to 610 and from 387 to 605, respectively, and showed no codon usage bias (RSCU = 1). The AGA codon for arginine had the largest RSCU values (1.70–2.11), and the CUG codon for leucine had the smallest RSCU values (0.31–0.80) in the 34 chloroplast genomes. Of the 64 codons, 26 had RSCU values greater than one. Twenty-three of these 26 codons ended with A or U, which reflects the codon preference of the 34 chloroplast genomes (Fig. 3, Table S4).
We used the online program mVISTA to identify potentially divergent sequences among the 34 Subtribe Swertiinae chloroplast genomes, with the chloroplast genome of V. baillonii as a reference. The structures and sequences of the Subtribe Swertiinae chloroplast genomes were conserved, especially in the IR regions (Fig. 4). We also used DnaSP software to calculate the variation rates of coding and noncoding regions. The results demonstrated that the variation rates of noncoding regions were generally higher than those of coding regions (Fig. 5). The variation in noncoding regions ranged from 11.11% to 99.28%, with an average of 63.98%, whereas the variation in coding regions ranged from 5.78% to 88.97%, with an average of 25.39%. The variation rates of both coding and noncoding regions in the IR region were lower than those in other regions. Additionally, the noncoding intergenic regions were highly divergent, especially trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, trnC-GCA-petN, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, ndhC-trnV-UAC, accD-psaI, psbH-petB, rpl36-infA and rps15-ycf1. Highly divergent regions were also found within protein-coding regions, such as ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1. In addition, the alignment analysis revealed no genomic rearrangements among the 34 Subtribe Swertiinae chloroplast genomes.
We calculated the nonsynonymous (Ka) to synonymous (Ks) substitution ratios for 80 protein-coding genes to estimate the selection pressure on chloroplast genes by comparing L. alpina with the 33 other species in Subtribe Swertiinae. For sixty-three protein-coding genes, the ratio could not be calculated because Ka or Ks = 0, indicating that no nonsynonymous or synonymous changes had occurred. For the remaining 17 protein-coding genes, the mean Ka/Ks ratio between L. alpina and the 33 other Subtribe Swertiinae species ranged from 0.01 (rpl14) to 2.34 (psbB) (Fig. 6). The Ka/Ks ratio for most genes was less than one, showing that they underwent negative selection; the exceptions were ccsA and psbB, which experienced positive selection (Ka/Ks > 1).
We used the complete chloroplast genome sequences and 80 shared protein-coding sequences of 34 species from Subtribe Swertiinae to construct phylogenetic trees using G. straminea, G. ovata and A. microlobus as outgroups. The phylogenetic trees built with the whole chloroplast genomes and with the CDSs had the same topology (Figure S1). The Bayesian trees demonstrated that all species in Subtribe Swertiinae formed a monophyletic clade with high Bayesian posterior probability support (PP = 1; Fig. 7). This well-supported clade was divided into two major clades (A and B). Clade A was located at the base of the phylogenetic tree and was divided into two subclades (A1 and A2). The A1 subclade (P. volubilis) was sister to the A2 subclade, which consisted of three species of Gentianopsis and V. baillonii. Interestingly, G. paludosa did not cluster with the other two species of the same genus but clustered with V. baillonii, indicating that G. paludosa was closely related to V. baillonii. Clade B contained 29 species from the remaining 6 genera of Subtribe Swertiinae, which formed three main branches in the phylogenetic tree (B1, B2 and B4): the subgen. Swertia branch (B1), the Gen. Halenia-Swertia dichotoma-Gen. Sinoswertia-Swertia bimaculata branch (B2) and the subgen. Ophelia-Gen. Comastoma-L. alpina-L. perenne branch (B4).
We used the IRscope online tool (https://irscope.shinyapps.io/irapp/) to visualize the differences at the four boundaries between the LSC, SSC, and IR regions. Comparison of all Subtribe Swertiinae plastomes with the three outgroups revealed relatively stable IRs, with little expansion or contraction (Fig. 8). In these 37 plastomes, the LSC-IRa borders were located in the rps19 gene, with the exception of the LSC-IRa borders of L. perenne, Halenia elliptica and G. ovata. In the outgroup G. ovata, the LSC-IRa border was located within the ndhB gene, while in L. perenne, the LSC-IRa border was located in the rpl22 gene and had shifted by 59 bp. In H. elliptica, the LSC-IRa border was located within the rpl22 gene, which had undergone contraction. The SSC-IRa boundary was positioned in the ndhF gene, in the ycf1 pseudogene, or in the intergenic spacer region between the ycf1 pseudogene and ndhF. The exact position of the SSC-IRb border shifted by 10 bp in C. falcatum, 8 bp in S. cincta, 4 bp in S. mussotii, 9 bp in S. dichotoma, 5 bp in S. przewalskii, 15 bp in S. erythrosticta, 10 bp in S. cordata and 3 bp in the outgroup A. microlobus. The SSC/IRa border in all Subtribe Swertiinae plastomes was located inside the ycf1 gene with a few exceptions, and the sequences showed length variability among species. The IRa/LSC border in the chloroplast genomes of most Subtribe Swertiinae species was located at the junction of the trnH gene and the rps19 pseudogene. In the L. perenne chloroplast genome, the trnH gene was located far inside the LSC region, and the rps19 pseudogene was positioned at the IRa/LSC border. In the V. baillonii, L. alpina, G. paludosa, G. barbata, C. pulmonarium, S. przewalskii, S. nervosa, S. multicaulis and S. cordata chloroplast genomes, the rps19 pseudogene was lost, and the IRa/LSC border was positioned at the trnH gene.
Our study compared the features, content, and organization of the chloroplast genomes of 34 species in Subtribe Swertiinae and showed that all of them exhibited the typical quadripartite structure found in vascular plants [37–39]. The length of the chloroplast genomes varied from 149,036 bp (S. pubescens) to 154,365 bp (P. volubilis), indicating that the genomes are relatively conserved, with only minor differences in size. Differences in chloroplast genome length have previously been reported within a genus and a family, such as Swertia (Gentianaceae) [40], Notopterygium (Apiaceae) [41] and Rhodiola (Crassulaceae) [42], and in the subfamily Coryloideae of Betulaceae [43]. In this study, the differences in the chloroplast genome length of the 34 species in 9 genera of Subtribe Swertiinae were mainly caused by the expansion and contraction of the IR region [44].
In terms of GC content, the chloroplast genomes of 34 species in Subtribe Swertiinae had similar GC contents (37.5%-38.26%), indicating high species similarity. The GC content in the IR region (43.39%) was higher than that in the other two regions (LSC, 35.92%; SSC, 31.88%), which may be related to the presence of four rRNA sequences in these regions, e.g., rrn16, rrn23, rrn4.5, and rrn5, as previously reported in many complete chloroplast genomes of angiosperms [45].
Regarding gene content, we found some differences among the chloroplast genomes of 34 species in Subtribe Swertiinae. Gene numbers ranged from 129 (G. paludosa) to 134 (G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana). G. paludosa had 129 genes due to the absence of the ycf15 gene and the pseudogenes rps16, rps19, infA and ycf1, while G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana contained 134 genes because of a duplication of rps19 and ycf1. Duplicated rps19 and ycf1 pseudogenes have also been reported in other Gentianaceae species [46]. Additionally, the absence of ndh genes, including ndhA, ndhC, ndhG, ndhH, ndhI, ndhJ, and ndhK, has been reported in other Gentianaceae species [46]. However, the lack of the ycf15 gene has not previously been reported. Thus, the small changes in gene content in the chloroplast genomes of Subtribe Swertiinae are caused by evolutionary events of gene deletion and insertion.
Chloroplast SSRs usually show a high level of variation and are widely used in studies of polymorphism, population genetics and phylogenetics [47–49]. Our study analyzed the number of different SSR motifs in the cp genomes of 34 Subtribe Swertiinae species. Compared with other angiosperms, the number of chloroplast genome SSRs (36–63) in the 34 Subtribe Swertiinae species was low to medium. Among the SSRs, a large number of mononucleotide repeats were detected, most of which were poly(A) and poly(T) tracts, consistent with the results of previous studies [50–53]. These SSRs may be useful for subsequent studies of interspecific genomic polymorphism and population genetics based on repeat length polymorphism. The mechanisms that generate most chloroplast SSRs remain debated. Slipped-strand mispairing and intramolecular recombination are currently considered the main mechanisms [54].
Previous studies have shown that analysis of codon bias in the chloroplast genome is helpful for understanding the origin and evolution of species [55]. In addition, the frequency of codon use is related to gene expression. Nucleotide composition is one of the important factors affecting codon usage bias, and AT and GC contents are closely related to synonymous codon usage bias. In this study, most amino acids in the 34 chloroplast genomes showed codon bias with a high preference (RSCU > 1), apart from methionine and tryptophan (RSCU = 1). The RSCU values of codons ending with A or U were larger than those of codons ending with G or C, which showed that codons ending in A or U were preferred in the 34 chloroplast genomes. Similar conclusions have been reached in studies of Cinnamomum camphora [56], Notopterygium [41], Phyllanthaceae [57] and others. Thus, these findings may help further understanding of the evolutionary history of Subtribe Swertiinae, especially with respect to natural selection and mutation pressures [39].
Although the chloroplast genome is considered to be fairly conserved in angiosperms, mutational hotspots are often found in the sequences of closely related species. These mutational hotspots are widely used in plant phylogeny, population genetics and DNA barcoding research. In this study, we identified nineteen highly variable regions according to DnaSP analysis, including twelve intergenic regions (trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, trnC-GCA-petN, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, ndhC-trnV-UAC, accD-psaI, psbH-petB, rpl36-infA and rps15-ycf1) and seven genes (ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1). Both large-scale studies [58] and specific case studies [59–60] have identified mutational hotspots in noncoding and coding regions, which can serve as high-resolution markers for phylogenies. For example, rps16-trnQ has been employed for DNA barcoding in phylogenetic studies of 12 different angiosperm genera because it is highly variable in most plants. Additionally, compared with existing candidate genes, the ycf1 gene is more suitable as a barcode for land plants because it contains more variable loci. Therefore, these highly variable regions in the chloroplast genomes of Subtribe Swertiinae are expected to provide adequate genetic information for studies of species delimitation and phylogenetic evolution in Gentianaceae.
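Highly variable regions of this kind are typically located by computing nucleotide diversity (Pi) in sliding windows along an alignment. The Python sketch below illustrates the idea only; it is not DnaSP, and it simplifies by skipping gapped or ambiguous positions pair by pair. The function names, the toy alignment and the window and step sizes are assumptions for the example.

```python
from itertools import combinations


def pairwise_pi(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        compared = diffs = 0
        for x, y in zip(a, b):
            if x in "ACGT" and y in "ACGT":  # skip gaps and ambiguous bases
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            pairs += 1
    return total / pairs if pairs else 0.0


def sliding_pi(alignment, window, step):
    """Return (window_start, Pi) for each window along the alignment."""
    length = len(alignment[0])
    profile = []
    for start in range(0, length - window + 1, step):
        block = [s[start:start + window].upper() for s in alignment]
        profile.append((start, pairwise_pi(block)))
    return profile


# Toy alignment (hypothetical): the second half is more variable than the first.
alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACGTTCGA",
]
profile = sliding_pi(alignment, window=4, step=4)
print(profile)
```

Windows whose Pi stands well above the genome-wide background are the candidates that would be examined as potential markers.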
The topologies of the ML and BI trees constructed with complete chloroplast genome sequences and shared protein-coding gene sequences were consistent, indicating that all 34 Subtribe Swertiinae species formed a monophyletic clade, which was sister to Subtribe Gentianinae. The monophyly of Subtribe Swertiinae is therefore supported by chloroplast genome data, a finding consistent with previous studies [4, 5, 61]. P. volubilis was closely related to Gentianopsis and V. baillonii, which are located at the base of Subtribe Swertiinae. In other studies, the basal groups of Subtribe Swertiinae also included Obolaria, Latouchea, Bartonia and Megacodon. In terms of geographical distribution, the basal groups are mostly isolated monotypic genera or small genera containing only a few species. Obolaria (1 species) and Bartonia (4 species) are distributed in North America. Latouchea (1 species) and Megacodon (2 species) are distributed in Southwest China and the Himalayan region. Pterygocalyx (1 species) is distributed in Asia, and Veratrilla (2 species) is distributed in southwest China, northeast India, Sikkim and Bhutan. In terms of morphology, except for Bartonia (in which no floral nectary has been observed), Obolaria, Megacodon, Latouchea, Gentianopsis and Pterygocalyx all have floral nectaries at the base of the ovary, as does Gentiana in the outgroup, whereas in the other genera of Subtribe Swertiinae most species have floral nectaries on the corolla lobes. Thus, nectaries at the base of the ovary may be an ancestral character of Subtribe Swertiinae. In terms of basal branches, the phylogeny based on the chloroplast genomes was not fully in accord with those of Cao et al. (2021) [5] and Xi et al. (2014) [4]. In our study, P. volubilis was located at the base of Subtribe Swertiinae and fell within a single clade, while V. baillonii clustered with Gp. paludosa; however, Xi et al. (2014) [4] concluded that P. volubilis clustered with Gentianopsis ciliata, and V. baillonii fell within a single clade. Morphological data show that although Gentianopsis and Pterygocalyx share the same floral characteristics, Gentianopsis plants are erect herbs with wingless seeds, while Pterygocalyx plants are twining herbs with winged seeds. The two genera therefore differ in morphology, and our results are consistent with the morphological evidence.
Apart from the basal groups, the chloroplast genome sequence data support three main branches of Subtribe Swertiinae in the phylogenetic tree, that is, the subgen. Swertia branch (B1), the Gen. Halenia-S. dichotoma-Gen. Sinoswertia-S. bimaculata branch (B2) and the subgen. Ophelia-Gen. Comastoma-L. alpina-L. perenne branch (B4). The results of our study and of previous studies show that Swertia is paraphyletic with respect to other related genera, which are distributed in different clades. Therefore, Swertia is presumed to be the main group of Subtribe Swertiinae, and other related genera, which are either monophyletic or paraphyletic, are derived from Swertia. Although the results of this study provide a new perspective on the intergeneric and interspecific relationships of Subtribe Swertiinae, only 34 species were included, and broader sampling is needed to better infer the phylogenetic relationships within Subtribe Swertiinae.
Synonymous and nonsynonymous nucleotide substitution patterns play a major role in adaptive evolution. In Subtribe Swertiinae, we did not detect significant positive selection for the majority of genes; only two genes (ccsA and psbB) showed possible positive selection, and these may have played a vital role in adaptive evolution in Subtribe Swertiinae. Our results are in accordance with a previous study, which showed that ccsA was under positive selection in the chloroplast genomes of 15 selected angiosperms [62]. psbB, which encodes a photosystem subunit (Table 2), plays a vital role in the life history of plants. The ccsA gene is a c-type cytochrome synthesis gene (Table 2) in plants. It encodes the cytochrome c synthesis protein, a membrane-bound protein of approximately 250 to 350 amino acids. The product of ccsA can form the ccsA complex together with the protein encoded by another gene, ccsB [63]. Xie et al. (1996) [64] proposed that the ccsA gene is related to the binding of heme to cytochrome c. These findings have implications for understanding the adaptive evolution of ccsA genes in angiosperms. Both genes are closely associated with physiological processes such as photosynthesis; thus, their positive selection may have helped Subtribe Swertiinae species adapt quickly to a wide range of environments and may have contributed to their wide global distribution.
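For reference, the conventional reading of the Ka/Ks ratio can be written as follows. This is the standard textbook interpretation and not a statement of the thresholds applied in this study; the symbol \(\omega\) is introduced here only for illustration.

```latex
\omega = \frac{K_a}{K_s}, \qquad
\begin{cases}
\omega > 1 & \text{positive (diversifying) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega < 1 & \text{purifying (negative) selection}
\end{cases}
```

Here \(K_a\) is the number of nonsynonymous substitutions per nonsynonymous site and \(K_s\) is the number of synonymous substitutions per synonymous site.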
We present a comparative analysis of 34 plastomes from 34 Subtribe Swertiinae species and report a comprehensive study of their phylogenetic relationships, divergence time estimation and adaptive evolution. The phylogenetic analysis supported the monophyly of Subtribe Swertiinae and the paraphyly of Swertia with respect to other related genera. Considerable inconsistency was observed between the molecular phylogeny and the traditional classification of Halenia, Sinoswertia, Comastoma, Lomatogoniopsis and Lomatogonium. Positive selection analyses showed that two genes (ccsA and psbB) had high Ka/Ks ratios, indicating that chloroplast genes may have undergone positive selection during their evolutionary history. These results provide valuable information for elucidating the phylogeny, divergence time and evolutionary process of Subtribe Swertiinae.
The authors declare that the experimental research on the plants described in this paper complies with institutional, national and international guidelines. Field studies were conducted in accordance with local legislation and with permission from the provincial department of forest and grass of Qinghai Province. Voucher specimens of all plants were deposited at the herbarium of the QTPMB (Qinghai-Tibetan Plateau Museum of Biology), Xining, Qinghai Province, China.
Not applicable.
All data generated or analyzed during this study are included in this published article and its supplementary information files. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Competing interests
The authors declare no competing interests.
Funding
This work was supported by funds from the Qinghai Province Key Laboratory construction project [2022-ZJ-Y18]. The funders were not involved in the study design, data collection and analysis, decision to publish, or manuscript preparation.
Authors' contributions
YL conceived the study, performed data analysis and drafted the manuscript; DS collected samples; ZY and DQ extracted DNA for next-generation sequencing; DS, ZY and DQ reviewed the manuscript critically. All authors have read and agreed with the contents of the manuscript.
Acknowledgements
We would like to thank Miss Jingjing Li and Mr. Hongcai Yue for their help with sample collection.