Characterization of the C. tomentella cp genome
The complete cp genome of C. tomentella has a typical circular structure with a total length of 159,816 bp, comprising a pair of inverted repeat regions (IRa and IRb) of 31,045 bp each, a large single copy (LSC) region of 79,535 bp, and a small single copy (SSC) region of 18,191 bp (Fig. 1). The overall GC content of the C. tomentella cp genome was 38%, and the GC content of the IR regions (42%) was higher than that of the LSC (36.3%) and SSC (31.4%) regions. As shown in Table 1, the cp genome of C. tomentella contained 136 genes, including 92 protein-coding genes, 36 tRNA genes, and eight rRNA genes. In the IR regions, 25 duplicated genes were identified, including 14 protein-coding genes (ndhB, rpl14, rpl16, rpl2, rpl22, rpl23, rps12, rps19, rps3, rps7, rps8, infA, ycf1, ycf2), seven tRNA genes (trnA_UGC, trnI_CAU, trnI_GAU, trnL_CAA, trnN_GUU, trnR_ACG, trnV_GAC), and four rRNA genes (rrn16S, rrn23S, rrn4.5S, rrn5S). Fourteen genes with one intron (ndhA, ndhB, petB, petD, atpF, rpl16, rpl2, rps16, rpoC1, trnA_UGC, trnI_GAU, trnK_UUU, trnL_UAA, trnV_UAC) and three genes with two introns (rps12, clpP, ycf3) were also identified.
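The region lengths reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the pipeline used in this study: the function names are hypothetical, and the GC helper simply illustrates how a GC percentage could be computed from a sequence string.

```python
# Region lengths (bp) as reported in the text; the IR length is per copy.
LSC, SSC, IR = 79_535, 18_191, 31_045


def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


total = quadripartite_length(LSC, SSC, IR)
print(total)  # 159816, matching the reported genome length
```

Applying `gc_content` separately to the LSC, SSC, and IR sequences would yield per-region values comparable to those quoted in the text.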
Table 1
List of genes present in the C. tomentella complete cp genome.
Category | Gene group | Gene name |
Photosynthesis | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ |
Photosynthesis | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
Photosynthesis | Subunits of NADH dehydrogenase | ndhA*, ndhB*(2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
Photosynthesis | Subunits of cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
Photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
Photosynthesis | Large subunit of rubisco | rbcL |
Self-replication | Proteins of large ribosomal subunit | rpl14(2), rpl16*(2), rpl2*(2), rpl20, rpl22(2), rpl23(2), rpl32, rpl33, rpl36 |
Self-replication | Proteins of small ribosomal subunit | rps11, rps12**(2), rps14, rps15, rps16*, rps18, rps19(2), rps2, rps3(2), rps7(2), rps8(2) |
Self-replication | Subunits of RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
Self-replication | Ribosomal RNAs | rrn16S(2), rrn23S(2), rrn4.5S(2), rrn5S(2) |
Self-replication | Transfer RNAs | trnA_UGC*(2), trnC_GCA, trnD_GUC, trnE_UUC, trnF_GAA, trnG_GCC, trnG_UCC*, trnH_GUG, trnI_CAU(2), trnI_GAU*(2), trnK_UUU*, trnL_CAA(2), trnL_UAA*, trnL_UAG, trnM_CAU, trnN_GUU(2), trnP_UGG, trnQ_UUG, trnR_ACG(2), trnR_UCU, trnS_GCU, trnS_GGA, trnS_UGA, trnT_GGU, trnV_GAC(2), trnV_UAC*, trnW_CCA, trnY_GUA, trnfM_CAU |
Other genes | Maturase | matK |
Other genes | Protease | clpP** |
Other genes | Envelope membrane protein | cemA |
Other genes | Acetyl-CoA carboxylase | accD |
Other genes | c-type cytochrome synthesis gene | ccsA |
Other genes | Translation initiation factor | infA(2) |
Genes of unknown function | Conserved hypothetical chloroplast ORF | ycf1(2), ycf2(2), ycf3**, ycf4 |
Notes: Gene*: gene with one intron; Gene**: gene with two introns; Gene(2): multi-copy gene, with the number of copies in parentheses.
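The notation used in Table 1 (asterisks for introns, a parenthesised number for copies) lends itself to a quick consistency check against the gene totals given in the text. The sketch below is illustrative only: the regular expression and the helper names `parse_entry` and `count_copies` are assumptions introduced here, not part of the original annotation workflow.

```python
import re

# One table entry: gene name, optional intron asterisks, optional "(n)" copy count.
ENTRY = re.compile(r"^(?P<name>[A-Za-z0-9_.]+)(?P<stars>\*{0,2})(?:\((?P<copies>\d+)\))?$")


def parse_entry(entry):
    """Return (gene name, number of introns, number of copies) for one entry."""
    m = ENTRY.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    return m["name"], len(m["stars"]), int(m["copies"] or 1)


def count_copies(row):
    """Total gene count for one comma-separated table row, counting every copy."""
    return sum(parse_entry(e)[2] for e in row.split(","))


RRNA_ROW = "rrn16S(2), rrn23S(2), rrn4.5S(2), rrn5S(2)"
TRNA_ROW = (
    "trnA_UGC*(2), trnC_GCA, trnD_GUC, trnE_UUC, trnF_GAA, trnG_GCC, trnG_UCC*, "
    "trnH_GUG, trnI_CAU(2), trnI_GAU*(2), trnK_UUU*, trnL_CAA(2), trnL_UAA*, "
    "trnL_UAG, trnM_CAU, trnN_GUU(2), trnP_UGG, trnQ_UUG, trnR_ACG(2), trnR_UCU, "
    "trnS_GCU, trnS_GGA, trnS_UGA, trnT_GGU, trnV_GAC(2), trnV_UAC*, trnW_CCA, "
    "trnY_GUA, trnfM_CAU"
)

print(count_copies(RRNA_ROW), count_copies(TRNA_ROW))  # 8 36
```

The two counts agree with the eight rRNA genes and 36 tRNA genes reported in the text.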
Codon usage bias and simple sequence repeat (SSR) analysis
In the cp genome of C. tomentella, 20 amino acids and a stop signal (terminator) were encoded; the amino acids were specified by 61 triplets and the terminator by three triplets (Fig. 2). SSRs were located mainly in the LSC region, which accounted for 63% of all SSRs, followed by the SSC region (34%), while the IR regions contained the fewest (3%) (Fig. 3A). In the LSC region, 34 mononucleotide, one dinucleotide, and trinucleotide repeats were detected; in the SSC region, 13 mononucleotide and three dinucleotide repeats were identified; and in the IR regions, only two mononucleotide repeats were found (Fig. 3B-D). The distribution of SSRs among the LSC, SSC, and IR regions of the C. tomentella cp genome might be related to genetic polymorphism in these regions, but this has not yet been reported. Overall, the SSRs of C. tomentella were concentrated in the LSC region and consisted predominantly of mononucleotide repeats, whereas the IR regions contained the fewest SSRs.
Figure 3. Analysis of simple sequence repeats (SSRs) in the cp genome of C. tomentella. A: Proportion of SSRs in the LSC, SSC, and IR regions. B: Nucleotide repeat types present in the LSC region. C: Nucleotide repeat types present in the IR regions. D: Nucleotide repeat types present in the SSC region.
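To illustrate how mono- and dinucleotide repeats of the kind summarised in Fig. 3 can be located, the following sketch scans a sequence with a backreference regular expression. It is a simplified example, not the tool used in this study: the function name `find_ssrs`, the toy sequence, and the minimum repeat thresholds (10 for mononucleotides, 5 for dinucleotides) are assumptions chosen for demonstration, since the thresholds applied here are not stated.

```python
import re


def find_ssrs(seq, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for tandem repeats of motifs of a given length."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        # Skip motifs that are repeats of a shorter unit (e.g. "AA" is a mononucleotide run).
        if any(
            motif == motif[:d] * (motif_len // d)
            for d in range(1, motif_len)
            if motif_len % d == 0
        ):
            continue
        hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


# Toy sequence (hypothetical): a 12-bp poly-A run and six tandem "AT" units.
toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "CG"

print(find_ssrs(toy, motif_len=1, min_repeats=10))  # [(2, 'A', 12)]
print(find_ssrs(toy, motif_len=2, min_repeats=5))   # [(16, 'AT', 6)]
```

Running the same function on the LSC, SSC, and IR sequences separately would give per-region counts of the type shown in Fig. 3B-D, subject to the chosen thresholds.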
Comparison of the C. tomentella cp genome with related species
The online mVISTA program (https://genome.lbl.gov/vista/index.shtml) was used to compare sequence divergence among the cp genomes of nine Clematis species, C. tomentella (ON854662), C. fruticosa (MT083932), C. tangutica (MK253446), C. aethusifolia (MK253462), C. henryi var. ternata (MW380948), C. trichotoma (MG952896), C. alternata (MG573152), C. montana (MT292622), and C. guniuensis (NC050373), and the alignment of the nine genomes was plotted. Gene order and arrangement were almost identical between C. tomentella and C. fruticosa (Fig. 4). Across the nine Clematis species, gene order was broadly similar; the non-coding regions were highly divergent, whereas the coding regions were relatively conserved. Notably, comparison of the otherwise similar cp genome sequences of C. tomentella and C. fruticosa revealed four distinctly divergent regions in C. tomentella, namely ndhC-atpA, ndhF-rpl32, rps8-infA, and psbE-petL, which could serve as candidate DNA barcodes for C. tomentella or as evidence for evolutionary classification.
As shown in Fig. 5, the IR-LSC and IR-SSC boundaries of C. tomentella were similar to those of the related species C. fruticosa (MT083932), C. tangutica (MK253446), and C. aethusifolia (MK253462), but differed from those of C. taeguensis (MW201572). In C. tomentella, the infA gene was located at the LSC/IRa junction (17 bp from the end), reflecting contraction and expansion at the IR boundaries. This gene is therefore a characteristic feature of the cp genome boundaries of C. tomentella and its relatives.
Sliding window analysis performed with DnaSP revealed highly divergent regions among the cp genomes of C. tomentella, C. fruticosa, C. tangutica, and C. aethusifolia (Fig. 6). The average nucleotide diversity (Pi) across the whole cp genome was 0.00212, and three highly divergent regions, in which Eta (θ) values exceeded 0.0125, were identified: trnG_UCC-atpA, ndhF-rpl32, and rps8-infA.
Figure 6. Sliding window analysis of the cp genomes of C. tomentella and three other Clematis species (C. fruticosa, C. tangutica, and C. aethusifolia).
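The idea behind a sliding window estimate of nucleotide diversity can be sketched as follows. This is a simplified illustration rather than a reimplementation of DnaSP: the function names, the toy alignment, and the window and step sizes are hypothetical, and gap or ambiguity characters are treated as ordinary mismatches instead of being excluded.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def sliding_pi(seqs, window, step):
    """Return (window midpoint, Pi) for each window along the alignment."""
    length = len(seqs[0])
    return [
        (start + window // 2, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment of four sequences (hypothetical data): the second half is variable.
alignment = ["ACGTACGT", "ACGTACGA", "ACGTACTA", "ACGTACGT"]

for midpoint, pi in sliding_pi(alignment, window=4, step=4):
    print(midpoint, round(pi, 4))
```

Windows whose value exceeds a chosen cut-off would then be flagged as highly divergent, analogous to the peaks in Fig. 6.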
Phylogenetic analysis
In this study, we aligned the complete cp genomes of 29 Clematis-related species in the family Ranunculaceae and one outgroup to determine the phylogenetic position of C. tomentella. The maximum likelihood tree indicated that C. tomentella is closely related to C. fruticosa (Fig. 7).