DOI: https://doi.org/10.21203/rs.3.rs-2481165/v1
Background Clematis tomentella 2001 (Ranunculaceae) is a typical drought-tolerant and sand-fixing plant in the desert ecosystem of northwest China. To elucidate the phylogenetic status of C. tomentella and its related species, we determined the complete chloroplast (cp) genome of C. tomentella and analyzed their interspecific relationships.
Methods and results The complete cp genome of C. tomentella was sequenced in this study. The cp genome of C. tomentella was 159,816 bp in length, including two inverted repeats of 31,045 bp each, a large single copy of 79,535 bp, and a small single copy of 18,191 bp. A total of 136 genes were annotated across the whole cp genome, including 92 protein-coding genes, 8 rRNA genes, and 36 tRNA genes, and the GC content was 38%. Crucially, we found that the regions psbE-petL, trnG_UCC-atpA, ndhF-rpl32, and rps8-infA were highly divergent and could serve as DNA barcodes for the identification of C. tomentella in Ranunculaceae. A maximum likelihood phylogenetic tree revealed that C. tomentella was closely related to C. fruticosa.
Conclusions Our results fill the gap in cp genome sequence data for C. tomentella and clarify its taxonomic and evolutionary position within Clematis. They provide a reference for future phylogenetic studies of Clematis in Ranunculaceae.
Clematis tomentella (Maximowicz) W. T. Wang & L. Q. Li (Ranunculaceae) is a rare drought-tolerant and sand-fixing shrub in the desert ecosystem of northwest China. It is distributed on fixed or semi-fixed sandy dunes, tolerates sand burial and drought, and grows well in extremely water-scarce and arid habitats [16]. C. tomentella is one of the few erect shrubs in the genus Clematis. Its leaf blade is gray-green, linear-lanceolate, leathery, both surfaces densely appressed puberulous, base cuneate, margin entire or proximally 1- or 2-dentate or 1- or 2-lobulate, apex acute; midvein abaxially flat, lateral basal veins inconspicuous. Flowers solitary or cymes with 3 flowers, axillary or terminal; sepals 4, yellow, ascending, broadly lanceolate, long elliptic, or oblong, abaxially puberulous, adaxially glabrous or puberulous, margin narrowly winglike, velutinous on inner line, apex shortly cuspidate. Ovaries densely pubescent; style 6–11 mm, densely villous. Achenes compressed, long elliptic or narrowly ovate, ca. 4.3 × 2 mm, densely pubescent; persistent style ca. 2 mm, plumose [14]. The species has important ecological value and has been used in desert control and the restoration of barren mountains in recent years [16, 15], and it is increasingly used as a native ornamental species in landscaping [25]. With the expansion of ecological restoration areas and the construction of urban and rural green spaces, native plants play an increasingly important role [21, 36]. However, this germplasm resource has not yet been widely applied, especially in urban greening and ecological construction in arid or semi-arid areas.
Over the years, an increasing number of Clematis cp genomes have been sequenced, assembled, and annotated. However, the complete cp genome of the woody C. tomentella and its phylogenetic relationship with other Clematis species have not been reported. Because the APG plant classification system is continually updated [5], it is important to study the phylogenetic status of Ranunculaceae and the genes that can be used to identify its members. Some gene segments, such as matK, accD, psbA-trnH, and rps16-trnQ, are commonly used as DNA markers in plants [28, 35]. Therefore, the objective of this study is to characterize the assembled complete cp genome of C. tomentella using current sequencing methods and to provide a meaningful reference for the study of Clematis and its phylogenetic status.
Fresh leaves of C. tomentella were collected from a semi-fixed moving dune on the southwest margin of Mu Us Sandy Land, Ningxia Hui Autonomous Region, China (106°28′09″E, 37°56′43″N), and flash-frozen with dry ice. The specimen (No. 2022CT081LY) was deposited in the herbarium of the Institute of Forestry and Grassland Ecology, Ningxia Academy of Agriculture and Forestry Science (http://www.nxaas.com.cn/; contact: Wangsuo Liu). We used a modified CTAB method to extract the total genomic DNA of C. tomentella, following Stefanova et al. [34]. Whole-genome sequencing was conducted on the Illumina HiSeq 2500 platform at Biomarker Technologies Corporation (Qingdao, China). After removing the low-quality sequences, the high-quality sequences were assembled into contigs using SOAPdenovo v2.21 (http://soap.genomics.org.cn/), and the contigs were then aligned using the NCBI BLAST program with C. fruticosa (MT083932) as the reference [1]. We assembled the cp genome with SPAdes 3.11.0 [29].
The cpDNA of C. tomentella was annotated with Plann [17]. The online tool OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) was used to generate the map of the cp genome [13]. The cp genome sequence and the annotations have been submitted to the NCBI database under the accession number ON854662.
We used CodonW to calculate the relative synonymous codon usage (RSCU) values of the protein-coding genes of the C. tomentella cp genome [23], and plotted the RSCU values with the packages “aplot” and “ggplot2” in R (version 4.2). Simple sequence repeats (SSRs) in the C. tomentella cp genome were identified using the online tool MIcroSAtellite (MISA; https://webblast.ipk-gatersleben.de/misa) [4]. The minimum numbers of repeat units were set to 10, 6, 5, 5, 5, and 5 for mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide SSRs, respectively.
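To make the RSCU statistic concrete: for a given codon, RSCU is its observed count divided by the count expected if all synonymous codons of that amino acid were used equally. The following is a minimal sketch under that definition, not the CodonW implementation; the function name `rscu` and the toy input are hypothetical, and the codon assignments are those of the standard genetic code, which plastid genes share.

```python
from collections import Counter
from itertools import product

# Standard codon assignments, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid (or stop) they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input; RSCU undefined
        for c in codons:
            # observed / (total / number of synonymous codons)
            result[c] = counts[c] * len(codons) / total
    return result


# Toy example (hypothetical sequence): Met, Ala, Ala, Ala.
values = rscu(["ATGGCTGCTGCC"])
print(round(values["GCT"], 3), round(values["GCC"], 3))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates a codon used less often. In the toy input, alanine is encoded twice by GCT and once by GCC, so GCT receives the highest value among the four alanine codons.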
Comparison of the cpDNA between C. tomentella and related species
The Shuffle-LAGAN mode of the online tool mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml) was used to compare the cp genome of C. tomentella (ON854662) with the cp genomes of eight related species, C. fruticosa (MT083932), C. tangutica (MK253446), C. aethusifolia (MK253462), C. henryi var. ternata (MW380948), C. trichotoma (MG952896), C. alternata (MG573152), C. montana (MT292622) and C. guniuensis (NC050373), and to draw the genome comparison map, using the annotation of C. tomentella as the reference [12]. The junctions of the LSC, SSC, and IR regions were compared using IRscope (https://irscope.shinyapps.io/irapp/), run from R (version 4.2.0); the accession numbers and annotation files of C. tomentella and its related species (C. fruticosa, C. tangutica, C. aethusifolia, and C. taeguensis) were uploaded online [3]. To calculate nucleotide diversity and identify divergent DNA fragments, the cpDNA sequences of C. tomentella and its three most closely related species were analyzed by sliding window in DnaSP 5.0, with a window length of 600 bp and a step size of 200 bp [32].
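The sliding-window calculation can be illustrated with a short sketch. It computes nucleotide diversity (Pi) as the mean number of pairwise differences per site in each window, using the 600 bp window and 200 bp step stated above as defaults. This is a simplified stand-in and not DnaSP's code: the function name `sliding_pi` is hypothetical, columns containing gaps or ambiguous bases are simply dropped, and DnaSP's own handling of such sites may differ.

```python
from itertools import combinations


def sliding_pi(alignment, window=600, step=200):
    """Return [(window midpoint, Pi)] for a list of aligned sequences.

    Midpoints are 0-based alignment coordinates. Pi is the average number of
    pairwise differences per valid site within the window.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")

    pairs = list(combinations(range(len(alignment)), 2))
    results = []
    for start in range(0, length - window + 1, step):
        diffs = 0
        sites = 0
        for col in range(start, start + window):
            column = [s[col].upper() for s in alignment]
            if any(base not in "ACGT" for base in column):
                continue  # drop gapped or ambiguous columns
            sites += 1
            diffs += sum(column[i] != column[j] for i, j in pairs)
        pi = diffs / (len(pairs) * sites) if sites else 0.0
        results.append((start + window // 2, pi))
    return results


# Toy alignment (hypothetical) with one variable site in the last window.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAAAA"]
for midpoint, pi in sliding_pi(toy, window=4, step=2):
    print(midpoint, round(pi, 4))
```

Windows whose Pi stands well above the genome-wide average are the candidates for divergent regions. On real data the input would be the multiple alignment of the four cp genomes.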
To further illustrate the phylogenetic position of C. tomentella, 29 cp genome sequences of related species were downloaded from the NCBI database, and Anemoclema glaucifolium (MK569471.1) was selected as the outgroup. We aligned the cp genomes of C. tomentella and the other species using MAFFT v7.037 [19]. We used MEGA X (v10.2.4) to construct a maximum likelihood tree with 1000 bootstrap replicates to analyze phylogenetic status [22].
Characterization of the C. tomentella cp genome
The complete cp genome of C. tomentella has a typical circular structure with a length of 159,816 bp, including a pair of inverted repeat (IRa and IRb) regions of 31,045 bp each, a large single copy (LSC) region of 79,535 bp, and a small single copy (SSC) region of 18,191 bp (Fig. 1). The GC content of the C. tomentella cp genome was 38%, and the GC content of the IR regions (42%) was higher than that of the LSC (36.3%) and SSC (31.4%) regions. As shown in Table 1, the cp genome of C. tomentella contained 136 genes, including 92 protein-coding genes, 36 tRNA genes, and 8 rRNA genes. In the IR regions, 25 duplicated genes were identified, including 14 protein-coding genes (ndhB, rpl14, rpl16, rpl2, rpl22, rpl23, rps12, rps19, rps3, rps7, rps8, infA, ycf1, ycf2), 7 tRNA genes (trnA_UGC, trnI_CAU, trnI_GAU, trnL_CAA, trnN_GUU, trnR_ACG, trnV_GAC), and 4 rRNA genes (rrn16S, rrn23S, rrn4.5S, rrn5S). Fourteen genes with one intron (ndhA, ndhB, petB, petD, atpF, rpl16, rpl2, rps16, rpoC1, trnA_UGC, trnI_GAU, trnK_UUU, trnL_UAA, trnV_UAC) and three genes with two introns (rps12, clpP, ycf3) were also identified.
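The reported figures can be cross-checked with simple arithmetic: a quadripartite cp genome consists of one LSC, one SSC, and two IR copies, and the overall GC content should roughly equal the length-weighted mean of the regional values. The sketch below uses only the numbers given in the text; the function name is hypothetical, and the regional GC percentages are rounded as reported, so the weighted result is approximate.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) and GC fractions as reported in the text.
LSC, SSC, IR = 79_535, 18_191, 31_045
GC_LSC, GC_SSC, GC_IR = 0.363, 0.314, 0.42

total = quadripartite_length(LSC, SSC, IR)
print(total)  # 159816

# Length-weighted GC content across the LSC, the SSC and both IR copies.
weighted_gc = (LSC * GC_LSC + SSC * GC_SSC + 2 * IR * GC_IR) / total
print(round(weighted_gc, 2))  # 0.38

# Gene counts by class as reported in the text.
genes = {"protein_coding": 92, "tRNA": 36, "rRNA": 8}
print(sum(genes.values()))  # 136
```

The three results agree with the stated genome length of 159,816 bp, the overall GC content of 38%, and the total of 136 genes.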
Category | Gene group | Gene name |
---|---|---|
Photosynthesis | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ |
Photosynthesis | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
Photosynthesis | Subunits of NADH dehydrogenase | ndhA*, ndhB*(2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
Photosynthesis | Subunits of cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
Photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
Photosynthesis | Large subunit of rubisco | rbcL |
Self-replication | Proteins of large ribosomal subunit | rpl14(2), rpl16*(2), rpl2*(2), rpl20, rpl22(2), rpl23(2), rpl32, rpl33, rpl36 |
Self-replication | Proteins of small ribosomal subunit | rps11, rps12**(2), rps14, rps15, rps16*, rps18, rps19(2), rps2, rps3(2), rps7(2), rps8(2) |
Self-replication | Subunits of RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
Self-replication | Ribosomal RNAs | rrn16S(2), rrn23S(2), rrn4.5S(2), rrn5S(2) |
Self-replication | Transfer RNAs | trnA_UGC*(2), trnC_GCA, trnD_GUC, trnE_UUC, trnF_GAA, trnG_GCC, trnG_UCC*, trnH_GUG, trnI_CAU(2), trnI_GAU*(2), trnK_UUU*, trnL_CAA(2), trnL_UAA*, trnL_UAG, trnM_CAU, trnN_GUU(2), trnP_UGG, trnQ_UUG, trnR_ACG(2), trnR_UCU, trnS_GCU, trnS_GGA, trnS_UGA, trnT_GGU, trnV_GAC(2), trnV_UAC*, trnW_CCA, trnY_GUA, trnfM_CAU |
Other genes | Maturase | matK |
Other genes | Protease | clpP** |
Other genes | Envelope membrane protein | cemA |
Other genes | Acetyl-CoA carboxylase | accD |
Other genes | c-type cytochrome synthesis gene | ccsA |
Other genes | Translation initiation factor | infA(2) |
Genes of unknown function | Conserved hypothetical chloroplast ORF | ycf1(2), ycf2(2), ycf3**, ycf4 |

Notes: Gene*: gene with one intron; Gene**: gene with two introns; Gene(2): number of copies of multi-copy genes.
In the cp genome of C. tomentella, the 20 amino acids were encoded by 61 codons and translation termination by three stop codons (Fig. 2). SSRs were mainly located in the LSC region, which accounted for 63% of the total, followed by the SSC region (34%), while the IR region had the fewest (3%) (Fig. 3A). In the LSC region, 34 mononucleotide, one dinucleotide, and one trinucleotide repeats were detected; in the SSC region, 13 mononucleotide and 3 dinucleotide repeats were identified; and in the IR region, only two mononucleotide repeats were found (Fig. 3B–D). The distribution of SSRs among the LSC, SSC, and IR regions of the C. tomentella cp genome might be related to genetic polymorphism, but this has not yet been reported. Overall, the SSRs of C. tomentella were mainly located in the LSC region and consisted mostly of mononucleotide repeats, while the IR region contained the fewest SSRs.
Figure 3. Analysis of simple sequence repeats (SSRs) in the cp genome of C. tomentella. A: The proportion of SSRs in the LSC, SSC, and IR regions. B: Nucleotide repeat types in the LSC region. C: Nucleotide repeat types in the IR region. D: Nucleotide repeat types in the SSC region.
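The SSR thresholds given in the Methods (at least 10, 6, 5, 5, 5, and 5 repeat units for mono- to hexanucleotide motifs) can be illustrated with a small regular-expression scan. This is a simplified sketch and not the MISA program: the function name `find_ssrs` and the toy sequence are hypothetical, compound SSRs are not merged, and motifs that are themselves repeats of a shorter unit (for example AA or ATAT) are skipped so that each run is reported once under its shortest motif.

```python
import re

# Minimum number of repeat units per motif length, as stated in the Methods.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return a sorted list of (1-based start, motif, repeat count) tuples."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs built from a shorter repeating unit (e.g. "AA").
            if motif in (motif + motif)[1:-1]:
                continue
            repeats = (match.end() - match.start()) // k
            hits.append((match.start() + 1, motif, repeats))
    return sorted(hits)


# Toy sequence (hypothetical): an A(12) run and an (AT)6 run.
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"
for start, motif, repeats in find_ssrs(toy):
    print(start, motif, repeats)
```

Assigning each hit to the LSC, SSC, or IR region by its start coordinate would then give per-region tallies of the kind summarized in Fig. 3.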
Comparison of the C. tomentella cp genome with related species
The online mVISTA program (https://genome.lbl.gov/vista/index.shtml) was used to compare sequence divergence among the cp genomes of C. tomentella (ON854662), C. fruticosa (MT083932), C. tangutica (MK253446), C. aethusifolia (MK253462), C. henryi var. ternata (MW380948), C. trichotoma (MG952896), C. alternata (MG573152), C. montana (MT292622) and C. guniuensis (NC050373), and to draw the alignment of the nine Clematis cp genomes. Gene order and arrangement were almost identical between C. tomentella and C. fruticosa (Fig. 4). In general, gene order was similar across the nine Clematis species; the non-coding regions were largely divergent, while the coding regions were relatively conserved. Importantly, despite the overall similarity of the C. tomentella and C. fruticosa cp genomes, we found four distinctly divergent regions in C. tomentella, i.e., ndhC-atpA, ndhF-rpl32, rps8-infA, and psbE-petL, which could be used as DNA barcodes for C. tomentella or as evidence for evolutionary classification.
As shown in Fig. 5, the IR-LSC and IR-SSC boundaries of C. tomentella were similar to those of the related species C. fruticosa (MT083932), C. tangutica (MK253446), and C. aethusifolia (MK253462), but differed from those of C. taeguensis (MW201572). In C. tomentella, the infA gene was located at the LSC/IRa junction (17 bp from the end), a consequence of contraction and expansion at the boundary. This boundary gene is characteristic of the cp genomes of C. tomentella and its relatives.
The sliding window analysis performed in DnaSP revealed highly divergent regions among the cp genomes of C. tomentella, C. fruticosa, C. tangutica, and C. aethusifolia (Fig. 6). The average nucleotide diversity (Pi) of the whole cp genome was 0.00212, and three highly divergent regions, trnG_UCC-atpA, ndhF-rpl32, and rps8-infA, were identified where Eta (θ) values exceeded 0.0125.
Figure 6. Sliding window analysis of the cp genomes of C. tomentella and three related Clematis species (C. fruticosa, C. tangutica, and C. aethusifolia).
In this study, we aligned the complete cp genomes of 29 Clematis-related species in the Ranunculaceae family and one outgroup to reveal the phylogenetic position of C. tomentella. The maximum likelihood tree suggested that C. tomentella was closely related to C. fruticosa (Fig. 7).
In this study, except for the rps12 and trnG-UCC genes, the intron-containing genes of the C. tomentella cp genome (3 genes with two introns and 12 protein-coding genes with one intron; Fig. 1) were identical to those of C. guniuensis [18]. The GC contents of the related species C. henryi [38], C. fruticosa [43], and C. henryi var. ternata [6] were consistent with that of C. tomentella, all being 38%. Crucially, the GC contents of the LSC, SSC, and IR regions of the C. tomentella cp genome in our study were the same as those of C. fruticosa [43].
The genetic code is the key link between proteins and nucleic acids and plays a crucial role in the transmission of genetic information [26]. The RSCU values detected in our study (Fig. 2) were similar to those of the cp genomes of other plants, such as Sophora tonkinensis [37] and Salvia plebeia [10]. Long sequence repeats in the cp genome, i.e., forward, palindromic, reverse, and complement repeats longer than 30 bp, promote cp genome rearrangement and genetic diversity [44]. SSRs have been widely used in species identification and genetic diversity studies because they are highly polymorphic [31]. In our study, a total of 55 SSRs were identified, including 50 (91%) mononucleotide, 4 (7%) dinucleotide, and 1 (2%) trinucleotide repeats (Fig. 3). Mononucleotide repeats made up the largest proportion, with A and T bases accounting for more than 90%, which is consistent with results from other plants [37, 2].
DNA barcoding is an important basis for the evolutionary classification of plants, and the most variable regions of the cp genome have been reported to be ycf1a, trnK, rpl32-trnL, trnH-psbA, trnS_UGA-trnG_UCC, petA-psbJ, rps16-trnQ, ndhC-trnV, ycf1b, ndhF, rpoB-trnC, psbE-petL, and rbcL-accD [11]. In the present study, regions of the cp genomes of C. tomentella and its relatives, such as psbE-petL, trnG_UCC-atpA, ndhF-rpl32, and rps8-infA, could be used as DNA barcodes for phylogenetic identification (Figs. 4, 6). Among these, trnG_UCC-atpA may serve as a divergent region unique to the C. tomentella cp genome and as evidence for phylogenetic classification. Divergence in the psbE-petL, ndhF-rpl32, and rps8-infA regions is also found in the cp genomes of many other plants, such as Lonicera spp. [27], Musa spp. [33], Crataegus spp. [39], Thalictrum spp. [40], Rheum spp. [42], Litsea spp. [45], Papaver spp. [46], Persicaria amphibia [7], and Zephyranthes phycelloides [9], indicating that these regions are commonly divergent fragments of the cp genome. In this study, the IR-LSC and IR-SSC boundaries of C. tomentella were similar to those of the related species C. fruticosa, C. tangutica, and C. aethusifolia (Fig. 5). The IRs are conserved regions of the plant cp genome, and contraction and expansion of their boundaries often reflect the evolution and size of cp genomes among species [3, 20].
The sister relationship between C. tomentella and C. fruticosa may reflect the fact that both are erect shrubs, whereas other Clematis species are herbaceous lianas [14]. The ML tree in this study (Fig. 7) adds to previous phylogenetic studies of Clematis species, such as C. fruticosa [43], C. canescens [25], C. henryi var. ternata [6], C. guniuensis [18], C. terniflora [24], C. taeguensis [30], C. henryi [38], and others [41, 8]. The results of this study provide meaningful information for future phylogenetic research on the Ranunculaceae.
This study reported that the complete cp genome of C. tomentella is a typical circular molecule 159,816 bp in length. A total of 136 genes were found in the cp genome of C. tomentella, including 92 protein-coding genes, 36 tRNA genes, and 8 rRNA genes, and the GC content was 38%. Importantly, we found that the regions psbE-petL, trnG_UCC-atpA, ndhF-rpl32, and rps8-infA were highly divergent and could serve as DNA barcodes for the identification of C. tomentella. The phylogenetic tree indicates that C. tomentella is closely related to C. fruticosa. These results provide a reference for the phylogenetic study of Clematis in the Ranunculaceae family.
Author contributions LWS and WZJ designed the research. TY and JB sampled the plant materials. TY and LWS analyzed the data and performed the experiments. LWS wrote the manuscript.
Acknowledgments This study was financially supported by the China Forage and Grass Research System (CARS-34), Special Project for Youth Top-notch Talents of Ningxia Hui Autonomous Region ([2017] 186), Innovative Demonstration Projects for High-quality Agricultural Development and Ecological Protection (NGSB-2021-14-6), and Natural Science Foundation of Ningxia Province (2022AAC03641, 2020AAC03273).
Data availability statement The data is openly available in the NCBI database at https://www.ncbi.nlm.nih.gov/, accession number ON854662. The associated SRA, BioSample, and BioProject numbers are SRR20647951, SAMN29333046, and PRJNA852531, respectively.
Compliance with ethical standards The authors declare that they have no conflict of interest. This article does not contain any studies involving animals or human participants performed by any of the authors.