2.1. Morphological Features and Chloroplast Genome Characteristics of Tetrastigma hemsleyanum
Samples of Tetrastigma hemsleyanum were collected from Zhejiang, Fujian, Jiangxi, Guangxi and Sichuan provinces, the main producing areas of T. hemsleyanum crude drugs (Fig. 1A). The external morphology and microstructure of root tubers from the different regions were examined to identify morphological differences. As shown in Fig. 1B, root tubers from all regions shared similar morphological characters, including tuber size, an elliptical or spindle shape, and a tan epidermis. Most root tubers had a smooth surface, while a few showed folds, lenticel-like protuberances and depressions (Fig. 1B). Microscopic examination of the powder showed that cork cells, brown patches, needle crystals of calcium oxalate, starch granules and bordered pitted vessels were abundant in root tubers from all five regions, whereas cluster crystals of calcium oxalate were rarely observed (Fig. 1C). Thus, the pharmacognostic analysis did not reveal significant differences in the external morphology or microstructure of root tubers from different regions, which calls for alternative strategies to discriminate geographical origin. We therefore sequenced the complete chloroplast (cp) genomes of T. hemsleyanum from the five regions and compared them to develop a potential molecular approach for tracing the geographical origin of T. hemsleyanum.
The complete cp genome sequences of T. hemsleyanum from Zhejiang, Fujian, Guangxi, Sichuan and Jiangxi provinces have been deposited in GenBank under accession Nos. MW375707 to MW375711. The cp genomes ranged in size from 160,124 bp (Jiangxi) to 160,518 bp (Zhejiang). All five genomes exhibited the typical circular quadripartite structure of angiosperm chloroplasts, consisting of a large single-copy region (LSC; 88,131–89,298 bp), a small single-copy region (SSC; 18,962–18,968 bp) and a pair of inverted repeats (IR; 26,126–26,517 bp) (Fig. 2, Table 2). A total of 112 genes, including 80 protein-coding genes, 28 tRNA genes and 4 rRNA genes, were identified in each genome (Table 1). The cp genomes were highly similar in gene content, order and orientation. The overall GC content was nearly identical among regions: 37.50% in the samples from Zhejiang, Fujian and Guangxi and 37.52% in those from Sichuan and Jiangxi (Table 2). No significant differences in protein-coding genes were found among the regions; the protein-coding sequences had a total length of 80,022 bp. Nineteen genes were duplicated in the IR regions, comprising 8 protein-coding genes (rpl2, rpl23, ycf1, ycf2, ycf15, ndhB, rps12 and rps7), 7 tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG and trnV-GAC) and 4 rRNA genes (rrn4.5, rrn5, rrn16 and rrn23) (Table 2). Furthermore, 18 genes contained introns, including 13 protein-coding genes and 5 tRNA genes. All of these genes had a single intron, except for rps12, clpP and ycf3, which contained two. Interestingly, the location and intron structure of the rpl2 gene differed among the cp genomes from different regions. In the Guangxi and Zhejiang genomes, rpl2 possessed two introns and spanned the junction of the IRA and LSC regions, extending 149 bp and 223 bp into the LSC region, respectively, whereas in the other three genomes rpl2 had only one intron and was located entirely within IRA. Together, these results indicate that the cp genomes of T. hemsleyanum from different regions differ slightly but are highly conserved in basic structure, genome size, gene number and total GC content.
Table 1
List of genes annotated in the chloroplast genomes of Tetrastigma hemsleyanum
Classification of genes | Gene Names | Number |
Photosystem Ⅰ | psaA, psaB, psaC, psaI, psaJ | 5 |
Photosystem Ⅱ | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ | 15 |
Cytochrome b/f complex | petA, petB, petD, petG, petL, petN | 6 |
ATP synthase | atpA, atpB, atpE, atpF, atpH, atpI | 6 |
NADH dehydrogenase | ndhA, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK | 11 |
RubisCO large subunit | rbcL | 1 |
RNA polymerase | rpoA, rpoB, rpoC1, rpoC2 | 4 |
Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7*, rps8, rps11, rps12*, rps14, rps15, rps16, rps18, rps19 | 12 |
Ribosomal proteins (LSU) | rpl2*, rpl14, rpl16, rpl20, rpl22, rpl23*, rpl32, rpl33, rpl36 | 9 |
Ribosomal RNAs | rrn4.5*, rrn5*, rrn16*, rrn23* | 4 |
Protein of unknown function | ycf1*, ycf2*, ycf3, ycf4, ycf15* | 5 |
Transfer RNAs | trnA-UGC*, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnH-GUG, trnI-CAU*, trnI-GAU*, trnK-UUU, trnL-CAA*, trnL-UAA, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU*, trnP-UGG, trnQ-UUG, trnR-ACG*, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC*, trnV-UAC, trnW-CCA, trnY-GUA | 28 |
Other genes | accD, ccsA, cemA, clpP, infA, matK | 6 |
Total | | 112 |
Note: * indicates a duplicated gene.
Table 2
Statistics on the basic feature of the cp genomes of five T. hemsleyanum plants and three Vitaceae species
Characteristics | T. hemsleyanum (Zhejiang) | T. hemsleyanum (Fujian) | T. hemsleyanum (Guangxi) | T. hemsleyanum (Sichuan) | T. hemsleyanum (Jiangxi) | Tetrastigma planicaule | Ampelopsis japonica | Vitis vinifera |
Genbank accession No. | MW375707 | MW375708 | MW375709 | MW375710 | MW375711 | MW401672 | NC_042235 | NC_007957 |
Total length (bp) | 160518 | 160152 | 160153 | 160127 | 160124 | 160323 | 161430 | 160928 |
LSC length (bp) | 89298 | 88183 | 88798 | 88131 | 88142 | 88181 | 89626 | 89147 |
SSC length (bp) | 18968 | 18965 | 18965 | 18962 | 18962 | 19096 | 18977 | 19065 |
IR length (bp) | 26126 | 26502 | 26195 | 26517 | 26510 | 26523 | 26413 | 26358 |
Gene number | 112 | 112 | 112 | 112 | 112 | 113 | 114 | 113 |
Gene number in IR regions | 19 | 19 | 19 | 19 | 19 | 18 | 19 | 18 |
Protein-coding gene number | 80 | 80 | 80 | 80 | 80 | 79 | 80 | 79 |
rRNA gene number | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
tRNA gene number | 28 | 28 | 28 | 28 | 28 | 30 | 30 | 30 |
Total GC content (%) | 37.50 | 37.50 | 37.50 | 37.52 | 37.52 | 37.49 | 37.32 | 37.40 |
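The quadripartite structure described above implies that the total length should equal LSC + SSC + 2 × IR. The following is a minimal sketch of that consistency check, using the lengths exactly as printed in Table 2; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Lengths copied from Table 2 as (total, LSC, SSC, IR), all in bp.
TABLE2 = {
    "T. hemsleyanum (Zhejiang)": (160518, 89298, 18968, 26126),
    "T. hemsleyanum (Fujian)": (160152, 88183, 18965, 26502),
    "T. hemsleyanum (Guangxi)": (160153, 88798, 18965, 26195),
    "T. hemsleyanum (Sichuan)": (160127, 88131, 18962, 26517),
    "T. hemsleyanum (Jiangxi)": (160124, 88142, 18962, 26510),
    "T. planicaule": (160323, 88181, 19096, 26523),
    "A. japonica": (161430, 89626, 18977, 26413),
    "V. vinifera": (160928, 89147, 19065, 26358),
}


def quadripartite_gap(total, lsc, ssc, ir):
    """Return total - (LSC + SSC + 2 x IR); 0 means the regions add up."""
    return total - (lsc + ssc + 2 * ir)


# Hypothetical helper output: one gap value per genome.
gaps = {name: quadripartite_gap(*lengths) for name, lengths in TABLE2.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+d} bp")
```

With the values as printed, the four regions sum exactly to the total length for seven of the eight genomes, and A. japonica differs by 1 bp.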
To further determine the conserved and variable features of cp genomes in the family Vitaceae, we compared T. hemsleyanum with species from tribe Cayratieae (Tetrastigma planicaule), tribe Ampelopsideae (Ampelopsis japonica) and tribe Viteae (Vitis vinifera). The structure of the cp genome appeared largely conserved across Vitaceae, with only small differences in total genome length, gene number and GC content (Table 2). Among the three additional species, genome size ranged from 160,323 bp in T. planicaule to 161,430 bp in A. japonica, and overall GC content ranged from 37.32% (A. japonica) to 37.49% (T. planicaule). However, the types and numbers of genes encoded in the cp genomes of T. planicaule, A. japonica and V. vinifera were not identical to those of T. hemsleyanum. The absence of the ycf15 gene reduced the number of protein-coding genes in T. planicaule and V. vinifera, whereas A. japonica had the same number of protein-coding genes as T. hemsleyanum. In addition, the ycf1 gene of V. vinifera was located entirely within the IRB region and was annotated as a pseudogene copy. Differences among the four Vitaceae species were more pronounced in tRNA genes than in protein-coding genes. The trnS-GCU gene was annotated only in T. hemsleyanum, whereas trnG-GCC, trnG-UCC and trnV-GAU were annotated only in the other three species, which distinguishes the tRNA gene set of T. hemsleyanum.
2.2. Comparative Analyses of Chloroplast Genome
As the link between nucleic acids and proteins, the genetic code plays an important role in the transmission of genetic information [29]. We therefore analyzed and compared codon usage in the protein-coding genes of the cp genomes of T. hemsleyanum from different regions. The five cp genomes had almost identical protein-coding sequences, comprising 26,674 codons in total. These codons covered all 64 codon types and encoded 20 amino acids (Supplementary Fig. 1). Nevertheless, amino acid counts and codon usage bias differed slightly among regions. Leucine was the most abundant amino acid (2774–2776 codons, 10.40%–10.41% of the total), whereas cysteine was the least abundant (320–322 codons, 1.20%–1.21% of the total). Excluding stop codons, the most frequently used codon was AUU (1117–1118), encoding isoleucine, and the least used was UGC (89–91), encoding cysteine (Supplementary Table 3). Notably, the codon usage patterns of the five regions fell into three types (Supplementary Table 3): the samples from Jiangxi and Sichuan showed exactly the same codon usage, those from Fujian and Guangxi shared a second pattern, and the sample from Zhejiang showed a unique pattern. Further comparison revealed 28 variant sites in 21 protein-coding genes among the regions, which account for the differences in codon usage and amino acid counts (Supplementary Table 4). The atpB, ccsA, ycf2 and ycf1 genes each contained two variable sites, and accD contained three. Strikingly, the variant sites in accD of T. hemsleyanum from Jiangxi and Sichuan resulted in codons for lysine, in contrast to the methionine, glutamine and asparagine encoded in the cp genomes from the other three regions. Compared with the samples from Sichuan, Jiangxi and Zhejiang, those from Fujian and Guangxi carried one base variation each in ndhD and ycf2, leading to the use of GGG and GGC, respectively (Supplementary Table 4). Taken together, these results provide insight into protein adaptive evolution and support the development of strategies for identifying the geographical origin of T. hemsleyanum.
Previous reports have indicated that codon usage bias in chloroplast genomes may be affected by selection, mutation and random drift [30, 31]. Comparison of T. hemsleyanum with the other three Vitaceae species showed that T. planicaule, A. japonica and V. vinifera encoded the same set of amino acids as T. hemsleyanum. The cp genomes of T. planicaule, A. japonica and V. vinifera contained 26,978, 26,990 and 26,124 codons, respectively (Supplementary Table 3). In all three species, leucine was the most abundant amino acid (T. planicaule 2800, 10.38%; A. japonica 2724, 10.09%; V. vinifera 2803, 10.73%) and cysteine the least abundant (T. planicaule 327, 1.21%; A. japonica 308, 1.14%; V. vinifera 325, 1.24%). Similar codon usage patterns were observed among the eight Vitaceae cp genomes. As shown in Supplementary Table 3, most amino acids showed codon preferences, whereas methionine (AUG) and tryptophan (UGG), each encoded by a single codon, showed none. The highest relative synonymous codon usage (RSCU) value was that of AGA (1.87–1.90) for arginine, and the lowest was that of AGC (0.34–0.36) for serine. The RSCU values across the 64 codons of the coding sequences were similar among the species. In each species, 31 codons were preferred (RSCU > 1), indicating an obvious codon bias. Most of these preferred codons (29) ended with A or U. Investigating codon preferences therefore helps in understanding exogenous gene expression and the molecular evolution of T. hemsleyanum within Vitaceae.
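The RSCU values discussed above follow a simple definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates the calculation for a single amino acid; the codon counts are hypothetical placeholders, not values from Supplementary Table 3, and the function names are illustrative.

```python
def rscu(codon_counts):
    """RSCU for the synonymous codons of ONE amino acid.

    codon_counts maps each synonymous codon to its observed count.
    RSCU = count * (number of synonymous codons) / (total count).
    """
    total = sum(codon_counts.values())
    n_synonymous = len(codon_counts)
    if total == 0:
        return {codon: 0.0 for codon in codon_counts}
    return {codon: n_synonymous * count / total
            for codon, count in codon_counts.items()}


def preferred_codons(rscu_values):
    """Codons used more often than expected under equal usage (RSCU > 1)."""
    return sorted(c for c, value in rscu_values.items() if value > 1)


# Hypothetical counts for the six arginine codons (illustration only).
arginine_counts = {"AGA": 30, "AGG": 10, "CGU": 20,
                   "CGC": 5, "CGA": 25, "CGG": 10}
arginine_rscu = rscu(arginine_counts)
print(arginine_rscu)
print(preferred_codons(arginine_rscu))
```

By construction, the RSCU values of an amino acid sum to its number of synonymous codons, so single-codon amino acids such as methionine and tryptophan always have an RSCU of 1.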
Contraction and expansion of the IR regions, a common phenomenon known as ebb and flow, can serve as an effective tool for studying the phylogenetic relationships and classification of medicinal plants [32]. We compared in detail the borders between the IRs and the two single-copy regions in the five T. hemsleyanum samples and the three other Vitaceae species. The length of the IR regions was similar among the eight cp genomes, ranging from 26,126 bp in T. hemsleyanum (Zhejiang) to 26,523 bp in T. planicaule, with some expansion and contraction (Fig. 3). Notable differences were found at the boundaries among the cp genomes of T. hemsleyanum from different regions. In the samples from Jiangxi, Fujian and Sichuan, the LSC-IRB border was located within the rps19 gene, which extended 96 bp, 92 bp and 103 bp into the IRB region, respectively (Fig. 3). In the samples from Zhejiang and Guangxi, by contrast, rps19 was located entirely in the LSC region, 290 bp and 216 bp away from the LSC-IRB junction, respectively (Fig. 3). The position of the ycf1 gene at the IR/SSC boundary was highly conserved among the Vitaceae species except in V. vinifera, which carried a 1030-bp ycf1 pseudogene located entirely within the IRB region (Fig. 3). In all five T. hemsleyanum samples, the ycf1 copy at the IRB-SSC boundary was identical, occupying 1,140 bp of IRB and 29 bp of SSC. Interestingly, the overlap between the ycf1 gene and the IRA region was considerably longer in the other three species than in T. hemsleyanum: 1,144 bp in T. planicaule and 1,116 bp in both A. japonica and V. vinifera, compared with only 34 bp in T. hemsleyanum. This difference may be one of the reasons for the variation in length among these Vitaceae cp genomes.
2.3. Repeat Sequence Analysis and RNA Editing Site Identification
Long repeats are important genetic resources that play a crucial role in genome rearrangement and intermolecular recombination [33]. As shown in Supplementary Fig. 2, the long repeats detected in the cp genomes of T. hemsleyanum from Jiangxi and Sichuan were identical, while those from Zhejiang, Fujian and Guangxi differed slightly in type and number. In the five T. hemsleyanum samples, the most abundant repeats were 30–39 bp long; forward repeats were the most numerous (27–28), followed by palindromic (18–19), complement (2) and reverse (2) repeats. These results confirm that the cp genomes from different regions are highly similar in repeat type, with slight variations in repeat number and length (Supplementary Fig. 2A). The comparison with the other three Vitaceae species, however, revealed clear differences. A total of 49, 48 and 40 long repeats were identified in the cp genomes of T. planicaule, A. japonica and V. vinifera, respectively. Unlike T. hemsleyanum, none of these three species contained complement repeats, and T. planicaule also lacked reverse repeats. In all of these Vitaceae species, most repeat units were short, ranging from 30 to 59 bp (Supplementary Fig. 2B).
Simple sequence repeats (SSRs) play an essential role in plant taxonomy and population genetics because of their high polymorphism and codominance [34]. In total, 56 SSRs were identified in each of the cp genomes of T. hemsleyanum from four regions, while the sample from Guangxi contained 57. Most SSRs were mononucleotide repeats (42–43), followed by dinucleotide (11) and tetranucleotide (3) repeats (Table 3). The cp genomes from Jiangxi and Sichuan were identical in SSR types and numbers, whereas those from the other regions differed in the counts of mononucleotide repeats (Fig. 4A). Specifically, the numbers of A/T repeats in the samples from Jiangxi, Zhejiang, Fujian, Guangxi and Sichuan were 42, 41, 41, 42 and 42, respectively, and the samples from Jiangxi and Sichuan had no C/G repeats. These results suggest that SSRs may be useful molecular markers for determining the geographical origin of T. hemsleyanum. A comparative analysis detected 55, 69 and 54 SSRs in the cp genomes of T. planicaule, A. japonica and V. vinifera, respectively (Table 3). Notably, T. planicaule, also of the genus Tetrastigma, had the same SSR types as T. hemsleyanum, with slight differences in SSR numbers (Fig. 4B). Compared with the Tetrastigma species, A. japonica and V. vinifera possessed many additional SSR types and repeat units, with 45 and 35 mono-, 13 and 8 di-, 4 and 5 tri-, 4 and 5 tetra-, and 3 and 1 pentanucleotide repeats, respectively. The additional SSRs were AAT/ATT, AGC/CTG, AAG/CTT, AATC/ATTG, AGAT/ATCT, AAAAT/ATTTT and AATAT/ATATT in A. japonica and AAT/ATT, AGC/CTG, AATC/ATTG, ACAT/ATGT, AGAT/ATCT and AGGAT/ATCCT in V. vinifera (Fig. 4B). Moreover, the absence of AG/CT and AATT/AATT repeats in both A. japonica and V. vinifera further illustrates the divergence of SSR loci among genera. In all Vitaceae species examined, SSRs composed of A/T were far more numerous than those containing G or C, indicating that the base composition of SSRs is biased toward A/T, consistent with the A-T richness of complete chloroplast genomes [35]. Taken together, these results provide insight into intrageneric and intergeneric variation in T. hemsleyanum and its relatives in Vitaceae.
Table 3
The number and types of SSR in five T. hemsleyanum plants and three Vitaceae species
SSR type | Repeat unit | T.h. (Jiangxi) | T.h. (Zhejiang) | T.h. (Fujian) | T.h. (Guangxi) | T.h. (Sichuan) | T. planicaule | A. japonica | V. vinifera |
Mono | A/T | 42 | 41 | 41 | 42 | 42 | 39 | 45 | 34 |
Mono | C/G | / | 1 | 1 | 1 | / | / | / | 1 |
Di | AG/CT | 1 | 1 | 1 | 1 | 1 | 1 | / | / |
Di | AT/AT | 10 | 10 | 10 | 10 | 10 | 11 | 13 | 8 |
Tri | AAT/ATT | / | / | / | / | / | / | 2 | 4 |
Tri | AGC/CTG | / | / | / | / | / | / | 1 | 1 |
Tri | AAG/CTT | / | / | / | / | / | / | 1 | / |
Tetra | AAAT/ATTT | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
Tetra | AATC/ATTG | / | / | / | / | / | / | 1 | 1 |
Tetra | AATT/AATT | 1 | 1 | 1 | 1 | 1 | 2 | / | / |
Tetra | ACAT/ATGT | / | / | / | / | / | / | / | 1 |
Tetra | AGAT/ATCT | / | / | / | / | / | / | 1 | 2 |
Penta | AGGAT/ATCCT | / | / | / | / | / | / | / | 1 |
Penta | AAAAT/ATTTT | / | / | / | / | / | / | 1 | / |
Penta | AATAT/ATATT | / | / | / | / | / | / | 2 | / |
Total | | 56 | 56 | 56 | 57 | 56 | 55 | 69 | 54 |
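For readers unfamiliar with how SSR counts such as those in Table 3 are obtained, the following is a minimal regex-based sketch of SSR detection. It is not the tool used in this study; the minimum repeat thresholds in `MIN_REPEATS` are assumed values chosen for illustration, and the function name is hypothetical.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, n_units) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif,
            # e.g. "AA" (already counted as mono "A") or "ATAT" (di "AT").
            is_compound = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len) if unit_len % k == 0
            )
            if not is_compound:
                hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a 12-base A run and six AT units separated by spacer bases.
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CC"
print(find_ssrs(toy))
```

This sketch reports perfect repeats only and does not merge complementary motifs (for example A with T, or AG with CT) as Table 3 does; dedicated SSR tools handle those cases as well as compound repeats.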
RNA editing is an essential maturation mechanism that corrects deleterious mutations at the transcript level and is widespread in plant chloroplast genomes [36]. In total, 71 potential RNA editing sites were predicted in 24 protein-coding genes of the T. hemsleyanum cp genome, with no differences among regions in the number of editing sites or the resulting amino acid conversions (Table 4). Of the 71 sites, 17 were at the first codon position and 54 at the second, and no codon was edited at both positions. All predicted edits were C-to-T conversions. The ndhB gene had the most editing sites (11), followed by ndhD (8) and ndhF (7), while nine genes (accD, atpI, atpF, ccsA, clpP, psbE, psbF, psbL and rpl20) had only one each (Table 4). RNA editing resulted in 11 kinds of amino acid conversion. The H to Y, L to F, P to S, R to W and R to C conversions resulted from edits at the first codon position, whereas the S to L, P to L, S to F, T to M, A to V and T to I conversions resulted from edits at the second position (Supplementary Table 5). Serine to leucine (S to L) was the most frequent conversion, accounting for 42.3% of sites, while arginine to tryptophan (R to W) and arginine to cysteine (R to C) were the rarest, each accounting for only 1.4% (Supplementary Table 5). The predicted RNA editing sites in T. planicaule, A. japonica and V. vinifera were similar to those of T. hemsleyanum, numbering 71, 72 and 70, respectively. The small differences in the number of editing sites were found in the accD, ndhB and ndhF genes and led to differences in amino acid conversions (Supplementary Table 5). Because RNA editing sites are closely associated with nucleotide substitutions in protein-coding genes, we further analyzed the synonymous (Ks) and non-synonymous (Ka) substitution rates of the protein-coding genes containing RNA editing sites. The Ka/Ks ratios of most genes (22 of 24) were below 0.5, the exceptions being matK (0.5534) and rps16 (0.5687), suggesting a clear pattern of purifying selection. In particular, clpP, psbL and psbF had Ka/Ks values of 0, suggesting that these three genes may be under strong purifying selection (Table 5).
Table 4
Number of the RNA editing sites in the cp genome of T. hemsleyanum and three Vitaceae species
Gene | T. hemsleyanum | T. planicaule | A. japonica | V. vinifera |
accD | 1 | 1 | 2 | 1 |
atpA | 3 | 3 | 3 | 3 |
atpF | 1 | 1 | 1 | 1 |
atpI | 1 | 1 | 1 | 1 |
ccsA | 1 | 1 | 1 | 1 |
clpP | 1 | 1 | 1 | 1 |
matK | 4 | 4 | 4 | 4 |
ndhA | 4 | 4 | 4 | 4 |
ndhB | 11 | 11 | 12 | 11 |
ndhD | 8 | 8 | 8 | 8 |
ndhF | 7 | 7 | 6 | 6 |
ndhG | 3 | 3 | 3 | 3 |
petB | 2 | 2 | 2 | 2 |
psbE | 1 | 1 | 1 | 1 |
psbF | 1 | 1 | 1 | 1 |
psbL | 1 | 1 | 1 | 1 |
rpl20 | 1 | 1 | 1 | 1 |
rpoA | 2 | 2 | 2 | 2 |
rpoB | 5 | 5 | 5 | 5 |
rpoC1 | 2 | 2 | 2 | 2 |
rpoC2 | 4 | 4 | 4 | 4 |
rps2 | 2 | 2 | 2 | 2 |
rps14 | 2 | 2 | 2 | 2 |
rps16 | 3 | 3 | 3 | 3 |
Total | 71 | 71 | 72 | 70 |
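The amino acid conversions reported for first- and second-position edits follow directly from the standard genetic code. As an illustrative check (not the prediction software used in this study), the sketch below enumerates every C-to-T change at a given codon position and lists the resulting non-synonymous, non-stop conversions; all names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def c_to_t_conversions(position):
    """Amino acid changes caused by a C-to-T edit at a 0-based codon position.

    Synonymous edits and edits that create a stop codon are excluded.
    """
    changes = set()
    for codon, before in CODON_TABLE.items():
        if codon[position] != "C" or before == "*":
            continue
        edited = codon[:position] + "T" + codon[position + 1:]
        after = CODON_TABLE[edited]
        if after != before and after != "*":
            changes.add(f"{before} to {after}")
    return changes


first_position = c_to_t_conversions(0)
second_position = c_to_t_conversions(1)
print(sorted(first_position))
print(sorted(second_position))
```

Under the standard code, this enumeration yields five conversions for first-position edits and six for second-position edits, the same 11 kinds listed in the text.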
Table 5
Ka/Ks values of the 24 protein-coding genes with RNA editing sites in T. hemsleyanum (Jiangxi)
Gene | Number of RNA editing sites | non-synonymous substitutions (Ka) | synonymous substitutions (Ks) | Ka/Ks |
ndhB | 11 | 0.0113 | 0.0243 | 0.4650 |
ndhD | 8 | 0.0516 | 0.3257 | 0.1584 |
ndhF | 7 | 0.1087 | 0.3022 | 0.3597 |
rpoB | 5 | 0.0240 | 0.2243 | 0.1070 |
ndhA | 4 | 0.0383 | 0.2928 | 0.1308 |
matK | 4 | 0.1296 | 0.2342 | 0.5534 |
rpoC2 | 4 | 0.0704 | 0.2663 | 0.2644 |
ndhG | 3 | 0.0491 | 0.2718 | 0.1807 |
atpA | 3 | 0.0334 | 0.2751 | 0.1214 |
rps16 | 3 | 0.1109 | 0.1950 | 0.5687 |
rpoA | 2 | 0.0700 | 0.1879 | 0.3725 |
rpoC1 | 2 | 0.0267 | 0.2567 | 0.1040 |
petB | 2 | 0.0124 | 0.1742 | 0.0712 |
rps2 | 2 | 0.0111 | 0.2074 | 0.0535 |
rps14 | 2 | 0.0263 | 0.1387 | 0.1896 |
accD | 1 | 0.1093 | 0.2714 | 0.4027 |
atpF | 1 | 0.0469 | 0.1480 | 0.3169 |
atpI | 1 | 0.0263 | 0.1612 | 0.1631 |
ccsA | 1 | 0.0844 | 0.2810 | 0.3004 |
clpP | 1 | 0 | 0 | 0 |
psbE | 1 | 0.0053 | 0.2050 | 0.0259 |
psbF | 1 | 0 | 0.0934 | 0 |
psbL | 1 | 0 | 0 | 0 |
rpl20 | 1 | 0.0709 | 0.1643 | 0.4315 |
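The Ka/Ks column of Table 5 can be reproduced from the Ka and Ks columns. The sketch below does this for a few rows copied from the table and applies the conventional interpretation (ratio below 1 suggests purifying selection, above 1 positive selection). Returning `None` when Ks is 0 is a design choice of this sketch, since the ratio is then mathematically undefined; Table 5 reports such cases as 0. The function names are illustrative.

```python
def ka_ks(ka, ks):
    """Return Ka/Ks, or None when Ks is 0 and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def selection_label(ratio):
    """Conventional reading of a Ka/Ks ratio."""
    if ratio is None:
        return "undefined (Ks = 0)"
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "positive selection"
    return "neutral"


# (Ka, Ks) pairs copied from Table 5.
TABLE5_SUBSET = {
    "ndhB": (0.0113, 0.0243),
    "ndhD": (0.0516, 0.3257),
    "matK": (0.1296, 0.2342),
    "psbF": (0.0, 0.0934),
    "clpP": (0.0, 0.0),
}

for gene, (ka, ks) in TABLE5_SUBSET.items():
    ratio = ka_ks(ka, ks)
    shown = "NA" if ratio is None else f"{ratio:.4f}"
    print(f"{gene}: Ka/Ks = {shown} ({selection_label(ratio)})")
```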
2.4. Phylogenetic Analysis
Previous studies based on molecular and morphological data indicated that the family Vitaceae can be classified into five major clades: the tribes Ampelopsideae, Cisseae, Cayratieae, Parthenocisseae and Viteae [37]. However, the deep phylogenetic relationships within Vitaceae still need further exploration to clarify the evolutionary characters and genetic status of its species. We therefore constructed phylogenetic trees of the family Vitaceae from a dataset of 70 protein-coding genes using the maximum likelihood (ML) and maximum parsimony (MP) methods. The ingroup comprised 4 species from tribe Viteae, 3 species from tribe Ampelopsideae and 6 samples from tribe Cayratieae. Melaleuca alternifolia and M. cajuputi were chosen as outgroups. As shown in Fig. 5, nearly all nodes received moderate to high support in both the ML and MP trees. However, several topological differences between the ML and MP trees were observed in the relationships among the five T. hemsleyanum samples and within tribe Viteae (Fig. 5). The five T. hemsleyanum samples formed a stable monophyletic group with high bootstrap values, which was sister to T. planicaule, indicating a close genetic relationship within the genus Tetrastigma (Fig. 5). In the ML tree, the samples from Fujian and Sichuan clustered together with a bootstrap value of 68, and this group then clustered with the samples from the other three regions (Fig. 5A). These results indicate subtle differences in the protein-coding sequences of T. hemsleyanum cp genomes from different regions, providing potential molecular tools for distinguishing geographical origin. Furthermore, the Ampelopsideae and Viteae species formed a clade with strong statistical support, which in turn grouped with the six Tetrastigma samples in a robust monophyletic clade, consistent with the previous classification of the tribes Ampelopsideae, Viteae and Cayratieae within Vitaceae.
2.5. Nucleotide Diversity Analysis and Development of Molecular Marker for Geographical Origin Discrimination
Highly variable regions of complete cp genomes provide potential molecular markers for species identification and geographical origin determination. To assess sequence divergence among the Vitaceae species, the complete cp genomes were aligned and nucleotide variability (Pi) was calculated with DnaSP. Sliding window analysis revealed five highly variable regions with Pi values ranging from 0.06194 to 0.10611 across the four Vitaceae cp genomes, comprising four intergenic regions (rps16-trnQ, psbM-trnD, psbZ-trnfM and ycf3-trnS) and one protein-coding gene (ycf1) (Fig. 6A). Four of these mutational hotspots were located in the LSC region, and the ycf1 gene, with a Pi value of 0.06194, was located in the SSC region. No hypervariable locus was found in the IR regions, confirming that the IR regions are highly conserved among the Vitaceae cp genomes. The rps16-trnQ region had the highest Pi value (0.10611), followed by psbZ-trnfM (0.10083) and ycf3-trnS (0.10056). We also compared the numbers of SNP sites and gaps in the five hypervariable regions among the four Vitaceae species. In the cp genome of T. hemsleyanum from Zhejiang Province, the five hotspots ranged in length from 892 bp (psbZ-trnfM) to 1139 bp (rps16-trnQ) (Table 6). Relative to T. hemsleyanum from Jiangxi Province, the hypervariable regions of T. planicaule, also of the genus Tetrastigma, contained few SNP sites (3–8) and gaps (0–19), except for the psbM-trnD region, which contained 79 SNP sites and gaps of 39 in total. In contrast, A. japonica and V. vinifera showed many variable sites in all five hotspots. For instance, compared with T. hemsleyanum from Zhejiang, the psbZ-trnfM region showed 104 and 115 SNP sites in A. japonica and V. vinifera, respectively. These differences account for the variable lengths of the mutational hotspots among the Vitaceae species and provide potential molecular markers for resolving difficulties in species identification within Vitaceae.
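Nucleotide diversity (Pi) is the average number of pairwise differences per site among aligned sequences, and a sliding window reports it along the alignment. The following is a minimal sketch of that calculation, not a reimplementation of DnaSP; excluding alignment columns that contain gaps is an assumption of this sketch, the window and step sizes are placeholders, and the toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site; columns with gaps are skipped."""
    columns = [col for col in zip(*seqs) if "-" not in col]
    pairs = list(combinations(range(len(seqs)), 2))
    if not columns or not pairs:
        return 0.0
    differences = sum(
        sum(col[i] != col[j] for col in columns) for i, j in pairs
    )
    return differences / (len(pairs) * len(columns))


def sliding_pi(seqs, window, step):
    """Return (window_start, Pi) for each full window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment of three equal-length sequences (illustration only).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(nucleotide_diversity(toy_alignment))
print(sliding_pi(toy_alignment, window=4, step=4))
```

Windows with the highest Pi values correspond to candidate mutational hotspots such as those reported in Fig. 6.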
Table 6
Multiple analysis of the mutational hotspots in four Vitaceae plants
Mutational hotspot | Species | Length | GC content | Number of SNP sites | Total length of gaps |
rps16-trnQ | T. hemsleyanum (Jiangxi) | 1139 bp | 23.09% | / | / |
rps16-trnQ | T. planicaule | 1141 bp | 23.14% | 5 | 4 |
rps16-trnQ | A. japonica | 1208 bp | 20.86% | 149 | 167 |
rps16-trnQ | V. vinifera | 1076 bp | 21.84% | 98 | 138 |
psbM-trnD | T. hemsleyanum (Jiangxi) | 895 bp | 35.08% | / | / |
psbM-trnD | T. planicaule | 868 bp | 35.37% | 79 | 39 |
psbM-trnD | A. japonica | 860 bp | 33.37% | 60 | 77 |
psbM-trnD | V. vinifera | 843 bp | 34.28% | 41 | 100 |
psbZ-trnfM | T. hemsleyanum (Jiangxi) | 892 bp | 24.33% | / | / |
psbZ-trnfM | T. planicaule | 883 bp | 24.46% | 8 | 19 |
psbZ-trnfM | A. japonica | 907 bp | 24.33% | 104 | 95 |
psbZ-trnfM | V. vinifera | 911 bp | 22.39% | 115 | 69 |
ycf3-trnS | T. hemsleyanum (Jiangxi) | 1031 bp | 33.56% | / | / |
ycf3-trnS | T. planicaule | 1029 bp | 33.92% | 3 | 2 |
ycf3-trnS | A. japonica | 1113 bp | 33.33% | 97 | 128 |
ycf3-trnS | V. vinifera | 1123 bp | 33.93% | 97 | 146 |
ycf1 | T. hemsleyanum (Jiangxi) | 977 bp | 30.60% | / | / |
ycf1 | T. planicaule | 977 bp | 30.40% | 6 | 0 |
ycf1 | A. japonica | 965 bp | 29.95% | 77 | 12 |
ycf1 | V. vinifera | 980 bp | 30.61% | 67 | 15 |
To assess the potential of variable cp genome sequences for geographical origin discrimination, we further evaluated sequence divergence among T. hemsleyanum from different regions. Intraspecific divergence within T. hemsleyanum was much lower than interspecific divergence among the Vitaceae species (Fig. 6). A total of five mutational hotspots with relatively high Pi values (≥ 0.004) were identified in T. hemsleyanum, including two hypervariable regions located in the IRs (trnL-CAA and trnN-GUU) and one intergenic region located in the SSC (ndhD-psaC) with a Pi value of 0.009 (Fig. 6B). Moreover, both the SSC and IR regions were more variable than the LSC region in the cp genomes of T. hemsleyanum from different regions. This result differs markedly from general observations in other species, in which the IR regions are usually less variable than the LSC and SSC regions. Accordingly, these hypervariable regions, which contain abundant intraspecific variable sites, could be developed into DNA barcodes for discriminating the geographical origins of T. hemsleyanum.
We designed five DNA barcodes (accD, trnL, trnN, ndhD-psaC and ndhC-trnV) based on the hypervariable regions and tested them by PCR amplification of T. hemsleyanum medicinal materials from the Zhejiang region (Fig. 7A). Agarose gel electrophoresis showed a single bright band for accD, trnL and trnN, indicating successful amplification, and trnL and trnN showed the higher amplification efficiency and sequence diversity. These two barcodes were therefore amplified in batches from DNA of T. hemsleyanum samples from six regions to further evaluate their ability to discriminate geographical origin. Detailed sequence information for the two PCR products is shown in Table 7. The trnL and trnN barcodes were 1143 bp and 469 bp long, respectively, in all T. hemsleyanum samples. The trnL sequences from Sichuan had a unique GC content of 38.76%, while those from the other regions had a GC content of 38.85% (Table 7). Five stable variants, at positions 165, 166, 167, 168 and 1036 bp, were identified in the trnL sequence, defining three haplotypes among T. hemsleyanum from different regions (including our experimental data and data from GenBank). The trnL sequences from Sichuan Province had the unique haplotype A3, while those from Zhejiang Province displayed two haplotypes, A1 and A2 (Table 7). Notably, samples from Jiangxi, Zhejiang, Fujian, Guangxi and Guangdong all carried the trnL haplotype A1, suggesting that A1 is the most widely distributed haplotype in China, possibly owing to strong environmental adaptability. The trnN sequences from different origins had an identical GC content of 43.50% but differed at four positions (164, 165, 166 and 167 bp), defining two haplotypes (Supplementary Table 6). The trnN sequences from Sichuan had the unique haplotype B2, while those from all other regions had haplotype B1 (Table 7). These results demonstrate intraspecific variation among T. hemsleyanum from different regions and support the feasibility and necessity of a geographical origin identification strategy based on chloroplast molecular markers.
Table 7
Sequence analysis of T. hemsleyanum samples and other Vitaceae species based on two DNA barcodes
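Species | Sample number | Sample source | trnL-CAA length | trnL-CAA GC content | trnL-CAA GenBank accession No. | trnL-CAA haplotype | trnN-GUU length | trnN-GUU GC content | trnN-GUU GenBank accession No. | trnN-GUU haplotype |
T. hemsleyanum | JT-01 | Lushan District, Jiujiang, Jiangxi | 1143 bp | 38.85% | MZ995437 | A1 | 469 bp | 43.50% | MZ995468 | B1 |
T. hemsleyanum | JT-02 | Lushan District, Jiujiang, Jiangxi | 1143 bp | 38.85% | MZ995438 | A1 | 469 bp | 43.50% | MZ995469 | B1 |
T. hemsleyanum | JT-03 | Lushan District, Jiujiang, Jiangxi | 1143 bp | 38.85% | MZ995439 | A1 | 469 bp | 43.50% | MZ995470 | B1 |
T. hemsleyanum | ZT-01 | Linhai, Taizhou, Zhejiang | 1143 bp | 38.85% | OK058531 | A1 | 469 bp | 43.50% | MZ995452 | B1 |
T. hemsleyanum | ZT-02 | Linhai, Taizhou, Zhejiang | 1143 bp | 38.85% | OK058532 | A1 | 469 bp | 43.50% | MZ995453 | B1 |
T. hemsleyanum | ZT-03 | Linhai, Taizhou, Zhejiang | 1143 bp | 38.85% | MZ995433 | A1 | 469 bp | 43.50% | MZ995454 | B1 |
T. hemsleyanum | ZT-04 | Suichang County, Lishui, Zhejiang | 1143 bp | 38.85% | MZ995434 | A2 | 469 bp | 43.50% | MZ995455 | B1 |
T. hemsleyanum | ZT-05 | Suichang County, Lishui, Zhejiang | 1143 bp | 38.85% | MZ995435 | A2 | 469 bp | 43.50% | MZ995456 | B1 |
T. hemsleyanum | ZT-06 | Suichang County, Lishui, Zhejiang | 1143 bp | 38.85% | MZ995436 | A2 | 469 bp | 43.50% | MZ995457 | B1 |
T. hemsleyanum | MT-01 | Longyan, Fujian | 1143 bp | 38.85% | MZ995440 | A1 | 469 bp | 43.50% | MZ995458 | B1 |
T. hemsleyanum | MT-02 | Longyan, Fujian | 1143 bp | 38.85% | MZ995441 | A1 | 469 bp | 43.50% | MZ995459 | B1 |
T. hemsleyanum | MT-03 | Longyan, Fujian | 1143 bp | 38.85% | MZ995442 | A1 | 469 bp | 43.50% | MZ995460 | B1 |
T. hemsleyanum | MT-04 | Longyan, Fujian | 1143 bp | 38.85% | MZ995443 | A1 | 469 bp | 43.50% | MZ995461 | B1 |
T. hemsleyanum | YT-01 | Baise, Guangxi | 1143 bp | 38.85% | MZ995444 | A1 | 469 bp | 43.50% | MZ995462 | B1 |
T. hemsleyanum | YT-02 | Baise, Guangxi | 1143 bp | 38.85% | MZ995445 | A1 | 469 bp | 43.50% | MZ995463 | B1 |
T. hemsleyanum | YT-03 | Baise, Guangxi | 1143 bp | 38.85% | MZ995446 | A1 | 469 bp | 43.50% | MZ995464 | B1 |
T. hemsleyanum | CT-01 | Wanyuan, Dazhou, Sichuan | 1143 bp | 38.76% | MZ995447 | A3 | 469 bp | 43.50% | MZ995465 | B2 |
T. hemsleyanum | CT-02 | Wanyuan, Dazhou, Sichuan | 1143 bp | 38.76% | MZ995448 | A3 | 469 bp | 43.50% | MZ995466 | B2 |
T. hemsleyanum | CT-03 | Wanyuan, Dazhou, Sichuan | 1143 bp | 38.76% | MZ995449 | A3 | 469 bp | 43.50% | MZ995467 | B2 |
T. hemsleyanum | DT-01 | Shaoguan, Guangdong | 1143 bp | 38.85% | MZ995450 | A1 | 469 bp | 43.50% | MZ995471 | B1 |
T. hemsleyanum | DT-02 | Shaoguan, Guangdong | 1143 bp | 38.85% | MZ995451 | A1 | 469 bp | 43.50% | MZ995472 | B1 |
T. hemsleyanum | NCBI | GenBank | 1143 bp | 38.85% | MT827073 | A1 | 469 bp | 43.50% | MT827073 | B1 |
T. planicaule | TP-01 | GenBank | 1143 bp | 38.76% | MN401672 | A5 | 469 bp | 43.28% | MN401672 | B3 |
A. japonica | AJ-01 | GenBank | 1136 bp | 39.00% | NC_042235 | A6 | 469 bp | 43.28% | NC_042235 | B4 |
V. vinifera | VV-01 | GenBank | 1136 bp | 39.00% | NC_007957 | A7 | 469 bp | 43.07% | NC_007957 | B4 |
M. alternifolia | MA-01 | GenBank | 1090 bp | 38.53% | MN310606 | A8 | 491 bp | 41.75% | MN310606 | B5 |
M. cajuputi | MC-01 | GenBank | 1077 bp | 38.53% | NC_052729 | A9 | 491 bp | 41.55% | NC_052729 | B6 |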
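The GC contents and haplotype labels in Table 7 follow from simple operations on the aligned barcode sequences. A minimal sketch, assuming the sequences are already aligned to equal length, is given below. The function names and the haplotype prefix are hypothetical and used only for illustration; the labels are assigned in order of first appearance and so need not match the A1 to A9 and B1 to B6 labels used in this study.

```python
def gc_content(seq):
    """Percentage of G and C bases, ignoring alignment gaps ('-')."""
    bases = seq.upper().replace("-", "")
    if not bases:
        raise ValueError("sequence contains no bases")
    return 100.0 * (bases.count("G") + bases.count("C")) / len(bases)


def variable_sites(aligned):
    """1-based alignment positions at which the sequences are not all identical."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]


def assign_haplotypes(aligned, prefix="H"):
    """Group samples by their bases at the variable sites.

    Returns (variable positions, {sample name: haplotype label}).
    Labels are numbered in order of first appearance (hypothetical scheme).
    """
    sites = variable_sites(aligned)
    label_of_pattern = {}
    labels = {}
    for name, seq in aligned.items():
        pattern = "".join(seq[pos - 1] for pos in sites)
        if pattern not in label_of_pattern:
            label_of_pattern[pattern] = f"{prefix}{len(label_of_pattern) + 1}"
        labels[name] = label_of_pattern[pattern]
    return sites, labels
```

Applied to the trnL alignment, `variable_sites` would be expected to return the variable positions reported above, and samples sharing the same bases at those positions would receive the same label.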
We also examined the intraspecific and interspecific genetic distances based on the trnL and trnN sequences of T. hemsleyanum medicinal materials from different regions (Figs. 7B, C, D). The K2P distances of both the trnL and trnN sequences among the 21 T. hemsleyanum samples ranged from 0.000 to 0.004, suggesting a barcoding gap among plants from different regions. For instance, the divergence value of trnL was highest (0.004) between the Jiangxi and Zhejiang regions and lowest (0.001) between the Jiangxi and Sichuan regions. The intraspecific genetic distance of the trnN sequence between the Jiangxi and Sichuan regions had a K2P value of 0.004, suggesting a barcoding gap. However, neither of the two cp molecular markers generated a barcoding gap among samples from Jiangxi, Fujian, Guangxi and Guangdong provinces, indicating that trnL and trnN cannot discriminate among these geographical origins. Moreover, the combination of the trnL and trnN sequences revealed a lower intraspecific distance among the different geographical origins of T. hemsleyanum than either single molecular marker (Fig. 7D). The intraspecific genetic distances based on the trnL + trnN sequences between Jiangxi and Zhejiang, between Jiangxi and Sichuan, and among Jiangxi, Fujian, Guangxi, Guangdong and Zhejiang were 0.002, 0.003 and 0, respectively (Fig. 7D). The interspecific distance among the Vitaceae species was greater than the intraspecific distance among the T. hemsleyanum samples, suggesting that the developed DNA barcodes could be successfully applied to distinguish T. hemsleyanum from other Vitaceae plants (Fig. 7D). The NJ tree analysis of the trnL and trnN barcodes revealed a clear distinction among the different geographical origins of T. hemsleyanum plants (Figs. 7E, F). The trnL-based NJ tree generated three groups with different geographical origins, while the trnN-based NJ tree provided only two clades of T. hemsleyanum plants.
Clade I of the trnL-based NJ tree consisted of all T. hemsleyanum samples from Jiangxi, Fujian, Guangxi and Guangdong, three samples from Zhejiang (ZT-01, ZT-02 and ZT-03) and one sample from GenBank (NCBI), with a bootstrap support value of 60. Clade II included the T. hemsleyanum samples from Sichuan and the T. planicaule sample. The other three T. hemsleyanum samples from Zhejiang (ZT-04, ZT-05 and ZT-06) formed Clade III (Fig. 7E). Clade I of the trnN-based NJ tree included all samples from Sichuan, while Clade II consisted of the samples from the other five regions and the GenBank sample (NCBI) (bootstrap score, 93). Although the trnL barcode was more powerful than the trnN barcode in discriminating the geographical origins of T. hemsleyanum, it failed to distinguish other Vitaceae species from T. hemsleyanum (Fig. 7E). Finally, we constructed an NJ tree based on the combined trnL + trnN sequence to determine the identification accuracy for T. hemsleyanum plants from different regions. Interestingly, this NJ tree divided the T. hemsleyanum plants into three groups. Three samples from Zhejiang Province (ZT-04 to ZT-06) and the Sichuan samples (CT-01 to CT-03) each formed a separate group, while the other samples from Zhejiang Province (ZT-01 to ZT-03) and the samples from the other regions formed the third group (Fig. 8). Furthermore, the position of each Vitaceae species in the NJ tree based on the combined barcode was similar to that in the phylogenetic trees based on the 70 protein-coding gene dataset (Figs. 5, 8). The NJ tree clearly showed that the genus Tetrastigma (the T. planicaule and T. hemsleyanum samples) formed the main branch (bootstrap score, 54), which was clearly separated from the representative species of the tribes Viteae and Ampelopsideae. This indicates the species identification potential of the combined molecular markers.
These results demonstrate that the combined barcode strategy of trnL + trnN derived from comparative chloroplast genomes is a potential molecular tool for the geographical origin discrimination of T. hemsleyanum in China.
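For readers who wish to reproduce the distance comparisons, the sketch below implements the standard Kimura two-parameter (K2P) formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It also includes a simple barcoding-gap check (largest intraspecific distance smaller than smallest interspecific distance). The function names are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption of this sketch, not a statement about the software settings used in this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def has_barcoding_gap(intraspecific, interspecific):
    """True if every intraspecific distance is below every interspecific one."""
    return max(intraspecific) < min(interspecific)
```

The same pairwise distances are the usual input for neighbor-joining tree construction, so a distance matrix built with `k2p_distance` could be passed to any NJ implementation.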