3.1 Plant characteristics of L. japonicus, L. cardiaca, and L. sibiricus
The species of Leonurus genus is an annual or biennial herb with dense roots and a shape stem erect is a blunt four prism. The lower leaf of the stem is oval and the middle leaf is diamond with mixed and stalked leaves. The leaves are fresh, tender, and green with juice. It has weak gas and a bitter taste. They are flowering from June to September and the nutlet fruits have been produced in the period from July to October. Because they have commonly been used in the treatment of gynecology, they have the name “benefit mother”.
The three Leonurus species have an obvious distinction in the flower color when they grow up to the ripe flowering stage. Their flowers have diverse colors as follows: L. japonicus (peach red), L. cardiaca (pink), and L. sibiricus (purple). In addition, the characteristics of tissues including stem, leaf, and flower has arguable feature. In the species of L. japonicus, the stem has inverted to the rough hair, especially dense in the section and edge and the base is sometimes close to hairless, with more branches (Fig. 1a). The leaf of the lower stem is ovate having the base wide wedge with 3 split palms. The corolla is pink to lavender, pubescent on the outside of the protruding calyx tube (Fig. 1a). While for the L. cardiaca species, the stem is blunt with four prisms, the smooth leaves are heart-shaped or wide oval, the apex is sharp-pointed, and usually 5–7 shallow cracked palms. The cover bud is axillary and the corolla is shaped like a lip (Fig. 1b). Meanwhile, in the species of L. sibiricus, the stem is blunt with four prisms with microgroove. The leaf of the middle stem is ovate with a wide wedge at the base and 3 deep splits as a palm. The flower is gradually spherical to the top and can be gathered into a long spike (Fig. 1c) (Pitschmann et al., 2017).
3.2 Assembly validation and collinearity analysis about chloroplast genomes of the four Leonurus species
To confirm the correctness of the assembly, we compared the four Leonurus chloroplast genomes with that of L. japonicus (MG673937.1) as the reference genome including the comparison. All the dot plots showed a diagonal line with two perpendicular lines, representing the IRs (Figure S1). The three other Leonurus species exhibited collinearity with that of L. japonicus (MG673937.1) and no rearrangement region could be found (Fig. 2). The result showed the high quality of assembly, sequence identity, and species relative, which have high uniformity regarding the nucleotide and regions (Brenner, 1966).
3.3 Differential gene contents among the chloroplast genomes in the four Leonurus species
After analysis of cp genomes in the four Leonurus species, in other words, L. japonicus, L. cardiaca, and L. sibiricus, they are all circular DNA molecules and conserved tetrad structure with the size is 151610bp, 151610bp, 151236bp, and 151689bp, respectively. Their genomes are commonly constructed by a LSC region, an SSC region, and a pair of IR regions with the lengths of 82827bp, 82827bp, 82294bp, and 82820bp; 17515bp, 17515bp, 17654bp, and 17619bp; 25634bp, 25634bp, 25644bp, and 25625bp; 25634bp, 25634bp, 25644bp, and 25625bp, respectively (Table S1 and Fig. 3). The length of CDS, tRNA rRNA, and non-coding regions in the four Leonurus species is 80391bp, 80391bp, 74508bp, and 80361 bp; 2834bp, 2834bp, 2833bp, and 2834bp; 9400bp, 9400bp, 9052bp, and 9400bp; 58985bp, 58985bp, 64843bp, and 59094bp. The overall GC content of them is 38.41%, 38.41%, 38.39%, and 38.41%, and that of the IR region (43.36%, 43.36%, 43.38%, and 43.37%) is higher than those of the LSC region (36.65%, 36.65%, 36.63%, and 36.65%) and the SSC region (32.23%, 32.23%, 32.16%, and 32.22%) (Table S1).
Moreover, the number of genes and proteins slightly changed from 132 to 133, from 87 to 88 in all 4 Leonurus species, respectively. While the number of tRNA and rRNA genes was commonly 37 and 8 (Table S1). Further comparison of cp genomes in the four Leonurus species, eight protein-coding genes (rps12, rps7, rpl2, rpl23, ndhB, ycf1, ycf2, and ycf15), seven tRNA-coding genes (trnN-GUU, trnR-ACG, trnA-UGC, trnI-GAU, trnV-GAC (×2), trnL-CAA, and trnI-CAU (×2)), and four types of rRNA-coding genes (rrn16S, rrn23S, rrn5S, and rrn4.5S) were located in the IR region (Table 1). In addition, twelve PCGs (rps16, rps12 (×2), atpF, rpoC1, petB, petD, rpl2 (×2), ndhB (×2), and ndhA), and eight tRNA-coding genes (trnK-UUU, trnG-UCC, trnL-UAA, trnV-UAC, trnI-GAU, trnA-UGC, trnA-UGC, and trnI-GAU) contain one intron by each, whereas each of the other two PCGs (ycf3 and clpP) contain two introns (Table S2, Figure S2). Differently, the rpl16 gene with one intron does not exist in the species of L. cardiaca (NC_058592.1, Figure S2), while the trans-spliced rps12 genes with two introns and three exons can be found in the species of L. sibiricus (OP327561.1, Fig. 4).
Table 1
Comparison of the gene contents in the chloroplast genomes of L. japonicus, L. cardiaca, and L. sibiricus.
Species/Items
|
L. japonicus
|
L. cardiaca
|
L. sibiricus
|
Gene Function
|
Gene Type
|
Gene Name
|
tRNA
|
tRNA genes
|
37 trn genes (include one intron in 6 genes)
|
Photosynthesis
|
Large subunit of ribosome
|
rpl14, rpl16, rpl2a, rpl2b, rpl20, rpl22, rpl23a, rpl23b, rpl32, rpl33, rpl36
|
DNA dependent RNA polymerase
|
rpoA, rpoB, rpoC1, rpoC2
|
Small subunit of ribosome
|
rps11, rps12a, rps12b, rps14, rps15, rps16, rps18, rps19, rps2, rps3, rps4, rps7a, rps7b, rps8
|
Gene expression
|
Subunits of ATP synthase
|
atpA, atpB, atpE, atpF, atpH, atpI
|
Subunits of photosystem II
|
psbA, psbB, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, ycf3
|
Subunits of NADH-dehydrogenase
|
ndhA, ndhBa, ndhBb, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
|
Subunits of cytochrome b/f complex
|
petA, petB, petD, petG, petL, petN
|
Subunits of photosystem I
|
psaA, psaB, psaC, psaI, psaJ
|
Subunit of rubisco
|
rbcL
|
Other genes
|
Subunit of acetyl-CoA-carboxylase
|
accD
|
C-type cytochrome synthase
|
ccsA
|
Protease
|
clpP
|
Translation initiation factor
|
infA
|
Mature enzyme
|
matK
|
Envelope membrane protein
|
cemA
|
Unknown functions
|
Conservative open reading frame
|
ycf1a, ycf1b, ycf15a, ycf15b, ycf2a, ycf2b, ycf4
|
Note: a: IRa region; b: IRb region.
3.4 Codon usage analysis
The codons were thought to result from a combination of natural selection, mutation, and genetic drift. Codon usage bias embodies that each gene of the species has its specific preferred amino acid codon (Subramanian et al., 2022). In this study, the number of the PCGs was 21328, 20864, and 20896 with 64 types in the three cp genomes of L. japonicus, L. cardiaca, and L. sibiricus, respectively (Table S3). Among them, isoleucine was the most abundant amino acid encoded by the codons “AUU”, ranging from 880 to 898 counts with a percentage of 4.21%, compared to the least frequency amino acid of cysteine encoded by the codons “UGC”, whereas the stop codon encoded by “UAG” was the least abundant below 12 counts (0.05%). Total of these amino acids and codons, the RSCU value varied from 0.3336 in the serine (AGC) to 1.9302 in the Leucine (UUA), simultaneously, 31 codons having the RSCU > 1, of which 29 ended with A/U for the codon bias. In addition, the amino acids (codons) of Met (AUG) and Trp (UGG) do have not any codon usage bias because their RSCU values were similarly equivalent to one (Table S3 and Fig. 5). These results were consistent with most published species, for instance, the Glycyrrhiza species and Agastache rugosa (Dai et al., 2023; Liang et al., 2023).
3.5 GO and KOG analysis
Through the protein annotations from the database of GO and KOG function in the three Leonurus species, the results showed that diverse number functions of common proteins within the three main functions (cellular component, molecular function, and biological process, Table S4 and Figure S3) (Sangrador-Vegas et al., 2016; Mudado et al., 2006), that is to say, of which they existed at cell, membrane, and organelle including catalytic activity (37 proteins), structural molecule activity (25 proteins), transporter activity (6 proteins), binding (57 proteins), metabolic process (78 proteins), cellular process (73 proteins), response to stimulus (1, psbA), localization (17 proteins), biological regulation (2 proteins, psbH and psbZ), and cellular component organization or biogenesis (5 proteins, ccsA, rpl23 (×2), petN, and rps11). Meanwhile, most of them had analogous functions with the homologous consensus sequence of genes and proteins, which mainly focused on the fruitful aspects, such as translation, ribosomal structure, and biogenesis; energy production and conversion; transcription; posttranslational modification, protein turnover, and chaperones; intracellular trafficking, secretion, and vesicular transport; transport and metabolism of carbohydrate, lipid, and inorganic ion (Table S5 and Figure S4).
3.6 SSR Polymorphism and repeat analysis
In the chloroplast genome of the three Leonurus species, there were four base types of SSR sequences (the base of A, T, AT, and TA) and three-component types (mononucleotides: p1, dinucleotide: p2, and compound: c, Table S6 and Fig. 6) (Iranawati et al., 2012). These chloroplast genomes included mononucleotides in the L. japonicus (the base of 8A or 13 T), the L. cardiaca (the base of 10A or 15 T), and the L. sibiricus (the base of 7A or 14T); while the dinucleotide (that is compound SSR) were found in the L. japonicus (the bases of 2 AT or 1 TA) and the L. cardiaca (the bases of 2 AT or 1 TA), added up to the number of SSR is 24, 25, and 24, respectively. The 20 SSRs were located in the LSC region of the three Leonurus species and 1 SSR was located in the former two species (L. japonicus and L. cardiaca, Table S6 and S7), whereas 3 SSRs in the SSC region of L. sibiricus (Table S7). The length of SSR sequences varied from 10 bp to 92 bp (Table S7-S10 and Fig. 6b). The four repeated sequences types (forward, reverse, complement, and palindromic) were observed in the species of L.japonicus (19 F, 5 R, 24 P, and 1 C, Table S7) and L.cardiaca (21 F, 3 R, 24 P, and 1 C, Table S7), nevertheless, the type of complement sequence was not detected in the species of L. sibiricus (20 F, 5 R, and 24 P, Table S7). For more detail, the number of tandem repeats found in the chloroplast genomes of L. japonicus, L. cardiaca, and L. sibiricus is 25, 26, and 24, respectively, which confirmed the total length of over 20 bp and the similarity ≥ 70% between repeat units. The length of tandem repeats sequences varied from 29.4 bp to 86.1 bp (Tables S11-S13). The number of scattered repeats units in these three Leonurus species is 39, 40, and 39, respectively, and obtained using an e-value of fewer than 6.15 E-04 as the threshold. The length of scattered repeat sequences varied from 30 bp to 46 bp (Tables S14-S16).
3.7 Analysis of IR boundaries and comparative genomes
The illustration boundaries of the LSC, SSC, and IRs in the four cp genomes of Leonurus species showed the gene contents were uniform except for gene diversity of the sequence and length (Giurazza et al., 2016). Similarly, nine genes including across-the-border genes (rps19, ycf1 (×2), and ndhF) and genes included in the regions (rpl22, rpl2(×2), trnH, and psbA) commonly existed. The three genes of trnH, rpl22, and psbA were located in the LSC region. The two rpl2 genes were apart and located within the IRb and IRa regions. The complete rps19 genes were found at the border of the LSC/IRb junction. One of the two ycf1 genes was present at the boundary between IRb and SSC region, the other one was across the IRa and SSC region, the same as the ndhF gene (Figure S5). After comparing the complete chloroplast genomes in the three Leonurus species with referred L. japonicus (MG673937.1), the four hotspot divergent regions were discovered to discriminate them. The regions were trnC-GCA-petN (A, LSC region), petB-trnI-CAT (B, LSC and SSC region) and rpl2-trnI-CAT (C, IRb region), and trnL-TAG-ndhF (D, IRa region, Fig. 7).
3.8 Hypervariable region identification
Highly conserved regions and variable sequences in the chloroplast genomes (cpDNA) are suitable to construct DNA barcodes for effective application to interspecific discrimination and phylogenetic research (Powell et al., 1995). After calculation, 72 variable regions in the chloroplast genomes of 3 Leonurus species were found based on the intergenic and intron regions (NCRs) using the K2p model. The average value of the K2p in the 72 IGS regions ranged from 0 to 10.95 (Table S17 and Fig. 8). Among them, the five IGS regions (trnC-GCA-petN, atpH-atpI, psaC-ndhE, rps15-ycf1, and petG-trnW-CCA) showed evident variations at divergence sequences with a high K2p value. Several results showed agreement with that of the genome comparison (Fig. 7) in the main IGS regions of trnC-GCA-petN and rps15-ycf1. Therefore, these regions can be applied as potential variation regions to develop the distinction among the 3 Leonurus species.
3.9 Genus-specific DNA barcodes development
Given the results of hypervariable region identification (Coissac et al., 2016), we selected the nucleotide of the two IGS regions (atpH-atpI and rps15-ycf1) to design the specific primers (Table 2). Through amplifying these sequences, we develop the DNA barcodes to identify the 3 Leonurus species. The IGS regions (atpH-atpI, LSC region) was between the location of 13460 bp and 14389 bp, while IGS of rps15-ycf1 (SSC region) was located between 121920 bp and 129540 bp in these chloroplast genomes. We cloned the two regions and acquired the three sequences of atpH-atpI (M1, Table 2, ~ 300 bp) and rps15-ycf1 (M2, Table 2, ~ 300 bp) by each species using Sanger sequencing. Then, we comparatively analyzed the two molecular markers (MMs) among the three studied Leonurus species to determine the variations, including indels and single nucleotide polymorphisms (SNP) (Table S19, M1, and M2). The amplification products of the two IGS were checked and the strips were clearly shown on the agarose gel (Figure S6). From the peak map (up) and sequencing results (down) of the three studied Leonurus species with the pairs of primers related to the molecular markers M1 and M2 (Fig. 9), the marker M1 developed at the IGS atpH-atpI had six specific variables SNP sites (marked A, B, C, D, E, and F, Fig. 9a) and the marker M2 found at IGS rps15-ycf1 had ten SNP sites (marked G, H, I, J, K, L, M, N, and Q, Fig. 10b) and two Indel sites (marked O and P, Fig. 9b). In detail, in the species of L. japonicus, the SNP site based at the positions 197/209/213 in the nucleotide sequence of YMC_M1 were G/A/A, and in the nucleotide sequence of YMC_M2, the InDel site was missing at the base at position 265 and the base at position 266 was T. In the species of L. cardiaca, the bases of the SNP site at positions 199/212/215 in the nucleotide sequence of OYMC_M1 were C/A/G, while at positions 241/243/265/267 in the nucleotide sequence of OYMC_M2 were G/T/T/A. In the species of L. sibiricus, base positions 245/247/250/253/256/263/267 in the nucleotide sequence of XYYMC_M1 were the diverse base A/A/C/G/G/A, at position 265 in the nucleotide sequence of XYYMC_M2 was the InDel deletion site, and deletion of base T or A at the position 266 (Fig. 9). While using two labeled SNP and InDel sites for M1 and M2, the three Leonurus species were isolated and successfully discriminated against based on these SNP and indel loci by separately or unitedly using the two M1 and M2 molecular markers with these results.
Table 2
Primers for amplifying DNA barcodes to distinguish Leonurus species in the Lamiaceae family.
Number
|
Species
|
Conserved Sequences for Designing Forward Primers
|
Conserved Sequences for Designing Reverse Primers
|
IGS
|
M1
|
L. japonicus, L. cardiaca, L. sibiricus
|
CACACGTCGTAAAATTGAATCACAC
|
GCTTAGCGCAAAGCAATGTATGC
|
atpH-atpI
|
M2
|
CAGATATGGATTTTACCGATCAG
|
GCAAACAAATACACAGATCAC
|
rps15-ycf1
|
3.10 Comparison of ITS2 sequences in the seven Leonurus species
In comparison with the chloroplast genome, Internal Transcribed Spacer2 (ITS2) is the non-coding region of the ribosome components in eukaryotes organisms (But et al., 2023). After annotation and contrast of the ITS2 sequences among the seven Leonurus species including the three studied species, we found that the most of nucleotide differences occur after the 57th base from the beginning of the sequence in a total of 221 bases except the species of L. pseudomacranthus showing the significant differentia, agreed with the discovery from the group of Dr. Chao and Yang (Yang et al., 2011) (Fig. 10 and Table S19). Whereas, the Leonurus species need to be further distinguished by the full-length ITS sequence or other distinctive ways.
3.11 Selective pressure analysis
Encoded codons can undergo evolution under a specific type of coding selective pressure by the comparative genomics approaches (Sheikh-Assadi et al., 2022). We checked the selection pressure of 80 protein-coding genes in the chloroplast genomes of three Leonurus species and found three genes in the three Leonurus species influenced by the distinct environment. Using aBSREL model implemented in the HyPhy software, we observed that the three genes were determined under positive selection, which included one Protease gene (clpP, L. japonicus), one gene for the subunits of NADH-dehydrogenase (ndhB, L. cardiaca), and one gene for the small subunit of ribosome (rps12, L. sibiricus). The optimized branch length, Likelihood Ratio Test (LRT), and p-value varied between 0.0005–0.0086, 8.3714–21.9128, and 0-0.0164, respectively. The ω1 ratios of these genes were zero, indicating that they were under a strong purifying and weak positive selection. The branch-specific model showed that the above three genes were under positive selection in the three Leonurus clades along the phylogenetic tree including Clade (ω2 = 371) and Clade (ω2 = 10000). These genes in the three Leonurus species were probably impacted under extreme conditions, for instance, hyperthermia, high humidity, and high pressure (Table 3).
Table 3
Selection pressure analysis of the three Leonurus genus in the Lamiaceae family
Name
|
GenBank No.
|
B
|
Genes
|
LRT
|
Test p-value
|
Uncorrected p-value
|
ω distribution over sites
|
Leonurus japonicus
|
MG673937.1
|
0.0005
|
clpP
|
12.5857
|
0.0019
|
0.0006
|
ω1 = 0.00(100%)
ω2 = 10000(0.25%)
|
Leonurus cardiaca
|
NC_058592.1
|
0.0086
|
ndhB
|
8.3714
|
0.0164
|
0.0055
|
ω1 = 0.00(99%)
ω2 = 371(0.64%)
|
Leonurus sibiricus
|
OP327561.1
|
0.0006
|
rps12
|
21.9128
|
0.0000
|
0.0000
|
ω1 = 0.00(97%)
ω2 = 10000(3.2%)
|
3.12 Phylogenetic analysis
In the phylogenetic analysis of the three Leonurus species in the Lamiaceae family, 80 protein sequences were extracted using the PhyloSuite software from the 16 chloroplast genomes in the study (Table S1). Among them, 64 shared nucleotide sequences of the common genes were found present in the 16 species. Using Corydalis yanhusuo (Papaveraceae family) and Angelica sinensis (Apiaceae family) as the outgroups, the phylogenetic tree was generated by the methods of maximum likelihood (ML) based on the 64 genes in these chloroplast genomes. The phylogenetic trees showed that all of the 14 species from the Lamiaceae family were clustered into a big branch with eight obvious clades (Fig. 11a). Among them, six species including the three Leonurus species and other two species (Phlomoides rotata and Lagopsis supina) were clustered into one great clade; the other eight species of Phlomoides betonicoides and Eriophyton wallichii, Stachys byzantina, Colquhounia seguinii and Gomphostemma lucidum, Pogostemon septentrionalis, Holmskioldia sanguinea, and Gmelina hainanensis were gathered into one branch solely in the Lamiaceae branch. Moreover, to find the affinity relationship among the more Leonurus species, we created the cladogram based on the ITS2 DNA sequences of the 19 evolutionary species including seven Leonurus species found in the NCBI database (Fig. 11b) by using the two same species as the outgroup. The results showed that the similar relationship of the three studied Leonurus apart were clustered into three different branches, which included the three embranchments of L. japonicus and L. chaituroides; L. sibiricus, and L. pseudomacranthus; L. cardiaca, L. glaucescens, and L. turkestanicus. Furthermore, the species of Eriophyton wallichii, Phlomoides rotata, Phlomoides betonicoides, and Stachys byzantina had a certain close relationship with the above three Leonurus branches. The specificity of L. sibiricus was clustered into the new sole clade in the light of cp genomes and ITS2 DNA sequences. Without a doubt, the outgroups of Corydalis yanhusuo (Papaveraceae family) and Angelica sinensis were gathered into a single branch by each and far more distantly related to the Lamiaceae species. The ML bootstrap showed strong support with bootstrap above 95% for all the nodes according to the chloroplast genomes (Fig. 11a). Thus the evolutionary relationships are plausible to elucidate the close relationship of the Leonurus species in the Lamiaceae family from the ITS2 regions in the nuclear and chloroplast genomes.