3.1 Identification, concentration and content analysis of whole genomic DNA
A clear band of genomic DNA size was observed on a 1% agarose gel. The OD260/OD280 ratio measured with a microspectrophotometer was 1.8–1.9, indicating that the DNA sample was pure enough for the PCR experiment and library construction. Clean chloroplast reads were obtained by filtering the fastq data, and the reads were then compared with the chloroplast reference.
3.2 Features of C. spicatus chloroplast genome
The chloroplast genome of C. spicatus is 152155 bp long and comprises a large single-copy (LSC) region of 83098 bp, a small single-copy (SSC) region of 17665 bp, and a pair of inverted repeat regions (IRa and IRb) of 25696 bp each (Fig. 1). The GC content of the whole chloroplast genome is 37.86%, and that of the LSC, SSC and IR regions is 35.90%, 31.75% and 43.12%, respectively (Table 1). The GC content of the IR regions is higher than that of the LSC and SSC regions.
Table 1
Base composition in the chloroplast genome of Clerodendranthus spicatus
Region | Total (bp) | A (bp) | T (bp) | C (bp) | G (bp) | A + T (bp) | C + G (bp) | GC content (%)
LSC | 83098 | 26002 | 27263 | 15257 | 14575 | 53265 | 29832 | 35.90
SSC | 17665 | 6034 | 6021 | 2677 | 2932 | 12055 | 5609 | 31.75
IRb | 25696 | 7324 | 7292 | 5337 | 5744 | 14616 | 11081 | 43.12
IRa | 25696 | 7292 | 7324 | 5744 | 5337 | 14616 | 11081 | 43.12
Total | 152155 | 46652 | 47900 | 29015 | 28588 | 94552 | 57603 | 37.86
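The GC percentages in Table 1 follow directly from the base counts. The short sketch below recomputes them as (G + C) / (A + T + C + G) for each region and for the whole genome; the dictionary and function names are illustrative and are not taken from the study's pipeline.

```python
# Base counts (bp) per region, copied from Table 1.
BASE_COUNTS = {
    "LSC": {"A": 26002, "T": 27263, "C": 15257, "G": 14575},
    "SSC": {"A": 6034, "T": 6021, "C": 2677, "G": 2932},
    "IRb": {"A": 7324, "T": 7292, "C": 5337, "G": 5744},
    "IRa": {"A": 7292, "T": 7324, "C": 5744, "G": 5337},
}


def gc_content(counts):
    """Return the GC content of one region as a percentage."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def whole_genome(regions):
    """Sum the per-region base counts into whole-genome counts."""
    combined = {base: 0 for base in "ATCG"}
    for counts in regions.values():
        for base, n in counts.items():
            combined[base] += n
    return combined


genome = whole_genome(BASE_COUNTS)
for name, counts in {**BASE_COUNTS, "Total": genome}.items():
    print(f"{name}: {sum(counts.values())} bp, GC = {gc_content(counts):.2f}%")
```

Because IRa and IRb are reverse complements of each other, their A/T and C/G counts are swapped while the GC content is identical.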
3.3 Gene Content
A total of 131 genes were successfully annotated in the circular genome of C. spicatus, including 87 protein-coding genes, 36 tRNA genes, and 8 rRNA genes (Table 2). Fifteen protein-coding genes (rps16, rps7, rpl2, rpl23, ndhA, ndhB, ndhD, ndhE, ndhG, ndhH, ndhI, psaC, ycf1, ycf15 and ycf2), 7 tRNA genes (trnK-UUU, trnT-CGU, trnL-UAA, trnE-UUC (×2), trnA-UGC (×2)), and 8 rRNA genes (rrn16S (×2), rrn23S (×2), rrn4.5S (×2), rrn5S (×2)) are located in the IR regions. Nineteen of the annotated genes are cis-splicing genes that contain one or two introns: ten CDS (rps16 (×2), atpF, rpoC1, petD, rpl2 (×2), ndhB (×2) and ndhA) and 7 tRNA genes contain one intron, whereas two protein-coding genes (ycf3 and clpP) contain two introns and three exons. Furthermore, rps12 is a trans-splicing gene that also contains three exons (Fig. 1). In the gene structure diagrams, the white areas represent introns and the black areas represent exons (Table 2 and Fig. 2, 3). The structure of the trans-splicing gene in the plastome of C. spicatus is shown in Fig. 4: the white area is exon 2 in IRa, the black area is the other copy of exon 2 in IRb, and the grey area is exon 1. The arrows show the sense direction of the forward and reverse genes.
Table 2
Gene contents of the chloroplast genome of Clerodendranthus spicatus
Category of genes | Group of genes | Names of genes
rRNA | rRNA genes | rrn16S (×2), rrn23S (×2), rrn5S (×2), rrn4.5S (×2)
tRNA | tRNA genes | 36 unique tRNA genes (6 contain 1 intron)
Self-replication | Small subunit of ribosome | rps11, rps12 (×2), rps14, rps15, rps16, rps18, rps19, rps2, rps3, rps4, rps7 (×2), rps8
 | Large subunit of ribosome | rpl14, rpl16, rpl2 (×2), rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36
 | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1, rpoC2
Photosynthesis | Subunits of NADH-dehydrogenase | ndhA, ndhB (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
 | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ
 | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, ycf3
 | Subunits of cytochrome b/f complex | petA, petB, petD, petG, petL, petN
 | Subunits of ATP synthase | atpA, atpB, atpE, atpF, atpH, atpI
 | Large subunit of rubisco | rbcL
Other genes | Maturase | matK
 | Protease | clpP
 | Envelope membrane protein | cemA
 | Subunit of acetyl-CoA carboxylase | accD
 | c-type cytochrome synthesis gene | ccsA
 | Genes of unknown function | ycf1, ycf15 (×2), ycf2 (×2), ycf4
“(×2)” indicates that the gene is located in the IRs and thus has two copies.
Table 3
Lengths of introns and exons in genes of the chloroplast genome of Clerodendranthus spicatus
Gene | Location | Start | End | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp)
trnK-UUU | LSC | 1672 | 4250 | 37 | 2506 | 36 | |
rps16 | LSC | 4777 | 5910 | 40 | 867 | 227 | |
trnT-CGU | LSC | 8970 | 9733 | 35 | 686 | 43 | |
atpF | LSC | 11725 | 12965 | 145 | 686 | 410 | |
rpoC1 | LSC | 20741 | 23547 | 432 | 764 | 1611 | |
ycf3 | LSC | 41971 | 43919 | 126 | 710 | 228 | 732 | 153
trnL-UAA | LSC | 46897 | 47453 | 35 | 472 | 50 | |
clpP | LSC | 69239 | 71170 | 71 | 701 | 294 | 640 | 226
petD | LSC | 75665 | 76848 | 8 | 701 | 475 | |
rpl16 | LSC | 80283 | 81511 | 9 | 821 | 399 | |
rpl2 | IRb | 83203 | 84681 | 391 | 654 | 434 | |
ndhB | IRb | 93392 | 95599 | 775 | 675 | 758 | |
rps12 | IRb | 96927 | 97169 | 242 | - | 25 | 357 | 113
trnE-UUC | IRb | 100907 | 101924 | 32 | 946 | 40 | |
trnA-UGC | IRb | 101989 | 102866 | 37 | 805 | 36 | |
ndhA | SSC | 115271 | 117373 | 553 | 1011 | 539 | |
trnA-UGC | IRa | 132388 | 133265 | 37 | 805 | 36 | |
trnE-UUC | IRa | 133330 | 134347 | 32 | 946 | 40 | |
rps12 | IRa | 138085 | 138327 | 113 | - | 242 | 357 | 25
ndhB | IRa | 139655 | 141862 | 775 | 675 | 758 | |
rpl2 | IRa | 150573 | 152051 | 391 | 654 | 434 | |
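One quick sanity check on Table 3 is that, for a cis-spliced gene, the exon and intron lengths should add up to the annotated span (End − Start + 1). The sketch below applies that check to a subset of rows; it assumes the coordinates are 1-based and inclusive, and it leaves out rps12 because the exons of a trans-spliced gene are not contiguous. The names are illustrative only.

```python
# (gene, start, end, segment lengths in bp) copied from Table 3.
# Segments alternate exon, intron, exon, ... in the order listed there.
CIS_SPLICED = [
    ("trnK-UUU", 1672, 4250, [37, 2506, 36]),
    ("rps16", 4777, 5910, [40, 867, 227]),
    ("atpF", 11725, 12965, [145, 686, 410]),
    ("ycf3", 41971, 43919, [126, 710, 228, 732, 153]),
    ("clpP", 69239, 71170, [71, 701, 294, 640, 226]),
    ("ndhA", 115271, 117373, [553, 1011, 539]),
]


def span_length(start, end):
    """Length of an inclusive, 1-based coordinate range (assumed convention)."""
    return end - start + 1


def inconsistent_rows(rows):
    """Return genes whose exon + intron lengths do not fill the annotated span."""
    return [
        gene
        for gene, start, end, segments in rows
        if sum(segments) != span_length(start, end)
    ]


print(inconsistent_rows(CIS_SPLICED))
```

An empty list means every listed gene is internally consistent; any gene name that appears would point to a transcription or annotation error in that row.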
3.4 Characteristics of rRNA and tRNA genes
There are 8 rRNA genes in the chloroplast genome of C. spicatus, namely rrn16S (×2), rrn23S (×2), rrn4.5S (×2) and rrn5S (×2), with the two copies of each pair in opposite orientations. The lengths of rrn16S, rrn23S, rrn4.5S and rrn5S are 1491 bp, 2811 bp, 265 bp and 131 bp, respectively, and their GC contents are 56.54%, 54.93%, 45.28% and 50.38%, respectively (Fig. 5), giving an average GC content of 51.78%. Scanning for tRNAs showed that 18 amino acids can be transported: Arg, Asn, Asp, Cys, Gly, Gln, Glu, His, Ile, Leu, Met, Phe, Pro, Ser, Trp, Thr, Tyr and Val. The corresponding anticodons are TCT and ACG; GTT; GTC; GCA; GCC; TTG; TTC; GTG; GAT; CAA and TAG; CAT; GAA; TGG; TGA, GGA and GCT; CCA; GGT and TGT; GTA; and GAC, respectively (Table 4).
Table 4
tRNAs scanned from the chloroplast genome of Clerodendranthus spicatus
tRNA No. | tRNA begin | tRNA end | tRNA type | Anticodon | Intron begin | Intron end | Score
1 | 9924 | 9995 | Arg | TCT | 0 | 0 | 67.2
2 | 27973 | 28043 | Cys | GCA | 0 | 0 | 60.7
3 | 30965 | 31036 | Thr | GGT | 0 | 0 | 67.4
4 | 35835 | 35905 | Gly | GCC | 0 | 0 | 61.4
5 | 44762 | 44848 | Ser | GGA | 0 | 0 | 72.2
6 | 47736 | 47808 | Phe | GAA | 0 | 0 | 71.9
7 | 51851 | 51923 | Met | CAT | 0 | 0 | 59.5
8 | 98827 | 98898 | Val | GAC | 0 | 0 | 58.9
9 | 100907 | 100994 | Ile | GAT | 100943 | 100958 | 16.4
10 | 106662 | 106735 | Arg | ACG | 0 | 0 | 57.7
11 | 127874 | 127945 | Asn | GTT | 0 | 0 | 72.5
12 | 142432 | 142512 | Leu | CAA | 0 | 0 | 62.3
13 | 150034 | 150107 | Met | CAT | 0 | 0 | 70.3
14 | 136427 | 136356 | Val | GAC | 0 | 0 | 58.9
15 | 134347 | 134260 | Ile | GAT | 134311 | 134296 | 16.4
16 | 128592 | 128519 | Arg | ACG | 0 | 0 | 57.7
17 | 122867 | 122788 | Leu | TAG | 0 | 0 | 58.5
18 | 107380 | 107309 | Asn | GTT | 0 | 0 | 72.5
19 | 92822 | 92742 | Leu | CAA | 0 | 0 | 62.3
20 | 85220 | 85147 | Met | CAT | 0 | 0 | 70.3
21 | 66129 | 66056 | Pro | TGG | 0 | 0 | 65.1
22 | 65887 | 65814 | Trp | CCA | 0 | 0 | 71.9
23 | 46208 | 46136 | Thr | TGT | 0 | 0 | 69.3
24 | 36151 | 36078 | Met | CAT | 0 | 0 | 63.2
25 | 35068 | 34976 | Ser | TGA | 0 | 0 | 78.4
26 | 30363 | 30291 | Glu | TTC | 0 | 0 | 56.2
27 | 30207 | 30124 | Tyr | GTA | 0 | 0 | 62.3
28 | 30006 | 29933 | Asp | GTC | 0 | 0 | 67.9
29 | 8231 | 8144 | Ser | GCT | 0 | 0 | 74.5
30 | 6922 | 6851 | Gln | TTG | 0 | 0 | 58.9
31 | 85 | 12 | His | GTG | 0 | 0 | 60.7
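The anticodons in Table 4 are written as DNA (with T), 5'→3'. As a minimal sketch, the codon that pairs with an anticodon by strict Watson-Crick pairing is its reverse complement, and translating that codon with the standard genetic code should give the tRNA type listed in the table. The sketch ignores wobble pairing and base modifications, and the helper names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def cognate_codon(anticodon):
    """Return the DNA codon that is the reverse complement of the anticodon."""
    return anticodon.upper().translate(COMPLEMENT)[::-1]


# A few anticodons taken from Table 4.
for anticodon in ("TCT", "GCA", "GGT", "CAT"):
    codon = cognate_codon(anticodon)
    print(anticodon, "->", codon, CODON_TABLE[codon])
```

For example, the Arg anticodon TCT pairs with the codon AGA, which matches the trnR-UCU entry for AGA in Table 5.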
|
3.5 Analysis of RSCU and CAI regarding the codon usage
Gene sequences and the frequency of codon usage are closely related to plant evolution and genetic relationships [38]. Codon usage analysis of the C. spicatus chloroplast genome identified 26421 codons in all protein-coding genes. The isoleucine codon AUU (1102 codons, 4.17% of all codons) was the most abundant in the C. spicatus chloroplast genome [39], followed by the lysine codon AAA (4.06%), whereas the cysteine codon UGC accounted for only 0.24% of all codons. The fractions and frequencies of all codons sum to 21.001 and 1000, respectively (Table 5). Different codons are therefore used at different frequencies. The relative synonymous codon usage (RSCU) value indicates how many times a codon is observed relative to the number of times it would be expected in the absence of any codon usage bias for a particular amino acid, and it is related to the evolution of species [40] (Table 5). The RSCU values ranged from 0.336 to 2.9904; an RSCU value > 1 indicates that the codon is preferred. In this study, the codons AUG, UUA and AGA had the highest RSCU values, indicating a higher codon usage bias in a total of 61 genes. The codon adaptation index (CAI) is 0.645, indicating the codon preference of the genes. Excluding the stop codons, only two amino acids (Trp and Met) are each encoded by a single codon [41].
Table 5
Codon usage and codon-anticodon recognition patterns in Clerodendranthus spicatus chloroplast genome
Amino acid | Symbol | Codon | No. | Fraction | Frequency | RSCU | tRNA
A | Ala | GCA | 385 | 0.271 | 13.797 | 1.0944 | trnA-UGC
A | Ala | GCC | 224 | 0.151 | 7.68 | 0.6368 | -
A | Ala | GCG | 171 | 0.123 | 6.277 | 0.486 | -
A | Ala | GCU | 627 | 0.454 | 23.081 | 1.7824 | -
C | Cys | UGC | 64 | 0.261 | 3.068 | 0.4354 | trnC-GCA
C | Cys | UGU | 230 | 0.739 | 8.703 | 1.5646 | -
D | Asp | GAC | 210 | 0.199 | 8.081 | 0.3922 | trnD-GUC
D | Asp | GAU | 861 | 0.801 | 32.627 | 1.6078 | -
E | Glu | GAA | 1032 | 0.754 | 38.803 | 1.5166 | trnE-UUC
E | Glu | GAG | 329 | 0.246 | 12.634 | 0.4834 | -
F | Phe | UUC | 499 | 0.337 | 19.532 | 0.6712 | trnF-GAA
F | Phe | UUU | 988 | 0.663 | 38.442 | 1.3288 | -
G | Gly | GGA | 725 | 0.391 | 25.187 | 1.6264 | trnG-UCC
G | Gly | GGC | 204 | 0.126 | 8.142 | 0.4576 | trnG-GCC
G | Gly | GGG | 313 | 0.188 | 12.092 | 0.702 | -
G | Gly | GGU | 541 | 0.296 | 19.071 | 1.2136 | -
H | His | CAC | 141 | 0.245 | 5.936 | 0.4608 | trnH-GUG
H | His | CAU | 471 | 0.755 | 18.249 | 1.5392 | -
I | Ile | AUA | 680 | 0.303 | 25.147 | 0.9084 | -
I | Ile | AUC | 464 | 0.206 | 17.126 | 0.6198 | trnI-GAU
I | Ile | AUU | 1102 | 0.491 | 40.849 | 1.4721 | -
K | Lys | AAA | 1072 | 0.727 | 40.849 | 1.4816 | trnK-UUU
K | Lys | AAG | 375 | 0.273 | 15.361 | 0.5184 | -
L | Leu | CUA | 397 | 0.132 | 13.977 | 0.8412 | trnL-UAG
L | Leu | CUC | 185 | 0.066 | 6.999 | 0.3918 | -
L | Leu | CUG | 194 | 0.073 | 7.68 | 0.411 | -
L | Leu | CUU | 605 | 0.214 | 22.54 | 1.2822 | -
L | Leu | UUA | 869 | 0.3 | 31.644 | 1.842 | trnL-UAA
L | Leu | UUG | 581 | 0.215 | 22.68 | 1.2312 | trnL-CAA
M | Met | AUG | 620 | 1 | 22.54 | 2.9904 | trnI-GAU
N | Asn | AAC | 302 | 0.247 | 11.811 | 0.479 | trnN-GUU
N | Asn | AAU | 959 | 0.753 | 36.016 | 1.521 | -
P | Pro | CCA | 308 | 0.283 | 11.711 | 1.11 | trnP-UGG
P | Pro | CCC | 229 | 0.215 | 8.864 | 0.8252 | -
P | Pro | CCG | 165 | 0.159 | 6.557 | 0.5944 | -
P | Pro | CCU | 408 | 0.343 | 14.178 | 1.4704 | -
Q | Gln | CAA | 711 | 0.756 | 27.112 | 1.5242 | trnQ-UUG
Q | Gln | CAG | 222 | 0.244 | 8.763 | 0.4758 | -
R | Arg | AGA | 495 | 0.299 | 18.589 | 1.8264 | trnR-UCU
R | Arg | AGG | 172 | 0.119 | 7.42 | 0.6348 | -
R | Arg | CGA | 367 | 0.218 | 13.516 | 1.3542 | -
R | Arg | CGC | 120 | 0.074 | 4.572 | 0.4428 | -
R | Arg | CGG | 130 | 0.084 | 5.194 | 0.48 | -
R | Arg | CGU | 342 | 0.206 | 12.814 | 1.2618 | trnR-ACG
S | Ser | AGC | 116 | 0.061 | 4.933 | 0.336 | trnS-GCU
S | Ser | AGU | 421 | 0.202 | 16.203 | 1.2198 | -
S | Ser | UCA | 401 | 0.189 | 15.16 | 1.1616 | trnS-UGA
S | Ser | UCC | 351 | 0.168 | 13.456 | 1.017 | trnS-GGA
S | Ser | UCG | 197 | 0.099 | 7.921 | 0.5706 | -
S | Ser | UCU | 585 | 0.282 | 22.6 | 1.695 | -
T | Thr | ACA | 397 | 0.292 | 14.358 | 1.194 | trnT-UGU
T | Thr | ACC | 247 | 0.188 | 9.225 | 0.7428 | trnT-GGU
T | Thr | ACG | 144 | 0.112 | 5.515 | 0.4332 | -
T | Thr | ACU | 542 | 0.408 | 20.053 | 1.63 | -
V | Val | GUA | 541 | 0.37 | 19.472 | 1.5196 | trnV-UAC
V | Val | GUC | 172 | 0.127 | 6.698 | 0.4832 | trnV-GAC
V | Val | GUG | 181 | 0.133 | 6.999 | 0.5084 | -
V | Val | GUU | 530 | 0.369 | 19.432 | 1.4888 | -
W | Trp | UGG | 465 | 1 | 18.469 | 1 | trnW-CCA
Y | Tyr | UAC | 186 | 0.197 | 7.119 | 0.3896 | trnY-GUA
Y | Tyr | UAU | 769 | 0.803 | 28.937 | 1.6104 | -
Stop | Ter | UAA | 46 | 0.423 | 3.188 | 1.5861 | -
Stop | Ter | UAG | 25 | 0.309 | 2.326 | 0.8622 | -
Stop | Ter | UGA | 16 | 0.269 | 2.025 | 0.5517 | -
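For readers who want to reproduce the RSCU column, the usual definition is the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The sketch below applies that definition to the Ala and Cys counts from Table 5; it is not the software used in the study, and the function name is illustrative. For these two amino acids the results agree with the tabulated RSCU values to about three decimal places.

```python
# Codon counts for two amino acids, copied from the "No." column of Table 5.
ALA = {"GCA": 385, "GCC": 224, "GCG": 171, "GCU": 627}
CYS = {"UGC": 64, "UGU": 230}


def rscu(counts):
    """RSCU = observed count / mean count over the synonymous codon family."""
    n_synonymous = len(counts)
    total = sum(counts.values())
    return {codon: x * n_synonymous / total for codon, x in counts.items()}


for family in (ALA, CYS):
    for codon, value in rscu(family).items():
        print(f"{codon}: {value:.4f}")
```

By construction, the RSCU values of a synonymous family sum to the number of codons in that family, so a value above 1 means the codon is used more often than expected under uniform usage.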
|
3.6 Repeat sequence analysis
Repeat sequences are important genetic markers and are closely related to the origin and evolution of species [42]. They can generally be divided into interspersed (scattered) repeats and tandem repeat sequences (TRS) [43]. Interspersed repeats are scattered throughout the genome, whereas multiple adjacent copies of a sequence on a chromosome are called tandem repeats. A special form of tandem repeat is the simple sequence repeat (SSR), also known as a simple tandem repeat [44]. SSRs are often naturally polymorphic, and chloroplast SSRs (cpSSRs) are the focus of chloroplast genome repeat analysis. We therefore analyzed three types of repetitive sequences (simple sequence repeats, tandem repeats, and interspersed repeats) in the chloroplast genome of C. spicatus. For simple sequence repeats, 28 repeats (12 A, 15 T and 1 TA) were identified, belonging to types P1 and P2 (Table S2). The sizes of the SSRs are between 10 bp and 15 bp (Table S2). For tandem repeats, 36 repeats were identified that met two conditions: the repeat unit was longer than 20 bp and the similarity among the repeat unit sequences was more than 70% (Table S3) [45]. For interspersed repeats, 40 repeats were identified, including 22 palindromic repeats and 18 direct repeats (Table S4). The lengths of repeat units 1 and 2 are between 30 bp and 60 bp, and the E-values of the interspersed repeats range from 7.80E-23 to 6.19E-04 (Table S4).
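To illustrate how mononucleotide (P1) and dinucleotide (P2) SSRs of this kind can be located, the sketch below scans a sequence with regular expressions. The tool and thresholds actually used in the study are not stated in this section, so the minimum repeat numbers (10 for P1, 5 for P2) and the function name are assumptions chosen only to be consistent with the reported minimum SSR size of 10 bp.

```python
import re

# Assumed thresholds: repeat-unit length -> minimum number of repeats.
MIN_REPEATS = {1: 10, 2: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, unit, repeats) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if unit_len > 1 and len(set(unit)) == 1:
                continue  # e.g. "AA" repeats are already reported as P1
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start() + 1, match.end(), unit, repeats))
    return sorted(hits)


# Toy sequence containing one A(12) run and one (TA)6 run.
example = "CG" + "A" * 12 + "CGT" + "TA" * 6 + "CG"
print(find_ssrs(example))
```

Applied to a real plastome sequence, the output could be tallied by unit (A, T, TA, ...) and compared with the counts in Table S2, keeping in mind that different thresholds will change the totals.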
3.7 IR structure and gene analysis of seven selected species
The sizes of the four regions of the chloroplast genomes of 7 selected species were analyzed, together with the boundaries between adjacent regions. The results showed that the selected chloroplast genomes have similar overall structures but differ in the sizes of the four regions (Fig. 6). Among the seven species, the rps19 gene is located at the LSC/IRb border in C. spicatus, Salvia miltiorrhiza, Salvia miltiorrhiza f. alba and Tectona grandis, whereas it lies within the LSC in Epimedium brevicornu and within IRb in Cistanche deserticola. The position of rps19 is therefore unstable: it sometimes spans the LSC/IRb border and at other times lies entirely within the LSC or IRb. Neither rps19 nor ycf1 was found in Glechoma longituba. In contrast, the ndhF gene is located at the IRb/SSC border in Salvia miltiorrhiza, Salvia miltiorrhiza f. alba and Tectona grandis, and within the SSC in Epimedium brevicornu [46]. Most ycf1 genes are located at the SSC/IRa border in the genus Salvia and in Tectona grandis, while ycf1 is also located in the SSC and IRb regions in C. spicatus, Epimedium brevicornu, Glechoma longituba and Tectona grandis. In addition, the rpl2 and trnH gene spacers are located in the IRa and LSC regions, respectively, except in Glechoma longituba. A small fragment of the rps19 gene was found at the IRa/LSC border in the genus Salvia and in Cistanche deserticola. The psbA gene is present in the genus Salvia, Tectona grandis and C. spicatus.
3.8 Hypervariable region identification
To investigate chloroplast genome divergence among 5 related species (C. spicatus, Salvia miltiorrhiza, Salvia miltiorrhiza f. alba, Tectona grandis and Glechoma longituba), selected on the basis of the phylogenetic analysis, we conducted a genetic distance analysis of their intergenic spacer (IGS) regions. K2P (Kimura 2-parameter) values ranging from 6.084 to 29.242 were obtained for 30 of the 58 intergenic spacer regions (Fig. 7). Five of these regions had K2P values above 18.0, namely ndhG-ndhI (29.242), accD-psaI (22.442), rps15-ycf1 (19.488), rpl20-clpP (18.322), and ccsA-ndhD (18.091). Specific molecular markers could be developed from the variation in these IGS regions and used to distinguish the species [47].
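As background for the K2P values, the Kimura 2-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The sketch below computes d in substitutions per site for one pair of aligned sequences. It is not the software used in the study, gap and ambiguous positions are simply skipped (an assumption), and the scaling of the values reported in Fig. 7 is not stated here, so the numbers are not directly comparable.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguous bases (assumed handling)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignment: 10 comparable sites, one transition (A -> G), one gap column.
print(k2p_distance("ACGTACGTAC-A", "GCGTACGTAC-A"))
```

In practice the distance would be computed for every pair of species within each aligned IGS region and then summarized per region.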
3.9 Phylogenetic analysis
The chloroplast genome is structurally simple and small; its sequence is conserved and its genes are mostly orthologous. It is therefore of great value for studying the evolutionary relationships among green plants. In this study, Clerodendranthus spicatus belongs to a single genus of the Lamiaceae family. The nucleotide sequences of 82 shared CDS were extracted from the 15 species and used to construct the phylogenetic trees (Table S5). Comparison of these CDS showed that the proteins of some species are distinct. Cistanche deserticola is distinct in the proteins psbM, rpl14, rpl33, rpl36, rps3, rps4, rps7, rps12, rps14, rps16, rps18, rps19 and ycf4. In addition, the proteins psbZ and rpoC2 are common to 13 species: psbZ is lost in Cistanche deserticola and Glechoma longituba, and rpoC2 is lost in Cistanche deserticola and Tectona grandis. The ycf15 protein is present in 14 species, the exception being Epimedium brevicornu. The proteins rpoC and lhbA are found only in Tectona grandis and Glechoma longituba, respectively. In Dipsacus asper, the proteins ndhI, psbC, rpl22, rpoB, rps2, rps14 and rps18 differ from those of the other species. The psbI in Eucommia ulmoides, rps3 in Rheum tanguticum, and rps12 and ycf4 in Cullen corylifolium are also distinct (Table S5).
The phylogenetic tree showed that 13 species clustered into one branch, apart from the two outgroups, Epimedium brevicornu and Dioscorea polystachya. Bootstrap analysis showed that 7 of the 11 nodes had 100% bootstrap values. This branch is subdivided into two: the 6 species of Lamiales, Eucommia ulmoides and Dipsacus asper clustered together, whereas 4 species of the Polygonaceae family and Cullen corylifolium of the Fabaceae family clustered together (Fig. 8) [48]. This indicates that the herbaceous plants C. spicatus, the two Salvia taxa and Glechoma longituba, all from the Lamiaceae family, are closely related. The woody species Tectona grandis is related to these four plants with a bootstrap value of 87. Cistanche deserticola occupies a separate position within this branch, possibly because it is a parasitic plant that grows on the roots of saxaul (Haloxylon) in the desert and differs from the other species in many proteins and specific genes. By contrast, Epimedium brevicornu and Dioscorea polystachya were not closely related to C. spicatus and were more distantly related to the other species.