The Chloroplast Genome Structures of Zanthoxylum Species
Five Zanthoxylum species were sequenced, revealing a typical quadripartite structure, with plastome lengths ranging from 157,231 bp (Z. ailanthoides) to 158,728 bp (Z. piasezkii) (Fig. 1, Table 1). Each plastid genome contains two inverted repeat (IR) regions, IRa and IRb, separated by a large single-copy region (LSC, 84,368–86,122 bp) and a small single-copy region (SSC, 17,603–18,293 bp). The GC content of the plastomes varied only slightly among the five species, from 38.4% to 38.5% (Table 1). Most Zanthoxylum species possess 132 plastome-encoded genes, comprising 87 protein-coding, 37 transfer RNA (tRNA), and eight ribosomal RNA (rRNA) genes (Fig. 1a, Table 1, Table 2), whereas Z. piasezkii has 133 genes (88 protein-coding genes) (Fig. 1b, Table 1). The main difference is that Z. piasezkii carries two copies of rpl22, located on the IR side of the IR/LSC junctions. The rpl22 gene of Z. piasezkii is relatively short, only 168 bp, whereas in the other four species it is longer, ranging from 255 to 450 bp, and is present as a single copy spanning the LSC/IRb junction.
Among these genes, nine protein-coding genes (rps19, rps7, rpl2, rpl23, rpl22, ycf2, ycf15, ndhB, and ycf1), seven tRNA genes (trnR-ACG, trnN-GUU, trnA-UGC, trnV-GAC, trnI-GAU, trnL-CAA, and trnI-CAU), and four rRNA genes (rrn5, rrn16, rrn4.5, and rrn23) were duplicated in the IR regions. Additionally, 19 genes, including 13 protein-coding genes and six tRNA genes, had two exons, while four protein-coding genes (pafI, clpP1, and two rps12) had three exons (Table S2).
Table 1
New sequencing and annotation of the complete chloroplast genomes of Zanthoxylum species.
| Species | Accession Number | Plastome (bp) | LSC (bp) | SSC (bp) | IRs (bp) | Total genes | Protein-coding genes | tRNA genes | rRNA genes | Overall GC content (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Z. bungeanum | MW206786 | 158,401 | 85,898 | 17,611 | 27,466 | 132 | 87 | 37 | 8 | 38.5 |
| Z. piasezkii | MW206785 | 158,728 | 85,918 | 17,612 | 27,599 | 133 | 88 | 37 | 8 | 38.4 |
| Z. armatum | MW602887 | 158,558 | 85,759 | 17,603 | 27,598 | 132 | 87 | 37 | 8 | 38.5 |
| Z. nitidum | MW602879 | 157,304 | 84,368 | 17,634 | 27,651 | 132 | 87 | 37 | 8 | 38.5 |
| Z. ailanthoides | MW478808 | 157,231 | 86,122 | 18,293 | 26,408 | 132 | 87 | 37 | 8 | 38.4 |
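In a quadripartite plastome, the total length should equal LSC + SSC + 2 × IR. The following is a minimal sketch of that consistency check, using the region lengths as transcribed from Table 1. The names `TABLE1`, `expected_length`, and `length_discrepancies` are illustrative and not part of the original study. With the values as printed, four of the five rows sum exactly, while the Z. bungeanum row differs by 40 bp, which may reflect a typographical slip in one of its entries.

```python
# Lengths (bp) as transcribed from Table 1:
# species -> (reported plastome length, LSC, SSC, one IR copy)
TABLE1 = {
    "Z. bungeanum":    (158_401, 85_898, 17_611, 27_466),
    "Z. piasezkii":    (158_728, 85_918, 17_612, 27_599),
    "Z. armatum":      (158_558, 85_759, 17_603, 27_598),
    "Z. nitidum":      (157_304, 84_368, 17_634, 27_651),
    "Z. ailanthoides": (157_231, 86_122, 18_293, 26_408),
}


def expected_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def length_discrepancies(table):
    """Return reported minus expected plastome length (bp) for each species."""
    return {
        species: reported - expected_length(lsc, ssc, ir)
        for species, (reported, lsc, ssc, ir) in table.items()
    }


if __name__ == "__main__":
    for species, diff in length_discrepancies(TABLE1).items():
        status = "consistent" if diff == 0 else f"differs by {diff} bp"
        print(f"{species}: {status}")
```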
Table 2
Annotated gene classification of the chloroplast genome of Zanthoxylum species
| Category | Gene group | Gene names | Number |
|---|---|---|---|
| Self-replication | tRNA genes | trnH-GUG, trnK-UUU*, trnQ-UUG, trnS-GCU, trnG-UCC*, trnR-UCU, trnC-GCA, trnD-GUC, trnY-GUA, trnE-UUC, trnT-GGU, trnS-UGA, trnG-GCC, trnfM-CAU, trnS-GGA, trnT-UGU, trnL-UAA*, trnF-GAA, trnV-UAC*, trnM-CAU, trnW-CCA, trnP-UGG, trnI-GAU(2)*, trnL-CAA(2), trnV-GAC(2), trnI-CAU(2), trnA-UGC(2)*, trnR-ACG(2), trnN-GUU(2), trnL-UAG | 37 |
| | rRNA genes | rrn5(2), rrn4.5(2), rrn16(2), rrn23(2) | 8 |
| | DNA-dependent RNA polymerase | rpoC1*, rpoC2, rpoA, rpoB | 4 |
| | Ribosomal small subunit | rps16*, rps2, rps14, rps4, rps18, rps12(2)*, rps11, rps8, rps3, rps19(2), rps15, rps7(2) | 15 |
| | Ribosomal large subunit | rpl33, rpl20, rpl36, rpl14, rpl16, rpl22(2), rpl2(2)*, rpl23(2), rpl32 | 11 |
| Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, pafI**, pafII, psaJ | 7 |
| | Photosystem II | psbA, psbB, psbC, psbD, psbK, psbI, psbM, psbZ, psbJ, psbL, psbF, psbE, psbT, psbN, psbH | 15 |
| | Cytochrome b/f complex | petN, petA, petL, petG, petB*, petD* | 6 |
| | ATP synthase | atpE, atpB, atpA, atpF*, atpH, atpI | 6 |
| | Protease | clpP** | 1 |
| | Large subunit of rubisco | rbcL | 1 |
| | NADH dehydrogenase | ndhJ, ndhK, ndhC, ndhB(2)*, ndhF, ndhD, ndhE, ndhG, ndhI, ndhA*, ndhH | 12 |
| Others | Maturase | matK | 1 |
| | Envelope membrane protein | cemA | 1 |
| | Subunit of acetyl-CoA carboxylase | accD | 1 |
| | Cytochrome c synthesis | ccsA | 1 |
| Function unknown | Open reading frames | ycf1(2), ycf2(2), ycf15(2) | 6 |
Note: *Gene contains one intron; **Gene contains two introns; (2) indicates that the gene is present in two copies.
The protein-coding genes of the five Zanthoxylum chloroplast genomes comprised 25,825–26,512 codons (Fig. 2, Table S3). Among the encoded amino acids, leucine, arginine, and serine were the most abundant, whereas methionine (1.10–1.15%) was the least abundant. Relative synonymous codon usage (RSCU) analysis showed that all amino acids have more than one synonymous codon, except for methionine (AUG) and tryptophan (UGG), each of which is encoded by a single codon (RSCU = 1). Moreover, 31 codons had RSCU > 1, and most of those (29/31, 93.5%) ended with A or U. Another 31 codons had RSCU < 1, and most of those (28/31, 90.3%) ended with G or C (Table S3).
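For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, i.e. count × (number of synonymous codons) / (total count for the amino acid). The sketch below illustrates this calculation under stated assumptions: it uses the standard codon assignments, treats the three stop codons as one synonymous family, and runs on a toy sequence rather than the study's data. The function name `rscu` and the example sequence are hypothetical, and this is not necessarily the tool the authors used.

```python
from collections import Counter
from itertools import product

# Standard codon assignments, written in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Compute RSCU for every codon of each amino acid seen in the input.

    cds_sequences: iterable of in-frame coding sequences (DNA or RNA).
    Returns a dict mapping codon (RNA alphabet) to its RSCU value.
    Amino acids that never occur are omitted, since RSCU is undefined there.
    """
    counts = Counter()
    for seq in cds_sequences:
        rna = seq.upper().replace("T", "U")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(rna) - len(rna) % 3, 3):
            codon = rna[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Toy coding sequence: Met, Leu, Leu, Leu, Trp
    for codon, value in sorted(rscu(["ATGTTATTATTGTGG"]).items()):
        print(codon, round(value, 2))
```

Because methionine and tryptophan each have a single codon, their RSCU is 1 whenever they occur, which matches the statement above.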
IR Contraction and Expansion
The IR regions of the five chloroplast genomes ranged in size from 26,408 bp (Z. ailanthoides) to 27,651 bp (Z. nitidum) (Table 1, Fig. 1). We compared the IR borders of the five Zanthoxylum species and found only slight changes in the IR junction regions (Fig. 5). In all five species, the rps3 gene was located entirely within the LSC region and the rps19 gene entirely within the IR region. Notably, the rpl22 gene spanned the LSC/IRb border in Z. ailanthoides, Z. nitidum, Z. bungeanum, and Z. armatum, whereas in Z. piasezkii it was truncated and present in both the IRa and IRb regions (Fig. 5). The ycf1 gene spanned the SSC/IRa border in all five species, with the same length (5,487 bp) extending into the IRa region. The ycf1 copy at the IRb/SSC boundary extended 205 bp into the SSC region in Z. nitidum, Z. bungeanum, Z. armatum, and Z. piasezkii, but only 4 bp in Z. ailanthoides. The ndhF gene was located entirely within the SSC region in Z. ailanthoides, whereas in Z. nitidum, Z. piasezkii, Z. bungeanum, and Z. armatum it showed an identical distance (23 bp) from the SSC to the IRb region, suggesting a closer genetic relationship among these four species. The rpl2 gene was located within the IRa region in all five species. The trnH gene in the LSC region lay 5, 36, 211, 52, and 53 bp from the IRa/LSC border in the five Zanthoxylum chloroplast genomes.