3.1 Structure of the Zhuyeqi Chloroplast Genome
Using the tea (Camellia sinensis) chloroplast genome (NC_038198.1) as a reference, we assembled and annotated the complete chloroplast (cp) genome of Zhuyeqi from reads generated on the Illumina NovaSeq 6000 sequencing platform. The cp genome of Zhuyeqi has a typical quadripartite structure, consisting of one large single-copy (LSC) region, one small single-copy (SSC) region, and a pair of inverted repeats (IRa and IRb), with lengths of 86,628 bp, 18,282 bp, and 26,081 bp (each IR copy), respectively, for a total length of 157,072 bp (Fig. 1, Table 1). The overall nucleotide composition was 31.12% A, 32.11% T, 18.78% C, and 17.99% G, which sums to an AT content of 63.23% and a GC content of 36.77%. A total of 135 genes were annotated in the Zhuyeqi cp genome, including 90 protein-coding genes (CDS), 37 transfer RNA genes (tRNAs), and 8 ribosomal RNA genes (rRNAs). The whole-genome GC content reported in Table 1 is 37.29%, corresponding to an AT content of 62.71%, so the genome is AT-rich. The GC content of the LSC, SSC, and IR regions is 35.31%, 30.54%, and 42.95%, respectively; the IR regions therefore have a higher GC content than the LSC and SSC regions.
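Table 1
Characteristics of the Zhuyeqi cp genome
Category | Item | Value
Chloroplast genome structure | Cp genome (bp) | 157072
Chloroplast genome structure | LSC (bp) | 86628
Chloroplast genome structure | SSC (bp) | 18282
Chloroplast genome structure | IRa/IRb (bp) | 26081
Gene composition | Total genes | 135
Gene composition | CDS | 90
Gene composition | tRNA | 37
Gene composition | rRNA | 8
GC content (%) | Cp genome | 37.29
GC content (%) | LSC | 35.31
GC content (%) | SSC | 30.54
GC content (%) | IRa/IRb | 42.95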
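The total length and the whole-genome GC content in Table 1 can be cross-checked from the regional values. The following is a minimal sketch, assuming the IR entry describes one copy and that the genome contains exactly two IR copies; the names `regions`, `genome_length`, and `weighted_gc` are illustrative and not part of the original analysis.

```python
# Region lengths (bp) and GC content (%) as reported in Table 1.
# Assumption: the IR row describes a single copy, present twice (IRa and IRb).
regions = {
    "LSC": {"length": 86_628, "gc": 35.31, "copies": 1},
    "SSC": {"length": 18_282, "gc": 30.54, "copies": 1},
    "IR": {"length": 26_081, "gc": 42.95, "copies": 2},
}


def genome_length(regions):
    """Total genome length, counting every copy of each region."""
    return sum(r["length"] * r["copies"] for r in regions.values())


def weighted_gc(regions):
    """Length-weighted GC content (%) over all regions."""
    gc_bases = sum(
        r["length"] * r["copies"] * r["gc"] / 100 for r in regions.values()
    )
    return 100 * gc_bases / genome_length(regions)


total_bp = genome_length(regions)
overall_gc = weighted_gc(regions)
print(total_bp, round(overall_gc, 2))
```

Under these assumptions the regional lengths add up to the 157,072 bp listed for the whole genome, and the length-weighted GC content rounds to the 37.29% listed in Table 1.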
According to functional classification, the genes in the Zhuyeqi chloroplast genome can be divided into four categories: photosynthesis genes (45), self-replication genes (74), other genes (6), and genes of unknown function (10), with each copy of a duplicated gene counted separately (Table 2). There are a total of 20 duplicated genes in the Zhuyeqi chloroplast genome, including one NADH dehydrogenase subunit gene (ndhB), four ribosomal protein genes (rpl2, rpl23, rps12, rps7), four rRNA genes (rrn16, rrn23, rrn4.5, rrn5), seven tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG, trnV-GAC), and four genes of unknown function (orf42, ycf1, ycf15, ycf2). Most of the genes in the Zhuyeqi chloroplast genome contain no introns; only 18 genes contain one or two. Among them, ndhA, ndhB, petB, petD, atpF, rpl16, rpl2, rps12, rps16, rpoC1, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnT-CGU, and trnV-UAC contain one intron, while clpP and ycf3 contain two introns. Further research is needed to determine the functions of the 10 genes of unknown function identified in the Zhuyeqi chloroplast genome.
Table 2
Genes present in the chloroplast genome of Zhuyeqi
Category of genes | Group of genes | Names of genes
Photosynthesis | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ
Photosynthesis | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Photosynthesis | Subunits of NADH dehydrogenase | ndhA*, ndhB*(2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Photosynthesis | Subunits of cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN
Photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI
Photosynthesis | Large subunit of rubisco | rbcL
Photosynthesis | Subunits of protochlorophyllide reductase | -
Self-replication | Proteins of large ribosomal subunit | rpl14, rpl16*, rpl2*(2), rpl20, rpl22, rpl23(2), rpl32, rpl33, rpl36
Self-replication | Proteins of small ribosomal subunit | rps11, rps12*(2), rps14, rps15, rps16*, rps18, rps19, rps2, rps3, rps4, rps7(2), rps8
Self-replication | Subunits of RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2
Self-replication | Ribosomal RNAs | rrn16(2), rrn23(2), rrn4.5(2), rrn5(2)
Self-replication | Transfer RNAs | trnA-UGC*(2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-UCC, trnH-GUG, trnI-CAU(2), trnI-GAU*(2), trnK-UUU*, trnL-CAA(2), trnL-UAA*, trnL-UAG, trnM-CAU, trnN-GUU(2), trnP-UGG, trnQ-UUG, trnR-ACG(2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-CGU*, trnT-GGU, trnT-UGU, trnV-GAC(2), trnV-UAC*, trnW-CCA, trnY-GUA, trnfM-CAU
Other genes | Maturase | matK
Other genes | Protease | clpP**
Other genes | Envelope membrane protein | cemA
Other genes | Acetyl-CoA carboxylase | accD
Other genes | c-type cytochrome synthesis gene | ccsA
Other genes | Translation initiation factor | infA
Other genes | Other | -
Genes of unknown function | Conserved hypothetical chloroplast ORF | orf42(2), ycf1(2), ycf15(2), ycf2(2), ycf3**, ycf4
Note: * indicates one intron in the gene; ** indicates two introns in the gene; (2) indicates a gene present in two copies.
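The gene counts can be tallied directly from the notation used in Table 2. The sketch below is only an illustration: it assumes that every copy of a duplicated gene is counted separately, and the names `TABLE2` and `parse_entry` are hypothetical helpers rather than part of the annotation pipeline.

```python
import re

# Gene lists copied from Table 2 ("*" = one intron, "**" = two introns,
# "(2)" = two copies). Empty groups ("-") are omitted.
TABLE2 = {
    "Photosynthesis": (
        "psaA, psaB, psaC, psaI, psaJ, "
        "psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, "
        "psbM, psbN, psbT, psbZ, "
        "ndhA*, ndhB*(2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK, "
        "petA, petB*, petD*, petG, petL, petN, "
        "atpA, atpB, atpE, atpF*, atpH, atpI, rbcL"
    ),
    "Self-replication": (
        "rpl14, rpl16*, rpl2*(2), rpl20, rpl22, rpl23(2), rpl32, rpl33, rpl36, "
        "rps11, rps12*(2), rps14, rps15, rps16*, rps18, rps19, rps2, rps3, "
        "rps4, rps7(2), rps8, rpoA, rpoB, rpoC1*, rpoC2, "
        "rrn16(2), rrn23(2), rrn4.5(2), rrn5(2), "
        "trnA-UGC*(2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-UCC, "
        "trnH-GUG, trnI-CAU(2), trnI-GAU*(2), trnK-UUU*, trnL-CAA(2), "
        "trnL-UAA*, trnL-UAG, trnM-CAU, trnN-GUU(2), trnP-UGG, trnQ-UUG, "
        "trnR-ACG(2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-CGU*, "
        "trnT-GGU, trnT-UGU, trnV-GAC(2), trnV-UAC*, trnW-CCA, trnY-GUA, "
        "trnfM-CAU"
    ),
    "Other genes": "matK, clpP**, cemA, accD, ccsA, infA",
    "Genes of unknown function": (
        "orf42(2), ycf1(2), ycf15(2), ycf2(2), ycf3**, ycf4"
    ),
}

# name, optional intron markers, optional copy-number marker
ENTRY = re.compile(r"([\w.\-]+)(\*{0,2})(\(2\))?")


def parse_entry(entry):
    """Return (gene name, number of introns, copy number) for one entry."""
    name, stars, dup = ENTRY.fullmatch(entry.strip()).groups()
    return name, len(stars), 2 if dup else 1


genes = {
    cat: [parse_entry(e) for e in text.split(",")] for cat, text in TABLE2.items()
}
copies_per_category = {
    cat: sum(copies for _, _, copies in rows) for cat, rows in genes.items()
}
all_rows = [row for rows in genes.values() for row in rows]
duplicated = [name for name, _, copies in all_rows if copies == 2]
one_intron = [name for name, introns, _ in all_rows if introns == 1]
two_introns = [name for name, introns, _ in all_rows if introns == 2]

print(copies_per_category, sum(copies_per_category.values()))
print(len(duplicated), len(one_intron), two_introns)
```

Counting copies this way gives 135 genes in total, 20 duplicated genes, 16 genes with one intron, and two genes (clpP and ycf3) with two introns.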
3.2 Analysis of codon preference
Codon usage patterns differ markedly among species. Each amino acid is encoded by at least one and at most six codons, and the unequal usage of these synonymous codons is referred to as codon preference. Natural selection and mutation are considered to be the main factors influencing codon preference. CodonW 1.4.4 was used to analyze the relative synonymous codon usage (RSCU) of each amino acid in the Zhuyeqi chloroplast genome (Fig. 2 and Table 3). The results showed that there were 31 high-frequency codons with RSCU > 1, of which 13 ended with A and 16 ended with U, together accounting for 93.55% of the total. Only two high-frequency codons ended with G (UUG and AUG), and none ended with C. The Zhuyeqi chloroplast genome therefore shows a preference for codons ending in A or U.
Table 3
RSCU analysis of each amino acid in the Zhuyeqi chloroplast genome.
Amino acid | Symbol | Codon | No. | RSCU
* | Ter | UAA | 42 | 1.4001
* | Ter | UAG | 26 | 0.8667
* | Ter | UGA | 22 | 0.7332
A | Ala | GCA | 399 | 1.1384
A | Ala | GCC | 226 | 0.6448
A | Ala | GCG | 137 | 0.3908
A | Ala | GCU | 640 | 1.826
C | Cys | UGC | 75 | 0.4966
C | Cys | UGU | 227 | 1.5034
D | Asp | GAC | 205 | 0.3674
D | Asp | GAU | 911 | 1.6326
E | Glu | GAA | 1060 | 1.5068
E | Glu | GAG | 347 | 0.4932
F | Phe | UUC | 552 | 0.7164
F | Phe | UUU | 989 | 1.2836
G | Gly | GGA | 746 | 1.6496
G | Gly | GGC | 188 | 0.4156
G | Gly | GGG | 303 | 0.67
G | Gly | GGU | 572 | 1.2648
H | His | CAC | 140 | 0.4274
H | His | CAU | 515 | 1.5726
I | Ile | AUA | 746 | 0.9597
I | Ile | AUC | 465 | 0.5982
I | Ile | AUU | 1121 | 1.4421
K | Lys | AAA | 1084 | 1.4818
K | Lys | AAG | 379 | 0.5182
L | Leu | CUA | 380 | 0.8112
L | Leu | CUC | 209 | 0.4464
L | Leu | CUG | 185 | 0.3948
L | Leu | CUU | 587 | 1.2528
L | Leu | UUA | 887 | 1.893
L | Leu | UUG | 563 | 1.2018
M | Met | AUC | 2 | 0.009
M | Met | AUG | 661 | 2.982
M | Met | GUG | 2 | 0.009
N | Asn | AAC | 304 | 0.4582
N | Asn | AAU | 1023 | 1.5418
P | Pro | CCA | 336 | 1.1884
P | Pro | CCC | 198 | 0.7004
P | Pro | CCG | 146 | 0.5164
P | Pro | CCU | 451 | 1.5952
Q | Gln | CAA | 719 | 1.5216
Q | Gln | CAG | 226 | 0.4784
R | Arg | AGA | 506 | 1.872
R | Arg | AGG | 165 | 0.6102
R | Arg | CGA | 389 | 1.4388
R | Arg | CGC | 90 | 0.333
R | Arg | CGG | 114 | 0.4218
R | Arg | CGU | 358 | 1.3242
S | Ser | AGC | 120 | 0.3408
S | Ser | AGU | 431 | 1.2246
S | Ser | UCA | 417 | 1.1844
S | Ser | UCC | 331 | 0.9402
S | Ser | UCG | 181 | 0.5142
S | Ser | UCU | 632 | 1.7952
T | Thr | ACA | 416 | 1.2236
T | Thr | ACC | 253 | 0.744
T | Thr | ACG | 141 | 0.4148
T | Thr | ACU | 550 | 1.6176
V | Val | GUA | 539 | 1.4952
V | Val | GUC | 175 | 0.4856
V | Val | GUG | 198 | 0.5492
V | Val | GUU | 530 | 1.47
W | Trp | UGG | 483 | 1
Y | Tyr | UAC | 201 | 0.3956
Y | Tyr | UAU | 815 | 1.6044
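RSCU is the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The following sketch recomputes the values for two amino acids from the counts in Table 3; it illustrates the definition only and is not the CodonW implementation. The names `rscu` and `table3_counts` are illustrative.

```python
def rscu(counts):
    """Relative synonymous codon usage for one amino acid.

    counts maps each synonymous codon to its observed count. RSCU is the
    observed count divided by the count expected under equal usage.
    """
    expected = sum(counts.values()) / len(counts)
    return {codon: n / expected for codon, n in counts.items()}


# Codon counts copied from Table 3 (two amino acids shown for brevity).
table3_counts = {
    "Ala": {"GCA": 399, "GCC": 226, "GCG": 137, "GCU": 640},
    "Leu": {"CUA": 380, "CUC": 209, "CUG": 185,
            "CUU": 587, "UUA": 887, "UUG": 563},
}

rscu_values = {aa: rscu(c) for aa, c in table3_counts.items()}

# Codons with RSCU > 1 are the high-frequency codons described in the text.
high_frequency = sorted(
    codon
    for values in rscu_values.values()
    for codon, value in values.items()
    if value > 1
)

# Tally the third (ending) base of each high-frequency codon.
ending_counts = {}
for codon in high_frequency:
    ending_counts[codon[-1]] = ending_counts.get(codon[-1], 0) + 1

for aa, values in rscu_values.items():
    print(aa, {codon: round(v, 4) for codon, v in values.items()})
print(high_frequency, ending_counts)
```

Applying the same tally to every amino acid in Table 3 yields the counts of A-, U-, and G-ending high-frequency codons reported above. The recomputed values can differ from the table in the fourth decimal place because of rounding.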
3.3 Detection of Repeat Sequences and SSRs in the cp genome of Zhuyeqi
In the Zhuyeqi chloroplast genome, 40 repeat sequences were identified: 20 palindromic repeats and 20 forward repeats, each accounting for 50% of the total. No reverse or complementary repeat sequences were found. Fourteen repeats were 30 to 38 bp long, 25 repeats were 39 to 82 bp long, and one repeat was 26,081 bp long, which matches the length of one IR copy (Fig. 3).
The SSR analysis of the Zhuyeqi chloroplast genome was performed using MISA v1.0 (Fig. 4). A total of 247 SSR loci that met the criteria were identified, with 143 (57.9%), 50 (20.2%), and 54 (21.9%) located in the LSC, SSC, and IR regions, respectively. Among all SSRs, there were 157 mononucleotide repeats, 4 dinucleotide repeats, 72 trinucleotide repeats, 12 tetranucleotide repeats, and 2 pentanucleotide repeats. The most common repeat unit was A/T, followed by AT/TA, ATA, TAA, and TTC. These results indicate that A/T bases dominate the SSRs, which is consistent with the AT content of 62.71% in the Zhuyeqi chloroplast genome. This bias may be related to the fact that A-T base pairs separate more easily than G-C base pairs.
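The idea behind an SSR search can be sketched with regular expressions. This is not MISA itself: the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, because the search criteria are not listed in this section, and `find_ssrs` is a hypothetical helper name. Only perfect repeats are detected.

```python
import re

# Minimum number of repeat units per motif length (assumed thresholds,
# not taken from the paper).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat count) tuples for perfect SSRs.

    start is 0-based. Motifs that are themselves repeats of a shorter unit
    (for example "AA" or "ATAT") are skipped so that each locus is reported
    once, under its shortest motif.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            is_compound = any(
                motif == motif[:d] * (k // d)
                for d in range(1, k)
                if k % d == 0
            )
            if not is_compound:
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence containing one mono-, one di-, and one trinucleotide SSR.
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "G" + "TTC" * 4 + "G"
ssrs = find_ssrs(toy)
print(ssrs)
```

On a real genome, the hits would then be grouped by motif length and by region (LSC, SSC, IR) to produce counts like those reported above.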
3.4 IR Expansion and Contraction
The IR/LSC and IR/SSC boundaries of Camellia sinensis var. ‘Zhuyeqi’ were compared with those of the reported chloroplast genomes of seven other Camellia species or varieties (Fig. 5). The chloroplast genome of Zhuyeqi (157,072 bp) was shorter than those of Camellia sinensis var. ‘pubilimba’ and Camellia sinensis var. ‘Dahongpao’ but longer than those of the other six Camellia varieties. The main marker genes rps19, ndhF, ycf1, and trnH were detected at the LSC/IRb, IRb/SSC, SSC/IRa, and IRa/LSC boundaries, respectively. Among these genes, rps19 was mainly located in the LSC region or crossed the LSC/IRb boundary, showing −284–52 bp within the IRb region. The rpl2 gene in the IR regions was 106 or 112 bp from the LSC in the eight species (106 bp for Camellia sinensis var. ‘Zhuyeqi’, Camellia taliensis, Camellia sinensis, Camellia oleifera, Camellia gymnogyna, Camellia sinensis var. ‘assamica’, and Camellia japonica; 112 bp for Camellia sinensis var. ‘pubilimba’). Notably, in all eight Camellia species or varieties compared, ycf1 crossed the IRb/SSC boundary, with 899–1,071 bp within the IRb region and 2–26 bp within the SSC region. In addition, ycf1 was also located at the SSC/IRa boundary, with 4,517–4,717 bp within the SSC region and 967–1,105 bp within the IRa region. The ndhF gene was located within the SSC region, at a distance of 5, 56, 61, 64, 143, or 154 bp from the IRb/SSC junction. The trnN gene was located entirely within IRa and was contracted by 971–1,381 bp. The trnH gene in the LSC region was contracted by 1 or 2 bp from the IRa/LSC junction (1 bp for Camellia sinensis var. ‘Zhuyeqi’, Camellia taliensis, Camellia sinensis var. ‘pubilimba’, Camellia sinensis, Camellia oleifera, and Camellia gymnogyna; 2 bp for Camellia sinensis var. ‘assamica’).
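Boundary statements such as "899–1,071 bp within the IRb region and 2–26 bp within the SSC region" amount to counting how many bases of a gene fall on each side of a junction. The sketch below shows that calculation using the region lengths from Table 1. It assumes the sequence is linearised in the order LSC, IRb, SSC, IRa starting at position 1, and the gene coordinates in the example are invented for illustration; they are not the annotated positions of any gene.

```python
# Region lengths (bp) from Table 1. The LSC -> IRb -> SSC -> IRa order
# starting at position 1 is an assumption about how the genome is linearised.
LENGTHS = [("LSC", 86_628), ("IRb", 26_081), ("SSC", 18_282), ("IRa", 26_081)]


def region_bounds(lengths):
    """Return {region: (start, end)} using 1-based inclusive coordinates."""
    bounds, start = {}, 1
    for name, length in lengths:
        bounds[name] = (start, start + length - 1)
        start += length
    return bounds


def bp_per_region(gene_start, gene_end, bounds):
    """Count how many bases of a gene (1-based, inclusive) lie in each region."""
    overlap = {}
    for name, (start, end) in bounds.items():
        n = min(gene_end, end) - max(gene_start, start) + 1
        if n > 0:
            overlap[name] = n
    return overlap


bounds = region_bounds(LENGTHS)

# Hypothetical gene spanning the IRb/SSC junction (coordinates invented).
example = bp_per_region(111_710, 112_719, bounds)
print(bounds)
print(example)
```

A gene reported as lying some distance from a junction, rather than crossing it, would appear in a single region here, and the distance is the gap between the gene end and the region boundary.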
3.6 Ka/Ks analysis in the cp genome of Zhuyeqi
Nucleotide diversity reveals the variation in nucleic acid sequences among different species, and regions with high variation can provide potential molecular markers for population genetics. KaKs_Calculator was used to calculate the nonsynonymous substitution rate (Ka), the synonymous substitution rate (Ks), and their ratio (Ka/Ks) for the protein-coding genes in the Zhuyeqi chloroplast genome (Fig. 6). The nucleotide diversity (Pi) across the entire Zhuyeqi chloroplast genome ranged from 0 to 0.01408, with a mean value of 0.00126. The average Pi value was highest in the SSC region (0.00143), followed by the LSC region (0.00128), and lowest in the IR region (0.00106). Additionally, five highly divergent regions were detected: trnI-GAU (0.01408), rps12 (0.01323), rps19 (0.00952), rps16 (0.00852), and rpl33 (0.00616). Four of these regions (rps12, rps19, rps16, and rpl33) were in the LSC region, and one (trnI-GAU) was in the IR region.
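Pi (nucleotide diversity) is the average number of differences per site between all pairs of aligned sequences. The sketch below computes it for a whole alignment and in sliding windows. The tool, window size, and step used for the values above are not stated in this section, so the `window` and `step` arguments, the function names, and the toy alignment are all assumptions for illustration.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Mean pairwise differences per site for equal-length aligned sequences.

    Sites where either sequence of a pair has a gap or an ambiguous base
    are skipped for that pair.
    """
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def sliding_pi(alignment, window, step):
    """Return (window start, Pi) pairs along the alignment (0-based starts)."""
    length = len(alignment[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in alignment]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment: the third sequence differs from the others at one site.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]
pi_all = nucleotide_diversity(toy)
windows = sliding_pi(toy, window=5, step=5)
print(round(pi_all, 5), [(s, round(p, 5)) for s, p in windows])
```

Windows with the highest Pi correspond to the highly divergent regions; in the toy alignment only the first window contains a variable site.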
3.7 Phylogenetic analysis
To determine the evolutionary position of Zhuyeqi within the family Theaceae, the Zhuyeqi chloroplast genome sequence was aligned with the complete chloroplast genomes of 21 Theaceae species published on the NCBI website, and a phylogenetic tree was constructed using RAxML v8.2.10. As shown in Fig. 7, species from other genera of the family were placed relatively far from Zhuyeqi, indicating a more distant relationship. Zhuyeqi clustered with five Theaceae species, among which Camellia sinensis (KJ996106.1) was the closest, indicating that Zhuyeqi is most closely related to this tea variety.