3.1 General Characteristics of the cp Genomes
The chloroplast genome structures of Handeliodendron bodinieri and Eurycorymbus cavaleriei are conserved, with cp genome sizes of 155,271 bp and 158,690 bp, respectively (Fig. 1). Both genomes have the typical quadripartite structure, comprising a pair of inverted repeats (IRs) of 25,724 bp and 26,910 bp, a large single-copy (LSC) region of 85,092 bp and 86,874 bp, and a small single-copy (SSC) region of 18,731 bp and 17,996 bp in H. bodinieri and E. cavaleriei, respectively (Table 1). The total GC content of the whole genome was 37.8% in H. bodinieri and 37.9% in E. cavaleriei, but GC content was unevenly distributed across regions. In H. bodinieri, the IR regions showed the highest GC content (43.1%), followed by the LSC region (36.0%), whereas the SSC region had the lowest (31.5%). Similarly, in E. cavaleriei the IR regions had a higher GC content (42.8%) than the LSC (36.1%) and SSC (32.3%) regions. In the coding sequences (CDS), the GC content was 38.1% in H. bodinieri and 38.4% in E. cavaleriei.
Table 1
Comparison of chloroplast genome features of Handeliodendron bodinieri and Eurycorymbus cavaleriei
| Species | Location | Length (bp) | T (U) (%) | C (%) | A (%) | G (%) | G + C (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| H. bodinieri | LSC | 85,092 | 32.8 | 18.5 | 31.1 | 17.5 | 36.0 |
| H. bodinieri | SSC | 18,731 | 34.2 | 16.3 | 34.4 | 15.1 | 31.5 |
| H. bodinieri | IR | 25,724 | 28.3 | 20.9 | 28.6 | 22.3 | 43.1 |
| H. bodinieri | CDS | 78,970 | 31.4 | 19.5 | 30.5 | 18.6 | 38.1 |
| H. bodinieri | Total | 155,271 | 31.5 | 19.3 | 30.6 | 18.6 | 37.8 |
| E. cavaleriei | LSC | 86,874 | 32.7 | 18.6 | 31.3 | 17.5 | 36.1 |
| E. cavaleriei | SSC | 17,996 | 33.8 | 16.7 | 33.9 | 15.6 | 32.3 |
| E. cavaleriei | IR | 26,910 | 28.5 | 20.8 | 28.6 | 22.1 | 42.8 |
| E. cavaleriei | CDS | 79,251 | 31.3 | 19.7 | 30.3 | 18.7 | 38.4 |
| E. cavaleriei | Total | 158,690 | 31.4 | 19.3 | 30.7 | 18.6 | 37.9 |
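The region lengths in Table 1 can be cross-checked against the reported totals, because a quadripartite genome is the sum of the LSC, the SSC, and two IR copies. The following is a minimal sketch of that arithmetic, together with a length-weighted recombination of the regional GC values; the names `REGIONS`, `COPIES`, `genome_length`, and `weighted_gc` are illustrative and not part of the original analysis.

```python
# Region lengths (bp) and GC content (%) copied from Table 1.
REGIONS = {
    "H. bodinieri": {"LSC": (85092, 36.0), "SSC": (18731, 31.5), "IR": (25724, 43.1)},
    "E. cavaleriei": {"LSC": (86874, 36.1), "SSC": (17996, 32.3), "IR": (26910, 42.8)},
}

# The IR is present twice in a quadripartite chloroplast genome.
COPIES = {"LSC": 1, "SSC": 1, "IR": 2}


def genome_length(regions):
    """Total genome length: LSC + SSC + 2 x IR."""
    return sum(COPIES[name] * length for name, (length, _) in regions.items())


def weighted_gc(regions):
    """Length-weighted GC content (%) recombined from the regional values."""
    gc_bases = sum(
        COPIES[name] * length * gc / 100 for name, (length, gc) in regions.items()
    )
    return 100 * gc_bases / genome_length(regions)


for species, regions in REGIONS.items():
    print(species, genome_length(regions), round(weighted_gc(regions), 1))
```

Because the regional GC percentages are rounded to one decimal place, the recombined value can only approximate the genome-wide figure; here it agrees with the reported totals to one decimal place.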
A total of 114 unique genes were annotated in H. bodinieri and E. cavaleriei, including 77 protein-coding genes (PCGs) in H. bodinieri and 79 PCGs in E. cavaleriei, as well as 31 tRNA genes and four rRNA genes in both species (Table 2). Three genes (infA, rpl22, and rpl2) in H. bodinieri and one gene (infA) in E. cavaleriei contained premature stop codons and were therefore annotated as pseudogenes. In total, 18 genes were duplicated in the IR regions of the H. bodinieri cp genome, including seven tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG, trnV-GAC), four rRNA genes (rrn4.5, rrn5, rrn16, rrn23), and seven PCGs (ndhB, rpl2, rpl23, rps7, rps12, ycf2, ycf15). In the cp genome of E. cavaleriei, the same seven tRNA genes and four rRNA genes, together with eight PCGs (ndhB, rpl2, rpl23, rps7, rps19, rps12, ycf2, ycf15), were located in the IR regions.
Table 2
Summary of assembled gene functions of Handeliodendron bodinieri and Eurycorymbus cavaleriei chloroplast genomes
| Gene Family | Gene Names |
| --- | --- |
| Subunits of ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
| Subunits of NADH dehydrogenase | ndhA*, ndhB*(×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| Subunits of cytochrome | petA, petB*, petD*, petG, petL, petN |
| Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ |
| Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| Subunit of rubisco | rbcL |
| Subunit of acetyl-CoA carboxylase | accD |
| c-type cytochrome synthesis gene | ccsA |
| Envelope membrane protein | cemA |
| Protease | clpP** |
| Translational initiation | ψinfA |
| Maturase | matK |
| Large subunit of ribosome | rpl2*(×2), rpl14, rpl16*, rpl20, rpl22^H, rpl23(×2), rpl32, rpl33, rpl36 |
| DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
| Small subunit of ribosome | rps2^H, rps3, rps4, rps7(×2), rps8, rps11, rps12**(×2), rps14, rps15, rps16*, rps18, rps19(×E) |
| rRNA genes | rrn4.5(×2), rrn5(×2), rrn16(×2), rrn23(×2) |
| tRNA genes | trnA-UGC(×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnG-UCC, trnH-GUG, trnI-CAU(×2), trnI-GAU(×2), trnK-UUU, trnL-CAA(×2), trnL-UAA, trnL-UAG, trnM-CAU, trnN-GUU(×2), trnP-UGG, trnQ-UUG, trnR-ACG(×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC(×2), trnV-UAC, trnW-CCA, trnY-GUA |
| Unknown function | ycf1, ycf2(×2), ycf3**, ycf4, ycf15(×2) |

* Genes containing a single intron; ** genes containing two introns; (×2) genes located within the IR regions and therefore duplicated; (×E) genes present as two copies in the IR regions of E. cavaleriei; ψ indicates a pseudogene; ^H pseudogene in H. bodinieri only.
Among the annotated genes of the H. bodinieri and E. cavaleriei cp genomes, 18 contain introns: atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, rpoC1, rps16, trnV-UAC, trnL-UAA, trnK-UUU, trnI-GAU, trnG-UCC, trnA-UGC, clpP, rps12, and ycf3. The genes clpP, rps12, and ycf3 contain two introns, whereas the other 15 genes have only one (Table 3). As in most other angiosperms, rps12 is a trans-spliced gene, with the 5′ exon located in the LSC region and the 3′ end in the IR region. The longest intron in both cp genomes was found in trnK-UUU, with a length of 2,514 bp in H. bodinieri and 2,496 bp in E. cavaleriei. As in other cp genomes, the matK gene is located within the intron of trnK-UUU.
Table 3
Genes with introns in the chloroplast genomes of Handeliodendron bodinieri and Eurycorymbus cavaleriei as well as the lengths of the exons and introns.
| Gene | Location | H. bodinieri: Exon I (bp) | H. bodinieri: Intron I (bp) | H. bodinieri: Exon II (bp) | H. bodinieri: Intron II (bp) | H. bodinieri: Exon III (bp) | E. cavaleriei: Exon I (bp) | E. cavaleriei: Intron I (bp) | E. cavaleriei: Exon II (bp) | E. cavaleriei: Intron II (bp) | E. cavaleriei: Exon III (bp) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| atpF | LSC | 159 | 756 | 408 | | | 159 | 750 | 408 | | |
| clpP | LSC | 69 | 836 | 291 | 645 | 228 | 69 | 853 | 291 | 658 | 228 |
| ndhA | SSC | 552 | 1143 | 540 | | | 558 | 1089 | 540 | | |
| ndhB | IR | 777 | 682 | 756 | | | 777 | 680 | 756 | | |
| petB | LSC | 6 | 789 | 657 | | | 6 | 794 | 657 | | |
| petD | LSC | 9 | 740 | 525 | | | 9 | 677 | 525 | | |
| rpl16 | LSC | 9 | 1049 | 402 | | | 9 | 923 | 402 | | |
| rpl2 | IR | 393 | 661 | 435 | | | 390 | 664 | 435 | | |
| rpoC1 | LSC | 435 | 674 | 1620 | | | 435 | 711 | 1620 | | |
| rps12* | LSC | 114 | - | 232 | 537 | 26 | 114 | - | 232 | 533 | 26 |
| rps16 | LSC | 39 | 829 | 225 | | | 39 | 836 | 225 | | |
| trnV-UAC | LSC | 39 | 589 | 37 | | | 39 | 590 | 37 | | |
| trnL-UAA | LSC | 37 | 542 | 49 | | | 37 | 542 | 49 | | |
| trnK-UUU | LSC | 37 | 2514 | 38 | | | 37 | 2496 | 38 | | |
| trnI-GAU | IR | 42 | 953 | 35 | | | 42 | 946 | 35 | | |
| trnG-UCC | LSC | 23 | 724 | 48 | | | 23 | 721 | 48 | | |
| trnA-UGC | IR | 38 | 841 | 35 | | | 38 | 810 | 35 | | |
| ycf3 | LSC | 126 | 731 | 228 | 746 | 153 | 126 | 732 | 228 | 767 | 153 |
We further compared the basic characteristics of the Handeliodendron bodinieri and Eurycorymbus cavaleriei cp genomes with those of other genera of Sapindaceae (Table 4). Notably, the cp genomes of Hippocastanoideae were generally smaller than those of other Sapindaceae, ranging from 152,688 bp (Acer tataricum subsp. ginnala) to 157,367 bp (A. palmatum). Overall, the full length of the cp genome ranged from 152,688 bp (A. tataricum subsp. ginnala) to 163,258 bp (Koelreuteria paniculata), but the total GC content was similar among the 25 cp genomes of Sapindaceae. Among the single-copy regions, K. paniculata had the largest LSC region (90,236 bp), whereas Sapindus mukorossi had the largest SSC region (18,874 bp). The length of the IR regions varied from 25,656 bp (Aesculus chinensis) to 30,103 bp (Litchi chinensis). Interestingly, L. chinensis also had the smallest SSC region (16,568 bp) among these cp genomes. The number of rRNA genes was identical across species, whereas the numbers of tRNA genes (37–40) and PCGs (83–89) varied only slightly.
Table 4
Statistics on the basic features of the chloroplast genomes from Sapindaceae species.
| Species | Size (bp) | LSC (bp) | IR (bp) | SSC (bp) | GC content (%) | No. rRNA | No. tRNA | No. PCGs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Acer tataricum subsp. ginnala | 152,688 | 82,529 | 26,306 | 17,795 | 38.2 | 8 | 40 | 87 |
| Handeliodendron bodinieri | 155,271 | 85,092 | 25,724 | 18,731 | 37.8 | 8 | 37 | 89 |
| Aesculus chinensis | 155,528 | 85,489 | 25,656 | 18,727 | 37.9 | 8 | 37 | 83 |
| Aesculus wangii | 155,871 | 84,882 | 26,390 | 18,209 | 38.0 | 8 | 40 | 84 |
| Acer truncatum | 156,262 | 86,018 | 26,086 | 18,072 | 37.9 | 8 | 40 | 89 |
| Acer miaotaiense | 156,595 | 86,327 | 26,100 | 18,068 | 37.9 | 8 | 40 | 89 |
| Acer griseum | 156,857 | 85,227 | 26,742 | 18,146 | 37.9 | 8 | 40 | 86 |
| Acer buergerianum subsp. ningpoense | 156,911 | 85,314 | 26,752 | 18,093 | 37.9 | 8 | 40 | 89 |
| Acer davidii | 157,044 | 85,410 | 26,761 | 18,112 | 37.9 | 8 | 40 | 86 |
| Acer wilsonii | 157,067 | 85,418 | 26,760 | 18,129 | 37.9 | 8 | 39 | 88 |
| Dipteronia dyeriana | 157,071 | 85,529 | 26,730 | 18,082 | 38.0 | 8 | 40 | 87 |
| Dipteronia sinensis | 157,080 | 85,455 | 26,766 | 18,093 | 37.8 | 8 | 40 | 88 |
| Acer sino-oblongum | 157,118 | 85,558 | 26,722 | 18,119 | 37.9 | 8 | 39 | 88 |
| Acer morrisonense | 157,197 | 85,655 | 26,728 | 18,086 | 37.8 | 8 | 40 | 86 |
| Acer palmatum | 157,367 | 85,829 | 26,689 | 18,160 | 37.8 | 8 | 40 | 89 |
| Eurycorymbus cavaleriei | 158,690 | 86,874 | 26,910 | 17,996 | 37.9 | 8 | 37 | 87 |
| Dodonaea viscosa | 159,375 | 87,204 | 27,100 | 17,971 | 37.9 | 8 | 37 | 88 |
| Sapindus mukorossi | 160,481 | 85,649 | 27,979 | 18,874 | 37.7 | 8 | 39 | 88 |
| Pometia tomentosa | 160,818 | 85,666 | 28,396 | 18,360 | 37.9 | 8 | 37 | 88 |
| Dimocarpus longan | 160,833 | 85,708 | 28,428 | 18,269 | 37.8 | 8 | 37 | 87 |
| Xanthoceras sorbifolium | 161,231 | 85,299 | 28,620 | 18,692 | 37.7 | 8 | 38 | 86 |
| Nephelium lappaceum | 161,356 | 86,009 | 28,597 | 18,153 | 37.8 | 8 | 37 | 87 |
| Litchi chinensis | 162,524 | 85,750 | 30,103 | 16,568 | 37.8 | 8 | 37 | 87 |
| Koelreuteria paniculata | 163,258 | 90,236 | 27,377 | 18,268 | 37.3 | 8 | 37 | 85 |
3.1.1 Chloroplast Repeated Sequences and SSRs
In total, 32 and 39 repeat sequences were detected in the Handeliodendron bodinieri and Eurycorymbus cavaleriei cp genomes, respectively. The H. bodinieri cp genome contained 14 forward (F), 1 reverse (R), 16 palindromic (P), and 1 complement (C) repeats (Fig. 2, A), whereas the E. cavaleriei cp genome contained 16 F, 2 R, 20 P, and 1 C repeats. Repeat length ranged from 30 to 51 bp in H. bodinieri and from 30 to 72 bp in E. cavaleriei (Fig. 2, B). Overall, P and F repeats were the most abundant types, and most of them were 30–40 bp long.
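The four repeat categories can be illustrated with a short sketch that classifies the relationship between two equal-length repeat units. It assumes the commonly used definitions (forward: identical; reverse: reversed; complement: base-wise complement; palindromic: reverse complement) and exact matches only; `classify_repeat` is a hypothetical helper, not the tool used in this study.

```python
# Translation table for the base-wise complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(unit_a, unit_b):
    """Return 'F', 'R', 'C' or 'P' for an exact repeat pair, else None."""
    a, b = unit_a.upper(), unit_b.upper()
    if a == b:
        return "F"  # forward (direct) repeat
    if a[::-1] == b:
        return "R"  # reverse repeat
    if a.translate(COMPLEMENT) == b:
        return "C"  # complement repeat
    if a.translate(COMPLEMENT)[::-1] == b:
        return "P"  # palindromic (reverse-complement) repeat
    return None


print(classify_repeat("AACGT", "ACGTT"))  # reverse complement of AACGT
```

Repeat-finding programs usually also tolerate a small number of mismatches and enforce a minimum length (30 bp is the shortest repeat reported above); both are omitted here for brevity.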
We also surveyed the simple sequence repeats (SSRs) in the H. bodinieri and E. cavaleriei cp genomes. The total number of SSRs was 98 in H. bodinieri and 60 in E. cavaleriei (Table S2; Fig. 2, C). In H. bodinieri, we detected five categories of SSRs: mononucleotide, dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide repeats, numbering 75, 6, 7, 9, and 1, respectively (Fig. 2, C); no hexanucleotide repeats were detected. Mononucleotide repeats were thus the most abundant and showed a strong base preference, consisting mainly of A or T. Notably, five SSRs were identified in the ycf1 gene of the H. bodinieri cp genome, all of them mononucleotide repeats (four poly(T) and one poly(A)). The E. cavaleriei cp genome contained four types of SSRs: mononucleotide, dinucleotide, trinucleotide, and tetranucleotide repeats, numbering 38, 5, 8, and 8, respectively (Table S3). Among these, A or T mononucleotides were again dominant. We also observed four SSRs in ycf1, all consisting of mononucleotide repeats. In both cp genomes, most SSRs were located in intergenic spacer (IGS) regions (Fig. 2, D and E).
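As an illustration of how SSRs of different motif lengths can be counted, the sketch below scans a sequence for perfect tandem repeats. This section does not state the detection thresholds, so the minimum repeat counts in `MIN_REPEATS` (10, 5, 4, 3, 3, and 3 for mono- to hexanucleotide motifs) are assumptions chosen for illustration, and `find_ssrs` and `is_primitive` are hypothetical helper names.

```python
# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs such as 'AA' or 'ATAT',
            # which are already covered by a shorter motif length.
            if not set(motif) <= set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep:
                hits.append((i, motif, reps))
                i += reps * k  # continue after the repeat just recorded
            else:
                i += 1
    return sorted(hits)


example = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"
print(find_ssrs(example))
```

Counting the returned tuples by motif length gives per-category totals of the kind reported above; intersecting the start positions with gene coordinates would separate genic from intergenic SSRs.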
3.1.2 Relative Synonymous Codon Usage Analysis
The total number of codons was 26,028 in Handeliodendron bodinieri and 26,445 in Eurycorymbus cavaleriei. Tyrosine (Tyr), glutamine (Gln), histidine (His), methionine (Met), tryptophan (Trp), and cysteine (Cys) were each encoded by fewer than 1,000 codons in both H. bodinieri (Table S4) and E. cavaleriei (Table S5). Leucine (Leu) was the most frequently encoded amino acid, accounting for 10.5% and 10.6% of all amino acids in the H. bodinieri and E. cavaleriei cp genomes, respectively. Excluding the stop codons, Cys had the lowest number of codons in both cp genomes. Codon usage frequency and relative synonymous codon usage (RSCU) are summarized in Fig. 3. In the H. bodinieri cp genome, 30 codons had RSCU values greater than 1.00, and all of them ended with A or U except UUG. In E. cavaleriei, 31 codons had RSCU values greater than 1.00; 29 of them ended with A or U, whereas two (UCC and UUG) ended with C and G, respectively. Moreover, three codons (AUG, UGG, and UCC) had RSCU values of 1.00 in the H. bodinieri cp genome, compared with only two (AUG and UGG) in the E. cavaleriei cp genome.
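For reference, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it under the standard genetic code; the function name `rscu` and the toy input are illustrative, and the software actually used for Fig. 3 is not specified in this section.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (DNA alphabet); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for every codon of each amino acid that occurs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        # Read in-frame codons; an incomplete trailing codon is ignored.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    values = {}
    for aa in set(AMINO):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the input
        for codon in family:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


toy = rscu(["ATGTTATTATTGTGGTAA"])
print(toy["TTA"], toy["TTG"], toy["ATG"], toy["TGG"])
```

Met and Trp are each encoded by a single codon, so AUG and UGG have an RSCU of exactly 1.00 by construction, whereas a value of 1.00 for a codon in a multi-codon family (such as UCC) reflects the absence of bias for that codon.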
3.2 Comparative Chloroplast Genomic Analysis
To identify sequence divergence in the Handeliodendron bodinieri and Eurycorymbus cavaleriei cp genomes, genomic rearrangements were assessed with Litchi chinensis as the reference (Fig. 4). In total, 16 cp genomes from 13 genera of Sapindaceae were analyzed. The comparison showed that all cp genomes were highly conserved: no inversions or translocations of genes or plastid segments were detected.
In addition, we performed a multiple sequence alignment of complete chloroplast genome sequences from different families of Sapindales, with E. cavaleriei as the reference (Fig. 5). The comparison revealed that coding regions were more conserved than non-coding regions, and that the SSC and LSC regions were more variable than the IR regions in all cp genomes. The four rRNA genes, which lie in the IR regions, showed almost no variation and were highly conserved. Five genes (matK, accD, ycf1, ndhF, and rpl22) were the most divergent among these cp genomes. As shown in Fig. 5, substantial variation was detected in the intergenic regions of the LSC and SSC, including trnH-psbA, trnK-rps16, rps16-trnQ, psbM-trnY, psbZ-trnG, trnL-trnF, trnF-ndhJ, and rpl32-trnL.
3.2.1 Divergence Hotspot Identification
Thirteen Sapindaceae cp genome sequences (Acer davidii, A. miaotaiense, Aesculus chinensis, Ae. wangii, Dimocarpus longan, Dipteronia dyeriana, D. sinensis, Dodonaea viscosa, Litchi chinensis, Nephelium lappaceum, Pometia tomentosa, Sapindus mukorossi, and Xanthoceras sorbifolium) were aligned together with those of Handeliodendron bodinieri and Eurycorymbus cavaleriei to calculate nucleotide diversity (Pi). Based on this analysis, we identified three highly divergent regions among these complete cp genomes: ndhC-trnV-UAC, rpl32-trnL-UAG-ccsA, and ycf1 (Fig. 6). The gene ycf1, located in the SSC region, was the most divergent region, with the highest Pi value (0.127), whereas the ndhC-trnV-UAC and rpl32-trnL-UAG-ccsA intergenic spacers were located in LSC regions. These highly divergent regions could serve as potential molecular markers for phylogenetic reconstruction of the family Sapindaceae. Overall, sequence divergence was concentrated in the LSC and SSC regions, whereas the IR regions were less divergent, consistent with the mVISTA results (Fig. 5).
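Nucleotide diversity (Pi) is the average number of pairwise differences per site among the aligned sequences. The following is a minimal sliding-window sketch of that calculation; the window and step sizes are not given in this section, so `window` and `step` are left as parameters, and restricting the comparison to gap-free columns is a simplifying assumption that may differ from the software used for Fig. 6.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site over gap-free A/C/G/T columns.

    `aligned` is a list of equal-length, already aligned sequences.
    """
    n = len(aligned)
    length = len(aligned[0])
    cols = [i for i in range(length) if all(s[i] in "ACGT" for s in aligned)]
    if n < 2 or not cols:
        return 0.0
    diffs = sum(
        sum(a[i] != b[i] for i in cols) for a, b in combinations(aligned, 2)
    )
    pairs = n * (n - 1) // 2
    return diffs / (pairs * len(cols))


def sliding_pi(aligned, window, step):
    """Return (window_start, Pi) for each full window along the alignment."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(nucleotide_diversity(toy_alignment))
print(sliding_pi(toy_alignment, window=4, step=4))
```

Plotting the per-window values against window position gives a profile in which peaks mark candidate divergence hotspots such as those listed above.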
3.2.2 Expansion and Contraction of IRs
A comparison of the boundaries between the single-copy (SC) and inverted repeat (IR) regions among six families of the order Sapindales is presented in Fig. 7. Nine cp genomes were included: Koelreuteria paniculata (Sapindaceae), Litchi chinensis (Sapindaceae), Leitneria floridana (Simaroubaceae), Toona ciliata (Meliaceae), Anacardium occidentale (Anacardiaceae), Boswellia sacra (Burseraceae), and Citrus limon (Rutaceae), as well as our two newly sequenced species. The SSC/IRa boundary was located in the coding region of ycf1 in all cp genomes, with 2,516 bp to 4,602 bp of the gene lying in the SSC region; the largest SSC fragment of ycf1 was found in H. bodinieri. In five cp genomes (H. bodinieri, L. chinensis, T. ciliata, A. occidentale, and C. limon), ndhF spanned the IRb/SSC boundary, with 7–36 bp in the IRb region. In the other four cp genomes (E. cavaleriei, K. paniculata, L. floridana, and B. sacra), however, ndhF was wholly located in the IRb region and was separated from the IRb/SSC border by a spacer of 0 to 84 bp. The copy of ycf1 at the border between IRb and SSC is treated as a pseudogene because it is an incomplete duplication of the normal copy. Similarly, the rpl22 and rps19 copies in the IRa region near the IRa/LSC boundary were annotated as pseudogenes in E. cavaleriei, K. paniculata, T. ciliata, B. sacra, L. floridana, and A. occidentale. The LSC/IRb boundary regions varied considerably among the cp genomes. The LSC/IRb boundary was crossed by rpl22 in five cp genomes, with the rpl22 fragment in the LSC region ranging from 185 bp (L. floridana) to 449 bp (E. cavaleriei), whereas rpl22 was entirely located in the IRb region of the C. limon cp genome. In the H. bodinieri cp genome, rps19 and rpl2 were entirely located within the LSC and IRb regions, respectively, near the LSC/IRb boundary. In the A. occidentale cp genome, rps19 spanned the LSC/IRb boundary and extended 178 bp into the LSC region. In L. chinensis, the rps16 and rps3 genes nearest the LSC/IRb boundary were located in the LSC and IRb regions, respectively. At the IRa/LSC boundary, trnH was located entirely in the LSC region of all cp genomes, 0–38 bp from the IRa/LSC boundary. In summary, the IR regions showed a significant contraction in H. bodinieri and a notable expansion in E. cavaleriei.