3.1 The Structure of the Chloroplast Genomes of Four Camellia Species
The complete cp genomes of C. semiserrata cv. ‘hongyu 1’ (GenBank accession no. OP953553), C. vietnamensis cv. ‘hongguo’ (GenBank accession no. OP953555), C. osmantha cv. ‘yidan’ (GenBank accession no. OP936137), and C. oleifera cv. ‘cenruan 3’ (GenBank accession no. OP953554) were sequenced using Illumina sequencing technology (Fig. 1). Each cp genome is a circular DNA molecule of 156,807 to 157,005 bp with the typical quadripartite structure, in which a pair of inverted repeats (IRa and IRb) separates a large single-copy (LSC) region and a small single-copy (SSC) region (Table 2).
The C. semiserrata cv. ‘hongyu 1’, C. osmantha cv. ‘yidan’, and C. oleifera cv. ‘cenruan 3’ cp genomes each contain 134 genes: 81 protein-coding genes, 39 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 6 genes of unknown function. The C. vietnamensis cv. ‘hongguo’ cp genome contains 136 genes, including two copies of rpl2: 83 protein-coding genes, 39 tRNA genes, 8 rRNA genes, and 6 genes of unknown function. By contrast, rpl2 was not found in the other three cp genomes.
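As a quick consistency check, the category counts reported above can be tallied per genome. The sketch below only sums the reported numbers; the dictionary layout and the function name total_genes are illustrative and are not part of the original analysis pipeline.

```python
# Gene counts per functional category, as reported in Section 3.1.
# The data layout and helper name are illustrative only.
GENE_COUNTS = {
    "C. semiserrata 'hongyu 1'": {"protein-coding": 81, "tRNA": 39, "rRNA": 8, "unknown function": 6},
    "C. osmantha 'yidan'": {"protein-coding": 81, "tRNA": 39, "rRNA": 8, "unknown function": 6},
    "C. oleifera 'cenruan 3'": {"protein-coding": 81, "tRNA": 39, "rRNA": 8, "unknown function": 6},
    # The two rpl2 copies are counted among the protein-coding genes here.
    "C. vietnamensis 'hongguo'": {"protein-coding": 83, "tRNA": 39, "rRNA": 8, "unknown function": 6},
}


def total_genes(categories: dict[str, int]) -> int:
    """Sum gene counts over all functional categories of one genome."""
    return sum(categories.values())


if __name__ == "__main__":
    for genome, categories in GENE_COUNTS.items():
        print(f"{genome}: {total_genes(categories)} genes")
```

The two-gene difference between C. vietnamensis cv. ‘hongguo’ and the other three genomes matches the two copies of rpl2.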
Among the 134 unique genes in C. semiserrata cv. ‘hongyu 1’, C. osmantha cv. ‘yidan’, and C. oleifera cv. ‘cenruan 3’, 15 contain one intron (petB, petD, atpF, ndhA, ndhB, rps12, rps16, rpl16, trnG-UCC, trnK-UUU, trnL-UAA, trnA-UGC, trnI-GAU, trnV-UAC, and rpoC1), and 2 contain two introns (clpP and ycf3). Previous studies have reported that ycf3 is required for the stable accumulation of the photosystem I complex (Boudreau et al. 1997; Naver et al. 2001; Guo et al. 2018). Among the 135 unique genes in C. vietnamensis cv. ‘hongguo’, 16 contain one intron (the 15 genes listed above plus rpl2), and 2 contain two introns (clpP and ycf3). The gene maps of all four cp genomes are shown in Fig. 1.
3.2 Expansion and Contraction of the Border Regions
The border regions and neighboring genes of the four Camellia cp genomes were compared to assess the expansion and contraction of the region junctions (Fig. 2). Cp genome structure, including gene type, gene order, and gene number, was conserved in C. osmantha cv. ‘yidan’ and C. oleifera cv. ‘cenruan 3’, whereas the C. vietnamensis cv. ‘hongguo’ cp genome showed visible differences at the IRb/SSC/IRa borders. The IRb region extended into ycf1, so that 1,042–1,068 bp of ycf1 lay within IRb (1,068 bp in C. osmantha cv. ‘yidan’ and C. oleifera cv. ‘cenruan 3’, and 1,042 bp in C. semiserrata cv. ‘hongyu 1’).
The IR/SSC borders differed markedly among the four cp genomes. The ndhF gene is located near the IRa/SSC or IRb/SSC junction, separated from it by 5–65 bp (5, 56, and 65 bp in C. semiserrata cv. ‘hongyu 1’, C. osmantha cv. ‘yidan’, and C. oleifera cv. ‘cenruan 3’, respectively). In C. vietnamensis cv. ‘hongguo’, the positions of ndhF and ycf1 at the IRb/SSC/IRa boundaries are reversed relative to the other three species, and ndhF in the SSC region lies 56 bp from the IR/SSC junction. By contrast, the IRa/LSC and IRb/LSC boundaries were relatively conserved across the four cp genomes. In C. vietnamensis cv. ‘hongguo’, rpl2 marks a further boundary change: the IRs extend over the entire gene, so it is completely duplicated within the IRs.
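The boundary measurements above reduce to interval arithmetic on genome coordinates: the length of ycf1 lying inside IRb is the overlap between the gene and the IRb interval, and the ndhF distance is the number of bases separating the gene from the IR/SSC junction. The sketch below assumes 1-based, inclusive coordinates and represents a junction by the last base of the region to its left; the Interval class, the helper names, and the toy coordinates are hypothetical and are not taken from the four genomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A 1-based, inclusive genomic interval (start <= end)."""
    start: int
    end: int


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def gap_to_junction_bp(gene: Interval, junction: int) -> int:
    """Bases separating a gene from a junction.

    `junction` is the last base of the left-hand region, so the boundary lies
    between positions `junction` and `junction + 1`. Returns 0 if the gene
    spans the boundary.
    """
    if gene.end <= junction:       # gene lies entirely left of the boundary
        return junction - gene.end
    if gene.start > junction:      # gene lies entirely right of the boundary
        return gene.start - (junction + 1)
    return 0                       # gene crosses the boundary


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    ir = Interval(1, 2000)
    gene_in_ir = Interval(1933, 3000)
    print(overlap_bp(ir, gene_in_ir))                      # 68 (bp of the gene inside the IR)
    print(gap_to_junction_bp(Interval(2006, 4000), 2000))  # 5 (bp gap to the junction)
```

Stating the junction convention explicitly matters, because shifting it by one base changes every reported gap by 1 bp.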
3.3 Long-Repeat and Simple Sequence Repeat (SSR) Analysis
We detected palindromic, forward, reverse, and complementary repeats in the four cp genomes. Each Camellia cp genome contained 50 long repeats: 23–24 palindromic, 16–17 forward, 7–9 reverse, and 2–4 complementary repeats (Fig. S1(A)). Palindromic repeats ranged from 19 to 79 bp in length, forward repeats from 19 to 42 bp, reverse repeats from 19 to 23 bp, and complementary repeats from 19 to 20 bp (Fig. S1(B–E)).
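The four repeat classes differ only in how the second copy relates to the first. The sketch below assumes the conventions used by common long-repeat finders such as REPuter: a forward repeat is an identical copy, a reverse repeat is the first copy read backwards, a complementary repeat is its base-wise complement, and a palindromic repeat is its reverse complement. It only classifies a given pair of equal-length sequences and does not search a genome; the function names are illustrative.

```python
# Base-wise complement for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-wise complement (no reversal)."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return complement(seq)[::-1]


def classify_repeat(first: str, second: str) -> list[str]:
    """List every repeat class linking `second` to `first`.

    A pair can satisfy more than one class; for example, the
    self-complementary sequence ACGT is both a forward and a
    palindromic repeat of itself.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return []
    classes = []
    if second == first:
        classes.append("forward")
    if second == first[::-1]:
        classes.append("reverse")
    if second == complement(first):
        classes.append("complementary")
    if second == reverse_complement(first):
        classes.append("palindromic")
    return classes


if __name__ == "__main__":
    copy1 = "AACGT"
    for copy2 in ["AACGT", "TGCAA", "TTGCA", "ACGTT"]:
        print(copy2, classify_repeat(copy1, copy2))
```

Real repeat searches also tolerate a small number of mismatches and apply a minimum length (19 bp is the shortest repeat reported here), which this exact-match sketch omits.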
We identified 50, 51, 51, and 53 SSRs in the C. semiserrata cv. ‘hongyu 1’, C. osmantha cv. ‘yidan’, C. vietnamensis cv. ‘hongguo’, and C. oleifera cv. ‘cenruan 3’ cp genomes, respectively (Fig. 3). All SSRs in the four cp genomes were mononucleotide repeats 10–17 bp in length, consisting of adenine (A) or thymine (T) runs; no guanine (G) or cytosine (C) repeats were detected.
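Because every SSR reported here is a mononucleotide run of at least 10 bp, detection reduces to finding maximal single-base runs above a length threshold. The sketch below does this with a regular-expression backreference. The 10 bp minimum mirrors the shortest SSR reported above but is an assumed parameter, not necessarily the threshold of the tool used in this study, and the function names and toy sequence are hypothetical. A full SSR search would also consider di- to hexanucleotide motifs, which this sketch ignores.

```python
import re
from collections import Counter


def find_mono_ssrs(seq: str, min_len: int = 10) -> list[tuple[int, str, int]]:
    """Find maximal mononucleotide runs of at least `min_len` bases.

    Returns (0-based start, repeated base, run length) tuples.
    """
    # One base followed by at least (min_len - 1) copies of itself;
    # the greedy quantifier makes each match a maximal run.
    pattern = re.compile(rf"([ACGT])\1{{{min_len - 1},}}")
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq.upper())]


def count_by_base(ssrs: list[tuple[int, str, int]]) -> Counter:
    """Tally SSRs by their repeated base (A, C, G, or T)."""
    return Counter(base for _, base, _ in ssrs)


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GCG" + "T" * 10 + "C" * 9 + "G"  # toy sequence
    ssrs = find_mono_ssrs(toy)
    print(ssrs)  # [(2, 'A', 12), (17, 'T', 10)]
    print(count_by_base(ssrs))
```

The 9 bp C run in the toy sequence falls below the threshold and is therefore not reported.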
3.4 Phylogenetic Analysis
We constructed a maximum likelihood phylogenetic tree from the cp genome nucleotide sequences of 112 Camellia species and other oilseed crops, with Coffea arabica (NC_008535.1) as the outgroup (Fig. 4). C. osmantha cv. ‘yidan’ is most closely related to C. vietnamensis cv. ‘hongguo’ and C. oleifera cv. ‘cenruan 3’, which belong to section Oleifera Chang.