Chloroplast genome structure of Ilex species
All the seven newly sequenced chloroplast genomes obtained through this study possess a typical quadripartite structure of a vascular plant, which consisted of two single-copy regions (LSC and SSC) that are separated by a pair of inverted repeats (IRa and IRb) (Figure 1). By taking account of the 34 chloroplast genomes downloaded from the GenBank, the length of the chloroplast genome of Ilex ranged from 157,119 bp (I. graciliflora) to 158,020 bp (I. kwangtungensis). The length of the LSC region was ranged from 86,506 bp (I. graciliflora) to 87,414 bp (I. crenata), the SSC regions was ranged from 18,228 bp (I. yunnanensis) to 18,447 bp (I. lohfauensis), and the IR regions ranged from 26,005 bp (I. vomitoria) to 26,121 bp (I. rotunda) (Table 1). The GC content was from 37.60–37.69% (Table 1). All the chloroplast genomes obtained in this study contained 116 genes, including 82 protein-coding, 31 tRNA, and four rRNA genes, except for I. dasyphylla, which had 83 protein-coding genes (Table 2 and 3). All chloroplast genomes had the same gene order and gene arrangement.
Table 1
Summary of complete chloroplast genomes of Ilex species included in the present study. PCG indicates protein-coding gene.
Taxon | Accession number | Gene number | Length (bp) | GC(%) | AT(%) |
PCG | tRNA | rRNA | Full | Plastome | LSC | IRA/IRB | SSC |
Ilex asprella | NC_045274 | 94 | 37 | 8 | 139 | 157,856 | 87,265 | 26,075 | 18,441 | 37.62 | 62.38 |
I. asprella var. tapuensis | MT767004 | 87 | 37 | 8 | 132 | 157,671 | 87,161 | 26,065 | 18,380 | 37.65 | 62.35 |
I. championii | MT764248 | 87 | 37 | 8 | 132 | 157,468 | 86,878 | 26,074 | 18,442 | 37.64 | 62.36 |
I. chapaensis | MT764251 | 87 | 37 | 8 | 132 | 157,665 | 87,155 | 26,065 | 18,380 | 37.65 | 62.35 |
I. cinerea | MT764247 | 87 | 37 | 8 | 132 | 157,215 | 86,601 | 26,094 | 18,426 | 37.69 | 62.31 |
I. cornuta | MT764252 | 87 | 37 | 8 | 132 | 157,216 | 86,607 | 26,091 | 18,427 | 37.69 | 62.31 |
I. crenata | MW528027 | 88 | 38 | 8 | 134 | 157,988 | 87,414 | 26,076 | 18,422 | 37.65 | 62.35 |
I. dabieshanensis | MT435529 | 90 | 37 | 8 | 135 | 157,218 | 86,723 | 26,034 | 18,427 | 37.69 | 62.31 |
I. dasyphylla | This study | 92 | 40 | 8 | 140 | 158,009 | 87,388 | 26,105 | 18,411 | 37.62 | 62.38 |
I. dasyphylla | MT764253 | 87 | 37 | 8 | 132 | 158,009 | 87,388 | 26,105 | 18,411 | 37.62 | 62.38 |
I. delavayi | KX426470 | 95 | 40 | 8 | 143 | 157,671 | 87,077 | 26,078 | 18,438 | 37.65 | 62.35 |
I. dumosa | KP016927 | 86 | 37 | 8 | 131 | 157,732 | 87,033 | 26,087 | 18,415 | 37.62 | 62.27 |
I. ficoidea | MT764243 | 87 | 37 | 8 | 132 | 157,536 | 86,922 | 26,094 | 18,426 | 37.64 | 62.36 |
I. fukienensis | This study | 91 | 40 | 8 | 139 | 157,474 | 86,886 | 26,074 | 18,440 | 37.64 | 62.36 |
I. graciliflora | MT764249 | 87 | 37 | 8 | 132 | 157,119 | 86,506 | 26,093 | 18,427 | 37.61 | 62.39 |
I. hanceana | MT764246 | 87 | 37 | 8 | 132 | 157,478 | 86,889 | 26,074 | 18,441 | 37.63 | 62.37 |
I. integra | NC_044417 | 86 | 37 | 8 | 131 | 157,548 | 86,936 | 26,093 | 18,426 | 37.64 | 62.36 |
I. intermedia | MT471320 | 89 | 37 | 8 | 134 | 157,577 | 87,083 | 26,034 | 18,426 | 37.63 | 62.37 |
I. kwangtungensis | MT764241 | 87 | 37 | 8 | 132 | 158,020 | 87,400 | 26,104 | 18,412 | 37.62 | 62.38 |
I. lancilimba | MT767005 | 87 | 37 | 8 | 132 | 157,998 | 87,382 | 26,105 | 18,406 | 37.62 | 62.38 |
I. latifolia | NC_047291 | 95 | 40 | 8 | 143 | 157,601 | 87,020 | 26,077 | 18,427 | 37.63 | 62.37 |
I. lohfauensis | This study | 91 | 40 | 8 | 139 | 157,464 | 86,875 | 26,071 | 18,447 | 37.64 | 62.36 |
I. lohfauensis | MT764240 | 87 | 37 | 8 | 132 | 157,470 | 86,873 | 26,078 | 18,441 | 37.64 | 62.36 |
I. memecylifolia | MT764250 | 87 | 37 | 8 | 132 | 157,842 | 87,249 | 26,076 | 18,441 | 37.62 | 62.38 |
I. micrococca | MN830251 | 89 | 37 | 8 | 134 | 157,782 | 87,200 | 26,074 | 18,434 | 37.64 | 62.36 |
I. paraguariensis | NC_031207 | 86 | 37 | 8 | 131 | 157,614 | 87,154 | 26,076 | 18,308 | 37.63 | 62.35 |
I. polyneura | KX426468 | 95 | 40 | 8 | 143 | 157,621 | 87,140 | 26,061, 25,980 | 18,434 | 37.60 | 62.40 |
I. pubescens | KX426467 | 95 | 40 | 8 | 143 | 157,741 | 87,143 | 26,081 | 18,436 | 37.61 | 62.39 |
I. purpurea | MT471318 | 89 | 37 | 8 | 134 | 157,885 | 87,289 | 26,104 | 18,388 | 37.62 | 62.38 |
I. rotunda | MW292559 | 88 | 37 | 8 | 133 | 157,743 | 87,069 | 26,121 | 18,432 | 37.62 | 62.38 |
I. sp. | KX426469 | 95 | 40 | 8 | 143 | 157,611 | 87,137 | 26,020 | 18,434 | 37.62 | 62.38 |
I. suaveolens | MN830249 | 89 | 37 | 8 | 134 | 157,857 | 87,255 | 26,102 | 18,398 | 37.65 | 62.35 |
I. szechwanensis | KX426466 | 95 | 40 | 8 | 143 | 157,822 | 87,281 | 26,053 | 18,435 | 37.65 | 62.35 |
I. triflora | MT764242 | 87 | 36 | 8 | 131 | 157,706 | 87,183 | 26,065 | 18,393 | 37.67 | 62.33 |
I. venusta | This study | 91 | 40 | 8 | 139 | 157,860 | 87,290 | 26,079 | 18,412 | 37.66 | 62.34 |
I. viridis | This study | 91 | 40 | 8 | 139 | 157,673 | 87,150 | 26,065 | 18,393 | 37.68 | 62.32 |
I. viridis | MN830250 | 89 | 37 | 8 | 134 | 157,701 | 87,177 | 26,065 | 18,394 | 37.67 | 62.33 |
I. vomitoria | MT471319 | 90 | 36 | 8 | 134 | 157,328 | 86,920 | 26,005 | 18,398 | 37.66 | 62.34 |
I. wilsonii | KX426471 | 95 | 40 | 8 | 143 | 157,918 | 87,341 | 26,073 | 18,431 | 37.62 | 62.38 |
I. yunnanensis | This study | 91 | 40 | 8 | 139 | 157,833 | 87,389 | 26,108 | 18,228 | 37.65 | 62.35 |
I. zhejiangensis | This study | 91 | 40 | 8 | 139 | 157,182 | 86,575 | 26,092 | 18,423 | 37.69 | 62.31 |
Table 2
List of annotated genes in the chloroplast genomes of the Ilex species.
Function of Genes | Group of Genes | Gene Name |
Protein synthesis and DNA-replication | Transfer RNAs | trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC*, trnG-UCC, trnH-GUG, trnK-UUU*, trnL-UAA*, trnM-CAU, trnQ-UUG, trnP-GGG, trnP-UGG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU (× 2), trnT-UGU, trnV-UAC*, trnW-CCA, trnY-GUA, trnA-UGC* (× 2), trnI-CAU (× 2), trnI-GAU* (× 2), trnL-CAA (× 2), trnL-UAG, trnN-GUU (× 2), trnR-ACG (× 2), trnV-GAC (× 2), trnM-CAU |
Ribosomal RNAs | rrn4.5 (× 2), rrn5 (× 2), rrn16 (× 2), rrn23 (× 2) |
Ribosomal protein large subunit | rpl33, rpl20, rpl36, rpl14, rpl16, rpl22, rpl32, rpl2* (× 2), rpl23 (× 2) |
Ribosomal protein small subunit | rps2, rps14, rps4, rps18, rps11, rps8, rps3, rps19, rps12* (× 2), rps7 (× 2) |
Subunits of RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
Photosynthesis | photosystem I | psaA, psaB, psaC, psaI, psaJ |
Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbG, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, lhbA |
ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
Large subunit Rubisco | rbcL |
Cythochrome b/f complex | petA, petB*, petD, petG, petL, petN |
NADH-dehydrogenase | ndhA*, ndhB* (× 2), ndhC, ndhD, ndhE, ndhG, ndhH, ndhI, ndhJ, ndhK |
Other genes | Translation initiation factor | infA |
Cytochrome c biogenesis | ccsA |
ATP-dependent protease | clpP** |
Maturase | matK |
Inner membrane protein | cemA |
Acetyl-CoA carboxylase | accD |
Genes of unknown function | Conserved hypothetical gene | orf42 (× 2), orf56 (× 2), orf188, ycf3**, ycf4, ycf1, ycf2 (× 2), ycf15 (× 2), ycf68 (× 2) |
Note: (× 2) indicates the number of repeat units is 2; *Gene contains a single intron; **Gene contains two introns |
Table 3
Genes with introns in the chloroplast genome of Ilex species.
Gene | Location | Exon I(bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
rpl2 | IRa+IRb | 393 | 661 | 435 | | |
rps12 | LSC+IRs | 114 | 543 | 232 | | 26 |
clpP | LSC | 69 | 819 | 291 | 602 | 78 |
atpF | LSC | 159 | 681 | 408 | | |
rpoC1 | LSC | 456 | 756 | 1,635 | | |
ndhA | SSC | 552 | 1,140 | 540 | | |
ndhB | IRA | 777 | 679 | 756 | | |
petB | LSC | 6 | 745 | 657 | | |
trnA-UGC | IRa+IRb | 38 | 807 | 35 | | |
trnI-GAU | IRa+IRb | 42 | 934 | 35 | | |
trnL-UAA | IRa+IRb | 37 | 490 | 50 | | |
trnV-UAC | IRa+IRb | 39 | 579 | 37 | | |
trnG-GCC | LSC | 23 | 703 | 48 | | |
trnK-UUU | LSC | 37 | 2,562 | 35 | | |
ycf3 | LSC | 126 | 727 | 228 | 749 | 153 |
Comparative analysis of genomic divergence and genome rearrangement
The diversity of nucleotide variability (Pi) for the 41 chloroplast genomes was between 0.000 and 0.01266, while the average Pi value was 0.00286. Based on the cutoff value of Pi ≥0.009, eight highly variable regions (807 bp+trnRUCU +384 bp, 579 bp+psaC+382 bp, ycf1 (3378 bp–4798 bp), 136 bp+trnTGGU+801 bp, rbcL (335 bp–1134 bp), ndhC-trnVUAC, 1449 bp+trnQUUG +24 bp and petN-psbM) were identified, in which six of them (rbcL, trnQ, trnR, trnT, ndhC-trnV, and petN-psbM) were located at the LSC region, while two of them (psaC and ycf1) were from the SSC region (Figure 2, Additional files 1: Table S1). The Pi value of the eight hypervariable loci ranged from 0.00754 (807 bp+trnRUCU +384 bp) to 0.00955 (petN-psbM) (Table 4). At least four distinct gaps were observed in the chloroplast genome alignment, in which all of them were located in the LSC region (Figure 3). The gaps were located at the intergenic spacer regions, including cemA-ycf4, petA-psbJ, rpoB-trnC, and trnL-trnT. Species that have gap at the rpoB-trnC region were I. championii, I. fukienensis, I. hanceana, and I. lohfauensis; while three species, I. polyneura, I. pubescens, and I. rotunda, had gap at the petA-psbJ region. Species that contained gaps at the cemA-ycf4 region also contained gaps at the trnL-trnT region, which include I. cinerea, I. cornuta, I. dabieshanensis, I. ficoidea, I. graciliflora, I. intermedia, I. latifolia, I. zhejiangensis, and Ilex sp. However, two species, I. delavayi, and I. integra only had one of these gaps, which was the gap located at the cemA-ycf4 region. Upon manual checking, these variations were in forms of indels, ranging from about 210 bp (petA-psbJ) to 379 bp (rpoB-trnC) in length. Genome synteny of the 41 chloroplast genomes revealed no presence of large gene rearrangement event (Figure 4). In addition, a total of 2,947 polymorphic, 1,630 singleton variable, and 1,317 parsimony-informative sites were detected in the 41 chloroplast genome sequences.
Table 4
Variable site analyses in the chloroplast genomes of Ilex species.
Region | Total number of sites | Polymorphic sites | Singleton variable sites | Parsimony informative sites | Nucleotide diversity |
LSC | 88,362 | 2182 | 1200 | 982 | 0.00384 |
IRa | 26162 | 94 | 57 | 37 | 0.00055 |
SSC | 18460 | 582 | 319 | 263 | 0.00498 |
IRb | 26167 | 89 | 54 | 35 | 0.00050 |
Plastome | 159151 | 2947 | 1630 | 1317 | 0.00286 |
Expansion and contraction of the IR regions
Comparative sequence analysis of the Ilex species showed that chloroplast genome structure and the number and sequence of genes were highly conserved. However, some structure and size variations at the IR boundaries of those chloroplast genomes in Ilex were detected. The length of IRs among all Ilex species included for analysis was relatively consistent, in which the IRs of I. vomitoria were the shortest (26,005 bp), while those of I. rotunda were the longest (26,121 bp). For a total of 41 genomes analyzed in this study, the LSC/IRa junction of 22 species were located in the rps19 gene, with 4 to 5 bp spanning into the IRa region, indicating an expansion event of the IR in these species (Figure 5). The IRa/SSC junctions of majority species were located adjacent to genes ycf1 and ndhF. Overlap with 22 to 61 bp between the ndhF and ycf1 genes was detected in 26 Ilex species. However, in I. dasyphylla, I. fukienensis, I. lohfauensis, I. venusta, I. viridis, I. yunnanensis, and I. zhejiangensis, ndhF and ycf1 genes were absent from the IRa/SSC junction. In all Ilex chloroplast genomes, the SSC/IRb junction is located in the ycf1 gene, and the extension in sequence length into the IRb region ranged from 1,047 bp (I. lohfauensis) to 1,166 bp (I. dumosa) (Figure 5).
SSR Polymorphisms and Long Repeat Sequence Analysis
A total of 2,146 simple sequence repeat (SSR) loci were detected among the 41 Ilex chloroplast genomes, with a length variation from 10 to 168 bp (Figure 6, Additional files 2: Table S2). The mononucleotide repeats were most abundant (1,771), while tetranucleotide repeats had the least occurrence (49). In contrast, the number of di-, trinucleotide and compound repeats were 109, 79, and 138, respectively. For mononucleotide repeats, the A/T repeats occurred the most (1769), while C/G repeats were only detected from two taxa, including I. asprella var. tapuensis and I. micrococca. For dinucleotide repeats were represented with only the AT/TA motif; while trinucleotides and tetranucleotides contained motifs AAT/ATT, CAG/CTG, and TTC/GAA, as well as AAAG/CTTT, ATAA/TTAT, ATTT/AAAT, TATT/AATA, and TCTT/AAGA repeats, respectively. Most of the SSRs were primarily located at the LSC region (1649), followed by the IR regions (275), and the SSC region (222).
A total of 2843 large repeats were detected (Figure 7, Additional files 3: Table S3), in which I. crenata contained the highest number of large repeats (79), while I. latifolia had the least (62). All species involved in the long repeat sequence analysis had forward, palindromic, and tandem repeats. While 11 chloroplast genomes contained all four types of large repeats, 30 of them did not come with any complementary and/or reverse repeats.
Codon usage bias
From the total number of 23,092 (I. asprella and I. dumosa) to 23,137 (I. dabieshanensis) encoded codons analyzed in this study, the codon encoded amino acid that recorded the highest frequency was leucine (vary from 10.51–10.57% in different species), while the stop codon had the least occurrence (77 codons, 0.33% in each species) (Additional files 4: Table S4). The alanine-encoded GCU had the highest relative synonymous codon usage (RSCU) value (1.83–1.86), while both the serine-encoded AGC and alanine-encoded GCG were recorded with the lowest RSCU value (0.34–0.36) (Figure 8). Codon bias was not detected in methionine-encoded AUG or tryptophan-encoded UGG and was detected in other amino acid encoded codons (Figure 8, Additional files 4: Table S4). For stop codon, the amino acid UAA was like preferred when compared to UAG. Overall, codons with A/U-ends were more preferred than those with G/C-ends among the amino acids in Ilex species (Figure 8, Additional files 4: Table S4).
Phylogenomic analyses
The phylogenetic trees of both the maximum likelihood (ML) and Bayesian inference (BI) were reconstructed based on the 52 complete chloroplast genomes. The closely related species Helwingia himalaica (NC031370) was selected as outgroup [23]. The total alignment length of the dataset used was 157,836 bp after trimmed. A total of 8,869 variable sites and 1,735 parsimony informative sites were detected in the alignment. The phylogenetic trees based on the ML and BI displayed similar topologies; thus, the posterior probability (PP) values were merged into the ML tree for clearer presentation (Figure 9). The monophyly of Ilex was well supported by the two analyses (Fig. 8; ML BS: 100%; BI PP: 1.00).
Based on our phylogenetic analysis and with consideration of macro-morphological and distribution information, the genus Ilex was categorized into five highly supported clades (Figure 9; ML BS: 100%; BI PP: 1.00). The relationships among the five major clades were well resolved with high resolution (ML BS: 100%; BI PP: 1.00) in the present phylogenetic study. Members of clade D and clade E were sister to the rest, followed by the clade C, which is sister to a clade containing clade A and clade B. The relationships within each clade were also well resolved with high support values.