Assembly and annotation of the chloroplast genome of Chimonobambusa hirtinoda.
Assembly yielded a complete cp genome sequence of C. hirtinoda 139,561 bp in length (Fig. 1), consisting of an 83,166-bp large single-copy (LSC) region, a 12,811-bp small single-copy (SSC) region, and two 21,792-bp inverted repeat (IR) regions, which together form the typical quadripartite structure of terrestrial plants. The cp genome of C. hirtinoda was annotated with 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes (Table 1). Among the 15 genes of the C. hirtinoda cp genome that typically carry introns, most retain them: 13 genes contain one intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rps16, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC), only ycf3 contains two introns, and the intron of clpP has been lost (Supplementary Table S1). Notably, the rps12 gene is present in two copies, and its three exons are joined by trans-splicing 18.
Table 1
Summary of the chloroplast genome of C. hirtinoda.
| Genome features | C. hirtinoda |
|---|---|
| Genome size (bp) | 139,561 |
| LSC size (bp) | 83,166 |
| SSC size (bp) | 12,811 |
| IR size (bp) | 21,792 |
| GC content (%) | 38.9 |
| No. of genes | 130 |
| No. of PCGs | 85 |
| No. of tRNA | 37 |
| No. of rRNA | 8 |
Notably, the accD, ycf1, and ycf2 genes are missing from the cp genome of C. hirtinoda, and the introns of clpP and rpoC1 have been lost. This is consistent with previous systematic evolutionary studies on genome structure in Poaceae 19. Gene loss of this kind has also been reported in other plants 20-23.
The total GC content of the C. hirtinoda cp genome was 38.90%, and the contents of the four bases A, T, G, and C were 30.63%, 30.46%, 19.57%, and 19.33%, respectively (Table 2). The GC contents of the LSC region (36.98%) and the SSC region (33.21%) are much lower than that of the IR regions (44.23%), indicating that GC content is not uniformly distributed across the cp genome. This is probably because the four rRNA genes are located in the IR regions, which raises their GC content. These values are similar to those previously reported for the cp genomes of other Poaceae plants 24, 25.
Table 2
Base composition in the C. hirtinoda chloroplast genome.
| Region | Length (bp) | A (%) | T (%) | G (%) | C (%) | GC (%) |
|---|---|---|---|---|---|---|
| LSC | 83,166 | 31.24 | 31.78 | 18.76 | 18.22 | 36.98 |
| SSC | 12,811 | 36.02 | 30.78 | 16.17 | 17.04 | 33.21 |
| IRA | 21,792 | 27.96 | 27.81 | 21.19 | 23.04 | 44.23 |
| IRB | 21,792 | 27.96 | 27.81 | 21.19 | 23.04 | 44.23 |
| Total genome | 139,561 | 30.63 | 30.46 | 19.57 | 19.33 | 38.90 |
| CDS | 60,531 | 29.63 | 30.85 | 21.20 | 18.3 | 39.53 |
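The regional GC values in Table 2 can be cross-checked against the whole-genome figure, because the total GC content is the length-weighted mean of the four regions. The following is a minimal sketch of that arithmetic, not the pipeline used in the study; the function names `gc_content` and `weighted_gc` are illustrative, and only the lengths and percentages are taken from Table 2.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# (length in bp, GC %) per region, copied from Table 2.
REGIONS = {
    "LSC": (83166, 36.98),
    "SSC": (12811, 33.21),
    "IRA": (21792, 44.23),
    "IRB": (21792, 44.23),
}


def weighted_gc(regions):
    """Length-weighted mean GC percentage over all regions."""
    total_length = sum(length for length, _ in regions.values())
    return sum(length * gc for length, gc in regions.values()) / total_length


if __name__ == "__main__":
    print(sum(length for length, _ in REGIONS.values()))  # total genome length
    print(round(weighted_gc(REGIONS), 2))                 # whole-genome GC %
```

The four region lengths sum to the reported genome size, and the weighted mean rounds to the 38.90% given for the total genome.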
Repeat sequences and codon analysis.
Simple sequence repeats (SSRs) are tandem repeats of short motifs, forming tracts of roughly 10 bp, and are widely used for exploring phylogenetic evolution and for genetic diversity analysis 26-29.
In total, 48 SSRs were detected in C. hirtinoda. Of these, 27 were mononucleotide repeats, accounting for 56.25% of the total and composed mainly of A or T. There were also 4 dinucleotide repeats (AT/TA and TC/CT), 3 trinucleotide repeats, 13 tetranucleotide repeats, and 1 pentanucleotide repeat (Fig. 2A). The vast majority of SSRs were located in the LSC region (38; 79%), followed by the IR regions (6; 13%) and the SSC region (4; 8%) (Fig. 2B). Previous reports suggest that the number of SSRs in each region and the regional differences in GC content are related to expansion or contraction of the IR boundary 30.
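To illustrate how SSRs of the classes counted above can be located, the sketch below scans a sequence for perfect tandem repeats with regular expressions. It is a simplified stand-in rather than the tool used in the study: the minimum repeat counts in `MIN_REPEATS` are assumed values chosen for illustration, and the function names are hypothetical.

```python
import re

# Minimum repeat counts per motif length (mono- to hexanucleotide).
# Illustrative assumption; the thresholds used in the study are not stated here.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) for perfect SSRs.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_unit(motif):
                continue  # such tracts belong to the shorter motif class
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def count_by_motif_length(hits):
    """Number of SSRs per motif length (1 = mono, 2 = di, ...)."""
    counts = {}
    for _, _, motif, _ in hits:
        counts[len(motif)] = counts.get(len(motif), 0) + 1
    return counts


if __name__ == "__main__":
    example = "GC" + "A" * 12 + "GCGG" + "AT" * 6 + "CCG"
    print(find_ssrs(example))
```

Because the scan is non-overlapping within each motif length, adjacent or compound SSRs may be reported differently than by dedicated software; the sketch is meant only to make the counting scheme concrete.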
The REPuter program identified 61 repeats in the cp genome of C. hirtinoda, including 15 palindromic repeats and 19 forward repeats, with no reverse or complement repeats (Fig. 3). Repeat analyses of the three Chimonobambusa species revealed 61-65 repeats per species, with a single reverse repeat found only in C. hejiangensis. Most repeats were between 30 and 100 bp in length, and almost all were located in either the IR or the LSC region 31 (Supplementary Table S2).
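The four repeat categories named above differ only in how the second copy relates to the first: identical (forward), reversed (reverse), complemented (complement), or reverse-complemented (palindromic). The sketch below classifies a pair of equal-length sequences accordingly. It assumes exact matches and is not a reimplementation of REPuter, which also tolerates mismatches; the name `classify_repeat` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify the relation between two repeat copies, or return None.

    Exact matching only; checks are made in the order forward, reverse,
    complement, palindromic.
    """
    a, b = first.upper(), second.upper()
    if a == b:
        return "forward"
    if a[::-1] == b:
        return "reverse"
    complemented = a.translate(COMPLEMENT)
    if complemented == b:
        return "complement"
    if complemented[::-1] == b:
        return "palindromic"
    return None


if __name__ == "__main__":
    for other in ("AACGT", "TGCAA", "TTGCA", "ACGTT"):
        print(other, classify_repeat("AACGT", other))
```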
A total of 20,180 codons were identified in the coding regions of C. hirtinoda (Fig. 4, Supplementary Table S3). Among these, AUU (Ile) was the most frequently used codon (817 occurrences) and the termination codon UAG was the least used (19 occurrences). Among the encoded amino acids, Leu was the most abundant (2,170 codons), whereas termination codons (TER) were the least frequent (85). A relative synonymous codon usage (RSCU) value greater than 1.0 means that a codon is used more often than expected under equal usage of synonymous codons 32. The RSCU values of 31 codons exceeded 1 in the C. hirtinoda cp genome; of these, 29 (93.55%) ended in A or U at the third position, while one ended in C (3.23%) and one in G (3.23%).
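For reference, the RSCU of a codon is its observed count divided by the mean count of all codons encoding the same amino acid, so that a value of 1 indicates no bias. The sketch below computes this from an in-frame coding sequence. It assumes the standard codon-to-amino-acid assignments and is not the software used in the study; `CODON_TABLE` and `rscu` are illustrative names.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by the amino acid (or stop) they encode.
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        families[amino_acid].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            # Observed count relative to the mean count within the family.
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    toy = "ATT" * 3 + "ATC" + "ATA" * 2  # six Ile codons
    print({codon: value for codon, value in rscu(toy).items() if value > 0})
```

In the toy example the three Ile codons occur 3, 1, and 2 times, giving RSCU values of 1.5, 0.5, and 1.0.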
Comparative analysis of genome structure.
The nucleotide variability (Pi) values of the three Chimonobambusa cp genomes, calculated with DnaSP 5.10, ranged from 0 to 0.021, with an average of 0.000544. Figure 5 shows five clear peaks in the two single-copy regions, the highest of which lies in the trnT-trnE-trnY region of the LSC. Pi values in the LSC and SSC regions are significantly higher than in the IR regions, where no highly divergent sequences were found, indicating that the IR is highly conserved. Highly variable regions of this kind have also been reported in other plants in studies of species identification, phylogenetic analysis, and population genetics 33-35.
Comparison of the complete cp genomes of the three Chimonobambusa species showed that sequences were conserved across most regions (Fig. 6). The LSC and SSC regions are far more variable than the IR regions, and noncoding regions are more variable than coding regions. Among noncoding regions, the segments at 7-9 kb, 28-30 kb, and around 36 kb, among others, differ greatly. Among protein-coding regions, rpoC2, rps19, ndhJ, and several other genes show high divergence. In contrast, the tRNA and rRNA regions are almost 100% identical. A similar pattern has been reported by others 36.
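As a rough illustration of how a Pi profile such as the one in Figure 5 is obtained, the sketch below computes the mean proportion of pairwise differences per site in sliding windows along an alignment. It is not a reimplementation of DnaSP: the window and step sizes are left as parameters because they are not stated in this section, columns containing gaps or ambiguous bases are simply skipped, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean proportion of pairwise differences per site (Pi) for aligned sequences.

    Columns containing anything other than A, C, G or T in any sequence are skipped.
    """
    seqs = [s.upper() for s in aligned]
    columns = [col for col in zip(*seqs) if all(base in "ACGT" for base in col)]
    pairs = list(combinations(range(len(seqs)), 2))
    if not columns or not pairs:
        return 0.0
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(aligned, window, step):
    """Yield (window_start, Pi) along the alignment; window_start is 0-based."""
    length = len(aligned[0])
    for start in range(0, length - window + 1, step):
        yield start, nucleotide_diversity([s[start:start + window] for s in aligned])


if __name__ == "__main__":
    toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
    print(nucleotide_diversity(toy_alignment))
    print(list(sliding_pi(toy_alignment, window=4, step=4)))
```

Peaks in the resulting profile correspond to windows in which the aligned genomes differ most, which is how hotspot regions are typically identified.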
IR contraction and expansion in the chloroplast genome.
The cp genome of plants contains four regions and four boundaries. During species evolution, the IR regions of the chloroplast genome expand and contract to some degree, which maintains the stability of the two IR sequences, and this adjustment is the main cause of changes in chloroplast genome length 37, 38.
Figure 6 shows that the three Chimonobambusa chloroplast genomes are highly similar in organization, gene content, and gene order. IR size ranges from 21,797 bp (C. tumidissinoda) to 21,835 bp (C. hejiangensis). The ndhH gene spans the IRa/SSC boundary, with 181–224 bp duplicated in the IRa region. The rps15 gene is located in the IR region (Fig. 7).
Mauve alignment detected no inversions or translocations among the six genome sequences, and the sequence blocks appear in the same order, indicating that the cp genomes of the six species have not undergone gene rearrangement (Fig. 8).
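Boundary positions of the kind compared in Figure 7 follow directly from the region lengths. The sketch below derives the four junctions and reports how many bases of a feature lie on each side of a junction it spans. It assumes a linear coordinate system that starts at the first base of the LSC and proceeds in the order LSC, IRb, SSC, IRa, which is a common convention rather than something stated in this section; the function names and the gene coordinates in the example are hypothetical.

```python
def junctions(lsc, ssc, ir):
    """Last base (1-based) of each region, assuming the order LSC, IRb, SSC, IRa."""
    jlb = lsc        # LSC / IRb
    jsb = jlb + ir   # IRb / SSC
    jsa = jsb + ssc  # SSC / IRa
    jla = jsa + ir   # IRa / LSC (end of the linearised genome)
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


def spanned_junctions(start, end, juncs):
    """For a feature at [start, end] (1-based, inclusive), return the junctions
    it crosses as {name: (bp before junction, bp after junction)}."""
    crossed = {}
    for name, pos in juncs.items():
        if start <= pos < end:
            crossed[name] = (pos - start + 1, end - pos)
    return crossed


if __name__ == "__main__":
    # Region lengths of C. hirtinoda from Table 1.
    juncs = junctions(lsc=83166, ssc=12811, ir=21792)
    print(juncs)
    # Hypothetical feature coordinates, for illustration only.
    print(spanned_junctions(117600, 118000, juncs))
```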
Phylogenetic analysis.
We performed phylogenetic analyses using both the complete chloroplast genomes and the matK gene, and found that the complete chloroplast genome better resolved related species, consistent with a previous study 39. The maximum likelihood (ML) analysis recovered 7 nodes with full branch support (100% bootstrap values). However, relationships among the three Chimonobambusa species were only moderately supported, probably because few samples were used: C. hirtinoda was placed closer to C. tumidissinoda than to C. hejiangensis, with a bootstrap value of 62%. In the phylogenetic tree based on the matK gene, the Chimonobambusa species clustered in one branch, consistent with the tree constructed from the complete cp genomes (Fig. 10).