1.1 Chloroplast genome characteristics of Grevillea robusta
The chloroplast genome of Grevillea robusta has a typical circular quadripartite structure (Fig. 1), with a total length of 158,642 bp and a GC content of 38.13%. The LSC, SSC and IR regions are 87,021 bp, 18,961 bp and 26,330 bp long, respectively. The GC content of the IR region (43.24%) is higher than that of the SSC region (31.78%) and the LSC region (36.43%). The Grevillea robusta chloroplast genome contains 129 genes, including 37 tRNA genes, 8 rRNA genes and 84 protein-coding genes. Grevillea robusta does not differ appreciably from species of the other three Proteaceae genera in genome length, gene content or total GC content (Table 1). Sixteen genes in the Grevillea robusta chloroplast genome contain introns: clpP and ycf3 each have two introns, and the other 14 genes have one intron (Table 2).
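The region lengths reported above can be cross-checked with a little arithmetic, since a quadripartite plastome consists of one LSC, one SSC and two IR copies. The following is a minimal sketch; the function names are hypothetical and are not part of the annotation pipeline used in this study.

```python
# Region lengths (bp) for Grevillea robusta as reported in the text.
LSC, SSC, IR = 87_021, 18_961, 26_330
TOTAL = 158_642


def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def gc_content(seq):
    """Percentage of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# The reported region lengths add up to the reported total length.
assert quadripartite_length(LSC, SSC, IR) == TOTAL
```

Applying `gc_content` separately to the LSC, SSC and IR sequences would give region-wise values comparable to those in Table 1.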
Table 1
Comparison of chloroplast genome characteristics of four genera in Proteaceae
| Species | G. robusta | H. shweliensis | M. integrifolia | P. kilimandscharica |
|---|---|---|---|---|
| Accession number | OK054586.1 | NC_045942.1 | KF862711.1 | MH362765.1 |
| Length (bp), total | 158,642 | 157,151 | 159,714 | 158,657 |
| Length (bp), LSC | 87,021 | 85,490 | 88,092 | 87,241 |
| Length (bp), SSC | 18,961 | 18,256 | 18,812 | 18,534 |
| Length (bp), IR | 26,330 | 26,701 | 26,405 | 26,441 |
| GC content (%), total | 38.13 | 38.11 | 38.12 | 38.03 |
| GC content (%), LSC | 36.43 | 36.46 | 36.54 | 36.31 |
| GC content (%), SSC | 31.76 | 31.77 | 31.67 | 31.56 |
| GC content (%), IR | 43.24 | 42.93 | 43.03 | 43.14 |
| Gene number, total | 129 | 129 | 129 | 128 |
| Gene number, tRNA | 37 | 36 | 36 | 36 |
| Gene number, rRNA | 8 | 8 | 8 | 8 |
| Gene number, protein-coding | 84 | 85 | 85 | 84 |
Table 2
Gene composition in the chloroplast genome of Grevillea robusta
| Category of genes | Group of genes | Name of genes |
|---|---|---|
| Ribosomal RNA | rRNA | rrn16S(×2), rrn23S(×2), rrn5S(×2), rrn4.5S(×2) |
| Transfer RNA | tRNA | 37 unique tRNA genes |
| Photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
| | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbM, psbN, psbT, psbZ, ycf3** |
| | Subunits of NADH-dehydrogenase | ndhA*, ndhB*(×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| | Subunits of cytochrome b/f complex | petA, petB*, petD, petG, petN |
| | Subunits of photosystem I | psaA, psaB, psaC, psaJ |
| | Subunit of rubisco | rbcL |
| Self-replication | Large subunit of ribosome | rpl14, rpl16, rpl2*(×2), rpl20, rpl22, rpl23(×2), rpl32, rpl33, rpl36 |
| | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
| | Small subunit of ribosome | rps11, rps12(×2), rps14, rps15, rps16*, rps18, rps19, rps2, rps3, rps4, rps7(×2), rps8 |
| Other genes | Subunit of acetyl-CoA carboxylase | accD |
| | c-type cytochrome synthesis gene | ccsA |
| | Envelope membrane protein | cemA |
| | Protease | clpP** |
| | Translational initiation factor | infA |
| | Maturase | matK |
| Unknown | Conserved open reading frames | ycf1, ycf15*(×2), ycf2(×2), ycf4 |

Note: (×2) indicates that the gene is located in the IRs and therefore has two complete copies; * and ** indicate that the gene contains one or two introns, respectively.
1.2 Repeat sequence and SSR analysis
Microsatellites, also known as simple sequence repeats (SSRs), are tandem repeats of short motifs, usually 1-6 bp long, in the genomes of eukaryotic organisms (Lovin et al. 2009). Their high polymorphism and co-dominant inheritance have led to their wide use as molecular markers (Morgante et al. 2002). There are 34 oligonucleotide repeats in the chloroplast genome of Grevillea robusta. Palindromic repeats are the most abundant (16), followed by dispersed repeats and tandem repeats (8 each) and reverse repeats (2). We identified 56 SSR loci in the Grevillea robusta chloroplast genome. Mononucleotide repeats were the most common type (37) and were mainly A/T repeats, which accounted for about 62.50% of all SSRs. The remaining loci comprised three dinucleotide, two trinucleotide, eight tetranucleotide, five pentanucleotide and one hexanucleotide repeats.
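The SSR screening described above can be illustrated with a small regular-expression scan. This is a minimal sketch, not the tool used in this study: the minimum repeat counts in `MIN_REPEATS` are assumed thresholds chosen for illustration, and `find_ssrs` is a hypothetical helper name.

```python
import re

# Minimum number of repeat units per motif length (1-6 bp).
# These thresholds are assumptions for illustration, not values from the paper.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # A k-bp motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (mononucleotide) or "ATAT" (dinucleotide).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GG" + "A" * 12 + "CGC" + "AT" * 6 + "GC"
print(find_ssrs(example))
```

Counting the hits by motif length then gives the numbers of mono- to hexanucleotide repeats. Dedicated SSR tools also handle compound and imperfect repeats, which this sketch ignores.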
1.3 Boundary characterization of the chloroplast genomes of Proteaceae
The quadripartite structure of the chloroplast genome has four boundaries, namely JLB (LSC/IRb), JSB (IRb/SSC), JSA (SSC/IRa) and JLA (IRa/LSC). Expansion and contraction of the IR region are very common during chloroplast genome evolution. In the chloroplast genomes of the six Proteaceae species, the lengths of the IR regions were similar and the types of boundary genes did not differ markedly. The boundary genes of Grevillea robusta were located similarly to those of the four Macadamia and Helicia species, which suggests that the IR regions of the Grevillea robusta chloroplast genome did not undergo drastic contraction or expansion events. In Protea kilimandscharica, however, the IRa/LSC boundary gene trnH lay within the IRa region, which generated a second copy of trnH in the IRb region. The IR region of Protea kilimandscharica therefore underwent a small expansion event compared with the other five Proteaceae species.
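The positions of the four junctions follow directly from the region lengths. The sketch below assumes that the sequence is linearized at the start of the LSC and that the regions occur in the order LSC, IRb, SSC, IRa; this coordinate convention and the function name `junction_positions` are assumptions made for illustration.

```python
def junction_positions(lsc, ssc, ir):
    """Cumulative end coordinates (bp) of LSC, IRb, SSC and IRa.

    Assumes the genome is linearized at the first base of the LSC and
    that the regions are ordered LSC, IRb, SSC, IRa.
    """
    jlb = lsc           # LSC/IRb junction
    jsb = jlb + ir      # IRb/SSC junction
    jsa = jsb + ssc     # SSC/IRa junction
    jla = jsa + ir      # IRa/LSC junction (end of the circular sequence)
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


# Region lengths reported for Grevillea robusta.
print(junction_positions(87_021, 18_961, 26_330))
```

Comparing these coordinates with the annotated positions of the boundary genes shows how far a gene such as trnH extends into an IR region.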
1.3 Analysis of gene selection pressures
Selection pressure on the chloroplast protein-coding genes of Grevillea robusta, Helicia shweliensis, Macadamia integrifolia and Protea kilimandscharica was analyzed with PAML. The results showed that rpl22 and matK had high Ka values, and that rpl33, ndhF, rpl22 and psaC had relatively high Ks values. The Ka/Ks values of most genes were less than 1, indicating that most protein-coding genes have been under purifying selection during evolution. However, two genes had Ka/Ks values greater than 1 (1.30 for ycf1 and 1.03 for ycf2), which suggests that they may have been positively selected during evolution.
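The interpretation of Ka/Ks values used above can be written as a simple rule. This is a minimal sketch with hypothetical function names; it applies only the conventional threshold of 1 and does not reproduce the likelihood calculations performed by PAML.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def classify_selection(ratio):
    """Interpret a Ka/Ks ratio using the conventional threshold of 1."""
    if ratio is None:
        return "undefined"
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"


# Ratios reported in the text for ycf1 and ycf2.
for gene, ratio in {"ycf1": 1.30, "ycf2": 1.03}.items():
    print(gene, classify_selection(ratio))
```

A ratio only slightly above 1, such as the value for ycf2, is weak evidence on its own, which is why the text describes positive selection as a possibility rather than a conclusion.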
1.4 RNA editing sites
The statistics of RNA editing sites are shown in Fig. 5a. In the protein-coding sequences of the Grevillea robusta chloroplast genome, a total of 148 RNA editing sites were detected in 42 protein-coding genes. ndhB had the most editing sites (18), followed by psbA (13). In terms of editing efficiency, 25 sites were edited at 100% efficiency, and low-frequency editing sites were also numerous (Fig. 5b), with 47 sites having an editing efficiency of less than 10%. All 12 possible editing types were found (Fig. 5c). C/U editing was the most frequent type, accounting for 54.73% of all RNA editing sites: 81 sites were edited from C in the genome to U in the mRNA. The next most frequent type was T/C, with 17 sites edited from T in the genome to C in the mRNA.
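The share of each editing type follows from simple counts. The sketch below tallies editing types from (genomic base, mRNA base) pairs and reproduces the C/U percentage from the totals given in the text; the input format and the function name `editing_type_shares` are assumptions for illustration.

```python
from collections import Counter


def editing_type_shares(sites):
    """Percentage of each editing type from (genomic_base, rna_base) pairs."""
    counts = Counter(f"{g}/{r}" for g, r in sites)
    total = sum(counts.values())
    return {kind: round(100.0 * n / total, 2) for kind, n in counts.items()}


# Counts reported in the text: 81 C/U sites out of 148 sites in total.
c_to_u_share = round(100.0 * 81 / 148, 2)
print(c_to_u_share)  # 54.73

# Toy input with three sites, only to show the expected data shape.
print(editing_type_shares([("C", "U"), ("C", "U"), ("T", "C")]))
```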
1.5 Phylogenetic analysis
We selected representative species from across the order Proteales and constructed a phylogenetic tree based on whole plastid genome sequences. The results showed that the four families in the order Proteales (Proteaceae, Sabiaceae, Nelumbonaceae and Platanaceae) each formed well-defined clusters. Among them, Sabiaceae occupied the most basal position and was the earliest to diverge, followed by Nelumbonaceae. Proteaceae and Platanaceae were more closely related to each other, and every node had high support. Within the Proteaceae, Grevillea was on the same branch as Macadamia and Helicia with 100% support. Because chloroplast genome resources for the Proteaceae are scarce, we were unable to make more in-depth and specific comparisons of the phylogenetic relationships between Grevillea robusta and species of other genera in the family.