Gene composition of gymnosperm chloroplast tRNAs
In the 54 gymnosperm chloroplast genomes examined (Table S1), a total of 1,779 tRNA genes were annotated, encoding tRNAs for the 20 standard amino acids. The chloroplast tRNA gene content of plants is relatively uniform [8]. In this study, each species had approximately 33 chloroplast tRNA genes on average. Only a few species lay at the extremes: Callitris rhomboidea, Dacrycarpus imbricatus and Pseudotaxus chienii encoded 27 tRNAs, whereas Gnetum parvifolium and Macrozamia mountperriensis encoded 39 tRNAs (Table 1).
Almost every tRNA was encoded in the chloroplast genome of each species, although some tRNAs were absent from some species (Table 1). tRNAAla was missing in 8 species; tRNAThr, tRNAGlu, tRNAPhe and tRNALeu were each missing in 1 species; tRNAVal was missing in 2 species; tRNALys was missing in 14 species; and tRNAGln was missing in 5 species. tRNASer, tRNAArg and tRNALeu were the most abundant tRNAs across the chloroplast genomes. tRNASer appeared three times in most species, two or four times in some species, and six times in Nothotsuga longibracteata. tRNAArg and tRNALeu commonly appeared two or three times per species, although tRNALeu appeared six times in Ephedra equisetina. tRNAGly, tRNAPro and tRNAThr formed the second largest groups, occurring twice in most species. In contrast, suppressor tRNAs and selenocysteine tRNAs were completely absent from the gymnosperm chloroplast genomes.
In addition, the chloroplast tRNAs ranged in length from 56 nt to 90 nt, with an average of about 82 nt. tRNAGly (UCC) of Cunninghamia lanceolata was the shortest gene found in this study, containing only 56 nucleotides. tRNALeu, tRNASer and tRNATyr were all longer than 80 nt. A few tRNASer reached 90 nt, and tRNAGly (UCC) of Sequoia sempervirens was also 90 nt long. The other tRNA groups were all about 73 nt long, although a few were shorter than 70 nt.
Gymnosperm chloroplast tRNAs contain 34 anti-codons
The genetic code is degenerate, in that 20 amino acids are encoded by 61 triplet codons [21]. However, our analyses showed that the 1,779 gymnosperm tRNAs contained only 34 different anti-codons; the remaining 27 anti-codons were not found in any tRNA of the investigated gymnosperm chloroplast genomes (Table 2). The anti-codons found in this study include: tRNAAla (UGC), tRNAGly (GCC, UCC), tRNAPro (GGG, UGG), tRNAThr (GGU, UGU), tRNAVal (GAC, UAC), tRNASer (GGA, UGA, GCU), tRNAArg (ACG, CCG, UCU), tRNALeu (GAG, UAG, CAA, UAA), tRNAPhe (GAA), tRNAAsn (GUU), tRNALys (UUU), tRNAAsp (GUC), tRNAGlu (UUC), tRNAHis (GUG), tRNAGln (UUG), tRNAIle (GAU, CAU), tRNAMet (CAU), tRNATyr (GUA), tRNACys (GCA), tRNATrp (CCA). Among them, tRNALeu had the most isoacceptors (GAG, UAG, CAA, UAA), followed by tRNASer (GGA, UGA, GCU), tRNAArg (ACG, CCG, UCU) and tRNAIle (GAU, CAU, UAU). In addition, tRNALeu (GAG) was present only in Ephedra equisetina, tRNALys (CUU) only in Cunninghamia lanceolata, and tRNAIle (UAU) only in Taxus baccata. tRNAMet (CAU) was present at least twice in each species. tRNAGly (GCC), tRNAPro (UGG), tRNASer (UGA, GCU), tRNAArg (ACG), tRNAAsn (GUU), tRNAAsp (GUC), tRNAHis (GUG), tRNAIle (CAU), tRNAMet (CAU), tRNATyr (GUA) and tRNATrp (CCA) were present in every gymnosperm chloroplast genome investigated.
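The codon arithmetic behind these counts can be checked with a short sketch. It enumerates the 64 RNA triplets, removes the three stop codons to leave the 61 sense codons, and derives the Watson-Crick anti-codon of a codon as its reverse complement. The function name `anticodon_for` is a hypothetical helper, and the value 34 is simply the number of anti-codons reported above, not something the code derives.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the Watson-Crick anti-codon (5'->3') of an mRNA codon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


# All 64 triplets, then the 61 sense codons that specify amino acids.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

# Number of distinct anti-codons reported in the text (taken as given).
observed_anticodons = 34
unobserved_anticodons = len(sense_codons) - observed_anticodons

print(len(sense_codons), unobserved_anticodons)
print(anticodon_for("AUG"))  # the methionine codon pairs with anti-codon CAU
```

This only covers exact Watson-Crick pairing; wobble pairing, which lets one anti-codon read several codons, is not modelled here.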
Conservation of gymnosperm chloroplast tRNAs
Different tRNAs carry different amino acids because they differ in nucleotide composition and structure. The tRNA sequences were analyzed to identify their conserved regions (Table 3). Comparative analysis of the nucleotide composition of the tRNA loops and arms revealed conserved nucleotides or nucleotide sequences at multiple positions. At the first position of the acceptor arm, tRNAAla (UGC), tRNAGly (GCC, UCC), tRNAThr (GGU), tRNASer (GCU, GGA, UGA), tRNAArg (ACG, CCG), tRNALeu (CAA, GAG, UAA, UAG), tRNALys (UUU), tRNAPhe (GAA), tRNAAsp (GUC), tRNAGlu (UUC), tRNAHis (GUG), tRNAIle (CAU, GAU), tRNATyr (GUA) and tRNACys (GCA) contained a conserved 5’ G nucleotide, whereas tRNAPro (UGG), tRNAMet (CAU) and tRNAVal (GAC, UAC) contained a conserved A nucleotide, and tRNAAsn (GUU) and tRNAGln (UUG) contained a U nucleotide. In contrast, tRNATrp (CCA), tRNAfMet (CAU) and tRNAThr (UGU) were poorly conserved at the first position of the acceptor arm. G nucleotides were relatively abundant in the acceptor arm, and tRNASer (GCU, UGA) possessed the conserved nucleotide sequence G-G-A-G-A-G-A in this region. At the first position of the D-arm, tRNAVal (GAC) and tRNALys (UUU) contained a conserved A nucleotide; tRNATyr (GUA) contained a conserved C nucleotide; tRNAPro (GGG) and tRNAThr (GGU) contained an A or G nucleotide; tRNAMet (CAU) contained no conserved nucleotide; and all other tRNAs contained a conserved G nucleotide. At the same time, tRNAAla (UGC), tRNAThr (UGU), tRNAVal (UAC), tRNAArg (ACG, CCG), tRNALeu (GAG), tRNAPhe (GAA), tRNAAsn (GUU) and tRNAIle (GAU) contained the conserved nucleotide sequence G-C-U-C, and tRNACys (GCA), tRNAHis (GUG) and tRNAGln (UUG) contained the conserved nucleotide sequence G-C-C. The D-loop contained a conserved A nucleotide at the first position, except in tRNAGly (GCC), tRNASer (GCU, GGA), tRNALeu (UAA) and tRNAIle (CAU).
The last position of the D-loop was highly conserved: it was always an A nucleotide, except in tRNAGly (GCC). The anti-codon arms were less conserved, with no conserved nucleotide at any position (Table 3). The second position of the anti-codon loop contained a conserved T nucleotide, and the last position of the anti-codon loop was mostly a conserved A nucleotide. In addition, the anti-codon loop showed a strong preference for uracil and adenine. Nucleotide conservation was very low in the variable region because of its structural variability, yet many tRNAs still possessed a conserved C nucleotide at the last position of this region. The Ψ-arm and Ψ-loop were the most conserved regions in both nucleotide number and nucleotide composition. The Ψ-arms all contained five nucleotides, and in the vast majority of tRNAs the last two positions of this region were G nucleotides. The Ψ-loops all contained seven nucleotides, including a totally conserved U-U-C sequence, and the vast majority of tRNAs had a conserved U nucleotide at the last position.
Possession of an intact CCA end is a basic prerequisite for tRNAs to participate in the decoding process [22]. Eukaryotic tRNA genes generally do not encode the 3’ CCA sequence, so adding a 3’ CCA tail is an important step in tRNA biosynthesis. In the gymnosperms studied here, tRNAAla, tRNAArg, tRNAGlu, tRNALeu, tRNATyr and tRNALys were found to contain a 3’ CCA tail (Figure 1), whereas most tRNAs did not.
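One simple way to screen gene sequences for an encoded 3’ CCA tail is a suffix check. The sketch below is illustrative only: the function name `has_cca_tail` and the example sequences are hypothetical and are not taken from the study.

```python
def has_cca_tail(gene_seq):
    """Return True if a tRNA gene sequence ends with CCA at its 3' end.

    Whitespace and letter case are normalised before the check.
    """
    seq = "".join(gene_seq.split()).upper()
    return seq.endswith("CCA")


# Hypothetical sequences used only to exercise the check.
examples = {
    "with_tail": "GGGCTATTAGCTCAGCCA",
    "without_tail": "GGGCTATTAGCTCAGCCAT",
}
for name, seq in examples.items():
    print(name, has_cca_tail(seq))
```

A suffix check of this kind only tests the annotated gene boundary, so its result depends on where the annotation places the 3’ end.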
Nucleotide variation in the arms and loops of tRNA
Conservation was also reflected in the number of nucleotides in each loop and arm of the tRNAs. Across all 1,779 tRNAs in this study, the acceptor arm contained from 0 to 8 nucleotides (Table 4). Acceptor arms with 7 nucleotides accounted for the vast majority (93.25%), and 58 tRNAs (3.26%) contained 6 nucleotides in the acceptor arm. The D-arm of most tRNAs contained 3 (34.23%) or 4 (65.65%) nucleotides. However, two tRNAs, both from Pseudotsuga sinensis var. wilsoniana, had an unusual D-arm containing only one nucleotide. The D-loops had 6 to 26 nucleotides. Among the 1,779 tRNAs, 341 (19.17%) D-loops contained 7 nucleotides, 281 (15.80%) contained 8, 719 (40.42%) contained 9, 162 (9.11%) contained 10, 249 (14.00%) contained 11, 25 (1.41%) contained 12, one contained 6 nucleotides, and one contained 26 nucleotides. The anti-codon arm contained either 4 or 5 nucleotides in every tRNA. Most anti-codon loops contained 7 nucleotides (97.98%); the rest contained 9, 10 or 12 nucleotides. The number of nucleotides in the variable region varied considerably: most (1,049; 58.97%) contained 5 nucleotides, and the others contained 1 (0.17%), 2 (0.39%), 3 (5.45%), 4 (15.91%), 6 (12.65%), 7 (3.54%), 8 (0.06%), 11 (2.08%), 15 (0.06%), 16 (0.73%), 17 (0.06%) or 20 (0.06%). Among all 1,779 tRNAs, only one tRNAMet had 6 nucleotides in the Ψ-arm and 11 (0.62%) had 4, while the remaining tRNAs had 5. All tRNAs possessed 7 nucleotides in the Ψ-loop.
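The percentages reported for the D-loop can be reproduced directly from the counts given in the text. The sketch below uses only those counts; the variable names are illustrative.

```python
# D-loop length (nt) -> number of tRNAs, as reported in the text.
d_loop_counts = {6: 1, 7: 341, 8: 281, 9: 719, 10: 162, 11: 249, 12: 25, 26: 1}

# The counts should add up to the 1,779 tRNAs analysed.
total = sum(d_loop_counts.values())

# Share of each D-loop length as a percentage, rounded to two decimals.
percentages = {
    length: round(100 * count / total, 2)
    for length, count in d_loop_counts.items()
}

# The most frequent D-loop length.
most_common_length = max(d_loop_counts, key=d_loop_counts.get)

for length in sorted(percentages):
    print(f"{length} nt: {d_loop_counts[length]} tRNAs ({percentages[length]}%)")
print("total:", total, "most common length:", most_common_length)
```

The same calculation applies to the other regions once their per-length counts are taken from Table 4.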
Four types of structural changes in tRNA
The general structural features of a tRNA are an amino acid acceptor arm, a D-arm, a D-loop, an anti-codon arm, an anti-codon loop, a variable region, a TΨC arm and a TΨC loop. However, some novel tRNA structures were found in this study. These structures can be divided into four types (Table 5, Figure 2). Type 1: the acceptor arm was lacking; type 2: the 3’-end contained extra nucleotides; type 3: the variable region contained loops or arms; type 4: the 3’-end contained extra nucleotides and the variable region contained a loop or an arm. Among the altered tRNA structures, type 3 showed obvious conservation: the variable region of tRNALeu (CAA, UAA), tRNASer (GGA, UGA, GCU) and tRNATyr (GUA) had the same structure, with extra loops and arms, in all species. tRNALeu also possessed the anti-codon UAG, but the variable region of this tRNA did not have such a structure. Type 4 comprised only two kinds of tRNAs: tRNATyr (GUA) and tRNASer (UGA).
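The four structural types can be expressed as a small decision rule. This is a minimal sketch, assuming each tRNA has already been described by three yes/no features. The function name, its parameters and the "canonical" label are hypothetical, and giving type 1 precedence when the acceptor arm is missing is an assumption, since the text does not say how overlapping cases were handled.

```python
def classify_structure(has_acceptor_arm, extra_3prime_nt, structured_variable_region):
    """Assign a tRNA to one of the four novel structural types.

    has_acceptor_arm: False if the acceptor arm is absent.
    extra_3prime_nt: True if the 3' end carries extra nucleotides.
    structured_variable_region: True if the variable region forms a loop or arm.
    """
    if not has_acceptor_arm:
        return "type 1"  # assumption: a missing acceptor arm takes precedence
    if extra_3prime_nt and structured_variable_region:
        return "type 4"
    if extra_3prime_nt:
        return "type 2"
    if structured_variable_region:
        return "type 3"
    return "canonical"


print(classify_structure(True, False, True))   # variable region only
print(classify_structure(True, True, True))    # both changes together
```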
We calculated the minimum free energy (ΔG) of each type of structurally novel tRNA and of some structurally normal tRNAs (Table 5). The minimum free energy of type 1 tRNAs was -12.6 kcal/mol on average. The value for tRNAGly (UCC) of Cunninghamia lanceolata was very high (ΔG = -9.6 kcal/mol), showing that the absence of the acceptor arm had a very large impact on the stability of its structure. The minimum free energy of every type 1 tRNA was much higher than that of structurally normal tRNAs (ΔG = -26.5 kcal/mol). Therefore, the absence of the acceptor arm generally reduces the structural stability of a tRNA. The minimum free energy of type 2 tRNAs was around -19.3 kcal/mol. Within type 2, tRNAGly (GCC) of Sequoia sempervirens had the lowest value (ΔG = -28.3 kcal/mol), indicating that the extra nucleotides at the 3’ end greatly improved the stability of this structure. In contrast, tRNAMet (CAU) of Cephalotaxus oliveri had the highest value (ΔG = -11.8 kcal/mol); its stability was greatly reduced by the atypical nucleotides at the 3’ end. The minimum free energy of type 3 tRNAs was -33.2 kcal/mol on average, and the values of these tRNAs were mostly below -30.0 kcal/mol. tRNATyr (GUA) always had a very low value. Therefore, the loops and arms in the variable region work together with the structure of other regions to create an extremely stable tRNA structure. However, tRNALeu (CAA) stood out from the other type 3 tRNAs because of its higher minimum free energy. Its value was around -26.1 kcal/mol, much higher than the type 3 average (ΔG = -32.8 kcal/mol) and close to that of structurally normal tRNAs (ΔG = -26.5 kcal/mol). Thus, for tRNALeu (CAA), the structural changes in the variable region had no obvious effect on stability.
The minimum free energy of type 4 tRNAs was -28.3 kcal/mol on average, but the values of individual tRNAs differed considerably, with some above the average and some below. Therefore, when structural changes at the 3’ end and in the variable region coexist, they may have multiple effects on stability. Taking the value of structurally normal tRNAs (ΔG = -26.5 kcal/mol) as a reference, types 1 and 2 were much higher than the reference value, whereas type 3 was lower. Thus, various changes in tRNA structure can affect its stability.
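The comparison against the reference value can be made explicit with a short calculation. The sketch below uses the mean values quoted in the text (with -33.2 kcal/mol taken for type 3) and treats a less negative minimum free energy as a less stable structure. The names `TYPE_MEAN_DG` and `shift_from_reference` are illustrative.

```python
REFERENCE_DG = -26.5  # kcal/mol, structurally normal tRNAs (from the text)

# Mean minimum free energy per structural type, kcal/mol (from the text).
TYPE_MEAN_DG = {
    "type 1": -12.6,
    "type 2": -19.3,
    "type 3": -33.2,
    "type 4": -28.3,
}


def shift_from_reference(dg, reference=REFERENCE_DG):
    """Return dg minus the reference; positive means less stable than normal."""
    return round(dg - reference, 1)


shifts = {name: shift_from_reference(dg) for name, dg in TYPE_MEAN_DG.items()}

# Order the types from least stable (highest energy) to most stable.
least_to_most_stable = sorted(TYPE_MEAN_DG, key=TYPE_MEAN_DG.get, reverse=True)

for name in least_to_most_stable:
    label = "less stable" if shifts[name] > 0 else "more stable"
    print(f"{name}: {shifts[name]:+.1f} kcal/mol vs. reference ({label})")
```

Because these are group means, they hide the wide spread of individual values noted for type 4.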
Gymnosperm tRNAs evolved from multiple common ancestors
A phylogenetic tree was constructed using the maximum likelihood method to examine the evolutionary relationships among all gymnosperm tRNAs (Figure 3). The tree contained 2 large clusters and 32 small groups. Cluster I, with 28 groups, was much larger than Cluster II, which contained 4. Not every anti-codon type occupied its own group, and anti-codons that occurred less frequently often appeared in the same branch as other anti-codons. For example, tRNALys (CUU), which appeared only once, in Cunninghamia lanceolata, grouped with tRNAAsn (GUU), and tRNAIle (UAU), which appeared only once, in Taxus baccata, grouped with tRNAVal (GAC). Similarly, tRNALeu (GAG), which appeared twice in Ephedra equisetina, grouped with the branch of tRNAIle (GAU). In addition, tRNAThr (GGU) and tRNAPro (GGG) were each grouped twice.
At the top clade of the phylogenetic tree, the branches of tRNAfMet (CAU) and tRNATrp (CCA), tRNAAsn (GUU), tRNAArg (ACG), and tRNAArg (CCG) together presented a stepwise evolutionary relationship. However, the third tRNAArg anti-codon, UCU, did not appear with tRNAArg (ACG) and tRNAArg (CCG); instead, it formed another stepwise evolutionary relationship with tRNAGlu (UUC), tRNAGly (UCC) and tRNALys (UUU). The three anti-codons of tRNASer (GGA, UGA and GCU) occurred together and grouped with tRNAGln (UUG) in the same branch, suggesting that tRNAGln and tRNASer share a common evolutionary lineage. The three anti-codons of tRNALeu (CAA, UAA and UAG) occurred together in one branch at the bottom of the phylogenetic tree, indicating that they were the earliest evolved tRNAs in the gymnosperm chloroplast genome. The branches of tRNALeu and tRNAPro (GGG) together formed the second cluster. Therefore, tRNAPro and tRNALeu were closely related and distant from the tRNA groups of the first cluster. Moreover, tRNAThr (UGU), tRNAVal (UAC) and tRNAAla (UGC) grouped together, indicating a common evolutionary lineage. Similarly, tRNAMet (CAU), tRNAThr (GGU) and tRNAVal (GAC) were in the same branch and thus appear to share a common lineage, as do tRNATyr (GUA), tRNAPro (UGG), tRNACys (GCA) and tRNAHis (GUG). The phylogenetic tree also shows that tRNAPhe (GAA) and tRNAIle (GAU) were grouped separately; each occupied a small branch of its own instead of grouping with other types of tRNAs.
The transition rate was higher than the transversion rate
Transition refers to a change from one purine to another purine (A to G or G to A) or from one pyrimidine to another pyrimidine (C to U/T or U/T to C). Transversion refers to a change from a purine to a pyrimidine (A or G to U/T or C) or the reverse (U/T or C to A or G) [23]. Analyzing patterns of base mutations can help in understanding the molecular basis of evolution. Table 6 shows the transition and transversion rates for each tRNA as well as for the investigated gymnosperm tRNAs overall. As Table 6 shows, tRNAAsp had the highest transition rate (25.00); tRNAPhe (22.41), tRNATrp (21.50) and tRNAGlu (21.13) also had high transition rates. tRNAHis (11.40) had the lowest transition rate, and tRNAAla (12.53), tRNAGly (12.61) and tRNAMet (13.35) also had low transition rates. For transversions, tRNAAla (6.23), tRNAGly (6.19) and tRNAHis (6.80) had relatively high rates, whereas tRNAGlu (1.94), tRNAPhe (1.29) and tRNATrp (1.75) had low rates. The most distinctive group was tRNAAsp, whose transversion rate was 0. In every tRNA group, the transition rate exceeded the transversion rate. Likewise, across all tRNA genes, the transition rate (18.3) was higher than the transversion rate (3.19). Moreover, the transition rate tended to vary inversely with the transversion rate: when a class of tRNA had a higher transition rate, it often had a lower transversion rate.
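The two definitions translate directly into a rule for classifying differences between aligned sequences. The sketch below counts raw transitions and transversions between two aligned sequences of equal length. It does not reproduce the substitution-rate estimates in Table 6, and the function name and example sequences are hypothetical.

```python
PURINES = {"A", "G"}
VALID_BASES = {"A", "C", "G", "T"}


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    U is treated as T; positions with gaps or ambiguous symbols are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    for a, b in zip(s1, s2):
        if a == b or a not in VALID_BASES or b not in VALID_BASES:
            continue
        # Same chemical class (both purines or both pyrimidines) -> transition.
        same_class = (a in PURINES) == (b in PURINES)
        counts["transition" if same_class else "transversion"] += 1
    return counts


# Hypothetical aligned fragments: G->A is a transition, U->G a transversion.
print(count_substitutions("GCAUU", "ACAUG"))
```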
Duplication and loss events of gymnosperm chloroplast tRNAs
After a gene duplication event, one copy of each duplicated gene pair tends to be lost, and gene loss events occur continually [24]. We calculated the duplication and loss events of the gymnosperm chloroplast tRNAs (Figure 4). The results showed that 1,333 genes were duplicated, whereas 3,657 genes were lost. In addition, 314 genes underwent conditional duplication events. Loss events far outnumbered duplication events; the majority of chloroplast tRNAs underwent loss events during the course of evolution.