Tandem duplication of Rca gene occurred after the divergence of Poales families
To determine the origin of the tandem duplication of the Rca gene more precisely, we evaluated Rca gene structure in three distantly related monocot species: banana (Musa acuminata), date palm (Phoenix dactylifera) and pineapple (Ananas comosus; family Bromeliaceae; order Poales). Unlike the grass family, no tandem duplication of the Rca gene was observed in any of these species. Instead, these species contained three Rca gene copies on separate chromosomes (Table S1). The Rca gene copies in banana were located on chromosomes 2, 10 and 11, whereas in date palm the copies were present on two different unplaced scaffolds, NW_008246537.1 (1.6 Mbp) and NW_008246711.1 (0.506 Mbp), and were not tandemly duplicated as seen in grasses. The gene copies in pineapple were located in linkage groups (LG) LG14, LG18 and LG23. Otherwise, the structure of the gene copies was similar to that of other higher plants. In all three species, exon sizes and relative order were highly conserved among the copies, as seen in other plant species. Major variation was, however, observed in intron length (Fig. 2b). Based on these observations, we concluded that the tandem duplication event occurred after the divergence of the Poales families and is probably specific to the Poaceae.
Rca gene structure among cereals
After evaluating Rca gene structure among various grass family members, we identified and sequence-confirmed Rca gene copies from seven species (O. brachyantha, O. australiensis, O. sativa, T. aestivum, Z. mays, S. bicolor and Setaria italica). Since rice showed major variation in Rca gene structure, we also obtained sequences from seven cultivated rice accessions and two wild rice species (Oryza australiensis and Oryza brachyantha). For each accession, both genomic and cDNA copies were cloned and sequenced. While the Rca2 sequence from one of the accessions was reported earlier (To et al., 1999; Scafaro et al., 2016), we cloned and sequenced the Rca1 gene from O. australiensis (GenBank Acc. MH115971) and O. brachyantha (GenBank Acc. MH220418), and the Rca2 gene from O. brachyantha (GenBank Acc. MH660917).
Structural comparison of the Rca1 gene copy across cultivated and wild rice species revealed large deletions in exons 2, 3, 4, 5, 6 and 7, which appear to have caused a frameshift mutation in the Rca1 of cultivated rice (Fig. 3). In other species, between 11 and 13 independent deletions were detected in all four exons, resulting in a total loss of 157 to 218 nucleotides. These Rca1 deletions appear to have occurred multiple times, at different points during evolution. For example, O. barthii and O. glumipatula have a 15 bp deletion in the third exon, whereas O. nivara, O. sativa and O. rufipogon showed a 39 bp deletion (Fig. 3; Box 1). Similarly, the fourth exon of Rca1 in O. nivara, O. sativa and O. rufipogon contains two additional deletions of 13 and 24 nucleotides (Fig. 3; Box 2), which were absent in other wild species. Species-specific indels in the Rca1 gene were also identified. The 6 bp insertion in the second exon of O. punctata Rca1, a 25 bp deletion in the fourth exon of O. nivara, and a 9 bp deletion instead of the 14 bp deletion in the second exon of O. glumipatula Rca1 probably occurred after the divergence of these species. However, it is not clear why the 24 bp deletion is absent in O. sativa ssp. indica but present in most other species, including O. sativa ssp. japonica. Interestingly, the intron lengths of Rca1 were established before the major Oryza species divergence and were maintained thereafter (Fig. 3). Alignment of the RCA2 amino acid sequences of these species showed 100% similarity, except for O. brachyantha and O. punctata, which had only 95.5% and 97.4% similarity, respectively, with the other Oryza species.
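Whether an individual exonic indel disrupts the reading frame depends on whether its length is a multiple of three. The sketch below applies that rule to the indel lengths quoted above; it is an illustration only, it treats each indel in isolation, and the variable and function names are hypothetical. Several indels within one coding sequence can jointly restore or shift the frame, so the per-indel result does not by itself predict the frame of the full transcript.

```python
# Indel lengths (bp) quoted in the text for Oryza Rca1 exons.
# Each is evaluated on its own; combined effects are not modelled here.
indel_lengths_bp = [15, 39, 13, 24, 25, 9, 14, 6]


def shifts_frame(length_bp: int) -> bool:
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return length_bp % 3 != 0


frame_effect = {
    n: ("frameshift" if shifts_frame(n) else "in-frame")
    for n in indel_lengths_bp
}

for length, effect in frame_effect.items():
    print(f"{length:>2} bp indel: {effect}")
```

Under this rule the 13, 14 and 25 bp deletions are frame-disrupting on their own, whereas the 6, 9, 15, 24 and 39 bp indels remove or add whole codons.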
Rca1 gene structure in Oryzoideae subfamily
From the comparative analysis of Rca genes in Poaceae, it was evident that both the Pooideae and Panicoideae subfamilies have two functional Rca gene copies present in tandem (Fig. 1). The OsRca1 copy contained multiple small deletions that seem to have occurred after the divergence of these subfamilies. Comparison of the two wild and seven cultivated rice species revealed the presence of a putative Rca1 gene in all of them. Structurally, the Rca1 gene in O. australiensis, O. brachyantha and O. punctata contained four exons, with slight size variation among the three introns (Fig. 3). In the remaining five species, including the two accessions of cultivated rice, the lengths of the three introns were remarkably conserved. The exons of these five species showed 11 to 13 deletions (Fig. 3). Nine of these deletions were common to all five species, and the remainder were specific to one or more species. At the protein level, the largest predicted protein among O. australiensis, O. brachyantha and O. punctata was 466 aa in O. punctata. The predicted protein in O. australiensis is 465 aa, compared with 458 aa in O. brachyantha. Unlike the RCA1 of Pooideae members, these predicted peptides also contain the light-sensitive CTE domain, as identified in the RCA1 of Panicoideae members.
Thermostability conferring amino acids in RCA1 isoform of cereals
Previous studies showed that transgenic Arabidopsis plants expressing RCA variants with certain amino acid substitutions were heat tolerant, and that the modified Arabidopsis RCA showed up to a 10°C increase in stability and Rubisco activation activity (Kurek et al., 2007). One such substitution, T274R (threonine to arginine at position 274) in the variant 183H12, which alone could provide the maximal heat stability and activity, was identified in all three copies of TaRCA1 and in ZmRCA2 (Fig. 6). Other natural variants observed at the same position were glutamine (Q) in SbRCA1, SbRCA2 and ZmRCA1, and lysine (K) in ObRCA1 and ObRCA2 of O. brachyantha and in all of the RCA2 isoforms of rice and wheat. Based on these results, we hypothesize that the isoform expressed during heat stress may have better thermal stability than the isoforms expressed under normal conditions because of these putative thermal tolerance substitutions. The role of these natural RCA variants in thermal tolerance is not known, and further experiments are needed to determine the significance of these sequence changes.
Phylogenetic analysis of monocot RCA
Phylogenetic analysis of an alignment of RCA proteins from distantly related monocots, using the maximum likelihood method in MEGA7, revealed distinct clusters of RCAs based on sequence similarity and evolutionary pattern (Fig. 7). At deep nodes, the phylogenetic relationship is unclear and the bootstrap (BS) values are low (< 70%). Because the comparative analysis was nevertheless informative about how the duplicated Rca genes evolved in cereals, we did not collapse the deep nodes with lower statistical support. The RCA isoforms of banana, date palm and pineapple formed a clade distinct from the cereal isoforms, and the cereal clade was further divided into RCA1 and RCA2 from the tandemly duplicated genes. Although the RCA isoforms of banana (MaRCA1, 2 and 3), date palm (PdRCA1 and 2) and pineapple (AcRCA1, 2 and 3) clustered together, the subgroup pattern within the cluster suggests that the genes coding for these isoforms were duplicated or triplicated after the divergence of the monocot orders. Similarly, the RCA1 and RCA2 of cereals formed different clades, but within each clade the isoforms of closely related species, or of species from a shared family, showed less divergence, with high BS values. Branch and individual leaf lengths reflect the rate of substitution: the RCA1 isoforms of wild rice and Panicoideae species showed a higher rate of change than the RCA1 of Pooideae. Together, these results suggest that after divergence from a common monocot ancestor, the Rca gene copy number increased either through polyploidization or tandem duplication, and the copies evolved at different rates in the studied species.
Expression pattern of RCA isoforms in cereals
Detailed analysis of 302 ESTs from banana (Musa acuminata) and 331 from pineapple (Ananas comosus) showed that in both species only one of the three copies codes for RCA-α, whereas the other two copies code only for the β isoform (Supplementary Table S1). The EST sequences showed no evidence of alternative splicing in the RCA-α coding copy, suggesting a possible loss of the splice junction during evolution. In contrast, alternative splicing was observed in date palm (Phoenix dactylifera) based on the analysis of 305 ESTs. One of the two Rca copies, PdRca1, codes for both the RCA-α and RCA-β isoforms via alternative splicing, whereas the other gene copy (PdRca2) codes only for the RCA-α isoform.
In the EST data, MaRca1-α, MaRca2-β and MaRca3-β of banana have 101, 101 and 100 ESTs, respectively; PdRca1-α, PdRca1-β and PdRca2-α of date palm have 103, 102 and 100 ESTs, respectively; and AcRca1-α, AcRca2-β and AcRca3-β of pineapple have 110, 115 and 106 ESTs, respectively. Based on these data, the three transcript variants appear to be expressed at similar levels in each of these three plants.
To compare expression of the gene(s) at the RNA level with that at the protein level, and to identify the relative contribution of the two tandemly duplicated Rca gene copies, we performed gene-specific expression analyses using quantitative real-time PCR (qRT-PCR) for both the Rca1 and Rca2 genes of O. brachyantha, O. australiensis, O. sativa, T. aestivum, Z. mays and S. bicolor under normal as well as high-temperature conditions (Fig. 5). In cultivated rice (O. sativa), no expression of the Rca1 copy was observed under either heat stress or normal growth conditions in either of the two cultivars analyzed (Fig. 5a). O. australiensis showed some expression of Rca1 under normal growth conditions, and this expression decreased under both the 35°C and 45°C treatments. O. brachyantha, on the other hand, showed a different pattern: Rca1 expression was very low under normal growth conditions but increased under heat stress. Rca1 expression was, however, significantly lower than that of Rca2. The Rca2 copy in cultivated rice showed a very high expression level under normal growth conditions, and its expression dropped dramatically under heat stress. There was a significant difference between the two cultivated rice lines in the expression of the Rca2 copy. The two wild rice relatives showed dramatically higher expression of Rca2 under heat stress. Even between the two wild relatives, O. brachyantha showed significantly higher expression at the 45°C treatment than O. australiensis.
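The EST counts reported above for banana, date palm and pineapple can be tallied to check the stated totals (302, 305 and 331) and to see how evenly the three transcript variants are represented. The sketch below is an illustration added here, not the analysis used in this study: it computes each variant's share and a chi-square goodness-of-fit statistic against equal representation, compared with 5.991, the standard 5% critical value for two degrees of freedom. EST counts are only a coarse proxy for expression level, and the function and variable names are hypothetical.

```python
# EST counts per transcript variant, as reported in the text.
est_counts = {
    "banana": {"MaRca1-α": 101, "MaRca2-β": 101, "MaRca3-β": 100},
    "date palm": {"PdRca1-α": 103, "PdRca1-β": 102, "PdRca2-α": 100},
    "pineapple": {"AcRca1-α": 110, "AcRca2-β": 115, "AcRca3-β": 106},
}

# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CRITICAL_DF2_P05 = 5.991


def chi_square_equal(counts):
    """Chi-square statistic for the hypothesis that all counts are equal."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


totals = {sp: sum(v.values()) for sp, v in est_counts.items()}
stats = {sp: chi_square_equal(list(v.values())) for sp, v in est_counts.items()}

for species, variants in est_counts.items():
    shares = {k: round(100 * n / totals[species], 1) for k, n in variants.items()}
    verdict = "no evidence of unequal" if stats[species] < CRITICAL_DF2_P05 else "unequal"
    print(species, totals[species], shares, f"chi2={stats[species]:.3f}", verdict)
```

For all three species the statistic falls well below the critical value, which is consistent with the observation that the variants are represented at similar levels in the EST collections.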
Heat stress also affected alternative splicing, as the reduction in the level of transcripts coding for RCA2α varied during heat stress (Fig. 4). In rice, the OsRca2 gene produces OsRca2α and OsRca2β transcripts via alternative splicing. Under normal conditions, the level of OsRca2β transcripts was several hundred-fold higher than that of OsRca2α transcripts, but at 42°C the OsRca2α transcript level was equal to or higher than that of OsRca2β. Interestingly, ObRca2α transcripts were stable at all studied temperatures, whereas ObRca2β showed a more than two-fold reduction at higher temperatures. These results suggest that a temperature increase, at least among the rice species studied, favors transcripts coding for the RCA2α isoform. Even though the transcripts encoding OsRCA2α and OsRCA2β showed a significant reduction at 45°C in rice, the quantity of both isoforms in heat-stressed leaves was comparable to or higher than that in control leaves (Fig. 4).
The expression pattern of the Rca gene copies in wheat was very different from that observed in rice. Rca1 showed essentially no expression under normal growth conditions, but its expression increased dramatically under heat stress (Fig. 4b). The overall pattern was very similar among the three cultivars (Chinese Spring, Giza168 and PBW343), although significant differences in expression level were observed among them. In all three cultivars, the transcript level of the Rca2 copy was substantial at all three temperatures, although its level at 37°C was significantly higher than at either of the other two. Genotypic differences were significant among the three genotypes at each temperature. Between the two isoforms of Rca2, the transcript level of the Rca2α isoform was much lower than that of the Rca2β form and stayed about the same across the three temperature treatments (Fig. 4b). The Rca2β form showed a significant increase at the 37°C treatment but a reduction at 42°C.
In both maize and sorghum, the Rca1 copy produced only the alpha isoform, whereas the Rca2 copy produced the beta form (Fig. 4c and d). In both species, the beta form was produced at all three temperatures, although significant differences were observed among temperatures as well as among genotypes. The alpha form showed essentially no accumulation at 25°C, a slight increase at the 35°C treatment and a dramatic increase at 45°C. In general, the genotypic differences in the relative expression of the two isoforms were much larger than those observed among the wheat genotypes.
Using the same plant samples as those used for the real-time expression analysis, RCA protein level was also studied by immunoblot analysis with an RCA-specific antibody (Fig. 5). Except for wheat, all of the studied cereals expressed a 42 kDa RCAβ isoform under normal as well as heat stress conditions, whereas the 46 kDa RCAα isoform was expressed only under heat stress (Fig. 5). Relative to Rubisco, the expression level of wheat RCAβ was significantly higher at the normal temperature than under heat stress. Relative to RCAβ, expression of RCAα was much lower under normal conditions and was undetectable after the heat treatment. Although accurate quantification is difficult, the expression levels of both Rubisco and RCA in rice, maize and sorghum increased after the 45°C treatment compared with the normal temperature. Wheat, on the other hand, did not show the 46 kDa RCAα isoform under heat stress and showed a significant reduction in the RCAβ level. With the current analysis, it was not possible to determine whether the 42 kDa RCAβ isoform of wheat is expressed from the TaRca1 or the TaRca2 copy.
Regulatory elements in the Rca genes of cereals
As the two Rca gene copies of cereals showed very different expression patterns, we looked for differences in expression control elements between the promoters of the two copies. We evaluated regulatory elements in the ~1.5 kb region upstream of the transcription start site of the two Rca gene copies of the studied species, along with the intronic sequences, with a focus on heat-regulated elements (Materials and Methods). Along with the core promoter elements, this analysis revealed various putative cis-elements, including those responsive to circadian rhythm, light, hormones, and biotic and abiotic stress (Table S2). The Rca1 copy showed insertion of heat shock elements (HSEs) in its promoter, with wild rice (O. brachyantha) having three HSEs. These HSEs were missing from the Rca1 promoter of cultivated rice (O. sativa). Interestingly, Rca2 of O. sativa has HSEs in its promoter and in the 5th intron. Transposable elements (TEs) of 366 bp, 347 bp and 320 bp were present in the promoter of Rca1, the promoter of Rca2 and the 5th intron of Rca2, respectively (Fig. 1). In wheat, the Rca1 promoter of the A and B genomes showed one HSE each, and that of the D genome showed two. No HSEs were present in the promoter of Rca2 in wheat, but the promoter of the D genome copy showed insertion of a 162 bp TE. In maize (Z. mays) and sorghum (S. bicolor), both Rca1 and Rca2 had HSE insertions, with sorghum having the maximum of five HSEs in the promoter of Rca1. The promoter of Rca2 had two HSEs in sorghum and one in maize. TEs were also present in the promoter of Rca1 in both maize and sorghum (Fig. 1).
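As an illustration of how candidate HSEs can be located in a promoter sequence, the sketch below scans for a minimal HSE core, taken here to be two adjacent nGAAn units in opposite orientation (GAAnnTTC or TTCnnGAA). This is an assumption made for the example and is not necessarily the criterion or tool used in this study; the function name and the toy sequence are hypothetical, and real promoter analysis would also scan the reverse strand and apply database-derived motif definitions.

```python
import re

# Assumed minimal HSE core: two adjacent nGAAn units in opposite orientation.
# The lookahead lets overlapping cores be reported individually.
HSE_CORE = re.compile(r"(?=(GAA[ACGT]{2}TTC|TTC[ACGT]{2}GAA))")


def find_hse_cores(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched core) for each candidate HSE core."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in HSE_CORE.finditer(seq)]


# Toy sequence invented for illustration; it is not an Rca promoter.
toy_promoter = "ATGCGAAGCTTCTAGAACCTTCAGGTTCATGAAC"
hits = find_hse_cores(toy_promoter)

for position, core in hits:
    print(position, core)
```

In practice, overlapping cores such as those found in the toy sequence would usually be merged into a single longer element before counting HSEs per promoter.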