Examining the impact of drought at the transcriptome-wide level in guayule
Guayule is a drought tolerant species that has likely evolved a number of physiological mechanisms that enable it to mitigate the effects of drought prevalent in its native environment. To gain an understanding of what genes might be involved in guayule’s drought response mechanisms, we evaluated the guayule accession AZ–3 grown in plots for 29 months in Maricopa, Arizona having two contrasting irrigation regimes, I100% and I25% (Fig. 1a; (29,30). The I100% (or control treatment) was completely replenished with irrigation water, meeting measured evaporative soil water losses, while the I25% received only 25% of the irrigation given to I100%. At the time of collection in March of 2015, the 29-month-old I25% guayule plants were flowering in comparison to those grown at I100%, which were not (Additional file 1: Figure S1). Stem tissue, the predominant location of guayule rubber biosynthesis, was collected from three biological replicates in each irrigation regime for transcriptomic analysis (Fig. 1b).
Given that no guayule genome is currently available for public use, we utilized a previously published de novo assembled transcriptome generated from a mixture of 150 and 300 bp reads (13) for read mapping. This transcriptome contains >200,000 transcripts, suggesting the presence of incomplete or redundant (identical) transcripts. The presence of multiple fragments corresponding to the same transcript might confound our attempts to identify genes that are differentially expressed in response to limited water. The Stonebloom and Scheller transcriptome was filtered in two ways (Fig. 2a), collapsing the transcriptome from 219,819 transcripts to 63,672, a figure congruent with expectations. To ensure that filtering had not removed a significant number of actual transcripts, we mapped our RNA-sequencing data to both filtered and unfiltered transcriptomes and compared the number of reads that mapped to both. No differences were observed in mapping rates (~0.5% improvement in mapping to filtered set over unfiltered; Additional file 2: Table S1), suggesting that the filtered transcriptome would be sufficient for differential expression (DE) analyses.
Differentially expressed genes were identified by comparing the I25% irrigation treatment to the I100%. Of the 63,672 transcripts, 42,711 were expressed (minimum of 0.5 TPM in all replicates) in the control conditions and 43,002 in the samples grown under the limited water. Of these, 251 transcripts were upregulated under the water-limited irrigation regime whereas 393 were downregulated (Fig. 2b and Additional file 3: Table S2; adjusted p-value of 0.01). The transcript most significantly upregulated in the water-limited treatment, GFTW01080018.1 (Fig. 2c), was expressed 23-fold compared to the control treatment (~9 –fold increase observed with qRT-PCR, Additional file 4: Figure S2). In contrast, the transcript most significantly downregulated, GFTW01080137.1 (Fig. 2d), was reduced more than 200-fold to near imperceptible detection levels, a value confirmed by qRT-PCR (Additional file 4: Figure S2).
To gain an understanding of the cellular mechanisms that are involved in guayule’s response to drought, we performed a GO analysis of the significantly up- and down-regulated transcripts. An InterPro ID or shared similarity with an Arabidopsis protein-coding gene allowed us to infer biological processes for 273 of the 393 downregulated, and 163 of the 251 upregulated transcripts (Additional file 4: Table S3). Transcription factors (regulation of transcription) were the most abundant class of both up and down-regulated transcripts (Fig. 3). In agreement with previous data from drought-stressed plants, defense response, trehalose biosynthesis (31), glycosyltransferase activity (32,33), and response to water deficit were among the processes more likely to be upregulated under the water-limited irrigation treatment, whereas isoprenoid/terpenoid biosynthesis, carbohydrate metabolism, and lipid metabolism processes were more likely to be downregulated (Fig. 3).
Next, the most differentially expressed transcripts were assessed. The most significant, highly upregulated transcript, GFTW01080018.1, appears to be orthologous to the Arabidopsis PIP2s (specifically PIP2A, B, and C; Additional file 6: Figure S3), a family of aquaporins important for hydraulic regulation (34). Despite the recovery of numerous PIP2 paralogs in the genomes of Helianthus annuus and Lactuca sativa, two close relatives of guayule within the Asteraceae (35); Additional file 6: Figure S3), and three paralogs in the guayule transcriptome, only one aquaporin was differentially expressed in response to water deficit (I25%). The most significantly down-regulated transcript, GFTW01080137.1, shares sequence similarity to Arabidopsis Cold Regulated Gene 27 (COR27; AT5G42900). Interestingly, in Arabidopsis, COR27 and another cold regulated gene with little sequence similarity, COR28, are positive regulators of flowering (36). In guayule, putative orthologs for both COR27 and COR28 (GFTW01080137.1 and GFTW01127972.1, respectively) are both significantly repressed under water limited conditions, despite the near uniform flowering that was observed for these plants (Additional file 1: Figure S1). Finally, GFTW01028919.1, the transcript that displayed the greatest decrease in transcription (although not the most significant), at > 900-fold (adjusted p-value < 2E–12; Additional file 7: Figure S4) is a putative ortholog of Arabidopsis Terpene Synthase 3 (AT4G16740) and is one of 12 downregulated guayule transcripts involved in isoprenoid/terpenoid biosynthesis (Fig. 3). In sum, guayule’s transcriptomic response to water-limited conditions includes a dramatic increase in aquaporin production and defense response genes, as well as a decrease in terpenoid biosynthesis, carbohydrate metabolism, and oxidation reduction mechanisms.
Examining the evolutionary history of duplicated drought-responsive transcripts
The GO-term analysis revealed that some of the differentially expressed guayule transcripts displayed similarity to the same Arabidopsis gene, suggesting one of three possibilities: 1) an ancient expansion in a stress-responsive gene family, 2) that the transcripts are paralogs that emerged following the cross-hybridization and polyploidy event that gave rise to AZ–3, or 3) that the transcripts contain the same functional domain but bear no phylogenetic relationship. Specifically, 127 guayule stress-responsive transcripts clustered, in sets of 2 - 4 transcripts each, with 56 Arabidopsis genes. For example, the downregulated guayule terpene synthase ortholog (GFTW01028919.1) groups with AT4G16740 along with two other guayule transcripts (GFTW01072004.1 and GFTW01017460.1). We first determined if the guayule transcripts were indeed the product of a gene duplication by examining codon-guided multiple sequence alignments. Transcripts associated with roughly half (n = 27) of the Arabidopsis gene clusters either did not share a recent evolutionary past (sequence identity < 50%) or there was not enough evidence to support a gene duplication (e.g., guayule gene fragments that did not overlap one another in the alignment). The three guayule transcripts within the terpene synthase cluster with AT4G16740 shared sufficient sequence similarity to proceed forward to phylogenetic analysis, whereas three guayule transcripts that shared similarity with an Arabidopsis mitogen-activated protein kinase (MAPK16, AT5G19010) exhibited little to no similarity outside of the kinase domain and were not considered further.
To determine the timing of the guayule gene duplication events associated with the remaining 29 Arabidopsis gene clusters, we took a comparative and evolutionary approach, searching the genomes of sunflower (H. annuus; (35)) and lettuce (L. sativa; (37)) for homologs to the stress-responsive guayule transcripts and their putative Arabidopsis orthologs. We then inferred phylogenies for each of these gene families to determine when the observed gene duplication occurred. Two whole genome triplication events are shared between sunflower and guayule, with an additional, species-specific whole genome duplication event occurring in each species (Fig. 4a). Thus, we examined the resulting phylogenies for two patterns that would indicate that the guayule transcripts were the result of an Asteraceae (or earlier) duplication event (Fig. 4b, left; “Asteraceae event”). In this scenario, each of the guayule transcripts would be immediately-sister to a sunflower gene. In the event that the transcript duplication was AZ–3 specific, we would observe the duplicated transcripts first sister to each other and then to a sunflower gene (Fig. 4b, right; “AZ–3 event”). Of the 20 Arabidopsis gene clusters comprised of down-regulated guayule transcripts, 13 contained transcripts where the gene duplication was inferred to be an Asteraceae event (Fig. 4c, purple bar), 7 arose from an AZ–3 event (Fig. 4c, blue bar), and two gene clusters contained both types of duplication events. Of the nine Arabidopsis gene clusters comprised of up-regulated guayule transcripts, three of the paralogs arose from an Asteraceae event, whereas six where AZ–3 specific (Fig. 4c). One example of a AZ–3 event can be seen in the putative guayule orthologs of AT1G01060 (LHY), a transcription factor that regulates flowering and circadian rhythm (Fig. 4d, blue box). These transcripts, all of which were significantly upregulated, fall sister to one another in the phylogeny with strong bootstrap support. In contrast, the terpene synthase gene cluster, contained two guayule transcripts that were each sister to multiple sunflower genes (Fig. 4e, purple box).
Duplication and expression do not necessarily imply that the resulting transcript is capable of encoding for a protein. Pseudogenization or neo-functionalization of a locus (protein-coding gene -> long non-coding RNA) can occur through the disruption of a protein-coding gene’s open reading frame (ORF). We examined each of the gene clusters for loss of ORF integrity in at least one (but not all) of the duplicate guayule transcripts. We found that 6/20 of the down-regulated gene clusters had experienced a pseudogenization event that left them with a single protein-coding gene, whereas 7/9 up-regulated gene clusters were left with a single protein-coding transcript (Fig. 4c, tan bars). Thus, it appears that a number of stress-responsive paralogs with intact ORFs have been retained through multiple speciation events, suggesting they may help guayule mount a response to drought conditions.
A role for long non-coding RNAs in guayule’s drought response
The identification of stress-responsive transcripts that are no longer protein-coding raises the possibility of uncovering long non-coding RNAs (lncRNAs) that are also differentially expressed under the water-limited irrigation regime. While not as extensively studied in plants as in vertebrate systems, a number of plant lncRNAs have been reported to differentially expressed in response to abiotic and biotic stress (38–42), where, among many functions, they can act as regulators of transcription, microRNA sponges, and influence alternative splicing (25,43,44). Although not differentially expressed under the imposed irrigation treatments, a homolog of the deeply conserved light responsive lncRNA, HID1 (45), was present in the guayule transcriptome (Fig. 5a). As expected based on prior analyses, the protein interaction domain annotated as SL2 was highly conserved between Asteraceae, Arabidopsis, and rice (Fig. 5a), suggesting a potentially shared role for this lncRNA across flowering plants. In addition, the identification of a guayule HID1 demonstrates that the Stonebloom and Scheller transcriptome captured polyadenylated lncRNAs as well as protein-coding transcripts.
To identify putative lncRNAs, we focused on the set of differentially expressed transcripts that bore no similarity to any known protein domains (Fig. 5b). We then removed potential transposable elements (TEs) and known housekeeping RNAs (rRNAs and spliceosomal RNAs). To be conservative in our lncRNA identification, we also removed any transcripts that overlapped a protein-coding gene in the H. annuus genome, as these guayule transcripts may reflect incompletely assembled protein-coding genes resulting from technical difficulties of de novo transcriptome assembly. Following these filters, we recovered 31 putative lncRNAs that were down-regulated and 39 that were up-regulated in response to drought (a complete list can be found in Additional file 8: Table S4).
We then took an evolutionary approach to identify putative lncRNAs for which we could recover sequence homologs in other species under the premise that conservation implies functionality (26). Of the 70 guayule putative lncRNAs, we identified a sequence homolog for 14 in the sunflower genome (Fig. 5c). We uncovered evidence of conservation for three lncRNAs in the lettuce genome, suggesting that these loci emerged at least ~39 million years ago. Four of the fourteen sunflower conserved lncRNAs were also annotated as lncRNAs in that system, with one also annotated as a lncRNA in lettuce, lending additional confidence in their lncRNA designation (Fig. 5c).
Next, an attempt to assign a function to these putative lncRNAs beyond “stress-responsive” was made. Our experimental design lacked depth to attempt a “guilt-by-association” analysis, and the absence of a guayule genome precludes the association between a lncRNA and the neighboring protein-coding gene it might regulate. Therefore, we focused on whether the set of guayule lncRNAs might be involved in sequestering miRNAs away from their intended targets, or in miRNA or phasiRNA, biogenesis. Using psRNAtarget (46), we predicted whether miRNAs might bind to the 14 lncRNAs for which we identified sequence homologs in sunflower. We then scanned the homologous locus in sunflower (and in lettuce) for conservation of the miRNA binding site. Using this approach we identified six lncRNAs with conserved miRNA binding sites (Fig. 5c; Additional file 8: Table S4). One of the guayule lncRNAs conserved and annotated as a lncRNA in both sunflower and lettuce, GFTW01168370.1, harbors a completely conserved binding site for miR166 (Fig. 5d), a microRNA associated with tissue development and whose knockdown in Arabidopsis leads to an enhanced drought response (47). As a miRNA sponge, GFTW01168370.1 would act to recruit miR166 away from its intended target, in short mimicking the knockdown response reported in Arabidopsis. Thus, within the dataset of drought-responsive transcripts, a subset was identified that showed the hallmarks of being lncRNAs. Several of these lncRNAs contain conserved miRNA binding sites, with one in particular likely helping to mediate the guayule drought response.