Background Data for a “Sense of Place”
To test the hypothesis that the transcript abundance of grape berries during the late stages of ripening differed in two locations with widely different environmental conditions, we compared the transcript abundance of grape berry skins in BOD and RNO. The vineyards were originally planted in RNO in 2004 and in BOD in 2009. The RNO vines were grown on their own roots, whereas the BOD vines were grafted on to SO4 rootstock. A vertical shoot positioning trellis design was used in both locations. There were a number environmental variables that differed between the two locations. BOD is located at a slightly more northern latitude than RNO. This resulted in slightly longer day lengths in BOD at the beginning of harvest and slightly shorter at the end of harvest (Table 1). On the final harvest dates, the day length differed between RNO and BOD by about 30 min.
RNO had warmer average monthly maximum temperatures than that in BOD, but minimum September temperatures were cooler in RNO (Table 1). Thus, RNO had a larger average daily day/night temperature differential of 20°C, whereas BOD had a smaller average daily day/night temperature differential of 10°C during the harvest periods. RNO had warmer day temperatures by about 6°C and cooler night temperatures by about 4°C than that of BOD.
The RNO site was much drier than the BOD site (Table 1). September monthly precipitation totals were 65.5 and 2.03 mm and had average relative humidities of 74 and 34% for BOD and RNO, respectively. The soil at the BOD site was a gravelly soil with a pH of 6.2 and the soil at the RNO site was a deep sandy loam with a pH of 6.7. No pathogen, nutrient deficiency or toxicity symptoms were evident in the vines.
Transcriptomics
The analysis of transcript profiles of Cabernet Sauvignon grapes harvested in RNO in September of 2012 was previously described [4]. Individual berry skins were separated immediately from the whole berry and the individual total soluble solids (°Brix) level of the berry, which is mostly composed of sugars, was determined. The Cabernet Sauvignon berry skins from BOD were harvested in a similar manner as the RNO berry skins. The BOD berry skins were harvested from the middle of September in 2013 until the end of the first week of October (Table 1). Berry skins were separated and analyzed in the same manner as in RNO with the same Illumina technology. Grapes were harvested at a lower °Brix range in BOD (19.5 to 22.5°Brix) than in RNO (20 to 26°Brix) because fruit maturity for making wine is typically reached in the BOD region at a lower sugar level.
Transcript abundance of the RNA-Seq reads from both RNO and BOD was estimated using Salmon software [24] with the assembly and gene model annotation of Cabernet Sauvignon [25, 26]. The TPM (transcripts per million) were computed for each gene from each experimental replicate (n = 3) from berry skins at different sugar levels ranging from 19 to 26°Brix (Additional File 1). Principal component analysis of the transcriptomic data showed clear grouping of experimental replicates with the largest separation by location (principal component 1 (PC1) = 51% variance) and then °Brix (principal component 2 (PC2) = 22% variance) of the berry skin samples (Fig. 1).
To get different perspectives of the data, three approaches were used to further analyze the transcriptomic data. One focused on expression at one similar sugar level in both locations. Another identified a common set of genes whose transcript abundance changed in both locations. And the third one was a more comprehensive network analysis using all of the sugar levels and the two locations.
We chose two very similar sugar levels to determine the differential gene expression between the two locations, since sugar levels were not exactly the same at harvest. We identified 5528 differentially expressed genes (DEGs, at an FDR padj-value < 0.05) between the two locations in approach 1 at the sugar level closest to the 22°Brix level (21.5°Brix in BOD and 22°Brix in RNO) using DESeq2 [27] (Additional file 2). DEGs will refer to this set of differentially expressed genes throughout this manuscript. Gene set enrichment analysis using topGO determined the top gene ontology (GO) categories for biological processes for these 5528 genes (Additional file 3). Based on the number of genes identified, the top GO categories were cellular metabolic process (3126 genes, padj-value = 2.3E-03), biosynthetic process (2371 genes, padj-value = 7.7E-09), and response to stimulus (2324 genes, padj-value = 1.21E-26). Other important and highly significant categories were response to stress (1514 genes, padj-value = 5.69E-24) and developmental process (1280 genes, corrected p-value = 8.09E-12). There were 910 GO categories in total that were significantly enriched (Additional file 3). The relationship between the top 25 GO categories can be seen in Additional file 4. We use the term “significantly” throughout this text to mean statistically significant at or below a padj-value of 0.05. Amongst the top stimulus subcategories with the largest number of genes were response to abiotic stimulus (950 genes; padj-value = 9.1E-29), response to endogenous stimulus (835 genes, padj-value = 1.43E-21; 256 of which were related to response to abscisic acid), response to external stimulus (719 genes, padj-value = 1.08E-24), and biotic stimulus (520 genes, padj-value = 5.29E-22). Some other significant environmental stimuli GO categories included response to light stimulus (234 genes), response to osmotic stress (171 genes), and response to temperature stimulus (158 genes).
In approach 2, we examined which gene expression was changing with °Brix level in both locations to identify a common set of genes differentially expressed during berry development with very different environmental conditions. The significant differences in transcript abundance in each location was determined with DESeq2 using the lowest °Brix sampling as the control. For example, the control sample in RNO was the lowest sugar sampling at 20 °Brix; the transcript abundance of the three higher °Brix samplings were compared to the transcript abundance of the control. The genes that had significantly different transcript abundance relative to control in at least one of the comparisons were identified in RNO and BOD. These gene lists were compared and the common gene set consisting of 1985 genes for both locations was determined (ap2 tab in Additional file 5). Comparing this common gene list (ap2) to the DEGs from approach 1 identified 907 genes that were common to both sets, indicating that this subset was differentially expressed between the locations at 22°Brix. The other 1078 genes did not differ significantly between locations. This 1078 gene subset list can be found in Additional file 5 (ap2-ap1 tab). The GO categories most enriched in this gene set included response to inorganic substance, response to abiotic stimulus and drug metabolic process.
In approach 3, using a more powerful approach to finely distinguish the expression data for all sugar levels, Weighted Gene Coexpression Network Analysis (WGCNA) identified gene sets common to (based upon correlation) and different gene expression profiles between BOD and RNO. All expressed genes for all °Brix levels (Additional file 1) were used in this analysis. Additional details of the analysis are described in the Materials and Methods section. Twenty-one modules or gene subnetworks were defined (Additional file 6) and a heat map was generated displaying the module-trait relationships (Additional file 7). The grey module is not a real module but a place to put all genes not fitting into a real module; thus, it was not counted as one of the twenty-one gene modules above. Eight modules had similar gene expression profiles for BOD and RNO (padj-value > 0.05); these included cyan, midnightblue, pink, green yellow, salmon, blue, grey60, and royalblue. This gene set consisted of 8017 genes (see ap3 tab in Additional file 5 for the gene list). Comparing this common gene set from the WGCNA with the DEGs from approach 1 revealed that 524 genes in common were found in both sets. This subset was removed from the WGCNA to produce a gene list of 7492 common to both locations and not differing in their transcript abundance at 22°Brix (ap2-ap1). This represents 25% of the total 29,929 genes in all of the modules. This gene set was compared with the ap2-ap1 gene set from approach 1 and 845 genes were found in common in both sets. The remainder from ap2-ap1 provided an additional 232 genes to the common set of genes from ap3-ap1 not affected by location giving a total number of 7724 genes, representing 25.8% of the genes expressed. This gene set is listed in ap2-ap1_union_ap3-ap1 tab in Additional file 5. The GO categories most enriched in this gene set included general categories such as organic substance biosynthetic process and organelle organization. There were 785 enriched GO categories in total.
In approach 3, further analysis of the gene modules using gene set enrichment analysis was performed with genes that had a kME > 0.80 for each module in the WGCNA (Additional file 8). The similar gene sets in common with both locations with decreasing transcript abundance as sugar levels increased (negative correlation with °Brix) were enriched with the GO categories involving growth and water transport (blue module), and translation (grey60 module). The common gene sets with increasing transcript abundance as sugar levels increased were enriched with the GO categories involving gene silencing (cyan module), aromatic compound metabolism (midnight blue), organic substance catabolism (pink module), and DNA metabolism (salmon).
Most modules were positively or negatively correlated with BOD and RNO berries (e.g. black, yellow, red, turquoise, etc.). The turquoise module was the largest module and consisted of 5029 genes; it had the most positive and negative correlations for BOD and RNO, respectively (Additional file 7). This gene set was similar to the DEGs defined by DESeq2 with the largest differences between BOD and RNO at 22°Brix. Gene set enrichment analysis of genes within the turquoise module having a kME of 0.80 or higher (1090 genes) revealed many common GO categories with the DEGs (Additional file 8); 81% (481 of 594) of the GO categories from the turquoise module subset were also found in the 910 GO categories of the DEGs (53% of total). Some of the most enriched GO categories in the turquoise module were organic acid metabolism, flavonoid metabolism, lipid biosynthesis, response to abiotic stimulus, isoprenoid metabolism, response to light stimulus and photosynthesis. The gene expression profiles of this module declined in transcript abundance with increasing sugar levels (negative correlation with °Brix).
The yellow module was another large module (3008 genes) that was the second most positively correlated with BOD. This module was highly enriched with GO categories involving biosynthesis, defense responses and catabolic processes. The WRKY75 gene (g104630; this g# term is used as an abbreviated gene loci name in the Cabernet Sauvignon genome throughout this paper) was in the top 4 hub genes (kME = 0.97) in the yellow module (Additional file 6). WRKY75 is a transcription factor that positively regulates leaf senescence. It is induced by ethylene, ROS (reactive oxygen species) and SA (salicylic acid) and is a direct target of EIN3 (ethylene insensitive 3) [28].
The green (2287 genes) and brown (4147 genes) modules were also large modules that were most positively correlated with RNO (0.92 and 0.9, respectively). The green module was highly enriched in the GO category involving response to chemical. The brown module was highly enriched in GO categories involving multiple catabolic processes.
Thus, the WGCNA results, which utilized all of the expressed genes from all °Brix levels were consistent with the DESeq2 results that only compared transcript abundances at 22°Brix or between locations. The WGCNA results were more comprehensive and complemented the results of approaches 1 and 2 by identifying hub genes and gene subnetworks. These subnetworks were linked and further defined by their highly correlated coexpression profiles and enriched GOs.
Transcriptomic profiles dynamically changing with sugar levels
DEGs with largest increases in transcript abundance between sugar levels
As a first approach to examining the 5528 DEG dataset, differences between the transcript abundance in berries with the lowest and highest °Brix levels at the two locations were determined (Additional file 2). Eight examples of the many DEGs with the largest transcript abundance differences from BOD and RNO (EXL2 (exordium like 2, g068700), HB12 (homeobox 12, g223410), BSMT1 (benzoate/salicylate methyltransferase 1, g336810), HAD (haloacid dehalogenase-like hydrolase protein, g070140), STS24 (stilbene synthase 24, g435870), NAC073 (NAC domain containing protein 73, g125400), TPS35 (terpene synthase 35, g087040), and MAT3 (methionine adenosyltransferase 3, g013310)) were selected and are presented in Figure 2. The major point of showing this plot is to highlight the general trends of continuously increasing transcript abundance with sugar levels for these genes; half of these genes start at similar transcript abundance levels around 20°Brix for both BOD and RNO berries and increase in transcript abundance at a higher rate in BOD grapes as sugar levels increase. The other half increased at approximately the same rate for both locations but had higher amounts at the BOD location at the same sugar levels. These data were fitted by linear regression to compare the slopes of the lines. The slopes were significantly higher for EXL2, BSMT1, STS24, and TPS35 for BOD as compared to RNO berries, but not for the other four genes (data not shown). The significantly increased rate of change for transcript abundance in the BOD berries indicated that the berries in BOD may have ripened at a faster rate relative to sugar level.
To get deeper insights into these dynamic gene sets from BOD and RNO, gene set enrichment analysis of the top 400 DEGs with the greatest increase in transcript abundance from the lowest sugar level to the highest sugar level was performed for each location. The top 400 DEGs for BOD berries were highly enriched in biosynthetic processes involving amino acid and phenylpropanoid metabolism (Additional File 9); defense responses, response to fungus and response to ethylene stimulus were other highly enriched categories. The top 400 DEGs for RNO berries were enriched in response to oxygen-containing compound, response to hormone and response to abscisic acid (Additional File 10).
DEGs with largest decreases in transcript abundance between sugar levels
Eight examples of DEGs with the greatest decrease in transcript abundance with increasing sugar levels are presented in Figure 3. They include lipid and cell wall proteins (e.g. extensin like and lipid transferase proteins) and an aquaporin (TIP1;1, tonoplast intrinsic protein 1;1). The data were fitted to linear regressions and the slopes statistically compared between BOD and RNO berries. In some cases, the slopes of the linear regression lines of the DEGs were not statistically different (bifunctional inhibitor lipid transfer protein and DUF642 (domain of unknown function 642); but in the other cases presented, there were similar amounts of transcript abundance around 20°Brix, but there were significantly different slopes. Again, there is a trend for the transcript abundance of many of these genes to change more significantly in BOD berries than RNO berries relative to sugar level.
Differences in autophagy genes between BOD and RNO
Berry ripening involves autophagy [4]. Generally, there was an increase in the transcript abundance of genes involved in autophagy as sugar level increased and the transcript abundance in BOD berries was higher relative to RNO berries at the same sugar level (Fig. 4). These trends are consistent with the hypothesis that BOD berries ripened faster than RNO berries relative to sugar level.
Aroma- and flavor-associated DEGs
Many aroma and flavor-associated compounds are synthesized in the late stages of berry ripening and sensitive to the environment. The major metabolic pathways affecting flavor and aromas in grapes and wines include the terpenoid, carotenoid, amino acid, and phenylpropanoid pathways [6]. These pathways were identified by topGO to be highly enriched in the DEGs and the turquoise module. Some of the DEGs are involved in these pathways and will be presented in the following subsections.
Terpene synthase genes with the greatest transcript abundance differences between BOD and RNO
Cultivar differences in berries are often ascribed to differences in aroma compounds. One of the main classes of cultivar specific aroma compounds is the terpene group [29]. The transcript abundance of a number of terpene synthases were higher in BOD berries as compared to RNO berries (Fig. 5). All but one of these (terpene synthase 55; TPS55) increased in transcript abundance with increasing sugar levels. TPS35 is a b-ocimine synthase (b-ocimine is a main component of snapdragon flower aroma [30]). TPS08 is a g-cadinene synthase, TPS26 is a cubebol/d-cadinene synthase, TPS4 and 10 are (E)-a-bergamotene synthases, and TPS07 and 28 are germacrene-D synthases; these enzymes produce sesquiterpenes found in essential plant oils (see [29] and references therein for the function of all of these terpenoid genes). TPS55 is a linalool/nerolidol synthase which synthesizes acyclic terpene alcohols; linalool contributes significantly to the floral aromas of grape berries and wines. TPS68 is a copalyl diphosphate synthase involved in diterpenoid biosynthesis and TPS69 is an ent-kaurene synthase. Both TPS68 and 69 are diterpene synthases and are part of the ent-kaurene biosynthesis pathway.
Other terpenoid and carotenoid metabolism-related genes
Carotenoid metabolism is another biosynthetic pathway that contributes to flavor and aroma in grapes [31]. There are a number of key genes that contribute to terpenoid and carotenoid metabolism that have a higher transcript abundance (Additional file 2) in BOD berries as compared to RNO berries. For example, DXR (1-deoxy-D-xylulose 5-phosphate reductoisomerase, g360850) catalyzes the first committed step and HDS (4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase; g379980) enzyme controls the penultimate steps of the biosynthesis of isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) via the methylerythritol 4-phosphate (MEP) pathway. Other examples are two phytoene synthases (PSY), g180070 and g493850); PSY is the rate-limiting enzyme in the carotenoid biosynthetic pathway.
Amino acid and phenylpropanoid metabolism genes
Amino acids contribute to the aroma and flavor of grapes and wines [7]. The amino acid metabolism functional GO category is highly enriched in the group of DEGs between BOD and RNO (Additional file 3) and more specifically in the top 400 BOD DEGs (Additional file 9). Some examples of genes involved in amino acid metabolism that have a higher transcript abundance in BOD berries (see Fig. 6) are phenylalanine ammonia lyase 1 (PAL1, g533070 and eight other paralogs can be found in Additional file 2), which catalyzes the first step in phenylpropanoid biosynthesis, branched-chain-amino-acid aminotransferase 5 (BCAT5, g220210), which is involved in isoleucine, leucine and valine biosynthesis, 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase 1 (DHS1, g082490), which catalyzes the first committed step in aromatic amino acid biosynthesis), and tyrosine aminotransferase 7 (TAT7; g116950), which is involved in tyrosine and phenylalanine metabolism. Included in this group were 44 stilbene synthases (STS), which are part of the phenylpropanoid pathway; these STSs had a higher transcript abundance in BOD berries as compared to RNO berries, with very similar transcript abundance profiles to PAL1 (see Additional file 11 for two typical examples).
DEGs associated with abiotic stimuli
Light-responsive genes
In a previous analysis, WGCNA defined a circadian clock subnetwork that was highly connected to transcript abundance profiles in late ripening grapevine berries [4]. To compare the response of the circadian clock in the two different locations, we plotted all of the genes of the model made earlier [4]. Most core clock genes (Additional file 12) and light sensing and peripheral clock genes (Additional file 13) had significantly different transcript abundance in BOD berries than that in RNO berries at the same sugar level (profiles bracketed in a red box). All but one of these (PHYC, phytochrome C, g088040) had higher transcript abundance in BOD berries relative to RNO berries. The transcript abundance of other genes had nearly identical profiles (not bracketed in a red box). These data are summarized in a simplified clock model (Fig. 7), which integrates PHYB as a key photoreceptor and temperature sensor [32, 33] that can regulate the entrainment and rhythmicity of the core circadian clock, although to be clear it is the protein activity of PHYB, not the transcript abundance that is regulating the clock.
Chilling-responsive genes
Temperatures were colder in RNO than BOD, reaching chilling temperatures in the early morning hours. A number of previously identified chilling-responsive genes [34] in Cabernet Sauvignon had a higher transcript abundance in RNO berries as compared to BOD berries (Fig. 8). These genes included CBF1 (C-repeat/DRE binding factor 1, g435450; previously named CBF4, but renamed to be consistent with the ortholog of Arabidopsis), a transcription factor that regulates the cold stress regulon [35], IDD14 (Indeterminate-Domain 14, g000790), a transcription factor that generates an inhibitor to regulate starch metabolism [36], CML41 (calmodulin-like 41, g041290), that encodes a calmodulin-like protein, CYSB (cystatin B, g023260), a cysteine proteinase inhibitor that confers cold tolerance when overexpressed [37], XTH23 (xyloglucan endotransglucosylase/hydrolase 23, g572510), that encodes a cell wall loosening enzyme, and SULTR3;4 (sulfate transporter 3;4, g392710).
DEGs associated with biotic stimuli
The top DEGs in berries from BOD were highly enriched in the GO category for biotic stimuli including genes encoding pathogenesis proteins (PR). The transcript abundance of such genes in BOD berry skins was higher than those in RNO berries (Fig. 9). The transcript abundance of PR10 (g212910) increased with increasing sugar level. This gene responds to powdery mildew (Erysiphe necator) inoculation in Cabernet Sauvignon leaves [38]. There were other genes induced by powdery mildew that were also at much higher transcript abundance levels in BOD berries than in RNO berries, such as a PR3 protein that is a class IV chitinase, a thaumatin-like protein (PR5) and a number of the stilbene synthases (see the phenylpropanoid metabolism section above and Additional file 11). MLA10 (Intracellular Mildew A 10, g343420; Affymetrix probe set 1615715_at in [38]) matches to a fungal protein from E. necator. In that study, it was used as a control probe set to detect the presence of powdery mildew [38]. There was a higher transcript abundance of g343420 in BOD berries than that in RNO berries. These results indicate that there may have been a higher powdery mildew infection in BOD berries along with a higher induction of the phenylpropanoid pathway.
DEGs associated with hormonal stimuli
Auxin signaling genes
Auxin transport (38 genes; padj-value = 4.43E-08) and cellular response to auxin stimulus (45 genes, padj-value = 9.12E-05) were highly enriched GO categories for the DEGs (Additional file 3). Auxin is known to have multiple effects on grape berry ripening [39, 40]. Auxin can delay berry ripening at the veraison stage, which is at the beginning of berry ripening. Some auxin metabolism (GH3.1; GH3 family protein, g538930) and signaling genes such as IAA13 (indole acetic acid 13, g527400), IAA27 (g326620), and ARF5 (auxin-response factor, g075570) had a higher transcript abundance in RNO berries (Additional file 1). Other auxin metabolism (GH3.6, JAR1 (jasmonate resistant 1), g170030) and auxin signaling genes had a higher transcript abundance in BOD including ARF2 (g469780), ARF8 (g180460), ARF11 (g380160), IAA16 (g318830), ARAC1 (Arabidopsis RAC-like 1, g320970), and GID1B (gibberellic acid insensitive dwarf1B, g071190), a gibberellin receptor (Additional file 1).
ABA metabolism and signaling genes
ABA is a stress hormone that responds to water deficits in grapevine [41]. A number of ABA-related genes are differentially expressed in berry skins between the two locations (Additional file 14). NCED3 (nine-cis epoxycarotenoid dioxygenase 3, g221190) and NCED5 (g404590), which are responsive to water deficit [42, 43], had higher transcript abundance in RNO and NCED6 (g203160), which is highly expressed in embryos [42], had higher transcript abundance in BOD. NCED6, but not NCED3 is involved in seed ABA and seed dormancy [43]. Additionally a number of other genes involved in the ABA signaling pathway had higher transcript abundance in BOD including ABF2 (abscisic acid responsive elements binding factor 2, g286950), ABF4 (g312300) and ABCG40 (adenosine triphosphate binding casette G 40, g143240) [44, 45]. Interestingly, BAM1 (Barley Any Meristem 1) was identified to be the receptor to a root signaling peptide hormone (CLE25, clavata3/esr-related 25) that responds to water deficit and upregulates NCED3 transcript abundance in Arabidopsis leaves [46]. The transcript abundance of BAM1 (g220020) was significantly higher in RNO berries than that of BOD berries (Additional File 14). There were no significant differences in the transcript abundance of CLE25 (g007470); it was highly variable.
Ethylene signaling genes higher in BOD
There were 71 DEGs that were enriched in the response to ethylene GO category (Additional file 3). Ethylene is a stress hormone that responds to many types of biotic [47] and abiotic [48] stresses in addition to its role in fruit development and ripening [49]. Many ethylene-related genes had a higher transcript abundance in BOD berries. These included ethylene biosynthesis, ethylene receptors and ERF (ethylene response factor) transcription factors (Fig. 10). ERF1 and ERF2 are at the beginning of the ethylene signaling pathway and are direct targets of EIN3 [45, 50]. Other ERF transcription factors (e.g. ERF98; g156210) identified as hubs in the ethylene signaling pathway in Arabidopsis leaves [51] were also differentially expressed in a similar manner as ERF1 (g060690) and ERF2 (g482650) between the two locations (data not shown).
DEGs associated with mineral nutrients
Iron-related genes
Fourteen DEGs were associated with genes enriched in response to iron ion (Additional file 3); Eight examples of DEGs involved in iron homeostasis are shown in Figure 11. Iron homeostasis genes SIA1 (salt-induced ABC kinase, g336700), VIT1 (vacuolar iron transporter 1, g001160), ATH13 (Arabidopsis thaliana ABC2 homolog 13, g146610), IREG3 (iron regulated 3, g098530), and ABCI8 (ATP-binding cassette I8, g163790) have higher transcript abundance in BOD berries than in RNO berries. Iron homeostasis genes YSL3 (yellow stripe-like 3, g223320), FER1 (ferritin 1, g606560), and NRAMP3 (natural resistance-associated macrophage protein 3, g413920) had higher transcript abundance in RNO berries compared to BOD berries. Several other ferritin genes were expressed similarly to FER1 (data not shown). Average available iron soil concentrations were about 5 times higher in the BOD vineyard soil compared to the RNO vineyard soil (Table 1).