Global expression patterns
We use a replicate population of the DSPR comprising >800 RILs. This population was developed from eight inbred founder lines that have been fully genetically characterized (full sequences, the haplotype structure inferred, ~1.2 million SNPs identified, and the RILs genotyped at >10,000 SNPs). We generated a single outbred panel from 835 RILs by intercrossing the lines for five generations. Resulting flies were reared on three experimental diets (DR, C, and HS) for 10 days post-eclosion before isolation of total RNA from pools of 100 female fly tissues (head, body and ovary pair) in six replicates for each tissue-diet combination (Fig. 1). These 54 RNA samples (18 for each diet) were sequenced single end, generating a total of 35,572 transcripts, out of which 18,678 remained for analysis after removal of transcripts with a variance across samples of less than one [73]. Overall expression levels were generally consistent across diet treatments and tissues (Fig. 2a). One sample (bodies, B2) in the DR treatment showed slightly lower median expression compared to the rest, but was similar enough to the others and was retained in the analysis.
To assess global expression patterns over tissues and diets we performed principal components analysis (PCA) on all samples using an expression matrix from which batch effects had been removed (Fig. 2). A similar figure prior to batch removal is shown in Additional file 1. As expected, tissue effects strongly dominated variance in the first two components which jointly accounted for 94% of the total variance. PC1 which explains 65% of the variance in expression presents non-overlapping separation of tissue expression, although body and head expression appear somewhat similar compared to the ovaries. PC2 (29%) distinguishes expression in bodies from that in heads and ovaries.
Differential gene expression in response to diet
We used DESeq2 to quantify differential gene expression in head, ovary and body samples obtained from adult flies held on C, DR, and HS diet treatments. We obtained lists of genes significantly differentially expressed due to the main effect of diet. After filtering out genes with a low overall count, a total of 12,614 genes remained in the experiment based on which we report all subsequent results. Of these, 2,475 genes (19.6%, Additional file 2) were differentially expressed in response to diet treatment, and 978 (7.8%, Additional file 3) for the interaction between diet and tissue (LRT, Padj < 0.05). The overall expression differences are visualized for each tissue and diet pair in Figure 3. Overall, relative to the C diet, many genes in all organs were expressed in the same direction in the DR and HS diets, meaning that the genes that have increased expression in the DR diet tend to also have increased expression in HS, and vice versa. This is indicated by the positive relationship between the fold changes for each of these diets (bodies: r = 0.64; heads: r = 0.59; ovaries: r = 0.59) and the proportion of genes that trend in the same direction for these two diets (i.e. number upregulated in both + number downregulated in both divided by the total number of genes; bodies: 0.70; heads: 0.82; ovaries: 0.66). However, this observed relationship between fold changes could be a result of comparing two ratios that are both calculated relative to the same reference diet (C), as randomly generated data will produce a positive relationship between these quantities and greater than 50% would be expected to show a fold change in the same direction. Several lines of evidence suggest this trend is biologically meaningful and not simply a result of comparing ratios. First, PCAs performed for each tissue separately show that clusters for DR and HS diets overlap for both bodies and heads, while the C diet forms its own cluster (Fig. 4). For ovaries, all three diets form separate clusters. Second, we calculated fold changes using both other diets as the reference diet and compared the correlation and proportion of genes trending in the same direction. In all cases, the correlation we observe between the DR and HS fold changes relative to C are higher than the correlations we observe for the other pairs of diets (Additional file 4). This also held true when comparing the proportions of genes that trend in the same direction for bodies and heads. In ovaries, the highest proportion trending in the same direction was observed for HS and C relative to DR (Additional file 4). Third, we performed 100 permutations of our expression data randomizing across the diets but constraining this to two randomly selected samples from each diet to ensure we obtained null datasets with no expectation of a diet effect and calculated pairwise fold changes, which allowed us to calculate empirical p-values (see Methods for details; Additional file 1). Only the comparison between DR and HS showed significant relationships, with no other comparison yielding a p-value less than 0.1 for either the correlation or the proportion trending in the same direction (Additional file 4. For heads, the proportion trending in the same direction is significantly greater than expected by chance (empirical p = 0.01). For ovaries, the correlation is significantly greater (empirical p = 0.04) and for bodies, the correlation is marginally significant (empirical p = 0.08). This general trend suggests a similar change in global transcription pattern in response to both the DR and HS diets relative to the C diet, despite their very different compositions by weight and subsequently their caloric content. Further, the 2,475 DEGs for the main treatment effect were distributed across all diet-tissue combinations (Fig. 5), making it challenging to narrow down to a smaller list of genes for further examination.
Gene set enrichment analysis
We performed gene set enrichment analysis (GSEA) on the significantly differentially expressed genes (i.e. 2,475 DEGs) for the main effect of diet, using the fold changes for each diet-tissue combination to identify pathways and gene sets which were significantly perturbed relative to all DEGs in the model. Of these pairwise comparisons, only DR versus HS in bodies and DR versus C in bodies showed evidence for significantly enriched gene sets/pathways at an FDR Padj. < 0.05 (Benjamini & Hochberg procedure). We identified four pathways showing gene set level changes for bodies in DR relative to HS: Metabolic pathways (two-sample t-test, mean change = 5.38, FDR = 2.94e-06), Carbon metabolism (two-sample t-test, mean change = 3.31, FDR = 2.26e-02), Oxidative phosphorylation (two-sample t-test, mean change = 2.95, FDR = 4.52e-02), and Protein processing in endoplasmic reticulum (two-sample t-test, mean change = 2.83, FDR = 4.52e-02, Additional file 1). Notably, metabolic pathways (dme01100), which was most significantly enriched, is a large group of pathways in the KEGG database (https://www.genome.jp/kegg-bin/show_pathway?dme01100). At the default threshold (FDR Padj. < 0.1) in GAGE, ten more pathways appeared for DR relative to HS in bodies (Additional file 5). These additional pathways encompass three main metabolic themes: carbohydrate, amino acid and protein, and drug/xenobiotics. For the comparison of DR vs C in bodies, oxidative phosphorylation (dme00190) was significantly enriched (two-sample t-test, mean change = 3.2, FDR Padj. = 7.36e-02).
Further, we examined GO term gene set enrichment for biological process (BP) to capture significant diet-related differences occurring below the level of pathway. Four terms were enriched at an FDR Padj < 0.01. Small molecule metabolic process was enriched for the DR vs HS comparison in bodies (mean change = 4.49; Padj = 5.84e-3). Cell communication (mean change = 5.10; Padj = 1.83e-4), signaling (mean change = 5.06; Padj = 1.83e-4), and signal transduction (mean change = 4.56; Padj = 1.37e-3) were all enriched for the HS vs C comparison in heads. At an FDR Padj. < 0.05, 41 unique enriched terms were observed , of these, 34 terms were enriched for HS relative to C diet in heads (Additional file 5). These terms highlighted a broad range of themes including signaling, metabolism, growth, cytoskeleton, gene expression and development. Three terms were enriched for HS relative to C in bodies, including cell communication, signaling, and system process. The remaining six terms were all for the HS diet relative to DR in bodies, all within one theme of metabolism (acid, small molecule, carbohydrate). No terms were enriched for the comparisons in ovaries. To understand broader inclusive processes represented by these GO terms, we evaluated our list for ancestral terms using QuickGO (EMBL-EBI https://www.ebi.ac.uk/QuickGO/). Nine ancestral terms at the same hierarchical level immediately below category BP were observed (metabolic process, biological regulation, cellular process, localization, response to stimulus, cellular component organization, multicellular organismal process, growth, and developmental process). Among these, metabolic process, cellular process, and developmental process had the most connections to child terms. Our GSEA analysis therefore highlights multiple pathways and biological processes triggered by diet changes, especially in bodies and heads, and encompassing broad themes from metabolism to signaling to homeostasis, but none of the canonical nutrient sensing pathways such as IIS/TOR and FOXO signaling pathways. Notably, our results do not show particular enrichment of diet-associated terms in ovaries, at least for biological processes.
Diet-induced gene coexpression
Next, we asked how diet treatment affected the correlation patterns among genes (i.e. co-expression) across samples. To identify sets of genes that are highly correlated in their expression patterns (or modules), we performed hierarchical clustering on a batch-controlled, rlog transformed expression data including all replicate samples over all treatments using WGCNA [74]. We first computed a matrix of pairwise correlations for all genes on which we performed hierarchical clustering to produce module assignments. We then used a resampling procedure to determine if genes were correctly assigned to modules (see Methods for details and literature). Setting the minimum module size to 30 genes, a total of 31 modules were detected (range gene number 39 - 3,240), with 1,049 unassigned genes (grey module). After merging highly similar modules (i.e. eigengene correlation r > 0.9, see methods), 21 modules were identified with an additional module holding all unassigned genes (Additional file 5).
To appreciate module-level effects of diet and tissue on coexpression, we visualized eigengene expression across diets (Fig. 6, Additional file 6). It is clear from these plots that some modules showed greater diet by tissue interaction effects than others (e.g. e, f, m, q, s and v). These modules show either reduced or increased expression for one or two tissues in one or two diets. To gain better insight into these intra-modular effects of diet and diet-tissue interaction, we fit an analysis of variance model (ANOVA) to module eigengenes. For the main effect of diet, all modules turned up significant (FDR Padj. < 0.05), except modules c (Fig. 6). Similarly, for the effect of the interaction between diet and tissue, all modules showed a significant effect (FDR Padj. < 0.05), except module a.
Focusing on the modules showing a statistically significant interaction effect, and divergent expression profiles in one or more diets for a given tissue (), several distinct patterns became apparent: 1) generally reduced expression in the DR diet for ovaries and bodies unlike the rest of diets (Fig. 6e,f, k and s), 2) increased expression in the DR diet for ovaries and bodies (i, m), 3) elevated expression in bodies in both DR and HS diets (v), and 4) different responses in all three diets (g, r). An attempt to isolate specific diet-tissue combinations driving the interaction effect using post hoc tests revealed large numbers of highly significant combinations. We therefore explored the modules further via functional enrichment to identify the processes driving these coexpression patterns.
We conducted functional analysis on all modules to identify enriched GO terms (Bonferroni corrected enrichment P values, Additional file 7). Of 12,614 Entrez identifiers in our experiment, 10,334 mapped in current GO categories (see methods), and therefore used as a background list for enrichment analysis in WGCNA. A large number of terms were obtained across CC, MF and BP categories: 658 terms (P < 0.01), and 791 terms (Bonferroni corrected P < 0.05) (Additional file 7). A visual inspection of enriched terms in the 21 robustly assigned modules confirmed a large diversity of highly significantly enriched biological processes in most modules, ranging from nuclear processes to membrane and cytosolic processes; from structural to signaling and immune response processes; and from pigmentation to homeostatic processes (Additional file 7).
The first module (Fig. 6a) which included 2,956 showed 291 GO terms (Bonferroni corrected, Padj. < 0.01), and had the most significantly enriched terms (i.e. > 60 terms ranged between Padj. < e-156 - <e-15). This module was characterized by greater eigengene expression in ovaries compared to heads and bodies, although the diet effect was subtle but significant. Nuclear and intracellular organelle processes including gene expression, and RNA processing were key tissue (ANOVA, P < 2e-16) and diet (ANOVA, P < 0.002) effects independently regulated (i.e. no interaction effect). With reference to the trends described above (Fig. 6), those modules showing generally reduced expression in the DR diet for ovaries and bodies (e,f, k and s), are associated with biological processes including signaling (e, Padj. <1.1e-10), cellular component organization (k, Padj. <5.8e-09), nervous system development (f, Padj. <1.3e-14), signaling and protein localization on Golgi apparatus (s, Padj. <3.0e-06). Interestingly, expression increase in DR in bodies and heads compared to ovaries is related to ubiquitin-dependent proteolytic processes in the proteasome (i, Padj. <1.8e-08), and cytosolic vesicle transport/mitochondrial activities (m, Padj. <8.9e-156). Module (v, Padj. <1.1e-21) was interesting because bodies show monotonic increase in expression from C to DR to HS, a trend that may relate to the GO term chitin-based cuticle structure development (Padj. = 5.78e-30), indicating cuticular remodeling in stressful diets (DR and HS), presumably to accommodate gain or loss of body mass.
Analysis of our modules therefore revealed a large number of biological processes (BP), molecular function (MF) and cellular components (CC) (Additional file 7), suggesting that response to diet changes in natural D. melanogaster involves a multi-system response rather than one or a few signaling pathways that can be very different in different tissues.
Previously implicated pathways
Several pathways have been well-studied in the context of diet-induced phenotypic changes (see Background). We specifically examined these pathways in order to characterize the transcriptional response. We obtained a precomputed list of genes for key metabolic pathways (fbgn_annotation_ID_fb_2019_02.tsv): IIS (55 genes), TOR (152 genes), and FOXO (110 genes, which includes the sirtuins). Of 317 pathway genes, 47 (14.8%) were significantly differentially expressed for the diet effect (Additional file 8), representing fewer pathway genes compared to 2,475 of 12,614 of all genes, genome-wide (i.e. 19.6%). However, our GSEA results presented above did not show pathway level enrichment of any of these pathways as defined in KEGG Pathway Database (https://www.genome.jp/kegg/pathway.html), although as mentioned above, a handful of member genes were differentially expressed. These genes that were differentially expressed however, showed mixed modular membership in our clustering results. Similarly, GO analysis on module genes did not enrich categories specifically including ‘insulin’, ‘TOR’ and ‘FoxO’ in the term. Figure 7 shows pattern of expression across diets, of the insulin peptide and sirtuin genes that are widely cited to act regulate IIS/TOR and AMPK. From these results we could not link any of these pathways with particular modules within our data. Importantly, we did not find evidence for significant whole pathway perturbation of IIS/TOR and the downstream FOXO signaling in our co-expressed modules.
Parsing previously identified QTL for the response to diet
A useful application of genome-wide expression data is to identify possible regulatory variants underlying QTL. In a previous study, we used a multivariate approach to identify a global expression QTL for the response to diet treatment of 52 genes in the IIS/TOR pathways [64] that we hypothesized was influencing the expression of many of the genes in these pathways. After performing a discriminant function analysis predicting diet (DR or C) from expression measured on whole flies of 52 genes in the IIS/TOR pathways, we mapped the difference in discriminant function to identify this eQTL. The eQTL interval, as defined by the Bayesian credible interval, contains 327 genes, making it challenging to narrow to possible candidates. We therefore searched for differentially expressed genes identified in this study that fall within this eQTL region of interest. Of these, we find 49 genes are differentially expressed in the different diet treatments and 13 show a diet by tissue interaction, with 5 of these genes showing both a main treatment effect and an interaction effect (i.e. Odc1, Dgat2, CG12822, CG12159, and Obp44a, Additional file 9). The patterns of expression across tissues for this set of genes is shown in Figure 8. While our expression results don’t allow us to narrow to a single candidate, they do significantly reduce the list and provide detailed expression data for the possible causal genes.