Revisiting regulatory decoherence: Accounting for temporal bias in a co-expression analysis reveals novel candidates controlling environmental response

Transcriptional Regulatory Networks (TRNs) orchestrate the timing, magnitude, and rate 2 of organismal response to many environmental perturbations. Regulatory interactions 3 in TRNs are dynamic but exploiting temporal variation to understand gene regulation 4 requires a careful appreciation of both molecular biology and confounders in statistical 5 analysis. Seeking to exploit the abundance of RNASequencing data now available, many 6 past studies have relied upon population-level statistics from cross-sectional studies, 7 estimating gene co-expression interactions to capture transient changes of regulatory 8 activity. We show that population-level co-expression exhibits biases when capturing 9 transient changes of regulatory activity in rice plants responding to elevated temperature. 10 An apparent cause of this bias is regulatory saturation, the observation that detectable co- 11 variance between a regulator and its target may be low as their transcript abundances are 12 induced. This phenomenon appears to be particularly acute for rapid onset environmental 13 stressors. However, exploiting temporal correlations appears to be a reliable means to 14 detect transient regulatory activity following rapid onset environmental perturbations 15 such as temperature stress. Such temporal correlation may lose information along a more 16 gradual-onset stressor (e.g., dehydration). We here show that rice plants exposed to a 17 dehydration stress exhibit temporal structure of coexpression in their response that can 18 not be unveiled by temporal correlation alone. Collectively, our results point to the need 19 to account for the nuances of molecular interactions and the possibly confounding e ﬀ ects 20 that these can introduce into conventional approaches to study transcriptome datasets. 21


22
Living organisms evolve to maximize their individual performance under dynamically 23 changing environments. Transcriptional regulation plays a fundamental role in the behavior 24 of cells responding to external and internal environmental cues, and is aberrant in many 25 diseases (Lee and Young 2013). Networks comprised of directed links between pairs of 26 genes are termed "Transcriptional regulatory networks" (TRNs), and are often used by 27 organisms to coordinate cellular, physiological, and developmental response to the varying 28 environment. TRNs orchestrate the timing and rate of genome-wide gene expression 29 2 plastic responses (Gibson 2008) and, as such, a long-standing goal in molecular biology 30 is to understand functional linkages between regulatory architectures in TRNs and the 31 dynamic behavior of organisms (Smith 1990). Transcription factors (TFs) are key nodes 32 in TRNs that regulate the expression of other genes (Buchanan et al. 2010), coordinating 33 the entire transcriptional program. Accordingly, TFs remain appealing therapeutic and 34 engineering targets for disease (Neef et al. 2011, Spampanato et al. 2013, Neef et al. 2010, 35 Cuadrado et al. 2018) and stress tolerance for crop production (Wang et al. 2016, Lan 36 Thi Hoang et al. 2017). While decades of research provide insight into the basic mechanics 37 of transcription, relatively little is known about how TFs function collectively in intricate 38 regulatory networks in multicellular eukaryotes to achieve complex biological outputs 39 in response to dynamic environments. As large-scale high-throughput -omic data are 40 now readily generated for many ecologically and economically important species (Chen 41 et al. 2020, Müller et al. 2017, Rastogi et al. 2018, Wilkins et al. 2016, 42 data-driven methods are needed for discovering key responsive TFs and for understanding 43 how such TFs drive dynamic responses to the environment. 44 Many regulatory interactions in TRNs are context-dependent (Dunlop et al. 2008, Lus-45 combe et al. 2004. Environmental cues can affect the behavior of regulators (e.g., by 46 changing their abundance or their binding affinity to target DNA sequences), and thereby 47 change the transcriptional output and regulatory interactions with other genes. For in-48 stance, an interaction can be inactive simply because the concentration of the regulator is 49 outside its effective range for the target (Dunlop et al. 2008). Notably, even if a regulatory 50 interaction is activated, its regulatory activity can be low as the dose-response curve may 51 be under a saturated regime in which additional units of the regulator do not result in 52 changed activity of its target(s). Alternatively, interactions may be inactive as a result of 53 the chromatin state of target genes, the post-translational modification of the regulator 54 itself, the presence of inhibitory factors, or the absence of co-factors (Toledo and Wahl 55 2006, Piggot and Hilbert 2004). 56 Identifying transient changes in regulatory activities (i.e., detecting responsive regulatory 57 interactions) upon environmental perturbation is critical to understand how genes and 58 proteins modulate cellular and organismal responses to variable environments. One 59 approach to directly estimate the strength of regulatory interactions is to use ChIP-seq. 60 However, genome-wide ChIP-seq for hundreds of transcription factors across multiple 61 3 conditions is costly, technically challenging, and therefore currently inaccessible in most 62 organisms. Few dynamic transcription factor binding studies have been undertaken, with 63 many of these focused on a small set of TFs or only assaying physical binding under a 64 specific environmental condition (Chang et al. 2013, Ni et al. 2009, Garber et al. 2012, 65 Zinzen et al. 2009). Numerous computational studies have also addressed the dynamic 66 nature of TRNs, e.g., by exploiting static network priors and gene expression profiles to 67 identify subnetworks activated by environmental perturbation (Luscombe et al. 2004, Scott 68 et al. 2005, Ernst et al. 2007). Gene co-expression analysis is another widely used tool to 69 identify regulatory interactions that are activated/deactivated following perturbations.

70
For example, differential co-expression analysis has been used to identify gene function 71 underlying differences between healthy and disease samples (Amar et al. 2013, Hu et al. 72 2009, Kostka and Spang 2004, Hudson et al. 2009, Fiannaca et al. 2015, Bhar et al. 2013, 73 Gao et al. 2016, between species (Ferrari et al. 2018, Gao et al. 2012 2015, Ruprecht et al. 2017a,b), or under different environmental conditions (Yan et al. 75 2019, Lea et al. 2019). The most frequently used programs for co-expression analysis 76 are WGCNA (Langfelder and Horvath 2008), DICER (Amar et al. 2013) and DiffCoEx 77 (Tesson et al. 2010); often, however, the ease of performing analyses on these platforms 78 obscures the assumptions they make about regulatory interactions. Because the ultimate 79 aim in deciphering complex biological processes is the discovery of the causal genes and 80 regulatory mechanisms controlling biological processes, it is critical to understand possible 81 biases and confounding factors during co-expression analysis.

82
Here, we exploit the temporal component of gene co-expression to characterize the dynamic 83 regulatory map and co-expression patterns in a static network prior. The static network 84 prior was previously derived from ATAC-seq and known TF-binding motifs (Wilkins 85 et al. 2016 can detect whether a certain regulatory interaction is induced or not, it is unable to 96 track the temporal evolution of the regulatory activity. By analyzing the co-expression 97 pattern upon elevated temperature and dehydration process, we found that for rapid 98 onset environmental stressors (e.g., heat shock), temporal correlation is more robust than 99 population correlation in revealing transient response of regulatory activity and identifying 100 responsive regulatory interactions. We thus provide a possible explanation for a seemingly 101 inconsistent result from a recent study, which demonstrated that internal and external 102 stressors can cause regulatory "decoherence" (lower correlation) (Lea et al. 2019). On 103 the other hand, for mild and gradual stressors (e.g., drying for plants), the temporal 104 correlation is relatively rough and incapable to unveil the full picture and complicated 105 dynamics during perturbations. Extensive efforts have been made to exploit the so-called 106 "fourth dimension" of response -time -to better understand the dynamics of TRNs 107 and to identify putative signaling pathways or key regulatory genes (Bechtold et al. 2016, 108 Yeung et al. 2018, Varala et al. 2018, Zander et al. 2020, Song et al. 2016 2017, Windram et al. 2012, Gargouri et al. 2015, Alvarez-Fernandez et al. 2020). Our work 110 reinforces the importance of temporal dynamics and reveal the signature of regulatory 111 saturation, a specific confounding factor which may lead to bias in reconstructing dynamic 112 regulatory maps, pointing to the need to account for the possibly confounding effects that 113 can introduce into conventional approaches to study transcriptome datasets. as the cutoff for the activation of regulatory interactions. We use the terms regulatory 125 coherence and decoherence to mean increasing or decreasing co-expression, respectively.

126
Coherence in our analyses is reflected by higher MCCs under heat or dehydration stress 127 conditions when compared to control samples, as imposed by Wilkins et al. (2016).  Our observation that co-expression increases with the onset of stress (regulatory coherence) 142 is seemingly inconsistent with a recent study which used gene expression profiles collected 143 from human monocytes exposed to a stress in vitro to calculate the differential population

164
Through a simple mathematical model, we illustrate how regulatory saturation may be a 165 confounding factor for identifying responsive links. We contrast the outcomes of population- by the ability of circuits to respond to input change but to return to the pre-stimulus 184 output level, even when the input change persists (Ma et al. 2009, Briat et al. 2016.  That is to say, whenever a low population coexpression corresponds to a low activity of 212 the link even if a link is activated and responsive towards the treatment. The low activity 213 of a regulatory interaction means additional regulators will not induce more expression 214 of its target gene. Therefore, under a saturated regime, either low or high regulator 215 abundance lead to inactive regulatory interaction and low regulatory activity. In the 216 following sections, we will leverage the temporal component of stress response by using 217 temporal correlations and population correlations over time.

219
lating stress responses 220 We next analysed dynamic transcriptional rewiring through temporal correlation. We 221 examined whether certain TF families affect the activation of regulatory interactions 222 (Fig. S10) and find that, as expected, many TFs with high differential mean MCC in the 223 heat-stress data set are annotated as Heat Shock transcription Factors (HSFs).

224
Inspecting the relationship between differential gene expression and the differential activity 225 reveals that several known HSFs do not independently show strong expression response to 226 the stressor but do, however, show a clear response according to the differential activity 227 calculated by the temporal correlation of a TF with its target genes. We also find several 228 interesting candidate TFs outside of the HSF family which have high differential regulatory 229 scores but little or no apparent differential expression in pairwise contrasts between control 230 and treatment conditions (Fig. 5A). In the heat data set, the TF OsTCP7 has a differential 231 regulatory score of 0.54 but was not identified as differentially expressed by Wilkins et TFs which have high differential regulatory score but low differential expression response.

244
One such gene is PIF-Like 12 (Nakamura et al. 2007) which, to our knowledge, has no 245 known role in dehydration response but is paralagous to OsPIL1 which integrates cues 246 from the circadian clock and dehydration signaling to control internode elongation in 247 rice (Todaka et al. 2012). Additional candidate genes with high regulatory scores under 248 elevated temperature or dehydration stress are shown in Table S1 and S2. We hypothesize 249 that the differential activity calculated by temporal correlation could be used to identify 250 novel stress-responsive regulators.

265
We first construct a network hierarchy in the TF-only subnetwork from the network prior 266 (Fig. 6A). Since the network has feedback loops (See Supporting Information), we used a 267 generalized bottom-up approach (Yu and Gerstein 2006). In essence, we define all TFs  Overall, we observed two regulatory waves for temporal activities (Fig. 6C), which were 293 not prominent in the temporal expression profile considered above (Fig. 6B). We speculate 294 that these two waves may represent distinct phases rice response to the sever drying 295 imposed, perhaps before and after turgor loss occurs (Buckley 2005). We note that Wilkins TFs representing the shortest path between these two TFs were identified in the TF-only 310 network prior (Fig. 6C).  (Fig. 3 and Fig. 4). On the other hand, temporal correlation 336 can fail when we aim to track real time regulatory activity (Fig. 4). Our results reinforce 337 the importance of temporal component in gene expression (Bar-Joseph et al. 2012). 338 Additionally, even with the temporal information in hand, one should analyze the data 339 according to its specific aim: detecting responsive links (Fig. 5) or tracking regulatory 340 activities (Fig. 6).

341
In the present study, we implemented a stochastic simulation of a simple regulatory model 342 under two perturbation regimes. We show that, under a cooperative binding mode, the what extent such type of saturated regulatory regime is pervasive.

373
Our observation that co-expression increases with the onset of two stresses in rice is 374 seemingly inconsistent with a recent study which used gene expression data collected 375 from human monocytes to infer population correlation among transcripts (Lea et al. 376 2019). Several other studies have likewise reported that environmental perturbation may 377 lead to declining co-expression (Southworth et al. 2009, Anglani et al. 2014. From a 378 quantitative genetic perspective, a commonly observed result is that phenotypic integration 379 in a population (i.e., the number of significant correlations among traits) increases with 380 environmental stress (Waitt and Levin 1993, Schlichting 1989, Gianoli 2004, García-381 Verdugo et al. 2009). However, the stress-induced decanalization theory (Gibson 2009)

456
To illustrate the potential bias in capturing changing regulatory activities by using 457 population level correlation, we implement simulation through a minimal activation model.

458
The rate of production of TF X and gene Y ( Fig. 2A) [Y ]= βy(    (Wang et al. 2004, Ohama et al. 2017. C. A schematic diagram depicts possible explanation of the temporal bias through regulatory saturation. The blue link is activated upon the perturbation (ground truth) by increasing the concentration of the regulator (an activator). However, if the dose-response curve is a sigmoid shape function, chances are the population correlation may not be able to detect such activation. C-E . The cross correlation function and population-level correlation between activator X and target Y under a saturated regime. The cross correlation function robustly reveals a peak in response to perturbations while the perturbation may lead to reduction of correlation when using population correlation over the time course. F-H . The cross correlation function and population-level correlation between activator X and target Y under a non-saturated regime. Under a non-saturated regime, both the population correlation and the temporal correlation can detect elevated level of coexpression. Colors represent three different types of external environmental conditions which lead to internal signaling (Steady state, press perturbation, and pulse perturbation). R(τ ) is the cross correlation function with τ indicating the time delay. Note that the perturbation is imposed at t =0in E and H. A. The average differential Max Correlation Coefficient (MCC) for each regulator in the network prior under heat condition. Violin plots show members of the HSF family TF and non-HSF family TF, respectively. B. The average differential MCC for each regulator in the network prior under drought condition. Known drought-related TFs were obtained from https://funricegenes.github.io/, where genes linked with keywords "drought", "ABA", and "drought tolerance" were extracted. The average differential MCC is calculated as the averaged MCC changed across conditions. The comparison of heat C. and drought D. differential expression (the number of time points showing differential expression from the original Wilkins et al. analysis) versus differential MCC. Salmon points denote the Heat Shock Family (HSF) regulators. The number of time points with differential expression is counted for each time point and each genotypes (Maximum number is 4 * 16 = 64). Negative numbers on the x-axis indicate number of time points in which the gene was observed to be downregulated as compared to control conditions. Coefficients of a TF with all its target genes. Two waves of TF activity can be observed; we thus clustered all the TF's activity within the hierarchy over time with the assistant of STEM (Ernst and Bar-Joseph 2006). Four distinct patterns are shown: 1) Continuously increasing D; 2). ; Two groups of TFs drive the first and second regulatory wave separately Insets. of C.. The shortest path in the C shows the regulatory cascade driving the second regulatory wave.