Network-based transcriptome analysis of two maize genotypes identified pathways associated with differences in salt tolerance

A better understanding of the molecular effects of salinity stress is key to improving salt tolerance in Zea mays . In 25 this study, we combined phenotyping with transcript profiling and network analysis to study genotype-specific 26 differences in salt tolerance in Zea mays . An extensive phenotypic screening identified two genotypes with an extreme phenotypic difference in tolerance 29 towards salt stress. RNA-seq analysis of the selected salt-tolerant (R9) and salt-sensitive (S46) genotype was 30 performed to unveil the molecular mechanism underlying the difference in salt tolerance. GO enrichment and 31 network analysis on the results of the expression analysis identified phosphorylation-dependent signaling 32 processes, ion transportation, oxidation-reduction, glutathione and tryptophan metabolism as the main processes 33 different between the selected tolerant and sensitive genotypes. Genes belonging to the subnetwork enriched for 34 phosphorylation and kinase activity shared a common regulatory element in their promoter region, which matched 35 the binding site of an Arabidopsis TF with known role in salt-stress response. Network-based transcriptome analysis of two maize genotypes identified pathways associated with differences in 38 genotype-specific salt tolerance and identified a link between transcriptional and posttranslational regulation of 39 salt tolerance.


Abstract 23
Background 24 A better understanding of the molecular effects of salinity stress is key to improving salt tolerance in Zea mays. In 25 this study, we combined phenotyping with transcript profiling and network analysis to study genotype-specific 26 differences in salt tolerance in Zea mays. Salt stress is one of the abiotic stresses that negatively affects crop growth and productivity. More than 47 six percent of the world's total land area is affected by excess salt [1]. Maize (Zea mays L.) is the third 48 most important world's cereal crop after wheat and rice and is used for food, feed and biofuel [2,3]. Like 49 most crop species, the majority of maize genotypes are moderately sensitive to salinity and their growth 50 and production are adversely affected by salt stress [2,4]. Excessive salt concentrations in the plant root 51 rhizosphere results in osmotic stress, ion imbalance, and oxidative stress [5][6][7]. Osmotic stress caused by 52 salt stress has similar effects as stress induced by water shortage and leads to water deficiency [8,9]. Ion 53 imbalance results in cytotoxicity due to the accumulation of ions such as sodium (Na + ) and chloride (Cl − ) 54 in plant cells [2,5,10]. An excess Na + and Cl − ions disrupts the uptake of other ions particularly, Ca 2+ 55 and K + which are essential for the catalytic activity of most enzymes [11,12]. Excess salt can also result 56 in oxidative damage due to the production of reactive oxygen species (ROS), the effect of which depends 57 on the intensity and duration of the stress and the growth stage of the plant [12][13][14][15]. To counter the 58 negative impact of salt stress, plants have developed avoidance and tolerance mechanisms. Avoidance is 59 a rapid reaction to prevent or delay the negative impact of salt stress [16]. Tolerance is achieved by a 60 rapid decrease in stomatal conductance, compartmentation of toxic ions into vacuoles, and accumulation 61 of compatible solutes such as proline, glycinebetaine, sugars, proteins, and polyols that result in ionic 62 and osmotic homeostasis [12][13][14][15]. Tolerance comes at the expense of a decreased photosynthetic rate and 63 metabolic capacity. Those aforementioned responses are regulated in plants by initiating fast and efficient 64 signaling reactions such as the abscisic acid (ABA)-dependent and independent signaling of which the 65 regulation can be lineage-specific [17][18][19][20]. Hence, because of the complexity of the salt stress phenotype, 66 developing genotypes with increased salt-tolerance requires a deeper understanding of the molecular 67 basis underlying the tolerance phenotype [1,13,21]. The well-known model plant, Arabidopsis, has been 68 instructive in furthering our understanding of salt tolerance mechanisms. However, as regulation 69 mechanisms vary among plant species, it is difficult to extrapolate results between species [22,23]. 70 Thanks to its cross-pollinated behavior, maize is highly polymorphic and provides genotypes with 71 different tolerance to salt stress. Exploiting this genotypic and phenotypic variation offers the opportunity 72 to study salt tolerance mechanisms in maize. 73 Transcriptome analysis reveals genes of which the expression levels are significantly altered 74 between conditions. Although useful to identify the biological processes that are activated or inhibited 75 under these conditions, the mechanisms that led to these changes often remained unexplained [24,25]. 76 In addition, transcriptome analysis can result in both false positive and negatives: on the one hand 77 spuriously differentially expressed genes can be identified that do not contribute to the process of interest, 78 whereas on the other hand genes that are not being regulated at expression level or that exhibit too subtle 79 5 Na + and K + contents were measured 12 days after salt treatment in the stressed and control plants. 109 Under control conditions, the K + content of the two genotypes was comparable but sharply decreased in 110 plants of the sensitive genotype under salt treatment (Fig 1). Unlike the K + , the Na + content did not show 111 significant difference between two genotypes. This indicates that under salt stress the tolerant genotype 112 was effectively taking up K + to keep the K + /Na + ratio in balance while this was not the case for the 113 sensitive genotype. Morphologically, interfering with K + uptake is known to lead to disturbances in 114 stomatal modulations and causes water loss and necrosis [34]. In our study, neither the resistant nor the 115 sensitive genotype showed excess accumulation of Na + in the leaves. Hence, an inefficiency of K + uptake 116 led to imbalance in K + /Na + in plants of the sensitive genotype ( Figure 1). This result suggests that plants 117 of the tolerant genotype might possess an efficient regulation of K + up-take that allows maintaining the 118 K + and Na + homeostasis under salt stress. 119 120

Transcriptome analysis 121
To explore the molecular differences between the salt-tolerant (R9) and sensitive (S46) genotypes that 122 underlie their difference in salt response, expression profiles of plants belonging to both genotypes were 123 compared under normal and salinity stress using two biological replicates (each replicate represents 124 pooled samples collected at two different time points post the application of the salinity stress). Data 125 were preprocessed as explained in materials and methods. We were interested in identifying genes that 126 showed a significantly different response to salt stress between the R9 and S46 genotypes. This requires 127 identifying all genes that display an altered expression under salt stress versus normal conditions and for 128 which the response is different between two genotypes. Genes that display an altered expression under 129 salt stress versus normal conditions, but that show the same response between the two genotypes were 130 removed as for those genes it is difficult to distinguish whether their altered expression under salt 131 conditions is caused by sub-optimal growth under salt stress or by the genotype-specific triggering of 132 salt-tolerance mechanisms (Fig. 2). In total, our candidate gene list contained 113 genes potentially 133 involved in the genotype-specific response to salt stress (fold change > 1.5, FDR < 0.05) (Additional file 134 1 and 2). This list contains genes that display an altered expression under salt stress versus normal 135 conditions and for which the alteration in expression is different between two genotypes. 136 GO analysis of the gene list showed enrichment in 'response to endoplasmic reticulum (ER) 137 stress', 'systematic acquired resistance', 'tryptophan metabolism', 'dicarboxylic acid metabolism' and 138 'protein retention in Golgi apparatus' (Fig 3). Some of these identified pathways can be related to salinity 139 stress. However only 12 genes from our candidate list belonged to the enriched functions although 111 140 genes could be mapped to at least one GO biological process term. This indicates that a substantial part 141 of the salt tolerance mechanism remained unexplained by the enrichment analysis. 142 143

Network-based analysis 144
To better explain the mechanisms of salt tolerance reflected by the candidate genes, network-based 145 analysis was performed. We generated first a physical maize interaction network using known interaction 146 data including protein-DNA (regulatory), protein-protein, and metabolic interactions (KEGG). The 147 network scaffold was complemented with additional functional edges derived from the large body of 148 publicly available expression data in maize. Expression derived edges were derived from a co-expression 149 network (Materials and methods). To avoid adding spurious interactions, only highly confident co-150 expression edges from the co-expression network were added. To assess the relevance of the selected 151 edges, we clustered the high confident co-expression network containing these high confident edges and 152 performed GO enrichment of the obtained clusters. The majority of the clusters showed high confidence 153 enrichment to at least one biological pathway (out of 67 overlapped clusters, 41 clusters showed high 154 enrichment for at least one biological process with p-value < 10e-5), supporting the relevance of the 155 added edges. 156 Prior to performing network-based analysis, the obtained integrated network was converted into 157 a weighted network using a topology based weighting scheme (Materials and Methods). This weighting 158 scheme aims at weighting the edges based on their global connectivity in the network, and reduces the 159 impact of hubs on the network-analysis: hubs with many neighboring genes risk connecting candidate 160 genes in connected components that do not have direct biological links. To perform network analysis, the 161 candidate genes were mapped on the interaction network (Fig 2). Subsequently, subnetworks were 162 identified that connect as many as possible candidate genes using the least number of edges [27,31]. 163 Those sub-networks are proxies of pathways that contribute to genotype-specific expression differences. 164 Six subnetworks of different size connecting the candidate genes were identified (Fig 4). The first 165 subnetwork (Subnetwork1) contains most genes and displays the highest connectivity among its nodes. 166 Subnetwork 1 and 2 consist mostly of protein interactions whereas subnetworks 3-5 contain metabolic 167 interactions only. This indicates that each molecular layer in the network contributes different 168 information and little redundancy exists between interaction networks at different molecular layers, as is 169 observed in many organisms [35][36][37]. Relatively few co-expression edges have been used to connect the 170 candidate genes on the interaction network, indicating that adding them did not result in over-connecting 171 the network. No regulatory edges were used to connect the genes from the candidate list, indicating that 172 regulatory edges are quite sparse and understudied in the used interaction network. 173 Subnetwork1 is highly enriched for "phosphorylation" related processes (FDR: 6.6e-6) with 174 13 genes annotated to those processes. As expected, some of the genes in our selected subnetwork 175 annotated as phosphorylation-related were not in our candidate gene list (Zm00001d049727, 176 Zm00001d048054, Zm00001d010234, Zm00001d020355, Zm00001d043562, Zm00001d005135, 177 Zm00001d017525). These genes were not significantly differentially expressed themselves but are 178 recovered as 'connector' genes because of their high connectivity to the differentially expressed candidate 179 genes (Fig 4, subnetwork1, gray nodes). Zm00001d020355 and Zm00001d017525 are involved in the 180 "stress-activated protein kinase signaling" pathway (GO:0031098). Given these enrichments and the fact 181 that this subnetwork 1 is biased towards protein-protein interaction, differences in salt-tolerance 182 mechanism between B9 and S46 seem mostly related to protein mediated post-translational protein 183 modifications and signaling (Fig 5). 184 The second-largest subnetwork (subnetwork 2) is highly enriched for 'oxidation-reduction' related 185 processes (GO:0055114) ( Transporters with known roles in ion homeostasis were identified in subnetwork 1 and 3. Among 192 those were genes with GTPase activity (Zm00001d039091, Zm00001d011474, and Zm00001d039090). 193 Genes with GTPase activity have shown to be essential during ion homeostasis, particularly for the 194 maintenance of Na + and K + homeostasis under salt stress [46]. For example, the overexpression of one 195 of the GTPase activity gene, PtRabE1b, conferred salt tolerance in poplar [47]. GTPase and ATPase 196 activity genes are also known to interact with genes involved in salt tolerance [48,49]. In addition,197 GTPase activity was shown to be required for the reorganization of microtubules, a key response 198 mechanism during salt stress in plants [20]. 199 200

Motif detection 201
We identified 5 sub-networks containing candidate genes. These subnetworks are likely to reflect 202 processes involved in salt tolerance. Given that the genes in these subnetworks are triggered by salt stress 203 at the level of transcription, we assumed that they might be under control of the same transcriptional 204 program that acts under salt stress. Given the large underrepresentation of transcriptional interactions in 205 our network, we were not able to unveil this regulatory program through network analysis. Here we use 206 as alternative a de novo approach based on motif analysis to recover the missing regulatory program. We 207 hereby focused on the largest subnetwork 1, which contains genes enriched for phosphorylation and 208 kinase activity. Assuming that the genes of this subnetwork were regulated by the same TF, we searched 209 for a statistically overrepresented motif in their promoter region. As de novo motif detection is very 210 sensitive to the presence of false positives, we excluded the genes that are only marginally connected in 211 the subnetwork (genes with at most one edge connectivity to any other gene in the network). 212 The 1kb upstream of the transcription start site (TSS) was selected to search for regulatory 213 elements. When the gene was located on the negative strand, the reverse complement of the sequence 214 was considered. We found one motif overrepresented in the selected genes of subnetwork 1. This motif 215 occurred in 17 out of 21 sequences and showed a positional bias towards the TSS in most of the sequences 216 ( Fig 6). The identified motif turned out to be highly similar to a known Arabidopsis motif from the DAP- In this study, we selected two genotypes with an extremely different phenotypic behavior towards salt 224 stress and subjected them to expression analysis. We identified genotype-specific differences in salt 225 induced gene expression. The genes that were salt induced in a genotype-specific way were functionally 226 characterized by complementing a GO enrichment with network analysis [27,53]. Hereto we used an 227 integrated network as a backbone to search for subnetworks that most optimally connect the differentially 228 expressed genes of our candidate gene list. These subnetworks reflect the pathways of relevance triggered 229 by the studied process. The advantage of this approach is that the backbone network used to drive the 230 analysis contains next to well-annotated connections between genes also less well documented edges, 231 derived from the large body of publicly available expression data on Zea Mays. As such also the less 232 characterized differentially expressed genes (i.e. genes without GO annotation) can be assigned to a 233 common process, here reflected by a subnetwork. Indeed we observed that trough network analysis 234 considerably more genes of the candidate gene list could be explained than by performing a GO 235 enrichment analysis (of the total 113 prioritized candidate genes, GO enrichment and network analysis 236 could explain 12 and 38 genes, respectively). In addition to these 38 recovered candidate genes, network 237 analysis also identified 28 connector genes. These genes were not identified as differentially expressed 238 under salt stress, but belong to the same network neighborhood as the differentially expressed candidate 239 genes, indicating that they also might play a role in the salt tolerance mechanism. 240 Cellular potassium homeostasis is known to be one of the main contributors to salt tolerance [54]. 241 Salt stress activates the salt-overly-sensitive (SOS) system to maintain ion homeostasis [55]. The SOS 242 system perceives salt stress through the rise in free cytosolic calcium [41]. Those calcium signals activate 243 kinases and subsequent phosphorylation cascades which result in regulating gene expression in response 244 to salt stress. Subnetwork 1 is highly enriched for phosphorylation related processes. This process was 245 not identified by performing GO analysis directly on the candidate gene list: phosphorylation related 246 processes are likely to be regulated by post-transcriptionally rather than by transcriptional mechanisms. 247 Hence, they cannot be identified through differential expression analysis only. However, by combining 248 expression with network analysis genes belonging to these processes can be recovered as connector 249 genes. This was indeed the case: subnetwork 1 contains next to genes that are differentially expressed 250 under salt stress (Zm00001d051915, Zm00001d018799, Zm00001d029258, Zm00001d011573, 251 Zm00001d019411, Zm00001d022166) and that were annotated as phosphorylation-related also several 252 connector genes associated to phosphorylation-related processes (Zm00001d049727, Zm00001d048054, 253 Zm00001d010234, Zm00001d020355, Zm00001d043562, Zm00001d005135, Zm00001d017525) (Fig  254   4). This result indicates that the activated phosphorylation cascades contribute to the maintenance of 255 potassium homeostasis in the tolerant genotype (R9) which is in line with the phenotypic results (Fig 1). 256 Next to the subnetwork enriched for phosphorylation, network analysis identified several sub-257 networks enriched for pathways and processes with known relevance to salt stress (ion transportation, 258 oxidation-reduction, glutathione and tryptophan metabolism) [5,13,21]. The enrichment of processes 259  We are interested in identifying genotype-specific expression changes that can explain the 324 difference in phenotypic behavior between the sensitive and tolerant lines. These genes are referred to as 325 candidate genes. To select these genes an ANCOVA model was used: gene expression is modeled as the 326 response variable, 'salt treatment' and 'genotype' as the explanatory variables. The genes of interest are 327 modeled as the ones having an interaction effect. By only considering the interaction term in the model, 328 genes that are affected to the same degree (fold) by salinity stress in both lines are removed because they 329 cannot be associated with genotype specific differences in response to salinity stress. 330 Therefore, for each gene (g), its expression (yig) was modeled by negative binomial (NB) 331 distribution using salt treatment (S) and genotype (G) as explaining variables as follows. Where βS:G corresponds to the interaction effect of the explaining variables in the design matrix. The 346 design and contrasting matrices in edgeR were set based on the formula mentioned above. 347 Gene ontology was downloaded from maizegdb.org. GO enrichment was performed using 348 ClusterProfiler [65] and GO enrichment graphs were created using R and Plaza [66]. 349

Constructing the interaction network 350
A high confident protein interaction network for maize was downloaded from the PPIM database [67]. 351 Since the PPIM database uses the old version of gene ids (B73-v3 reference genome), the ids were 352 mapped to the B73-v4 reference genome using the v3 to v4 mapping ids provided by maizegdb.org. This In addition, genes that were not expressed in at least 10% of the samples were excluded from the dataset 399 prior to calculating the correlation in order to avoid adding spurious relations for lowly expressed genes. 400 To assess the quality of the coexpression network, we searched for clusters of connected genes using 401 ClusterOne with default parameters. ClusterOne allows for overlap between clusters and only returns 402 highly connected components [75]. GO enrichment was performed on the clusters obtained by 403 ClusterOne. From the coexpression network, 66383 functional interactions were extracted and added to 404 the aforementioned interaction network. This resulted in a final interaction network consisting of 269731 405 unique edges between 21236 genes (Additional file 3). 406

Weighting the network and performing network analysis 407
In a probabilistic network analysis, weights on the edges between nodes are used to drive the search for 408 subnetworks. To design a weighting scheme we performed a Random walk with Restart (RWR) (restart 409 parameter = 0.05) [76,77]. The RWR uses as input the degree normalized adjacency matrix of the 410 interaction network. The RWR performs a topology based weighting reducing the impact of hubs. 411 However, RWR produces relatively small values for most genes whereas genes that are marginally 412 connected in the network get a relatively high weight. As we do not want to bias the network search too 413 much towards the marginally connected genes, we rescaled the RWR obtained values using the following 414 heuristic formula. 415 Where "s" is the source node and "t" the target node. This formula transforms the minimal weigh to 0.3 417 ensuring that all edges are considered during network analysis but that edges between highly connected 418 genes (higher RWR value) remain higher in weight. 419 Extracting subnetworks from the weighted interaction network was performed using 'Phenetic', a 420 probabilistic pathfinding approach [27,31] with following parameters: mode: downstream; min cost: 0.1; 421 max cost: 5; step size: log scale between max and min cost with 28 steps; Path-length=4; k-best paths: 422 20. For each edge cost, the highest scoring subnetwork was selected and a stability score was computed 423 for all the subnetworks with this edge cost. For each cost, the subnetwork is rejected if either it has a low 424 stability score (minimal stability score is 0.5) or is too large (max size is 80). The final subnetwork was 425 extracted as a combination of all these "best networks" that passed the tests. 426 De novo motif detection was performed using MEME [78]. The promoter regions were defined as 1kb 427 upstream of TSS and first-order model of sequences was used as background. The promoter sequences 428 were searched for motifs with a width ranging from 6 to 15 using the sequence of the lead (+) strand 429 only.

Consent for publication 552
Not applicable. 553

Competing interests 554
The authors declare that they have no competing interests. 555