RNA-sequencing and mathematical modeling identify suite of light- 1 sensitive circadian genes in an orb-web weaving spider

Consistent with our model, we found a cluster of transcripts with the flipped pattern of 38 expression between the two collection times, dependent on the application of light. 39 Conclusions Our DE transcripts represent the first genetic evidence for circadian output in spiders. 41 Furthermore, those transcripts with a flipped pattern of expression represent prime candidates for 42 light-sensitive circadian genes, which may be involved in entraining the circadian clock to light. 43 Functions of these genes varied from growth and development to reproduction to gene 44 regulation, consistent with other circadian systems. 45 46

cycle even in constant darkness, with an intrinsic period called the free-running period (FRP). 23 Spiders have unusual FRPs, with some species having extremely short FRP (e.g. 18 hours for 24 trashline orb weaver), and many having highly variable FRPs (intraspecific variation of up to 10 25 hours). In the absence of any genetic model of circadian rhythms in spiders, we developed a 26 mathematical model to optimize experimental conditions for identifying circadian genes that also 27 respond to light cues. 28 Results 29 Our mathematical model involved a single gene that encodes a protein that inhibits its own 30 transcription. In our model, light degrades the circadian transcript, which allows a broad range of 31 FRPs to be entrained to a 24-hour day. Our model predicted that exposing spiders to a pulse of 32 light in the middle of the night would cause a pattern of expression between two later time points 33 that was opposite the pattern exhibited by spiders who did not receive a pulse of light. RNA-34 sequencing of four groups of adult female orb weaving spiders, Metazygia wittfeldae, under 35 these experimental conditions resulted in 528 significantly differentially expressed (DE) 36 transcripts between the two collection times or between the light pulse and no light pulse. 37 (Araneidae), which has a mean FRP of 18.7 hours in DD [10]. This is the shortest known 83 naturally occurring FRP, even shorter than ones caused by lab-induced mutations, such as the 84 hamster tau mutant (20 hours [11]) and the fruit fly per S mutant (19 hours [12]). Other species of 85 spiders have been found to have extreme variation in FRP among individuals. For example, the 86 ranges of FRPs exhibited by three cobweb weaving spider species (Theridiidae) were 19-23.5 h 87 in the common house spider P. tepidariorum (mean FRP = 21.7 h), 20-29 h in the subsocial 88 spider A. studiosus (mean FRP = 23.1), and 20-30.1 h (mean FRP = 24.5) in the southern black 89 widow Latrodectus mactans [13]. Even the orb weaver Metazygia wittfeldae (Araneidae), which 90 possesses an almost "typical" FRP (mean = 22.7 h) varied from 19 -24 hours [14]. The three 91 theridiid and two araneid species discussed here have FRPs that vary an order of magnitude more 92 than those of most animals examined thus far [13]. 93 The high levels of FRP variance within each of these spider species sharply contrast with the 94 low levels of variance in other organisms, where it is suggested that the endogenous period of the 95 circadian clock is under tight genetic control [11,[15][16][17][18]. The combination of high variation in 96 FRP and divergence of FRP from 24 hours suggests that spiders have evolved novel elements of 97 circadian control. Possible new mechanisms include a highly robust circadian response to light 98 stimuli, which allows rapid entrainment, or a weakly oscillating system that could be easily 99 perturbed by environmental stimuli, among others [13]. 100 Thus far, the genetic control of circadian rhythms in spiders has not been investigated. To 101 identify genes potentially involved in circadian entrainment in spiders, we subjected the spider 102 species, M. wittefeldae, to two experimental conditions. In brief, we compared gene expression 103 between spiders kept in constant darkness to those exposed to a pulse of light. We used a 104 mathematical model to predict two time points with the greatest difference in gene expression 105 levels between the constant dark group and the group exposed to a light pulse. Our simulations 106 further predicted that the direction of differential expression would switch between the two time 107 points. RNA-sequencing of four experimental groups of spiders identified a cluster of transcripts 108 that fit the prediction of a switched pattern of gene expression. 109 110 Results 111

Mathematical model predicts optimal experimental protocol 112
To optimize the search for circadian genes, we utilized a simplified mathematical model. 113 Most of the circadian clocks studied so far are based on a negative feedback loop, where a group 114 of circadian genes inhibit their own expression [2]. We therefore assumed that the circadian 115 clock in spiders has at its core at least one circadian gene (gene X). Translation of the circadian 116 gene's mRNA (mRNA X) induces production of the circadian protein (protein X), which is 117 transferred back to the nucleus and inhibits its own transcription ( Fig 1A). We simulated light in 118 the model by a rapid degradation of mRNA X. The model reproduces a wide range of free-119 running periods in the dark as well as entrainment to LD 12:12. 120 Our experimental goal was to identify the light-sensitive subset of circadian genes. To 121 achieve this, we first entrained our model to LD 12:12 and then simulated a brief (1 hour) pulse 122 of light in progressively increasing time intervals after the end of the last light phase (pulse 123 group), while keeping an identical model in constant darkness (no pulse group) (Fig 1B). Based 124 on these simulations, we predicted that applying the light pulse 5 hours after the last light phase 125 (experimental time (et) 5et 6) would produce the greatest change in magnitude of gene 126 expression of the model's circadian mRNA X (Fig 1B). To further improve the signal to noise 127 ratio, we also used the model to identify collection times that would provide the largest 128 difference between the pulse and no pulse groups. By comparing the ratio of mRNA X with and 129 without pulse at every hour after the light, we determined that collection times 7 and 16 hours 130 after the last light phase (et7 and et16), produces the largest difference between pulse and no 131 pulse conditions (Fig 1C). Model simulation of 36 hours of LD 12:12 condition followed by DD conditions with (top) and 136 without (bottom) application of a brief (1 hr.) light pulse. The sampling times (red arrows in top 137 graph, blue arrows in bottom graph) were selected to optimize the differences between mRNA 138 levels with and without the light pulse. (C) Experimental protocol following the optimal 139 collection times predicted to yield the greatest differences in mRNA levels between the two 140 experimental conditions. 141 142 generates a positive log FC (et7/et16) ( Fig 2D). Therefore, our model predicts that differential 166 expression of circadian genes will be reversed by both pulse and collection time conditions. 167 We used our model to optimize the experimental timeline for spider entrainment, application of 181 the light pulse, and tissue collection from the cephalothoraxes of adult female M. wittfeldae. 182 These experimental spiders included 10 in the pulse and 10 in the no pulse groups. The pulse 183 group received a 1-hour light pulse five hours (et5-6) after lights-off at the end of the 5 th day of 184 LD 12:12 but no light thereafter. The no pulse group received no light after the 5 th day of LD 185 12:12. In each group, 5 spiders were sacrificed in the dark 7 hours (et7) and 16 hours (et 16) after 186 the last LD 12:12 lights-off transition (see Figure 1C). 187 We de novo assembled 266,300 transcripts from the 20 individual RNA-seq libraries, 188 representing 184,077 Trinity-defined "genes". Annotation of the "genes" resulted in 9 categories 189 of sequences, 4 of which were unlikely to encode proteins and were thus removed prior to further 190 analyses (

199
We conducted four separate differential expression analyses at the "gene" level (see Methods 200 and Additional File 1: Table S1 Figure S1). The pulse of 204 light also resulted in many more differentially expressed genes 1-hour after the light pulse than 205 10 hours later (282 vs. 57, Table 2, Figure 3). Consistent with a large effect of light on gene 206 expression, more genes were differentially expressed between the two collection times in the 207 group that received a light pulse than the one that did not (227 vs. 95, Table 2, Figure 3). The 208 light pulse caused many more genes to be downregulated (70%) than upregulated (30%) relative 209 to no light pulse, when collected at et7 (Additional File 2: Figure S2); however, 10 hours later, 210 approximately equal number of DE genes were upregulated and downregulated (Table 2 and  211 Additional File 2: Figure S2). These findings are consistent with the prediction of our model that 212 light would degrade circadian transcripts, but that expression would subsequently increase over 213 time ( Figure 1b). 214 215

216
Despite a large effect of the 1-hour light pulse on differential expression patterns (Table 2) We also investigated the expression patterns of 7 genes known to control the circadian clock in 234 the fruit fly and other arthropods. We identified orthologs of each of these 7 canonical clock 235 genes (tim, per, cyc, Clk, dbt, cry1, cry2) in our M. wittfeldae transcriptome using a phylogenetic 236 approach (see Methods for details). Of the 7, only cry2 was found to be significantly 237 differentially expressed (FDR < 0.01). Specifically, cry2 was more highly expressed at the  Table S3). 282 When comparing the two collection times, the pulse condition yielded the largest pool of 283 enriched GO terms ( Figure 5C). Our analysis indicates that a wide range of biological processes 284 were affected. Notably, response to stress and cell death were among the enriched GO terms. 285 Also, GO terms related to proliferations were enriched, such as growth, embryo development, 286 and cell differentiation, among others. In addition, enriched GO-terms in this category included 287 some of the regulatory processes such as translation, secondary metabolic processes, etc. Those 288 processes were not localized, but widely distributed within and outside the cell, since cellular 289 component enriched terms included cell, organelle, nuclear chromosome and extracellular space. 290 However, only three molecular functions were enriched: GTPase activity, oxidoreductase 291 activity, unfolded protein binding (Additional File 4: Table S3). 292 Finally, when comparing the two collection times, the no pulse condition yielded only a few 293 enriched terms ( Figure 5D). Biological processes included carbohydrate metabolic processes, 294 cell population proliferation, and aging. However, this effect was likely localized to extracellular 295 space because it was the only enriched group in the CC category. In the molecular function 296 category, terms involved ion binding, enzyme regulator activity, and hydrolase activity acting on 297 glycosyl bonds were enriched (Additional File 4: Table S3). Similarly, our model predicts that circadian genes should be downregulated at et7 compared 332 to et16 in the pulse condition ( Figure 2B, left). Circadian gene expression is expected to be 333 flipped in the no pulse condition ( Figure 2B, right). As a result, we expect a negative log FC with 334 a pulse and a positive log FC with no pulse ( Figure 2D). In our coordinate system that plots log 335  Table  341 S4). Therefore, we hypothesize that the transcripts in the third quadrant of figure 5B are involved 342 in circadian response to a pulse of light. 343 Our proposed light-sensitive circadian genes (upper left quadrants of Figure 6A&B  To our knowledge, this study is the first to investigate genetic mechanisms of the circadian 368 clock in spiders. By comparing differential gene expression between two time points under two 369 light conditions, we were able to identify potential cycling genes, as well as candidate genes for 370 circadian light entrainment. 371 The total number of DE transcripts between our two collection times (227 for pulse, 95 for 372 no pulse, 2 overlap, Fig. 3) is consistent with previously reported sets of time-dependent D. were not differentially expressed between our collection times or between the light pulse and no 384 pulse conditions. The lack of overlap in differentially expressed genes between D. melanogaster 385 and M. wittfeldae suggests that the circadian mechanism, or at least the circadian output, in 386 spiders differs significantly from the fruit fly. The idea of distinct set of circadian genes in 387 spiders was also supported by the finding that a large fraction of DE genes in our data set did not 388 have homologs outside of Araneae (75% of the 385 genes DE between time points). 389 Our comparison of two time points in constant darkness is most similar to the D. Another study measured response of the Arabidopsis transcriptome to 1-hour light pulses 438 given either in the middle of the subjective day or in the middle of the subjective night [33]. This 439 study identified a total of 2,237 light-sensitive genes, which is much higher than in our study. A 440 light stimulus during the night preferentially promoted the expression of certain key clock 441 components, consistent with the general observation that light present at the beginning or end of 442 the photoperiod adjusts the circadian clock to seasonal changes in day length. 443 We were unable to find gene expression studies exposing any arthropod species to a short 444 pulse of light. The most comparable arthropod experiments we could find applied weak light at 445 night. For example, fireflies exposed to weak light at night for 2 weeks experienced much higher 446 mortality than fireflies not exposed to light at night [34]. Furthermore, weak light at night 447 resulted in 1262 down regulated and 105 upregulated genes [34]. We similarly found more 448 downregulated genes in the short-term response to a light pulse in M. wittfeldae. Enriched GO 449 terms in the firefly DE genes included functions related to immune response, development, 450 reproduction, and biological rhythm. Some of the immune response and development terms 451 reflect GO terms we found with response to light. The DE genes related to biological rhythm 452 were not the "main" clock genes but ones that interact with the "main" clockso it is perhaps 453 not surprising that we did not identify an effect of light on canonical clock genes. 454 455 Transcripts involved in the circadian response to light relate to immune function, growth, 456

development, and reproduction 457
Our most striking finding is the cluster of transcripts with switched patterns of differential 458 expression over time, dependent on light pulse, as predicted by our mathematical model (Fig. 6). 459 These transcripts represent our best candidates for light-sensitive circadian genes. These 460 transcripts had multiple functions and were distributed throughout the cell (Table 3). GO-terms 461 involved in immune response, growth, development, and reproduction were enriched in this 462 candidate set of light-sensitive circadian genes (Table 3). It has been shown in D. melanogaster 463 that the immune response is under circadian control [35,36]. It is also has been found that 464 exposure to night light leads to reductions in immune function in vertebrates and invertebrates 465 [37,38]. 466 We also found enriched GO-terms for growth, development, and reproduction in our 467 candidate light-sensitive circadian genes, such as "cell morphogenesis", "embryo development", 468 "cell differentiation", and "cell division" (Table 3) To our knowledge, this work is the first to identify circadian and light-responsive transcripts in 481 spiders. We used a mathematical model of the circadian clock in spiders to design a light 482 application experiment, and the subsequent collection times to optimize differences in gene 483 expression. By comparing the levels of expression at two time points after an application of light, 484 or not, we were able to identify distinct clusters of transcripts responding to time of day and 485 light. These transcripts are our best candidate set for light-sensitive circadian genes. Many of 486 these transcripts likely encode outputs of an entrained circadian clock, including immune 487 response, reproduction, and development. However, it also is possible that some of these 488 transcripts encode central components of the clock that are critical to entrainment. Given the 489 unusual characteristics of spider free-running periods, and the fact that many of our candidate protein, which changes its structure, enabling it to enter the nucleus. As illustrated in Figure 1(a), 499 the mRNA, produced from a gene through the process of transcription, is positively coupled to 500 the cytosolic protein through the process of translation. After the cytosolic protein changes its 501 structure and re-enters the nucleus, the resulting nuclear protein acts to suppress expression of 502 the gene coding for the mRNA, representing a negative coupling. Natural degradation of the 503 mRNA sequence and the cytosolic protein is also included in the model. 504 The coupled system of equations governing the circadian oscillator is as follows: 505 We used the model to predict the time to apply a 1-hour light pulse that would maximize the 521 difference in gene expression levels between spiders receiving the light pulse and those that did 522 not. We also selected two collection points to optimize the ratio of mRNA at two collection 523 points for each experimental group. The final experimental protocol is shown in Figure 1    no BLAST alignments, was chosen to represent that gene. To further reduce redundancy and 572 identify fragments of genes, the best matching gene of each protein in the curated silk gene 573 database plus the 5 species proteomes were also identified. These reciprocal best matches were 574 always retained for downstream analyses (BEST, Table 1). If a Trinity gene was not the best 575 match of a database protein it was considered "GOOD" if it aligned to >20% of the database 576 protein and exceeded 1 TPM (transcript per million) in at least one RNAseq library. Genes that 577 aligned to >20% of the database protein and did not meet the 1 TPM threshold were dubbed 578 "LOW_EXP" ( Table 1). Genes that aligned to <20% of the database proteins were dubbed 579 "LOW_COV" (Table 1). For the genes with no BLAST alignments to a proteome, the gene was 580 called "LONGORF" if the ORF exceeded 50 amino acids and had >1 TPM in at least one 581 RNAseq library ( Our initial BLAST results identified multiple transcripts with significant alignments to the 594 canonical circadian genes: Clk, cyc, tim, per, dbt, cry1, cry2. Since these genes are each part of 595 larger multi-paralog gene families, we used a phylogenetic approach to determine which 596 transcript was an ortholog of the canonical clock gene in D. melanogaster or Apis mellifera. We 597 first used Orthofinder v2. [59] to identify groups of homologous genes among 6 spider species, a 598 tick, and two insects. In addition to our M. wittfeldae transcriptome, we used the predicted 599 proteomes from published transcriptomes or genomes for the following species  The circadian genes (bold) were identified for D. melanogaster from FlyBase v2019_08. In cases where more than one M. wittfeldae transcript was in an orthogroup with a circadian gene, phylogenetic trees were examined for orthology (Additional File 2: Figures S4-S5). The ortholog is bolded.

Differential gene expression analysis 614
Differential gene expression among experimental groups was based on gene-level expression 615 estimates derived from transcript abundance estimates as recommended by [61]. In brief, the 616 expected read counts of each Trinity-assembled transcript were calculated by RSEM v.1.3.1 [20], 617 which takes into account read-mapping ambiguity due to multiple isoforms or even alleles 618 having been assembled. The gene-level counts were then calculated as the sum of the included 619 transcripts, weighted by their length as described in Soneson et al. [61], and implemented 620 through Trinity v2.8.4. Because RSEM can distribute one read among multiple transcripts, some 621 counts were not integer values. These values were rounded down for input into differential 622 expression analyses, which require integer values for read counts. Prior to further expression 623 analyses, genes unlikely to encode proteins were removed (RNA, CHIMERA, LOW_COV, and 624 NO_HITS, Table 1). For comparing expression levels among circadian genes, we used the TPM 625 normalized for differences in sequencing depth among libraries using the Trimmed Mean of M 626 (TMM) values calculated with Trinity. 627