Transcriptional expression changes during compensatory 1 plasticity in the central nervous system of the adult cricket 2 Gryllus bimaculatus 3 I. Auditory system plasticity in the prothoracic ganglia

Most adult organisms are limited in their capacity to recover from neurological damage. The 28 auditory system of the Mediterranean field cricket, Gryllus bimaculatus , presents a compelling 29 model for investigating neuroplasticity due to its unusual capabilities for structural 30 reorganization into adulthood. Specifically, the dendrites of the central auditory neurons of the 31 prothoracic ganglion sprout in response to the loss of auditory afferents. Deafferented auditory 32 dendrites grow across the midline, a boundary they normally respect, and form functional 33 synapses with the contralateral auditory afferents, restoring tuning-curve specificity. The 34 molecular pathways underlying these changes are entirely unknown. Here, we used a multiple k- 35 mer approach to re-assemble a previously reported prothoracic ganglion transcriptome that 36 included ganglia collected one, three, and seven days after unilateral deafferentation in adult, 37 male animals. We used EdgeR and DESeq2 to perform differential expression analysis and we 38 examined Gene Ontologies to further understand the potential molecular basis of this 39 compensatory anatomical plasticity. Enriched GO terms included those related to protein 40 translation and degradation, enzymatic activity, and Toll signaling. Extracellular space GO terms 41 were also enriched and included the upregulation of several protein yellow family members one 42 day after deafferentation. Investigation of these regulated GO terms help to provide a broader 43 understanding of the types of pathways that might be involved in this compensatory growth and 44 can be used to design hypotheses around identified molecular mechanisms that may be involved 45 in this unique example of adult structural plasticity.


3
Background 49 Most adult organisms, especially mammals, are limited in their capacity to adapt and 50 recover from neurological damage (1,2). The Mediterranean field cricket, Gryllus bimaculatus, 51 provides a model of neuroplasticity due to its demonstrated ability to compensate for neuronal 52 damage with novel dendritic growth and synapse formation, even into adulthood. Specifically, 53 the central auditory system, much of which resides in the prothoracic ganglion, reorganizes in 54 response to deafferentation caused by unilateral transection of auditory afferents in the adult (3-55 5). 56 In G. bimaculatus, auditory information is transduced by the auditory organs, located on 57 the prothoracic limbs. Auditory afferents receive the sensory stimuli and convey this information 58 into the prothoracic ganglion where they form synapses with several different auditory neurons 59 (6,7). These neurons exist as mirror image pairs and their dendritic arbors remain localized 60 ipsilateral to the auditory input, typically not projecting contralaterally across the midline (8). 61 However, previous research has shown that after amputation of the prothoracic leg in the adult, 62 which removes the auditory organ and severs the afferents, the deafferented dendrites of the 63 ipsilateral auditory neurons sprout across the midline and form functional synapses with the 64 intact auditory afferents on the contralateral side. This reorganization is evident whether 65 deafferentation occurs in larvae (9,10) or adults (3-5). Various aspects of the physiological 66 consequences of this compensatory behavior have been studied (3,9,10), however little is known 67 about the molecular pathways and mechanisms underlying this growth. 68 Although the genome has only just become publicly available (11), various de novo 69 transcriptomes have been created for use in this species (12)(13)(14). Recently, a de novo 70 transcriptome of the prothoracic ganglion was assembled in an attempt to understand the 71 molecular basis of the compensatory response (15). This transcriptome was built with RNA from 72 individual prothoracic ganglia of both control and deafferented adult male crickets. Initially, this 73 transcriptome was assembled and mined for the presence of developmental guidance molecules. 74 These guidance molecules are known to play a well-conserved role in regulating the specific 75 growth of axonal and dendritic projections during the development of many species (16,17). 76 While these molecules have mainly been studied for their role in development, it has also been 77 suggested that alterations in their expression may influence the ability of axons and dendrites to 78 recover from injury in adulthood (15,(18)(19)(20). Mining this cricket transcriptome revealed that 79 many well-conserved developmental guidance molecules, including slit, ephrins, netrins, and 80 semaphorins, were present within the adult prothoracic transcriptome (15). However, it is still 81 unknown whether the expression of these transcripts, or any other transcripts, are significantly 82 altered during this compensatory growth process. 83 The goal of this study was to better understand the underlying molecular control of the 84 compensatory growth behavior observed in the cricket. We assembled a new, more 85 representative and less redundant transcriptome of the cricket prothoracic ganglion using 86 multiple k-mer values during the assembly process. We also utilized the Evidential Gene 87 tr2aacdsmRNA classifier to reduce redundancies (21). This new transcriptome was used to 88 analyze changes in expression levels one, three, and seven days post-deafferentation. The 89 identified genes were then analyzed using GO annotation analysis to determine the classes of 90 genes that are differentially regulated over the course of the injury response. By performing this 91 analysis, it was possible to discover changes in gene expression that occur during the 92 compensatory growth response, allowing for insights into possible pathways or key molecules 93 critical to this process. 94

Transcriptome Assembly and Analysis 96
This transcriptomic study focused on the cricket, Gryllus bimaculatus, whose nervous system has 97 been shown to have an unusual level of adult structural plasticity (3-5). We deafferented 98 sensory neurons, including the auditory neurons, in the prothoracic ganglion of the adult cricket, 99 by unilateral amputation of the prothoracic leg at the femur. Control amputations consisted of 100 removal of the distal tip of the tarsus. We harvested prothoracic ganglia one, three, and seven 101 days post-amputation. These time points were designed to capture transcriptional changes in 102 response to the injury (one day), during initial sprouting (one and three days), growth across the 103 midline (three and seven days), and de novo synapse formation (3,22). 104 Although a G. bimaculatus prothoracic ganglion transcriptome from this tissue was previously 105 assembled, analyzed, and mined (15), the present study re-assembled a new transcriptome based 106 on those original cleaned and trimmed RNA-Seq reads. Five individual de novo transcriptomes 107 were built in Trinity using five different k-mer lengths (21, 25, 27, 30, and 32). Transcriptome 108 construction with longer k-mer lengths produces more reliable contigs, though biased toward 109 highly expressed transcripts. In a complementary fashion, using a shorter k-mer length produces 110 a more exhaustive set of contigs though also one more prone to noise (23,24). This trade-off 111 between bias and noise induced by the choice of k-mer length suggests how a higher quality de 112 novo assembly can be derived by integrating multiple k-mer lengths into an analysis (23). By 113 combining results across k-mer lengths, we ensured that a complementary selection of contigs 114 was included in the analysis (23,25,26). Correspondingly, we combined the five assemblies into 115 a single reference transcriptome and subsequently filtered redundancies and fragments. 116 The individual assemblies had an N50 ranging from 1,219 -2,341, with the longer k-mer 117 assemblies yielding a longer N50 (Table 1). The median, average, and maximum contig length 118 also increased as the k-mer length was increased. The total number of Trinity "genes" ranged 119 from 283,278 to 351,829, with higher k-mer assemblies yielding fewer predicted genes. The GC 120 content for each assembly remained roughly constant, between 40-41%. The overall alignment 121 was greater than 98%, with multimapping percentages between 90.04-93.33% (Table 1). This 122 high multimapping percentage can likely be attributed to the Trinity assembly process, which is 123 conservative in its separation and identification of unique transcripts, producing high intra-124 assembly redundancy (27). 125 126

130
The five transcriptomes were combined to generate a transcriptome with a total of 2,099,002 131 contigs (Figure 1), presumably many of which were redundant. We used the EvidentialGene 132 tr2aacdsmRNA classifier to filter the redundancies within our transcriptome, which were present 133 due to both intra-and inter-assembly redundancies (21). The EvidentialGene program employs 134 an algorithm that operates on the open reading frames of the contigs to generate a non-redundant 135 transcriptome containing the optimal set of transcripts based on biological relevance and coding 136 potential (21,28). This program is often used in 'over-assembly' procedures where multiple 137 assemblies are combined (29-31). With our multi-k-mer assembly, EvidentialGene produced a 138 main 'okay' set, containing 55,895 contigs, and an alternative 'okalt' set, containing 143,364 139 contigs, which were combined to produce a final transcriptome with a total of 199,357 contigs, 140 reducing the overall number of contigs by 90.5%. BLAST searches across all the contigs yielded 141 matches for 127,324 transcripts, a 63.87% BLAST hit rate for the entire transcriptome. The 142 number of Trinity predicted genes after running EvidentialGene dropped slightly to 132,972. The 143 multimapping percentage was reduced from approximately 90% to around 21%. 144 To check the accuracy of the sequences predicted in the transcriptome, we used Sanger 145 sequencing to independently confirm the sequences of six randomly selected transcriptome 146 transcripts. Of these six, four of them were predicted to contain complete open reading frames 147 (ORFs), and two were missing the 3' end. We analyzed 14,299 nucleotides of 15,478 predicted 148 base pairs (92%). The number of substitutions (16), insertions (84), and deletions (0) were noted; 149 overall, these differences accounted for approximately ~0.1% of the sequenced nucleotides (data 150 not shown). A few additional randomly selected sequences were highly repetitive and were not 151 amenable to Sanger sequencing; we did not proceed with an analysis of any of these candidates. 152

Differential expression during compensatory plasticity 153
To determine genes that were differentially regulated during compensatory plasticity, the 154 reads for each of the 16 Illumina libraries, which excluded the two outliers and three backfill 155 libraries (see Methods), were mapped back to our multiple k-mer transcriptome creating a counts 156 matrix. Pairwise comparisons of normalized counts data from deafferented vs. control crickets 157 were performed at each time point using both algorithms, EdgeR and DESeq2 (See Supplemental 158 Materials). The distribution of differentially expressed genes was initially visualized using 159 volcano plots (Figure 2). These plots revealed slightly different distributions of upregulated 160 versus downregulated genes between the two programs. Overall, however, we saw strong 161 correlations between these two programs for all time points (Figure 3), with the exception of a 162 few of the high fold-change candidates. Within this range, EdgeR was consistently more 163 conservative than DESeq2, which was especially true for a small number of upregulated 164 candidates (Figure 3a-c). 165 The majority of the transcripts were upregulated in the 2 to 4-fold range at one day (59% 166 of the transcripts), three days (55% of the transcripts), and seven days (45% of the transcripts). 167 The next largest group of transcripts was upregulated at 0 to 2-fold at one day (33% of 168 transcripts), three days (41% of transcripts), and seven days (39% of transcripts). For those 169 candidates that were downregulated, a majority of them at one day (63%) and three days (86%) 170 were downregulated less than 2-fold. At seven days, the bulk of candidates (70%) were 171 downregulated 2 to 4-fold. Analysis of 10 of the transcripts at each time point with the largest 172 fold changes revealed that most were unidentified and did not match anything in the NCBI 173 database when BLASTed. A few of these transcripts did have BLAST hits, such as a mucin-174 5AC-like (down at one day), larval cuticle protein-3-like (down at seven days), and hypothetical 175 accessory gland protein (up at three and seven days). 176 Using EdgeR, 261 genes were found to be downregulated at one day post-deafferentation,  DESeq2 and EdgeR are the leading programs used for the analysis of RNA-Seq data, 207 with thousands of reports both using these methods for analyzing differential expression and 208 comparing their computational methods (32,33). While both operate under the hypothesis that 209 the majority of genes are not differentially expressed, they employ different computational 210 methods, especially with respect to the normalization process, to determine differentially 211 expressed genes (33). EdgeR and DESeq2 both use a normalization by distribution method, but 212 EdgeR relies on the Trimmed Mean of the M-values method, whereas DESeq2 uses a Relative 213 Log Expression method (34-37). Since different methods rely on differing assumptions in order 214 to identify differentially expressed genes, the results will vary slightly. One experiment 215 comparing EdgeR and DESeq2 found relatively similar lists of differentially expressed genes 216 produced by the two programs, with EdgeR producing more conservative, smaller gene lists (32). 217 In this study we decided to use two different programs to conduct the differential expression 218 analysis in order to create a smaller, more conservative set of genes for future functional 219 analyses. Out of the six comparisons between EdgeR and DESeq2 (upregulated and 220 downregulated at one, three, and seven days post injury), four out of the six resulted in EdgeR 221 producing a smaller set of genes than DESeq2 ( Figure 5), in line with the study by Raplee and 222 colleagues (32). Although the two programs generated varying numbers of differentially 223 regulated genes, similar patterns in relative numbers were observed. Both programs showed a 224 decrease in the number of genes upregulated across time. For the downregulated genes, a peak in 225 the number of differentially regulated genes was found at three days post injury. This similarity 226 was expected given the relative similarity and previous studies of both analysis programs. 227

BLAST and Gene Ontology Annotations 228
Once we had identified a conservative set of transcripts predicted to be differentially 229 and an additional 15% had only BLAST hits ( Figure 6). 239 For the six lists of differentially expressed transcripts, there was a range between 37-81% 240 of transcripts having BLAST hits against the nr database and 23-62% against the manually 241 curated and annotated Swiss-Prot database. After mapping with GO terms, this was reduced to 242 about 31-59%. This left close to half of the differentially regulated transcripts with no functional 243 information. These transcripts could represent uncharacterized proteins, which may or may not 244 be playing an important role in the compensatory growth response. Since we performed a 245 BLASTx looking at proteins, it is also possible that these transcripts are non-coding RNAs. 246 Although polyA selection was used as part of the RNA-Seq process, this may not be completely 247 efficient in removing all non-coding RNAs, specifically long non-coding RNAs (39,40). Finally, 248 it is also possible that there were issues within these transcripts themselves, either due to an error 249 during the assembly process or the sequences being too short to be matched with confidence. 250 Regardless, we completed no further analysis of these transcripts. 251 One interesting group of proteins that was found to be upregulated at one day was the 252 protein yellow family. Protein yellow belongs to the major royal jelly protein family and are 253 secreted proteins found in the extracellular region (41). Protein yellow was first characterized in 254 Drosophila melanogaster for its role in pigmentation (42). Other research in honeybees revealed 255 the importance of royal jelly proteins in development and social behavior in addition to a 256 possible role in the CNS (41,43). However, the function of these yellow/royal jelly proteins is 257 not completely understood (42). While the role of these proteins in crickets is unclear, they were 258 statistically differentially regulated and would be an interesting molecular family to investigate 259 for their role in deafferentation-induced plasticity. 260

GO Term Distributions 261
Based on a preliminary grouping of GO terms by the three root classes, it appeared that 262 several classes of GO terms were found to be associated with our differentially expressed genes 263  Figure 8a, b). The percentage of genes indicates the percentage of the genes within a given list 270 that were annotated with the given GO term or one of the child nodes of the term. GO terms with 271 higher percentage representation included terms describing membrane-related components as 272 well as terms related to catalytic activity, binding, and metabolic and cellular processes. 273

Gene Ontology Categories 274
We examined whether any of the candidate GO terms we identified here were associated 275 with injury-related plasticity paradigms identified in other species. For example, perhaps 276 successful regeneration after injury depends on the recapitulation of developmental proteins that 277 promote neurogenesis (46) or guide axons and dendrites (18). If these molecular strategies were 278 important for the plasticity observed in the cricket, we would predict that we might see changes 279 in the expression of transcription factors involved in neurogenesis and/or the regulation of 280 several classes of guidance cues normally involved in midline control in insect embryos. When 281 searching our differentially regulated candidates, a few genes downregulated at three days were 282 associated with GO terms that were related to neurogenesis (GO:0007465: R7 cell fate 283 commitment and GO:0045466: R7 cell differentiation). We found only one transcript that was 284 annotated with an axon guidance-related GO term, which was identified as a "twitchin-like 285 protein," (Table 2) additional functional categories that we hypothesized could change during the compensatory 300 growth response were those related to apoptosis (48,49) and Wnt signaling (50). In our 301 differentially expressed gene sets we did not find enrichment in terms related to apoptosis. 302 Searching our results for Wnt-related GO terms, revealed a few genes annotated with Wnt 303 pathway members at 3 days post deafferentation (Table 2). These genes had a top BLAST hit of 304 atrial natriuretic peptide-converting enzyme isoform X1, Frizzled-4, and secreted frizzled-related 305 protein 5-like. 306 We looked for the presence of a number of additional groups of proteins that influence 307 neuronal morphogenesis, plasticity, or remodeling (Table 2). For example, the matrix 308 metalloproteases (MMPs) are required for axonal guidance (51) as well as dendritic remodeling 309 during metamorphosis in Drosophila melanogaster (52). Importantly, the expression of some 310 MMPs appear to contribute to poor recovery after spinal cord injury in mammals (53). We did 311 not find enrichment in any of these terms at any time point (Table 2), indicating that the injury-312 induced anatomical plasticity in the cricket may rely on different pathways than have been 313 identified in other species. Furthermore, it is notable that factors, such as MMPs, that can restrict 314 growth or contribute to pruning in other organisms are not upregulated upon injury in the cricket. 315 Several GO terms associated with the candidates found in a previous subtraction 316 hybridization study (54) were also found to be differentially expressed in the present experiment, 317 often showing significant changes in expression at the three-and seven-day post-deafferentation 318 time point (Table 2, bold). These include oxidoreductase, alpha-amylase, endoglucanase, and 319 alcohol dehydrogenase. As noted by Horch and colleagues (54), many of these enzymes have 320 been observed in the hemolymph of insects and play a role in metabolism and immune response. 321 Although it is possible that these findings are due to contaminants during the extraction of the 322 prothoracic ganglion, the results would imply that the extractions differed between control and 323 experimental animals in multiple experiments. Given that the differential expression of several 324 enzyme transcripts was found both in this study and in the SSH study, it is less likely that these 325 enzymes are artifact or contamination effects. Particularly, several differentially regulated 326 transcripts were associated with oxidoreductase activity across all time points. The BLAST hits 327 of these transcripts showed some of the enzymes to be retinal dehydrogenases. Retinal 328 dehydrogenase along with alcohol dehydrogenase, another regulated GO term, are involved with 329 the production of retinoic acid. Retinoic acid has been shown to be involved with development, 330 regeneration, synaptic plasticity, and neurite outgrowth (55-57) implying that regulation of 331 retinoic acid production may influence these processes. Another class of oxidoreductases that 332 appeared abundant within the BLAST hits was the cytochrome P450 family. Cytochrome P450 is 333 a superfamily of monooxygenase enzymes and several families of cytochrome P450 exist in 334 insects. These enzymes are known to have a variety of functional roles in insects including 335 growth and development (58). Cytochrome P450 has also been shown to regulate ecdysone 336 signaling in insects, including crickets (59,60). Ecdysone signaling is crucial for developmental 337 processes and morphogenesis, but has also been shown to be important in the dendritic 338 remodeling of Drosophila melanogaster sensory neurons (52,59). While these protein families 339 represent some of the transcripts annotated with "oxidoreductase activity", given the wide range 340 of such transcripts, it is difficult to discern the role of all of the regulated proteins. 341 342   analysis, there were 22 enriched GO terms found including those related to morphogenesis, 363 extracellular space, and neuron fate commitment (Figure 9). No enriched GO terms were found 364 in the upregulated gene set at seven days post-deafferentation. 365 One category of interest that was revealed in this enrichment analysis was the GO Term 366 "Regulation of Toll-signaling pathway." Toll receptors are most commonly associated with their 367 function in immunity and development, however, research in Drosophila melanogaster suggests 368 that they may also play a role in regulating cell number, connectivity, and synaptogenesis (62).

Animals, injury, and library preparation 396
Prothoracic ganglia from approximately 60 adult, male Mediterranean field crickets, 397 Gryllus bimaculatus were harvested and 21 individual ganglia were ultimately used as the 398 sources of RNA for this transcriptome (15). Male crickets that were adults for 3-5 days received 399 either a control amputation of the distal segment of the left tarsus ("foot chop" control crickets), 400 or the left prothoracic leg was severed mid-femur removing the auditory organ and deafferenting 401 the ipsilateral central auditory neurons ("deafferented" experimental crickets). Males were 402 chosen due to the potential sexual dimorphism in rates of dendritic growth after deafferentation 403 (22). (15). Several crickets were prepared for backfill as previously described (15). Prothoracic 404 ganglia were removed from these crickets 1, 3, or 7 days after amputation at the femur or tarsus 405 and total RNA was purified as previously described. A k-mer number identity was added to each contig's Trinity ID, all five assemblies were 426 concatenated, and the Evidential Gene program was used to create a single non-redundant 427 assembly. Evidential Gene relies on the Transdecoder.LongOrfs method, identifies the longest 428 ORFs, removes fragments, and uses a BLAST on self to identify highly similar sequences (98%). 429 The main (okay) and alternative (okalt) sets output from Evidential Gene were combined into a 430 final FASTA file used as the transcriptome for all subsequent analyses. Bam files, sorted bam 431 files, bam index files, and idxstats.txt files were created using samtools (67). This assembly is 432 publicly available at NCBI (Bioproject: PRJNa376023, SUB8325660). The metajinomics python 433 mapping tools (68) were used to generate a counts matrix. 434

Coverage Analysis 435
Samtools was used to extract the sequencing depth at every base position for each contig 436 in every cricket sample, and a python script was used to extract the mean and standard deviation 437 of depth for each contig. The package plotly in R (69) was used to plot the depth of each cricket 438 sample. Graphs were visually compared to determine outliers. Two outliers, 1C1 and 7C2, were 439 removed. 440 EdgeR and DESeq were used to run the differential expression analysis, (35,37). The were then compared and visualized between the two programs. Another set of lists containing the 448 genes overlapping the two programs was created for continued analysis. The EnhancedVolcano 449 package available in R was used to visualize differential gene expression in volcano plots (70). 450

PCR Confirmation 451
Six sequences were randomly selected for amplification and Sanger sequencing in order 452 to validate the assembly. Sets of primers were designed to obtain the sequence of most of each 453 sequence as predicted by the Trinity assembly. Primers, available on request, were designed for 454  list that were annotated with the GO term/child term are noted on the right. On the right axis, the 758 top numbers (olive) corresponds to the one-day data, the middle numbers, (magenta) corresponds 759 to the three-day data, and the bottom numbers correspond to the seven-day data (teal).