Identification of Mblk-1 target gene candidates in the worker MBs by ChIP-seq analysis
Mblk-1 is expressed in a lKC-preferential manner in adult worker MBs (Fig. 1A) [16, 18]. When we searched for highly expressed genes in the honey bee MBs using previously reported RNA-seq data , we found that Mblk-1 was the 38th most highly expressed gene and the most highly expressed among transcription factors in MBs (Fig. 1B), implying its important role in the adult honey bee MBs.
We previously prepared an antibody that binds specifically to Mblk-1 protein (hereafter termed anti-Mblk-1 antibody) (Fig. S1A) . We first tested whether the anti-Mblk-1 antibody could be used for ChIP-seq analysis. Full-length Mblk-1 protein was mainly detected as 250-kDa bands in the immunoprecipitated sample as well as in the input and supernatant samples after immunoprecipitation using worker MB lysate (Fig. S1B), indicating that the anti-Mblk-1 antibody can be used for immunoprecipitation.
We performed ChIP-seq analysis using homogenates of adult worker MBs and the anti-Mblk-1 antibody. A total of 593 genes were identified as target gene candidates of Mblk-1, in which Mblk-1-binding signals were located within 10 kb upstream or downstream of the corresponding genes (Table 1 and Supplementary Data 1&3) when using a peak calling q-value threshold of 0.005 and the input DNA sample as a control. Schematic diagrams of some peak signals and their surrounding Mblk-1 target gene candidates are shown in Fig. 2A-F. The signal with the lowest q-value was in the third intron of an uncharacterized non-coding RNA gene (Gene 1 in Table 1 and Fig. 2A). Genes related to neural function, such as suppressor of lurcher protein 1 (Gene 2 − 1 in Table 1 and Fig. 2B), fasciclin-3 (Gene 4 in Table 1 and Fig. 2C), potassium voltage-gated channel subfamily KQT member 1 (Gene 9 in Table 1 and Fig. 2D), and CaMKII (Gene 17 − 1 in Table 1 and Fig. 2F) were also identified as Mblk-1 target gene candidates. Moreover, among the Mblk-1 target gene candidates, some ecdysone-signaling pathway-related genes such as ecdysone-induced protein 75 (E75) and Ultraspiracle (Usp) were identified. This is consistent with the fact that E75 is a direct regulatory target of Drosophila melanogaster E93 (DmE93) . Usp was listed in Table 1, and the peak signal was detected in the 5’ untranslated region of this gene (Fig. 2E).
Three distinct DNA binding motifs of Mblk-1
To search for Mblk-1-binding motifs, we next performed motif analysis using sequences corresponding to the Mblk-1-binding regions and MEME-ChIP . Three consensus sequences were identified by DREME: GA-rich sequence (GAGA or GAG, hereafter termed a GAGA motif), GAAATTTT, and AT(C)TTTGTA (Fig. 3A). In addition, a combination of these motifs was found by MEME (Fig. 3B). In a previous study, we identified an Mblk-1 preferred binding sequence named MBE (Mblk-1-binding element, 5’-AGGTAGAGATCGATCGATAGGG-3’) by using an in vitro binding site selection method . The motif we found in this study is consistent with MBE, because MBE has a GAGA sequence. In addition, the ATTTTC(T)GG motif is reported as a binding motif of DmE93 , and it is also reported that Bombyx mori E93 (BmE93) binds to GAGA-containing motifs . These previous findings support the quality of our data and suggest that Mblk-1-binding motifs are conserved in insects.
We next performed a luciferase assay to examine whether Mblk-1 recognizes these 3 binding motifs and affects the transcription of a reporter gene. Human HEK293T cells were transfected with an Mblk-1 expression vector; a Renilla luciferase reporter vector; and 5 types of pGL4.23 firefly luciferase reporter vectors, each of which contains 6xGAGA, 6xGAGA with 4-bp spacer sequences, 6xATTTTG, 6xGAAATTTT, or 6xATTTTGG upstream of the minimal promoter, respectively (Fig. 3C). Normalized firefly luciferase activity increased about 3.0-fold and 6.6-fold when co-transfected with pGL4.23 vectors containing 6xGAGA or 6xGAGA with 4-bp spacer sequences, respectively, compared with that when co-transfected with a pGL4.23 vector containing no binding motifs (Fig. 3D). This indicates that Mblk-1 recognizes GAGA motifs and upregulates the transcription of genes that have these motifs. By contrast, normalized firefly luciferase activity significantly decreased by approximately 60% when co-transfected with pGL4.23 vectors containing 6xGAAATTTT, compared with that when co-transfected with a pGL4.23 vector containing no binding motifs (Fig. 3D). These results strongly suggested that Mblk-1 recognizes GAAATTTT motif to affect the transcription of genes with these motifs. On the other hand, no significant decrease in firefly luciferase activity was observed when co-transfected with a pGL4.23 vector containing 6xATTTTG or 6xATTTTGG (Fig. 3D), although these results do not exclude the possibility that Mblk-1 binds to these motifs. Taken together, our results indicated that Mblk-1 affects transcription by recognizing the identified Mblk-1-binding motifs.
Identification of genes upregulated in lKCs as candidates for Mblk-1 target genes
Given that Mblk-1 upregulates its target genes in a lKC-preferential manner in the honey bee MBs, Mblk-1 target gene candidates are expected to be expressed preferentially in lKCs. To find genes expressed preferentially in lKCs among the identified Mblk-1 target gene candidates, we used recently reported honey bee MB single-cell RNA-seq data . When the MB cells were classified into 10 clusters (Fig. 4A), preferential Mblk-1 expression was observed in clusters 1, 3, and 5 (Fig. 4B). In addition to Mblk-1, genes shown to be preferentially expressed in lKCs by in situ hybridization, such as Ryr (cluster 1) , IP3R (clusters 1, 3, and 5) , disks large homolog 5 (clusters 1 and 3) , and broad-complex (cluster 3)  were also more highly expressed in clusters 1, 3, and/or 5 than in other clusters (Fig. 4B). Therefore, we regarded these 3 clusters as lKCs.
We newly identified 12 Mblk-1 target gene candidates with higher expression in at least one of clusters 1, 3, and 5 than in the other clusters (Fig. 4B). In situ hybridization analysis of genes with the top 6 peak signals was performed to confirm that these genes are expressed preferentially in lKCs. As expected, CaMKII, Baiap3 and pumilio homolog 2 were expressed preferentially in lKCs (Fig. 4D, E and H), and CUGBP Elav-like family member 4 was expressed in the whole MBs, but not in the other brain regions (Fig. 4F). For the other 2 genes, specific expression in lKCs or MBs was not observed (Fig. 4C and G). Schematic diagrams of the peak signals and their surrounding Mblk-1 target gene candidates are shown in Fig. S2.
Comparison of Mblk-1 target gene candidate profiles between pupal and adult worker honey bee brains
Mblk-1 expression level is higher in pupal brains than adult brains . In addition, DmE93 is expressed transiently in pupal MB neuroblasts to activate autophagy during metamorphosis . Thus, we next examined whether Mblk-1 has distinct target genes between adult and pupal worker brains in the honey bee. To this end, we performed ChIP-seq analysis using pupal worker brain homogenates and the anti-Mblk-1 antibody. A total of 263 genes were identified as pupal Mblk-1 target gene candidates, for which Mblk-1-binding signals are located within 10 kb upstream or downstream of the corresponding genes when using a peak calling q-value threshold of 0.005 and the input DNA sample as a control. Approximately half of the pupal Mblk-1 target gene candidates (138/263, 52%) were detected specifically in the ChIP-seq analysis using pupal brains (Fig. 5A and Supplementary Data 2&3). The Mblk-1-binding signals with the 20 lowest q-values (top 20 signals) and genes that have these signals within 10 kb upstream or downstream are shown in Fig. 5B. Among the 20 pupal target gene candidates shown in Fig. 5B, 8 genes were pupal-specific (8/20, 40%). For almost half of the 20 signals (9/20, 45%), no genes were localized within 10 kb upstream or downstream of the signals. For example, a neural developmental gene, Down syndrome cell adhesion molecule (Dscam) shown in Fig. 5B was not present among the adult target gene candidates, suggesting that this gene is a pupal-specific target. The peak signal was found in the first intron of Dscam, and only the GAGA motif was found in the Mblk-1 binding region. In addition to Dscam, several developmental genes, including Ultrabithorax and Tyrosine-protein kinase transmembrane receptor Ror, were specifically detected in the pupal analysis (Supplementary Data 2).
For the 6 genes analyzed by in situ hybridization, potassium voltage-gated channel subfamily KQT member 1, CaMKII, CUGBP Elav-like family member 4, neural-cadherin, and pumilio homolog 2 were listed as the common target gene candidates. Baiap3 was the only adult-specific gene. Although CaMKII is reported to regulate learning and memory abilities in adult workers [26, 27], it is likely that CaMKII is regulated by Mblk-1 in both pupal and adult brains and is involved in neural development through regulating synaptic plasticity in pupae. All the Mblk-1 target genes specific for adult MBs or pupal brains and the common target gene candidates are shown in Supplementary Data 1, 2, and 3, respectively.
Self-regulation of Mblk-1 expression in honey bee MBs
Interestingly, two distinct peak signals were found 2 kb and 25 kb upstream of the Mblk-1 gene, respectively (Fig. 6A). All these signals contained GAGA motifs, which induced the expression of the downstream luciferase gene in the reporter assay (Fig. 3D). From this result, we expected that Mblk-1 induces its own expression via these GAGA motifs. To confirm this, we performed a luciferase assay with a pGL4.23 firefly luciferase reporter vector that contains the sequence corresponding to the peak signal located 25 kb upstream of Mblk-1 gene (Fig. 6A). We observed an approximately 4.9-fold increase in firefly luciferase activity when using the pGL4.23 vector containing the peak signal sequence compared with that when using the pGL4.23 vector containing no binding motifs (Fig. 6B). Therefore, Mblk-1 is suggested to recognize this GAGA-containing region as an enhancer to upregulate its own expression. It is possible that once Mblk-1 is expressed, it can upregulate its own expression, which leads to its constitutive expression in adult honey bee lKCs.
Expression analysis of Mblk-1 homolog in the sawfly
In the honey bee, Mblk-1 is almost selectively expressed in the brain among various adult tissues . In addition, in the ant Harpegnathos saltator workers and gamergates, the expression of the Mblk-1 homolog in the brain is observed . We next tested whether the constitutive expression of Mblk-1 homologs in the adult brain is conserved even in other hymenopteran insect species (Fig. 6C). We performed qRT-PCR analysis to examine the expression of the Mblk-1 homolog (LOC105691346) in various adult body parts of the sawfly (Athalia rosae), which is a solitary, phytophagous, and primitive hymenopteran insect species . Although significant expression of the sawfly Mblk-1 homolog was detected in the head without the brain, thorax, and abdomen, the expression in the brain was below the detection threshold (crossing point Cp: 40) (Table S2). The relative expression levels of the Mblk-1 homolog did not differ significantly among body parts when the brain Cp values were set to 40 (Fig. 6D). These results indicate that the expression level of Mblk-1 homolog is very low in the sawfly brain, and that brain-specific expression, which is observed in the honey bee, is not conserved in the sawfly.
Comparative analysis of the upstream sequences of Mblk-1 homologs
Finally, we investigated the relationship between the accumulation of the GAGA motif (GAGA and GAG) and the expression of Mblk-1 homologs in the brains of Drosophila melanogaster and hymenopteran insects. We counted the number of GAGA motifs located within upstream of 2 kb and 10 kb from the transcription start site of each Mblk-1 homolog (Fig. 6E and Fig. S3). In all the four species of Aculeata, the numbers of GAG sequences within upstream of 2 kb were over 100, and the numbers of GAGA sequences were over 30. On the other hand, in Drosophila melanogaster and Athalia rosae, either of which no Mblk-1 homolog expression was observed in adult brains, the number of GAG sequences was 76 and 41, respectively. In addition, to determine the evolutional stage in hymenopteran insects at which the accumulation of GAGA motifs was acquired, we investigated the numbers of GAGA motifs in the 2 parasitoid wasps, Nasonia vitripennis and Fopius arisanus. The numbers of GAGA motifs in Nasonia vitripennis were comparable to those in Aculeata, while the numbers in Fopius arisanus were closer to those in Drosophila melanogaster and Athalia rosae. The same trend was observed in the numbers of GAGA motifs within upstream of 10 kb (Fig. S3).