Evolution and genomic organization of the sHSP gene cluster in Arthropods


 Background

Conserved syntenic gene complexes are rare in Arthropods and likely only retained due to functional constraint. Numerous sHSPs have been identified in the genomes of insects, some of which are located clustered in close proximity. Previous phylogenetic analyses of these clustered sHSP have been limited to a small number of holometabolous insect species and have not determined the pattern of evolution of the clustered sHSP genes (sHSP-C) in insect or Arthropod lineages.
Results

Using eight genomes from representative insect orders and three non-insect arthropod genomes we have identified that a syntenic cluster of sHSPs (sHSP-C) is a hallmark of most Arthropod genomes. Using 11 genomes from Hymenopteran species our phylogenetic analyses have refined the evolution of the sHSP-C in Hymenoptera and found that the sHSP-C is order-specific with evidence of birth-and-death evolution in the hymenopteran lineage. Finally we have shown that the honeybee sHSP-C is co-ordinately expressed and is marked by genomic features, including H3K27me3 histone marks consistent with coordinate regulation, during honeybee ovary activation.
Conclusions

The syntenic sHSP-C is present in most insect genomes, and its conserved coordinate expression and regulation implies that it is an integral genomic component of environmental response in arthropods.


Background
Conserved syntenic gene complexes are rare in Arthropods and likely only retained due to functional constraint. Numerous sHSPs have been identi ed in the genomes of insects, some of which are located clustered in close proximity. Previous phylogenetic analyses of these clustered sHSP have been limited to a small number of holometabolous insect species and have not determined the pattern of evolution of the clustered sHSP genes (sHSP-C) in insect or Arthropod lineages.

Results
Using eight genomes from representative insect orders and three non-insect arthropod genomes we have identi ed that a syntenic cluster of sHSPs (sHSP-C) is a hallmark of most Arthropod genomes. Using 11 genomes from Hymenopteran species our phylogenetic analyses have re ned the evolution of the sHSP-C in Hymenoptera and found that the sHSP-C is order-speci c with evidence of birth-and-death evolution in the hymenopteran lineage. Finally we have shown that the honeybee sHSP-C is co-ordinately expressed and is marked by genomic features, including H3K27me3 histone marks consistent with coordinate regulation, during honeybee ovary activation.

Conclusions
The syntenic sHSP-C is present in most insect genomes, and its conserved coordinate expression and regulation implies that it is an integral genomic component of environmental response in arthropods.

Background
Evolutionary conserved syntenic gene complexes are relatively rare in arthropod genomes, with only three major well-characterised examples, the Hox complex (1)(2)(3)(4)(5)(6), the E(spl)-C (7,8) and the runt complex (9) represented in the majority of arthropod lineages. Those complexes, in which gene-order remains conserved over evolutionary time, are likely to be constrained by factors such as coordinated regulation (as in the case of the Hox and E(Spl)-C). Most gene complexes (though not all) (10), are formed by duplication of a single ancestor, but the evolutionary history of such complexes is often complicated with instances of duplication, loss, and gene conversion. In this study we examine the evolutionary history of a cluster of small heatshock protein encoding genes (sHSP-C) (11) in the genomes of arthropods.
sHSPs are found throughout the three domains of life and these proteins act as molecular chaperones that bind misfolded proteins. The sHSPs are one of the most up-regulated classes of heat-shock proteins following stress (12)and have critical roles in normal development in Drosophila (13). The sHSPs are categorized as having molecular weights between 12 kDa to 42 kDa and are comprised of a conserved α-crystallin domain of about 100 amino acid residues and a highly variable N-terminal domain (14,15). The sequence diversity of the N-terminal region has been proposed to be responsible for regulation of expression, species-speci c functions, cellular localisation or tissue speci c expression (11,16,17) and oligomerization and chaperoning capacity (18).
Numbers of sHSPs encoded in the genome vary between insects, with 16 sHSPs encoded in the genome in Bombyx mori (silkworm), 11 in Drosophila melanogaster, 10 in Apis mellifera (honeybee), and seven in Anopheles gambiae (Li et al. 2009). The most well studied sHSP are seven clustered at a 15 kb region at the 67B locus in D. melanogaster, and the sHSP lethal (2) essential for life (l(2)e ). These sHSPs all have similar sequences in their promoter regions, with multiple heat shock elements (HSE) and binding sites for Broad-Complex (BR-C), an ecdysone regulated transcription factor (19,20). This implies that the sHSP genes in D. melanogaster share both a common ancestral ortholog as well as similar functions and expression (21)(22)(23). Supporting this conjecture, the seven D. melanogaster sHSPs in the sHSP-C have coordinated expression in response to temperature shock, while having distinct expression patterns during normal development (24)(25)(26)(27).
sHSP clusters have been identi ed in other insects including six sHSP genes located on chromosome 27 and ve on chromosome 5 of the silkworm (11), eight genes on chromosome 2 of honeybee, seven genes on chromosome 8 of Tribolium castaneum, and six genes on chromosome 2 of A. gambiae (11). The close linkage of sHSPs throughout insects and evidence of coordinated expression of the sHSP cluster in insects (11,24) implies a functional role for the genomic clustering of insect sHSPs. However phylogenetic evidence, restricted to ve holometabolous insect species (~ 340 my diverged (28)), implied that these sHSP clusters were lineage speci c indicating that the sHSPs have undergone multiple independent expansion events in the evolution of insects (11). Here we analyse representative genomes from arthropod taxa for sHSP-C to determine its genomic structure, and assess whether the sHSP-C is a genomic regulatory domain that responds to environmental stimuli in the honeybee. By examining the genomes of non-insect arthropods we show that sHSP-Cs are ancestral components of pan-crustacean genomes but their evolution has been complex, with both evolutionary constraint of an ancestral cluster as well as multiple expansion events in different species.

Results
A sHSP-Cluster is a hallmark of arthropod genomes Previous analyses identi ed clustered sHSP genes in the holometabolous insects Apis mellifera, Tribolium casteneum, Bombyx mori, and Drosophila melanogaster (11). Here, we extended these analyses with the addition of representative species from the hemimetabolous insects Ladona fulva, Ephemera danica, Zootermopsis nevadensis, Pediculus humanus, and include non-insect arthropods Tetrachynus urticae [chelicerate]; Daphnia pulex [crustacea]; Strigamia maritima [myriapoda] and the Lophotrochozoan Biomphalaria glabrata [mollusca] as a non-arthropod outgroup in order to systematically examine the evolution of these genes and gene cluster. In our analyses all arthropods have multiple sHSP sequences.
In the majority of arthropods sHSP genes are found adjacent to each other in gene clusters on contigs or genome scaffolds ( Supplementary Fig. 1) these genes are herein named sHSP-C genes. In comparison sHSP were not found to be clustered in a molluscan genome. Thus a sHSP-C appears to be a feature of arthropod genomes (with the exception of the hemimetabolous insect Zootermopsis). l(2)e is an insect innovation BLAST identi ed two categories of non-linked sHSPs; those that were most similar to Drosophila l(2)e and non-linked sHSP most similar to the sHSP-C genes. We carried out phylogenetic analyses of the nonlinked sHSP genes most similar to Drosophila l(2)e and the genes of the sHSP-C from a single species from each order of the insects, and one species each from chelicerates, crustacean, myriapoda and mollusca ( Fig. 1). Due to the large number of non-linked sHSP that are most similar to the sHSP-C genes these were excluded from further analyses of the Arthropod sHSPs.
Phylogenetic analysis examining the relationships of these genes indicates the existence of a clade containing the orthologs of the D. melanogaster l(2)e gene from each insect species (Fig. 1, indicated by black dashed box). Examining the genomic organisation of gene from this clade in each species we nd that in the holometabolous and hemipteroid insects this gene is not clustered with other sHSP. However, the ortholog of l(2)e is clustered with the linked sHSP genes in the basally branching Palaeoptera Ladona and Ephemera ( Supplementary Fig. 1). In Ladona there are three copies of this gene clustered in two separate sHSP-C and in Ephemera one of the linked sHSP-C genes falls into the orthologous insect l(2)e phylogenetic group. We nd no evidence for orthologs of l(2)e in the genomes of non-insect arthropods thus the l(2)e gene appears to be an innovation of the insect lineage.
The insect sHSP-C shares a common ancestor with l(2)e Consistent with previous data (Li et al., 2009), the majority of the arthropod sHSP-C genes fall into orderspeci c phylogenetic clades (coloured boxes on Fig. 1) implying that these sHSP-C genes may be the result of group-speci c expansion events from an ancestral gene or genes. There are, however, two exceptions. One Tribolium sHSP-C gene (dark blue square) and one Tetrachynus non-linked sHSP paralog (green square) form a clade with the sHSP-C genes from Drosophila (light blue box). Finally with the exception of the Bombyx sHSP-C genes (which form a sister clade to all other holometabolous and hemipteran insect sHSPs), all of the insect sHSP-C genes share a common branch-point with the homologous non-linked l(2)e genes (dashed box Fig. 1), which implies that the insect sHSP-C shared a common ancestor with l(2)e .

sHSP-C copy number is conserved in holometabolous insects
The phylogenetic evidence shows that there is signi cant ux in the numbers of sHSP-C genes in the insect genomes. Copy number appears to be retained in the holometabolous insects (6-7 sHSP-C genes) whereas in the hemimetabolous insects it is unclear whether the Ladona expansion is a lineage speci c event, or if the sHSP-C has been reduced (or lost in the case of Zootermopsis) in hemipteran genomes. The apparent clade-speci c expansions of the sHSP-C genes and the complexity of the phylogeny make it di cult to identify the ancestral sHSP-C gene(s) and thus trace the evolution of the sHSP-C in the insect lineage. Consistent with previous analyses carried out by Li et al. the relationships seen in the phylogram are best explained by independent expansion events of the sHSP-C genes in each of the insect orders.
However the species investigated in these initial analyses here and by Li et al. diverged 300 my ago thus the phylogenetic distance between species may not provide su cient resolution to understand the true relationships of the sHSP-C between species. To combat this issue we investigated the genome organization of the sHSP-C and carried out phylogenetic analyses of the sHSP-C genes in the more closely related species in the hemipteran (hemimetabolous sister group to the holometabola) and hymenopteran (most basally branching holometabolous insects) to deduce the origins of the insect sHSP-C.
All Hemipteran insects have a reduced or absent sHSP-C To assess the evolution of the sHSP-C genes in the hemipteroid lineage we identi ed sHSP-C, l(2)e and sHSP paralogs in hemiptera ( Supplementary Fig. 2) and constructed a phylogram of these plus the honeybee (basally branching holometabolous insect), Odonata and Ephemera (representative nonhemipteran hemimetabolous insects) sHSP-C and l(2)e genes (Fig. 2). With the exception of Diaphorina citri, the hemipteroid sHSP-C genes and hemipteroid non-linked l(2)e genes separate into two distinct phylogenetic clades (Fig. 2, dashed box and yellow box) consistent with our previous arthropod phylogeny (Fig. 1). The hemipteroid sHSP-C genes and non-linked sHSP paralogs share a common branch point with the Apis mellifera sHSP-C genes and two of the Ephemera sHSP-C genes. Thus the hemiptera sHSP-C is most closely related to the genes of the sHSP-C of Apis mellifera (most basally branching holometabolous insect) and Ephemera (hemimetabolous insect). In the absence of additional genomes from species that are evolutionary intermediates between holometabola and hemiptera we cannot determine if the hemiptera sHSP-C has been reduced or if sHSP-C gene expansion has occurred in the holometabolous insects. Our conclusion from the phylogeny is that the sHSP-C genes from Ladona, excluding those that cluster with l(2)e , appear to be a consequence of lineage speci c expansion from non-linked l(2)e .
In hemiptera we see evidence of species-speci c duplications of the sHSP-C genes. Within the phylogenetic group of the hemipteroid sHSP-C and non-linked sHSP paralogs each hemipteroid sHSP-C gene is most closely related to a sHSP-C gene from the same species-speci c cluster (in bold on Fig. 2). In contrast, the non-linked sHSP paralogs are, in some cases, more closely related to the non-linked sHSP paralogs from closely related species e.g. the non-linked sHSP from Halyomorpha and Oncopeltus (marked by # on Fig. 2), a closely related Pentatomomorpha species (82 mya diverged (30)).
We found that the Diaphorina sHSP cluster, consisting of two genes ( Supplementary Fig. 2), instead aligns with the alpha crystallin genes (excluded from the phylogeny). Additionally the cluster of three sHSP genes from Diaphorina group together in a separate clade adjacent to all other insect sHSPs. These data indicate that the clustered genes in Diaphorina are instead more similar to the conserved alpha-crystallin genes. The absence of a true sHSP-C in Diaphorina is consistent with the absence of a sHSP-C in the other true bug species A. pisum.
Birth and death evolution of the Hymenoptera sHSP-C Clustered sHSP genes, non-linked sHSP and l(2)e were identi ed in hymenopteran species ( Supplementary Fig. 3). Comparable to the arthropod phylogeny ( Fig. 1) the homologous l(2)e forms a phylogenetic group with all Hymenoptera and other l(2)e genes in other insects (Fig. 3). The genomic organization of hymenopteran sHSPs indicates that sHSP-C is more stable in the hymenopteran lineage than in hemiptera. This is supported by the conservation of orientation of transcription and retention of several genes in the sHSP-C ( Supplementary Fig. 3). To further support this conclusion order speci c analyses resolves the phylogenetic signal for the hymenopteran sHSP-C genes. Each gene of the Hymenopteran sHSP-C forms a phylogenetic group with representative genes from the other Hymenopteran species, indicating that the expansion of sHSP genes giving rise to the sHSP-C in Hymenoptera are order, rather than species-speci c (Fig. 3). The genes of the Hymenoptera sHSP-C likely originate from two ancestral sHSP genes which have duplicated to give rise to a core hymenopteran sHSP-C (dotted box on Supplementary Fig. 3). This core was present deep in hymenopteran evolution as evidenced by its presence in Cephus, Orussus and the Apidae clades which last shared a common ancestor ~ 190 million years ago (28).
In Hymenoptera we nd evidence to support birth and death evolution whereby new genes arise from duplication and are either maintained in the genome, deleted or become nonfunctional via deleterious mutations (31). We see evidence of gene loss in the basal branching saw ies and the ants (Supplementary Fig. 3). The absence of some of the sHSP-C genes in Athalia appears to be due to loss events because we see the full complement of ~ 7 hymenopteran sHSP-C genes in another saw y Cephus cinctus. We also identify remnants of sHSP ORFs in the ant Atta cephalotes sHSP cluster (whether these are expressed or functional remains to be determined) whereas the ant Ooceraea biroi has retained four of the sHSP-C genes and has another sHSP-C of two genes at another location in the genome. The phylogeny indicates that the genes of the second Ooceraea sHSP-C are most similar to that of the rst Ooceraea sHSP-C. Other examples where genes of the same species sHSP-C are more similar to one another than other Hymenoptera sHSP-C genes are observed twice in Orussus, and once in Athalia and Microplitus, suggestive of recent species-speci c duplications.
The cluster observed in Cephus cinctus is conserved through to the Apoidea (~ 190 million years diverged) which indicates that the ~ 7 sHSP-C in the extant species of hymenoptera is likely very similar to the ancestral hymenoptera sHSP-C. This analysis indicates that the Hymenopteran sHSP cluster was formed early in hymenopteran evolution but has been remodelled, through the birth and death of sHSP-C genes into the clusters seen in different species (Supplementary Fig. 3).
The sHSP-C is a coordinately regulated gene cluster in A. mellifera That the sHSP genes have remained clustered in the same genomic location in Hymenoptera over 190 million years of evolution, implies that this genomic organisation is constrained and functionally important. In Drosophila, the sHSP genes are regulated co-ordinately in response to heat-shock and stress (32)(33)(34). We hypothesise that coordinated regulation may explain the maintenance of sHSP clusters in hymenoptera similar to what is seen for the insect E(spl)-C (7). To test the possibility that coordinate regulation of the sHSP-C genes occurs in hymenoptera, we examined the expression of these genes in two published RNA-seq datasets (29,35). Both of these datasets examine the differences in gene expression in response to an environmental stimulus; the differential feeding of honeybee larvae that results in queen development (Cameron et al., 2013) and the effect of queen mandibular pheromone (QMP) on honeybee ovary activity (Duncan et al., 2020).
In honeybees, the sHSP-C genes are co-ordinately expressed in a data set of gene expression during activation of the honeybee worker ovary (Supplementary Fig. 4). This coordinate regulation is not seen in datasets of honeybee female larval development, indicating that, as in Drosophila and Bombyx, these clusters may be regulated in a coordinate way in some situations, and separately in others. This is consistent with the ndings of Duncan et al., (2020) where they identi ed a signi cant number of coregulated gene clusters associated with the response of adult honeybees to QMP. However, a systematic comparison with four datasets revealed only a very small number of these clusters were co-ordinately regulated in response to different environmental stimuli.
To gain more resolution of the co-ordinate regulation of this sHSP-C gene cluster in honeybees we performed RT-qPCR and con rmed that the genes of the sHSP-C are differentially regulated in the honeybee ovary in response to QMP (with the exception of LOC724274). However, the expression of the adjacent genes that are not related at the sequence-level to sHSP genes; Gmap and LOC100576174 are not signi cantly different between queen-right ovary and queen-less ovary de ning the borders of the coregulated gene cluster (Fig. 4A). We then assessed how this co-regulation might be occurring by determining if there were binding motifs for Br-C and heat shock elements that are known to regulate the expression of these genes in Drosophila (19). We found that the sHSP-C genes had predicted binding motifs for Br-C and HSE in their promoters, consistent with Drosophila (23,(36)(37)(38). To assess epigenetic regulation we examined the sequence for CTCF binding sites and assessed H3K27me3 enrichment (hallmarks of topologically associated domains) at the cluster in queen-less worker ovary, queen-right worker ovary and queen ovary. We found that although the overall levels of H3K27me3 at the sHSP-C are not different between queen-less worker ovary, queen-right worker ovary and queen ovary ( Fig. 4B and C), the boundaries of the sHSP-C including two adjacent non-sHSP genes (Gmap and LOC100576174) are demarcated by higher levels of H3K27me3 at the 5' and 3' anks of the cluster which coincides with the predicted CTCF sites. This implies that the sHSP-C cluster in honeybees is co-ordinately regulated by transcription factor binding to promoter sites but also permissive chromatin structure bounded by the CTCF insulator sites, consistent with the sHSP-C acting as a topologically associated domain in the honeybee ovary.

Discussion
Tracking the evolution of the sHSP cluster in insect genomes is challenging due to the complex phylogenetic relationships of the many sHSP genes in insects. While the presence of a cluster of sHSP genes is widely conserved across insect genomes, phylogenetic evidence indicates that the genes that make up those clusters have arisen in a species independent way, as they are not phylogenetically related across insect orders. In our analyses we only nd support for ancient stable clusters of sHSP genes composed of genes with clear orthologous relationships in Hymenoptera.
We nd that the majority of genes of the hymenoptera sHSP-C are order-speci c and orthologs are not detected outside of the Hymenoptera. Order-speci c sHSP-C genes have also recently been identi ed in a study of eight species of Lepidoptera (39). Consistent with previous analyses (11) gene conversion cannot explain this phylogenetic signal that we observe in Hymenoptera. Instead our analyses support birth-and-death evolution of this multigene family. We show evidence for sHSP gene loss from the cluster in saw ies, wasps, ants and bees, and we see evidence for duplication in the Ooceraea and Microplitus. We propose that multiple duplication and loss events during the evolution of the Hymenopteran? sHSP-C have led to the complex phylogenetic signal, which depicts multiple lineage speci c sHSP-C. Without the ancestral species and/or more closely related non-hymenopteran insect species intermediary to the more basal branching hemimetabolous insects it is not possible to identify the genes that have been lost or duplicated in the insect lineage. It is possible that the evolutionary mechanisms driving the sHSP-C evolution in hemiptera are very different to that of the holometabolous insects, and we cannot rule out that either gene conversion or recent species-speci c duplications of the genes in a heavily reduced sHSP-C could be responsible for the phylogenetic signal we see in the hemiptera. However, with the addition of more hymenopteran species, we are able to de ne the ancestral hymenopteran sHSP cluster and we are able to identify the birth and death events that have occurred in this lineage.
Based on these analyses we propose a model of the ancestral insect sHSP-C which likely contained two divergent sHSP genes (Fig. 5). In the basal branching insect species Ladona and Ephemera this cluster is retained. In Ephemera a sHSP paralog is duplicated to give rise to a sHSP-C containing three genes. In contrast, the Ladona genome appears to have undergone multiple duplications of the ancestral insect cluster producing three separate sHSP-C. Each of these clusters contains at least one gene copy of each of the ancestral sHSP genes (l(2)e and sHSP paralogs). The existence of the two divergent copies of sHSP in Ladona and Ephemera is different to that of the rest of the insects' sHSP-C. It becomes evident in the higher branching hemiptera that these two ancestral sHSPs are separated and remain so in the holometabolous insects. This separation gives rise to the homologous non-linked insect sHSP l(2)e , which is shared with all insects. Evidence of l(2)e sHSP and sHSP-C separation might be seen in the hymenoptera where in some hymenopteran species (Apis and Atta) the homologous l(2)e gene is retained on the same chromosome as the sHSP-C genes. The genes of sHSP-C present in the hemipteroid insects and holometabolous are paralogous in nature and, after the split of the holometabola from the hemimetabola, it is these sHSP paralogs which undergo extensive duplication and loss events to give rise to the lineage speci c sHSP-C genes via birth-and-death evolution.
There is evidence for other gene clusters in insects undergoing birth-and-death evolution. The odorant binding protein family is a multigene family contained within clusters in the genome. In Drosophila species these clusters contain a large number of genes ranging between 40 and 61 genes. Phylogenetic analysis and assessment of genome organization in 12 Drosophila species showed that this cluster in Drosophila is prone to duplication, loss and pseudogenisation, and these speci c events could be tracked throughout the 12 species (40).
Our phylogenetic evidence implies that the sHSPs also have a tendency to proliferate, become pseudogenes and vanish. This is supported by the large expansion and multiple sHSP-C that are seen in Ladona and the contrasting absence of sHSP-C genes in some hemipteroid genomes. In addition, the existence of the numerous species-speci c individual sHSPs that are most likely species-speci c duplication events, and the evidence of loss and pseudogenisation in the hymenoptera (Athalia, Atta and Bombus), further supports the theory of birth and death evolution as the main mechanism behind the insect sHSP-C evolution. Finally, we do see evidence of recombination of non-sHSP members of the HSP family into the sHSP-C in insects. In Bombyx HspB1 is an intervening gene in the sHSP-C and in Frankliniella Hsp68 is found immediately adjacent to the genes of the sHSP-C. These data suggest that during the evolution of insects sHSP-C has recombined with other similar regions of the genome and therefore we can not de nitively exclude gene conversion.
Interestingly, our phylogenetic evidence also suggests that genes of the sHSP-C remain linked during the evolution of arthropods and insects. In general, arthropod genomes are subject to rapid changes in genome structure in comparison to vertebrates (41,42) and therefore gene clusters in the arthropods are less likely to remain linked over large amounts of evolutionary time unless there is a selective advantage to do so. For example the E(spl)-C (7,8) evolved through the co-option of genes to genome location that are co-ordinately regulated and remain as a result of functional constraint. However, the sHSP-C genes differ dramatically from E(spl)-C, in that the sHSP-C consists of genes that are paralogous in nature and therefore the evolution and functionality of the sHSP-C is starkly different. The retention of multiple sHSP genes in the genomes of arthropods implies that copy number of the sHSPs is functionally important. Functionally sHsps assemble into large oligomers of 12-32 subunits (43). Thus copy number in the genome might be functionally constrained to achieve e cient production of these large oligomers. Additionally the genes of the sHSP-C do share regulatory sequences (11) and evidence from expression analyses in Drosophila (24), Bombyx and in this study in the honeybee support the fact that under certain environmental circumstances the sHSP-C genes are co-ordinately expressed. Thus it is possible that the sHSP-C genes are maintained due to functional constraint. Indeed, Duncan et al. 2020 show that gene clusters that are differentially expressed in honeybee ovaries in response to the loss of QMP are marked by H3K27me3 and that H3K27me3 pre gures the gene expression changes that occur as the bee responds to this environmental stimulus. Although H3K27me3 enrichment at the sHSP-C does not change between honeybee ovary types, we do see demarcation of the sHSP-C by H3K27me3 at the 5' and 3' ank of the sHSP-C which coincides with predicted CTCF sites. Previous studies have shown that H3K27me3 enrichment is found in broad genomic bands marking regions of silent genes and that expressed genes are found preferentially immediately adjacent to the anks of H3K27me3 enrichment.
Although diminished H3K27me3 (a hallmark of open-chromatin) at the sHSP-C alone is not enough to explain the coordinated regulation of the sHSP-C genes during ovary activation, it does suggest that genomic features consistent with coordinate regulation exist around the cluster. Taken together, the results of this study suggest that the sHSP-C genes are maintained in clusters and undergo birth and death evolution, which is driven perhaps by species or order speci c selection pressures.
What is clear from this study is that the sHSP cluster is uid and less conserved in structure that other insect gene clusters. However, its presence in so many insect genomes, and its conserved coordinate expression and possibly regulation, implies that it is an integral genomic component of environmental response in arthropods.

Phylogenetic analysis
The amino acid sequences of sHSP genes of arthropods were aligned using clustal X v. 1.8 (ref). Using the entire aligned amino acid sequences we reconstructed the phylogenetic relationship of the arthropod sHSP genes using Bayesian methods implemented in MrBayes v.3.1.2. The WAG model (45) of amino acid replacement was used exclusively after testing with mixed models. MCMC chains were run for at least 3000000 generations until average standard deviation of split frequencies became stable below 0.05. The rst 25% of resulting trees discarded as burnin. The resulting phylogenetic trees were visualised using Figtree (46).

Motif analyses
CTCF binding sites were located using the Gene Palette program. The sHSP-C and surrounding genes were loaded into the library as sequence, and the consensus sequence for CTCF (CNNNAGNNGGCGC) (Van Bortle et al., 2012) from Drosophila was added as a feature, not allowing for mismatches. Potential binding sites for Br-C and HSE were located using gene sequences from NCBI Gene (http://www.ncbi.nlm.nih.gov/gene) put into Match-1.0 Public (2005), which uses positional weight matrices from TRANSFAC Public 6.0. Settings were altered to minimise false positives, and to use invertebrate sequences. Potential transcription factor binding sites were located in the 500 bp upstream of the transcriptional start site.
RNA-seq and ChIP-seq analyses RNA-seq and ChIP-seq data was accessed from Gene Expression Omnibus (GEO), reference numbers GSE120563 and GSE52291. Fold change for the honeybee sHSP-C genes, homologous l(2)e and nonlinked sHSP paralog were extracted from the ovary (GSE120563) and larval (GSE52291) RNA-seq datasets. The heatmap was generated in R studio and was used to visualise the expression values and hierarchical clustering of the honeybee sHSP genes. We used bdgcmp within MACS2 (v2.1.1.20160309) (47) to calculate fold enrichment of ChIP samples relative to input (background) at the sHSP-C in queenright, queen-less, and queen ovary samples. The grep command was used within the Linux environment to extract the fold enrichment for H3K27me3 from the sHSP-C and the anking regions. Flanking regions were de ned as ½ the length of the cluster as de ned by the predicted CTCF binding sites (50 kb to the 5' and 3' of the sHSP cluster). To test whether the levels of H3K27me3 enrichment vary across the sHSP-C, we calculated mean fold enrichment across the 5′ anking regions, sHSP cluster, and 3′ anking regions.

RNA extraction and RT-qPCR
Ovary tissue was removed from queen-right worker (n = 20), queen-less worker (n = 10) and queen abdomens (n = 3) on ice in PBS before snap-freezing on dry ice and storage at -80 °C. RNA extraction was carried out on biological replicates (n ~ 5) using Trizol reagent (Invitrogen) and puri ed using RNeasy columns (Qiagen). RNA concentration was determined using the NanoDrop ND1000 spectrometer (NanoDrop). RNA was converted to cDNA using a SuperScript VILO cDNA Synthesis Kit (Invitrogen) as per the manufacturers instructions. For primer design with Primer3 (available online http://frodo.wi.mit.edu (Rozen and Skaletsky 1999) gene sequences for the sHSP-C genes, Gmap and LOC100576174 were obtained from NCBI. Primers were checked for speci city using the Primer BLAST program available at http://blast.ncbi.nlm.nih.gov/Blast.cgi. RT-qPCRs were carried out on a BioRad CFX Real-Time PCR detection system with SsoFast EvaGreen PCR mastermix, 5 ng of cDNA, and 300 nM of each primer. The RT-qPCRs were carried out on three biological replicates with each measurement made in duplicate and analysed using the BioRad CFX (CFX Manager software version 3.1). Gene expression was normalized to the geometric mean of two reference genes Rpn2 and mRPL44. Differences in target gene expression were determined by ANOVA (Analysis of variance) with a Tukey's post hoc test after con rming that the data t a normal distribution using a Shaprio-Wilk test.