Haemolymph profiling reveals altered diapause phenotype.
In total, 129 unique multiple-peptide supported haemolymph proteins were identified from the premated and mated queens collected across the six time-points examined (Supporting information Table S1), of which 79 were identified as statistically significant differentially abundant (SSDA) following multivariate analysis (ANOVA, FDR < 0.05) (Supporting information Table S2a).
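The FDR threshold applied to the ANOVA results can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly, and the p-values below are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-protein ANOVA p-values (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(example_p)
# A protein counts as SSDA here if its adjusted p-value is below 0.05.
significant = [q < 0.05 for q in adjusted]
print(list(zip(example_p, adjusted, significant)))
```

In this toy input, the two proteins with raw p-values close to 0.04 no longer pass the threshold after adjustment, which is the behaviour an FDR filter is meant to provide.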
To determine whether samples clustered by life-cycle stage, we performed a principal component analysis (PCA) on Z-score normalised label-free quantification (LFQ) intensities for the 79 SSDA proteins. The first two principal components explained 29.6% and 18.19% of the variance within the dataset, respectively. The first principal component clearly separated early diapause, late diapause and 6 hours post-diapause individuals from the three other time-points (Figure 1), highlighting distinct differences in the haemolymph proteome profiles associated with diapause. As a complementary approach, we performed hierarchical clustering of Z-score normalised LFQ intensities for the 79 SSDA proteins and identified ten distinct clusters of proteins with similar expression profiles (Figure 2; Supporting information Table S3). Two clusters (Clusters B and C) comprised proteins that represent a clear diapause phenotype, with high or highest abundance in the haemolymph of early diapause, late diapause or 6 hours post-diapause queens. An additional cluster (Cluster A; n = 6) consisted of proteins that had the highest abundance during late diapause. There were also two clusters (Clusters G and H) of proteins with reduced abundances during diapause, with the largest identified cluster (Cluster G; 27/79 proteins) consisting of proteins with low abundances during late diapause and 6 hours post-diapause. By 48 hours post-diapause, the majority of these proteins had increased at least two-fold in abundance, with levels comparable to pre-diapause haemolymph. Five proteins were identified with elevated abundances post-mating (Cluster D). Cluster E (n = 6 proteins) and Cluster F (n = 4 proteins) comprised proteins with the highest abundances in early diapausing queens and 48 hours post-diapause, respectively.
An additional cluster (Cluster I; n = 4 proteins) consisted of proteins that had relatively high abundances in gynes, pre- and early-diapausing queens but decreased abundance in the haemolymph of late diapausing queens and at 48 hours post-diapause. The last cluster (Cluster J; n = 3 proteins) consisted of proteins elevated post-mating and maintained at high levels during diapause before a reduction post-diapause.
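Both the PCA and the hierarchical clustering operate on Z-score normalised intensities. The sketch below shows what that normalisation does to a protein's profile and why proteins with the same shape of profile end up in the same cluster. It assumes row-wise (per-protein) scaling of log2 LFQ values with the sample standard deviation and a correlation-based distance; these choices and the example values are assumptions, not details stated in the text.

```python
from statistics import mean, stdev


def z_score_rows(matrix):
    """Scale each protein (row) to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        if sd == 0:
            # A flat profile carries no shape information.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mu) / sd for x in row])
    return scaled


def correlation_distance(z_a, z_b):
    """1 - Pearson correlation, computed from two Z-scored profiles."""
    n = len(z_a)
    return 1.0 - sum(x * y for x, y in zip(z_a, z_b)) / (n - 1)


# Hypothetical log2 LFQ intensities: rows are proteins, columns are time-points.
log2_lfq = [
    [20.0, 22.0, 24.0, 26.0],  # rising profile, high abundance
    [10.0, 11.0, 12.0, 13.0],  # same shape, lower abundance
    [26.0, 24.0, 22.0, 20.0],  # opposite (falling) profile
]
z = z_score_rows(log2_lfq)
same_shape = correlation_distance(z[0], z[1])
opposite_shape = correlation_distance(z[0], z[2])
print(same_shape, opposite_shape)
```

After scaling, the first two proteins are indistinguishable despite their different absolute abundances, so a clustering step would group them by expression pattern rather than by intensity.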
Comparative analyses of consecutive stages of the queen life cycle.
Pairwise t-tests (p < 0.05) were performed to identify differences in mean LFQ intensities between consecutive time-points which enabled the characterisation of stage-specific changes in the queen life-cycle, including changes in response to mating, diapause entry, as well as diapause termination (Supporting information Table S2b).
a) Comparison of virgin and mated pre-diapause queens: Increased abundance of AMPs in mated queen haemolymph.
We identified ten proteins with significant (two-sample t-test; p < 0.05) abundance changes between the haemolymph proteome of virgin and mated queens (Supporting information Table S2b). Three of these proteins were annotated as putative antimicrobial peptides (AMPs), with the greatest increase evident for hymenoptaecin (ADB29130.1), which was 183-fold more abundant (p < 10⁻⁴; Figure 3) in the haemolymph of mated queens. Increased abundances were also observed for defensin (17.9-fold increase; p < 10⁻⁴) and abaecin (9.4-fold increase; p < 10⁻³; Figure 3). Post-mated queens also had increased abundance of a putative esterase FE4 (XP_003397300.2; 7.24-fold increase; p < 10⁻²) and a putative kappa-theraphotoxin-like protein (XP_020718646.1; 3.48-fold increase; p < 0.05). Five proteins had reduced abundances within the mated queen haemolymph, including a BMP endothelial regulator protein (XP_003393207.1), an ecdysteroid-regulated 16 kDa protein (XP_003395667.1), a ferritin subunit (XP_012167824.1), a putative muscle protein (XP_020722973.1) and an odorant binding protein (XP_003397877.1), demonstrating that mating can affect the expression of proteins involved in diverse biological processes.
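To make the relationship between group means, fold change and the two-sample t-test concrete, a minimal sketch follows. It assumes LFQ intensities are compared on a log2 scale and uses a pooled-variance (Student) t statistic; both are assumptions, since the text only specifies a two-sample t-test. The replicate values are hypothetical, and converting the t statistic to a p-value is left to a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def compare_groups(log2_a, log2_b):
    """Compare group b against group a using log2 intensities."""
    n_a, n_b = len(log2_a), len(log2_b)
    log2_fc = mean(log2_b) - mean(log2_a)
    # Pooled variance across both groups (Student's two-sample t-test).
    pooled = ((n_a - 1) * variance(log2_a) + (n_b - 1) * variance(log2_b)) / (
        n_a + n_b - 2
    )
    t_stat = log2_fc / sqrt(pooled * (1 / n_a + 1 / n_b))
    return {
        "log2_fc": log2_fc,
        "fold_change": 2 ** log2_fc,  # back-transform to a linear fold change
        "t": t_stat,
        "df": n_a + n_b - 2,
    }


# Hypothetical log2 LFQ intensities for one protein (four replicates each).
virgin = [20.0, 20.5, 19.5, 20.0]
mated = [27.5, 27.0, 28.0, 27.5]
result = compare_groups(virgin, mated)
print(result)
```

In this toy example the groups differ by 7.5 log2 units, which back-transforms to a fold change of roughly 181, the same order of magnitude as the largest increase reported above.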
b) Comparison of mated pre-diapause and early diapause queens: Early diapause phenotype characterised by an increase in cuticular and ejaculatory bulb-specific proteins.
Sixteen proteins changed significantly in abundance between the haemolymph of pre-diapause mated and early diapause queens. There was a significant trend of elevated expression of proteins (n = 13) in the haemolymph of early diapause queens (binomial test, p = 0.02). Of the 13 proteins with elevated abundance, three were annotated as ejaculatory bulb-specific proteins (each had at least a 2-fold increase; p < 0.05; Figure 4a), while four were annotated as cuticular proteins, including one protein which was increased 478-fold within the haemolymph of early diapause queens in comparison to pre-diapause queens (Figure 4b). Additional proteins with significantly increased abundances included a serine protease inhibitor (XP_003401209.1), a neurofilament heavy polypeptide (XP_003394564.1), a heat shock protein (XP_003402976.1), a gamma interferon-inducible lysosomal thiol reductase (XP_003397075.2), a putative scavenger receptor protein (XP_012163150.1) and a BMP-binding endothelial regulator (XP_003393256.1). We identified three proteins with significant reductions in the early diapause queen haemolymph, including a putative salivary protein (XP_012167732.1), a prion-like protein (XP_012175279.1) and a hexamerin (XP_003401781.1), which was reduced 23.92-fold.
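The binomial test used here asks whether the split between increased and decreased proteins departs from the 50:50 split expected by chance. A minimal sketch, assuming a two-sided exact test with a null probability of 0.5 (the text does not state sidedness), is:

```python
from math import comb


def two_sided_binomial_p(successes, n):
    """Exact two-sided binomial p-value under a null probability of 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the tail at least as extreme as the observed count.
    """
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# 13 of the 16 changed proteins increased in early diapause.
p_early = two_sided_binomial_p(13, 16)
print(round(p_early, 3))  # about 0.021
```

Under these assumptions, 13 increases out of 16 changes gives p of about 0.021, in line with the reported p = 0.02.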
c) Comparison of early and late diapause queens: Selective reduction in immune expression in late diapause queen haemolymph.
We identified 29 SSDA proteins between the haemolymph of early and late diapause queens (Supporting information Table S2b). The majority of these proteins (n = 21) had lower abundance within late diapause queens, which was a significant trend (binomial test, p < 0.03). Of the 21 proteins whose abundance was reduced at least 1.5 fold in the haemolymph of late diapause queens, 12 had putative roles in immunity, including two pathogen recognition proteins, two immune signalling proteins, five effector proteins, two regulatory proteins and one putative detoxification enzyme.
d) Comparison of late diapause and 6 hours post-diapause queens: Reduced abundance of detoxification and structural proteins in queen haemolymph post-diapause.
We identified 14 SSDA proteins within the queen haemolymph between six hours post-diapause and late diapause (Supporting information Table S2b). Of the nine proteins significantly reduced at least 2-fold in the haemolymph of queens six hours post-diapause, three proteins were annotated with putative roles in detoxification (XP_003397315.1: superoxide dismutase; XP_020719829.1: glutathione peroxidase; XP_003394648.1: cytochrome c). Four of the reduced proteins had putative structural roles, with two annotated as muscle-associated proteins (XP_012172948.1; XP_020722973.1), one annotated as an actin-depolymerising protein (XP_012174797.1), and one protein annotated as a cuticular protein (XP_003394953.1). The final protein with significantly reduced expression was annotated as an odorant binding protein (XP_003397877.1). Five proteins had at least a 1.3-fold increase in abundance, including an endoribonuclease (XP_012167789.1), a glyoxylate reductase/hydroxypyruvate reductase (XP_012170485.1), an iron-binding ferritin (XP_012167818.1), a putative clotting factor (XP_012175299.1), as well as a hypothetical protein of unknown function (XP_003395010.1).
e) Comparison of 6 and 48 hours post-diapause queens: Recovery in queen reproductive and immune protein expression at 48 hours post-diapause.
The greatest number of SSDA proteins (n = 52) was identified between the haemolymph proteomes of queens at 6 hours and 48 hours post-diapause, with the majority (n = 32) increasing in abundance at 48 hours post-diapause, although this pattern was not significant (binomial test, p = 0.1263). The greatest increase in abundance was seen in a storage protein (hexamerin; XP_003401781.1), which had a 49.8-fold increase. Two other putative reproductive proteins (XP_020718283.1: membrane metalloendopeptidase-like 1; XP_012163499.1: vitellogenin) increased in abundance at least 5-fold in the haemolymph at 48 hours post-diapause. The haemolymph proteome also had increased abundances of proteins with putative roles in immunity (n = 12), muscle-associated proteins (n = 2), olfaction (n = 2), venom (n = 3), as well as proteins of unknown function (n = 5). The remaining proteins were annotated with putative roles in cholesterol metabolism (XP_003395667.1: ecdysteroid-regulated 16 kDa), chitin binding (XP_003393340.1: peritrophin-1), neurotransmitter synthesis (XP_003401022.1: glutamate decarboxylase), as well as gene expression regulation (XP_020722259.1: trithorax).
Twenty proteins had significantly reduced expression in the queen haemolymph at 48 hours post-diapause. The protein with the greatest reduction was a single-domain von Willebrand factor type C domain-containing protein (XP_003399812.1), which had a 22-fold reduction. Certain immune-associated proteins, including two AMPs (ADB29130.1: hymenoptaecin; ADB29128.1: abaecin), two putative serine protease inhibitors (XP_003398424.1: serpin-B3; XP_012169463.2: alaserpin), and a putative anti-viral protein (XP_003399869.1: protein son of sevenless) were reduced in comparison to the haemolymph of queens at six hours post-diapause. A sixth putative immune protein, a leucine-rich repeat domain-containing protein (XP_003394691.1), also had reduced abundance. Five proteins with putative structural roles were reduced, including three cuticular proteins, a neurofilament heavy polypeptide (XP_003394564.1) and a chitinase (XP_003394129.1). Similar to other time-points, putative olfaction-related proteins were also affected, with two proteins (XP_003397885.1: OBP3; XP_012173545.1: PEBIII) reduced within the haemolymph at 48 hours post-diapause. Other proteins reduced at least 2.6-fold included a glyoxylate reductase/hydroxypyruvate reductase (XP_012170485.1), a GILT protein (XP_003397075.2), an esterase (XP_003397300.2), a glutathione peroxidase (XP_003395541.1), a protein of unknown function (XP_003395010.1) and a putative BMP-binding endothelial regulator (XP_003393256.1). Interestingly, this last protein is coded for by a gene that forms part of a novel gene family consisting of five genes coding for proteins expressed at high abundance in the queen haemolymph (Supporting information File S1; Supporting information Table S2b).
Proteogenomic analysis of a novel haemolymph-associated protein family.
It is estimated that more than 40% of genes in sequenced eukaryotic genomes do not have assigned functions [64]. In our analysis, we provide evidence for the expression of 33 hypothetical or uncharacterised proteins in the queen haemolymph, of which 16 changed in abundance over the life stages of the queen. Of most interest, we identified five novel proteins which shared high sequence similarity (minimum sequence similarity of 45.75%; E-value < 1e-58) that were all expressed in the haemolymph proteome. These five proteins were coded for by four individual protein-coding genes suggestive of a previously undocumented gene family (Supporting information Table S4).
These four genes, hereafter known as the highly abundant haemolymph-associated protein (HAHP) family, are located within a genomic region spanning approximately 100 kb on chromosome one (BG_1; NC_015762.1) in the B. terrestris genome (Supporting information Fig. S1). There is evidence of a fifth protein-coding gene (LOC110119444), but this predicted protein was not identified in the queen haemolymph in our analysis. Although one of the proteins was annotated as a BMP-binding endothelial regulator protein (XP_003393256.1), additional functional domain analysis did not identify any conserved domains in any of the HAHP family members. At the nucleotide and protein level, we identified high sequence similarity with five putative orthologues in the genome of the Eastern bumblebee, B. impatiens. We performed homology searches against other insect predicted proteomes, identifying single protein matches in each of the four sequenced honeybee (Apis) genomes. Similarly, functional domain analysis of the honeybee homologues only identified the presence of a predicted signal peptide domain. No homologous sequence was identified for the fruit fly Drosophila melanogaster.
Interestingly, one member of the HAHP family (XP_003393256.1; LOC100652150: HAHP1) displayed an expression profile that suggests a potential role in diapause (Supporting information Fig. S2). This protein has relatively high abundance levels in gynes and pre-diapausing queens and significantly increases in early-mid diapause and 6 hours post-diapause. Protein abundance returns to pre-diapause levels 48 hours post-diapause.
Weak conservation of protein and transcript expression profiles during diapause.
To explore the conservation of molecular mechanisms underlying diapause, we compared our proteomic dataset with a previously published transcriptomic dataset of bumblebee queens collected before, during and after diapause [46]. We identified 114 genes (88.4% of genes coding for haemolymph-associated proteins) expressed in the queen fat body at three time-points [46] (Supporting information Table S5). Of this number, only three genes (LOC100643414: protein spaetzle; LOC100648549: cytochrome c; LOC100651094: glycine-rich cell wall structural protein) had conserved directional changes in transcript and protein abundance during diapause (Supporting information File S1). The majority differed in the direction of their expression profiles at the transcript and protein level, which was a significant trend (binomial test, p < 0.005) and highlights a weak association between transcript and protein expression. We identified two Gene Ontology terms (GO:0032504, ‘multicellular organism reproduction’; GO:0005615, ‘extracellular space’) as significantly enriched in both the transcriptome and haemolymph proteome (Supporting information Table S6).
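The comparison of transcript and protein profiles reduces to asking, gene by gene, whether both measurements move in the same direction. The following minimal sketch illustrates that bookkeeping. The gene labels and log2 changes are hypothetical placeholders, and treating a zero change as non-concordant is an assumption made here for illustration.

```python
def direction(change):
    """Return +1 for an increase, -1 for a decrease and 0 for no change."""
    return (change > 0) - (change < 0)


def split_by_concordance(changes):
    """changes maps gene -> (transcript log2 change, protein log2 change)."""
    concordant, discordant = [], []
    for gene, (transcript, protein) in changes.items():
        same = direction(transcript) == direction(protein)
        if same and direction(transcript) != 0:
            concordant.append(gene)
        else:
            discordant.append(gene)
    return concordant, discordant


# Hypothetical changes during diapause (illustrative values only).
changes = {
    "gene_A": (1.2, 0.8),    # up in both datasets
    "gene_B": (-0.5, 1.1),   # down as transcript, up as protein
    "gene_C": (0.9, -2.0),   # up as transcript, down as protein
    "gene_D": (-1.4, -0.3),  # down in both datasets
}
concordant, discordant = split_by_concordance(changes)
print(concordant, discordant)

# Share of haemolymph protein-coding genes detected in the transcriptome.
overlap_percent = round(100 * 114 / 129, 1)
print(overlap_percent)  # 88.4
```

The counts of concordant and discordant genes produced this way are what a binomial test on direction would take as input.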