Functional annotation of the PRM predicted transcriptome derived from the draft genome
The final assembly of the PRM genome (Accession Number: QVRM00000000.1) contains 7,171 contigs with an N50 value of 278,630 bp and an overall genome GC content of 44.6%. The assembled genome size is ~959 Mb and contains 14,608 predicted protein-coding genes, of which 13,840 returned BLAST hits against the NCBI non-redundant (nr) database (July 2018) (Burgess et al., 2018). The genome assembly is substantially larger than many other mite genomes sequenced to date, for example those of Tetranychus urticae (90.8 Mb), Psoroptes ovis (63.2 Mb), Sarcoptes scabiei (56.3 Mb), Dermatophagoides farinae (53.5 Mb), Varroa destructor (294.1 Mb) and Metaseiulus occidentalis (151.7 Mb), and is more similar in size to tick genomes, i.e. Ixodes scapularis (2.1 Gb) and Rhipicephalus microplus (2.2 Gb), where an increase in the degree of non-coding DNA as well as an abundance of repeat sequences have been observed (Ullmann et al., 2005; Burgess et al., 2018). Gene Ontology (GO) analysis, performed in OmicsBox (Version 1.3.11, Biobam, Spain), resulted in the assignment of GO terms for 11,624 genes and further functional annotation of 10,914 genes.
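For readers less familiar with assembly statistics, the N50 quoted above can be illustrated with a minimal sketch. The function name and the toy contig lengths are invented for illustration and are not taken from the PRM assembly; the definition used is the conventional one (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the assembly).

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (bp).

    Contigs are sorted longest first; the N50 is the length of the contig
    at which the running total first reaches half of the assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (bp), invented for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so the N50 is 400
```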
Interactive web-based presentation of the entire PRM genome and stage gene expression facilitates interrogation of individual genes and their stage-specific expression profiles
The full annotation of the PRM genome has been made publicly available via the Online Resource for Community Annotation of Eukaryotes (OrcAE: https://bioinformatics.psb.ugent.be/orcae/overview/Degal) (Sterck et al., 2012). To maximise the utility of this information for researchers, we created a gene-specific page for each gene, describing the full annotation available for that gene, including: gene function, GO terms, Pfam protein domains, protein homologues and significant BLAST hit data, gene structure, coding sequence, protein sequence and, where available, transcript evidence based on associated EST/cDNA data. This PRM gene expression atlas also features a fully searchable database of the entire genome assembly and displays the Illumina gene expression data across the six PRM stages (egg, larva, protonymph, deutonymph, adult male and adult female) as described here. Each gene has been assigned a unique locus identifier with the following format: DEGALXXgYYYYY, where XX defines the scaffold ID and YYYYY denotes the specific location within the scaffold.
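The locus identifier format can be handled programmatically, for example when cross-referencing gene lists against the atlas. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the scaffold ID is a run of digits of variable width and that the location field is always five digits, as in the identifiers quoted in this paper (e.g. DEGAL6771g00070).

```python
import re
from typing import NamedTuple

# Assumed pattern: "DEGAL", scaffold digits, a literal "g", five location digits.
LOCUS_PATTERN = re.compile(r"DEGAL(\d+)g(\d{5})")


class Locus(NamedTuple):
    scaffold: int   # the XX part of the identifier
    location: str   # the YYYYY part, kept as text to preserve leading zeros


def parse_locus(identifier: str) -> Locus:
    """Split a locus identifier into its scaffold and location parts."""
    match = LOCUS_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognised locus identifier: {identifier!r}")
    return Locus(int(match.group(1)), match.group(2))


print(parse_locus("DEGAL6771g00070"))
```

Keeping the location as a string avoids silently dropping the leading zeros, which matter when the identifier is rebuilt for a database query.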
Expression profiling of genes across the D. gallinae lifecycle
Multiple collections of stage-sorted D. gallinae mites were pooled and total RNA was purified for each stage. Total RNA yields of >7.5 μg and RNA integrity numbers (RIN values) of ≥7.2 (range 7.2-9.4) were obtained for each stage. Illumina sequencing resulted in 42-67 million raw sequence reads for each of the six independent sequencing libraries (one for each of the six D. gallinae lifecycle/adult sex stages). For each stage a set of expression estimates (transcripts per million, TPM) was generated from the trimmed reads, using the transcript quantification tool Kallisto (Version 0.46.2) (Bray et al., 2016) and the predicted transcriptome derived from the PRM genome (Burgess et al., 2018), with both sequencing depth and gene length considered in the expression estimate. The expression pattern of all transcripts with an expression estimate of >50 TPM is presented in Figure 2, and a clear demarcation of transcript expression between the different stages is apparent. The greatest concentration of highly expressed genes was in the adult females, with some apparent overlap in the expression pattern between adult females and eggs, which could be expected as eggs are also present in the reproductive tract of the adult female.
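How TPM accounts for both gene length and sequencing depth can be shown with a short sketch of the standard TPM definition. This is not the Kallisto implementation; the function name and the toy counts and lengths are invented for illustration.

```python
def tpm(est_counts, eff_lengths):
    """Standard TPM: length-normalise each count, then scale to one million.

    Dividing by length corrects for transcript size; dividing by the sum of
    all length-normalised rates corrects for sequencing depth.
    """
    if len(est_counts) != len(eff_lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total_rate = sum(rates)
    if total_rate == 0:
        return [0.0] * len(rates)
    return [rate / total_rate * 1_000_000 for rate in rates]


# Toy data: two transcripts with equal counts but different lengths.
toy_counts = [500.0, 500.0, 0.0]
toy_lengths = [1000.0, 2000.0, 1500.0]
values = tpm(toy_counts, toy_lengths)

# Indices of transcripts passing the >50 TPM display threshold.
above_threshold = [i for i, value in enumerate(values) if value > 50]
print(values, above_threshold)
```

In the toy data the second transcript is twice as long as the first, so the same count yields half the TPM.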
Network analysis and clustering of stage-enriched gene expression in PRM
A network analysis of the entire PRM transcriptome (Figure 3) was performed to identify genes expressed in either single or multiple stages of PRM. Genes sharing similar signalling pathways and biological functions often display similarities in their patterns of expression and therefore regulation (van Noort et al., 2003), and a similar expression pattern across multiple samples may indicate that they are involved in similar biological processes, i.e. guilt-by-association (Klomp et al., 2012). The PRM lifecycle expression network was generated in the Graphia version 2 package (Freeman et al., 2020) using the count data derived from Kallisto. A Pearson correlation cut-off value of ≥0.97 was applied, resulting in a final gene network containing 13,967 nodes (genes) linked by 45,230 edges. Clustering with a Markov Cluster Algorithm (MCL) cut-off of ≥1.2 resulted in the generation of 44 MCL clusters. MCL clusters sharing similar expression patterns across the stages were further merged, resulting in a total of 17 superclusters (Table 1). The distribution of genes across each MCL cluster and supercluster is shown in Supplementary File 1. The genes within each supercluster were mapped back to the original PRM genome annotation and a Gene Ontology (GO) analysis was performed within the Blast2GO/OmicsBox package to identify associated GO terms for molecular function, biological process and cellular component attributed to each supercluster.
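The thresholding step that turns expression profiles into a network can be sketched as follows. This is a simplified illustration and not the Graphia implementation: the function names and toy profiles are invented, and the MCL clustering step is not reproduced. An edge is drawn between two genes when the Pearson correlation of their six-stage profiles is at least 0.97.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unlinked
    return cov / sqrt(var_x * var_y)


def correlation_edges(profiles, threshold=0.97):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    edges = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy six-stage profiles, invented for illustration only.
toy_profiles = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],   # rises with geneA
    "geneC": [6, 5, 4, 3, 2, 1],     # falls as geneA rises
}
edges = correlation_edges(toy_profiles)
print([(g1, g2) for g1, g2, _ in edges])
```

Only positively correlated pairs are linked here; anti-correlated genes such as the toy geneC remain unconnected.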
Assessment of the most abundantly expressed genes for each D. gallinae stage
To allow comparison of the most abundantly expressed genes of known function across the PRM lifecycle, we selected the top 100 most highly expressed transcripts from each PRM stage (following removal of transcripts for ribosomal proteins and those with no known function, see Methods section and Supplementary File 2). A six-way Venn/Euler diagram was generated using the top 100 most highly expressed transcripts of known function for each stage (Figure 4). The transcript identity, associated annotation and expression data (TPM) attributed to each element of the Venn diagram are detailed in Supplementary File 3. The highest numbers of transcripts showing exclusive expression within a specific stage were observed in eggs (n=47) and larvae (n=38), followed by the reproductive adult stages (adult females (n=35) and adult males (n=25)), and finally the feeding juvenile stages, deutonymph (n=7) and protonymph (n=5). To allow comparison of the functions of these highly expressed genes between the individual stages, each transcript was assigned to a broad category indicative of its biological function, which was based on the associated annotation (comprehensive assessment of data from Blastp homology, associated GO annotations and InterPro terms) and is summarised in Supplementary File 4.
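The selection of stage-exclusive transcripts from the top-ranked lists amounts to simple set arithmetic, sketched below. The function names and the toy TPM values are invented for illustration, and the prior removal of ribosomal and unannotated transcripts is assumed to have been done already.

```python
def top_n(expression, n=100):
    """Return the n most highly expressed transcripts (ties broken by name)."""
    ranked = sorted(expression, key=lambda gene: (-expression[gene], gene))
    return set(ranked[:n])


def exclusive_sets(top_by_stage):
    """For each stage, keep transcripts absent from every other stage's top list."""
    exclusive = {}
    for stage, genes in top_by_stage.items():
        others = set().union(
            *(g for s, g in top_by_stage.items() if s != stage)
        )
        exclusive[stage] = genes - others
    return exclusive


# Toy TPM values for two stages, invented for illustration only.
toy_tpm = {
    "egg": {"g1": 900.0, "g2": 500.0, "g3": 10.0},
    "larva": {"g1": 800.0, "g2": 5.0, "g3": 700.0},
}
toy_top = {stage: top_n(values, n=2) for stage, values in toy_tpm.items()}
print(exclusive_sets(toy_top))
```

With n=2 the toy transcript g1 is in both top lists and is therefore exclusive to neither stage.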
Genes most abundant in multiple life stages and sexes
Examining arms of the Venn diagram with transcripts enriched in more than one stage can be informative for identifying genes associated with stage-specific traits. For example, six genes were present in the Venn sector with highly abundant transcripts present in all blood-feeding stages (protonymphs, deutonymphs and adults) (Figure 4). These transcripts are therefore likely to underpin the common parasitic biology and processes that are potentially associated with the acquisition, ingestion and digestion of a blood meal. The two most abundant transcripts in the blood-feeding stages were DEGAL6771g00070 and DEGAL6824g00220, with estimated TPM values ranging from 36,322 to 71,157. These transcripts are structurally related to each other and have a functional description of “mucin-peritrophin like salivary proteins”. Both proteins are predicted to be glycosylated: DEGAL6771g00070 contains 3 predicted O-linked glycosylation sites and DEGAL6824g00220 contains 7 predicted O-linked and 2 N-linked glycosylation sites. Also amongst the six transcripts, the protein encoded by DEGAL4040g00020 is a serine endopeptidase belonging to the S1A chymotrypsin family whose members are involved in food digestion, including fibrinolysis (Rawlings and Barratt, 1994). Two serine endopeptidase proteins with multistage sex expression patterns were also identified (DEGAL4040g00010 and DEGAL2792g00010) that are structurally related (56% identity, E<3e-108) to DEGAL4040g00020. Expanded families of genes involved in feeding-associated fibrinolysis are often found in haematophagous arthropods, and are part of the anti-haemostatic pathways essential for keeping ingested blood in a liquid form to allow access for digestive enzymes (reviewed in Martinez-Olivier et al., 2007; Dostalova et al., 2011).
Functional analysis of stage-enriched gene expression in PRM
Eggs
There were 1,052 transcripts with enriched expression in PRM eggs (Supercluster 6; Table 1). Amongst these were a number of genes related to egg hatching and embryonic development, including histone, histone-lysine N-methyltransferase and histone deacetylase. In addition, multiple copies of genes involved in cytoskeletal development, translation factors and splicing factors were identified in this supercluster (Supplementary File 1). Analysis of the 47 transcripts exclusive to eggs (termed “E47”) when compared to the top 100 most abundant genes of known function in each stage (Figure 4 and Supplementary File 4) underlined the abundance of chromatin remodelling proteins involved in histone deacetylation or ATP-dependent histone interaction. Cellular adhesion proteins with known functions in embryogenesis, including gastrulation, were also present in the egg-exclusive transcript set.
Larvae
We identified 1,907 transcripts with enriched expression in PRM larvae (Supercluster 5; Table 1). Of all the mobile stages, larvae contained the highest numbers of genes involved in maintaining the structural integrity of the cuticle (5% of total larval-enriched transcripts). Many of these genes were cuticular proteins (CPs) (10.9, 63, 65, 14 and 14a) which combine with chitin filaments to form flexible or rigid matrixes (Pan et al., 2018). The chitin in arthropod larval cuticles is generally translucent and relatively flexible during this stage and is the base for polymerisation and formation of a rigid sclerotized layer in later developmental stages (Nation, 2008). This sclerotized layer protects the mites from desiccation and mechanical stress, and provides a substrate for muscle attachment (Hackman, 1987; Flynn and Kaufman, 2015). Genes encoding putative allergens, including venom allergen 5, a homologue of the Lepidoglyphus destructor mite allergen 7-like protein and a house dust mite (Dermatophagoides farinae) allergen group 27-like serpin (An et al., 2013), were also enriched in this stage. Initial formation of the peritrophic membrane in preparation for a blood meal is evidenced by a putative peritrophic membrane chitin binding protein largely found in peritrophic matrixes, which contains the chitin binding protein domains IPR002557 and IPR0365508.
Analysis of the top 100 most highly expressed genes in each stage (Figure 4) showed 38 transcripts in larvae that were not present in the top 100 expressed genes of known function for the other stages (termed “L38” transcripts below). The largest functional categories of transcripts in larvae were energy metabolism and cuticle proteins, with 6 transcripts in each (Supplementary File 4). The energy metabolism transcripts (mitochondrial ATP synthase and cytochrome c subunits) are all involved in the pathway for the synthesis of ATP and none of these transcripts is truly specific to larvae, still having a high abundance in other stages (though approximately 1-5-fold less in other stages). The expanded category of cuticle proteins is, however, specific to larvae. The transcript DEGAL1578g00100, which is present in the L38 transcripts, has 76% identity to the tick (Ixodes scapularis) RIM-36 cement and cuticle-79 proteins (E<9e-40). A further cuticular protein, represented by the L38 transcript DEGAL2920g00060, has an extended RR1 domain, which is a non-cysteine chitin binding domain (non-cysCBD) typically found in the flexible cuticles of larval/pupal stages of arthropods (Rebers and Riddiford, 1988) and in the soft endocuticle of other stages (Vannini and Willis, 2017). Within the L38 transcripts, there is also a group encoding chitin-binding proteins that are non-cuticular, e.g. chitinases, which peak in activity during arthropod ecdysis (Winicur and Mitchell, 1974), lectins and peritrophic membrane proteins, which characteristically contain a cysteine chitin binding domain (cysCBD).
Within the L38 group, transcripts DEGAL3518g00030 and DEGAL6700g00030 encode a 168aa glycine-rich Ctenidin-like protein that has been shown to have antimicrobial properties, specifically against gram-positive bacteria (Baumann et al., 2010).
Protonymphs
Supercluster 4 contained 165 transcripts specifically enriched in the protonymph stage, and was the smallest supercluster representing the six stages. During the protonymph stage, mites become more mobile and actively seek out and acquire their first blood meal, which is required for further development (Pritchard et al., 2015). This increase in activity is reflected by a wider range of receptors sensitive to external stimuli and genes involved in the preparation for, and digestion of, blood meals. Within this cluster, 5 genes were identified belonging to the iGluR gene superfamily. This gene superfamily is ubiquitous amongst arthropods (Croset et al., 2010; Robertson, 2019; Vizueta et al., 2018) and is likely to be the primary modality of olfaction in mite species (Eliash et al., 2017; Gulia-Nuss et al., 2016). Analysis of the top 100 most highly expressed genes in each stage (Figure 4) showed that a limited number of transcripts, 5 and 7, were exclusive to the protonymph and deutonymph Venn clades, respectively, indicating that there are relatively few highly abundant transcripts that have a protonymph- or deutonymph-specific expression pattern. The TPM values of the transcripts in these two nymph stages all showed multi-stage expression profiles (see Supplementary File 3).
Deutonymphs
Network clustering analysis identified 295 transcripts (Supercluster 3) with deutonymph enriched expression patterns. The expression of genes involved in ATP-binding activity is higher in the deutonymph stage than all other stages, with ~16% of the deutonymph stage enriched genes in the network clustering analysis involved in this process. As the deutonymphs were not sex-sorted in this analysis, some transcripts, which were later demonstrated to be enriched in different sexes in adult PRM are also present in this final pre-adult stage. For example, cathepsin L2 (CatL; n=2), insulin degrading enzyme (n=2), serine protease (n=1), serine/threonine-protein kinase (n=27) are enriched here, but are also expressed in adult females. Transcripts involved in muscle and dorsal formation (e.g. 3 copies of dishevelled-associated activator of morphogenesis 1), cuticle development related genes (n=7), and venom allergens (n=3) are enriched here, but are also expressed in adult males. In addition, this supercluster also contained 9 copies of a highly expressed gene encoding the functionally uncharacterised protein BIW11 with an average read count over 250 in deutonymphs. Analysis of the 7 deutonymph exclusive genes in the top 100 most highly expressed genes in each stage (Figure 4), termed “D7” transcripts here, identified a D7 transcript encoding a calnexin homologue (DEGAL6897g00080), which stores and holds calcium in the endoplasmic reticulum and binds and retains incompletely folded N-glycosylated proteins whilst protein maturation occurs, thus preventing premature destruction of unfolded proteins (Kozlov and Gerhing, 2020). Another D7 transcript (DEGAL5401g00010) encodes a homologue of a perlwapin-like mollusc protein that prevents calcium crystallization (Treccani et al., 2006). 
It is unclear what the function of this protein may be in a non-mollusc species, but it is interesting to note that the blood calcium levels of adult laying hens are approximately 3-fold higher than in mammals (Johnson, 2000) and this protein may assist in preventing calcium crystallization in the gut, haemolymph or biomineralisation of the cuticle.
Adult Females
Network clustering analysis revealed a total of 2,725 transcripts with adult female-enriched expression patterns, which is the largest stage-specific cluster in this study (Supercluster 1; Table 1). Genes encoding proteins with roles in oogenesis (vitellogenin 1 (n=2), vitellogenin 2 (n=6), vitellogenin receptor (n=3), and apolipophorins (n=3)) were highly expressed in the adult females. Additional reproduction-related genes were also identified in this supercluster including Beta-1,4-mannosyltransferase/Egh, which is a key component of the oocyte-follicle cell adhesive system; chorion peroxidase; beta-1,3-galactosyltransferase/Brn; peroxidase-like isoform X2 and 3 copies of peroxidase-like isoform X3. Other transcripts represented in this supercluster included: heat shock proteins (HSPs), HSP-binding proteins and antioxidants (e.g. peroxiredoxin 1, glutathione reductase, DNA repair factor IIH helicase subunit XPD, thioredoxin-2 and the hypoxia response element, delta-aminolevulinic acid dehydratase). As adult females are one of the feeding stages in PRM, several blood meal digestion and metabolism related transcripts were enriched in this stage, including two copies each of the proteases cathepsin D and cathepsin L (CatD and CatL). Haem released from the digestion of haemoglobin can be toxic to blood-feeding organisms and transcripts encoding proteins putatively involved in haem-handling in adult females included allene oxide synthase-lipoxygenase protein, 4 peroxidases (2 isoform X2 and 2 isoform X3), 2 cytochrome C, 7 Cytochrome P450, sulphite oxidase and chorion peroxidase (reviewed in Whiten et al., 2018). In addition, the insulin-receptor signalling pathway was also highly represented in this supercluster by insulin-degradation enzyme, insulin-like growth factor-binding protein, insulin receptor substrate 1 and large subunit GTPase 1.
Venn analysis of the top 100 most highly expressed genes in each stage (Figure 4) showed that 35 transcripts partitioned in the adult females clade (termed “AF35” transcripts below). The most abundant transcripts in the AF35, with the highest TPM values, represented the vitellogenins (DEGAL5400g00090 and DEGAL3689g00030) and a vitellogenin receptor (DEGAL2803g00030) that are uniquely associated with yolk lipid transport and uptake in the developing oocyte. The largest functional category amongst the AF35 contained 12 transcripts encoding proteins associated with nucleic acid binding, predominantly histones and helicases, one of which (DEGAL1221g00050) was associated with the GO term “gamete formation”. The remaining 5 nucleic acid binding proteins have more diverse nucleic-acid binding descriptions (“Other function”) including: tRNA-splicing ligase, chromatin structure regulation, Argonaute gene silencing, RNA decapping and a Zinc finger transcription factor. Other AF35 proteins likely to be involved in cellular expansion are the 3 alpha-tubulin transcripts (Cytoskeleton category) that are associated with cytoskeleton organisation of the mitotic spindle (Meunier and Vernos, 2012).
The second largest AF35 category contained transcripts associated with arthropod innate defence mechanisms, including those potentially involved in mitigating oxidative stress (HSP70 (DEGAL4639g00020, DEGAL3163g00010, DEGAL6541g00010) and a peroxiredoxin, DEGAL4937g00010) and one potential complement binding protein (DEGAL3914g00030).
Adult Males
Gene ontology analysis of the 292 genes enriched in the adult male supercluster revealed that 43% of these genes were related to metabolic processes (supercluster 2; Table 1). Hydrolases including serine proteases (n = 39) and cysteine proteases (n = 13), were highly represented in the adult males supercluster. Many of these hydrolases are also present in the predicted secretome of PRM (Schicht et al., 2013) and have also been identified as potential allergens (Reithofer and Jahn-Schmid, 2017).
Analysis of the top 100 most highly expressed genes in each stage (Figure 4) showed 25 transcripts in adult males that were not present in the top 100 for other stages (termed “AM25” transcripts below). Proteolytic enzymes comprise the largest functional category in the AM25, including 6 cysteine-type peptidases and 5 serine endopeptidases. In addition, two transcripts encoding serpins were identified in the AM25 set (DEGAL5529g00010, DEGAL6577g00030), both with the domains associated with Kunitz-type serine protease inhibitors. The most abundant transcript in the AM25 was DEGAL6668g00010, which has a >42-fold increase in relative expression over any other stage. It encodes a Niemann-Pick C2 epididymal secretory protein, which is similar in domain structure to the group 2-like allergens. However, unlike the other group 2 allergens identified in this study (see Allergens section, below), DEGAL6668g00010 lacks significant homology with the house dust mite (HDM) protein group 2 allergen (E=0.037).
A chitin-binding protein (DEGAL3530g00010) normally associated with peritrophic membrane/matrix was also identified in the AM25 set. Although the transcript for this protein was identified in all blood feeding stages, in adult males its relative expression was 3-fold higher than any other stage.
Genes differentially expressed between D. gallinae stages
In total, 15 pairwise comparisons were performed at two simulation probability cut-offs (P>0.95 and P>0.99) between the different PRM stages as shown in Table 2, resulting in a total of 10,122 (P>0.95) or 6,025 (P>0.99) genes that were identified as being significantly differentially expressed in at least one of the selected pairwise comparisons. The lists of all DEGs and their log2 ratios (M values) are provided in Supplementary Files 5 (P>0.95) and 6 (P>0.99). Here we have focussed on the most biologically relevant transitions or comparisons between stages and sexes, namely: adult females (AF) vs adult males (AM); deutonymphs (D) vs adult females (AF) or adult males (AM); larvae (L) vs protonymphs (P) and eggs (E) vs adult females (AF) at the simulation probability cut-off of >0.99.
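The relationship between the number of comparisons, the probability cut-off and the reported fold changes can be sketched as below. This is an illustration rather than the analysis pipeline used here: the function names and toy results are invented, and the sign convention (a positive M value meaning higher expression in the first stage of the pair) is an assumption. Six stages give 15 unordered pairs, and a log2 ratio M corresponds to a fold change of 2 to the power M.

```python
from itertools import combinations

STAGES = ["egg", "larva", "protonymph", "deutonymph", "adult_male", "adult_female"]
PAIRS = list(combinations(STAGES, 2))  # every unordered pair of the six stages


def fold_change(m_value):
    """Convert a log2 ratio (M value) into a linear expression ratio."""
    return 2.0 ** m_value


def split_degs(results, cutoff=0.99):
    """Split genes passing the probability cut-off by the sign of M.

    `results` maps gene -> (M value, probability); the sign convention is
    an assumption made for this sketch.
    """
    up, down = [], []
    for gene, (m_value, probability) in sorted(results.items()):
        if probability <= cutoff:
            continue
        if m_value > 0:
            up.append(gene)
        elif m_value < 0:
            down.append(gene)
    return up, down


# Toy results for one comparison, invented for illustration only.
toy_results = {
    "geneA": (10.0, 0.999),
    "geneB": (-3.0, 0.995),
    "geneC": (5.0, 0.96),
}
print(len(PAIRS), split_degs(toy_results), split_degs(toy_results, cutoff=0.95))
```

In the toy data geneC passes only the more permissive cut-off, mirroring the larger gene list obtained at P>0.95 than at P>0.99.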
Adult females (AF) vs adult males (AM)
Overall, there were 1,625 genes differentially expressed between AF and AM, of which 771 were upregulated in AM, whilst 854 were upregulated in AF. Genes with the highest differential expression in AF compared to AM encoded vitellogenins (DEGAL5400g00090, DEGAL3689g00030), tensin-like proteins (DEGAL2625g00040 and DEGAL2625g00020) and histone-associated transcripts. Two serine protease-encoding genes (DEGAL5835g00120, DEGAL1643g00030) were highly expressed in AF with up to 130-fold change compared with AM. The three genes with the highest differential expression in AM compared with AF encoded a CatL-like protein (DEGAL5935g00010), a legumain-like protease (DEGAL4163g00020) and hypothetical protein BIW11_05264 (DEGAL6170g00010). Overall, proteolysis-related genes were upregulated in AM compared with AF, including 11 transcripts encoding legumain, 24 transcripts encoding CatL, 4 transcripts encoding CatD, and 7 transcripts encoding chymotrypsin-like proteins. Transcripts encoding allergens (see below) were also enriched in AM compared with AF.
Adult females (AF) vs. deutonymphs (D)
The comparison of adult females and deutonymph gene expression can indicate changes involved in sexual maturation from deutonymph to the ovigerous adult female stage. In total, 1,326 genes were differentially expressed between these two stages, and 818 genes were upregulated in adult females, while 508 were downregulated with respect to deutonymphs. The top 3 most upregulated genes in AF encoded vitellogenins 1 (DEGAL5400g00090, 35,929-fold higher expression in AF) and 2 (DEGAL3689g00030, 84,343-fold higher expression in AF) and tensin-like isoform 1 protein (DEGAL2625g00040, 24,080-fold change). The top 3 most upregulated genes in deutonymphs encoded homologues of a kelch-like protein 10 (DEGAL5866g00040), which functions in protein binding, CRE-DIG-1 protein (DEGAL5234g00050), and an uncharacterized protein (DEGAL5253g00030), all with ≥1,940-fold higher expression in deutonymphs compared with adult females. A further group of 15 ATP-related genes were highly expressed in AF compared to deutonymphs, consisting of ATP-dependent RNA helicase (n=5), ATP-binding cassette (n=3), ATPase (n=3), DNA replication ATP-dependent helicase (n=1), ADP/ATP translocase (n=1), Werner syndrome ATP-dependent helicase (n=1) and ATP carrier protein (n=1). Another group of 30 histone-related genes were upregulated in AF, including histone-lysine N-methyltransferase (n=13), H2A (n=4), H2B (n=1), H3 (n=3), H4 (n=1), histone acetyltransferase (n=2), histone chaperone (n=1), set1/Ash2 histone methyltransferase complex (n=1), histone RNA hairpin-binding protein (n=1), histone acetyltransferase (n=2), histone deacetylase (n=1), indicating the potential chromatin regulation and DNA strand compacting during the oogenesis and cellular development of developing larvae contained within the reproductive tract of the adult female. 
Of 3 transcripts encoding arginine kinase, two were upregulated in AF, while the third showed higher expression in deutonymphs, indicating that there might be different isoforms or families of arginine kinase regulating phosphotransferase activity in different stages. A further 4 transcripts encoding serine proteinases were upregulated in AF, with up to 122-fold change. Transcripts upregulated in deutonymphs included those encoding a calcium ion binding protein, peflin (n=4) and a kelch-like protein (n=4) functioning in ubiquitination and protein binding.
Adult males (AM) vs deutonymphs (D)
This pairwise comparison provides information relating to male maturation from the final nymph stage and identified 873 differentially expressed genes, of which 606 were upregulated in AM, while 267 were upregulated in deutonymphs. Among AM upregulated genes, the top 3 most differentially expressed genes encoded homologues of an uncharacterized protein LOC111253214 (DEGAL5539g00020, 2,296-fold higher in AM), cuticle protein 10.9 (DEGAL6018g00220, 1,495-fold higher in AM), and hydrolase activity related pancreatic lipase-related protein 2 (DEGAL7063g00020, 254-fold higher in AM). An additional group of 10 cuticle formation related genes, including cuticle protein 7 (n=4) and 10.9 (n=6), were upregulated in adult males compared with deutonymphs, as were 18 genes representing serine carboxypeptidases and 12 genes representing legumain, which were all highly expressed in AM with up to 188-fold increased expression.
Among the deutonymph upregulated genes, the top 3 highly differentially expressed genes represent homologues of a centrosomal protein of 97 kDa-like isoform X1, which functions in protein binding (DEGAL2866g00020, 99-fold higher in deutonymphs); an organic cation transporter protein (DEGAL3613g00020, 8-fold higher in deutonymphs) and a peptidase activity-related protein: chymotrypsin elastase family member 3B (DEGAL3923g00040, 8-fold higher in deutonymphs).
Larvae (L) vs protonymphs (P)
This comparison represents the transition from free-living, non-feeding larvae to parasitic protonymphs and we identified 1,352 differentially expressed genes between these stages. Of these, 776 genes were upregulated in larvae with 576 downregulated with respect to the protonymphs. Of the 776 upregulated genes in larvae, the top 3 differentially expressed genes encoded a homologue of a cuticle protein (DEGAL5073g00020, 18,982-fold higher in larvae), endochitinase-like isoform X1 (DEGAL1215g00020, 12,114-fold higher in larvae) and cuticle protein 63 (DEGAL5246g00010, 11,843-fold higher in larvae). A group of 86 genes encoding homologues of cuticle proteins were highly upregulated in larvae. For protonymph upregulated genes, the 3 most differentially expressed genes encoded two homologues of phosphatidylinositol phosphatase (DEGAL1303g00030, 1,039-fold higher in protonymphs and DEGAL1303g00050, 700-fold higher in protonymphs) and a cuticle protein (DEGAL3159g00010, 505-fold higher in protonymphs).
Eggs (E) vs Adult females (AF)
The comparison of gene expression between eggs and adult females is important to clarify which genes are expressed in the tissues of the adult female mite rather than in the eggs contained within. Of a total of 1,929 differentially expressed genes, 628 genes were upregulated in eggs with 1,301 genes downregulated with respect to adult females. Of the genes upregulated in the eggs, the top 3 most differentially expressed genes all encoded homologues of uncharacterised proteins: LOC100900955 (DEGAL5376g00010, 57,726-fold higher in eggs) and protein BIW11_01270 (DEGAL4071g00020 and DEGAL3196g00010 with 16,607- and 11,909-fold higher expression in eggs, respectively). Of the 1,301 genes upregulated in adult females, the top 3 differentially expressed genes encoded homologues of vitellogenin 2 (DEGAL3689g00030, 81,556-fold higher in AF), an uncharacterised protein (DEGAL4206g00020, 26,617-fold higher in AF), and vitellogenin 1 (DEGAL5400g00090, 17,360-fold higher in AF). Overall, there were 10 vitellogenin-related genes upregulated in AF compared with eggs: vitellogenin 1 (n=2), vitellogenin 2 (n=3), and vitellogenin receptor (n=5). Histones and histone-related genes were also upregulated in both egg and AF with different fold change levels. For example, the following genes were identified as being upregulated in AF: histone H1 (n=2), H2A (n=1), H2B (n=4), H1/H5 (n=1), H3.3 (n=2), histone-binding protein Caf1 (n=1), histone demethylation protein (n=1), histone-lysine N-methyltransferase (n=5), histone demethylase (n=2), histone deacetylase (n=1); whereas H3 (n=2), histone-lysine N-methyltransferase (n=13), histone acetyltransferase (n=1), histone deacetylase (n=1) were upregulated in eggs.
Dermanyssus gallinae putative allergens
Currently, a total of 39 allergen groups have been classified for house dust mites (HDM) by the WHO/International Union of Immunological Societies Allergen Nomenclature Subcommittee (WHO/IUISAN; http://www.allergen.org/) based on their predicted immunoreactivity and function. BLASTp homology searching (with a cut-off of E<1e-05) of the inferred proteome of PRM with selected archetypal mite allergens from the HDMs Dermatophagoides pteronyssinus, D. farinae and Blomia tropicalis and the Astigmatid mite Psoroptes ovis identified homologous PRM proteins. Confirmation of the expression of these genes was acquired by analysis of associated transcript data for all homologues. In addition, conservation of functional domains and active sites for the major allergen groups 1 and 2 was confirmed by bioinformatics analysis to remove non-functional homologues and pseudogenes. Using these selection criteria, homologous proteins belonging to 32 of the 39 defined allergen groups were identified in the predicted PRM proteome. The PRM genes representing these putative allergens with the top BLASTp hit against the allergens from the other mite species, and which met the inclusion criteria for each allergen group (as described above), are described in Table 3. No PRM protein homologues were identified for the allergen groups 5, 7, 19, 21, 36 and 38. In addition, a BLASTp search for Group 17 allergens could not be performed as the sequence was not available in the WHO/IUISAN database or literature. The expression profiles of the genes encoding the top BLASTp hits for the 32 allergen groups encompassed all stages of PRM (Figure 5). The larval stage was the stage of highest expression for the greatest number of allergen groups (14 allergen groups). The lowest expression of top BLASTp hits to the allergen groups was seen in the protonymph and deutonymph stages.
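The E-value filtering and top-hit selection described above can be sketched for BLAST output in the standard 12-column tabular format (E-value in column 11, bit score in column 12). This is an illustration only: whether the searches here were exported in that format is an assumption, the function name is hypothetical, the sequence names and scores are invented, and the additional domain and active-site checks are not reproduced.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return the best-scoring hit per query among hits with E < max_evalue.

    Each line is expected in the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy hits with invented names and scores, for illustration only.
toy_hits = [
    "\t".join(["allergenA", "prmX", "56.0", "300", "120", "4",
               "1", "300", "5", "304", "3e-108", "390"]),
    "\t".join(["allergenA", "prmY", "30.0", "100", "65", "3",
               "1", "100", "10", "109", "2e-04", "40"]),
    "\t".join(["allergenB", "prmZ", "35.0", "150", "90", "5",
               "1", "150", "1", "150", "1e-20", "95"]),
]
print(best_hits(toy_hits))
```

The second toy hit is discarded because its E-value does not fall below the cut-off, leaving one top hit per query allergen.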
A single BLASTp hit that met the inclusion criteria was identified for 4 allergen groups: groups 4 (alpha-amylase), 14 (vitellogenin), 25 (triosephosphate isomerase) and 32 (inorganic pyrophosphatase). Multiple related PRM proteins were identified for 31 allergen groups and several were expanded multigene families containing 11 or more related proteins, including: Groups 3, 6 and 9 (serine proteases), Groups 15 and 18 (chitinases) and Groups 1, 8, 28, 29, 33 and 39 representing the cysteine proteinases, glutathione S-transferase, heat shock protein, cyclophilin, alpha-tubulin 1A and troponin C, respectively. The complete list of related proteins for each allergen that meets the inclusion criteria is presented in Supplementary File 7.