Genome and Metagenome of The Phytophagous Gall-Inducing Mite Fragariocoptes Setiger (Eriophyoidea): Are Symbiotic Bacteria Responsible For Gall-Formation?

Eriophyoid mites represent a hyperdiverse, phytophagous lineage with an unclear phylogenetic position. These mites have succeeded in colonizing nearly every seed plant species, and this evolutionary success was in part due to the mites' ability to induce galls in plants. A gall is a unique niche that provides the inducer of this modication with vital resources. The exact mechanism of gall formation is still not understood, even as to whether it is endogenic (mites directly cause galls) or exogenic (symbiotic microorganisms are involved). Here we (i) investigate the phylogenetic anities of eriophyoids and (ii) use comparative metagenomics to test the hypothesis that the endosymbionts of eriophyoid mites are involved in gall-formation. Our phylogenomic analysis robustly inferred eriophyoids as closely related to Nematalycidae, a group of deep-soil mites belonging to Endeostigmata. Our comparative metagenomics, uorescence in situ hybridization, and electron microscopy experiments identied two candidate endosymbiotic bacteria shared across samples, however, it is unlikely that they are gall-inducers (morphotype1: novel Wolbachia, morphotype2: possibly Agrobacterium tumefaciens). We also detected an array of plant pathogens associated with galls that may be vectored by the mites; a mite pathogenic virus (Betabaculovirus) has the potential to be used in the biocontrol of agricultural pests. Xanthomonas campestris, Rhodococcus fascians, Rhodococcus nr. olei, Erwinia nr. persicina, Clavibacter michiganensis (bacteria), Albugo laibachii (Oomycota), and Erysiphaceae (powdery mildews). Some mite-associated microorganisms (Xanthomonas, Pseudomonas, and Albugo laibachii) can use their host to penetrate through the plant cell walls at the mite feeding site. In return, these microorganisms can help the mite to suppress host plant defenses at early stages of plant colonization. Furthermore, we found a mite pathogenic virus, Betabaculovirus, which is a double-stranded DNA virus, that may have a potential use in the control of agricultural pests.


Introduction
Eriophyoid mites (four-legged mites, gall mites) represent an ancient lineage of common and widely distributed microscopic plant symbionts, with 4,400 nominal species primarily associated with ferns, gymnosperms and angiosperms 1,2 . Some of these mites are of agricultural importance, damaging host plants through feeding, gall-formation, and vectoring plant pathogens [3][4][5][6] . The ability to induce galls in their plant hosts is the most distinctive feature of eriophyoid mites. Galls create a unique niche that ensures the survival and sustainable population growth of mites through their manipulation of the host plant 2 . Many species of eriophyoid mites are gall-forming, indicating that gall-inducement may be a key innovation, enabling these mites to colonize many terrestrial seed plant species. Gall-forming ability is an evolutionarily labile trait and a possible driver of mite speciation 7 . Different mite species, including non-gall-formers and those that produce different types of galls, can co-exist on a single plant host. Therefore, both host speci city and habitat partitioning via gall-inducement effectively increase eriophyoid species richness 7 .
Molecular mechanisms of gall-inducement are well known in bacteria 8 . However, in metazoan organisms, especially in mites and other arthropods, the exact nature of gall-formation is not well understood [9][10][11][12][13] . In many phytopathogenic bacteria, gall-formation ability is attributed to the production of cytokinins; these compounds, in the presence of auxin, lead to cell division and proliferation of plant tissue, resulting in the formation of galls or tumors [14][15][16] . Auxin-like and cytokinin-like activities have been detected in the salivary gland secretions of mites, and these phytohormones are delivered to the host plant during feeding 17 . High levels of auxins and cytokinins have been detected in mite-induced gall tissues 18 . It is likely that mites, like gall-forming insects, nematodes, and fungi, produce cytokinins endogenously via the regular tRNA-ipt pathway, which is present in all cellular organisms except for Archaea 10,19,20 . Gallinducing organisms may also inject effector proteins (produced endogenously) such as CLE and CEP-peptides into their plant host, which are plant hormone mimics 21,22 . However, associated bacteria may also enhance the production of phytohormones by gall-inducing arthropods 23,24 . Here we test the latter hypothesis of an exogenous mechanism of gall-inducement 23,24 . Our assumption is that if bacteria play an important role in gall-formation, a common bacterial species (gall-inducer) must exist across galls harboring the same mite species. Alternatively, if no common bacterial species is found across conspeci c mite samples, then mite-speci c bacteria are unlikely to cause galls or enhance their production. A key to this approach is accurate estimation of relative and absolute abundances of mite-associated microorganisms.
The phylogenetic position of Eriophyoidea on the tree of life is also a long-standing and contentious issue. One hypothesis places them near Nematalycidae (Acariformes: Endeostigmata) 25,26 , a deep-soil, basal acariform lineage, whereas another hypothesis places them within Eupodina (Trombidiformes) 25,27 , a relatively derived lineage that includes inhabitants of soil and plants. De nitive resolution of this question is key for understanding the basal relationships and early ecology of acariform mites, one of the oldest known terrestrial lineages 28, 29 . Inferring an accurate position for Eriophyoidea is essential to ensure the stability of higher-level mite classi cation, since the two major mite divisions, Trombidiformes and Acariformes, are effectively undiagnosable without knowing the true position of Eriophyoidea. However, phylogenomic-scale datasets have only recently become available to answer this question.
Here we test the two aforementioned hypotheses on exogenous gall-inducement and the phylogenetic position of Eriophyoidea using deep, short read sequencing of the mite Fragariocoptes setiger, which causes distinctive galls on leaves of the green strawberry (Fragaria viridis) in Europe (Fig. 1). First, we used several metagenomic analyses to characterize the microbiome composition of the mite from two independent samples. We then used these data to nd a common bacterial species, a potential bacterial gall-inducer. We combined this metagenomic evidence with FISH-hybridization experiments and TEM microscopy. Second, we assembled a whole genome of F. setiger and inferred a phylogenomic tree of acariform mites, including two novel genomes of basal endeostigmatan taxa. With respect to the hypothesis on gall inducement, other interesting research directions, like searching for genomic signatures of gallinducement in mites, will be addressed in separate papers using additional evidence (for example, comparative genomics using a closely related non-gall inducing mite species, namely Fragariocoptes ambulans).

Results
Genome. The mite genomic assembly was 40,873,958 bp in size, had 3,581 contigs-N50=31,600 (Table 1)  protein-coding genes, rDNAs, and tRNAs. Mitochondrial gene arrangement (GenBank accession JAIFTH000000000) was similar to that of other known eriophyoid mites, except the 16S-2S rDNA gene segment was inverted in Fragariocoptes (lineage Phytoptidae), which corresponds to the ancestral chelicerate order, while in other known eriophyoids (all belong to the lineage Eriophyidae s.l.) this segment is not inverted. Phylogenetic analyses. Our phylogenomic analysis shows strong support for Eriophyoidea being part of Endeostigmata ( Fig. 2) (SH-aLRT support=100%, bootstrap support=99%) ( Table 2: detailed mite systematics). Particularly, Eriophyoidea was sister to Nematalycidae (SH-aLRT support=100%, bootstrap support=100%). Inspecting the source gene alignment matrices revealed that the Endeostigmata+Eriophyoidea grouping was supported by many molecular synapomorphies ( Fig. 3a-b). Endeostigmata (including Eriophyoidea) was inferred as sister to Sarcoptiformes (SH-aLRT support=93.2%, bootstrap support=91%). Topology based on the non-curated matrix was essentially the same (not reported further). Table 2. GenBank/SRA accession ids and taxonomic classi cation of taxa used to infer the position of eriophyoid mites on the tree of life. SRA short read data were assembled as part of this project. G=genome, T=transcriptome; "-"=no name at this level of taxonomic hierarchy.
Rhizobium rubi, Erwinia herbicola) were not detected; they only had distant matches to taxa from our datasets ( Table 3). Validation of these results via read assembly  followed by a BLAST search con rmed that these taxa actually belong to different species of mostly free-living bacteria (Table 3). In addition, these taxa were not present in both samples (Table 3).
K-mer-based short read identi cation (Kraken). Kraken decomposes short reads into k-mers of size 35, which allows extremely fast read classi cation against a reference database. Our database included nearly all GenBank genomic sequences, except plants were represented by Fragaria vesca only. For the abundance threshold ≥0.0338% for all bacterial reads, 83 bacterial genera were identi ed in both samples, 28 in sample 1 and 76 in sample 2; 21 genera were shared across the two samples (Fig. 1d). Among the 10 most abundant bacterial genera in each sample, two were shared: Cutibacterium and Wolbachia (Fig.  1e). Detailed taxonomic classi cation and abundance estimates for this and other threshold and no-threshold analyses are given in Supplementary Table S2,  Supplementary Table S3, Supplementary Table S4  Read mapping on marker genes. We identi ed OTUs shared across the two samples using mapping of raw metagenomic reads onto 14 marker, single-copy genes in SingleM 30 (Supplementary Table S5). This method is largely taxonomy-independent and does not suffer from issues related to copy-number variation in ribosomal genes (16S, 23S), plasmids, and transposable elements. This analysis identi ed 16 OTUs present in both samples. Of them, 3 OTUs were found at high abundances (percentages of reads are given in parentheses for samples 1 and 2, respectively): Wolbachia (67.47, 22.65%), Sphingomonadaceae (12.72, 5.02%), and Propionibacteriaceae (4.46, 7.03%) (Fig. 3c). Remarkably, SingleM did not nd Agrobacterium and Pseudomonas in sample 2, and therefore these genera are not present among the 16 OTUs.
Assembly intersection. To detect OTUs common to the two samples, the two NGS assemblies were intersected and shared contigs were classi ed using the BLAST nucleotide database (Fig. 3d). The following four most abundant species were shared between the two samples: Wolbachia sp. were only abundant in sample 2. All other species occurred at low abundances in both samples (Fig. 3d). In addition, a plant pathogenic bacterium, Xanthomonas campestris, was identi ed in both samples at a low abundance (0.01%, 0.09%).
Non-bacterial taxa. Because non-bacterial taxa can also induce galls 8 , we conducted a brief exploratory survey of major viruses and fungi that can elicit gall symptoms in their plant hosts using raw Kraken results with the con dence score set to 0.1 (as in all analyses above) but without an abundance cutoff. In our samples, there were 62 genera of viruses, but Phytoreovirus (causes galls) was absent (Supplementary Table S4). The two most abundant viral genera were Pahexavirus (0.0002%, 0.0012%), which was probably a Cutibacterium acnes phage, and Betabaculovirus (0%, 0.0006%), which probably uses the mite as a host. Among gall-inducing Oomycota, we found Albugo laibachii at a perceptible abundance, especially in sample 2 (0.036%; for comparison, sample 1 = 0.003%) (Supplementary Table S4). The fungus Ustilago maydis (causing corn smut) was found at a very low abundance (0.00006% and 0.00416% in samples 1 and 2, respectively; Supplementary Table S4).
Phytohormones and horizontal gene transfer. The annotated mite genomic assembly revealed no known bacterial/plant genes responsible for the production of phytohormones and enzymes involved in plant growth regulatory metabolism.
TEM observations. We found two endosymbiotic bacterial morphotypes. Morphotype 1 was globular ( Fig. 4i-j), which is consistent with the Wolbachia morphology. However, unlike all known Wolbachia, this bacterium was extracellular ( Fig. 4i-j). Morphotype 2 was rod-shaped and also extracellular ( Fig. 4i,k-l). Both morphotypes were most often closely associated with mite cell-plasma membranes. There were three distinct localizations: (i) around gigantic parenchymal cells (forming the fat body) lled with what is presumably lipid or glycogen vesicles (Fig. 4l); (ii) around and inside the salivary glands (in both cases surrounding the salivary gland cells, rather being inside these cells); (iii) under the mite epidermis, between the cells of underlying tissues (muscles and the fat body) (Fig. 4i). Bacteria were not found in mite oocytes, inside the gut or gut lumen.

Discussion
Our phylogenomic analysis robustly inferred Eriophyoidea as sister to Nematalycidae (Fig. 2), a group of deep-soil, vermiform mites belonging to Endeostigmata. This result is well-supported and provides nearly decisive evidence for the long-standing controversy about the phylogenetic position of Eriophyoidea, which could not be con dently placed within a major mite lineage (see detailed discussion in 25,26  Using comparative metagenomics, we also tested whether gall-inducement in the Fragariocoptes setiger system is of a bacterial nature. To nd a potential gall-inducer, two independent metagenomes (sample 1 and 2; Fig. 1) of the gall-inducing mite were analyzed. We conducted several metagenomic analyses, each using a different methodology: Gall-ID (comparison with known gall-inducers), Kraken (k-mer-based classi cation using nearly the entire GenBank nucleotide data as the reference database), SingleM (comparison with 14 single-copy bacterial genes), and BLAST (classi cation of the intersection of the two metagenomic assemblies). Below we discuss the results of these analyses and then provide a synthesis of these data before giving concluding remarks.
Mapping reads of known gall-inducing bacteria in Gall-ID identi ed several potential candidates: Agrobacterium tumefaciens (99.8-100% match for 16S in both samples) and Rhodococcus fascians (16S match 99.7%, sample 2 only) ( Table 3); there were also matches with Pseudomonas savastanoi and Erwinia herbicola, but these matches were not con rmed by validation analyses (Table 3). Agrobacterium tumefaciens is a common and widespread bacterium responsible for formation of crown galls in the rhizosphere of various plants, but only strains containing a tumor-inducing plasmid (Ti plasmid, pTi) are virulent. We found no evidence for the presence of a complete Ti plasmid, especially its functionally important virulence genes Vir 36 . Other genes that may be associated with tumor-inducement pathways and encoded on the Ti plasmid 37 were only found in sample 1: nopaline permease ATP-binding protein gene and a substantial portion of Type IV secretion system components (Table 3). However, these genes may be encoded on other plasmid types occurring in nonvirulent bacterial strains 38 . Because crucial components of the Ti plasmid that are functionally important for tumor-formation (Vir genes) are lacking and A.
tumefaciens is commonly found on healthy plants, either externally 39 or internally 40 , this bacterium probably does not use the classical Ti-plasmid inducement pathway in our system, but its role in gall inducement using a different pathway cannot be completely excluded (see also TEM microscopy and FISH experiments below). Rhodococcus OTUs with 97.7% (sample 1) and 99.9% (sample 2) identity to Rhodococcus fascians had unequal 16S abundances (0.18x10 − 6 and 3.58x10 − 6 of all reads in samples 1 and 2, respectively). Given the very low abundance of Rhodococcus in both samples (Fig. 1e) and a 2.1% difference in their 16S rDNA genes, we believe that it is unlikely that this bacterium has a biologically important role in our system. The genus Pseudomonas can colonize a wide range of ecological niches 41 ; as a plant pathogen it can cause tumorous overgrowths (knots), cankers, foliar necrosis, and bacterial blight [42][43][44][45] . Knot-inducing pathovars encode genes related to indole acetic acid, cytokinins, rhizobitoxine, bacteriophytochrome, and others 42,46 . Our Gall-ID analyses identi ed the 16S gene of Pseudomonas savastanoi in both samples, albeit with mismatches with the reference sequences ( Table 3). Validation of these data via a separate BLAST search did not con rm the presence of Pseudomonas savastanoi; instead, two different species were identi ed, P. yamanorum (CP012400.2) and P. sp. DHXJ03 (JN244973.1) in samples 1 and 2, respectively ( Table 3). None of these species are known to induce galls. Gall-ID also identi ed the ISEhe3 insertion element of Erwinia herbicola in sample 1 only (Table 3). This bacterium is a widespread epiphyte on many different plants, also occurring in other habitats, such as seeds, water, humans, and animals 47 . Several plant tumorigenic strains of E. herbicola have been identi ed; all carry a pPATH pathogenicity plasmid encoding virulence genes 48 . ISEhe3 and other insertion elements are also present on the plasmid of plant-pathogenic strains, which suggests that these elements could participate in the evolution of the pPATH plasmid 48 . Validation of these data yielded a 98.5% match with Erwinia persicina, a bacterium which is known to be plant pathogenic but not gall-inducing 49 . The ISEhe3 insertion element was not detected in sample 2, indicating that Erwinia is probably not responsible for gall inducement in our system.
Among the 10 most abundant genera identi ed by Kraken in each sample, only two were shared: Wolbachia and Cutibacterium. The latter genus was represented by Cutibacterium acnes. This bacterium is associated with the human skin, and is a widespread contaminant of DNA extraction kits 50 ; we consider its presence as a likely artefact. With respect to the well-known gall-inducers discussed above, our analysis showed highly uneven or low abundances in samples 1 and 2: Agrobacterium (11.64, 0.18%), Pseudomonas (64.23, 1.34%), Rhodococcus (0.06, 2.04%), Erwinia (0.04, 0.05%) (Supplementary Table S2; Fig. 1e). These data, therefore, agree with our conclusion that these bacteria (except probably Agrobacterium) do not play an important role in gall-formation in our system (see above). Below, we brie y discuss several other bacterial genera from our samples that have gall-inducing species. A novel species of Wolbachia was common among the two samples, occurring at 16.7% (sample 1) or 9.7% (sample 2) of all bacterial reads ( Fig. 1e; Supplementary Table S2). It has been hypothesized that Wolbachia is used by caterpillars of a leaf-mining moth to produce green islands in yellowing leaves, which act as sinks for nutrients 51 . Manipulation of cytokinin levels by the endosymbiotic bacterium was suggested as the cause of green-island formation 24,51 . However, the exact molecular mechanism is not known and co-phylogenetic evidence indicates that the correlation of Wolbachia and the 'green-island' phenotype is high but not absolute 52 . Wolbachia associated with root-feeding insects can lower plant defenses 53 , and mites may use this property to invade new host plants.
Xanthomonas was found at low abundances, 0.4 and 1.7% of all bacterial reads in samples 1 and 2, respectively. This bacterium interacts with the host plant by using a type III secretion system (T3SS) to secrete an array of effector proteins. Virulence factors include lytic enzymes that attack the plant's cell wall, in addition to proteases, amylases, cellulases and lipases that help lower the plant's defense mechanisms 54 . It would therefore be interesting to further explore whether eriophyoid mites can use associated bacteria, such as Xanthomonas or Pseudomonas, to suppress host plant defenses at early stages of plant colonization 54,55 , while bacteria can use mites to penetrate through the plant cell walls at the mite feeding site. Furthermore, the following four OTUs were identi ed by Kraken at low abundances, 0.00001-0.15%: Paraburkholderia, Rhizobacter, Frankia, Phytoplasma (sample 2 only) (Supplementary Table S4).
These bacterial genera include gall-inducing species 56-60 , with Phytoplasma being unique as it can replicate intracellularly both in plants and their insect vector hosts, while other bacteria can replicate only in plant cells 60-62 . Based on their extremely low and uneven abundances, it is unlikely that these OTUs are responsible for gall-formation in our system.
In addition to the above Gall-ID and Kraken analyses, we also ran SingleM (Fig. 3c) and assembly intersection analyses (Fig. 3d). These analyses returned similar but not identical results (Fig. 3c,d). First, unlike Gall-ID and Kraken, these analyses largely do not rely on existing taxonomy to identify OTUs shared across samples and, therefore, may be more accurate with respect to organisms having no sequence data in GenBank. Second, the differences can also be attributed to disparate underlying methodologies used by these analyses, data complexity, and the uneven coverages of the two datasets. For example, in Bacteria, rDNA may have multiple copies per genome (e.g., Agrobacterium), resulting in higher coverages, and therefore affecting both k-mer-based and assembly-based methods. The assembly of rDNA reads may also be positively biased due to rDNA sequence conservatism also affecting assembly-based methods. These issues are exaggerated if closely related species are present (e.g., Pseudomonas, Agrobacterium in our samples). Furthermore, k-mer-based (Kraken) and assembly-based methods may be affected by the presence of plasmids and mobile elements, which may have multiple copies in the genome (thus a species abundance can be overestimated) and may be shared across species (thus creating spurious classi cations when based on reference sequence databases). Our SingleM analyses, relying on single-copy protein-coding genes, did not detect low abundance taxa, such as Agrobacterium and Pseudomonas, in sample 2 (Fig. 3c), while other analyses, using rDNA among other sequence data, were able to detect these taxa (Fig. 1e, Fig. 3d). In other words, differences between various metagenomic analyses conducted here are expected, and we consider our results to be complementary to each other.
A comparison of our metagenomic results, FISH and TEM microscopy suggests that Wolbachia was the only abundant OTU shared across the two samples. The substantial abundance of this bacterium points to a functional importance for its mite host. Some Wolbachia are known to be bene cial to nematode or insect hosts 63 and it is likely that this is also the case here. Wolbachia is not known to induce galls but was suspected of manipulating cytokinin levels 24,51 (see above). The Fragariocoptes endosymbiotic Wolbachia is a novel and very divergent species, with a substantial average nucleotide difference (20.7%) with respect to other known Wolbachia. For this reason, it may have unexpected properties, including gall-inducement. Additional experiments would be required to con rm this hypothesis. Our FISH experiments and the metagenomic analyses (sample 1) suggested the presence of Agrobacterium tumefaciens (morphotype 2), which is a major inducer of crown galls (Fig. 4b,d,e,g). This is also an unexpected result since its intimate association with arthropod hosts has not been documented in the literature so far, except for a single study that provided experimental evidence that this bacterium can be vectored by an insect 64 . Both FISH and TEM microscopy identi ed a rod-shaped endosymbiotic, extracellular bacterium that characteristically surrounds gigantic parenchymatic mite cells and congregates in intermuscular spaces, especially around salivary gland cells (Fig. 4d,e,g,h,k,l). The abundance of this bacterium and its characteristic distribution inside the mite indicate a strong biological association with the mite. Gall-inducement by this bacterium cannot be excluded with data at hand, and further work is needed to evaluate this possibility. Given the incomplete Ti plasmid and the substantial abundance of Agrobacterium tumefaciens in sample 1 (see above), we cautiously suggest that a role of this bacterium in gall-formation in our system is unlikely and needs to be further evaluated. A similar conclusion of no bacterial involvement in gall formation has been recently made for insect gall inducers 65 In conclusion, here we use comparative metagenomics to test the hypothesis suggesting that a bacterial symbiont can be involved in gall-inducement in eriophyoid mites using two independent samples from the mite Fragariocoptes setiger. We found a novel bacterial species of Wolbachia shared across all analyzed metagenomic samples of the gall-inducing mite. Another endosymbiotic extracellular, rod-shaped bacterium (morphotype 2, possibly Agrobacterium tumefaciens) was also detected, and based on its distribution inside the mite, it appears to form a biologically important association with the mite. Although we were able to demonstrate the presence of the two potential candidates we suggest that they role in gall-inducement is unlikely.  Gall-ID. For identi cation of gall-inducing bacteria in samples 1 and 2 using raw reads, we used SRST2 72 and Gall-ID databases 73 , with the minimum gene coverage parameter set to 50% and maximum divergence parameter set to 10%: srst2 --input_pe $input_ le_reads_forward $input_ le_reads_reversed --max_divergence 10 --min_coverage 50 --log --output $out --gene_db $input_ le_gene_db --threads $proc --report_all_consensus. Gall-ID databases have either functional genes known to be part of gall-inducement pathways or house-keeping genes (16S rDNA) that can be used to identify gall-inducing bacteria. Since the use of a majority rule consensus sequence (the Gall-ID default) is unreliable in the presence of multiple similar bacterial species, we also conducted validation of our Gall-ID results: (i) reads mapped on target genes in the Gall-ID databases were extracted (samtools fastq − 1 forward.fq -2 reverse.fq -s singletons.fq -0 other.fq in.bam), (ii) and assembled in SPAdes (for PE reads: spades.py -1 forward.fq -2 reverse.fq -s singletons.fq -t $proc; for SE reads: spades.py -s $extracted.reads.fq -t $proc -k 127), (iii) SPAdes contigs were then classi ed by BLAST. Kraken report les to MetaPhlAn format and then combined these converted les as follows: kreport2mpa.py -r $kraken_report -o $kraken_report.mpa; combine_mpa.py -i kraken_report1.mpa,kraken_report2.mpa -o kraken.mpas.combined.txt. To estimate relative abundances, we also tried Bracken 76 , using the read length value as appropriate, 150 bp (sample 1) and 250 bp (sample 2). However, this analysis produced spurious results, e.g., the read proportion for Enterobacteriaceae were seemingly overestimated: 310.6 times (Salmonella) and 299.2 times (Escherichia) higher in sample 1 as compared to the Kraken data. Abundances of these taxa were also substantially overestimated in sample 2. Since these unusually high abundances were not supported by any other analyses (Kraken, singleM, BLAST, see below), we do not report Bracken analyses here.
Our initial Kraken analysis yielded a large number of OTUs, suggesting that many reads were probably overclassi ed 75 . For example, there was a total of 1,124 genera, including 975 bacterial genera. We believe that such a large diversity is biologically unrealistic and we used a combination of Kraken con dence ltering (0.1, see above) and an abundance cutoff (≥ 0.0338%) as suggested in the literature 75 . For comparison, we also ran an analysis with a lower abundance cutoff (≥ 0.0005%).
To calculate the taxonomic intersection (shared OTUs in samples 1 and 2), bacterial genera with a fraction of reads ≥ 0.0338% at least in one sample were selected, and then these data were used to create Venn diagrams (OTU counts) and abundance heatmaps. For Bacteria, this analysis yielded a total of 83 genera in both samples and 21 genera in the sample intersection, which we consider biologically meaningful, and so we present this as our main result. These data were clustered based on Euclidean distances and visualized as heatmaps in TBTools 77 . For comparison, abundance heatmaps were also generated for all organisms using a lower abundance cutoff value (≥ 0.0005%), yielding 171 classi ed genera.
Unique representative sequences (used as "OTUs" in SingleM) shared across the two samples were ltered. This dataset was used to construct a heatmap (see the next subsection), where: (i) percentages of the average read counts across the 14 marker genes were used for each OTU in both samples; (ii) gene count, which is indicative of the data completeness, was recorded and visualized on the heatmap.
Identifying common OTUs across samples: Assembly intersection. Common OTUs present in the two NGS samples could also be identi ed via read assembly for each sample followed by assembly intersection. Intersection was done by standalone BLAST where sample 1 contigs were the query and sample 2 contigs were the subject. Matches having ≥ 98% similarity and bitscore ≥ 500 were then classi ed by BLAST. OTUs having > = 96% similarity with GenBank nucleotide (nt) database were classi ed at the species level, while all other matches were classi ed at the genus level (top hits were reported in case of multiple matches). This approach generated a subset of contigs shared between the two assemblies, and these contigs were identi ed by their sequence similarity (not by taxonomic labels). This methodology effectively minimizes the effect of high misclassi cation rates due to incompleteness of the GenBank databases 78 . For each contig, read-based coverage was calculated in bbduk (bbmap.sh ref=$ref in1=$f1 in2=$f2 covstats = covstats.txt), and the number of mapped reads per classi ed OTUs were recorded. To minimize the in uence of rDNA (which may have multiple copies per genome), plasmids, and mobile/transposable elements (which may occur in multiple species), BLAST results were checked and edited. Final read count data were normalized by calculating read percentages. It is important to emphasize that this procedure was done only for OTUs present in the intersection (so these data can only be interpreted in the context of the subset of OTUs present in both samples). Then these data were log 2 -transformed, and a heatmap was constructed in TBtools v.1.0 77 . In this heatmap, each OTU was labeled with (i) an intersection bitscore, which is indicative of the magnitude of common matches across the two metagenomic samples for a given OTU, and (ii) average percent identity with the closest GenBank matches. Clustering was done using Euclidean distances. We did this analysis using only bacterial taxa. Among fungi, some of which also can cause galls, we found only a single dominant taxon, the family Erysiphaceae (powdery mildews, which do not produce galls). Best matches were Cystotheca wrightii AB120747.1 (18S identity = 100% bitscore = 813) followed by Podosphaera pannosa AB525937.2 (rDNA identity = 99.88% bitscore = 1,578) and Podosphaera leucotricha JAATOF010000279.1 (mt-DNA identity = 99.9%, bitscore = 5,723).
Declarations Figure 2 Relationships of parasitiform and acariform mites. Phylogenomic inference was undertaken using a Maximum likelihood framework in IQ-TREE based on 90 orthologous proteins.
Bacterial OTUs shared between samples 1 and 2 as identi ed by mapping metagenomic reads onto a set of 14 single-copy genes in singleM (c); for each OTU, a gene count returning matches in both samples is given. The heatmap gives read percentages in the intersection, while its colors are based on log2 of these values. Normalization was done only for OTUs present in the intersection. Bacterial OTUs shared among samples 1 and 2 as identi ed via intersection of two metagenomic assemblies (d); for each OTU, intersection bitscore, and average sequence identity (BLAST using GenBank nucleotide database) are given. The heatmap gives normalized read percentages, while its colors are based on log2 of these values. Normalization was done only for OTUs present in the intersection. Only alignments having a bitscore ≥1500 are shown.