Analyzing indoor mycobiomes through a large-scale citizen science study of houses from Norway

Background: In the built environment, fungi can cause considerable deterioration of building materials and have adverse health effects on occupants. Here we investigate which environmental variables (climate, building features or occupant characteristics) are the most important drivers of indoor mycobiome structure and diversity in houses. We used Norway as the study area because it spans wide environmental gradients, for example in temperature and precipitation. We organized a large citizen science sampling campaign covering 271 houses and 807 dust samples collected from three house compartments: outside of the building, living room and bathroom. Fungal community composition and diversity were determined by DNA metabarcoding and correlated with the environmental variables using multivariate statistical analyses.
Results: The fungal community composition differed clearly between indoor and outdoor samples, but there were no significant differences between the two indoor compartments (i.e. living room vs. bathroom). The 32 selected variables, related to the outdoor environment, building features and occupant characteristics, accounted for 15% of the overall variation in community composition, with house compartment as the most important factor (7.6%). Next, regional-scale climate was the main driver of the dust mycobiomes (4.2%), while building and occupant variables had significant but minor influences (1.4% and 1.1%, respectively). The house dust mycobiomes were dominated by ascomycetes (~70%), with Capnodiales and Eurotiales as the most abundant orders. Compared with outdoor samples, the indoor mycobiomes showed higher species richness, probably because they contain a mixture of fungi from outdoor and indoor sources. The main indoor indicator fungi belonged to two ecological groups with allergenic potential: xerophilic molds (mainly Penicillium and Aspergillus) and skin-associated yeasts (Malassezia, Debaryomyces, Candida and Rhodotorula).
Conclusions: We show that indoor mycobiomes represent a mixture of fungi from indoor and outdoor sources, and that a multitude of indoor and outdoor variables structure indoor mycobiomes. We also demonstrate that citizen science is a successful and effective strategy to obtain samples for characterizing the indoor microbiomes at large geographical scales.

important driver. Next, we ask (ii) which fungi dominate the house-dust mycobiomes in Norway? We hypothesize (H2) that ascomycetes, especially stress-tolerant ascomycetes, are the dominant groups in this environment. We also ask (iii) how much of the indoor mycobiome overlaps with the outdoor mycobiome? Related to this question, we hypothesize (H3) that a major fraction of the indoor fungi derives from outdoor sources, while a relatively minor fraction originates from indoor sources.

Results
Citizen scientists were recruited through scientific networks and diverse actions in social and public media. The volunteers provided dust samples and information (metadata) about their houses (Fig. 1a). Following our instructions, dust samples were swabbed from the upper doorframes of three house compartments: outside of the building (main entrance), living room and bathroom. Doorframes act as passive collectors of dust deposited over an unknown period of time. Dust coverage in the living room was measured in parallel using one adhesive tape (Mycotape2; Additional file 1: Fig. S1) and included as a variable. To minimize the influence of seasonality, all samples were collected within a short time span during spring 2018, mainly in May. In addition to the building features and occupant characteristics provided by the volunteers, variables related to the outdoor environment (climate, topography, geology and land use) specific to each house were obtained. In total, 269 houses were sampled from mainland Norway, covering its major climatic gradients (Fig. 1b). Two houses from Longyearbyen, in the Arctic Archipelago of Svalbard, were also included. To characterize the dust mycobiomes in houses across Norway, we carried out DNA metabarcoding of all collected samples, sequenced on the Illumina MiSeq platform after amplification of the internal transcribed spacer 2 (ITS2) region.

Data features and overall fungal diversity
After quality filtering, denoising and sequence clustering, the final dataset contained 7,110 operational taxonomic units (OTUs) (22,622,391 reads), distributed among 810 dust samples from 271 houses. The number of reads per sample varied widely, from 1,367 to 245,588 (mean = 27,929), and the number of OTUs per sample (richness) ranged from 7 to 867 (mean = 270) (Additional file 1: Fig. S2). Likewise, OTU abundance varied extensively, from 2 to 2,040,802 reads per OTU (mean = 3,182), while OTU occurrence ranged from 1 to 807 dust samples (mean = 31). For further statistical analyses, we rarefied the data to a relatively low sequencing depth (2,000 reads per sample) in order to retain the majority of samples (only three were excluded) and houses, and thus a wide geographic coverage. The rarefied dataset contained 6,632 OTUs distributed across 807 samples.
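Rarefaction, as described above, amounts to randomly subsampling each sample down to a common read depth. The sketch below assumes random subsampling without replacement and represents each sample as a hypothetical `{otu_id: read_count}` dictionary; the paper does not state which software performed the resampling, so the function names `rarefy` and `rarefy_table` are illustrative only.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Draw `depth` reads without replacement from one sample's OTU counts.

    `counts` maps OTU id -> read count. Returns None if the sample has fewer
    than `depth` reads, mirroring how such samples are dropped.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # One list entry per read, labelled with the OTU it belongs to.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    # Sampling without replacement keeps every subsampled count <= the observed count.
    return dict(Counter(rng.sample(pool, depth)))


def rarefy_table(table, depth=2000, seed=0):
    """Rarefy every sample of {sample_id: {otu: reads}}; drop samples below `depth`."""
    rarefied = {}
    for i, (sample_id, counts) in enumerate(table.items()):
        sub = rarefy(counts, depth, seed=seed + i)  # distinct, reproducible seed per sample
        if sub is not None:
            rarefied[sample_id] = sub
    return rarefied
```

Samples with fewer reads than the target depth are dropped rather than kept at a lower depth, which is why a low threshold such as 2,000 reads retains most samples. Because the subsampling is random, OTU counts in the rarefied table vary slightly with the seed.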
The average richness and diversity (Shannon) per sample were significantly higher in the indoor than in the outdoor samples (Fig. 2a). Evenness, as well as other diversity indices (Simpson and Chao1), followed a similar trend (Additional file 1: Fig. S3). The two indoor compartments, i.e. living room and bathroom, had similar levels of richness and diversity. Sample origin (indoor vs. outdoor) was the strongest predictor of fungal richness according to a generalized linear model (GLM) (p = 7.48e-05). Several other variables (mean temperatures of the warmest and driest quarters, temperature seasonality, annual precipitation, snow-covered area in February, latitude and number of children) significantly improved the AIC relative to the null model (p < 0.05). However, when combined with the strongest predictor (indoor vs. outdoor), they had no significant effect on the model outcome. Houses at higher latitudes (northern Norway) had on average higher fungal diversity (Shannon) than houses in the south (Additional file 1: Fig. S4). In contrast to richness and diversity, the turnover in species composition (beta diversity) was higher among the outdoor samples (Fig. 2b).
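The richness and diversity indices compared above can be computed directly from one sample's OTU counts. The following is a minimal sketch using standard textbook definitions (Shannon with the natural logarithm, Gini-Simpson 1 − Σp², Pielou's evenness J = H′/ln S, and the bias-corrected Chao1); the exact variants used in the study are not stated, and some implementations additionally multiply the Chao1 correction term by (N − 1)/N. The function name `alpha_diversity` is hypothetical.

```python
import math


def alpha_diversity(counts):
    """Common alpha-diversity indices for one sample's list of OTU read counts."""
    abund = [n for n in counts if n > 0]
    if not abund:
        return {"richness": 0, "shannon": 0.0, "simpson": 0.0,
                "evenness": 0.0, "chao1": 0.0}
    total = sum(abund)
    richness = len(abund)
    props = [n / total for n in abund]
    shannon = -sum(p * math.log(p) for p in props)      # Shannon H' (natural log)
    simpson = 1.0 - sum(p * p for p in props)            # Gini-Simpson index
    evenness = shannon / math.log(richness) if richness > 1 else 0.0  # Pielou's J
    f1 = sum(1 for n in abund if n == 1)                 # singletons
    f2 = sum(1 for n in abund if n == 2)                 # doubletons
    chao1 = richness + f1 * (f1 - 1) / (2 * (f2 + 1))    # bias-corrected Chao1
    return {"richness": richness, "shannon": shannon, "simpson": simpson,
            "evenness": evenness, "chao1": chao1}
```

Applied to rarefied counts, every sample has the same total, so richness and Chao1 are comparable across samples; a perfectly even sample gives J = 1.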

Factors driving the community composition
We observed a marked compositional difference between indoor and outdoor mycobiomes, as revealed by NMDS ordination of all dust samples (Fig. 3a). House compartment (outside, living room or bathroom) was the key factor structuring the fungal community composition, accounting for 7.66% of the overall variation (variation partitioning analysis; Fig. 4). There was, however, no difference between the two indoor compartments, living room versus bathroom (Fig. 3a). Notably, the assessed factors (Table 1) explained only a relatively low proportion of the variation in fungal community composition, altogether about 15% (Fig. 4). Next to house compartment, climatic variables were also important for the fungal community composition in the dust samples, as seen in the ordination plot: various climatic variables were correlated with the second ordination axis (Fig. 3b, Additional file 1: Table S2). Together, climatic variables accounted for 4.18% of the variation among all dust samples, which increased to 5.54% when the outdoor samples were analyzed separately (Fig. 4). The four most important climatic variables were annual temperature variation (temperature seasonality; BIO4), mean temperature of the warmest (BIO10) and driest (BIO9) quarters, and annual precipitation (BIO12) (Table 1). There was a clear geographic signal in the fungal community composition, especially for the outdoor samples but, to a lesser extent, also for the indoor samples (Additional file 1: Fig. S5a and c, Table S2), which again relates to the regional climate variability in the study area (Fig. 1). Building features and occupant characteristics accounted for only 1.44% and 1.11% of the overall variation in fungal community composition, respectively. Their contributions increased to 2.1% (building features) and 1.94% (occupant characteristics) when only the indoor samples were analyzed (Fig. 4, Table 1).
Interestingly, the more occupants staying in the houses, the more similar the indoor samples were to outdoor samples in fungal community composition (Fig. 3b).
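Variation partitioning splits the explained variation into fractions unique to, and shared among, groups of explanatory variables such as house compartment, climate, building features and occupant characteristics. The sketch below assumes the common approach of comparing adjusted R² values of constrained ordination models fitted to each variable set and to their union, and shows only the two-set case; the function names and example numbers are hypothetical, not results from this study.

```python
def adjusted_r2(r2, n, p):
    """Ezekiel-adjusted R^2 for n observations and p explanatory variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def partition_two(r2_x1, r2_x2, r2_both):
    """Split variation between two explanatory sets from adjusted R^2 values.

    r2_x1, r2_x2: adjusted R^2 of models with set 1 or set 2 alone;
    r2_both: adjusted R^2 of the model with both sets.
    """
    unique_x1 = r2_both - r2_x2           # fraction [a]: only set 1
    unique_x2 = r2_both - r2_x1           # fraction [c]: only set 2
    shared = r2_x1 + r2_x2 - r2_both      # fraction [b]: may be negative
    residual = 1.0 - r2_both              # fraction [d]: unexplained
    return {"unique_x1": unique_x1, "shared": shared,
            "unique_x2": unique_x2, "residual": residual}
```

For instance, with hypothetical adjusted R² values of 0.08 (set 1), 0.05 (set 2) and 0.12 (both), set 1 alone explains 7%, set 2 alone 4%, 1% is shared and 88% remains unexplained; a large residual fraction like this is typical of fungal community data.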

Dominant fungi in house dust
The taxonomic assignment of the most abundant fungi detected in the dust samples is shown in Fig. 5 and Table S1 (Additional file 2: Top-200 most abundant OTUs). Of the nine phyla identified, Ascomycota dominated in both indoor and outdoor samples, accounting on average for 70% of the sequences per sample, while Basidiomycota made up around 25% (Fig. 5a). The third most abundant phylum was Mucoromycota, with higher percentages of sequences in the indoor samples (2.1% in living rooms and 1.5% in bathrooms) than outside (0.3%). Six other fungal phyla were detected in much smaller proportions (< 0.1%) and with more limited distribution; in order of decreasing abundance: Mortierellomycota, Olpidiomycota, Chytridiomycota, Rozellomycota, Entomophthoromycota and Entorrhizomycota.
As seen from the OTU ordination plot in Fig. 3c, there was a broad-scale structuring of the major taxonomic groups. For both indoor and outdoor samples, ascomycetes were in general more associated with areas with higher precipitation, lower mean temperature of the warmest quarter and low temperature seasonality, while basidiomycetes showed the opposite pattern (Fig. 3b and c). Hence, basidiomycetes were to a larger extent associated with more continental climates (warm summers, high annual temperature variation and low precipitation), and ascomycetes with the opposite conditions.
Looking at the taxonomic composition at the order level, there were some marked differences between indoor and outdoor samples (Fig. 5b). Eurotiales, the most common order, including 20.8% of the total sequence count, was far more abundant in the indoor compartments (30.5% in living rooms, and 25.2% in bathrooms) than outside (6.5%).
The same trend occurred for Saccharomycetales, Agaricales, Helotiales, Malasseziales and Mucorales. In contrast, Capnodiales, Pucciniales, Lecanorales and Chaetothyriales were clearly more abundant in the outdoor samples.
As at the order level, there were clear trends for the most common genera (Fig. 5c): Penicillium, Aspergillus, Saccharomyces, Malassezia and Botrytis were far more abundant indoors, while Cladosporium, Thekopsora, Verrucocladosporium, Scoliciosporum and Hypogymnia were more abundant outside the buildings. Cladosporium was the overall most abundant genus, representing 13.3% of all sequences, most of which corresponded to the most abundant OTU (OTU1, with 12.6% of total sequences; Fig. 3d). In addition, the yeast genera Malassezia and Aureobasidium were particularly abundant in bathrooms.

Indoor versus outdoor mycobiomes
A large proportion of the fungi (36.3% of OTUs) were present in all three house compartments, and 50.6% of the OTUs were shared between indoor and outdoor compartments (Fig. 6a, left). However, after excluding low-abundance OTUs (< 10 reads per sample), only 27.4% of the OTUs were shared between outdoor and indoor samples (Fig. 6a, right), indicating that the relatively high overlap was largely driven by rare fungi. In addition, a house-by-house comparison revealed that on average only 15% of the OTUs were shared between outdoor and indoor samples, while 75% of the OTUs occurred in only one of the house compartments (Fig. 6b). Based on a GLM analysis, none of the assessed variables significantly explained the varying degree of overlap in community composition between indoor and outdoor compartments (p > 0.05).
a Top-20 most abundant fungi, also detailed in the ordination plot for OTUs (Fig. 3c and d).
b Relative abundance as proportion of the total number of rarefied reads.
c Percentage of study houses (n = 271) where the indicator species were detected, considering the three compartments.
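The indoor/outdoor overlap figures reported above are set comparisons of the OTUs detected in each group of samples. This is a minimal sketch, assuming an OTU counts as present in a sample only when it reaches a minimum read count (mirroring the exclusion of OTUs with < 10 reads per sample), and assuming the overlap is expressed as shared OTUs divided by all OTUs detected in either group; the paper's exact denominator may differ, and the function names are hypothetical.

```python
def detected(samples, min_reads=1):
    """OTUs reaching `min_reads` reads in at least one of the given samples."""
    return {otu for s in samples for otu, n in s.items() if n >= min_reads}


def overlap_fraction(group_a, group_b, min_reads=1):
    """Shared OTUs as a fraction of all OTUs detected in either group."""
    a = detected(group_a, min_reads)
    b = detected(group_b, min_reads)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_house_overlap(houses, min_reads=1):
    """Average per-house overlap; each house is {'indoor': [...], 'outdoor': [...]}."""
    values = [overlap_fraction(h["indoor"], h["outdoor"], min_reads) for h in houses]
    return sum(values) / len(values)
```

Pooling all samples before comparing (dataset level) lets rare OTUs from many houses accumulate in both groups, whereas the per-house average compares only one outdoor and two indoor samples at a time, which is one reason the per-house overlap can be much lower.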

Discussion
Determinants for the indoor dust mycobiome
This is the first study of indoor mycobiomes at a large geographical scale in Europe. We analyzed dust samples from 271 houses throughout Norway to investigate which factors shape their mycobiomes. Other studies [12,18] have shown that outdoor air is a major source of indoor mycobiomes, but little is otherwise known about the contributing intrinsic and extrinsic factors. We therefore inferred metadata related to outdoor conditions, building features and occupant characteristics to identify the drivers of fungal diversity and composition. The relatively low proportion (15%) of the overall variation in community composition explained by the assessed variables is a common trend in fungal community studies, as fungal communities are to a large degree assembled through stochastic processes related to spore dispersal [32]. Although each accounted for only a small part of the variation, the PERMANOVA revealed that all variables were significantly correlated with community composition. Remarkably, for the indoor samples, all variables showed a slightly higher contribution than for the combined indoor and outdoor dataset. As expected, the contribution of the climatic variables, as well as of those reflecting other outdoor characteristics (geography, topography and land use), was considerably higher for the outdoor samples.
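The PERMANOVA R² values discussed in this section express the share of the total variation in between-sample dissimilarities that is attributable to a factor. Below is a minimal one-factor sketch following Anderson's (2001) sums-of-squares formulation, assuming Bray-Curtis dissimilarity (the section does not name the distance measure) and a simple label-permutation test; the function names are hypothetical, and the sketch omits the sequential or marginal testing of several variables that dedicated software provides.

```python
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two {otu: count} samples."""
    otus = set(x) | set(y)
    diff = sum(abs(x.get(o, 0) - y.get(o, 0)) for o in otus)
    total = sum(x.values()) + sum(y.values())
    return diff / total if total else 0.0


def permanova(samples, groups, permutations=999, seed=0):
    """One-factor PERMANOVA on Bray-Curtis distances.

    Returns pseudo-F, R^2 (= SS_between / SS_total) and a permutation p-value.
    """
    n = len(samples)
    d2 = [[bray_curtis(samples[i], samples[j]) ** 2 for j in range(n)] for i in range(n)]
    ss_total = sum(d2[i][j] for i in range(n) for j in range(i + 1, n)) / n
    k = len(set(groups))

    def ss_within(labels):
        ss = 0.0
        for g in sorted(set(labels)):
            idx = [i for i in range(n) if labels[i] == g]
            pairs = sum(d2[i][j] for a, i in enumerate(idx) for j in idx[a + 1:])
            ss += pairs / len(idx)
        return ss

    def pseudo_f(labels):
        ssw = ss_within(labels)
        if ssw == 0:
            return float("inf")
        return ((ss_total - ssw) / (k - 1)) / (ssw / (n - k))

    f_obs = pseudo_f(groups)
    r2 = 1.0 - ss_within(groups) / ss_total
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        # Small tolerance so float rounding does not hide exact ties with f_obs.
        if pseudo_f(labels) >= f_obs - 1e-9:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return {"F": f_obs, "R2": r2, "p": p_value}
```

An R² of 0.0166, for example, would correspond to a factor explaining 1.66% of the variation. With many samples, even such small R² values can be highly significant, which is consistent with all variables being significant despite their modest contributions.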
The fungal community composition in house dust differed clearly between indoor and outdoor samples. After accounting for the key effect of house compartment (7.66% of the variation), our results corroborated the first hypothesis (H1) that regional-scale climate is the most important driver of the mycobiome (4.18%), while building and occupant factors have a significant influence, but to a much lesser extent (1.44% and 1.11%, respectively). These findings agree with previous mycobiome studies in the built environment [12,15,16,18,19]. Amend et al. [16] first suggested that large-scale (extrinsic) factors, rather than specific building features, drive the fungal composition in buildings. Likewise, Barberán et al. [18] reported that climatic variables (particularly mean annual temperature and precipitation) were the best predictors of indoor mycobiomes across North America, explaining 14% of the variation, in contrast to 5% explained by building features. They also identified continental-scale geographic patterns explained by climate, soil and vegetation variables [19]. In our study, the climatic variables temperature seasonality and mean temperature of the warmest and driest quarters showed better explanatory power than annual precipitation and annual mean temperature. Likewise, a recent meta-study of soil mycobiome data across the world (36 studies covering 3,370 soil samples) identified climate as the key driver of the global distribution of common fungi, as well as of fungal diversity and community composition [33]. They reported the mean temperature of the driest quarter as the most important variable explaining the biogeography of the most frequent fungi.
Although their impact was limited, all building features significantly affected the indoor mycobiome. According to the PERMANOVA, the presence of pests was the most relevant building factor, explaining 1.99% of the variation for indoor samples. The volunteers reported three kinds of pests in particular: mice, rats and long-tailed silverfish. The prevalence of the long-tailed silverfish (Ctenolepisma longicaudata) has increased notably in Europe in recent years, and it is considered a major nuisance pest in modern buildings in Norway [34]. Madden et al. [35] reported that arthropod and microbial (fungal and bacterial) diversities follow parallel trends in settled-dust samples. The other building factors studied (type, material, ventilation, construction year and moisture-related problems) explained smaller fractions of the indoor mycobiome variation (R² values between 0.44% and 1.24%). Previous studies have identified similar building determinants: for example, Dannemiller et al. [22] reported that the use of air conditioning systems influenced the fungal community composition, and that both air conditioning and water leaks increased species richness, while Kettleson et al. [21] found the age and moldiness of the studied houses to be positively correlated with fungal richness.
The most relevant occupant-related variable was the presence of allergies (including pollen, food and skin reactions), explaining 1.61% of the indoor fungal community composition, while the presence of asthma accounted for only 0.26% of the variation. We found a striking abundance of taxa with allergenic effects on humans in the indoor samples. Despite the well-recognized association between dampness/mold problems and negative health effects such as allergies and asthma development, the causal agents and mechanisms remain unclear. A previous study associated asthma severity in children with the fungal community composition in house dust [36]. Further, in our study, the number of occupants and the presence of pets were also significant explanatory variables, with R² values of 1.46% and 1.03%, respectively. Dannemiller et al. [22] also reported an influence of occupancy (people, children and pets) on fungal community composition, with increased richness associated with the presence of pets.

Fungal diversity in Norwegian houses
Fungal richness, evenness and alpha diversity were consistently higher in indoor than in outdoor samples. Previous studies have reported this trend, with approximately 50% higher fungal richness/diversity indoors than outdoors [18,23]. As suggested by Barberán et al. [18], this tendency may be partly due to two interrelated phenomena: (i) the dominance of a few taxa in the outdoor communities, and (ii) the mixing of outdoor and indoor fungi indoors, which raises richness/diversity. Both phenomena were likely relevant in our study. Dominant outdoor taxa from the genera Cladosporium, Thekopsora and Verrucocladosporium were among the top-20 OTUs (1st, 2nd and 8th most abundant OTUs) and occurred in more than 80% of the houses. These taxa were also present indoors, but in relatively lower proportions.
In contrast, studies of specific building units reported the opposite trend, with higher fungal diversity and richness outdoors [12,37,38]. Both fungal and bacterial richness, as well as fungal biomass, were higher in outdoor dust samples from a university housing facility in California [12,37]. A similar trend was reported for fungal diversity and biomass in settled dust from water-damaged units of a housing complex in San Francisco, with the lowest diversity inside units with visible mold [38]. However, that finding was associated with the influence of a few dominant taxa, which were likely growing and spreading from mold colonies indoors. In this regard, Adams et al. [39] demonstrated that local sources of abundantly sporulating fungi might distort the perception of species richness and community composition assessed by HTS approaches, as a few (or even a single) abundant species can mask the real community. In our study, where the DNA yields extracted from small amounts of dust were low, this phenomenon might partly explain the high abundance of some sporulating taxa outdoors (e.g. rust fungi affiliated with Thekopsora) and indoors (e.g. the mold genera Penicillium and Aspergillus), as well as the considerably low evenness found in some outdoor samples. Unfortunately, our dataset does not allow us to verify whether, or to what extent, this distortion occurred.
Regardless of the house compartment, fungal richness (mean = 142.8 OTUs per sample) was about tenfold lower than that reported by Barberán et al. [18]. This is most likely due to methodological differences between the two studies, such as sampling materials, DNA preparation and polymerase chain reaction (PCR) protocols, barcodes (ITS1 vs. ITS2), HTS set-up and bioinformatics pipelines. Another relevant point is that our sampling campaign was carried out in spring (mostly in May), while Barberán et al. [18] included samples collected in different seasons over more than one year. In contrast, Adams et al. [12] reported a range of fungal richness more similar to ours (mean of 78.5 and 179.7 OTUs per indoor and outdoor sample, respectively) for dust settled on sterile Petri dishes over a 1-month period. In addition, several studies have reported a global trend of fungal diversity and richness increasing with latitude [16,33]. Our study also supports this trend, as slightly higher alpha diversity was obtained for houses in northern Norway.
In agreement with previous studies in the built environment, which mainly described air- and dust-borne communities, the mycobiomes in our study houses were clearly dominated by ascomycetes (~70%), with Capnodiales and Eurotiales as the most abundant orders, corroborating our hypothesis H2. These orders are well known for their stress tolerance: Capnodiales (with Cladosporium as the major representative genus in our dataset) is particularly rich in extremotolerant species, including saprobes, plant pathogens, endophytes, epiphytes and rock-inhabiting fungi [31,40], while Eurotiales contains many xerophilic fungi (especially Aspergillus and Penicillium species) that are able to grow on substrates with low water activity (aw ≤ 0.85), such as household dust [3,41].
Interestingly, we observed a distinct difference in the overall distribution of Ascomycota and Basidiomycota: ascomycetes were to a greater extent connected to areas with high annual precipitation and a longer growing season, while basidiomycetes were more prevalent in continental areas with a high degree of seasonality and high snow cover during winter. Beyond reflecting the actual biogeography of the two phyla, we speculate that this pattern may partly be due to temporal differences in plant phenology across the study area. During the sampling campaign in May, plant phenology had likely progressed further in areas with a longer growing season, meaning that plant-associated ascomycetes (including e.g. pathogens, endophytes and saprotrophs) may have become more dominant in these areas. Further, several of the most dominant basidiomycetes, including Fomitopsis sp. and Strobilurus sp., are known to be prevalent in coniferous forests, which are more abundant in continental climates.

Overlap between indoor and outdoor mycobiomes
Based on previous findings [12,18], we initially expected that a major part of the indoor fungi originates from outdoor sources (H3). Barberán et al. [18] reported that 65% of OTUs found indoors were also present outdoors. In our study this overlap was 58% and was, to a considerable extent, driven by low-abundance fungi (39% overlap after excluding OTUs with < 10 reads per sample). However, on a house-by-house basis, only 15% of OTUs were present in both outdoor and indoor environments, and 13% of the OTUs in both indoor compartments (living room and bathroom). The lower overlap between compartments within single houses may be due to the limited representativeness of the collected samples (one per house compartment) and/or the influence of indoor fungal sources near the sampled surfaces. Considering these results, hypothesis H3 was partly refuted, as we cannot conclude that the major fraction of indoor fungi came from outdoors. In agreement with Yamamoto et al. [23], indoor emissions may also act as primary sources for the indoor mycobiome. They reported that 70% of indoor fungal aerosol particles (80% for allergenic taxa), collected from seven classrooms in four countries, were associated with indoor emissions. Diverse indoor fungal sources, including spoiled materials and food, waste, potted plants, drains and skin debris, have been recognized in previous studies [5,13,17]. Presumably, the indoor mycobiome is assembled from a combination of outdoor and indoor sources whose exact contributions are hard to tease apart.

The indoor core mycobiome
The indoor core mycobiome of Norwegian houses, i.e. the fungi significantly associated with their indoor environments, is similar to what has been reported in other countries. We detected two main groups of indoor fungi: (i) the well-known household xerophilic molds affiliated with Eurotiales (17% of indoor indicator OTUs; mostly the genera Penicillium and Aspergillus) and Wallemia (3%), and (ii) yeasts belonging to the orders Saccharomycetales (6%; genera Saccharomyces, Debaryomyces and Candida) and Sporidiobolales (2%; genera Rhodotorula and Sporobolomyces), as well as the genus Malassezia (4%).
Penicillium and Aspergillus species are ubiquitous fungi found in dust and air samples, both indoors and outdoors, during all seasons [4,5,19,42,43]. They are especially abundant indoors, as part of household dust or colonizing building materials and foodstuffs, which become relevant sources for further conidial dispersal [3,30]. Wallemia is an extremely xerophilic basidiomycete, commonly found in dust owing to its ability to grow at low water activity (aw < 0.75) [3,44]. The yeast genera Malassezia, Debaryomyces, Candida and Rhodotorula are commensal fungi associated with human skin and are prevalent in indoor environments [4,13,29,43,45,46]. The fourth most abundant species (OTU3, 5.4% of total reads, present in 96% of houses), with the highest indoor IndVal (91.2%), was identified as Saccharomyces sp., a gastronomically relevant genus that includes S. cerevisiae (baker's and brewer's yeast) and has previously been reported in indoor environments [18,43]. In addition, some common edible mushrooms affiliated with Agaricales, Agaricus bisporus and Pleurotus ostreatus, were identified as indoor indicator species, detected in 39% and 5% of study houses, respectively. The majority of these fungi found indoors have been described as potentially allergenic taxa [27,28]. Lastly, indoor indicator species had a significantly higher occurrence (mean = 21% of study houses) than outdoor indicators (9%), supporting the existence of a consistent indoor core mycobiome.
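The IndVal percentages cited above come from indicator species analysis. The sketch below implements the original indicator value of Dufrêne and Legendre (specificity A × fidelity B × 100) for one OTU across predefined groups, assuming hypothetical `{otu: count}` samples; the permutation test for significance is omitted, and some software reports the square root of A × B instead, so values need not match those in the study.

```python
def indval(samples, groups, otu):
    """Indicator value (%) of `otu` for each group (Dufrene and Legendre 1997).

    samples: list of {otu: count}; groups: group label for each sample.
    """
    by_group = {}
    for sample, g in zip(samples, groups):
        by_group.setdefault(g, []).append(sample.get(otu, 0))
    # Specificity A: mean abundance in a group relative to the sum of group means.
    mean_ab = {g: sum(v) / len(v) for g, v in by_group.items()}
    total = sum(mean_ab.values())
    result = {}
    for g, values in by_group.items():
        a = mean_ab[g] / total if total else 0.0
        # Fidelity B: fraction of the group's samples in which the OTU occurs.
        b = sum(1 for x in values if x > 0) / len(values)
        result[g] = 100.0 * a * b
    return result
```

An OTU scores close to 100% for a group only if it is both concentrated in that group (high A) and found in nearly all of its samples (high B), which is why a widespread, mainly indoor taxon such as the Saccharomyces OTU can reach a high indoor IndVal.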

Outdoor mycobiomes
Mycobiomes detected at the main entrance outside the buildings showed striking differences from the indoor mycobiomes. Cladosporium and Thekopsora were the most abundant and widespread genera in the outdoor samples (18% and 16% mean relative abundance per sample, respectively). The genus Cladosporium includes many common airborne molds that colonize living and dead plant material in nature [40] and is commonly found among airborne and dust-associated fungi inside and outside buildings [12,19,42]. The second most abundant OTU corresponds to Thekopsora areolata, a fungus causing important rust damage on cones of Picea spp., especially Picea abies, and on leaves of Prunus spp. [47]. P. abies is widely distributed in Norway (south, east and mid), where the majority of houses in this study are located.
Many of the taxa fruiting in nature and prevalent in the outdoor samples were also frequent indoors (some even showing higher relative abundances indoors), including members of Agaricales (genera Strobilurus, commonly found on pine cones; Lycoperdon, puffballs; and Cylindrobasidium, corticioid fungi) and Polyporales (Fomitopsis, bracket wood-decay fungi). Adams et al. [12] also reported a high diversity of typically outdoor fungi (mushrooms, wood-decay polypores, puffballs and lichenized fungi) in both outdoor and indoor environments. Although these characteristic outdoor fungi have higher absolute abundances outdoors, their proportions may be higher in indoor samples, giving a biased picture of their distributions. Based on earlier culture-based studies, it is acknowledged that fungal concentrations in outdoor air are generally higher than in indoor air [42]. Adams et al. [12] also reported much greater (20–100-fold) fungal biomass in outdoor than in indoor dust samples, resulting in lower sequencing depth and coverage of the outdoor assemblages. We cannot rule out a similar impact in our study.
Indicator species analysis revealed that outdoor mycobiomes were distinctly enriched in so-called rock-inhabiting fungi, including lichen-forming fungi of the order Lecanorales (16% of outdoor indicator OTUs), as well as fungi affiliated with Chaetothyriales (16%) and Capnodiales (13%). They are well known for their multi-stress tolerance and prevalence in diverse outdoor environments such as rocks and buildings, where they are exposed to stresses like solar radiation, desiccation and rehydration, temperature fluctuations, osmotic stress, pollutants and lack of nutrients [31,48]. It is not surprising that these fungal groups were especially abundant in the outdoor samples, as these were mostly collected from doorframes exposed to external conditions and prone to colonization by subaerial biofilms.

Conclusions
Our main findings are in line with previous indoor mycobiome studies, identifying climatic variables as the key determinants of the indoor mycobiome. Building features and occupant characteristics had a significant but smaller influence. The indoor dust mycobiome represents a mixture of fungi from outdoor and indoor sources, which could also explain the higher fungal richness observed indoors. The indoor core mycobiome is characterized by two ecological groups with allergenic potential: xerophilic molds and skin-associated yeasts. In contrast, rock-inhabiting fungi, well known for their multi-stress tolerance and ability to form biofilms on buildings, were the main outdoor indicator fungi.
Despite methodological limitations related to citizen science sampling (e.g. non-uniform collection, small amounts of dust with consequently low DNA yields, and a low number of samples per house), this approach was a successful strategy for characterizing the indoor mycobiome of a large set of houses throughout Norway within a short period of time. Future large-scale studies on indoor mycobiomes should preferably target other regions, beyond the US and Europe, to reveal whether similar trends hold under different climates.

Citizen science dust sampling campaign
To increase the number of study houses and cover a broad geographical area, citizen scientists were recruited through scientific networks and diverse actions in social/public media: Facebook, outreach articles in the Titan newspaper [49] from the University of Oslo (UiO) and in the Agarica magazine of the Norwegian Mycological Society, as well as a radio interview on the Norwegian public broadcaster. A total of 359 volunteers signed up for this study and provided relevant information about their houses by filling out an online questionnaire (details in the section Environmental data). Sampling kits (Additional file 1: Fig. S1), including instructions, a return envelope, three sterile FLOQSwabs in tubes (Code 552C, Copan Italia spa, Brescia, Italy) and two adhesive tapes (Mycotape2, Mycoteam AS, Oslo, Norway), were sent to the volunteers by post.
For DNA metabarcoding analyses, three dry dust samples were swabbed from different compartments in each house: outside (entrance door), living room (main room) and bathroom. The samples were preferably collected from the upper surface of doorframes; where this was not possible, similar areas on shelves or windowsills were sampled. As stated in previous studies [18,19], these areas, with little contact from the house occupants, act as passive collectors of dust deposited over an unknown period of time. In addition, one adhesive tape was collected from shelves or windowsills in the living room to calculate the percentage of dust coverage, which was later included as an environmental variable in the study. Samples were sent back to UiO by post, where they were registered and the swabs were stored at -80 °C until DNA extraction. The adhesive tapes, in contrast, were immediately scanned using an Epson Perfection V850 Pro scanner (Seiko Epson Corporation, Nagano, Japan), and the percentage of dust coverage was calculated on a surface area of 45 × 18 mm by image analysis using the Olympus Stream v1.9 software (threshold at the maximum value 61100).
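The dust-coverage percentage amounts to classifying pixels of the scanned tape with an intensity threshold and counting the dust fraction. The following is only a sketch of that idea, not the Olympus Stream procedure: it assumes a 16-bit greyscale image held as a 2-D list, assumes by default that dust pixels are darker than the threshold (the direction is a parameter), and uses hypothetical helper names; the 45 × 18 mm analysis area is converted to pixels from the scan resolution.

```python
def mm_to_px(mm, dpi):
    """Convert a length in millimetres to pixels at a given scan resolution."""
    return round(mm / 25.4 * dpi)


def crop(pixels, width_px, height_px, top=0, left=0):
    """Cut a rectangular analysis area out of a 2-D list of pixel values."""
    return [row[left:left + width_px] for row in pixels[top:top + height_px]]


def dust_coverage(pixels, threshold, dust_is_dark=True):
    """Percentage of pixels classified as dust by a single intensity threshold.

    `pixels` holds grey values (e.g. 16-bit, 0-65535); the classification
    direction is an assumption controlled by `dust_is_dark`.
    """
    total = sum(len(row) for row in pixels)
    if total == 0:
        raise ValueError("empty image")
    if dust_is_dark:
        dusty = sum(1 for row in pixels for v in row if v < threshold)
    else:
        dusty = sum(1 for row in pixels for v in row if v > threshold)
    return 100.0 * dusty / total
```

At a hypothetical 600 dpi scan, the 45 × 18 mm area would be about 1063 × 425 pixels (`mm_to_px(45, 600)`, `mm_to_px(18, 600)`).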
This large-scale sampling campaign was mostly conducted in May 2018 (from 27 April to 5 June). Overall, we received 869 swabs from 290 houses. However, 57 samples failed during the DNA laboratory work (extraction, PCR and library preparation). High-throughput sequencing (HTS) was therefore performed on 812 dust samples from 271 houses, including two houses from Svalbard.

Environmental data
Metadata about the study houses and their occupants were provided by the volunteers through the online questionnaire on the UiO website. The questionnaire gave the location of each house (complete address and the corresponding geographic coordinates, i.e. latitude and longitude). The following variables were also extracted from it, with categories given for factor variables: building type (detached house/semi-detached house/block), area (urban/rural), construction year, building material (wood/brick and concrete), ventilation type (natural/mechanical/balanced), number of people, number of children, number of females, pets (no/dog/cat), allergies (no/pollen/food/skin), asthma (yes/no), moisture problem (yes/no), water damage (yes/no), odour problem (yes/no) and pests (no/mice/rats/grey silverfish). The location of the dust samples in the house was included as two factor variables: house compartment (outside/living room/bathroom) and indoor vs. outdoor (indoor/outdoor).
Based on the geographic coordinates of the study houses, data for six relevant WorldClim 2 bioclimatic variables were extracted at 30 arc-second resolution (~1 km²) using the R package dismo, following the authors' instructions [50]. These variables were annual mean temperature (BIO1), temperature seasonality (BIO4), mean temperature of the driest quarter (BIO9), mean temperature of the warmest quarter (BIO10), mean temperature of the coldest quarter (BIO11) and annual precipitation (BIO12). In addition, data for 116 environmental variables related to geology, topography, climate and hydrology were explored. These were the explanatory variables analyzed in a recent study modelling vegetation types in Norway [51], and were kindly provided by its authors. The contribution of the numerical variables (46 from this dataset plus the 6 extracted from WorldClim) was evaluated by principal component analysis (PCA; Additional file 1: Fig. S7). Based on the PCA results, 10 numerical variables were finally selected for the statistical analyses: the six WorldClim bioclimatic variables listed above, growing season length (Norwegian Meteorological Institute, MET), snow-covered area in February (sca-2, MET), snow water equivalent in April (swe-4, MET), and potential incoming solar radiation (Geodata AS). Two additional factor variables were included in the final selection (Fig. 1a): land cover AR50 (developed area/agricultural area/forest/barren land/bog and fen/fresh water; Norwegian Institute of Bioeconomy Research, NIBIO) and bedrock nutrient (poor/average/rich; Geological Survey of Norway, NGU).

Fungal DNA metabarcoding: DNA extraction, amplification and sequencing
DNA was extracted from the swabs using chloroform and the EZNA Soil DNA Kit (Omega Bio-tek, Norcross, GA, USA). Swab tips were transferred to the kit's Disruptor tubes, which contain glass beads and 800 µl SLX-Mlus buffer.
After a first bead-beating cycle (1 min at 4.5 m s⁻¹) in the FastPrep-24 homogenizer (MP Biomedicals, Irvine, CA, USA), the samples were frozen at −20 °C for at least 30 min. The samples were then incubated at 70 °C for 15 min and shaken again in the FastPrep-24 (2 cycles of 30 s at 4.5 m s⁻¹). These successive thermal-shock and bead-beating steps were carried out to homogenize the dust samples properly and to lyse fungal conidia and spores. After 600 µl chloroform was added, samples were vortexed for 30 s and centrifuged at 13,000 rpm for 5 min.
DNA from the aqueous top phase was further purified on the HiBind DNA Mini Column following the kit's instructions. Final DNA extracts were eluted in 30 µl EB buffer and quantified with the fluorometric Qubit dsDNA HS Assay Kit (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA). DNA yields from the swabs were low, ranging from 0.05 to 1 ng µl⁻¹. This was expected given the small amount of dust collected with dry swabs. Nine blank controls (unused sterile swabs) from different extraction batches were carried through the complete DNA metabarcoding protocol.
The internal transcribed spacer 2 (ITS2) region of the nuclear rDNA was amplified using the primers gITS7 5′-GTGARTCATCGARTCTTTG-3′ [52] and ITS4 5′-TCCTCCGCTTATTGATATGC-3′ [53]. This marker and primer pair have shown good species resolution in previous mycobiome studies of diverse environments. Both forward and reverse primers were designed with 96 unique tags (barcodes) of 7-9 bp at the 5′ end, which differed from each other in at least three positions. To avoid tag-switching errors [54], samples were combined in pools of 96, each sample with a unique tag combination (Additional file 1: Table S3). Nine pools (96 samples each) were analyzed in this study. Each pool included an extraction blank, a PCR negative and a mock community used as a positive control. Positive controls contained 1 ng of an equimolar mixture of DNA from three fungal species that are not expected in the Norwegian built environment: Mycena belliarum, Pycnoporellus fulgens and Inonotus dryadeus. They were included to evaluate the efficiency of the DNA metabarcoding workflow and, more specifically, to assess potential tag-switching errors. In total, 17 dust samples were run in duplicate as technical replicates across different PCR libraries.
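The tag-design rule can be checked by computing all pairwise distances within a tag set. The tags vary in length (7-9 bp), so this sketch uses the Levenshtein (edit) distance as a stand-in for "differing in at least three positions". That choice is an assumption, not the authors' stated method. For equal-length tags the edit distance is never larger than the plain mismatch count, which makes it the more conservative check. The function names and example tags are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (cost 0 if bases match)
        prev = curr
    return prev[-1]


def tags_too_similar(tags, min_dist=3):
    """List every pair of tags closer than min_dist; an empty list means the set passes."""
    flagged = []
    for a, b in combinations(tags, 2):
        d = edit_distance(a, b)
        if d < min_dist:
            flagged.append((a, b, d))
    return flagged


# Hypothetical tags for illustration only (not the tags used in the study).
example_tags = ["ACGTACG", "TGCATGCA", "ACGTACC"]
print(tags_too_similar(example_tags))
```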

Bioinformatics pipeline
After an initial quality check of the sequencing output with FastQC [56], reads R1 and R2 were demultiplexed independently with CUTADAPT v 1.8 [57], allowing zero mismatches in tags and primers. Tags and primers were removed in the same step, and sequences shorter than 100 bp were discarded. The demultiplexed R1 and R2 reads were kept separate for the subsequent processing steps with DADA2 v 1.12 [58]. Sequences were then clustered into OTUs with VSEARCH v 2.11.1 [59] at 98% similarity. This clustering level is close to the 98.5% level used to define the species hypotheses (SHs) in the UNITE database [60]. OTUs containing only one read (singletons) were removed after clustering. Remaining sequencing errors can over-split OTUs, so the OTU table was curated with LULU [61] using default settings. An initial match list (sequence similarity) was first created with blastn [62] [options: -qcov_hsp_perc 80 -perc_identity 84] for the subsequent LULU run [options: minimum_match = 84, minimum_relative_cooccurence = 0.95]. Taxonomic assignment of the OTUs was carried out with VSEARCH against the eukaryotic ITS dataset from UNITE v 8.0 [63].
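Exact-match demultiplexing at the start of this pipeline works as follows. A read is assigned to a sample only if it begins with that sample's tag followed by the primer, with no mismatches. In this sketch, IUPAC codes such as R = A/G in gITS7 are treated as wildcards. Tag and primer are then trimmed off, and reads shorter than 100 bp after trimming are dropped. The code is a hypothetical Python stand-in, not CUTADAPT's implementation. For simplicity it examines a single read and one 5′ tag. In the study, each sample was identified by its combination of forward and reverse tags. `demultiplex_read` and the tag-to-sample mapping are illustrative names.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

GITS7 = "GTGARTCATCGARTCTTTG"  # forward primer from the text (R = A or G)


def matches_exactly(seq, pattern):
    """True if seq starts with pattern; IUPAC codes in pattern act as wildcards, no mismatches."""
    if len(seq) < len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, pattern))


def demultiplex_read(read, sample_tags, primer=GITS7, min_len=100):
    """Return (sample, trimmed read), or None if nothing matches or the read is too short.

    sample_tags: hypothetical dict mapping tag sequence -> sample name.
    """
    for tag, sample in sample_tags.items():
        prefix = tag + primer
        if matches_exactly(read, prefix):
            trimmed = read[len(prefix):]           # remove tag and primer
            return (sample, trimmed) if len(trimmed) >= min_len else None
    return None


tags = {"ACGTACG": "sample_A", "TTGCATGC": "sample_B"}  # hypothetical tag -> sample map
read = "ACGTACG" + "GTGAATCATCGAGTCTTTG" + "A" * 120
sample, trimmed = demultiplex_read(read, tags)
print(sample, len(trimmed))  # sample_A 120
```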
In the resulting OTU table, we initially kept and identified all dust biodiversity captured by this DNA metabarcoding approach. This consisted mostly of fungi but also included members of the clade Viridiplantae (green plants). Previous studies have reported that the gITS7/ITS4 primers can also amplify plant DNA [52]. Two filters were then applied to the table, retaining OTUs that contained at least 10 reads and showed at least 70% identity in the taxonomic assignment. Finally, we selected the OTUs assigned to the kingdom Fungi from the quality-filtered table. Some of the 100 most abundant fungal OTUs could not initially be assigned at species level. To refine their taxonomic annotation, these OTUs were re-checked with BLAST+ v 2.8 against both the UNITE and NCBI [64] databases.
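The two quality filters and the final kingdom selection amount to a simple row filter on the OTU table. A minimal sketch, assuming each OTU is a record with hypothetical fields: `reads` (total reads across the dataset, our reading of "at least 10 reads"), `identity` (percent identity of the taxonomic hit) and `kingdom`.

```python
def filter_otus(otus, min_reads=10, min_identity=70.0, kingdom="Fungi"):
    """Keep OTUs with enough reads, a sufficiently close taxonomic hit, and the target kingdom."""
    kept = []
    for otu in otus:
        if otu["reads"] < min_reads:          # filter 1: total read count
            continue
        if otu["identity"] < min_identity:    # filter 2: identity of the taxonomic assignment
            continue
        if otu["kingdom"] != kingdom:         # final step: keep fungi only
            continue
        kept.append(otu)
    return kept


otus = [  # hypothetical records
    {"id": "OTU1", "reads": 5400, "identity": 99.5, "kingdom": "Fungi"},
    {"id": "OTU2", "reads": 8, "identity": 98.0, "kingdom": "Fungi"},            # too few reads
    {"id": "OTU3", "reads": 120, "identity": 65.0, "kingdom": "Fungi"},          # low identity
    {"id": "OTU4", "reads": 300, "identity": 97.0, "kingdom": "Viridiplantae"},  # plant
]
print([o["id"] for o in filter_otus(otus)])  # ['OTU1']
```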
The key steps of this bioinformatics pipeline, together with the resulting numbers of reads and OTUs at each step, are summarized in Table S4 (Additional file 1).

Assessment of control and replicate samples
Before the fungal OTUs were filtered, the quality of controls and replicates was assessed on the matrix containing 8,033 OTUs. Of these, 88.5% were attributed to Fungi and 11.2% to Viridiplantae (green plants, mostly belonging to the phyla Streptophyta and Anthophyta). The remaining 0.2% (19 OTUs) belonged to other kingdoms. The number, identity and abundance of OTUs in the controls (extraction blanks, PCR negatives and positives) were checked and corrected according to their frequency in the study samples. All positive controls (mock community of three fungal species) included in the nine sequencing libraries showed an identical pattern composed of the same four OTUs.
The three major OTUs corresponded to the mock-community members, identified as Mycena belliarum, Pycnoporellus fulgens and Inonotus hispidus, and represented ~99.96% of the reads in positive controls. The additional minor OTU (~0.04% of reads) detected in the positives corresponded to Saccharomyces sp. (OTU3), one of the most abundant and widely distributed OTUs in the whole dataset. Notably, reads from the mock species were detected only in the positive controls, except for a few reads (< 23) in two dust samples. This suggests that the tag-switching rate was negligible in this study.
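The tag-switching check can be expressed as the share of mock-community reads found outside the positive controls. The sketch below shows one simple way to compute it, assuming an OTU table stored as sample → {OTU → reads}. The function name, sample names and counts are hypothetical. The text itself reports only that a few reads (< 23) appeared in two dust samples.

```python
def mock_leakage(table, mock_otus, positive_controls):
    """Share of mock-community reads found outside the positive controls, and where they went."""
    in_controls = 0
    leaked = {}
    for sample, counts in table.items():
        mock_reads = sum(counts.get(otu, 0) for otu in mock_otus)
        if sample in positive_controls:
            in_controls += mock_reads
        elif mock_reads > 0:
            leaked[sample] = mock_reads   # mock reads in a study sample suggest tag switching
    leaked_total = sum(leaked.values())
    total = in_controls + leaked_total
    rate = leaked_total / total if total else 0.0
    return rate, leaked


table = {  # hypothetical counts
    "pos_1": {"mock_A": 4000, "mock_B": 3000, "mock_C": 2999, "OTU3": 1},
    "dust_1": {"OTU3": 500, "mock_A": 1},
    "dust_2": {"OTU7": 800},
}
rate, leaked = mock_leakage(table, {"mock_A", "mock_B", "mock_C"}, {"pos_1"})
print(rate, leaked)  # 0.0001 {'dust_1': 1}
```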
The negative controls (six extraction blanks of unused sterile swabs and three PCR negatives) contained relatively few reads, with an average of 4.1 ± 2.6 OTUs per negative control. After checking the abundance and frequency of these OTUs in the study samples, two of them (< 10 reads in two samples) were deleted. The remaining 22 OTUs were kept because they were widely distributed in the dataset and corresponded to ubiquitous fungi in the built environment.
NMDS confirmed that the 17 technical replicates (duplicates in different PCR pools and sequencing libraries) had similar community profiles (Additional file 1: Fig. S8). This supports the reproducibility of the DNA metabarcoding workflow. For each pair, the replicate with the lower number of reads was discarded.

Statistical analyses
Statistical analyses were conducted in R v 3.5.2 [65] through RStudio v 1.2.1335. The R packages Tidyverse v 1.2.1 [66] and vegan v 2.5-6 [67] were used for data manipulation and plotting, and for ecological analyses, respectively. The most relevant R functions are detailed below, with the package named whenever it is not vegan. The OTU table was first rarefied to 2,000 reads per sample with the function rrarefy (10 resamplings, taking the median value per OTU). It was then prepared as three datasets: all samples (full dataset), indoor samples, and outdoor samples.
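The repeated rarefaction works as follows: each sample is subsampled without replacement to 2,000 reads ten times, and the median count of each OTU across the ten draws is kept. This Python version illustrates the idea and does not reproduce vegan's rrarefy internals. Raising an error for samples below the rarefaction depth and dropping OTUs whose median is zero are assumptions of the sketch. The median counts of a sample need not sum to exactly 2,000.

```python
import random
from collections import Counter
from statistics import median


def rarefy_once(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return Counter(rng.sample(pool, depth))


def rarefy_median(counts, depth=2000, repeats=10, seed=1):
    """Repeat rarefaction and keep each OTU's median count across the repeats."""
    rng = random.Random(seed)
    draws = [rarefy_once(counts, depth, rng) for _ in range(repeats)]
    result = {}
    for otu in counts:
        m = median([d.get(otu, 0) for d in draws])  # OTU absent from a draw counts as 0
        if m > 0:
            result[otu] = m
    return result


sample_counts = {"OTU1": 3000, "OTU2": 1000, "OTU3": 1}  # hypothetical counts
print(rarefy_median(sample_counts))
```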
Fungal abundance was estimated in two ways: (i) the relative abundance of each OTU, as the percentage of rarefied reads out of the total count in the dataset, and (ii) the relative abundance of a given taxon (at different levels, e.g. phylum, order and genus), as the percentage of rarefied reads per sample. The prevalence of each OTU was also calculated, as the percentage of samples or houses in which it was detected, based on the rarefied tables.
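The two abundance measures and the prevalence definition can be written out directly. A minimal sketch over a rarefied OTU table stored as sample → {OTU → reads}. It assumes a hypothetical `taxonomy` dict mapping each OTU to its name at the chosen rank (e.g. order). Prevalence per house would follow the same pattern after pooling each house's samples.

```python
def otu_relative_abundance(table):
    """(i) Each OTU's reads as a percentage of all rarefied reads in the dataset."""
    totals = {}
    for counts in table.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    grand = sum(totals.values())
    return {otu: 100.0 * n / grand for otu, n in totals.items()}


def taxon_abundance_per_sample(table, taxonomy, taxon):
    """(ii) Percentage of each sample's reads assigned to `taxon`."""
    out = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        hits = sum(n for otu, n in counts.items() if taxonomy.get(otu) == taxon)
        out[sample] = 100.0 * hits / total if total else 0.0
    return out


def prevalence(table):
    """Percentage of samples in which each OTU has at least one read."""
    present = {}
    for counts in table.values():
        for otu, n in counts.items():
            if n > 0:
                present[otu] = present.get(otu, 0) + 1
    return {otu: 100.0 * k / len(table) for otu, k in present.items()}


table = {"s1": {"OTU1": 1500, "OTU2": 500}, "s2": {"OTU1": 1000, "OTU3": 1000}}  # hypothetical
taxonomy = {"OTU1": "Capnodiales", "OTU2": "Eurotiales", "OTU3": "Eurotiales"}
print(otu_relative_abundance(table))  # {'OTU1': 62.5, 'OTU2': 12.5, 'OTU3': 25.0}
print(prevalence(table))              # {'OTU1': 100.0, 'OTU2': 50.0, 'OTU3': 50.0}
```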
Alpha diversity was assessed on the rarefied OTU table by calculating species richness (number of observed OTUs), evenness (equitability among OTUs), and the Shannon, Simpson and Chao1 indices. Differences in these parameters between groups of samples were tested with analysis of variance (ANOVA). Beta diversity was assessed by NMDS ordination of both dust samples and OTUs with metaMDS, using the Bray-Curtis dissimilarity index and 200 random starts to search for a stable solution. Three transformations were tested: Hellinger and log with the function decostand, and Cumulative Sum Scaling (CSS) with cumNorm from the metagenomeSeq package [68]. NMDS analyses were then done on the Hellinger-transformed rarefied OTU tables. To visualize their association with the dust mycobiomes, continuous environmental variables and alpha-diversity indices were regressed against the NMDS ordination. They were added as vectors on the ordination plots with the function gg_envfit of the package ggordiplots v 0.3.0 [69]. In addition, the function betadisper was used to test the homogeneity of multivariate dispersion (variance) among groups of samples.
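For reference, the alpha-diversity summaries can be computed from one sample's counts as shown below. The sketch rests on several assumptions. Shannon uses the natural logarithm, and Simpson is reported as 1 − Σp² (the Gini-Simpson form). The text does not name the evenness measure, so evenness is taken as Pielou's J = H/ln S. Chao1 uses the common bias-corrected form S_obs + F1(F1 − 1)/(2(F2 + 1)). vegan's implementations may differ in small details.

```python
import math


def alpha_diversity(counts):
    """Alpha-diversity summaries for one sample's OTU counts (zero counts are ignored)."""
    abund = [n for n in counts if n > 0]
    total = sum(abund)
    richness = len(abund)                                    # observed OTUs
    props = [n / total for n in abund]
    shannon = -sum(p * math.log(p) for p in props)           # natural log
    simpson = 1.0 - sum(p * p for p in props)                # Gini-Simpson, 1 - sum(p^2)
    evenness = shannon / math.log(richness) if richness > 1 else float("nan")  # Pielou's J
    f1 = sum(1 for n in abund if n == 1)                     # singletons
    f2 = sum(1 for n in abund if n == 2)                     # doubletons
    chao1 = richness + f1 * (f1 - 1) / (2 * (f2 + 1))        # bias-corrected Chao1
    return {"richness": richness, "shannon": shannon, "simpson": simpson,
            "evenness": evenness, "chao1": chao1}


print(alpha_diversity([5, 1, 1, 2])["chao1"])  # 4.5
```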
To evaluate the correlation between each environmental variable and the observed variation in fungal community composition, a permutational multivariate analysis of variance (PERMANOVA; 999 permutations) was performed with the function adonis2. The effects of four groups of variables (building, occupants, climate and house compartment) were assessed by variation partitioning analysis (VPA) based on Bray-Curtis dissimilarities, using the functions varpart and vegdist.
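The beta-diversity workflow combines a Hellinger transformation, Bray-Curtis dissimilarities and a permutation test. The sketch below shows the single-factor case of PERMANOVA, where pseudo-F is computed from the within- and among-group sums of squared dissimilarities, as in Anderson's formulation. Group labels are permuted 999 times. This is a plain-Python illustration, not vegan's adonis2, which also handles multiple terms. Applying the Hellinger transformation before PERMANOVA is an assumption, since the text states it only for the NMDS. Function names and toy data are hypothetical.

```python
import math
import random


def hellinger(row):
    """Hellinger transformation: square root of each OTU's relative abundance in the sample."""
    total = sum(row)
    return [math.sqrt(x / total) for x in row]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors of equal length."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a square dissimilarity matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared dissimilarities over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(rows, groups, permutations=999, seed=42):
    """Return (pseudo-F, p-value) for Bray-Curtis on Hellinger-transformed count rows."""
    data = [hellinger(r) for r in rows]
    n = len(data)
    dist = [[bray_curtis(data[i], data[j]) for j in range(n)] for i in range(n)]
    f_obs = pseudo_f(dist, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # break any link between samples and groups
        if pseudo_f(dist, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)  # the observed F counts as one permutation


# Hypothetical toy counts: three "indoor" and three "outdoor" samples over four OTUs.
rows = [[50, 30, 20, 0], [45, 35, 20, 0], [55, 25, 20, 0],
        [0, 10, 30, 60], [5, 5, 30, 60], [0, 15, 25, 60]]
groups = ["indoor"] * 3 + ["outdoor"] * 3
print(permanova(rows, groups))
```

With Euclidean distances on one-dimensional data, the pseudo-F above reduces to the classical one-way ANOVA F, which is a convenient sanity check.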
To evaluate the overlap between outdoor and indoor mycobiomes, we compared the OTUs detected in the three house compartments using two estimates: percentages of OTUs out of the total counts in the dataset, and percentages of OTUs per house. Indicator species analysis was performed to reveal significant associations (p < 0.05) between OTUs and relevant environmental variables. These variables were related to house compartments, building features (e.g. construction materials, water damage and moisture problems) and occupants (e.g. presence of allergies and pets). These analyses used the multipatt function of the indicspecies package [70] in R. Pearson's correlation coefficients and their corresponding p values were calculated to explore associations between OTUs and the continuous variables related to climate and number of occupants.
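The house-by-house comparison partitions each house's OTUs into the seven regions of a three-set Venn diagram. Each region is expressed as a percentage of all OTUs found in that house, and the percentages are summarized as mean and standard deviation across houses. The region names and input format (one set of OTU identifiers per compartment) are hypothetical. Each house is assumed to have at least one OTU, and at least two houses are needed for the standard deviation.

```python
from statistics import mean, stdev

REGIONS = ["outside_only", "living_only", "bathroom_only", "outside_living",
           "outside_bathroom", "living_bathroom", "all_three"]


def venn_percentages(outside, living, bathroom):
    """Percentage of one house's OTUs falling in each region of a three-set Venn diagram."""
    o, l, b = set(outside), set(living), set(bathroom)
    union = o | l | b
    regions = {
        "outside_only": o - l - b,
        "living_only": l - o - b,
        "bathroom_only": b - o - l,
        "outside_living": (o & l) - b,
        "outside_bathroom": (o & b) - l,
        "living_bathroom": (l & b) - o,
        "all_three": o & l & b,
    }
    return {k: 100.0 * len(v) / len(union) for k, v in regions.items()}


def house_by_house(houses):
    """Mean and standard deviation of each Venn region's percentage across houses."""
    per_house = [venn_percentages(*h) for h in houses]
    summary = {}
    for r in REGIONS:
        values = [p[r] for p in per_house]
        summary[r] = (mean(values), stdev(values))
    return summary


houses = [({"OTU1", "OTU2", "OTU3"}, {"OTU2", "OTU3", "OTU4"}, {"OTU3", "OTU4", "OTU5"}),
          ({"OTU1"}, {"OTU1", "OTU6"}, {"OTU1"})]  # hypothetical houses
print(house_by_house(houses))
```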
Finally, generalized linear models (GLMs) were fitted with the glm function to identify the variables that best predict (i) species richness per sample and (ii) the percentage of OTUs shared between indoor and outdoor samples. Forward selection based on AIC was used to assess model improvement relative to the null model.

…, including raw sequences, mapping files, the complete metadata file, the final rarefied fungal OTU table and the taxonomic assignment of its OTUs. The scripts used for the analyses will be made available upon e-mail request to the authors.

Competing interests
The authors declare that they have no competing interests.

Funding
This study has received funding from the European Union's Horizon 2020 research and innovation programme through a Marie Skłodowska-Curie Individual Fellowship to PMMS, under the grant agreement MycoIndoor No 741332.

Authors' contributions
PMMS, ELFE, IS and HK conceived and designed the study. ELFE organized the citizen science sampling with contributions from IBE. PMMS performed the laboratory work. PMMS and LNM analyzed the data (bioinformatics and statistics) and prepared the figures. SM provided technical advice on laboratory work and contributed to the statistical analyses. PMMS wrote the first draft of the manuscript. All authors contributed to data interpretation, and edited and approved the final manuscript.

Figure 1
Overview of the citizen science dust sampling campaign in Norway. (a) Schematic overview of the metadata for each house: outdoor metadata, mainly climatic variables (green); building features (violet); and occupant characteristics (blue). The sampling points (house compartments) are indicated with red dots. The building variable "Dust coverage" is the percentage of the study surface in the living room covered by dust, as measured on the adhesive tape (Mycotape2). (b) Maps showing the location of the 269 houses in mainland Norway, colored according to their temperature seasonality (left; standard deviation of mean monthly temperatures = BIO4/100) and annual precipitation (right; BIO12).

Figure 2
Box plots visualizing diversity patterns in the three studied house compartments. A total of 269 houses were assessed, including dust samples from the outside (n = 266), living room (n = 270) and bathroom (n = 271). (a) Alpha diversity (richness) and Shannon index; (b) beta diversity. All statistics were calculated from the rarefied matrix (6,632 OTUs).

Venn diagram summarizing the variation partitioning analysis (VPA). The four groups of variables are indicated in colors ("Building", "Occupants", "Climate" and "House compartment"; see Table 1 for selection). The percentage of variation explained by each group alone is shown in bold for the complete dataset. VPA values in square brackets were obtained when the partial datasets were analyzed separately (OUT, outdoor dataset; IN, indoor dataset). The unexplained (residual) variation was 85%, and variables explaining < 0.01% are not shown in the Venn diagram.

Venn diagrams showing the distribution of dust mycobiomes across the three house compartments. They show the proportions of OTUs across the overall data (a, left), after removing low-abundance OTUs (< 10 reads per sample) (a, right), and on a house-by-house basis (b). Mean percentages of OTUs are shown together with standard deviations.