Diverse recruitment to a globally structured atmospheric microbiome


Atmospheric transport is critical to dispersal of microorganisms between habitats and this underpins resilience in terrestrial and marine ecosystems globally 1,2. Conventional dogma that this is a neutral process involving ubiquitous distribution in air has been challenged by recent advances 3–5. However, the lack of standardized methods and analytical frameworks has impeded synthesis and global perspective. A key unresolved question is whether microorganisms assemble to form a taxonomically distinct, geographically variable and functionally adapted atmospheric microbiome. Here we characterized global-scale patterns of microbial taxonomic and functional diversity in air within and above the atmospheric boundary layer and in underlying soils. Bacterial and fungal assemblages in air were taxonomically structured and deviated significantly from purely stochastic assembly processes. Fungi dominated above tropical, temperate and continental biomes whilst bacteria did so above oceans and drylands. At high altitudes bacterial diversity declined but fungal diversity was greatest. Source-tracking indicated a complex recruitment process involving local soils plus globally distributed inputs from drylands and the phyllosphere. Assemblages displayed stress-response and metabolic traits relevant to survival in air, and taxonomic and functional diversity were correlated with macroclimate and atmospheric variables. Our findings highlight a structured global atmospheric microbiome that is central to understanding regional and global ecosystem connectivity.


Main Text

Microorganisms occupy central roles in terrestrial and marine ecosystems globally 2. Movement of viable cells and propagules between habitats occurs largely through the troposphere, which is the atmospheric layer closest to Earth 6.
This is critical to recruitment and turnover that drive ecological resilience of these systems 1,6,7, as well as influencing dispersal of pathogens and invasive taxa 8. There is also a growing awareness that microorganisms suspended in the atmosphere are potentially capable of in situ metabolic and biophysical activity that can influence climatic processes 9. However, despite the central importance of the atmosphere to these ecological outcomes, assessments of microbial diversity in air at broad geographic scales remain limited 10,11. As a result, there is little understanding of how variable the overall microbial composition of the atmosphere may be on a global scale, the extent to which it may be decoupled from underlying local surface communities, or the importance of environmental or biotic factors in shaping diversity. The unique role of the atmosphere as a transport medium for microorganisms has also obscured the question of whether it supports a functionally adapted microbiome with the potential for metabolic transformations and cell proliferation 6.

Previous research suggests that the conventional assumption of random and ubiquitous 12 microbial transport in the atmosphere may no longer be valid 13,14. Diversity estimates for air within the boundary layer at near-ground level at various locales have indicated varied bacterial and fungal communities that were correlated with local abiotic variables such as temperature and humidity 15 or land use 11,16. Several studies have related variation in diversity to the differing histories of sampled air masses, which suggests a combined influence of the different sources and conditions to which microorganisms are exposed during transit 17-20.
Estimates made indirectly from ground-deposited desert dust 21 or precipitation 5 have yielded valuable insight on long-range dispersal across inter-continental scales, although they reflect deposition and differ somewhat from direct estimates from air 19. Estimating diversity in the free troposphere at higher altitudes above the boundary layer is challenging and the scarce data indicate a more restricted microbial occurrence 22. There is currently no consensus for diversity or abundance estimates due to the different experimental approaches, lack of ecologically relevant scaling and taxonomic resolution, and the confounding effect of contamination during diversity estimation for an ultra-low biomass system such as air 23,24. Adaptive traits have generally been inferred from taxonomy, although laboratory estimates of metabolic activity by atmospheric bacterial isolates 25, and recovery of RNA from air and cloud water 26,27, indicate that atmospheric microorganisms are potentially active in situ.

Here we report taxonomic and functional diversity in a large globally sourced original dataset (n = 596) for air within the atmospheric boundary layer that delineates the majority of physical interactions with the Earth's surface 28 (near-ground air), as well as aircraft sampling of free tropospheric air at higher altitudes above the boundary layer (high-altitude air) (Fig. 1a,c; Supplementary Information, Field sampling). We combined this with concurrent sampling of underlying surface soils and sediments to allow direct air-surface connectivity comparisons. The approach employed a combination of high-throughput techniques to estimate and triangulate taxonomic diversity: shotgun metagenomics for inter-domain phylum-level comparisons and targeted amplicon sequencing for finer scale taxonomic resolution of abundant groups.
This was combined with an unprecedented effort to mitigate against the occurrence of putative contaminant taxa that have plagued low-biomass microbiological studies 23,24 (Supplementary Information, Decontamination of environmental sequence data). A statistical classification approach was used to identify potential recruitment sources of airborne microorganisms. This was complemented by a targeted functional analysis of our metagenomes (n = 120) focused on metabolic strategies and stress responses relevant to environmental conditions in the atmosphere. We combined these biotic data with a novel geospatial modelling approach to enable correlation with environmental variables to which microorganisms were exposed during transit in the atmosphere, as well as local climatic variables. We report compelling evidence for a taxonomically distinct, altitudinally and geographically variable, and functionally relevant atmospheric microbiome that is influenced by a complex suite of biotic and abiotic drivers. We demonstrate that diverse local and global recruitment sources are important and that dryland soils are a particularly strong influence on global diversity patterns in air.

An overview of inter-domain diversity from our metagenomic libraries indicated that bacteria and fungi were the most abundant microorganisms in air with very low and patchy occurrence of archaea and protists. We therefore focused further abundance and diversity estimation effort on these two groups. On a global scale consistent patterns were observed where total biomass and estimated abundance for bacteria and fungi were substantially lower in near-ground air within the atmospheric boundary layer than in underlying soil (Fig. 1b, Fig. 1). High-altitude air above the boundary layer indicated a further steep decline from values observed for near-ground level air.
Fungi were more abundant than bacteria in air at all locations except those above oceans and drylands (Supplementary Information, Inventory of taxa). We view the abundance of microorganisms in air within the atmospheric boundary layer as analogous to that which occurs in microbially active zones of other biomes, such as the photic zone of the oceans or the topsoil of terrestrial habitats. Our data allowed a globally triangulated estimate of 1.3 × 10^22 bacterial cells and 6.1 × 10^23 fungal cells in this region of the global atmosphere.

Conserved ecological patterns were evident on a global scale for bacterial and fungal diversity. Our diversity estimations using two commonly employed approaches, amplicon sequencing and shotgun metagenomics, were broadly congruent (Extended Data Fig. 2) and so we focused our fine scale phylogenetic interrogation on amplicon sequence data because this approach allowed better ecological representation of the targeted assemblages. Bacteria were least diverse in high-altitude air, followed by near-ground air, and soils were relatively taxa-rich although variable as expected for the extensive global environmental gradient encompassed by our study (Fig. 1e; Extended Data Fig. 1) 29-31. Conversely, the fungi were most diverse in high-altitude air, followed by near-ground air, and soils were least diverse (Fig. 1i; Extended Data Fig. 1). Clear biogeographic patterns occurred where bacterial diversity was reduced over low productivity habitats and fungal diversity was elevated above locations with a well-developed phyllosphere, thus broadly reflecting patterns for soil communities (Extended Data Fig. 1) 29,32.
We observed distinct patterns common to all bacterial and fungal assemblages in air globally and these were robustly supported after extensive effort to mitigate contaminant signal from the low biomass air and soil samples (Supplementary Information, Decontamination of environmental sequence data): a robust separation of assemblages by habitat type (high-altitude air, near-ground air, soil) and clear biogeographic separation by location were observed (Fig. 1f,j; Extended Data Fig. 3). This delineation was supported despite co-occurrence analysis indicating assemblages in air were more variable than those in soil (Extended Data Fig. 4a). A pronounced distance decay relationship for diversity was observed for near-ground and high-altitude air assemblages that was comparable to those for soil (Extended Data Fig. 4b). However, there was no evidence for a latitudinal gradient in richness and this mirrored observations for global soil diversity 29, and also reflected the inclusion of deserts, mountains, high latitude and ocean locations in our study. We then applied null models to estimate nestedness, which is a measure of the extent to which observed patterns of co-occurrence are non-random (Extended Data Fig. 5). Bacterial and fungal assemblages in both near-ground and high-altitude air were significantly less nested than null models and therefore taxonomically structured and non-randomly assembled, although slightly less so than for soils 31 (Fig. 1g,k). The non-random patterns were fundamentally an outcome of taxa specificity to habitat and location, and we interpreted these as indicative of strong filtering for taxa. At higher altitudes above the atmospheric boundary layer where abiotic stressors are more pronounced, communities were even more structured and this reflected more extreme environmental filtering.
The pattern persisted between hemispheres sampled at peak and low growing season and across major climatic boundaries and land use types. Our data indicate that environmental filtering in both near-ground and high-altitude air results in structured and biogeographically predictable patterns for bacteria and fungi.

Taxonomic composition of assemblages was congruent with the ecological patterns observed (Fig. 2a,d; Extended Data Fig. 6). At broad taxonomic ranks (phylum-class) a remarkably consistent diversity occurred globally regardless of underlying biome or growing season. Our amplicon sequence variant (ASV) approach to diversity analysis revealed that at finer taxonomic scale (genus-ASV) and after extensive decontamination effort, 13% of bacterial and 10% of fungal genera co-occurred among ≥50% of globally distributed air samples (Supplementary Information, Inventory of taxa). The only genus with ubiquitous representation in all air samples was Sphingomonas, a diverse group linked with emissions from the phyllosphere 16,33. Despite conserved patterns of diversity at lower taxonomic ranks down to genus level, we found no evidence for a "core" group of specific ASV-defined taxa that represent an atmospheric microbiome. Nonetheless there was compelling evidence from taxonomic data for environmental filtering of assemblages in air. Bacteria enriched in near-ground air compared to soil were largely accounted for by classes supporting taxa with known tolerance to environmental stress (Actinobacteria, Alphaproteobacteria, Firmicutes and Gammaproteobacteria), although it cannot be ruled out that this also indicates taxa that possess adaptive traits that favour aerosolization 34. At higher altitudes where environmental stress is exacerbated, spore-forming Actinobacteria and Firmicutes were more abundant (Extended Data Fig. 6), suggesting selection towards survival as passive resting structures.
Elevated abundance of gammaproteobacterial taxa at a single location (South Africa) was consistent with emissions of this group due to land-use as a farm 35. For the fungi, elevated diversity in air relative to soils was largely due to macrofungi (Agaricomycetes) and prolific spore-formers (Dothideomycetes). In the absence of observed mycelia in air samples we concluded that spores accounted for much of the fungal signature in air (Extended Data Fig. 6). The Agaricomycetes were notably more abundant in tropical air than all other locations globally and this likely reflected global patterns for terrestrial fungi 36. In near-ground air the abundance of common fungal agricultural pathogens (Ustilaginomycetes) was elevated in temperate Northern Hemisphere locations sampled during peak growing season, as opposed to reduced abundance in Southern Hemisphere samples collected at the end of the growing season. This we interpreted as a signature of seasonality in land use on a global scale. Previous studies at individual near-ground locales have concluded that inter-seasonal variation may variously be absent 37, weak 17, pronounced for some taxa 18 or stochastic 19. Elevated fungal diversity in ultra-low biomass high-altitude air was indicative of persistent fungal propagules that are tolerant to extreme UV and thermal stress. Residence time for cells in air may be extended at high altitudes and so this necessitates effective tolerance to these stressors during potentially long-distance dispersal 38. Overall, our combined ecological and taxonomic data provided strong evidence that, contrary to the long-held dogma in microbial ecology that microbial transport in air is ubiquitous and neutral to dispersal outcomes 4,15,38, atmospheric microbiomes exhibit a pronounced biogeography.

In order to further interrogate possible explanations for the observed patterns, we conducted source tracking analysis to assess the likely origin of bacteria and fungi encountered in the air. First, a co-occurrence analysis revealed that near-ground air displayed greatest taxonomic connectivity with local soil at any given location and less connectivity with soil from different locations (Fig. 2b,e). Assemblages in high-altitude air displayed markedly fewer shared taxa with underlying near-ground air or soil and were essentially decoupled from local underlying surface assemblages (Fig. 2b,e). Aerosolization of microorganisms not only occurs from soil but also from different terrestrial and aquatic surfaces, e.g. ocean surface waters 3,20, the phyllosphere 33,39 and stochastic desert dust events 40,41. We therefore employed a statistical classification algorithm to estimate recruitment to air microbiomes from the surface habitats of different climatic regions (Fig. 2c,f; Extended Data Fig. 7). A large proportion of taxa had unexplained sources and this is likely due in part to stochastic aerosolisation events for microorganisms and also to the fact that it is impossible to exhaustively sample potential source microbiomes. For most locations local soil was the major explained source of bacteria and fungi in air, and bacteria were sourced in a more cosmopolitan manner than fungi (Extended Data Fig. 7). Many sampled air masses had significant transit over oceans and yet marine sources were a relatively minor contributor to observed diversity in air above terrestrial locations. This reflects both the lower occurrence of microorganisms above the oceans than over land 3 and the limited oceanic sources available for comparison. Clear patterns for terrestrial sources were apparent.
Dryland soils (dry deserts, polar/alpine and dry continental locations) were pronounced sources for bacteria globally and this may reflect the more readily aerosolised non-cohesive soils typical of these biomes 41. This expands the influence of deserts to a global scale beyond the well-defined intercontinental desert dust transit routes for microbial dispersal 40. For the fungi, polar and alpine soils were major sources and this is congruent with the proposal that permanently cold surface substrates in these environments act as long-term reservoirs for inactive fungal propagules 4. The phyllosphere was a pervasive yet smaller contributor to bacterial diversity globally, and its minor contribution to fungal sources likely reflects the lack of comparative data. This may emerge as a more significant source as the inventory of phyllosphere microbiomes increases. For high-altitude air, major sources were dry deserts and polar/alpine sources and this likely reflects in part the adaptive advantages that taxa from these habitats have in air, e.g. UV repair and desiccation tolerance 41. The ability to become aerosolized may vary between taxa in marine 34 and terrestrial 42 systems and so deterministic biotic drivers may also be relevant to recruitment from sources, as well as selective deposition during transit 43. Overall, the source tracking demonstrated that atmospheric diversity is driven by a complex recruitment process involving local soils plus globally distributed inputs from drylands and the phyllosphere.

To generate further insight into possible biotic drivers of the observed patterns in diversity we conducted a functional metagenomic analysis of selected metabolic and stress-response genes relevant to the atmospheric habitat (Fig. 3a,b; Supplementary Information, Metagenomic functional analysis).
We targeted bacteria because they likely comprise any active fraction of the atmospheric microbiome 26. Differentially more abundant genes in air versus soil were inversely correlated with biomass and taxonomic richness, they affiliated with taxa observed as enriched in air, and all values were averaged by location. We therefore interpret the elevated abundance of genes in air as reflective of assemblage composition rather than an artefact of sampling effort. Distribution of marker genes in air broadly reflected that for underlying soil at any given location and this supported our identification of soil as a major source for atmospheric bacteria. Traits were widely distributed globally and those for stress tolerance were notably more abundant in bacterial assemblages in air above dry and polar/alpine regions (Fig. 3b), thus further supporting our hypothesis that microorganisms from these surface environments are adapted to survival in air. Both near-ground and high-altitude air communities possessed marker genes for cold shock, oxidative stress, sporulation, starvation, and UV-repair and these were elevated in several air assemblages compared with underlying soil (Fig. 3a). High abundance for stress-response genes occurred in air above oceans, providing further evidence that the low biomass and richness above marine surfaces reflected strongly filtered microbial diversity. This may arise during long-distance transport from largely terrestrial sources, as well as during recruitment of bacteria from the sea surface micro-layer 44. Metabolic marker genes for respiration were widespread, and notably for the ccoN proteobacterial cytochrome oxidase that correlated with elevated proteobacteria in air versus soil.
Markers for the metabolism and fixation of a variety of gaseous atmospheric substrates including carbon dioxide, hydrogen, methane, nitrogen and isoprene, as well as phototrophy, were also prevalent in air (Fig. 3a). Elevated occurrence of the coxL gene associated with carbon monoxide metabolism was indicative of the potential for interaction with anthropogenic emissions 45. This limited functional interrogation provided a much-needed glimpse into the potential for an active and stress-adapted atmospheric microbiome. Our data indicate that there is capacity for greater metabolic plasticity than the existing inventory from meta-omics 27,42 and transformation of substrates by atmospheric isolates under laboratory conditions currently suggest 46,47.

We examined possible interactions between the taxonomic and functional diversity of assemblages and abiotic variables relevant to survival in air and soil (Fig. 4). These included both location-specific macroclimate and environmental variables encountered by microorganisms during transit, acquired using a novel geospatial analysis (Extended Data Fig. 8). Significant correlations were revealed between both local macroclimate and transit abiotic variables and community metrics of taxonomic and functional diversity in air (Fig. 4a). Relatively strong negative correlations for bacterial and fungal diversity, oxidative stress genes, and UV-repair genes in air with solar radiation and altitude (covariables of UV exposure) provided further evidence for UV exposure as a strong selective force on global bacterial and fungal diversity. Functional genes were most strongly correlated with mean annual precipitation, and this likely reflects niche differentiation of source communities in underlying soil at different climatic locations since we have shown they are coupled to diversity in local air.
Transit variables were also influential on functional diversity and this was consistent with our other lines of evidence for environmental filtering. The strong correlation between occurrence of phototrophy genes and all abiotic variables suggested photoautotrophic bacteria may be subject to greater selective pressure than other groups. For soil communities the correlations with macroclimate variables were relatively congruent with those observed for other global studies of soil microbial diversity (Fig. 4b) 30, and this [...] proliferation of an active microbiome under certain conditions. Given the physicochemical and dynamic complexity of the atmosphere and the broad range of correlations we observed between taxonomic and functional diversity and abiotic factors, a potentially chaotic system of interplay may emerge that influences atmospheric microbial ecology in a manner similar to that envisaged for highly dispersed marine larvae 49. Taken together we anticipate these findings will be valuable in future hypothesis-driven research to identify interactions mediated by the atmospheric microbiome between different surface habitats across multiple ecological scales, and in particular to testing models of recruitment, turnover, functionality and resilience. Given that the atmosphere is also a sink for a large fraction of anthropogenic emissions 45, it is timely that an accurate global inventory of microbial diversity is provided in order to present a baseline for measuring future responses to change. Finally, the study complements efforts to inventory global soil 29-31 and oceanic microbiomes 50 and expands the scope of the pan-global microbiome.

The authors declare no competing interests. Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to S.B.P., e-mail: stephen.pointing@yale-nus.edu.sg

[...] the two previously interrogated locations were re-sequenced for this study 2,3. Bulk phase boundary layer air was sampled at 1.5 m above the surface (near-ground air, n = 501) using tripod-mounted air samplers and also above the boundary layer for surface interactions at 2,000 m above local surface level using aircraft-mounted air samplers (high-altitude air, n = 11) 4. Concurrent sampling of underlying soil was conducted within a 25 m radius of air sampling devices (soil, n = 84). Ship-board sampling was conducted at 25 m above the ocean surface to avoid sea-spray contamination. Logistical challenges limited high-altitude air sampling to six locations although these were nonetheless able to capture a broad geographic and climatic range for both hemispheres.

The use of high-throughput DNA sequencing for samples from low biomass habitats such as air raises the issue of confounding signal due to contaminants that are otherwise indistinguishable in higher biomass samples. We employed an experimental design for sample recovery and quality filtering of sequence data that embraced recommended best practice for minimising contaminant signal 5 (Supplementary Information, Decontamination of environmental sequence data). Recovery of bulk phase air was achieved using three Coriolis µ high-volume impingement devices (Bertin Instruments, France) operated concurrently. This device has been shown to perform well against other samplers 6. All equipment was transported between locations in sterile containers and bags. Each device was disassembled and contact surfaces soaked for one hour with 1.5% v/v sodium hypochlorite (NaClO) followed by three washes of Milli-Q H2O prior to and after each sampling in order to minimise contamination from cells or nucleic acids.
All apparatus and work surfaces used during sampling and sample processing were also cleaned in this way prior to use. All operators wore surface sterilised nitrile gloves during field collections. Randomised collection cones were assembled into the devices without activating the air pump, and these were used as the negative sampling controls at each location. Additional control samples for potential human contamination were provided via swabs from the inside of anonymised used nitrile gloves (human operator controls).

Samplers were located 3 m apart from each other at each sampling location and all inlets were aligned facing prevalent local wind direction. Bulk air was recovered at 300 L min^-1 and particulates recovered after cyclonic deceleration into a sterile phosphate-buffered saline (PBS) impingement medium in each collection cone. Samplers were only approached from downwind during operation. Each device was used to collect discrete 18 m^3 air samples as this volume has been shown to result in recoverable environmental DNA 3. Samples were recovered hourly between 10:00 and 16:00 hrs daily, and then processed [...] The remaining sample fraction was archived. It was recognised that soil is not the primary reservoir for terrestrial fungal diversity but in the absence of a practical means to globally sample the diversity of other fungal substrates we accepted this limitation to the study.

DNA extractions from samples were performed in randomised sample batches each with discrete laboratory controls to assess potential laboratory or reagent contamination. Environmental DNA was recovered from filtered air and soil samples using a CTAB-based manual extraction protocol optimised for low biomass samples 3. DNA yield was quantified using the Qubit 2.0 Fluorometer (Invitrogen, USA) and samples were then stored at -20 °C until processed. [...] developed to estimate gene copy number using pooled samples.
These were amplified using TaqMan Fast Advanced Master Mix as described above but without fluorescent markers (Applied Biosystems, USA) and quantified using a Bioanalyzer (Agilent Technologies, USA). Serial dilutions of the template were used to generate standard curves. Although estimates of cell abundance using qPCR are flawed due to uncertainties over gene copy number among diverse phyla 12, and the issue of multicellular fungi and taxonomic unit assignation 17, these were not sources of systematic bias in our study since they applied to all samples equally. [...] atmosphere (e.g. 16,27) and the air is recognised as a potential source of putative contaminants in studies of other habitats. Overall, the multi-step decontamination process identified 1,079 bacterial ASV and 229 fungal ASV as suspected contaminants.

Shotgun metagenomics

Independent replicates were pooled by sampling day and device to yield 120 pooled samples and 3 pooled controls for metagenomics sequencing. Libraries were prepared using a low-input preparation protocol where required 28 and using the Nextera XT library kit and sequenced (2 × 150 bp paired-end) on an Illumina NextSeq 500 (Illumina, USA). Kneaddata (v0.7.4, default settings, https://github.com/biobakery/kneaddata) was used to remove low-quality reads and human DNA using the human genome hG37 as reference from raw fastq files.

Similar to the steps adopted for amplicon sequencing, filtered metagenomics reads were further processed in a multi-step fashion to systematically identify and remove potential contaminating nearest taxonomic units (NTU) (Supplementary Information, Decontamination of environmental sequence data). Filtered reads from the controls were co-assembled into contigs using the "assembly" module of MetaWRAP (v1.2.1) 29. Reads in the samples that mapped to the contigs constructed (≥ 1,000 bp) in the controls were removed using Kneaddata.
2 Next, taxonomic classification for NTU was performed using Kraken (v2.0.9-beta) 30 based on 3 the PlusPFP database (Dec 2 nd , 2020 update) and species-level NTU classification was 4 optimized using Bracken (v2.6.0) 31 . Fungal species were identified using FindFungi (v0.23.3) 5 31 . Species-level information from Kraken2 and FindFungi were processed to identify potential 6 contaminating taxa using the same decontam settings applied to our ASV data 26 . The decontam 7 algorithm identified 23 fungal and 280 bacteria suspected contaminant NTU. Last, a genus 8 level subtractive filtering for NTU commonly associated with contamination in other low 9 biomass systems (i.e. as listed above for amplicon-based contaminant removal) was performed.

The genus-level subtractive filtering identified an additional 7 fungal and 244 bacterial suspected contaminant NTU. All bacterial contaminants were subsequently removed using the "extract_kraken_reads.py" command (options --exclude and --include-children) from KrakenTools (v2.0.8-beta, https://github.com/jenniferlu717/KrakenTools). Reads cleared of bacterial contaminants were subjected to another round of contaminant removal using Kneaddata to discard reads that mapped to representative genomes of the fungal contaminants. Genus-level subtractive filtering was not applied to archaea or protists, and viruses were poorly

Statistical treatments and ecological modelling
Statistical analysis: General processing of the community data, including the calculation of relative abundance and estimates of alpha diversity, was conducted using the R package phyloseq 33 and visualised using ggplot2 34 . Geographic distances were calculated using the R package geosphere 35 function distGeo with the WGS84 ellipsoid. Source tracking was conducted using FEAST 36 with data from other studies (processed using dada2 following the same parameters as this study) as additional sources/sinks 37-43 and NCBI BioProject PRJEB42801. For correlation analysis between abiotic and biotic variables, Pearson correlation coefficients for multiple pairwise combinations were calculated using the R package Corrplot 44 . In order to visualise patterns of community dissimilarity, two methods were used. Hellinger distances were ordinated with t-distributed stochastic neighbour embedding (tSNE) using the R package Rtsne 45 . We also calculated Jaccard sample pair-wise distances based on the 10,000 most abundant and frequent ASVs using the R package vegan 46 . A preliminary analysis based on all reads and ASVs showed qualitatively similar patterns but higher noise (i.e.
the amount of variance accounted for by the major ordination axes was relatively low due to a very high number of ASVs found only at one or two locations). We decomposed the Jaccard matrix with Principal Coordinate Analysis (PCoA), which provided a quantification of the variance accounted for by each ordination axis 47 .

Network null models: A statistical mechanics approach was employed for network construction 48 , and we defined our networks as bipartite matrices with two layers: location and taxa. Analyses were performed at multiple taxonomic ranks: Phylum, Class, Order, Family, Genus, and ASV. In order to fully test our hypothesis, we employed degree sequence constraints to enforce that for each taxon, the total number of locations in which the taxon was found was a constraint, and for each location the total number of taxa found in that location, disregarding location or taxa identity, was also a constraint. We used maximum-likelihood models 49,50 to estimate the probability distribution that maximised entropy for the canonical ensembles. We sampled the resulting probability distribution to obtain 999 null matrices representing an unbiased sample of the canonical ensemble of our location by taxa matrices using the MATLAB routine Max&Sam 51 , and imported the null matrices into R for downstream analyses. We used the R packages bipartite 52 and vegan 46 to calculate the nestedness metric NODF and Jaccard dissimilarity on the observed and null matrices. We then used the standard definition of effect size 53 to quantify the difference between observed metrics and the null distribution of the metrics. Since the distribution of the 999 null metrics was approximately normal, an effect size larger than 2 standard errors corresponded to taxonomic composition that diverged more than expected under purely random assembly with an approximate P-value < 0.05.
We calculated Z-scores for nestedness to indicate the number of standard deviations a given data point lay from the mean using the commonly employed NODF metric 54 , and also using Jaccard dissimilarity as this estimates resemblance on average better than raw forms of the indices such as NODF, particularly in the face of confounding effects of spatial scale and conspecific aggregation 55 .
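The effect-size calculation described above can be illustrated with a minimal sketch. It makes several simplifying assumptions and is not the procedure used in this study: the null matrices are drawn from a simple degree-proportional approximation (cell probability = row total × column total / grand total) rather than from the maximum-entropy canonical ensemble produced by Max&Sam, and only mean pairwise Jaccard dissimilarity is computed (NODF is omitted). The function names `mean_jaccard_dissimilarity` and `null_z_score` are hypothetical.

```python
import numpy as np


def mean_jaccard_dissimilarity(matrix):
    """Mean pairwise Jaccard dissimilarity between rows (locations) of a 0/1 matrix."""
    m = np.asarray(matrix, dtype=float)
    shared = m @ m.T                          # taxa shared by each pair of locations
    totals = m.sum(axis=1)                    # taxa found at each location
    union = totals[:, None] + totals[None, :] - shared
    pairs = np.triu_indices(m.shape[0], k=1)  # each unordered pair once
    with np.errstate(divide="ignore", invalid="ignore"):
        # Two empty locations are treated as identical (similarity 1).
        similarity = np.where(union[pairs] > 0, shared[pairs] / union[pairs], 1.0)
    return float(np.mean(1.0 - similarity))


def null_z_score(observed, n_null=999, seed=0):
    """Z-score of the observed metric against a sampled null distribution.

    ASSUMPTION: cell probabilities use a degree-proportional approximation,
    which only roughly preserves row and column totals on average. It stands
    in for, and is not equivalent to, the maximum-entropy model in the text.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(observed, dtype=int)
    prob = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    prob = np.clip(prob, 0.0, 1.0)

    # Each null matrix is an independent Bernoulli draw per cell.
    null_metrics = np.array([
        mean_jaccard_dissimilarity(rng.random(obs.shape) < prob)
        for _ in range(n_null)
    ])
    obs_metric = mean_jaccard_dissimilarity(obs)
    z = (obs_metric - null_metrics.mean()) / null_metrics.std(ddof=1)
    return obs_metric, null_metrics, z


# Toy location-by-taxa matrix: two groups of three locations, each group
# holding its own six taxa and sharing none with the other group.
toy = np.kron(np.eye(2, dtype=int), np.ones((3, 6), dtype=int))
obs_metric, null_metrics, z = null_z_score(toy)
print(obs_metric, null_metrics.mean(), z)
```

Under the approximately normal null distribution noted in the text, |Z| > 2 would be read as composition diverging from random assembly at roughly P < 0.05. In the toy matrix, locations within a group are identical and locations in different groups share no taxa, so the observed mean dissimilarity is 9/15 = 0.6.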

*This location was affected by a Sahara Desert atmospheric dust intrusion during sampling.

+ denotes locations where metagenomes also indicated F>B. It should be noted that estimation of relative abundance between domains using metagenome reads has many uncertainties when using existing bioinformatics approaches.

Metagenomic functional analysis

Table S6. Functional genes targeted in the metagenomic inquiry of air and soil. A suite of respiratory genes was used as a general marker of potential for metabolically active taxa, and targeted metabolic and stress-response genes were selected based upon substrates and stressors encountered in the atmospheric habitat. No hits were recorded for ina genes and this likely reflects low homology between taxa.

Figure 1
The global atmospheric microbiome is taxonomically structured. a, Back trajectories for near-ground air (blue lines) and high-altitude air (red lines), plus soil sampling locations (green boxes) (total independent samples n = 596). b, Global distribution for microbial biomass in air and soil. c, Mean transit altitudes for sampled air. d,h, Global bacterial (d) and fungal (h) abundance distribution. e,i, Global taxonomic richness of bacterial (e) and fungal (i) ASV. f,j, Community dissimilarity (Jaccard Index) for bacterial (f) and fungal (j) air and soil assemblages by location. g,k, Modelled nestedness estimates for bacterial (g) and fungal (k) assemblages across phylogenetic ranks. Networks constructed for each habitat and

and fungal (f) diversity in air, averaged for each source biome in order to mitigate sample size effects.
Locations are numbered as shown in Fig. 1.

Figure 3
The atmospheric microbiome displays functional traits relevant to survival and metabolism in air. a, Functional metagenomics profiling of targeted stress-response and metabolic genes by habitat type (n = 120). HA air, high-altitude air; NG air, near-ground air. b, Summary for distribution of stress-response and metabolic genes by climatic region, with all locations globally pooled by climate (n = 120). Oxid. stress, oxidative stress; Trace gas met., trace gas metabolism.

Figure 4
Taxonomic and functional diversity is correlated with multiple macroclimate and atmospheric transit variables. a, Correlation of location-specific macroclimate factors and atmospheric variables encountered during transit for air assemblages. b, Correlation of location-specific macroclimate factors with local soil assemblages. Blue circles denote positive correlations and red circles denote negative correlations. Circle colour intensity and size denote magnitude of correlation. MAT, mean annual temperature; MAP, mean annual precipitation; RH, relative humidity; UV, ultraviolet radiation. Abundance, qPCR estimated gene copy number; Richness, Chao1 estimation from rRNA gene diversity.