Host genetics correlated with the nasal microbiota diversity and composition
We set out by examining the correlation between host genetic variations and the overall diversity of the nasal microbiome in 1,401 individuals (63% females; a mean age of 30 years old) with high-depth metagenome data as well as integrated host genome data (a mean depth of 30× for host genome and an average sequencing data of 77.31 ± 23.00 Gb for nasal samples; Supplementary Table 1; Supplementary Fig. 1; Methods). The host genetic principal components (PCs) were examined, and the results revealed strong associations between the top three PCs and microbial α-diversity (p < 0.05 for both Shannon and Simpson indices based on species-level abundances, multivariable linear model; Fig. 1). Specifically, PC1 (r=-0.07, p = 8.7×10− 3), PC2 (r=-0.14, p = 1.39×10− 7), and PC3 (r=-0.08, p = 1.8×10− 3) were markedly correlated with the Shannon index. This is consistent with previous analyses of low-depth human genome sequences from 93 individuals in the Human Microbiome Project (HMP), which reported PC1 to associate with α-diversity in the anterior nares12. We verify that the top two host genetic PCs in our cohort are strongly associated with self-reported ancestry, namely the northern or southern region of origin in China (p < 2.2 × 10− 16 for PC1 and p = 1.78 × 10− 11 for PC2, Wilcoxon rank-sum test; Fig. 1b-c). We further confirmed that the individuals whose ancestors lived in northern China exhibited a higher nasal microbial α-diversity than those who lived in the south (mean Shannon index of 1.35 vs 1.28; Wilcoxon Rank-Sum test; p = 9.22×10− 3; Supplementary Fig. 2), despite the individuals themselves currently living in the same city. Also, the host genetic PC1 exhibited a correlation with the abundances of 14 nasal genera, including Micrococcus, Anaerococcus, Elizabethkingia, Campylobacter, Finegoldia, Yokenella, Serratia, Streptococcus, Gemella, Staphylococcus, Actinomyces, Malassezia, Veillonella, and Prevotella (Supplementary Table 2; Spearman test; False Discovery Rate (FDR) p < 0.05). Nine of these 14 PC1-associated genera also displayed differential abundances between the individuals deriving from China's northern and southern regions (Supplementary Table 2;p < 0.05).
Furthermore, the top three host genetic PCs showed correlations with four of the top ten microbiome β-diversity principal coordinates (PCos; computed based on species-level Bray-Curtis dissimilarity; p < 0.05 for pairwise associations, Spearman correlation; Fig. 1). Additionally, the top eight host genetic PCs (PC1-PC8) showed at least one correlation with any of the microbiome PCos among PCo2 to PCo8. These results suggested host genetics significantly influence nasal bacterial diversity and composition.
We also investigated associations between host genetic variants and nasal microbiome α-diversity and β-diversity (namely the top ten PCos). This analysis found two loci with marginal genome-wide significance (p < 5×10− 8, Fig. 1e, Supplementary Table 3). The SNP rs77221359, located in the intronic region of CACNB2, was significantly associated with the Shannon index representing the α-diversity of nasal samples (p = 2.6×10− 8; Supplementary Fig. 3). Notably, CACNB2, which encodes a subunit of a voltage-dependent calcium channel protein that is critical for mediating intracellular Ca2 + influx, has been established as a risk gene for psychiatric disorders such as schizophrenia and bipolar disorder23,24. Additionally, the SNP rs77221359 has been linked to multiple phenotypes or diseases, including chronic heart failure (p = 0.007), asthma (p = 0.012), and chronic sinusitis (p = 0.015), according to a search of the biobank Japan (BBJ) database25. The second identified significant association was for SNP rs79409173, located in the intronic region of WNK1, with PCo6 (p = 2.4×10− 8; Supplementary Table 3). These are interesting associations, given the involvement of nasal microbiome in asthma and chronic sinusitis.
Host genetics strongly associated with nasal bacterial or fungal taxa and functions
With the first largest cohort of whole genome and whole metagenome data, we next conducted M-GWAS on the nasal microbiome. The microbial taxonomy and function profiles were determined by aligning metagenomic reads to marker genes according to metaphlan3 and humann3, respectively (Supplementary Fig. 3a-b). After filtering highly correlated features, we obtained 293 independent nasal microbial features (Supplementary Table 4; 86 taxa and 207 functions; r2 < 0.995 using a greedy algorithm, Methods). The M-GWAS analyses were performed on 7 million human genetic variants (minor allele frequency (MAF) ≥ 1%) by adjusting for age, gender, body mass index (BMI), sequencing read counts, and the top ten host PCs. Our GWAS analyses did not demonstrate any evidence of an excess false positive rate (genomic inflation factors λGC ranged from 0.979 to 1.054, with a median of 1.012; Supplementary Fig. 4).
In total, we identified 180 independent associations involving 63 independent loci (distance < 1Mb and r2 < 0.2) and 46 independent taxa/functions reaching genome-wide significance (p < 5×10− 8; Fig. 2 and Supplementary Table 5). Out of the 63 loci we identified, 17 were associated with 14 microbial taxa and 47 were associated with 32 bacterial functions, with SNP rs138099463 near RND1 linking to relative abundances of both class Actinobacteria and PWY-5188: tetrapyrrole biosynthesis I from glutamate. Specifically, seven of the 63 loci were associated with at least two independent taxa or functions (Fig. 2 and Supplementary Tables 5). All genome-wide significant associations were listed in Supplementary Table 5. The 63 genome-wide loci explained a higher average fraction of variance (R2) for microbial functions (mean R2 of 8.7%; ranging from 2.49% for PWY-5138: unsaturated, even numbered fatty acid β-oxidation to 19.01% for ANAGLYCOLYSIS-PWY: glycolysis III (from glucose); Supplementary Table 6) compared to microbial taxa (mean R2 of 5.6%; ranging from 2.35% for species Malassezia globosa to 12.48% for class Actinobacteria).
With a more conservative Bonferroni-corrected study-wide significant p-value of 1.71 × 10− 10 (= 5 × 10− 8/293 for 293 M-GWAS tests), we identified 2 genomic loci associated with 3 nasal microbial taxa (Fig. 2).
The strongest association was observed for SNP rs73268759 located in the intronic region of the CAMK2A gene, with minor allele C (MAF = 0.021) positively associated with the presence/absence phenotype of genus Actinomyces (Fig. 3a-b; p = 7.75×10− 13) and family Actinomycetaceae (p = 2.06×10− 12; spearman r = 0.970 with Actinomyces). When further testing for the association of rs73268759 with the relative abundance of Actinomyces presented in 162 individuals, this association was more significant (Fig. 3c; p = 9.69×10− 15). The association was also replicated in the gut, with rs73268759 associated with the relative abundances of the gut-derived Actinomyces (Fig. 3d; p = 0.014). CAMK2A encodes a protein belonging to the Calcium/calmodulin-dependent protein kinase II (CAMKII), and its oxidation promotes asthma through the activation of mast cells26. The CAMK2A-associated two taxa, Actinomycetaceae and its main genus Actinomyces, are abundant commensals of the human oropharynx and have been increasingly associated with infections at many body sites27. We found these two taxa were also positively correlated with an increased risk of oral diseases (e.g., oral ulcers and caries), upper respiratory tract infection and urinary system infection (Fig. 3f; multivariable linear regression p < 0.05), when checking the correlation between the two taxa and phenotypic traits in this cohort. Collectively, these results supported a link between the genetic variation in the CAMK2A gene, abundances of the two taxa, and the inflammatory response to the upper respiratory tract.
The second strongest association was seen on rs35211877, which is a deletion of allele T (MAF = 0.057) located 163kb downstream of the POM121L12 gene. This deletion was negatively associated with Gemella asaccharolytica (Fig. 2; p = 1.13×10− 10). Furthermore, this deletion was also found to be associated with increased risks of asthma (p = 0.005), epilepsy (p = 0.005), and gastroesophageal reflux disease (p = 0.012) when searching the BBJ database. We also found that Gemella asaccharolytica had a positive correlation with stress sources (p = 1.07×10− 4), frequently allergic sub-health status (p = 0.003), and gastritis (p = 0.023), but a negative correlation with vitamin D and D2 levels in this cohort (Supplementary Fig. 5).
Cutibacterium was the most abundant genus in the nose (Supplementary Fig. 6), and it was linked to the SNP rs186899741 located in the intergenic region of OCSTAMP and SLC13A3 (p = 2.62 ×10− 8). This SNP was associated with 25-hydroxyvitamin-D3 (p = 0.001) in this cohort and type 2 diabetes (p = 0.006) in the BBJ. The secondary abundant genus Corynebacterium was linked to the SNP rs117538984 located in the intronic region of BARD1 (p = 1.44× 10− 9), which was associated with colorectal cancer (p = 0.002) and pulmonary tuberculosis (p = 0.01) in the BBJ. Interestingly, BARD1 is also known to interact with BRCA128, and has been reported to be elevated in the urine of breast cancer patients compared to controls29. In addition, we found several associations involving bacteria commonly detected in other body sites, for example, rs139288082 in UST with Streptococcus oralis (an oral commensal, belongs to the mitis group of streptococci and occasionally causes opportunistic infections such as bacteremia and endocarditis30), rs2716569 in the intergenic region of LINC01568 - LOC101928035 with Malassezia globosa (a traditionally known fungus for the skin microbiome, implicated in conditions such as inflammatory bowel diseases31, lung infections, and breast cancer32).
In addition to the associations with nasal microbial taxa, 46 host genetic loci were linked to nasal functions. For example, ANAGLYCOLYSIS-PWY: glycolysis III (from glucose) was linked to three gene loci, including PUM3, NELL1, and LINC02580. Both PUM333 and NELL134 exhibited the ability to regulate cell proliferation. COLANSYN-PWY: colonic acid building blocks biosynthesis was linked to multiple SNPs in the MEIS1 gene (p = 5.70×10− 10; associated with monocyte count35). PWY-5686: UMP biosynthesis was linked to LMF1 (p = 7.36×10− 10), which harbors variants associated with extreme respiratory outcomes following preterm birth36. These associations and their underlying interaction mechanisms called for further verification and investigation in future studies.
PheWAS and gene expression analysis for 63 microbiome-associated loci
To better understand the potential biological mechanism of the 63 nasal microbiome-associated variants (MAVs), we first explore their associations with diseases and traits. The phenome-wide association study (PheWAS) conducted on this cohort and the BBJ revealed that these 63 MAVs were enriched in 24 host-related traits and diseases: five diseases (asthma, type 2 diabetes(T2D), colorectal cancer, atopic dermatitis, abortion), glucose and low-density lipoprotein (LDL) from the BBJ cohort; six metabolic-related traits including chromium, phenylalanine, triglyceride, vitamin A, lymphocytes count and diastolic pressure; as well as health conditions (sleep quality, sub-health status) and lifestyles in this cohort (fisher exact test p < 0.05; Fig. 4).
As the 63 MAVs are mostly located in the intronic or intergenic region, we annotated their associations with gene expression in a recently published nasal airway epithelium transcriptome data37 and across the 49 tissues in the Genotype-Tissue Expression (GTEx) database38. 26 of the 63 MAVs were mapped to 33 genes (intronic or < 5KB upstream/downstream). We investigated the expressions of the 33 top genes across 50 tissues and found that over half (> 16) of the genes significantly demonstrated expression in seven of the 50 tissues (Fig. 5a). As expected, the nasal airway epithelium exhibited the strongest enrichment, with 23 of the 33 genes showing significantly expressed (p < 10− 4; Supplementary Table 7). The tissues such as thyroid, muscle skeletal, lung, testis, nerve tibial, and esophagus muscularis, which are also enriched tissues shared by the gut MAV eQTL target genes39, also showed enrichment. BARD1 (associated with the abundance of genus Corynebacterium) and LMF1 (associated with the abundance of PWY-5686: UMP biosynthesis) were the top two most expressed genes because of cumulative representation across 50 tissues (Supplementary Fig. 7). The strongest M-GWAS signal gene CAMK2A is enriched in 17 tissues. The hierarchical clustering of MAV top genes showed a unique gene expression pattern for the nasal airway epithelium (Fig. 5b), suggesting the nasal microbiome may have a genetic attribute distinct from the other body sites.
We further expanded the genetic associations with suggestive p < 10− 6 (Supplementary Table 8) and performed gene expression analysis in the tissues, followed by gene functional mapping and disease enrichment analysis with the FUMA40 platform (Methods). Those nasal MAVs of suggestive significance were mapped to 413 genes (< 5Kb for associated genetic loci). These genes exhibited enrichment for differentially expressed in multiple tissues including the most significant enrichment in the nasal airway epithelium and thyroid (Supplementary Fig. 8), in agreement with the findings using the genome-wide significant MAVs.
Gene functional mapping with FUMA found that the gene sets of suggestive MAVs were mainly enriched in calcium signaling and regulating hippo signaling pathways, by using the KEGG, Wikipathways, and GO databases (Supplementary Fig. 9). The disease-enriched analysis in the GWAS catalog using FUMA showed that the MAVs were enriched in 75 traits (pbon<0.01; Supplementary Fig. 10), with the strongest link to cardiometabolic related traits such as obesity-related traits, systolic blood pressure, diastolic blood pressure, body mass index, platelet count. It was also found that MAVs were involved in neuropsychiatric symptoms such as cognitive decline rate in late mild cognitive impairment, general cognitive ability, schizophrenia, dementia and core Alzheimer’s disease neuropathologic changes (Supplementary Fig. 10). Correlations between MAVs with asthma, T2D, and colorectal cancer were confirmed through both the PheWAS analysis of BBJ and the bigger GWAS dataset using FUMA. Links between MAVs and diastolic pressure and sleep phenotypes were also replicated in our cohort and that of in FUMA. Interestingly, MAVs showed correlations with smoking phenotypes (smoking status and smoking initiation), which have been reported to be key determinants of the airway microbiome41,42. This result suggests individual genes may influence the nasal microbes by making an individual genetically more likely to smoke or not, which needs to further validate in other cohorts due to the rarity of smokers in this current cohort.
Host genetics and other factors contributed comparably to the nasal microbiome
The above-identified 63 genome-wide significant loci could infer 9.51% of the variance in the nasal microbiome β-diversity. Applying association analysis for host genetic variants and microbiome β-diversity, we identified 21 loci with suggestive significance (p < 10− 5; Supplementary Table 9). Among the top index variants, eight were associated with Staphylococcus epidermidis (n = 4) or Corynebacterium accolens (n = 4), which were among the most abundant commensal in the nose (Supplementary Table 4) and might possess antimicrobial activity against pathogens43,44. The 21 top index variants that were most closely associated with β-diversity, while not reaching genome-wide significance, could explain 10.59% of the variance in the community structure, with each of them contributing from 0.34–0.70% of the variation (Supplementary Fig. 11). This fraction was lower than the proportion of host genetics influencing the gut microbiome (~ 20%) but close to that of the oral microbiome (10.14%~14.14%), as previously inferred in the same cohort18,19. This is expected as the nasal cavity is more open compared to the half-closed oral and fully closed gut environments.
After establishing the contribution of host genetics to nasal microbiome composition and functions, we investigated the extent to which environmental factors influence the nasal microbiome compared to genetics. Among the 340 host traits (age, gender, BMI, dietary, lifestyle, health status questions, and blood measurements), 45 were significantly associated with β-diversity (BH-adjusted FDR < 0.05), via PERMANOVA analysis (Supplementary Table 10). The five strongest associated factors observed were sex, serum testosterone and estradiol, serum iron concentration, and muscle mass, each explaining a variance of 0.65–1.47%. In total, the 45 significant host factors could infer 10.76% of the variance in the nasal microbiome β-diversity. We inferred that the contribution of host genetics to the nasal microbiome is comparable to the other host factors. Host genetics together with other host factors collectively explained 19.19% of the variance in the nasal microbiome community.
From observational correlation to Mendelian randomization for the nasal microbiome and host traits
Host factors greatly influenced the nasal microbiome composition, we further investigated the correlation of trait-microbiota pairs for the 293 unique nasal microbial features and 340 host traits using multivariate linear regression. After adjustment for gender, age, BMI, depth, and the top four PCs, we observed 520 significant associations (Benjamini–Hochberg (BH)-adjusted P < 0.05, Supplemental Table 11). The genus Elizabethkingia and its species E. bruuniana, E. miricola, as well as genus Serratia and its species S. grimesii, showed the most links to the host traits (n = 45, 38, 37, 31 and 35, respectively, Supplementary Fig. 12). In return, creatine, cystine, aspartic acid, chromium, magnesium, and cystathionine were among the metabolites associated with the largest number of microbial features (n = 42, 33, 22, 20, 20 and 17, respectively, Supplementary Fig. 12).
Since top associated genetics variants showed strong power as instrumental variables (Supplementary Fig. 13) and well explained for both the nasal microbiome and host traits (Supplementary Fig. 14, Methods), as well as the strong correlations between them, we next performed bidirectional one-sample Mendelian randomization analysis for the 520 observationally significant associations, aiming to reveal the potential causal relationships between the nasal microbial features and host traits. In total, we identified 157 suggestive causal effects with p < 0.05, of which 5 were significant after Bonferroni correction (p < 9.62×10− 5 = 0.05/520). Out of the 157 potential causal effects, 112 were from nasal microbial features to host traits, while the other 45 were from host traits to nasal microbial features (Fig. 6, Supplementary Table 12). Nine interplays of microbial features and host traits were bidirectional, including cystine positively correlated with 5 microbial features (Serratia, Serratia grimesii, Elizabethkingia bruuniana, PWY0-1319: CDP-diacylglycerol biosynthesis II, PWY-7117: C4 photosynthetic carbon assimilation cycle, PEPCK type) but negatively with 3 microbial taxa (phylum Firmicutes, class Bacilli, genus Staphylococcus), as well as Elizabethkingia miricola positively correlated with refined amine base succinate level.
Serratia and its species S. grimesii showed suggestive causal effects on a dozen of metabolites, particularly an increased cystine level (β = 0.42, p = 1.34×10− 5 for S. grimesii on cystine), which had a reciprocal effect (β = 0.31, p = 0.006 for cystine on S. grimesii; Fig. 6a). Cystine is formed from the jointing of two cysteine molecules, and cysteine metabolism genes have been reported to be widely present in the S. grimesii BXF145. S. grimesii was also causally linked to increased levels of homocysteine (β = 0.32, p = 0.002) and cystathionine (β = 0.26, p = 0.009), but reduced levels of aspartic acid (β=-0.26, p = 0.005) and glutamic acid (β=-0.24, p = 0.007). Further investigation of the genomics functional modules46 of S. grimesii confirmed it was widely involved in cysteine, cystathionine, aspartic acid (aspartate), and glutamic acid (glutamate) metabolic related pathways (Supplementary Fig. 15). In addition, three species, namely Serratia grimesii, Yokenella regensburgei, and Elizabethkingia bruuniana, presented consistent causal effects on multiple metabolites such as increased cystathionine but reduced glutamic acid concentrations. The genomes of these three species were consistently enriched in cysteine, cystathionine, aspartic acid, and glutamic acid related pathways (Supplementary Fig. 15). Staphylococcus, a bacterium universally present in the nasal passage and on the skin of humans, was causally related to decreased levels of cystine, magnesium, and refined amine base succinate. The opposite causal effects on metabolites were observed between Staphylococcus and Serratia grimesii, Yokenella regensburgei and Elizabethkingia bruuniana, suggesting that they represent distinct ecosystems even the potential microbial antagonism among them. This observation was also supported by a study that reported Serratia strains could inhibit the growth of Staphylococcus spp. by synthesizing the cyclodepsipeptide antibiotic serrawettin W147. The microbial pathway COLANSYN-PWY: colonic acid building blocks biosynthesis with genome-wide significant MAVs was positively associated with health conditions, including a sub-health performance of being vulnerable, feeling cold frequency in sleep, and urine blood test (positive).
Cystine showed causal effects on a dozen of nasal microbiota and pathways (Fig. 6b). Cystine is a major nutrient and multiple bacterial species could metabolize cysteine into H2S48,49. Cystine was remarkably associated with increased abundances of four bacteria, S. grimesii, Y. regensburgei, E. bruuniana and E. miricola in the observational analysis (BH-adjusted p = 6.13 × 10− 18 ~ 1.08 × 10− 26). Cystine also causally influenced these four bacteria in the MR analysis (p = 0.014 ~ 0.004). Moreover, the growth of these four bacteria could be well predicted under a given cysteine environment in simulations that used the gapseq model50 (Supplementary Fig. 16). Compared to the four bacteria, Streptococcus oralis that showed no correlation with the cystine level in the observational and MR analysis was not influenced by adding the cystine in the simulation (Supplementary Fig. 16). In addition to the cystine, serum aldosterone also increased the abundances of the Elizabethkingia and Serratia spp.. Glutamic acid negatively correlated with the level of PWY-5028: L-histidine degradation II. Cystathionine was positively associated with abundances of three microbial pathways (HISTSYN-PWY: L-histidine biosynthesis, PWY-5097: L-lysine biosynthesis VI, PWY-724: superpathway of L-lysine, L-threonine and L-methionine biosynthesis II). Refined amine base succinate was causally related to an increased abundance of Elizabethkingia miricola. These findings suggest that certain metabolites such as cystine, aldosterone, cystathionine, and glutamic acid play a crucial role in the host metabolism-nasal microbiota interplays.