Ail Mouse Population And Overall Study Design
We analyzed 242 lung tissue samples derived from the 15th generation of a previously established AIL population, as described by Belheouane et al [27]. In brief, the AIL consisted of MRL/MpJ, NZM2410/J, BXD2/TyJ, and CAST/EiJ mice (Jackson Lab, Maine, USA). To create a heterogenous intercross line, mice were intercrossed in equal strain and sex distributions [27, 42].
To map genomic regions associated with bacterial traits in the murine lung, we carried out a nested strategy to identify and map candidate resident taxa while minimizing the influence of potential contamination (see Methods). First, we screened for putative bacterial lung residents by analyzing 16S rRNA gene amplicon profiles at the transcript (RNA) level, thereby preferentially identifying live/active taxa. These data were further curated through the application of the “decontam” R package [34], which incorporates information from negative controls and absolute quantification of bacterial load to identify possible contaminants in metagenomic sequencing data. For this purpose, we applied ddPCR to obtain precise total bacterial load measurements. Next, a core measurable microbiota (CMM) was defined based on the processed sequences for further analysis. In a second step, we measured the bacterial loads of individual candidate taxa using ddPCR. Finally, linkage mapping was performed on the following panel of traits: (1) CMM based on conventional relative abundance estimates, derived from RNA template (herein: CMM-RA), (2) CMM based on relative abundance corrected by quantitative microbiome profiling (QMP; i.e., ddPCR estimates of bacterial load), derived from RNA template (herein: CMM-QMP), (3) ddPCR estimates of candidate taxa derived from DNA template (herein: ddPCR-DNA), and (4) ddPCR estimates of candidate taxa derived from RNA template (herein: ddPCR-RNA).
16s Rrna Gene Sequencing And Ddpcr To Define Bacterial Traits
To identify resident candidate taxa from the lung, we first performed 16S rRNA gene amplicon sequencing on both DNA and RNA reverse transcribed into complementary DNA (cDNA) as template (see Methods). After sequence processing, we determined the DNA-based data to be of insufficient quality/quantity, similar to other reports of lung samples [43]. However, rather than adopting a two-step PCR protocol as in Barfod et al. [43], which may be more prone to amplifying contaminants in low biomass samples [44], we instead narrowed the 16S sequencing data analysis to the transcript (RNA) level. This reflects metabolically active cells [45, 46], and may better reveal true resident lung bacteria interacting with the host, as demonstrated in our QTL mapping study exploring gene-microbe interactions in the skin of this same AIL [27]. To assess for potential contamination, we employed the “frequency” method from the “decontam” R package (v.1.4.0) using a threshold of 0.1 (see Methods). This analysis resulted in the removal of potential contaminant amplicon sequence variants (ASVs), including the common contaminants Halomonas and Shewanella. In total, we analyzed 8,414,939 sequences, and after normalizing sequencing coverage to 4,000 sequences per sample, a total of 20,772 ASVs remained in the data set. We first analyzed murine lung bacterial community composition at the phylum and genus levels based on 16S rRNA gene profiles (Fig. 1). The most abundant phyla include Firmicutes, Proteobacteria, Actinobacteria, and Bacteroidetes (Fig. 1A). The two most abundant genera classified to the genus-level include Lactobacillus, with a mean relative abundance of 13.20% and Pelomonas, with a mean relative abundance of 7.16% (Fig. 1B). These two taxa were thus additionally selected as bacterial candidate traits for targeted linkage mapping using ddPCR (see below).
Next, we defined a CMM [25, 26] (see Methods) consisting of 58 taxa ranging from the genus- to phylum-level, as well as 13 ASVs. The number of CMM traits represents a small fraction of the total number of lung microbiota (e.g., 1.74% of genera and 0.06% of ASVs), but their abundances represent 55.16% and 22.38% of the respective taxa at these levels.
Because 16S rRNA gene amplicon profiling is prone to biases and errors related to relative abundance compositional analysis [47], we performed QMP using ddPCR-based bacterial load estimates at the RNA level to account for these concerns. 16S rRNA sequence-derived relative abundances of the CMM traits were then subsequently transformed into corrected, absolute quantitative abundances [47] (see Methods), i.e., CMM-QMP traits. Thereafter, the quantitative profiles of the 58 taxa and 13 ASVs were included as an independent set of bacterial traits.
Finally, as indicated above, we selected two candidate bacterial traits, Lactobacillus and Pelomonas, for further independent analysis, as they were the two most abundant classifiable genera and are frequently identified as lung residents [43, 48, 49]. Droplet digital PCR is ideal for quantifying low biomass samples, as the process of fractionating a sample into thousands of individual droplets, in which independent PCR reactions occur, allows for the amplification of even very low levels of target strains [4, 50–52]. This was demonstrated by e.g., Gobert et al., who effectively measured low levels of Lactobacillus in fecal samples using ddPCR [50]. We thus generated load estimates for Lactobacillus and Pelomonas using genus-specific primers adapted for ddPCR (Table 3). This was performed at both the DNA and RNA level, as (i) ddPCR is sensitive enough to allow for absolute quantification and (ii) taxon-specific primers are expected to be less prone to contaminating taxa than universal PCR primers. Importantly, these specific estimates are significantly correlated to those based on 16S rRNA gene sequencing for both Lactobacillus (ddPCR-RNA vs. CMM-RA: Spearman’s r = 0.5848, p < 2.2 x 10− 16; ddPCR-RNA vs. CMM-QMP: r = 0.7131, p < 2.2 x 10− 16) and Pelomonas (ddPCR-RNA vs. CMM-RA: r = 0.2519, p = 7.411 x 10− 05; ddPCR-RNA vs. CMM-QMP: r = 0.2916, p = 3.796 x 10− 06), whereby the correlation is stronger for the CMM-QMP traits.
Summary Statistics And Evaluation Of Sources Of Variation For Lung Bacterial Traits
Prior to genetic mapping, it is important to determine whether (i) a given phenotype displays sufficient variation between individuals, and (ii) any residual variation remains after accounting for potential covariates such as cage, sex, or age. Summary statistics for each of the four categories of bacterial traits are provided in Additional file 1: Tables S1-3. This reveals considerable interindividual variation among traits. In particular, traits with large mean abundances, such as Lactobacillus and Pelomonas, display a large range, from 0-88.15% and 0-82.85% across the dataset for the CMM-RA measurements (Additional file 1: Tables S1-3).
Next, we evaluated the influence of host and environmental factors on bacterial trait variation in each of the four categories by constructing a mixed effects model that includes sex and age as fixed explanatory variables, cage as a random effect, and the bacterial trait values as responses (see Methods; Additional file 1: Table S4-6). These variables are associated with variation in trait values to varying degrees. For example, cage explains 0% of the total variance for Lactobacillus, but explains 28.83% of the total variance for Pelomonas among CMM-RA traits. However, importantly, the remaining residual variation in all four categories of traits accounts for the greatest proportion of total variance after accounting for sex, age, and cage effects. This suggests that other variables differing between individual mouse hosts, such as genotype, may contribute to variation in bacterial trait abundances.
Qtl Mapping Of Lung Microbiota Traits
To test for associations between resident lung bacteria and the host genome, we performed QTL mapping for the CMM-RA and CMM-QMP traits, as well as for the ddPCR-DNA and -RNA for Lactobacillus and Pelomonas. In total, this yielded seven significant (p ≤ 0.05) and six suggestive (p ≤ 0.1) host loci involving eight bacterial traits (CMM-RA: one significant, five suggestive; CMM-QMP: four significant, one suggestive; ddPCR-DNA/RNA: two significant), with narrow confidence intervals ranging from 0.08 to 3.59 Mb, with an average of 1.80 Mb (Table 1, Additional file 2). The number of protein-coding genes within these confidence intervals ranged from one to 36. For example, two significant associations were detected among the CMM-QMP traits at the genus-level, including Pelomonas and Streptococcus at chromosomes 11 and 16, for which the confidence intervals contained one and eight genes, respectively. For ddPCR-DNA and -RNA, Lactobacillus and Pelomonas were each significantly (p < 0.05) associated with a single genomic locus (Fig. 2, Table 1).
Table 1
QTL mapping statistics for CMM-RA, CMM-QMP, and ddPCR-DNA/RNA with significant associations.
Type
|
Taxon
|
Trait
|
Chr
|
Peak SNP
|
LOD score
|
Confidence interval
|
Size
|
Phenotypic variance assoc. with peak SNP
|
CMM-RA
|
Order
|
Enterobacteriales
|
1
|
UNC1088683
|
5.49
|
88.61–90.40
|
1.79
|
9.92
|
|
Family
|
Enterobacteriaceae
|
1
|
UNC1088683
|
5.49
|
88.61–90.40
|
1.79
|
9.92
|
CMM-QMP
|
Class
|
Deltaproteobacteria
|
1
|
UNC1143014
|
6.03
|
93.30–95.02
|
1.72
|
11.01
|
|
Genus
|
Pelomonas
|
11
|
UNC20541010
|
4.42
|
121.68–121.76
|
0.08
|
8.19
|
|
|
Streptococcus
|
16
|
JAX00427186
|
4.85
|
86.88–88.15
|
1.27
|
8.96
|
|
ASV
|
ASV8_Propionibacterium
|
14
|
UNC24933570
|
6.36
|
122.04–125.01
|
2.97
|
11.58
|
ddPCR-DNA
|
|
Lactobacillus
|
1
|
UNC1677482
|
6.38
|
131.79–133.44
|
1.65
|
11.43
|
ddPCR-RNA
|
|
Pelomonas
|
4
|
UNC6891976
|
5.74
|
21.98–25.57
|
3.59
|
10.35
|
Overall, the phenotyping performed for the two abundant core taxa Lactobacillus and Pelomonas was based on four different methods, which differ according to (i) primers (universal vs. taxon-specific), (ii) relative abundance vs. quantitative estimates, and (iii) DNA vs. RNA. To compare these methods, we generated four respective QTL profile plots for each of the three significant associations involving these two taxa (Fig. 3). For the Lactobacillus association on chromosome 1, only the ddPCR-DNA measurements revealed a significant association, which suggests that this region of the genome associates only to their cell number rather than their activity. In contrast, the Pelomonas QTL on chromosome 4 involves only the ddPCR-RNA estimates. Lastly, the Pelomonas QTL on chromosome 11 involves only the CMM-QMP estimates. A similar peak structure is apparent for the CMM-RA estimate, but does not reach the significant or suggestive threshold. Thus, with regard to the three categories mentioned above, the involvement of quantitative- over relative abundance information appears to be most important for the detection of gene-microbe interactions, while differences between primers and/or cell number vs. activity may lead to discrepancies between phenotyping methods.
Analysis Of Candidate Genomic Regions
The narrow confidence intervals afforded by the AIL population allowed us to identify several promising candidate genes. In Table 2, we report a list of candidate genes from significant QTLs and their functions, summarized from experimental evidence. Most genes within the candidate regions are related to immune response, inflammatory response, cell apoptosis, and/or DNA repair. A number of genes are notable due to their role in lung functioning and disease susceptibility. For the QTL on chromosome 1 associated with variation in Lactobacillus ddPCR-DNA, the Mk2 (mitogen-activated protein kinase-activated protein kinase 2) and the Il10 (Interleukin 10) genes are the two closest genes in proximity to peak SNP UNC1677482. In humans, MK2 is a downstream product of the p38MAPK pathway acting as a pro-inflammatory kinase, and induces various signals such as cytokines in response to lipopolysaccharide (LPS)- and virus-induced infections [53–56]. MK2 is involved in transcriptional and post-transcriptional regulation of cytokine expression and was previously shown to affect the stability of Il10 transcript [54]. IL-10, a well-known anti-inflammatory cytokine, is e.g., also involved in mitigating disease severity in Mycobacterium tuberculosis infection [57]. For the Pelomonas (ddPCR-RNA) QTL on chromosome 4, the peak SNP lies within an intergenic region. However, several genes are found within the confidence interval, including Pou3f2 and Mms22l. POU family transcription factors were previously shown to be highly expressed in small-cell lung carcinoma (SCLC) cell lines and to contribute to neuroendorcrine differentiation in non-small cell lung carcinoma (NSCLC) cell lines[58]. MMS22L was shown to accelerate cancer cell growth in lung cancer cell lines [59]. In contrast to the other candidate genes, Klhl32 is poorly characterized. However, recent work by de Vries et al. [60] identified KLHL32 as a protein-coding gene that strongly associates with DNA methylation levels of a specific CpG-site (a cytosine base located adjacent to a guanine base) in patients with chronic obstructive pulmonary disease (COPD).
Table 2
List of candidate genes within confidence intervals and their functions.
|
Trait
|
Gene
|
Functions (& references)
|
RA
|
Enterobacteriaceae / Enterobacteriales
|
Chrnd, Chrng
|
Nicotine dependence, smoking behaviors, lung cancer, COPD [61]; lung hypoplasia [62]
|
|
|
Ecel1
|
Restrictive lung disease (affected by respiratory muscles) [63]
|
|
|
Dis3l2
|
Lung function [64]
|
|
|
Kcnj13
|
Development & physiology of the respiratory system [65]; tracheal tubulogenesis [66]
|
QMP
|
Deltaprotebacteria
|
Per2
|
NSCLC [67, 68]; lung tumorigenesis [69]
|
|
|
Twist2
|
Lung cancer [70]; lung adenocarcinoma [71]; Pneumonia [72]
|
|
Pelomonas
|
Ptchd3
|
Asthma [73]
|
|
Streptococcus
|
Grik1
|
Lung metastasis (of colorectal cancer in vivo) [74]; lung cancer [75]
|
|
|
Bach1
|
Lung cancer [76–78]; cystic fibrosis [79]
|
|
|
Map3k7cl
|
NSCLC [80]; pulmonary cell development [81]
|
|
ASV8_Propionibacterium
|
Nalcn
|
Respiratory rhythm [82]; NSCLC [83]
|
|
|
Fgf14
|
Lung Adenocarcinomas [84, 85]; lung functioning & phenotype [86]
|
|
|
Ubac2
|
COPD [87]; cystic fibrosis [88]; asthma [89]
|
|
|
Zic2
|
Lung adenocarcinoma [90]; NSCLC [91]; SCLC [92, 93]
|
ddPCR-DNA
|
Lactobacillus
|
Mk2
|
Lung cancer [94]; inflammatory pulmonary diseases [95]
|
|
|
Il10
|
Tuberculosis [96]; asthma [97]; NSCLC [98, 99]
|
|
|
Il19
|
Antimicrobial defense in airway epithelial cells [100]
|
|
|
pIgR
|
COPD-like phenotype airway inflammation [101]
|
ddPCR-RNA
|
Pelomonas
|
Pou3f2
|
NSCLC [58]
|
|
|
Mms22l
|
Lung carcinogenesis [59]
|
|
|
Klhl32
|
COPD [60]
|
|
|
Fut9
|
Bronchopulmonary dysplasia [102]
|
Key: COPD: chronic obstructive pulmonary disorder, NSCLC: non-small cell lung cancer, SCLC: small cell lung cancer |
Evaluation of Lactobacillus in a Il10 knockout model
Given the promising association detected between Lactobacillus load (DNA) and a locus containing the well-known anti-inflammatory cytokine IL-10, we aimed to confirm this potential gene-microbe association in an Il10 knockout model using the same phenotyping method as for our mapping population. We accordingly performed ddPCR to quantify the Lactobacillus load at the DNA level using genus-specific primers in Il10+/+, Il10+/− and Il10−/− mice. Interestingly, we observe significant differences in Lactobacillus according to genotype, with Il10+/+ mice displaying higher abundances than Il10+/− and Il10−/− mice (Fig. 4).