16s rRNA Lung Microbiota Study in Mechanically Ventilated Patient: A Pilot Study

Characterization of the respiratory tract bacterial microbiota is in its infancy when compared to the gut microbiota knowledge. As key methodological steps can directly affect the accuracy of the results, it is crucial to determine a robust methodology in order to limit bias. Two different pairs of primers 515F-806R targeting the V4 hypervariable region of the 16SrRNA gene, “S-V4” for “Stringent V4” primer pair and “R-V4” for “Relaxed V4” primer pair, are respectively used in two ongoing international projects “the Human microbiome project” and “the Earth microbiome project”. We compared two methods of sample processing using these two different primer pairs for bacterial microbiota analyses of respiratory samples from critically-ill ventilated patients. For the later, Illumina 250 paired-end sequencing was done on a MiSeq platform after DNA extraction using mechanical lysis (bead-beating) and NucliSENS easyMAG. The concordance with conventional microbiology is the criterion of choice to determine the optimal method. from highly different Only a similar both the best agreement as compared to 44% for the primer pair. The main difference related Enterobacteriaceae, between analyses using Enterobacteriaceae detection poor for Among patients VAP, a decrease in alpha diversity in ETA was observed. mean pairwise Unifrac distance was higher inside this at the time of VAP diagnosis as compared to control


Abstract Background
Characterization of the respiratory tract bacterial microbiota is in its infancy when compared to the gut microbiota knowledge. As key methodological steps can directly affect the accuracy of the results, it is crucial to determine a robust methodology in order to limit bias.
Two different pairs of primers 515F-806R targeting the V4 hypervariable region of the 16SrRNA gene, "S-V4" for "Stringent V4" primer pair and "R-V4" for "Relaxed V4" primer pair, are respectively used in two ongoing international projects "the Human microbiome project" and "the Earth microbiome project".
We compared two methods of sample processing using these two different primer pairs for bacterial microbiota analyses of respiratory samples from critically-ill ventilated patients. For the later, Illumina 250 paired-end sequencing was done on a MiSeq platform after DNA extraction using mechanical lysis (bead-beating) and NucliSENS easyMAG. The concordance with conventional microbiology is the criterion of choice to determine the optimal method.

Results
Twenty samples from seven patients and four controls were sequenced. The two primer pair provided highly different results. Only 54% of the samples had a similar microbial composition with both pairs of primers. "S-V4" gave the best agreement with the conventional microbiology for ETA: 89% as compared to 44% for the "R-V4" primer pair. The main difference related to Enterobacteriaceae, which were concordant between conventional cultures and microbiota analyses using "S-V4". Enterobacteriaceae detection was poor for "R-V4". Among patients with VAP, a decrease in alpha diversity in ETA was observed. The mean of pairwise Unifrac distance was higher inside this group of patients at the time of VAP diagnosis as compared to control patients.

Conclusion
Accuracy of the bacterial lung microbiota composition was highly correlated to the pair of primers used for ampli cation of the 16s rRNA hypervariable sequence. Comparison of microbiota results obtained by sequencing and conventional microbiology allowed us to select the best option for further studies. This work validates our methodology based on 16SrRNA gene ampli cation with 515F-806R "S-V4" primer pair associated to Illumina® MiSeq 250 paired-end sequencing.

Background
Next-generation sequencing platforms have revolutionized our ability to investigate the microbiota composition and diversity of complex environments, including many human body sites. 16S rRNA gene sequencing is considered one of the two widest used methodologies for phylogenetic studies of bacterial communities. Since 2008, the National Institute of Health launched the Human Microbiome Project (HMP) to sequence and characterize the microbiome of healthy human subjects (http://commonfund.nih.gov/hmp) [1].
Despite remarkable progress in sequencing technologies that produce ever-larger data sets at ever-decreasing cost in metagenomics technology, no de nite consensus exists on the reference methods to adopt to study lung microbiota.
Several critical factors, including sampling, DNA extraction [2], primer sequences, ampli cation batches [3], and sequencing platform [4] can affect the accuracy of the results [5]. Many other challenges relate to bioinformatic analyses and add another level of complexity. Thus, bias can be present at each step of the sampling, sequencing and bioinformatic analyses.
The search for the optimal extraction method has aroused particular interest, and comparison of DNA yield, DNA purity, and most importantly representation of microbial diversity has been largely done [2] [6]. One of the most complex questions relates to the hypervariable (V) region(s) of the 16S rRNA gene to be ampli ed and sequenced, with an objective to identify the entire bacterial population and to accurately re ect the communities. Based on the results of many studies on fecal microbiota, researchers tend to favor the V4 hypervariable region. The International Human Microbiota Standards (IHMS) project aim to homogenize the methods in order to allow a greater comparability between studies [6]. However, the most informative hypervariable 16S rRNA region may differ from an environment to another, especially when high-throughput short-read sequencing technologies (for example, 454 and Illumina) are used [7]. The selection or design of the primer pair is also of critical importance [8].
Because of its diversity, the oral microbiome is a complex community to study, and the selection of which hypervariable region(s) of the 16S rRNA gene to amplify is crucial. Many attempts at identifying the most robust, unbiased and speci c V region have been published and researchers mostly used long regions such as V1-V2 [9] V1-V3 [10], V3-V5 [11], or V7-V9 [10]. However, all but one studies were performed on the pioneer but no longer used 454 pyrosequencing [12] and only few studies compared speci cs V regions after sequencing with Illumina platforms, which have now took the lead. [7] As compared to the gut microbiome, knowledge on the lung microbiome is much poorer, especially in the eld of intensive care. If lung and oral microbiota differ signi cantly during chronic pulmonary diseases [8], similarities are observed in healthy subjects [13]. Whether or not they differ in the context of acute pulmonary infection has not been addressed. To date, several unresolved methodological issues exist in the study of lung microbiota. Based on results from the two most recent methodological studies concerning mock DNA community or mock DNA extracted from the mock community cells [14] [5] it appeared that the V4 region could be the best region to amplify when high throughput sequencing is performed with Illumina Miseq plateform. If V4 region seems consensual, no methodological study addressed the question of the optimal pair of primers to choose for consistent and reliable results.
As a standardized approach for sequencing in situations where a comparison across multiple sequencing runs is required, we aimed to validate a robust methodology ahead of conducting longitudinal studies in ICU patients under mechanical ventilation at different time points.
In the present methodological pilot study, we wanted to compare two pairs of primers targeting the V4 region for 16SrRNA gene to assess oral and lung microbiome of infected and non-infected mechanically-ventilated patients.

Design of the study and ethical aspects
We designed a prospective observational study in the 12-bed ICU of Louis Mourier hospital (Assistance Publique -Hôpitaux de Paris, Colombes, France). All patients hospitalized in our ICU who required invasive mechanical ventilation were screened. Inclusion criteria were age over 18, intubation in our unit or within the six hours before admission, expected duration of mechanical ventilation above 72 hours with at least one of the following reasons for intubation: trauma with or without neurologic lesion, pulmonary sepsis, acute respiratory distress syndrome, and septic shock. Exclusion criteria were a known chronic lung disease (chronic obstructive pulmonary disease, emphysema, bronchiectasis, cystic brosis, lung transplant or a restrictive lung disease), an active lung tuberculosis, an invasive ventilation for more than 72 hours in the last six months, an immunosuppression (HIV with less than 200 CD4 lymphocytes/mm 3 , neutropenia below 500/mm 3 , treatment by immunosuppressive agents), and absence of consent to participate.
The investigator informed the patients or their surrogates about the study both orally and with a written document. In accordance with French law, written informed consent was not required. Patients or their surrogates were informed that they could decline to participate at any time, and their decision was recorded in patient les. The ethics committee of the French Intensive Care Society approved the study (reference CE . Before analyzing all samples, we aimed at determining the best methodology to apply for lung microbiota analysis. We thus designed this preliminary methodological pilot study. Samples of a subset of the rst included patients were selected. Control patients (C) were patients who were eventually ventilated no more than 3 days because of death or ventilation weaning. These patients were of no interest for the longitudinal study and were also selected for the methodological study. We also selected only three patients who developed a VAP (P) for this preliminary study, in order to keep the maximum of VAP patients included in the longitudinal study.

VAP de nition
We used the modi ed CPIS (clinical pulmonary infection score) when a VAP was clinically suspected [15]. A score above six with a bacterial growth of a pathogen above 10 4 cfu/mL in a bronchoalveolar lavage (BAL) de ned a VAP.

Samples
Two endotracheal aspirates (ETA) for studying lower respiratory tract (LRT) and one oropharyngeal swab (OS) for studying upper respiratory tract (URT) were sampled within the rst six hours after intubation and every 72 hours afterward. One ETA and the OS were immediately stored at -80 °C. The second ETA was sent to the microbiology lab for standard culture. When a VAP was suspected, a BAL was performed in addition to the other samples and one vial was stored at -80 °C. The other vials were studied in the clinical microbiology laboratory of Louis Mourier hospital in order to detect the largest set of bacterial species. For conventional microbiology, ETA were diluted to 10 − 3 or to 10 − 5 and plated on four different media: aerobic blood agar, anaerobic blood agar, BY-chocolate agar, Drigalski agar.
For control patients, only one couple of ETA and OS on day 0 (D-0) was selected. For VAP patients, two couples of URT and LRT samples, one on a day between 3 or 6 days before the onset of the pneumonia (D-before) and one on the day of VAP diagnosis (D-VAP) were selected.

Demographic and clinical data
Clinical data including demographic data, underlying conditions, admission diagnosis in ICU, community-acquired pneumonia (CAP) and VAP diagnosis, type and duration of antibiotic administration, length of mechanical ventilation, ICU length of stay (ICU LOS), ICU mortality were extracted from the electronic medical le.
Positive controls strains (PC) and mock community Five bacterial strains were selected to create a mock community. The strains were selected to represent a great diversity with ve different genera typically involved in nosocomial pneumonia ( Table 1).The different bacterial strains were grown independently in lysogeny broth (LB medium) with constant shaking at 220 rpm. Bacteria were grown in aerobic atmosphere in an INFORS HT Multitron during 18 h at 37 °C. They were then centrifuged and washed with saline solution. All cultures were individually diluted to obtain an optical density at 600 nm (OD 600 ) of 0.4 and then pooled. This pool constitutes a mock community that was used as a positive control (PC) in each batch of extraction. Speci c 16S rRNA gene PCR ampli cation Following bacterial DNA extraction, the V4 region of bacterial 16S rRNA gene was ampli ed using two Illumina MiSeq barcoded primer sets. Noteworthy, the V4 region has been chosen as it is discriminative enough to characterize microbial communities and permitted taxonomic assignment to genus or species-level in many prior studies [14].
Thermocycler conditions were as follows: heated lid 115 °C, 95 °C x 3 min, followed by 32 cycles of 95 °C x 30 s, 55 °C x 30 s, 72 °C x 30 s), followed by 72 °C x 5 min ad held at 4 °C. Thirty-two cycles was chosen as the best compromise between nonspeci c ampli cation and the greatest number of samples successfully ampli ed.
PCR products were puri ed using a 0.9 volume of AMpure magnetic bead-based puri cation system (Beckman Coulter, Inc, Atlanta, Georgia) and eluted in low Tris-EDTA buffer. DNA yield were quanti ed in puri ed samples using Qubit® uorometric quanti cation (Qubit dsDNA Hs Assay Invitrogen→). Successful speci c ampli cation and purity were veri ed using DNA 1000 high sensitivity reagents in 4200 Tape Station Agilent bioanalyseur (Agilent Technologies®).
16S amplicon preparation for next generation sequencing 16SDNA amplicons were normalized for the second PCR as recommended, except in case of small amounts of DNA yielded from the rst PCR. Second PCR reaction contained 5 µL of the rst PCR products, 1 µL Acutaq polymerase, 5 µL Acutaq 10x buffer, 2 µL dNTP, 5 µL of each Illumina sequencing primer (Nextera XT Index Kit v2 Set D, 96 indexes) and 27 µL DNase RNase free distilled water (UltraPure™ Invitrogen®). Thermocycler conditions on C1000™ Thermal Cycler (BIO-RAD®) were as follows: heated lid 115 °C, 95 °C x 3 min, followed by 12 cycles of 95 °C x 10 s, 55 °C x 30 s, 72 °C x 30 s), followed by 72 °C x 5 min and held at 10 °C. DNA yield were quanti ed in puri ed samples using Qubit® uorometric quanti cation (Qubit dsDNA Hs Assay Invitrogen→). Successful speci c ampli cation and purity were veri ed using DNA 1000 high sensitivity reagents in 4200 Tape Station Agilent bioanalyseur (Agilent Technologies®). Samples were prepared for sequencing using standard protocols as recommended by Illumina . First samples were normalized to 2 nM and pooled in an equimolar concentration, then denatured using NaOH 0,2N to obtain a denatured library at 20pM. Libraries at 5pM were obtained combining library at 20pM and HT1 (hybridation buffer) and were mixed with Illumina generated PhiX control libraries (5% of 12pM PhiX solution). Eighty-four samples were loaded at 5pM and sequenced in a single Illumina sequencing run, using the indexing strategy and 250-bp paired-end reads (V2 500 cycle kit) permitting high quality coverage of the 370 bp V4 amplicon. Median read lengths were 251 bp from each end; paired reads were ltered to require minimum overlap of 130 bp.

Bioinformatics analysis
The quality of all the sequences was systematically veri ed (additional le 1). Brie y, sequences were processed using Mothur pipeline version 1.39.5 [19]. Paired-end sequencing reads of 16S rRNA were rst merged and the resulting concatenates were then subsequently ltered and only those that met the following criteria were analyzed further : (a) sequence were at least 170 bp in length; (b) sequence were shorter than 265 bp; (c) had no ambiguous base; (d) had a maximum homopolymer length of 5. Chimeric sequences were ltered out using the chimera.vsearch function of mothur. Sequence alignment to the 16S database: silva.nr_v132 and then de novo operational taxonomic unit (OTU) clustering was performed on the basis of sequence similarity with a threshold of 3% using Mothur's opticlust algorithm.
Taxonomic assignment was also performed with mothur, up to the rank of family genus or species according a threshold identity of ≥ 80%. The resulting OTU table count was then ltered further by removing all OTU's represented by less than three couples of paired-end reads overall. Rarefaction was done by random subsampling with 5000 reads repeated 1000 times. All the script of the analysis is deposited on https://github.com/RespiratoryMicrobiomeMethods. All sequences are deposited in the European nucleotid archive under the study submission number XXXX (it will be determined soon).

Data analysis
Statistical analysis was performed in R v3.6.3 with the help of the tidyverse packages [20] [21] For each set of primers and for each samples, number of sequences initially obtained, number of sequences nally conserved for analysis and rarefaction curves were compared. Rarefaction curves revealed for each sample: (a) the log 10 of the number of read generated versus the number of OTUs identi ed (b) the log 10 of the number of read generated versus the Shannon's H' (Shannon diversity index). Patients samples were classi ed in three groups: control patients, VAP patients on day before VAP (D-before) and VAP patients on VAP day (D-VAP). For each patients LRT and URT are represented.
The relative abundance table ranked by order were represented by histograms, at the level of family for all VAP and control patients' samples, and at the level of genus for PC. In order to facilitate understanding and readability of the gures OTUs identi ed were presented at the family level, not at the genus level, and only families with relative abundances greater than 7% of the total are represented individually, those under 7% being grouped and labeled as 'other'. This threshold was arbitrarily chosen to keep the gures readable . The full table of all recovered taxa is available as supplementary table   (Additional le 2).
For all samples, results obtained with the two primer sets were compared to conventional microbiology. The concordance with conventional microbiology was the main criteria for primer pair validation and comparison and was retained if the pathogen(s) identi ed in culture was (were) identi ed by metagenomics. The concordance between the two primer pairs was also assessed and retained if the two most abundant OTUs were similar.
Microbial Alpha diversity was assessed and expressed as the Shannon diversity index (Shannon's H') for normalized numbers of sequences for each sample. The Shannon diversity index gives a measure of the species diversity in a given community while taking into account both the abundance and the evenness of species present in a community [22]. Beta diversity was given as weighted (and unweighted) Unifrac distance, and Bray-Curtis distance as these two are the most used indices and represent most features of beta diversity after being combined [23] [24].
All qualitative values are expressed as percentages and all quantitative values are expressed as median and interquartile range.

Subject characteristics, samples collection and DNA extraction.
Eight patients were initially included in this methodological pilot study. They represented a diverse set of underlying diseases and acute indications for mechanical ventilation. Five controls patients (C1 to C5) nally ventilated three days and three patients who developed a VAP were included (P1 to P3). One patient (C1) was nally excluded because of technical failure and the impossibility to perform speci c 16S rRNA gene ampli cation. The patient's characteristics are presented in the Table 2. All VAP patients developed a late-onset VAP, after the sixth day of invasive mechanical ventilation (additional le 3). For each control patient, a couple of samples, one from URT and one from LRT within the rst six hours of initiation of the mechanical ventilation, were studied. For each VAP patients two couples of samples, URT and LRT, on a day before VAP diagnosis (D-before), and on day of VAP diagnosis (D-VAP) were sequenced (Fig. 1).
Generally, ETA contained high amount of human host DNA, explaining why, despite high Qbit quanti cation results (between 10 ng/µL and 50 ng/µL), only a small amount of bacterial DNA could be ampli ed. In contrast, OS contained little amount of global extracted DNA (between 0,1 ng/µL and 10 ng/µL) which were su cient for speci c V4 ampli cation.
One positive control (PC) named PC1 and three negative control (NC) named NC1 to NC3 were also processed and sequenced. Overall, 24 samples were ampli ed and sequenced twice using both primer sets 515F-806R "S-V4" primer pair and "R-V4" primer pair.
2. Determining the optimal V4 primer pair between "S-V4" primer pair and the "R-V4" primer pair Assessment of sequencing quality For the "S-V4" primer pair and the "R-V4" primer pair the median number of bacterial reads used for taxonomic classi cation total were 121721 (29166 ; 150342) and 98983 (34324 ; 129888), respectively. Table 3 details the total number of sequences obtained, nally conserved for analysis and length of amplicon obtained per sample for each primer pair. The ratio of the median number of reads analyzed to the median number of total reads was higher for the "S-V4" primer pair 0.74 (0.38;0.85) than for the "R-V4" primer pair 0.64 (0.23;0.76). The median length of the amplicon obtained with "S-V4" primer pair is closer to the expected length of 440 bp than with "R-V4" primer pair with 435 vs 428 bp respectively. Sequence quality was similar with a mean Phred score of 30 per sequence and respectively 152/168 and 109/168 sequences passing the lter without warning considering median Phred score per sequence and mean Phred score across each base position. Details are presented in additional le 1. For "S-V4" primer pair and "R-V4" primer pair qualitative sequencing parameters are represented by number of total reads observed (column 1), number of bacterial reads conserved for analysis (column 2), ratio between these two parameters (column 3) and median length of the amplicon obtained (column 4). The length of amplicons expected theoritecally is 440pb. For control patient (C) day before diagnosis of VAP "D-before" is always D-0 which means day of intubation. Alpha diversity estimates are represented by number of OTU identi ed and Shannon's H'. For "S-V4" primer pair and "R-V4" primer pair qualitative sequencing parameters are represented by number of total reads observed (column 1), number of bacterial reads conserved for analysis (column 2), ratio between these two parameters (column 3) and median length of the amplicon obtained (column 4). The length of amplicons expected theoritecally is 440pb. For control patient (C) day before diagnosis of VAP "D-before" is always D-0 which means day of intubation. Alpha diversity estimates are represented by number of OTU identi ed and Shannon's H'.
Rarefaction curve for number of observed OTUs and Shannon index showed similar pro le for the "S-V4" primer pair ( Fig. 2A-B) and the "R-V4" primer pair (Fig. 2 C-D)  With the rarefaction threshold of 5.10 3 , number of OTU and Shannon index identi ed in each sample are compared for the two primer pairs and are noticeably different (additional le 4). For "S-V4" primer pair no OTUs were identi ed in the NC.

Concordance between 16S sequencing and conventional microbiology
The main criteria was the concordance between metagenomics and conventional microbiology for PC and for lower respiratory tract samples (ETA).
For PC, relative abundance of OTUs greater than 1% of total number of sequences were represented at the taxonomic level of genus on the Fig. 3. With the "R-V4" primer pair, Staphylococcus was not identi ed and relative abundance of Pseudomonas reached 98%. With the "S-V4" primer pair all ve bacteria were identi ed but their relative abundance were not strictly equal as Staphylococcus and Pseudomonas were less represented. For PC, "S-V4" primer pair provides the best agreement with conventional microbiology.
For the patient's samples, concordance between 16S sequencing and conventional bacterial microbiology was retained if the known bacterial species identi ed using conventional microbiology were also identi ed by metagenomics.
Of the ten lower respiratory tract samples, one was not cultured by conventional bacteriology and only one was sterile. The agreement between 16S sequencing and conventional microbiology was excellent with 89% (8/9) concordant samples for "S-V4" primer pair versus 44% (4/9) for "R-V4" primer pair. Major differences between the two primer pairs were due to the lack of identi cation of Enterobacteriaceae with the "R-V4" primer pair. For patient P1 on D-9 and for patient P2 on D-12 Serratia Marcescens and Escherichia coli were not identi ed with "R-V4" primer pair (Table 4). For P3, Escherichia coli was also not identi ed with both pair of primers. Staphylococci were also underrepresented with the "R-V4" primer pair.  NA: no culture was obtained for the sample or when there was a single species; ESBL: extended spectrum betalactamase; MV: mechanical ventilation Comparison for taxonomic composition and OTU relative abundance for all patients and respiratory tract site samples Sequences were classi ed to the lowest taxonomic designation possible. The relative abundance of bacteria classi ed at family taxonomic level, is represented for control patients in Fig. 4 and for VAP patients in Fig. 5 for both pairs of primers.
Agreement between the two primer pairs for all patients samples was retained if the two most abundant OTUs were similar.
For both primer pairs and considering all patients samples, three major phyla were found: 24% Proteobacteria, 43% Firmicutes and 24% Bacteroidetes. Metagenomics revealed nonpathogenic bacteria belonging to the genus of Veillonella, Bacteroides, Prevotella and Neisseria globally identi ed as "normal respiratory ora" or "mixed ora" or "oropharyngeal ora" in conventional microbiology (Table 4).
However, the composition of the microbiota was similar in only about half of the cases between the two primer pairs (additional le 5). For control patients, results were identical between the two primer pairs for C2, C4 and C5 but different for C3, especially for the lower respiratory tract due to the identi cation of Staphylococcaceae for C3 with "S-V4" primer pair.
Generally, for VAP patient on D-before results with both primer pairs were similar except for P1 because an additional OTU was identi ed at family level as Enterobacteriaceae with "S-V4 primer pair". For VAP patient on D-VAP, major differences appeared for P1 for the same reason. For P2, the differences were minimal and mainly due to the difference in relative abundance of each OTU, although the latter were identical in both cases.
3. Assessment of 16S rRNA gene pro ling for all patient samples with "S-V4" primer pair Assessing taxonomic composition and OTU relative abundance across respiratory tract site and time As "S-V4" primer pair provides the best agreement with conventional culture for PC and also with conventional microbiology for patients samples, URT and LRT bacterial composition was compared only with this primer pair. A crucial point was that for each patients URT and LRT pro les were different.
For VAP patients, more OTUs were identi ed in the URT as compared to the LRT on D-before and on D-VAP. On D-VAP, a single OTU was largely dominant in the LRT: Enterobacteriaceae for P1, Neisseriaceae for P2, and Streptococcaceae for P3. These dominant OTUs represented the pathogen responsible for VAP for P1 and P3 and were also present in the URT. For P2 patient Enterobacteriaceae that represented the pathogen responsible for the VAP was identi ed but represented less than 7% of all OTUs. It was thus represented in the 'other' category on Fig. 4A. For P1 patient Enterobacteriaceae became predominant in URT and LRT between D-0 and D-VAP whereas for P3 patient Streptococcaceae was not present in both samples on D-before and appeared on D-VAP (Fig. 5A-B).
Overall, the pathogen identi ed by conventional microbiology was not always found to be the most abundant in the microbiome by metagenomics analysis.
Assessing bacterial alpha diversity across respiratory tract site and time Alpha diversity was higher in URT as compared to LRT with an average Shannon of 2,5 [1,7; 2,7] and 1,8 [0,7; 2,8], respectively. For control patients at D-0, alpha diversity was higher in URT than in LRT except for patient C4. Indeed, all patients but C4 were admitted for community-acquired pneumonia (Fig. 6-A). Changes in diversity over time for VAP patients are represented in Fig. 6 (Fig. 7).
Considering the comparison between these three groups of samples weighted Unifrac distance was different only for LRT.
Weighted Unifrac distance was higher for VAP patients on D-VAP. For the later, there were a raise in weighted Unifrac distance between D-before and D-VAP only in LRT. This means that in LRT a speci c OTU for each patient become predominant between D-before and D-VAP. Results were similar using Bray-Curtis distance (additional le 6).

Discussion
While DNA sequencing continues to decrease in cost and more and more sequencing platforms become available, the number of studies investigating the microbiota of diverse environments have considerably increased. However, the difference in protocols and analysis between studies limits largely the comparison between publications. This heterogeneity in the methods is the main obstacle for generalizing conclusions and for eventually providing useful data for clinical practice. In addition, only few methodological studies were conducted on mock communities [5] [14] [2]. Methodological studies are thereby essential to determine the optimal method for each type of analysis, in order to de ne a consensual methodology.
Numerous factors in uence results of 16S rRNA gene amplicon sequencing. Previous studies have independently examined one speci c factor such as extraction procedure [25][26], the ampli ed region of the 16S rRNA gene or the choice of the sequencing platform [27]. Considerable differences occurred based on the sequencing platform: 454-pyrosequencing platforms, Illumina platforms or Ion Torrent platform [28]. Recently, Fouhy et al highlighted that differences occurred between data sets from two studies on a same mock DNA templates, likely due to a combination of factors including DNA extraction procedure, primer choice and sequencing plateform [5].
In contrast to the huge number of microbiome studies in the environment [29] or the digestive tract [6] [30], the pulmonary microbiota especially in the eld of respiratory infections remain largely unexplored. Due to its low microbial density, some speci cities exist concerning the methods and analysis for this particular microbiota. Indeed, bacterial density seemed to interfere with microbiota analyses at < than 10 6 bacteria per ml or DNA < 1 pg/ml [31].
Our preliminary work has revealed issues from targeting the V3-V4 hypervariable region using a 300-bp paired-end sequencing (V3 600 cycle kit) performed on Illumina Miseq plateform. Since V4 region provides one of the best accuracy in several studies [18] and in the most recent study of Fouhy et al [5], we decided to focus on this hypervariable region. The following step was to determine the primer pair to use. We thus compare two primer pairs based on the Human microbiome project and the Earth microbiome project (515F-806R primer pair).
This work is the rst methodological study concerning pulmonary microbiota performed on both the URT and the LRT.
Trying to identify the best methodology, we believed it was crucial to use different comparators or positive controls: an internal bacterial positive control (mock population) and conventional microbiology. Because many studies did not do so, it seems hard to interpret the results of many studies revealing bacterial genus or species which are usually not identi ed in respiratory samples [32] [33]. It is thereby impossible to determine which methodology or primer set give the most reliable results. In our work, the results obtained using the "R-V4 primer pair" and the "S-V4 primer pair" were strikingly different, the later providing overall much reliable data. First, the median length of the amplicon obtained with "S-V4 primer pair" (435 bp) was closer to the expected length of 440 bp. Second, the "R-V4 primer pair" was not able to reveal Staphylococcus genus with an important disequilibrium in favor of the Pseudomonas genus (almost 98% of the OTU identi ed while the three Enterobacteriaceae represented almost 2%). In contrast, the "S-V4 primer pair" allowed identi cation of all ve bacterial species present in the mock community. These results con rmed that Enterobacteriaceae are somehow di cult to identify [14] This is a crucial point for lung microbiome study because they represent one of the main etiology of VAP.
When comparing sequencing results with conventional microbiology, the "S-V4 primers pair" also provided a satisfying agreement between the two methods (89%). Obviously, the sequencing allowed identi cation of many more bacteria than the conventional microbiology, such as Prevotella, Neisseria or Fusobacteria, which are less cultivable.
Considering negative control, results were also discordant. For "S-V4 primer pair" no OTUs were identi ed as expected whereas few OTUs corresponding to non-clinical bacteria were identi ed with the "R-V4 primer pair". These OTUs could be real contaminants but more probably arti cial identi cation due to errors of ampli cation as accuracy of the "S-V4 primer pair" is better for PC.
For each patients URT and LRT pro les were not similar whatever the primer pair. We believe this is of high importance as many teams working on pulmonary microbiota based their results only on the URT microbiota and did not explore, for practical reasons, the LRT microbiota. [34]. Our study highlighted that this concept seems false for ventilated patients as for patients with chronic pulmonary diseases.
Thereafter we focused our analyses using the "S-V4 primer pair". Shannon diversity index I observed in our patients were concordant with previous studies (between 1.5 and 3 depending on the presence of a lung infection or not) [33][32] [35]. We observed a higher alpha diversity in URT microbiota than in LRT microbiota for control patient. Interestingly, as previous studies, we found lower alpha diversity in LRT microbiota on D-VAP than in D-before, meaning that alpha diversity tended to decrease in LRT samples over time in case of VAP development [35]. Of course, our data on only three patients are insu cient to describe a typical evolution but, using our methodology, we found results similar to previous larger studies [35] [36].
Finally, considering Weighted Unifrac distance, results were similar with both for VAP patient, with increased beta diversity from D-before to D-VAP. It can be explained by a more profound dysbiosis on D-VAP with the emergence of VAP bacterial pathogens with an advantage of growth, different for each patient. Zakharkina et al which found similar results, hypothesized that the bacteria responsible for VAP (Staphylococcus, Acinetobacter and Pseudomonas) would exclude other bacterial communities. [35] This methodological study has limits. First, the small number of samples limits the conclusion we may draw from this study. As this study has for main objective to validate a global methodology for larger cohort study we decided to limit samples number included. We have insu cient number of patients and samples to apply statistical test and show signi cant differences between URT and LRT or between the beginning of mechanical ventilation and the day of VAP development. This prevented us from drawing conclusions based on the analysis of alpha and beta diversities. Second, for several OTUs, there is no identi cation until the species level. Obviously, 16SrRNA gene sequencing is not the most suitable technique for identi cation at the species level and our results are concordant to previous results. This limit depends of the database used and as SILVA database is updated we can hope to identi ed more OTUs at the species level in the close future. Third, we focus only on PCR primer choice and our study exclude the potential effects of others aspects of sample processing. As there is no previous methodological study concerning lung microbiota it would have been interesting to evaluate all the potential bias to found the optimal methodology. However, as our global methodology leads us to process all type of samples (BAL, ETA and OS) with concordant results with conventional microbiology, we conclude that it was adapted to conduct larger cohort study. In addition, preliminary data con rmed that we are able to study the virome and the lung mycobiota using the same extracted samples (unpublished data).

Conclusion
This is the rst methodological study conducted on the LRT from ventilated patients in order to validate a suitable method for lung microbiota analysis. This study highlights the importance to evaluate the protocol and to confront metagenomics data with different positive controls such as a mock population and conventional microbiology.
Our methodology based on NucliSENS easyMAG total nucleic acid automated extraction, V4 16S rRNA gene ampli cation with the 515F-806R "S-V4 primer pair" and sequencing on Illumina® MiSeq plateform provided satisfying sequencing depth and identi cation of pathogens responsible for VAP at family and genus level. All samples provide from patients initially included in a cohort study untitled " Project Lung microbiota of ventilated ICU patients : towards a new understanding of nosocomial pneumonia". The ethics committee of the French intensive care society approved the study (reference CE SRLF 15-41). Informed consent was obtained from patients' next-of-kin and con rmed by the patients whenever possible.

Consent for publication
All patient (or next-of-kin by default) included were orally informed and consent obtained for participation and data publication.
Availability of data and material (data transparency) All sequences are deposited in the European nucleotid archive under the study accession numbers XXXX (will be done very soon). Analysis script has been uploaded on github.com. and is available at https://github.com/RespiratoryMicrobiomeMethods.
All supplementary experimental data are available in additional les linked to this article.