A Metagenomics Method for the Quantitative Detection of Pathogens Causing Ventilator-Associated Pneumonia

Sébastien HAUSER (sebastien.hauser@biomerieux.com), bioMerieux, https://orcid.org/0000-0001-5076-2730
Vladimir Lazarevic, Genomic Research Laboratory
Maud Tournoud, bioMérieux: bioMerieux SA
Etienne Ruppé, AP-HP: Assistance Publique Hopitaux de Paris
Emmanuelle Santiago Allexant, bioMérieux: bioMerieux SA
Ghislaine Guigon, bioMérieux: bioMerieux SA
Stephane Schicklin, bioMerieux SA
Veronique LANET, bioMérieux: bioMerieux SA
Myriam Girard, Genomic Research Laboratory
Caroline Mirande-Meunier, bioMérieux: bioMerieux SA
Gaspard Gervasi, bioMérieux
Jacques Schrenzel, Genomic Research Laboratory


Background
Hospital-acquired pneumonia (HAP) and ventilator-associated pneumonia (VAP) are common nosocomial infections causing high morbidity rates [1]. Optimal care for HAP/VAP requires prompt detection and identification of the causative pathogen.
Today, the "gold standard" method for the microbiological diagnosis of HAP/VAP is culture-based [2] and requires at least 24 h to identify the causative pathogen [3]. Molecular methods that are independent of cultivability may improve speed and analytical sensitivity [4] over conventional tests [5][6][7]. Clinical applications of metagenomic next-generation sequencing (mNGS) are currently drawing the attention of infectious disease specialists [8,9], and guidelines for NGS-based diagnostics have emerged [10][11][12].
However, the technical requirements for NGS diagnostics remain to be defined. To control for the integrity of the reagents, equipment functionality, and the potential presence of inhibitors, the US Food and Drug Administration (FDA) [12] recommends the use of an internal control (IC), typically a "foreign sequence" co-extracted and co-analyzed with the sequences of the sample. Spiked nucleic acids [13] or exogenous bacteria [14] can be used as IC. Moreover, metagenomics analyses require control of the quantity and quality of the DNA extracted from the sample as well as of the concentration, purity and size distribution of the sequencing library [15]. These control steps consume sample volume and are time-consuming, expensive and tedious to perform.
We propose here a sample processing control (SPC) that is spiked into the clinical sample and follows the entire analytical process. A positive SPC detection indicates that all steps of the sample processing were successful and that sequencing data can be safely interpreted. This concept is widely used for integrated PCR cartridge systems such as GeneXpert® (Cepheid), which contains Bacillus globigii spores, or FilmArray® (bioMérieux), which includes Schizosaccharomyces pombe cells. However, these SPCs cannot be directly transposed from PCR to mNGS assays. As mNGS relies on random sequencing of DNA fragments, the detection of SPC competes with the detection of pathogens, commensal microbiota (flora) and the DNA of the patient. Therefore, the efficacy of the detection of SPC and pathogens also depends on the biological composition of each sample (relative abundance of microorganisms and the ratio of microbial to human cells) [16][17][18]. In addition, clinical samples may contain large quantities of dead bacteria and extracellular bacterial DNA [19]. The depletion of DNA from human cells and extracellular (human and bacterial) DNA is possible using selective lysis of host cells followed by endonuclease digestion. The SPC detection can then establish whether the mNGS is able to detect pathogens at a defined minimal concentration.
Clinically-defined thresholds of pathogen concentrations are used to distinguish infection from asymptomatic colonization [20]. For HAP/VAP diagnosis, culture-based thresholds are currently defined at 1.0E+4 colony-forming units (CFU) per mL for bronchoalveolar lavage (BAL) samples and 1.0E+3 CFU/mL for mini-BAL samples [21][22][23]. Therefore, mNGS should also provide absolute quantification of the detected pathogens. Various experimental approaches have been proposed including cell counting by flow cytometry [24], normalization of bacterial relative abundance based on defined cell numbers that are spiked into the samples before nucleic acid isolation [25] or use of spiked nucleic acids [26]. However, these designs cannot provide an absolute quantification of pathogens.
To control all processing steps and to quantify the abundance of the detected pathogen(s), we spiked samples with rehydrated BioBall® (bioMérieux) as an SPC. Diagnostic capabilities of the mNGS workflow were assessed in BAL samples using a panel of bacterial species that commonly cause HAP/VAP infection.

Sample preparation and DNA extraction
Six hundred microliters of (mini-)BAL fluid supplemented with SPC were mixed with 6 mL of Tris-HCl 50 mM (pH 8) containing PEG 4 % and saponin 0.08 % and incubated at room temperature for 10 min for the selective lysis of human cells. After saponin treatment, the non-lysed cells were pelleted (12,000 g for 10 min) and treated with DNase I (5,000 U, 37°C, for 15 min). The DNase was inactivated by heating (80°C for 10 min) and addition of EDTA (10 mM final concentration). The sample was then added to a tube containing a mix of 1 mm glass beads (600 mg per 1.5 mL tube) and 0.1 mm zirconia/silica beads (150 mg per 1.5 mL tube), and bacteria were disrupted by shaking for 20 minutes on a vortex with a horizontal tube holder. Nucleic acids were extracted from the lysate on an easyMAG® platform (bioMérieux) using the generic protocol (V2.0.1). Elution was carried out in a volume of 25 µL and the extracts were stored at -20°C. Human and bacterial DNA loads were determined as described previously [18].

Library preparation and MiSeq® sequencing
Libraries were prepared for 2 x 250 paired-end sequencing with a modified protocol of the Nextera® XT DNA Library Preparation Kit (Illumina®).
To allow efficient preparation of DNA libraries for all extracts, the protocol was adapted to run with only a few picograms of DNA [27,28]. The following modifications were made to the standard protocol: i) when the concentrations of extracted DNA were above 0.2 ng/µL, 1 ng was used for the preparation of the library; otherwise, a maximum volume of 5 µL of DNA extract was used, regardless of its concentration, ii) after tagmentation, 16 amplification cycles were applied to the DNA library in order to obtain enough material for sequencing, iii) the library was purified using 25 µL of AMPure XP beads (Beckman Coulter). Indexed libraries prepared from 2 samples were pooled at equal quantities before sequencing on the MiSeq® platform (see Figure 1) with the MiSeq Reagent kit V3 (Illumina) following manufacturer's instructions.
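The DNA input rule in modification i) can be written as a small helper. This is only an illustrative sketch: the function name and parameters are hypothetical, and the only values taken from the text are the 0.2 ng/µL cut-off, the 1 ng target and the 5 µL maximum volume.

```python
def library_input_volume_ul(conc_ng_per_ul, target_ng=1.0, max_volume_ul=5.0):
    """Volume of DNA extract (in µL) to use for library preparation.

    Above target_ng / max_volume_ul (0.2 ng/µL with the defaults), the
    volume is chosen to deliver target_ng of DNA. Otherwise the maximum
    volume is used, whatever the concentration.
    """
    if conc_ng_per_ul < 0:
        raise ValueError("concentration must be non-negative")
    if conc_ng_per_ul > target_ng / max_volume_ul:
        return target_ng / conc_ng_per_ul
    return max_volume_ul


# Hypothetical extracts: one concentrated, one very dilute.
print(library_input_volume_ul(0.5))   # 2.0 µL delivers 1 ng
print(library_input_volume_ul(0.05))  # 5.0 µL delivers only 0.25 ng
```

Note that 0.2 ng/µL x 5 µL = 1 ng, so the two branches agree at the cut-off.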

Bioinformatics pipeline
For identification of pathogens, sequence reads were analyzed using the metagenomics pipeline described by Tournoud et al. [29,30]. The detection of antibiotic resistance genes (ARG) was not assessed in this study.
Briefly, the bioinformatics pipeline consisted of the following steps: quality control of the reads, trimming and filtering of poor quality reads and taxonomic read binning with Kraken [31] using an internal reference database including more than 10,000 genomes from the 20 SOIs, B. subtilis, bacteria found in the lung and oral cavity and the human genome (Homo sapiens genome assembly GRCh37). Bacterial genome sequences were from both public (e.g. PATRIC [32], RefSeq [33], FDA ARGOS [34]) and private (strains sequenced from BAL samples) databases. The median number of genomes per species was 42 with interquartile-range equal to [19; 297] and the total number of genomes belonging to the flora was 7593. The performance of taxonomic classification of simulated sequence reads of SOI, B. subtilis and human from the internal reference database is reported in Additional figure 1.
Pathogen detection and quantification relied on the number of reads attributed to each SOI. To avoid spurious pathogen detection due to erroneous taxonomic classification, reads associated to a SOI with an average genome coverage depth > 1x were assembled. Assembly was performed using the idba_ud500 assembler (IDBA-UD 1.1.1) [35], with the following parameters: --mink 40 --maxk 250 --min_pairs 2. To confirm species identification, the assemblies were aligned with BLAST [36] against a pathogen marker database built by selecting clade-specific MetaPhlAn2 [37] and 16S markers (for H. alvei, P. vulgaris, and M. morganii, for which no MetaPhlAn2 markers were available) to allow unambiguous taxonomic assignment to the 20 SOIs. The BLAST algorithm was run with 75 % coverage and 97 % identity. When at least one marker of the tested SOI was detected, taxonomic assignment of the sequences to the SOI was confirmed. When a marker from another species was detected but none from the tested SOI, the tested SOI was invalidated to avoid false positive detection induced by misclassification of the reads. When no pathogen marker was detected, mainly because of low-coverage assemblies, no confirmation or invalidation of taxonomic classification to SOI was reported.
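The three outcomes of the marker search described above can be summarized as a small decision function. This is a minimal sketch, not the pipeline's code: the names are hypothetical, and it assumes the BLAST hits have already been filtered at 75 % coverage and 97 % identity and reduced to the species each matched marker belongs to.

```python
from enum import Enum


class MarkerStatus(Enum):
    CONFIRMED = "confirmed"
    INVALIDATED = "invalidated"
    UNCONFIRMED = "unconfirmed"  # neither confirmed nor invalidated


def marker_status(tested_soi, marker_hit_species):
    """Classify a SOI from the species of the markers found in its assembly.

    marker_hit_species: iterable of species names, one per marker hit that
    already passed the coverage and identity cut-offs (assumed input).
    """
    hits = set(marker_hit_species)
    if tested_soi in hits:
        # At least one marker of the tested SOI was found.
        return MarkerStatus.CONFIRMED
    if hits:
        # Only markers from other species were found.
        return MarkerStatus.INVALIDATED
    # No marker at all, e.g. because the assembly has low coverage.
    return MarkerStatus.UNCONFIRMED


print(marker_status("S. aureus", ["S. aureus", "S. epidermidis"]))
print(marker_status("S. aureus", ["S. epidermidis"]))
print(marker_status("S. aureus", []))
```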

Sequence data analysis and interpretation
The complete mNGS workflow is presented in Figure 1. For each sample, all SOIs were analyzed separately as independent detection assays. Detailed procedure and calculations are provided in the Additional file. Briefly, data interpretation consisted of 4 steps.
Step 1: SOI and SPC detection
We defined detection thresholds (DTs) to differentiate background from 'true' taxonomic classification of sequence reads as SPC or SOI within sequenced sample (see Additional file). These DTs represent the minimal quantity of classified sequence reads normalized per million of bacterial reads required to report SPC or SOI as detected.
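Step 1 amounts to a normalization followed by a threshold comparison, which can be sketched as follows. The function names and the example numbers are hypothetical; the actual DT values are given in the Additional file and are not reproduced here.

```python
def reads_per_million(classified_reads, bacterial_reads):
    """Normalize a read count per million bacterial reads (RN)."""
    if bacterial_reads <= 0:
        raise ValueError("bacterial_reads must be positive")
    return classified_reads * 1_000_000 / bacterial_reads


def is_detected(rn, detection_threshold):
    """A target is reported as detected when RN reaches its DT.

    Detection fails when RN < DT, so the comparison is inclusive.
    """
    return rn >= detection_threshold


# Hypothetical sample: 50 reads classified as the SPC out of 200,000
# bacterial reads, compared to an illustrative DT of 100 reads per million.
rn_spc = reads_per_million(50, 200_000)
print(rn_spc, is_detected(rn_spc, 100.0))
```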
Step 2: Calculation of absolute concentration or minimal detectable concentration of SOI
As SPC is added at a defined concentration (C_SPC = 1.7E+4 CFU/mL) within each sample, it can be used as a calibrator.
- When SOI and SPC are detected (Figure 1), the absolute concentration of SOI (C_SOI) can be calculated as:
- When a SOI is not detected (Figure 1), its minimal detectable concentration (Cmin_SOI) can be calculated if SPC has been detected:
RN_SOI and RN_SPC correspond to the normalized quantity of sequence reads associated to SOI and SPC, respectively. L_SOI corresponds to the genome size of SOI.
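The expressions themselves do not appear in this text, so the following is only a sketch of one common spike-in calibration, not the authors' formula. It assumes that the number of reads per genome is proportional to concentration, so that C_SOI = C_SPC x (RN_SOI / L_SOI) / (RN_SPC / L_SPC), and that Cmin_SOI is obtained by replacing RN_SOI with the detection threshold DT_SOI. The SPC genome size L_SPC, the function names and all example values other than C_SPC are assumptions.

```python
C_SPC = 1.7e4  # SPC concentration stated in the text (CFU/mL)


def soi_concentration(rn_soi, rn_spc, l_soi, l_spc, c_spc=C_SPC):
    """Assumed calibration: length-normalized read ratio times C_SPC."""
    if rn_spc <= 0 or l_soi <= 0 or l_spc <= 0:
        raise ValueError("rn_spc and genome sizes must be positive")
    return c_spc * (rn_soi / l_soi) / (rn_spc / l_spc)


def min_detectable_concentration(dt_soi, rn_spc, l_soi, l_spc, c_spc=C_SPC):
    """Assumed Cmin: the concentration that would just reach DT_SOI."""
    return soi_concentration(dt_soi, rn_spc, l_soi, l_spc, c_spc)


# Hypothetical values: genome sizes in bp, RN and DT in reads per million.
c_soi = soi_concentration(rn_soi=200, rn_spc=100, l_soi=2.8e6, l_spc=4.2e6)
cmin = min_detectable_concentration(dt_soi=10, rn_spc=100, l_soi=2.8e6, l_spc=4.2e6)
print(f"C_SOI ~ {c_soi:.3g}, Cmin_SOI ~ {cmin:.3g}")
```

Under these assumptions the result has the units of C_SPC, so reading it as GEq/mL further assumes one genome per CFU of the SPC.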
Step 3: Comparison to the metagenomics threshold
A metagenomics threshold (MT) was defined (see Additional file) as the concentration in genome equivalent (GEq) of SOI above which infection can be suspected. When a SOI is not detected (Figure 1), the comparison of MT to the calculated Cmin_SOI makes it possible to differentiate the absence of SOI in the sample from the inability to detect this SOI at a concentration defined by the MT.
Step 4: Reporting
In each sample, the result of detection of each SOI is reported separately. The rules of interpretation (see Figure 1 and Additional figure 2) define 4 possible reports:
- Suspected infection, when a SOI is detected and quantified at or above MT or when SOI is detected but not the SPC
- Suspected colonization, when a SOI is detected and quantified below MT
- Absence of detection, when the SOI is not detected and the calculated Cmin_SOI is < MT
- Not interpretable, when both SOI and SPC are not detected or when calculated Cmin_SOI is > MT.
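The four reporting rules can be expressed as one decision function. This is an illustrative sketch with hypothetical names. The case Cmin_SOI = MT is not covered by the rules as listed; it is treated here as "Absence of detection", which is an assumption. The default MT is the value of 5.3E+3 GEq/mL given in the Discussion.

```python
def report(soi_detected, spc_detected, c_soi=None, cmin_soi=None, mt=5.3e3):
    """Return the report for one SOI in one sample.

    c_soi is required when both SOI and SPC are detected; cmin_soi is
    required when the SOI is not detected but the SPC is.
    """
    if soi_detected:
        if not spc_detected:
            # SOI detected but not the SPC: reported as suspected infection.
            return "Suspected infection"
        return "Suspected infection" if c_soi >= mt else "Suspected colonization"
    if not spc_detected:
        # Neither SOI nor SPC detected: the run cannot be interpreted.
        return "Not interpretable"
    # SOI not detected, SPC detected: compare Cmin_SOI to the MT.
    return "Absence of detection" if cmin_soi <= mt else "Not interpretable"


print(report(True, True, c_soi=1.0e5))      # Suspected infection
print(report(True, True, c_soi=1.0e3))      # Suspected colonization
print(report(False, True, cmin_soi=1.0e3))  # Absence of detection
print(report(False, True, cmin_soi=1.0e4))  # Not interpretable
```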
PCR quantification of Staphylococcus aureus
A standard curve was produced from genomic DNA of S. aureus strain MW2 at concentrations ranging between 1 pg and 1 ng and corresponding to 3.1E+2 to 3.1E+5 GEq.
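The correspondence between DNA mass and genome equivalents can be checked with the usual mass-to-copy-number conversion. This is a rough sketch under stated assumptions: an average mass of 650 g/mol per base pair and a genome length of about 2.8 Mb for S. aureus are not taken from the text, and the constants actually used by the authors are not given.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # assumed average mass of one base pair


def genome_equivalents(mass_g, genome_length_bp, bp_mass=BP_MASS_G_PER_MOL):
    """Number of genome copies contained in mass_g of genomic DNA."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return mass_g * AVOGADRO / (genome_length_bp * bp_mass)


ASSUMED_GENOME_BP = 2.8e6  # hypothetical round value for S. aureus
for mass_g in (1e-12, 1e-9):  # 1 pg and 1 ng
    print(f"{mass_g:.0e} g -> {genome_equivalents(mass_g, ASSUMED_GENOME_BP):.2e} GEq")
```

With these assumed constants, 1 pg gives roughly 3.3E+2 GEq, the same order of magnitude as the 3.1E+2 GEq reported, and the three orders of magnitude between 1 pg and 1 ng carry over directly to the GEq range.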

Results
Absolute quantification of S. aureus
We compared the quantity of S. aureus obtained by our mNGS process to the results of qPCR and culture quantifications (Table 2) on 11 samples from the "training set". These samples were either culture-positive for S. aureus (n=10) or culture-negative but with S. aureus detection by mNGS (sample T07).  (Figure 2A). We observed a better correlation (R² = 0.9904) when we compared quantification results of mNGS and qPCR (Figure 2B). Moreover, for sample T07 that was culture-negative, both qPCR and mNGS detected and quantified S. aureus above 1.0E+4 GEq/mL.

Validation of quantitative detection of SOI
We assessed the complete mNGS workflow (Figure 1) with a "validation set" of 40 samples by comparing quantitative detection of SOI by culture and mNGS (Table 3).
Table 3: Detection of SOI(s) above clinical threshold by microbial culture or above MT by mNGS in the "validation set". Log10 of quantified pathogen concentrations in CFU/mL (for microbial culture) and GEq/mL (for mNGS) are presented in the corresponding columns. (>MT) means that SOI was detected but not the SPC, suggesting that detected SOI were likely present at a concentration above MT (Figure 1). (+) in the 16S/MetaPhlAn2 markers column means that taxonomic classification of sequence reads to a bacterial species is confirmed by 16S/MetaPhlAn2 markers search.

Control of the sample processing
The SPC detection failed (RN_SPC < DT_SPC) in 4 samples (10 %) in which at least one SOI was detected as probable infecting agent (samples 3, 7, 23 and 35 in Table 3). In these samples, non-detected SOIs are reported as "not interpretable" (Figure 1), as the minimal detectable concentration may be above the MT.

Quantitative detection of the SOI panel
mNGS results were compared to the current "gold standard" for HAP/VAP diagnosis, i.e. the microbial culture (Table 4). Ten "true positive" detections of probable infection by SOIs and no "false negative" results (Table 3 and Table 4) revealed a test sensitivity of 100 %. With 19 "false positives" (Table 3 and Table 4), the test specificity was 96.8 % but the "false discovery rate" reached 65.5 %. Importantly, we confirmed proper taxonomic classification of 37 % (7/19) of the "false positive" detections by finding specific 16S/MetaPhlAn2 markers. The 12 other "false positive" detections did not yield enough reads to check their taxonomic classification by 16S/MetaPhlAn2 marker search. Sample 27, which was culture positive for E. cloacae complex above 1.0E+5 CFU/mL, led to 7 "false positive" detections (Table 3) corresponding mostly to Enterobacterales. The presence of K. pneumoniae, detected at a concentration above 1.0E+6 CFU/mL, was confirmed by the 16S/MetaPhlAn2 markers search. However, the other "false positive" SOI detections were quantified at 10 to 100-fold lower concentrations relative to K. pneumoniae and likely had too few reads for 16S/MetaPhlAn2 confirmation.
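The sensitivity and false discovery rate quoted above follow directly from the reported counts, as the short check below illustrates. The function names are hypothetical. Specificity is not recomputed because the number of true negatives is given in Table 4 rather than in the text.

```python
def sensitivity(tp, fn):
    """Fraction of culture-positive detections found by the test."""
    return tp / (tp + fn)


def false_discovery_rate(tp, fp):
    """Fraction of positive test results that culture did not confirm."""
    return fp / (tp + fp)


# Counts reported in the text: 10 true positives, 0 false negatives,
# 19 false positives, of which 7 were confirmed by marker search.
tp, fn, fp, confirmed_fp = 10, 0, 19, 7

print(f"sensitivity: {100 * sensitivity(tp, fn):.1f} %")
print(f"false discovery rate: {100 * false_discovery_rate(tp, fp):.1f} %")
print(f"marker-confirmed false positives: {100 * confirmed_fp / fp:.0f} %")
```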

Detection of co-infections
Presence of a single infecting agent at a high concentration may preclude the detection of other SOIs and the SPC and may limit the detection of co-infections by mNGS. However, we were able to detect two (samples 6 and 39) or three (sample 29) co-infecting pathogens with absolute concentrations differing by up to 3 orders of magnitude (Table 3). None of these mixed infections were reported by microbial cultures.

Discussion
Here we propose to spike BioBall® to control the processing of (mini-)BAL samples and to provide absolute quantification of detected SOI in the mNGS workflow. We selected B. subtilis as SPC because of its rare natural presence in BAL samples and the ability of mNGS to distinguish its sequence reads from those of SOIs, commensal flora and human genome (see Additional figure 1). The quantitative metagenomics assay and metrics (Figure 1) were developed using a "one system" approach. This means that all individual steps, from sample preparation to results reporting, are controlled by the SPC, thus eliminating the need for the tedious steps required to control and quantify extracted DNA and sequencing libraries.
In a mNGS run, the detection limit of a SOI can be impacted by intrinsic factors including genome length and efficiency of its DNA extraction as well as extrinsic factors such as the accuracy of taxonomic classification and sample composition (host cell load, relative abundance of microorganisms, peculiarities of genome sequences for certain microorganisms) [19]. To avoid reporting false negative results, the detection of a SOI should only be reported as negative when Cmin ≤ MT (Figure 1). This was especially useful for the analysis of culture-negative samples in which, after removal of the patient's DNA, remaining DNA quantities were significantly below those recommended for library preparation and sequencing. While modifications of the library preparation protocol allowed sequencing of these samples, It is interesting to note that despite the competition effect in detecting SOIs, our mNGS assay was able to detect co-infections by two or three SOIs with concentrations ranging over three orders of magnitude. These co-infections were not detected by the routine cultures. Our results are consistent with previous observations that mNGS assays could be more effective in characterizing polymicrobial infections [39].
Qualitative detection of microorganisms by mNGS can reflect resident microbiota, transient colonization, sample contamination, and/or infection. To differentiate asymptomatic presence of bacteria from probable infection, the absolute quantity of pathogens has to be determined and compared to defined clinical decision thresholds [5,40]. For that purpose, we used the counts of reads assigned to SPC as calibrator for quantification of SOIs (Figure 2). Using S. aureus as an example, the results of absolute quantification by mNGS (Figure 2B) were comparable to those of qPCR [41].
Clinical microbiology laboratories have defined clinical decision thresholds for the HAP/VAP causative pathogen(s) in CFU/mL (mini-BAL: 1.0E+3 CFU/mL and BAL: 1.0E+4 CFU/mL) [20]. We did not find an obvious correlation between the number of genomes quantified by mNGS or qPCR and CFU counts from culture plates (Figure 2A). This may have different causes, such as a lack of precision of the culture report, which provides the concentration of CFU only to the nearest log level. A second cause could be the presence of viable but non-culturable (VBNC) cells in the sample from which genomic DNA remained detectable [42,43]. Therefore, we defined the metagenomics threshold (MT) at 5.3E+3 GEq/mL to differentiate, similarly to the clinical decision thresholds, asymptomatic presence of bacteria from infection [5,40,44,45,46].
Assessment of our mNGS process and defined metrics on the HAP/VAP panel showed good diagnostic capabilities (specificity: 96.8 %; sensitivity: 100 %), albeit with a "false discovery rate" of 65.5 %. To avoid "false positive" detections that may result from a lack of accuracy in the taxonomic classification, additional control methods should be considered. In this study, we confirmed correct sequence classification to the SOI for at least 26 % of "false positive" detections using 16S/MetaPhlAn2 markers detection by BLAST. For the other "false positive" results, the quantity of reads was insufficient for sequence assembly required for BLAST-based 16S/MetaPhlAn2 markers search. Therefore, other controls should be considered such as the removal of reads that are stacked on a single location and share identity with human genome or with commensal flora, as suggested by Uprety et al. [47]. Nevertheless, in the presented mNGS workflow, no "false positive" tests seemed to result from (k-mer based) miss-  [40] that specific MT should be established for each bacterium depending on its pathogenicity. Presence of VBNC and antibiotic persisters could also be taken into account for the setting of specific MT. However, the evaluation of specific thresholds for each SOI would require large numbers of samples and clinical data that were not available to us at the time of this study.

Conclusions
We present a new clinical metagenomics workflow for the detection of causative pathogen(s) of HAP/VAP (Figure 1). It includes the use of an SPC to verify that all steps, from sample preparation to data reporting, are performed properly. Detection of the SPC spiked at concentrations slightly above the MT makes it possible to determine whether pathogens are present at or above the MT. SPC was especially useful for validating the analysis of culture-negative samples that yielded only a few reads. However, we could not validate 21.1 % of individual negative SOI detections, mainly because of the competition effect by reads generated when other SOI(s) were detected at high concentrations in the same sample. This should have a limited impact on the diagnosis as it would not affect the detection and identification of the major pathogen causing the infection.
We have also demonstrated that SPC can be used as a calibrator in mNGS, allowing absolute quantification of S. aureus GEq as efficiently as quantitative PCR (Figure 2B). This allowed us to define a metagenomics threshold to differentiate colonization from suspected infection. Our quantitative mNGS process showed good diagnostic capabilities (specificity: 96.8 %; sensitivity: 100 %). The "false discovery rate" of 65.5 % could be due, at least in part, to a unique MT defined for all bacterial species instead of individual MT [40]. However, some "false positive" mNGS results might also reflect the presence of VBNC and antibiotic persister cells, and their potential for infection recurrences [43]. Additional studies, including clinical assessments, will be necessary to evaluate the diagnostic values of such "false positives" or to set a specific MT for each species.
Before implementing mNGS in routine clinical diagnosis, it is imperative to overcome several limitations. One of them is a relatively long sample processing time that includes steps to control and quantify the extracted DNA and sequencing libraries. Our SPC made it possible to skip these steps, reducing the hands-on time
The DNA of the sample was extracted promptly after reception in the Genomic Research Laboratory of the Geneva University Hospitals. Remaining material was fully discarded.

Consent for publication
Not applicable

Availability of data and material