COVID-19 severity and complications associated with low diversity, dysbiosis and predictive metagenome features of the oropharyngeal microbiome.


 Coronavirus 2 (SARS-CoV-2) infection and the resulting COVID-19 illness vary from asymptomatic disease, mild upper respiratory tract infection, pneumonia1, to a life-threatening multi-organ failure with case fatality rates ranging from 0.27–13.4%2,3. Despite increasing knowledge of the clinical and immunological features underlying COVID-191,4−6, biological variables explaining the course of infection and its severity remain elusive. At the entry site of SARS-CoV2, the oropharyngeal microbiome represents a hub integrating viral and immune signals at the start of the infection7–10. To evaluate the role of the oropharyngeal microbiome in COVID-19, we performed a multi-center, cross-sectional clinical study analyzing the oropharyngeal microbial metagenomes in healthy adults, patients with non-SARS-CoV-2 infections, or with mild, moderate and severe COVID-19 encompassing a total of 345 participants. Significantly reduced microbiome diversity and high dysbiosis were observed in hospitalized patients with severe COVID-19, which was further associated with a loss of microbial genes and metabolic pathways. In this cohort, diversity measures were also associated with need for intensive care treatments as major clinical parameters in COVID-19. We further applied random forest machine learning to unravel microbial features for segregating clinical outcomes in hospitalized cases, and observed oropharyngeal microbiome abundances of Haemophilus or Streptococcus species as most important features. These findings provide insights into the role of the oropharyngeal microbiome in SARS-CoV-2 infection, and may suggest new biomarkers for COVID-19 severity.


Main Text
The microbiome constitutes a signaling hub regulating host immunity, mucosal homeostasis and defense against pathogens 7 . The oropharyngeal and lung microbiome has been previously studied during respiratory syncytial virus (RSV) and in uenza respiratory tract infection [8][9][10] . At these body sites, the microbiome was found to modulate viral infections through adaptions of mucosal immunity or inhibition of mucosal virus adherence 11 , which eventually affect the clinical course of these infections 12 . Viruses can also facilitate pathogen colonization, eventually leading to bacterial superinfections and acute respiratory distress syndrome 13,14 . Therefore, we hypothesized that the presence of a speci c microbial ecology may restrict or facilitate SARS-CoV-2 infection and determine the severity and clinical course of COVID-19.
To study the role of the human oropharyngeal microbiome in SARS-CoV-2 infection and illness, we collected oropharyngeal biospecimens from a multi-center patient and healthy volunteer cohort in Germany (n=345). Between 30 March and 20 May 2020, we recruited healthy adult individuals, patients with mild, moderate or severe COVID-19, and SARS-CoV-2 negative patients with mild upper respiratory infections or with critical pneumonia. Demographic and clinical characteristics are depicted in Table 1. Mild COVID-19 was assigned to patients presenting with symptoms of upper respiratory tract infections that were not hospitalized (Extended Data Fig. 1). Patients with moderate or severe disease were hospitalized with lower respiratory tract symptoms. In our cohort, 66 cases with severe COVID-19 required intensive care unit (ICU) therapy, and 41 out of them deteriorated over the course of hospitalization and required invasive mechanical ventilation or extracorporeal membrane oxygenation (ECMO). Fifteen hospitalized patients (17.2%) died by the conclusion of our study. Extended Data Fig. 2 presents medical treatments and pre-existing conditions of these patients.
To characterize the oropharyngeal microbiome pro les in our cohort, we used shotgun metagenome sequencing on oropharyngeal samples from all 345 study subjects that were collected either during SARS-CoV-2 screening of suspected cases or during/after admission of patients with con rmed or suspected COVID-19 to one of the seven medical centers participating in this study. Moderate and severe COVID-19 was associated with a signi cant decrease or loss of the oropharyngeal bacterial diversity (Fig.   1a). Differences in microbiome compositions between each group were most apparent in a principlecoordinate analysis (PCoA) with signi cant separations according to COVID-19 severity (PERMANOVA: F=23.15, p =0.002; Fig.1b, Extended Data Fig. 3a for t-SNE). Color-coding the PCoA projection according to diversity separated low vs. high diversity states (Extended Data Fig. 3b). We further classi ed samples with taxonomic compositions highly unlike those of healthy control samples as "dysbiotic" 15 as measured by Bray-Curtis beta-diversity distances 16 (Fig. 1c) and observed that dysbiosis contributed signi cantly to the variation of the oropharyngeal microbiome (PERMANOVA F=42.8; p=0.002). Patients with moderate and severe COVID-19 showed a signi cantly higher dysbiosis index of the oropharyngeal microbiome compared to healthy controls, URT patients and mild COVID-19 patients, respectively (Fig.   1d). Antibiotic exposure and age had moderate effects on compositional diversity in the PCoA (Extended Data Fig. 3c, d). Patients admitted to the ICU because of severe respiratory distress other than COVID-19 also showed signi cantly lower diversity and higher dysbiosis indices of their oropharyngeal microbiome (Extended Data Fig. 4, 5 including clinical characteristics).
A highly diverse ecology of metagenomic species was observed in healthy controls, patients with non-SARS-CoV-2 URT or mild COVID-19 (Fig. 2a). To better understand the microbiome patterns across the different infection and disease states, we quanti ed the taxonomic groups that contributed to mono-domination events, which are de ned by a relative-abundance threshold of any single species of at least 30% 16 . As presented in Extended Data Fig. 6, Staphylococcus aureus, Rothia mucilaginosa and Enterococcus faecium were highly enriched species in patients with moderate and severe COVID-19, whereas Prevotella spp. were found to mono-dominate the oropharyngeal microbiome of healthy subjects and patients with mild infections. Hypothesizing the oropharyngeal microbiome is associated with disease severity, we investigated the taxonomic pro les of patients with mild vs. severe or moderate vs. severe COVID-19 and found Prevotella, Streptococcus and Haemophilus spp. to signi cantly discriminate these groups (Figure 2b; Supplemental Table with a complete list of taxa). Variance in the species pro les of hospitalized patients was only partly explained by antibiotic administration, but not by age and gender (Extended Data Fig. 7). Next, we analyzed genes and pathways of the oropharyngeal microbiome in our cohort, and found several microbial genes and pathways to be signi cantly enriched in moderate vs. severe disease, high vs. low alpha diversity, or low vs. high dysbiosis (Fig. 2c). Similar to the differences in taxonomy, a loss of genes and pathways, notably transposases, amino acid and DNA biosynthesis or sugar degradation pathways, characterized severe COVID-19, low diversity and high dysbiosis, as depicted in Fig. 2d.
In addition to characterizing the oropharyngeal microbiome across different COVID-19 severity states, we also aimed to study associations between individual microbiome features of the oropharyngeal microbiome and selected clinical parameters or outcomes. As diversity measure are increasingly used as microbial biomarkers for disease outcomes 16 , we assessed the associations of Shannon diversity and dysbiosis scores with treatment categories and outcomes in hospitalized patients using logistic regression analyses. ICU admission and need for mechanical ventilation or for ECMO therapy were signi cantly associated with microbial diversity or dysbiosis ( Table 2). The association of dysbiosis and mechanical ventilation or ECMO therapy remained signi cant after multivariable adjustment for age, gender, center, antibiotic and steroid treatment ( Table 2).
Next, we aimed to determine whether taxa, microbial genes and metabolic pathways as different features of the oropharyngeal microbiome are signi cantly associated with ICU admission, need for mechanical ventilation and mortality as major outcomes of hospitalized COVID-19 patients. We aggregated the microbiome and clinical data of patients with moderate and severe SARS-CoV-2 infection, and established a random forest (RF)-based machine learning model (Extended Data Fig. 8). To achieve an independent training and validation of the algorithm, the data were segregated into a training and validation cohorts based on geographic distribution of hospitals (hospitals in the Heidelberg area vs. those outside this region, respectively [see Extended Data Fig. 9 for clinical characteristics after strati cation]). Importantly, RF models trained for each of the clinical outcomes in the training set provided signi cant power to predict these outcomes also in the validation cohort of patients whose data were not included in the training of the RF algorithm (Fig. 3a, Extended Data Fig. 10a, b). Microbiome features were observed to discriminate the need for mechanical ventilation (mean error rate, 0.21; recall, 0.93; area under the receiver operating curve (AUROC), 0.84; Figure 3a), and ICU admission, although to a lesser extent. Incorporation of ICU admission, mechanical ventilation, Shannon index and dysbiosis score as inputs enabled further improvement of the segregation of mortality (mean error rate, 0.27, 0.27; recall, 0.86, 0.71; AUROC, 0.66, 0.67 for ICU admission and mortality, respectively; Fig. 3a).
Given the power of the obtained RF models to discriminate clinical outcomes based purely on the oropharyngeal microbiome, we analyzed the importance of species, genes and pathways in these models and observed partial overlaps of individual microbial features ( Fig. 3b; Gini importance > 0.05). As such, Haemophilus and Streptococcus spp. were among the most important taxa that segregated patients admitted to an ICU and received mechanical ventilation. An acetolactate synthase and pathways involved in DNA and amino acid biosynthesis were among the most important functional features to segregate these two clinical features (Fig. 3c, d; Gini importance > 0.1). Notably, upon strati cation into the training and validation cohort, the differences in alpha diversity and dysbiosis between moderate and severe COVID-19 persisted (Extended Data Fig. 10c, d, e). Furthermore, the signi cant associations of dysbiosis scores with ICU admission and mechanical ventilation were replicated in this setting (Extended Data Fig.   11).
Previously, COVID-19 severity has been linked to several immunological changes such as lymphocytopenia 17 , disinhibited releases of proin ammatory cytokines 6 , hyperactivation of monocytes and macrophages 18 , dampened type-I interferon (IFN) signaling 19 , or a hypercoagulable state 20 . In a recent case series of eight patients, the lung microbiome in COVID-19 (non-accessible to the majority of treated patients) was reported to be reduced in diversity and enriched with facultative pathogens 21 . Here, we also observed that the readily accessible oropharyngeal microbiome of hospitalized patients with severe COVID-19 was mono-dominated by facultative pathogens with a loss of microbial richness and high dysbiosis. The loss of (bene cial) species, e.g., bi dobacteria or Prevotella spp., in severe cases mirrored the loss of microbial genes and metabolic pathways in these patients. In this context, it is worth noting that the diversity within the oral microbiome has been previously found to be signi cantly associated with CD4/CD8 T cell ratios and with CD8 T cell activation in HIV-infected patients 31 . Whether such a loss of diversity and, in particular microbial genes, also affects systemic immunity against SARS-CoV2 and thereby facilitates the development of severe COVID-19 still needs to be determined. In general, a mild SARS-CoV-2 infection did not affect microbiome diversity or composition when patients presented during initial coronavirus testing, con rming recent results from an Italian cohort 22 . As we were not able to follow-up with this mildly affected, community-followed patient subset, potential microbiome contribution to their clinical course merits future studies.
A few machine learning prediction-based models have been attempted in COVID-19 infection, relying on highly variable serum biomarkers, cytokines, blood immune cell counts (representative studies and results are depicted in Extended Data Fig. 12). These analyses revealed AUC/AUROC scores ranging from 0.845 to 0.95 to predict COVID-19 severity or mortality, which are comparable to our AUROC scores, e.g., for mechanical ventilation as a major characteristic of severe COVID-19. Our RF models are purely based on microbiome signatures, but nevertheless these features were suitable to disriminate clinically critical endpoints, severity of the diseases and therapeutic features such as ICU admission, mechanical ventilation and mortality. Due to its cross-sectional design with oropharyngeal microbiome collection close to hospital admission, the predictive power of our observations may be limited to hospitalized COVID-19 patients and also requires independent con rmation.
With these limitations notwithstanding, utilization of the readily accessible oral microbiome signature at admission, as noted in our study, may enable data-driven patient strati cation, leading to improved follow-up and tailored management from earlier disease stages, optimization of allocation of critical care supplies and staff, and enhanced ability to safely relocate at-risk patients to centers with maximum care level prior to clinical deterioration.

Study cohort description
Recruitment. Seven German medical centers participated in the recruitment of inpatient subjects (both sexes, 37-85 years of age) with laboratory con rmed COVID-19 (n=93): University Clinic Heidelberg (Thoraxklinik and Department of Gastroenterology and Infectious Diseases), University Hospital Mannheim, University Clinic Regensburg, University Clinic Frankfurt, Klinikum rechts der Isar of Technical University Munich, and University Heart Center Freiburg. As a control for patients with severe COVID-19 that were admitted to an intensive care unit, we included oropharyngeal biospecimen from 30 SARS-CoV-2 negative patients (both sexes, 29-94 years of age) that were treated at the ICU at the University Clinic Heidelberg because of pulmonary infections and respiratory distress (see Extended Data . Symptomatic patients who were tested positive for SARS-CoV-2 infection without evidence of pneumonia or hypoxia were categorized as mild COVID-19. These were largely patients that were screened for the infection in the outpatient test center. Patients admitted to the hospital with signs of pneumonia but no signs of severe pneumonia (including SpO 2 ≥ 93% on room air) and tested positive for SARS-CoV-2 were considered to have a moderate disease. Severe COVID-19 was assigned to cases with severe pneumonia (respiratory rate > 30 breaths/min; or SpO 2 < 93% on room air), acute respiratory distress syndrome, sepsis or septic shock. For six patients in the severe COVID-19 group that received mechanical ventilation, we do not have gender, age or clinical outcome information. Clinical outcomes were monitored until June 8 th , 2020.
Observational study design consideration. This manuscript describes an observational cohort study and was prepared, where applicable, according to the reporting recommendations for observational studies (STROBE Statement) 23 . The oropharyngeal specimens were collected prospectively between 30 March 2020 and 20 May 2020 at the different study centers either with the goal of microbiome sequencing and CoV-2 detection, or with the goal of assembling biospecimen banks that would facilitate many different analyses.

Specimen collection and processing
Collection. Oropharyngeal (throat) swabs for microbiome analysis were collected upon testing in an outpatient COVID-19 test center using a DNA-RNA collection kit (Zymo Research, CA, USA). The swabs were collected in addition to diagnostic swabs for laboratory con rmation of SARS-CoV-2 infection. For patients admitted to the hospital oropharyngeal specimen for microbiome analyses were collected primarily at admission day again with the Zymo DNA-RNA collection kit; 15 samples from the University Clinic Regensburg were collected by pharyngeal rinses with sterile water. The rinse solutions were centrifuged, and the pellet was resuspended in an RNA-DNA stabilizing buffer. The oropharyngeal specimens from healthy volunteers were also sampled using the Zymo Research collection kit, and these biospecimens were used for both virus detection and microbiome sequencing. SARS-CoV-2 RT-PCR. Laboratory con rmation of SARS-CoV-2 for hospitalized patients was performed in certi ed diagnostic departments of the participating university hospitals. RT-PCR assays were performed in accordance with the protocol established by the WHO / Dr. Drosten (Charité, Universitätsmedizin Berlin, Germany) 24 applying an assay kit from Tib-Molbiol (Berlin, Germany) with the E and RdRP genes as RT-PCR targets. The same method was applied to assess the SARS-CoV-2 status of outpatients and healthy volunteers after RNA isolation using the Viral RNA Mini kit (Qiagen, Hilden, Germany).

Metagenome sequencing
For microbiome analyses, DNA was extracted from 200 μL of one aliquot of collected swab material using the QIAamp Fast DNA tissue kit (Qiagen) as per manufacturer protocol. DNA concentrations were measured using a Qubit 4.0 uorometer with a Qubit™ 1X dsDNA HS Assay-Kit (Life Technologies, Grand Island, NY, USA, Cat. No Q33230). Metagenomes were generated from the resulting DNA, and wholegenome libraries were prepared as follows. DNA was normalized to a concentration of 0,2 ng/µl. Illumina sequencing libraries were prepared from 1 ng DNA using the Nextera XT DNA Library Preparation kit (Illumina, San Diego, CA, USA, FC-131-1096) according to the manufacturer's recommended protocol, with reaction volumes scaled accordingly. Concentrations for each pooled library were determined using Qubit™ 1X dsDNA HS Assay-Kit (Life Technologies, Grand Island, NY). Prior to sequencing, libraries were pooled by collecting equal volumes of normalized libraries (2 µl) from batches of 96 samples. The resulting multiplexes were sequenced on Illumina NovaSeq 6000 to obtain 100 bp paired-ends reads with Unique Dual Barcodes to yield ~30-80 million paired end reads per sample. Post-sequencing demultiplexing and generation of FASTQ les was generated using the bcl2fastq Conversion Software (Illumina, San Diego, CA, USA).

Analyses of microbial sequencing data
Metagenomic data processing. Sequencing reads from each sample in a pool were de-multiplexed based on their associated IDT barcode sequence. Next, in order to remove host contaminant reads, sequencing reads mapping to the human genome were rst ltered out using KneadData version 0.7.2. Taxonomic pro les of shotgun metagenomes were generated using Kraken 2 version 2.0.8-beta 25 26 . Here, reads are mapped against a UniRef-based protein sequence catalogue to extract the abundance pro les of genes and gene families (UniRef90). Rarefaction of the metagenome data was done at described below.
Microbiota diversity indices. Alpha-diversity is a mathematical value that summarizes a microbial community according to the count of unique species within a sample and how evenly their frequencies are distributed in the community. Here we calculated alpha-diversity using the Shannon index at the species level that weights also relatively rare abundant species 27 . To analyze differences in the microbiome composition (i.e., beta-diversity) between samples, we performed a principle coordinates analysis (PCoA) based on species-level Bray-Curtis dissimilarity. These differences were also visualized using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm where samples with similar compositions are located relatively close to each other, and clusters of distinct microbiome compositions can be separated 16 .
Dysbiosis index. The extent of microbiome dysbiosis in each sample was quanti ed based on Bray-Curtis distances, following a previously described strategy 15 . First, the set of all samples was subjected to principal coordinate analysis (PCoA) using Bray-Curtis distances. Next, the centroid of the healthy control samples subset was derived as the median of their values along all PCoA axes. Then, the dysbiosis score of each sample was calculated as the Euclidian distance between its position in the PCoA space and the healthy control centroid.
Quanti cation and statistical analysis Association testing. We assessed differences in relative abundances for species between two different subgroups by Wilcoxon tests and applied multiple hypothesis testing correction by the Benjamini-Hochberg False discovery rate (FDR) method as implemented in SIAMCAT R package 28 . We also performed a confounder analysis herein computing associations using Fisher's exact tests or Wilcoxon tests (for discrete or continuous variables) between different meta-variables and species abundances and illustrated the variance in relative abundance as explained by different meta-variables. For association analyses, taxonomic input data were ltered using abundance ltering with 10 -3 as cutoff for minimal abundance and after log-transformation. To associate microbiological taxa with age, gender or antibiotic treatment as clinical cofactors we used a multivariate statistical linear regression analysis by the R package MaAsLin2 with default parameters (abundance ltering with 10 -3 , min. prevalence 0.1, arcsine-sqrt transformation of relative abundances to normalize microbiome pro les) 29 . Statistical signi cance was determined by FDR multiple hypothesis testing correction p< 0.05.
Logistic regression. Univariate and multivariate logistic regression analyses were used to predict severity of COVID-19 and clinical outcomes strati ed by Shannon alpha diversity or the dysbiosis index of the oropharyngeal microbiome of the COVID-19 inpatient cohort (i.e., patients with moderate and severe . Shannon, dysbiosis index and age were used as continuous variables. were trained to predict one clinical outcome -either ICU admission, mechanical ventilation or mortality based on one type of microbiome features -i.e. either taxa, gene, or pathways. Hence 9 different combinations of feature type X outcome type were addressed. For each such combination, 100 forests were trained independently, each containing 10000 trees. For each forest, the data was divided randomly and equally to a training set and testing data sets, such that each set contains an equal number of the two possible states of the given outcome. For reproducibly, the randomization seed was set sequentially to a seed value from 1 to 100 for the 100 forests. In the second layer, RF models were trained to predict the ICU and ventilation clinical outcomes using outcome-speci c sets of input features, combining together the taxa, gene and pathway features that had Gini importance > 0.01 for the prediction of the given clinical outcome in the rst layer, as well as Shannon index and dysbiosis index as additional features. In the third layer, RF models were trained to predict the ICU and ventilation clinical outcomes, using the features that had Gini importance > 0.01 for the prediction of the corresponding clinical outcome in the second layer. For the ventilation outcome, the RF model that was trained at the third layer was taken as the nal predictive model. For the ICU and mortality outcomes, an additional, forth, layer was applied, including as input features for each sample the percentages of trees voting for ventilationpositive and ICU-positive states in the corresponding random forests of the third layer, as well as the Shannon and dysbiosis indices. The two RF models that were trained in this forth layer were taken as the nal predictive models for the ICU and mortality outcomes, correspondingly. The performance of the random forests models were assessed by (i) confusion matrices, enumerating the total number of true-    Table 4). c, Volcano-plots presenting fold changes in gene and pathway fractional abundances of the oropharyngeal microbiome of moderate and severe COVID-19 patients strati ed according to severity, high vs. low alpha-diversity (Shannon) and low vs. high dysbiosis. d, Heatmaps of fold changes of individual microbial genes (left) and pathways (right) found signi cantly under-represented in all the 3 compared strati cations in the high severity, low Shannon diversity and high dysbiosis samples. Colorcodes indicate the log2 of the ratio (low / high severity, low / high Shannon and high / low dysbiosis) between the means of relative abundances of the two compared groups, con ned by assigning -10 to cases having lower values.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. ExtendedData.docx TableSOM.xlsx