Overview of the samples and sequencing data
Metatranscriptome sequencing was conducted on 588 oropharyngeal swab specimens (OPs) collected from 192 severe COVID-19 patients. Consecutive OPs were obtained on day 1, 5, 10, 14, 21, and 28 after admission when the patient's condition allowed (Fig. 1A). Most patients (147) recovered within 28 days after admission, while 38 patients died within 28 days. The OPs were collected at least twice from 177 patients (Fig. S1A). For comparison, OPs were collected from 95 healthy individuals without any pulmonary diseases in the community from Wuhan in April, 2020. Deionized water was used as the negative control (NC), which was processed following the same protocol as for clinical samples.
The median age of the patients was 58 (IQR 49, 68), and 78 (40.6%) of them were females, while the median age of the HC was 47 (IQR 33, 61) and 57 (60%) of them were females. The median days from symptom onset to the recruitment of the patients were 13 (IQR 11, 17) days. A large fraction of patients had at least one comorbidity or underlying health condition, which included hypertension (64), diabetes(22), heart disease(14), etc. All but four patients took antibiotics. We identified multiple risk factors, including age, severity on admission (severity-A), and corticosteroid use, significantly correlated with the fatality (Fig. S1B).
OPs were sequenced to a median number of reads of 33.0 million (IQR 9.3 million, 63.9 million). After filtering out the low-quality reads, host reads, and ribosome RNA reads, the median number of reads mapped to the microbial genome was 1.0 million (IQR 0.2 million, 2.7 million). We required a minimum of 1,000 microbial reads for the following analysis, which ensured that at least 92% of the microbes with relative abundance (hereinafter referred to as "abundance") greater than 0.1% could be detected (Fig. S1C); 677 samples were qualified.
Microbiota composition in the upper respiratory tract of COVID-19 patients
Bacteria were the most abundant microbes in the OPs of the COVID-19 patients, which accounted for 97.02% of the total microbial reads, followed by fungi (2.76%), viruses (0.22%), and archaea (0.006%). The abundance of fungi in COVID-19 patients was significantly higher than that in HC (median abundance: 2.76% vs. 1.08%, p<0.001), which was mainly contributed by the elevated level of Candida in COVID-19 patients (1.87% vs. 0.0003%, p<0.001). The microbial community was mainly composed of Streptococcus, Veillonella, and other known commensal microbes in both COVID-19 and HC (Fig. 1B). The overall microbial composition was different between COVID-19 patients and HC (R2=0.05, p<0.001 on admission; R2=0.04, p<0.001 before discharge/death; PERMANOVA test, Fig. 1C). Rothia and Actinomyces were significantly enriched in COVID-19 patients, whereas Streptococcus, Capnocytophaga, and other seven genera were more abundant in HC (Fig. S1D). Notably, the microbiota in deceased patients significantly differed from that in recovered patients, which was more remarkable before discharge/death (R2=0.04, p<0.001 on admission; R2=0.10, p<0.001 before discharge/death; PERMANOVA test). Additionally, we found no systematic difference between Lopinavir-Ritonavir recipients and standard care controls (R2=0.0014, p=0.56, PERMANOVA test).
The hierarchical clustering algorithm identified 11 clusters (CSs) (Fig. 1D). COVID-19 samples were assigned to all CSs, while HC samples was assigned to CS1, CS3, and CS9. Most samples from patients (403/571) and HC (75/88) were assigned to CS1, which was dominated by Streptococcus. CS7 and CS8 were enriched for potential pathogens Enterococcus and Candida, and all samples in these two CSs were collected from patients, thereby indicating a possible co-infection or secondary infection. CS2, 4-6, 9-10 were dominated by Prevotella, Lautropia, Schaalia, Rothia, Capnocytophaga, and Veillonella, respectively, all of which were known commensal microbes in the oral cavity or URT. In particular, CS11 consisted of outliers that cannot be assigned to any other CSs, which were mostly dominated by opportunistic pathogens (Fig. S1E). All clusters but CS3 showed a great dissimilarity to NCs (Fig. S1F). We found that CS3 was more likely to be observed in samples from the same individual (p<0.05, Fig. S1G), reflecting a relatively barren state of microbiota in these samples.
OPs from the same individuals showed a higher similarity than that from different individuals (Fig. 1E), indicating the existence of an individual-specific microbiota. Meanwhile, samples from deceased patients showed larger variance among samples, suggesting that the URT microbiota's alteration was more evident in severe cases. Besides, the similarity between the microbiota of samples from the same individual was negatively correlated with the sampling interval in recovered patients, which may imply a gradual restoration of the URT microbiota (Fig. 1F).
Association between the upper respiratory tract microbiota and the fatality
To better examine the association between microbiota and clinical features and outcomes, we defined two critical time points, T1 and T2, which referred to the first time point within five days after admission and the last time point within five days before discharge or death, respectively. T1 samples were available for 181 patients (65% samples were collected on the first day of admission, 88% samples were collected within three days after admission), T2 samples were available for 157 patients (80% samples were collected within three days before discharge or death), and 144 patients had both T1 and T2 samples. Among seven demographic and clinical features, the fatality was the only feature significantly associated with the microbiota at both T1 and T2 (Table 1).
All three differential analysis (GAMLSS regression, linear model regression, and LEfSe) identified Streptococcus as the most enriched microbe in the recovered group at T1(Fig. 2A, Fig. S2A,C), which was also true for T2 but was ranked behind two low-abundance bacteria in GAMLSS results (Lautropia and Atopobium, Fig. 2B, Fig. S2B,D). To further identify microbes associated with disease progression, Cox regression was conducted on fatality-associated genera selected by GAMLSS at T1. Finally, seven variables were selected as the best predictors, including age, viral copy number, severity-A, and the abundance of four genera (Streptococcus, Porphyromonas, Atopobium, Serratia) (Fig. S2E). Kaplan–Meier survival analysis confirmed that the abundance of these four bacteria was significantly associated with the fatality risk (Fig. 2C).
Moreover, we built a classifier based on L1 regularized logistic regression model to distinguish the recovered and deceased patients using microbiota data at T1. A good performance was achieved with four variables (Fig. 2D). Streptococcus was the only microbe with a non-zero coefficient (regression coefficient =-1.74) besides three host factors, including age, viral copy number, and severity-A. The classifier trained merely on microbiota data also detected Streptococcus as the best predictor (regression coefficient =-1.51), further supporting the abundance of Streptococcus as a valid predictor for clinical outcomes.
Enterococcus and Candida were recognized as the most enriched genera in deceased patients by all three methods at T2, but not at T1 (Fig. 2B, Fig. S2B,D). Analysis at the species level confirmed Candida albicans and Enterococcus faecium as the two most enriched species in deceased patients besides SARS-CoV-2, suggesting that secondary infections of the two pathogenic microbes could potentially increase the case fatality rate (CFR) of COVID-19 patients.
Association between the abundance of Streptococcus parasanguinis and the fatality risk
We then took a closer look at the dynamic change of the Streptococcus in COVID-19 patients. The abundance of Streptococcus was higher in the recovered group at all time points except day 21 (P<0.01, Fig. 3A) and significantly increased during the hospitalization (Fig. S3A). In contrast, the abundance of Streptococcus was marginally decreased over time in the deceased group (Fig. S3A). Also, CFR declined with the increased abundance of Streptococcus, and this trend was more pronounced at T2 (Fig. 3B).
Specifically, four Streptococcus species at T1 and seven Streptococcus species at T2 were enriched in recovered patients (Fig. 3C, Fig. S3B, Supplementary Text). In samples with the highest abundance of Streptococcus (CS1), S. parasanguinis was the only Streptococcus significantly enriched in the recovered patients at T1 (Fig. S3C). Moreover, Kaplan–Meier survival analysis also identified the abundance of S. parasanguinis as the primary factor associated with CFR (Fig. 3D). In summary, our results indicated that a higher level of S. parasanguinis at T1 predicted better prognosis in COVID-19 patients.
We then examined whether any confounding factors could explain the association between the fatality and the abundance of S. parasanguinis at T1. Neither of the risk factors for mortality observed in our cohort or other studies (age, gender, use of corticosteroid, the severity on admission, and SARS-CoV-2 viral copy number, use of high-grade antibiotics (listed in Table S1), comorbidities) was significantly associated with the abundance of S. parasanguinis (p>0.05). The fatality was the only factor associated with the abundance of S. parasanguinis in multivariable linear regression analysis. Moreover, multivariate logistic regression analysis confirmed that the abundance of S. parasanguinis was independently correlated with the fatality in our data (Table 2).
Intriguingly, we noted that the degree of correlation between the abundance of S. parasanguinis and the fatality was greater in patients with less severe symptoms on admission (with a severity score of 3 and 4)(Fig. 3E). Specifically, 95.2% (60/63) with a high abundance of S. parasanguinis (>10%) and 93% (93/100) patients with an intermediate abundance of S. parasanguinis (>1%) on admission in the less severe group got recovered, resulting in a CFR of 7%. On the contrary, patients with negligible S. parasanguinis (<0.1%) suffered a higher CFR of 30.8% (8/26) (odds ratio = 6, 95% Confidence Intervals: 2 to 18, p=0.0002). Meanwhile, patients with a severity score of 5 on admission showed the highest CFR (72.7%), which was not correlated with the abundance of S. parasanguinis (Fig. 3E), indicating that the predictive power of S. parasanguinis was stronger in the less severe patients (AUC=0.703, Fig. S3D).
The authenticity of S. parasanguinis was first validated by qPCR using specific primers targeting the groEL gene24. A strong positive correlation was found between the abundance inferred from the sequencing data and the copy number estimated from qPCR (Fig. 3F). Moreover, the copy number of S. parasanguinis also showed a good performance in predicting the survival of the patients (Fig. S3E). Secondly, reads from ten samples with the highest abundance of S. parasanguinis were mapped to two reference genomes of S. parasanguinis (NC_017905.1, NC_015678.1). All variants with minor allele frequency greater than 5% were identified, and the variants density was 0.036% and 0.04%, suggesting that the Streptococcus in those samples was close relative of known S. parasanguinis strains.
Co-detection of other pathogens in the upper respiratory tract of COVID-19 patients
We then screened all samples for 52 common respiratory pathogens (Table S2). Overall, 20.3% of patients had at least one such pathogen, and the proportion of samples with pathogens was higher in deceased patients than that in recovered patients (48.7% vs. 13.1%, p<0.001). Also, the number of pathogens identified in each sample was significantly higher in deceased patients than in recovered patients (Fig. 4A). Specifically, Candida albicans and Enterococcus faecium were more prevalent in deceased patients than in recovered patients (Fig. 4B). More importantly, these two species' abundances on admission were relatively low and similar between recovered and deceased patients, but remarkably increased during hospitalization in deceased patients (Fig. 4C,D), suggesting secondary infections.
E. faecium is a commensal bacteria in human GIT. Translocation of this bacteria had been reported in critically ill patients25. We compared the abundance of E. faecium between anal swabs, plasma, and OPs in 32 paired samples, including eight samples with the highest abundance of E. faecium in OPs, but found no excess E. faecium RNA in the plasma of these co-detected patients (p=0.17, Fig. 4E), suggesting that the translocation through the circulatory system was unlikely.
Virus co-infection was rare in this cohort. Only Rhinovirus A was detected in one recovered patient with more than 50 reads. Other possible viral co-infections included Influenza A virus and Human mastadenovirus C, which were revealed by less than 50 reads.
Dynamics of the upper respiratory tract microbiota in COVID-19 patients and its association with the fatality
Although no significant difference in alpha diversity was observed between the recovered and deceased patients (Fig. 5A), an increasing trend of alpha diversity over time was observed in some deceased patients (Fig. S4A). We found this phenomenon was associated with the codominance of both pathogenic microbes and commensal microbes in their microbiotas (Fig. S4B).
The dynamic change of microbiota was then visualized as transitions between different CSs (Fig. 5B). Multivariate logistic regression identified fatality as the only factor associated with CS transition (p<0.01). Therefore, the analysis was conducted for recovered and deceased patients, separately (Fig. 5B). Most CSs (73.5%) at T1 transited to CS1 at T2 in recovered patients, while only 33.3% CSs transited to CS1 in deceased patients (p<0.05). Notably, CS1 had the lowest CFR among all CSs (12.4% at T1, 4.7% at T2, p<0.001) and tended to be more resistant to co-/secondary infections compared to other CSs (6.6% vs. 19.0%, p<0.001). Meanwhile, 42.1% CSs transited to the three most fatal CSs (CS7, CS8, and CS11) in deceased patients, whereas only 1.6% CSs transited to the fatal CSs in recovered patients (p<0.001, Fig. 5B). Concordantly, expansions of Streptococcus (dominant genus for CS1) were more frequently observed in recovered patients than that in deceased patients (47.6% vs.15.0%, P<0.05, Fig. 5C). Meanwhile, the expansion of Enterococcus and Candida (dominant genera for CS7 and CS8) was more frequently observed in deceased patients (P<0.05, Fig. 5C). Thus, not only the microbiota compostion, but also the dynamics of the microbiota differed between recovered and deceased patients.
The transition dynamics was further modeled as a Markov chain. The self-transition proportion of CS1 was significantly higher than other CSs (p<0.001, Fig. 5D). Moreover, we found that patients with more samples assigned to CS1 tended to have a low CRF (Fig. S4C), suggesting that CS1 represented a health status. In addition, the transition from deleterious CSs (CS7, CS8, and CS11) to CS1 was less frequent compared to other CSs (Fig. 5D), suggesting severe dysbiotic microbiota cannot recover in a short time. Besides CS1, all other CSs had a low self-transition rate (< 0.29), suggesting that they were unstable and probably represented an intermediate state.
The magnitude of microbiota change in the deceased group was higher than that in the recovered group and tended to be increased over time (r=0.31, p=0.07, Fig. 5E). A similar trend was observed when using the Bray-Curtis distance to HC (Fig. 5F). Strikingly, the distance to HC increased in the deceased patients over time but tended to decline in the recovered group (Fig. 5F). Specifically, Streptococcus and Schaalia were the two genera whose abundance was significantly approaching the level in HC during hospitalization (Fig. S4D,E). Of note, the distance from the recovered group to HC at T2 was still significantly greater than that within HC, indicating that the URT microbiota was not restored at discharge.
Microbial network in COVID-19 patients
To gain insights into the microbial interactions in COVID-19 patients, we constructed the interaction network based on microbes' co-occurrence. The network analysis revealed that the correlation between genera in deceased patients was sparser and more fragmented than that in recovered patients and HC at both T1 and T2 (Fig. S5), suggesting decreased biotic interactions in deceased patients, which was associated with a more fragile microbiota26. The components of a "core" healthy microbiota, including Prevotella, Veillonella, Campylobacter, Leptotrichia, Fusobacterium and Selenomonas27, were more closely linked in the network of HC and recovered patients than that of deceased patients, suggesting that the aggregation status of the network may reflect the homeostasis and resistance of the microbiota.