Dynamics of the upper respiratory tract microbiota and its association with fatality in COVID-19 patients

The pandemic of Coronavirus disease 2019 (COVID-19) is ongoing globally, which is a big challenge for public health. Alteration of human microbiota had been observed in COVID-19. However, it is unknown how the microbiota is associated with the fatality in COVID-19. We conducted metatranscriptome sequencing on 588 longitudinal oropharyngeal swab specimens collected from 192 COVID-19 patients recruited in the LOTUS clinical trial (Registration number: ChiCTR2000029308) (including 39 deceased patients), and 95 healthy controls from the same geographic area. The upper respiratory tract (URT) microbiota in COVID-19 patients differed from that in healthy controls, while deceased patients possessed a more distinct microbiota. Streptococcus was enriched in recovered patients, whereas potential pathogens, including Candida and Enterococcus, were more abundant in deceased patients. Moreover, the microbiota dominated by Streptococcus was more stable than that dominated by other species. In contrast, the URT microbiota in deceased patients showed a more signicant alteration and became more deviated from the norm after admission. The abundance of Streptococcus on admission, particularly that of S. parasanguinis, was identied as a strong predictor of fatality by Cox and L 1 regularized logistic regression analysis, thus could be used as a potential prognostic biomarker of COVID-19. The generalization of the results in other populations and underlying mechanisms needs further investigations.


Introduction
Coronavirus disease 2019 (COVID- 19) has infected more than 30 million people worldwide. Although most patients showed mild symptoms or were asymptomatic 1 , approximately 14% developed severe diseases, 5% were critically ill 2 , and the overall fatality rate is 3.2%. Older people and patients with underlying diseases are at increased risk for severe illness from COVID-19 and have a higher fatality rate.
Other risk factors include smoking history, pregnancy, male, and obesity 3 . Genetic variants on toll-like receptor 7, ABO blood group locus, and 3p21.31 gene cluster were also associated with the severe COVID-19 4,5 . Meanwhile, biomarkers to monitor disease progression and predict clinical outcomes have been developed, including antibody concentration 6 , serum proteins, metabolites 7 , as well as combinations of regular in ammatory and coagulation markers (e.g., procalcitonin, interleukin 6, and D-dimer) 8 .
Human microbiota plays a crucial role in individual health; alteration of the human microbiota has been observed in various chronic and acute diseases 9 . In particular, microorganisms residing in the gastrointestinal tract (GIT) and upper respiratory tract (URT) can alter the susceptibility to and outcomes of infectious diseases 10 . The underlying mechanisms include colonization resistance and induction of the immune responses in the host 11 . In recent studies, the diversity and composition of GIT microbiota showed signi cant differences between COVID-19 patients and healthy controls (HC) [12][13][14][15] . Bacterial and fungal opportunistic pathogens were enriched in the feces of COVID-19 patients, and a secondary infection was suspected, suggesting that dysbiosis of the gut microbiota is common in COVID-19 patients [12][13][14][15][16] . Notably, some differential microbes could potentially alter the expression of angiotensin-converting enzyme 2 (ACE2) 17 , which is the receptor used by SARS-CoV-2 to enter the host, thereby in uence the susceptibility and the severity of the infection. Besides, some commensal bacteria are able to modify the heparan sulfate on epithelial cells and prevent the virus from binding to the host cell 18 .
Respiratory microbiota contributes to the foremost barrier to viral infection 19 , how it alters in  patients is largely unknown [20][21][22] . Moreover, there is no study on the association between the microbiota and the fatality risk, which is one of the most crucial questions regarding microbiota's contribution to the health of COVID-19 patients.
The power and validity of the previous studies are limited by the small sample size and overrepresentation of patients with mild symptoms. To obtain a thorough understanding of the association between microbiota and COVID-19, especially those related to the fatality, a large cohort with deceased patients is needed. Lopinavir Trial for Suppression of SARS-CoV-2 in China (LOTUS) was a clinical trial that aims to evaluate the e cacy and safety of lopinavir-ritonavir treatment for SARS-CoV-2 infection 23 . LOTUS recruited 192 severe COVID-19 patients, with an overall mortality rate of 22.1%. Serial oropharyngeal swab samples were collected. By analyzing metatranscriptome data from 588 COVID-19 samples and 95 health controls (HC), we found that the URT microbiota was signi cantly different between recovered and deceased patients on admission and afterwards. The abundance of Streptococcus, particularly S. parasanguinis, was strongly associated with the fatality risk. Meanwhile, we found that the restoration of the URT microbiota lags behind the recovery of the disease.

Results
Overview of the samples and sequencing data Metatranscriptome sequencing was conducted on 588 oropharyngeal swab specimens (OPs) collected from 192 severe COVID-19 patients. Consecutive OPs were obtained on day 1, 5, 10, 14, 21, and 28 after admission when the patient's condition allowed (Fig. 1A). Most patients (147) recovered within 28 days after admission, while 38 patients died within 28 days. The OPs were collected at least twice from 177 patients (Fig. S1A). For comparison, OPs were collected from 95 healthy individuals without any pulmonary diseases in the community from Wuhan in April, 2020. Deionized water was used as the negative control (NC), which was processed following the same protocol as for clinical samples.
The median age of the patients was 58 (IQR 49,68), and 78 (40.6%) of them were females, while the median age of the HC was 47 (IQR 33, 61) and 57 (60%) of them were females. The median days from symptom onset to the recruitment of the patients were 13 (IQR 11, 17) days. A large fraction of patients had at least one comorbidity or underlying health condition, which included hypertension (64), diabetes (22), heart disease(14), etc. All but four patients took antibiotics. We identi ed multiple risk factors, including age, severity on admission (severity-A), and corticosteroid use, signi cantly correlated with the fatality (Fig. S1B).
OPs were sequenced to a median number of reads of 33.0 million (IQR 9.3 million, 63.9 million). After ltering out the low-quality reads, host reads, and ribosome RNA reads, the median number of reads mapped to the microbial genome was 1.0 million (IQR 0.2 million, 2.7 million). We required a minimum of 1,000 microbial reads for the following analysis, which ensured that at least 92% of the microbes with relative abundance (hereinafter referred to as "abundance") greater than 0.1% could be detected ( Fig. 1C). Rothia and Actinomyces were signi cantly enriched in COVID-19 patients, whereas Streptococcus, Capnocytophaga, and other seven genera were more abundant in HC (Fig. S1D). Notably, the microbiota in deceased patients signi cantly differed from that in recovered patients, which was more remarkable before discharge/death (R 2 =0.04, p<0.001 on admission; R 2 =0.10, p<0.001 before discharge/death; PERMANOVA test). Additionally, we found no systematic difference between Lopinavir-Ritonavir recipients and standard care controls (R 2 =0.0014, p=0.56, PERMANOVA test).
The hierarchical clustering algorithm identi ed 11 clusters (CSs) (Fig. 1D). COVID-19 samples were assigned to all CSs, while HC samples was assigned to CS1, CS3, and CS9. Most samples from patients (403/571) and HC (75/88) were assigned to CS1, which was dominated by Streptococcus. CS7 and CS8 were enriched for potential pathogens Enterococcus and Candida, and all samples in these two CSs were collected from patients, thereby indicating a possible co-infection or secondary infection. CS2, 4-6, 9-10 were dominated by Prevotella, Lautropia, Schaalia, Rothia, Capnocytophaga, and Veillonella, respectively, all of which were known commensal microbes in the oral cavity or URT. In particular, CS11 consisted of outliers that cannot be assigned to any other CSs, which were mostly dominated by opportunistic pathogens (Fig. S1E). All clusters but CS3 showed a great dissimilarity to NCs (Fig. S1F). We found that CS3 was more likely to be observed in samples from the same individual (p<0.05, Fig. S1G), re ecting a relatively barren state of microbiota in these samples.
OPs from the same individuals showed a higher similarity than that from different individuals (Fig. 1E), indicating the existence of an individual-speci c microbiota. Meanwhile, samples from deceased patients showed larger variance among samples, suggesting that the URT microbiota's alteration was more evident in severe cases. Besides, the similarity between the microbiota of samples from the same individual was negatively correlated with the sampling interval in recovered patients, which may imply a gradual restoration of the URT microbiota (Fig. 1F).

Association between the upper respiratory tract microbiota and the fatality
To better examine the association between microbiota and clinical features and outcomes, we de ned two critical time points, T1 and T2, which referred to the rst time point within ve days after admission and the last time point within ve days before discharge or death, respectively. T1 samples were available for 181 patients (65% samples were collected on the rst day of admission, 88% samples were collected within three days after admission), T2 samples were available for 157 patients (80% samples were collected within three days before discharge or death), and 144 patients had both T1 and T2 samples. Among seven demographic and clinical features, the fatality was the only feature signi cantly associated with the microbiota at both T1 and T2 (Table 1).
All three differential analysis (GAMLSS regression, linear model regression, and LEfSe) identi ed Streptococcus as the most enriched microbe in the recovered group at T1( Fig. 2A, Fig. S2A,C), which was also true for T2 but was ranked behind two low-abundance bacteria in GAMLSS results (Lautropia and Atopobium, Fig. 2B, Fig. S2B,D). To further identify microbes associated with disease progression, Cox regression was conducted on fatality-associated genera selected by GAMLSS at T1. Finally, seven variables were selected as the best predictors, including age, viral copy number, severity-A, and the abundance of four genera (Streptococcus, Porphyromonas, Atopobium, Serratia) (Fig. S2E). Kaplan-Meier survival analysis con rmed that the abundance of these four bacteria was signi cantly associated with the fatality risk ( Fig. 2C).
Moreover, we built a classi er based on L 1 regularized logistic regression model to distinguish the recovered and deceased patients using microbiota data at T1. A good performance was achieved with four variables (Fig. 2D). Streptococcus was the only microbe with a non-zero coe cient (regression coe cient =-1.74) besides three host factors, including age, viral copy number, and severity-A. The classi er trained merely on microbiota data also detected Streptococcus as the best predictor (regression coe cient =-1.51), further supporting the abundance of Streptococcus as a valid predictor for clinical outcomes.
Enterococcus and Candida were recognized as the most enriched genera in deceased patients by all three methods at T2, but not at T1 (Fig. 2B, Fig. S2B,D). Analysis at the species level con rmed Candida albicans and Enterococcus faecium as the two most enriched species in deceased patients besides SARS-CoV-2, suggesting that secondary infections of the two pathogenic microbes could potentially increase the case fatality rate (CFR) of COVID-19 patients.
Association between the abundance of Streptococcus parasanguinis and the fatality risk We then took a closer look at the dynamic change of the Streptococcus in COVID-19 patients. The abundance of Streptococcus was higher in the recovered group at all time points except day 21 (P<0.01, Fig. 3A) and signi cantly increased during the hospitalization (Fig. S3A). In contrast, the abundance of Streptococcus was marginally decreased over time in the deceased group (Fig. S3A). Also, CFR declined with the increased abundance of Streptococcus, and this trend was more pronounced at T2 (Fig. 3B).
Speci cally, four Streptococcus species at T1 and seven Streptococcus species at T2 were enriched in recovered patients (Fig. 3C, Fig. S3B, Supplementary Text). In samples with the highest abundance of Streptococcus (CS1), S. parasanguinis was the only Streptococcus signi cantly enriched in the recovered patients at T1 (Fig. S3C). Moreover, Kaplan-Meier survival analysis also identi ed the abundance of S. parasanguinis as the primary factor associated with CFR (Fig. 3D). In summary, our results indicated that a higher level of S. parasanguinis at T1 predicted better prognosis in COVID-19 patients.
We then examined whether any confounding factors could explain the association between the fatality and the abundance of S. parasanguinis at T1. Neither of the risk factors for mortality observed in our cohort or other studies (age, gender, use of corticosteroid, the severity on admission, and SARS-CoV-2 viral copy number, use of high-grade antibiotics (listed in Table S1), comorbidities) was signi cantly associated with the abundance of S. parasanguinis (p>0.05). The fatality was the only factor associated with the abundance of S. parasanguinis in multivariable linear regression analysis. Moreover, multivariate logistic regression analysis con rmed that the abundance of S. parasanguinis was independently correlated with the fatality in our data (Table 2).
Intriguingly, we noted that the degree of correlation between the abundance of S. parasanguinis and the fatality was greater in patients with less severe symptoms on admission (with a severity score of 3 and 4) (Fig. 3E). Speci cally, 95.2% (60/63) with a high abundance of S. parasanguinis (>10%) and 93% (93/100) patients with an intermediate abundance of S. parasanguinis (>1%) on admission in the less severe group got recovered, resulting in a CFR of 7%. On the contrary, patients with negligible S. parasanguinis (<0.1%) suffered a higher CFR of 30.8% (8/26) (odds ratio = 6, 95% Con dence Intervals: 2 to 18, p=0.0002). Meanwhile, patients with a severity score of 5 on admission showed the highest CFR (72.7%), which was not correlated with the abundance of S. parasanguinis (Fig. 3E), indicating that the predictive power of S. parasanguinis was stronger in the less severe patients (AUC=0.703, Fig. S3D).
The authenticity of S. parasanguinis was rst validated by qPCR using speci c primers targeting the groEL gene 24 . A strong positive correlation was found between the abundance inferred from the sequencing data and the copy number estimated from qPCR (Fig. 3F). Moreover, the copy number of S. parasanguinis also showed a good performance in predicting the survival of the patients (Fig. S3E). Secondly, reads from ten samples with the highest abundance of S. parasanguinis were mapped to two reference genomes of S. parasanguinis (NC_017905.1, NC_015678.1). All variants with minor allele frequency greater than 5% were identi ed, and the variants density was 0.036% and 0.04%, suggesting that the Streptococcus in those samples was close relative of known S. parasanguinis strains.
Co-detection of other pathogens in the upper respiratory tract of COVID-19 patients We then screened all samples for 52 common respiratory pathogens (Table S2). Overall, 20.3% of patients had at least one such pathogen, and the proportion of samples with pathogens was higher in deceased patients than that in recovered patients (48.7% vs. 13.1%, p<0.001). Also, the number of pathogens identi ed in each sample was signi cantly higher in deceased patients than in recovered patients (Fig. 4A). Speci cally, Candida albicans and Enterococcus faecium were more prevalent in deceased patients than in recovered patients (Fig. 4B). More importantly, these two species' abundances on admission were relatively low and similar between recovered and deceased patients, but remarkably increased during hospitalization in deceased patients (Fig. 4C,D), suggesting secondary infections.
E. faecium is a commensal bacteria in human GIT. Translocation of this bacteria had been reported in critically ill patients 25 . We compared the abundance of E. faecium between anal swabs, plasma, and OPs in 32 paired samples, including eight samples with the highest abundance of E. faecium in OPs, but found no excess E. faecium RNA in the plasma of these co-detected patients (p=0.17, Fig. 4E), suggesting that the translocation through the circulatory system was unlikely.
Virus co-infection was rare in this cohort. Only Rhinovirus A was detected in one recovered patient with more than 50 reads. Other possible viral co-infections included In uenza A virus and Human mastadenovirus C, which were revealed by less than 50 reads.

Dynamics of the upper respiratory tract microbiota in COVID-19 patients and its association with the fatality
Although no signi cant difference in alpha diversity was observed between the recovered and deceased patients (Fig. 5A), an increasing trend of alpha diversity over time was observed in some deceased patients (Fig. S4A). We found this phenomenon was associated with the codominance of both pathogenic microbes and commensal microbes in their microbiotas (Fig. S4B).
The dynamic change of microbiota was then visualized as transitions between different CSs (Fig. 5B). Multivariate logistic regression identi ed fatality as the only factor associated with CS transition (p<0.01). Therefore, the analysis was conducted for recovered and deceased patients, separately (Fig.  5B Fig. 5C). Meanwhile, the expansion of Enterococcus and Candida (dominant genera for CS7 and CS8) was more frequently observed in deceased patients (P<0.05, Fig. 5C). Thus, not only the microbiota compostion, but also the dynamics of the microbiota differed between recovered and deceased patients.
The transition dynamics was further modeled as a Markov chain. The self-transition proportion of CS1 was signi cantly higher than other CSs (p<0.001, Fig. 5D). Moreover, we found that patients with more samples assigned to CS1 tended to have a low CRF (Fig. S4C), suggesting that CS1 represented a health status. In addition, the transition from deleterious CSs (CS7, CS8, and CS11) to CS1 was less frequent compared to other CSs (Fig. 5D), suggesting severe dysbiotic microbiota cannot recover in a short time. Besides CS1, all other CSs had a low self-transition rate (< 0.29), suggesting that they were unstable and probably represented an intermediate state.
The magnitude of microbiota change in the deceased group was higher than that in the recovered group and tended to be increased over time (r=0.31, p=0.07, Fig. 5E). A similar trend was observed when using the Bray-Curtis distance to HC (Fig. 5F). Strikingly, the distance to HC increased in the deceased patients over time but tended to decline in the recovered group (Fig. 5F). Speci cally, Streptococcus and Schaalia were the two genera whose abundance was signi cantly approaching the level in HC during hospitalization (Fig. S4D,E). Of note, the distance from the recovered group to HC at T2 was still signi cantly greater than that within HC, indicating that the URT microbiota was not restored at discharge.

Microbial network in COVID-19 patients
To gain insights into the microbial interactions in COVID-19 patients, we constructed the interaction network based on microbes' co-occurrence. The network analysis revealed that the correlation between genera in deceased patients was sparser and more fragmented than that in recovered patients and HC at both T1 and T2 (Fig. S5), suggesting decreased biotic interactions in deceased patients, which was associated with a more fragile microbiota 26 . The components of a "core" healthy microbiota, including Prevotella, Veillonella, Campylobacter, Leptotrichia, Fusobacterium and Selenomonas 27 , were more closely linked in the network of HC and recovered patients than that of deceased patients, suggesting that the aggregation status of the network may re ect the homeostasis and resistance of the microbiota.

Discussion
In this study, we have demonstrated the features and dynamics of the URT microbiota and its association with the fatality in COVID-19 patients using the LOTUS cohort. The metatranscriptomic analysis was conducted, which could provide a real-time snapshot of the microbiota change, as RNA degrades faster than DNA 28 . We noted that the URT microbiota of COVID-19 was more dispersive and partially distinguishable from the HC (Fig. 1C), which is concordant with results in two recent studies 21,22 . However, a signi cant overlap was observed between the microbiota of two groups in our data, indicating that the in uence of COVID-19 on URT microbiota varied among different individuals, which may be associated with the host immune status and the background of the microbiota. Meanwhile, the URT microbiota of deceased patients showed more distinct composition compared to that of the HC and recovered patients, and such tendency is more signi cant at T2 than that at T1, suggesting that the extent of the microbiota alteration was correlated with the severity and progression of the disease, while the causality was elusive.
An incomplete restoration of URT microbiota was observed in recovered COVID-19 patients. The microbiota was gradually restoring during hospitalization in recovered patients but was still different from that in HC at discharge (Fig. 5F). Delayed recovery of the GIT and URT microbiota were observed after antibiotic perturbation and virus infection in multiple studies 19,29 , suggesting that microbiota recovery lags behind the disease's recovery. A Follow-up study is ongoing to investigate the recovery of URT microbiota and its clinical associations after discharge.
Potential pathogens other than SARS-CoV-2 were co-detected in 20.3% samples in our study, while the previous estimate varied between 6.1% to 94.2% 16,[30][31][32] . Candida albicans and Enterococcus faecium were the two most common co-detected pathogens, being observed in 11.5% and 6.25% of COVID-19 patients. The increasing abundance of these two pathogens during hospitalization, which may represent nosocomial infections, was signi cantly correlated with a higher fatality rate (Fig. 4C,D) (Fig. 3C, Fig. S3C), which may relate to its high sensitivity to antibiotics 33 . Notably, S. salivarius, specially Stain K12 and M18, are probiotics capable of fostering a balanced healthy oral microbiota 34,35 . Thus, depletion of this species may reduce the resistance and resilience of the microbiota and increase the susceptibility to co-/secondary infections.
For the rst time, we found that a higher abundance of Streptococcus parasanguinis on admission was strongly associated with better outcomes in COVID-19 patients. We suspected that several mechanisms might be involved. First, S. parasanguinis may contribute to a stable niche that resistant to pathogens. The bacteria play a vital role in initiating dental plaque formation, which is critical for maintaining a healthy microbiota in the oral cavity 36 . Moreover, S. parasanguinis could directly inhibit the growth of Pseudomonas aeruginosa by producing H 2 O 2 , and induce a varying level of cytokines 37,38 , which is essential for maintaining the immune homeostasis. Second, the decreasing of the abundance of S. parasanguinis may indicate a disrupted ecosystem in the respiratory tract. As an essential component of a healthy oral/respiratory tract microbiota 39,40 , the decreasing or depletion of S. parasanguinis may result from the introduction and overgrowth of competing microbes, altered regional conditions, enhanced immune pressure, or use of antibiotics, which have been observed in pneumonia and other pulmonary diseases 41,42 . We speculate that the disrupted ecosystem, rather than S. parasanguinis itself, is associated with a critical condition and poor prognosis. However, it was mysterious why only S. parasanguinis but not other commensal microbes were associated with the integrity of the URT microbiota. Third, the abundance of S. parasanguinis may be associated with confounding factors that correlate with clinical outcomes. However, we did not nd any risk or protective factors associated with the abundance of S. parasanguinis. Severely ill patients were typically administered with broad-spectrum or high-grade antibiotics, which may reduce the abundance of S. parasanguinis. However, although we found a strong correlation between the administration of the high-grade antibiotics (Table S1) and the fatality (59.0% in deceased patients vs. 6.1% in recovered patients, p<0.001), no correlation existed between the administration of high-grade antibiotics and the abundance change of S. parasanguinis (p=0.09).
There are some limitations in our study. First, the cohort only includes severe COVID-19 patients, while the mild and asymptomatic patients may display distinct features. Nevertheless, as most deaths occurred in severely ill patients, this cohort is appropriate for studying the association between the URT microbiota and fatality. Second, our study is a single-center study. It is unclear whether the ndings could be generalized to other populations, particularly considering that URT microbiota could potentially be in uenced by geography, ethnicity, host genetics, and medication 40,43 . Thus, more data, especially those from well-controlled clinical trials, is needed to validate ndings in this study.
In summary, our study has characterized the features and dynamics of the URT microbiota and its association with clinical features, especially the fatality in COVID-19 patients. The URT microbiota in deceased patients was signi cantly distinct from that in recovered patients on admission and afterwards. A high abundance of Streptococcus was identi ed as an indicator of a stable and resilient URT microbiota associated with fewer co-infection and a lower fatality rate. In addition, we recommend S. parasanguinis as a potential prognostic biomarker related to the fatality in COVID-19. Further studies are needed to unveil the underlying mechanisms responsible for the associations observed in this study.
Notably, although several papers have been published to recommend the use of probiotics in COVID-19 patients, which to some extent is supported by our data 44,45 , the e cacy of the intervention needs to be validated in rigorous clinical trials.

Study design and sample collection
Oropharyngeal swab samples were collected on day 1, 5, 10, 14, 21, and 28 after admission from the patient enrolled in Lopinavir Trial for Suppression of SARS-CoV-2 in China (LOTUS) [ChiCTR2000029308].
All patients were positive for SARS-CoV-2 RNA tested by RT-PCR assay (Shanghai ZJ Bio Tec or Sansure Biotech, China) before admission. Demographic and clinical information was shown in the original paper 23 . All patients were assigned a severity score on admission and at discharge or on day 28, ranging from 2 (not hospitalized, not requiring supplemental oxygen) to 7 (death) 23 .
The study was approved by the Institutional Review Board of Jin Yin-Tan Hospital (KY2020-02.01). Written informed consent was obtained from all patients or their legal representatives if they were too unwell to provide consent.

Library preparation and sequencing
Total nucleic acids were extracted from each 400 µl oropharyngeal swab sample using the NucliSens easyMAG apparatus (bioMerieux, Marcy l'Etoile, France) following the manufacturer's instructions, and nally eluted in 80 µl elution buffer and stored at -80℃ until use.
The RNA was converted into Illumina-compatible sequencing libraries with MINERVA protocol 46  analysis.
An extra step was applied to re ne the species results given by MEGAN. Reads from all samples were merged and mapped to the reference genome of all species supported by at least 10,000 reads by Bowtie2 (--sensitive)(version 2.2.3) 52 . Species with a genome coverage of less than 1% were regarded as false positives. Reads assigned at the genera level were reallocated to species level according to the relative proportion of each species in that genus.

Clustering analysis
Principal coordination analysis based on Bray-Curtis distance was applied to the relative abundance matrix at the genus level. The rst 56 coordinates, which explained 95% of the total variances, were used as the input for clustering. Then we applied two rounds of hierarchical agglomerative clustering with a complete-linkage strategy. The rst clustering removed outliers that might interfere with the estimation of the number of clusters. We determined the number of possible clusters using average silhouette width and performed post examination to remove the outliers within each putative cluster. Speci cally, the sample whose average Bray-Curtis distance to other samples in the same cluster greater than 0.7 was reclassi ed as an outlier. After removing outliers, the optimal number of clusters was re-estimated by 1000 rounds of bootstrap average silhouette width. Since the average silhouette width reached the maximum when k=10 and dropped dramatically at k=11, we determined 10 to be the optimum number of clusters (Fig. S1H).
Association between the microbe and the metadata Three statistical methods were applied to identify the microbe that was associated with speci c metadata, which included Generalized additive models for location, scale, and shape with zero-in ated beta distribution(GAMLSS) 53 , Linear model regression implemented in metamicrobiomeR 54 , Linear discriminant analysis effect size (LEfSe) 55 . All metadata were input as cofactors in the former two methods.

Network analysis
Network analysis was performed based on reads count data at the genera level using SpeciEasi in deceased, recovered patients at T1, T2, and healthy controls 56 . The density of the networks was calculated by graph.density function in igraph R package 57 . The network was visualized by igraph R package.

Statistical analysis
Alpha-diversity and beta diversity were computed using phyloseq and vegan package 58,59