Overview of the dataset. In total, we collected 536 samples from 204 COVID-19 patients and 97 samples from 31 healthy donors, of which 225 samples were pharyngeal swabs, 256 samples were sputum, and 152 samples were faeces (Fig. 1A, left). To ensure adequate sequencing depth for profiling microbial composition, samples with fewer than 100,000 reads after removal of low-quality reads, human reads, and rRNA reads were excluded (Fig. 1A, right). Finally, 521 samples from 203 patients and 94 samples from 31 healthy donors were kept for following analysis. The summary characteristics of subjects are shown in Table 1. To avoid potential bias introduced by multiple sampling from the same patients at different time points, we selected the first sample of each patient that was collected within 14 days since symptom onset as a representative to profile the microbial composition and function. Using patients with multiple samples available, we further checked the dynamics of microbial signatures. All patient samples were collected before their discharge from hospital. Detailed information about each sample is listed in supplementary Table S1. We also included non-template controls (NTCs) during nucleic acid extraction, library construction and sequencing to profile potential environmental or reagent contaminations. Microbes detected in non-template controls are shown in supplementary Table S2.
Altered microbiota in different sample types from COVID-19 patients. Multiple studies have reported the alteration of respiratory tract microbiome in COVID-19 patients, in terms of both alpha and beta diversity [24,49]. In our dataset, there is no difference in species richness (number of species) in all three sample types between patients and healthy controls, but Shannon diversity was significantly reduced in respiratory tract samples, including both pharyngeal swabs and sputum of COVID-19 patients (Fig. 1B). We also did not observe any difference in the alpha diversity of faecal samples, which somewhat differs from the results reported by others that gut microbiome of COVID-19 patients had significantly reduced alpha diversity [22,25]. However, when we then compared the beta diversity between patients and healthy controls, patient samples form a distinct group from controls on principle coordinate analysis (PCoA) using Bray-Curtis distance (PERMANOVA test, p < 0.001) for pharyngeal, sputum and faeces samples (Fig. 1C). This suggests that even though there is no difference in terms of the alpha diversity, COVID-19 patients’ gut microbiomes are indeed altered compared to controls. We further profiled the microbial composition at the genus level in healthy donors and COVID-19 patients for all three sample types. In pharyngeal and sputum samples, there is a striking increase in Streptococcus, the most abundant genus in the patient group (Figs. 1D and 1E). In faecal samples, Bacteroides is the most abundant genus but shows no obvious difference between patients and healthy controls; rather, the overall microbial composition of patient samples shows distinctive patterns compared to healthy controls, as exemplified by a decrease in Faecalibacterium and expansion of Acinetobacter, Citrobacter and Pseudomonas (Fig. 1F).
Alteration of microbial composition in COVID-19 patients is associated with disease severity. Since there are microbial composition changes in both the respiratory tract and gut samples of COVID-19 patients, we further assessed the effect of meta factors associated with these alterations and found that the microbial composition is significantly affected by disease severity in all three sample types (PERMANOVA; Fig. 2A). Consistently, Shannon diversity was significantly reduced in pharyngeal and sputum samples of COVID-19 patients, irrespective of the disease severity (Supplementary figure S1A). The situation was more complex in faecal samples: while there is no reduction in the alpha diversity, there are significant effects on the microbial composition attributed to SARS-CoV-2 abundance as well as antibiotic and anti-viral treatments (Fig. 2A and Supplementary figureS1A).
To further identify differential microbes associated with disease severity, we performed linear discriminant analysis using LEfSe in pharyngeal samples, and found Tannerella, Fusobacterium, Selenomonas, Burkholderia, Treponema, Micrococcus, Escherichia, Campylobacter, Haemophilus and Neisseria to be depleted in COVID-19 patients; Rothia, Streptococcus and Actinomyces were enriched (Fig. 2B). We also identified differentially represented bacterium in sputum samples: Treponema, Delftia, Porphyromonas, Tannerella, Haemophilus and Neisseria showed decreased abundance in patients, especially in those with severe symptoms; while Capnocytophaga and Streptococcus were elevated (Supplementary figure S1B). Although the origin of the microbes found in sputum samples cannot be precisely pinpointed as upper or lower respiratory tract, we found that the same bacterial species were altered in both pharyngeal (Supplementary figure S1C) and sputum (supplementary figure S1D) samples of severe symptom patients. Multiple Streptococcus species were enriched in patient samples, and some of them are reported to be normal flora colonizing the oral cavity or respiratory tract, but also capable of causing opportunistic infections [13,50,51]. In faecal samples, Lachnoclostridium, Eggerthella, Anaerostipes, Lachnospira, Enterocloster, Roseburia, Flavonifractor, Bifidobacterium, Parabacteroides and Faecalibacterium were reduced in patients, whereas Pseudomonas and Halomonas were enriched (Fig. 2C). At species level (Supplementary figure S1E), multiple depleted microbes were reported to be involved in the generation of SCFA and play important role in modulating gut health and response to inflammation, such as Parabacteroides distasonis, Roseburia hominis, Faecalibacterium prausnitzii, and many Bifidobacterium species [52,53].
For microbes found to be associated with disease severity (Fig. 2B, Fig. 2C, and Supplementary figure S2B), increased abundance in corresponding severity group was observed even though not statistically significant (Supplementary figure S1F). We noticed that there are species enriched in COVID-19 which are not commonly detected in human faeces, including Pseudomonas fluorescens group and unclassified Halomonas (Supplementary figure S1E). Pseudomonas fluorescens generally has low virulence but can also cause human infection infrequently [54]. Halomonas species are usually found in marine or saline environments, as they thrive under high salinity environments. However, some strains were reported to cause nosocomial infections and contaminations in hospital settings [55]. To more accurately identify the Halomonas species in our samples, we separately mapped non-human reads from samples with high abundance of Halomonas ( > = 3% of total bacterial reads) to all Halomonas genomes downloaded from NCBI RefSeq database, and performed meta-assembly with two different approaches. To achieve species-level identification, we then blasted the assembled contigs against the known Halomonas genomes. Most of the hits had ~ 99% percentage identity, confirming the existence of Halomonas in our samples (Supplementary figure S2A). We further compared the genomic similarity between assemblies and top species identified from blast results (Supplementary figure S2B). The similarity between assemblies generated using different methods was 97.9%, indicating comparable results from both. There were no species with more than 70% similarity identified. Interestingly, several species reported to be associated with infection in clinical environments were also detected in our samples, including H. stevensii, H. johnsoniae and H. hamiltonii [55,56]. Genomes with more than 60% similarity were visualized in Supplementary figure S2C.
Gut microbiota alterations are associated with SARS-CoV-2 abundance. We have observed different alteration patterns of gut microbiota in COVID-19 patients from that of respiratory tract when comparing alpha and beta diversity. Meanwhile, patient gut microbiomes were affected by multiple factors, including disease severity, SARS-CoV-2 abundance, as well as antibiotic and anti-viral treatment (Fig. 2A). Moreover, there is a greater Bray-Curtis distance between patients and corresponding healthy controls in faecal samples (Fig. 3A), indicating that the gut microbiota may be more severely disrupted by SARS-CoV-2 infection. We also calculated Bray-Curtis distances for samples within the patient and healthy control groups respectively. Generally, the within-patient Bray-Curtis distance is higher than the within-control distance for all three sample types; moreover, the degree of increase is larger in faecal samples (Fig. 3B), which suggested that gut microbial composition of COVID-19 patients is more disperse than the respiratory tract microbiota.
To further explore potential mechanisms associated with the different alteration patterns between URT and gut microbiota, we assessed the effects of meta factors, including disease severity, age, gender, SARS-CoV-2 abundance, antibiotic and anti-viral treatment using patient samples. Only patients with mild or moderate symptoms were included since there are insufficient numbers of faecal samples from severe patients. Our results show that in pharyngeal and sputum samples, there is no significant difference in the microbial composition between mild and moderate patients (Supplementary figure S3A and S3C); at the same time, no significant associations were found among the tested factors (Supplementary figure S3B and S3D). In contrast, microbial composition in faecal samples was significantly associated with SARS-CoV-2 abundance (Fig. 3C and Fig. 3D), which suggests that gut microbiota might be more vulnerable to disruption by SARS-CoV-2 infection. We then checked for correlation between abundance of SARS-CoV-2 and that of the major species from differential analysis (Supplementary figure S1C, S1D and S1E), and found a stronger negative correlation in faecal samples (Fig. 3E). This result was also confirmed at the genus level (Supplementary figure S3E). Together, these findings suggest that alteration of gut microbiota was more likely to be directly associated with SARS-CoV-2 abundance. In addition to disease progression, gut microbiota could be more easily affected by therapies, showing greater dispersion than upper respiratory tract microbiome in COVID-19 patients.
Microbial composition remains relatively stable during the study period. We checked the dynamics of the microbial composition for patients with multiple samples available. All samples were collected during the patients’ hospitalization period. The microbial composition profiled by representative genera from differential analysis remained relative stable in samples from the same patient, irrespective of disease severity and sample type (Supplementary figure S4A, S4B and S4C). We also checked the dynamics of average Bray-Curtis distance to healthy controls for each patient. There were fluctuations in a subset of patients, such as P17, P37, P60, P102 and P103 in terms of pharyngeal swabs (Supplementary figure S5A); P17, P137, P161, P164, P165, P168, P169 and P198 in terms of sputum (Supplementary figure S5B); as well as P137, P145 and P239 in terms of faecal samples (Supplementary figure S5C). However, generally the Bray-Curtis distance remained relatively stable during the sampling period.
Altered microbial function in COVID-19 patients. In addition to composition, we also assessed the function of microbial community potentially associated with disease severity using LEfSe method. Generally speaking, different patterns of functional dysbiosis were observed in respiratory tract and gut microbiota of COVID-19 patients. In respiratory tract, the microbial community conferred high abundance of stress-response and toxin genes; while gut microbiota was mainly found with loss of carbohydrate metabolism and SCFA-generating pathways, and also with enrichment of stress-response related genes.
Specific to pharyngeal samples, multiple genes related to fundamental cellular activities were enriched in healthy controls, such as glutamate dehydrogenase (NADP+), which is involved in amino acid metabolism; and factors related to protein translation, such as elongation factor G and Ts; as well as cellular components flagellin (Fig. 4A). In contrast, multiple genes related to transportation of different molecules were found to be enriched in patient samples, such as manganese transport protein, phosphate transport system permease, sucrose PTS system EIIBCA component, iron/zinc/manganese/copper transport system substrate-binding protein, oligopeptide transport system permease protein, osmoproctectant transport system ATP-binding protein and so on. Moreover, multidrug efflux pump SatA and SatB, components of norfloxacin and ciprofloxacin ABC transport [57,58], were found enriched in severe patients, indicating higher risk of antibiotic resistance in the microbial community profiled from patient samples. Genes related to bacterial response to stress/inflammation and virulence-related genes were also found to be abundant in patient samples regardless of the disease severity (Fig. 4A). Similarly in sputum samples, functions related to normal cellular activities, including carbohydrate and fatty acid metabolism, protein translation and genetic organization, were enriched in healthy controls. In patient samples, multiple genes related to transportation of molecules, such as metal ions, oligopeptides and amino acids, as well as genes related to stress response and virulence were found highly abundant (Fig. 4B).
Many of the up-regulated genes found in respiratory tract microbiota of COVID-19 patients are associated with bacterial response to diverse stresses. RseA is a negative regulator of sigma E factor, whose function is central to the response to envelope stress [59,60]. General stress protein 13 was found to be in association with the 30S subunit of ribosome and can be induced by heat shock, salt stress, oxidative stress, glucose and oxygen limitations [61]. Clp protease plays a central role in proteolysis and is involved in bacterial adaptation to various environmental stresses [62]. Fatty acid kinase FakA is involved in lipid metabolism and is important for the activation of the SaeRS two-component system and secreted virulence factor like α-hemolysin [63]. SufB is a component of Suf system, which is a specialized pathway for Fe-S cluster assembly under iron starvation or oxidative stress [64]. DNA glycosylase MutY mainly functions to correct DNA G-A mis-pairs from oxidative damage [65]. LiaR is a component of LiaFSR system, which is a gene regulatory system important for response to cell membrane stress in Gram-positive bacteria [66]. Toxin FitB is a component of the type-II toxin-antitoxin system and plays a role in the speed with which bacteria traverse human epithelial cells [67]. Hemolysin-III is a potent pore-forming toxin [68]. Taken together, enrichment of these genes in patient samples suggested that the microbial community were underlying stressful situations, which might be caused by SARS-CoV-2 infection or other factors, such as treatment or host inflammation.
In faecal samples, healthy controls were observed with enriched fatty acid metabolism genes, such as phosphate acetyltransferase and 3-hydroxybutyryl-CoA dehydrogenase; and carbohydrate metabolism genes, such as mannose dehydratase, acetate kinase, glucose-6-phosphate isomerase, aldose 1-epimerase, phosphoglucomutase, gluose-1-phosphate adenylyltransferase, triosephosphate isomerase, 6-phosphofructokinase 1 and gluocosamine-6-phosphate deaminase; as well as carbohydrate transportation proteins, such as raffinose/stachyose/melibiose transport system substrate-binding protein and components of multiple sugar transport system. Some of the genes are known to be involved in the pathways generating short-chain fatty acids. Phosphate acetyltransferase catalyzes the reversible interconversion of acetyl-CoA and acetyl phosphate, which is related to acetate synthesis [69]. Acetate kinase can catalyze the formation of acetyl phosphate from acetate and ATP; and also the reverse reaction to favor the formation of acetate. 3-hydroxybutyryl-CoA dehydrogenase converts 3-hydroxybutanoyl-CoA to acetoacetyl-CoA and is involved in the butanoate metabolism [70,71]. These functions were depleted in patient fecal samples. In patients, the elevated genes were mainly related to amino acid metabolism, such as aspartate carbamoyltransferase catalytic subunit, argininosuccinate lyase and ketol-acid reductoisomerase; and other molecule transporting proteins, such as F-type H + transporting ATPase, preprotein translocase subunit YajC, ABC transport system and N-acetylgucosamine PTS system EIIB components. At the same time, stress response related genes were also found to be enriched in patient faecal samples, such as molecular chaperone HtpG and chaperonin GroES (Fig. 4C).