The median age of the sampled group was 46 years (range 12 to 88 years), and the group comprised 107 males and 34 females. Wherever possible, we also collected information on the symptoms and comorbidities of the COVID-19 patients. Many of the demised patients had comorbidities such as heart disease and hypertension, whereas only two patients in the recovered group had any comorbidity. Cough, fever, and breathlessness were the most common symptoms among the demised COVID-19 patients, and fever, body ache, and loss of smell were the most common among the recovered patients. We also sequenced the SARS-CoV-2 genome from all the recovered and demised patients (82 samples); except for 8 demised patients, all were infected with the B.1.617.2 (Delta) variant (Additional file 2: Table S1). For this study, we extracted total DNA from 141 nasopharyngeal swab samples, comprising 54 demised, 43 recovered, and 44 control samples. Of these 141 samples, 15 (8 demised and 7 recovered) failed PCR amplification. Of the remaining 126 samples, 16 were discarded because they had insufficient reads or were outliers. Finally, a total of 110 samples (48 demised, 29 recovered, and 33 control) were analyzed and are reported in this study. In total, we generated 42.7 million reads with an average length of 336 bp. Details of the sequencing reads generated for each sample are provided in Additional file 2: Table S2.
Microbial diversity analysis
The rarefaction curves (Additional file 1: Figure S1) almost reached a plateau in all samples, indicating that the generated reads were sufficient to capture the bacterial diversity present in the samples. Beta diversity at the genus level, assessed by PCoA (Fig. 1A) and NMDS (Fig. 1B; stress = 0.20839) on a Bray-Curtis distance matrix, revealed a significant difference in microbiota composition among the three groups (PERMANOVA; F-value: 6.5886; R2: 0.10965; p-value <0.001). The first two PCoA axes explained 22.5% (Axis 1) and 17.1% (Axis 2) of the variation. PCoA plots for the pairwise group comparisons are shown in Additional file 1: Figure S2; here too we observed significant differences (p-value <0.001) between the control-demised, control-recovered, and demised-recovered groups. The dendrogram (Additional file 1: Figure S3) also showed distinct clustering of the three groups, except for a few samples that did not cluster according to group. Microbial species richness, assessed by the Chao1 (Fig. 1C), ACE (Fig. 1D), and observed features (Fig. 1E) indices, was higher in the demised patient group; however, the difference among the three groups was not significant (Kruskal-Wallis test; Chao1: statistic 1.7674, p-value 0.41324; ACE: statistic 1.8536, p-value 0.39582; Observed: statistic 5.3997, p-value 0.067215). There was also no significant difference in the Shannon and Simpson indices among the three groups. In the pairwise comparison of the demised and recovered patient groups, however, species richness differed significantly (t-test; Observed: statistic 2.4342, p-value 0.017453; Chao1: statistic 2.2084, p-value 0.030277; ACE: statistic 2.3579, p-value 0.021).
By contrast, there were no significant differences in the control-demised and control-recovered comparisons (Additional file 1: Figure S4).
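For readers less familiar with the indices named above, the following is a minimal sketch of how Shannon, Simpson, Chao1, and Bray-Curtis values can be computed from per-sample count vectors. It is an illustration only and not the pipeline used in this study: the function names and the toy counts are hypothetical, Chao1 is shown in its bias-corrected form (an assumption, since the variant used by the analysis software is not stated here), and Simpson is shown as 1 minus the sum of squared proportions.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson diversity, here taken as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

# Hypothetical genus-level counts for two samples (same taxon order).
sample_a = [10, 5, 1, 1, 2, 0]
sample_b = [4, 5, 0, 3, 2, 6]

print("Shannon (a):", shannon(sample_a))
print("Simpson (a):", simpson(sample_a))
print("Chao1 (a):", chao1(sample_a))
print("Bray-Curtis (a, b):", bray_curtis(sample_a, sample_b))
```

A pairwise Bray-Curtis matrix built this way is the kind of input that ordination methods such as PCoA and NMDS, and tests such as PERMANOVA, operate on.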
Microbiome composition
In all the samples, unclassified bacteria were present at an abundance of >23.0%, with the highest abundance (35.2%) in the recovered group. The genus Pseudomonas was most prevalent in the control group (31.34%), compared with 4.11% in the demised group and only 1.14% in the recovered group. Similarly, Prevotella was more abundant in the control group than in the other two groups. In the demised patients, Corynebacterium (6.58%), Enterococcus (5.21%), Acinetobacter (3.89%), Streptococcus (3.05%), Pseudoalteromonas (2.36%), Propionibacterium (2.6%), Staphylococcus (2.47%), and others were more abundant than in the other two groups. Likewise, unclassified bacteria (35.23%), Ochrobactrum (12.15%), Burkholderia (4.79%), Brevundimonas (2.88%), Leptotrichia (2.15%), and some others were more abundant in the patients who recovered from COVID-19 (Additional file 1: Figure S5 and Additional file 2: Table S3). Extended error plots at the genus level showing between-group comparisons are provided in Additional file 1: Figure S6. Differential abundances of genera in the pairwise group comparisons are shown as heat-map trees and as a dot plot from LEfSe analysis in Fig. 2, and are listed in Additional file 2: Table S4. In total, 25 genera and 55 species were exclusively present, at >0.1% abundance, in the demised patients, and 13 genera and 33 species in the recovered patients (Additional file 1: Figure S7; Additional file 2: Tables S5 and S6).
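The notion of taxa being "exclusively present" above an abundance cut-off can be illustrated with a short sketch. The interpretation used here is an assumption: a taxon counts as exclusive to a group when its relative abundance in that group exceeds the threshold and it is absent from the comparison group. The function names, taxon labels, and counts are hypothetical and are not data from this study.

```python
def relative_abundance(counts):
    """Convert raw taxon counts into percentages of the group total."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

def exclusive_taxa(group_a, group_b, threshold=0.1):
    """Taxa above `threshold` (%) in group_a and absent from group_b.

    Assumption: 'absent' means a relative abundance of exactly zero
    (or no entry at all) in group_b.
    """
    return sorted(
        taxon
        for taxon, pct in group_a.items()
        if pct > threshold and group_b.get(taxon, 0.0) == 0.0
    )

# Hypothetical pooled counts per group.
demised_counts = {"genus_A": 6000, "genus_B": 3995, "genus_C": 5}
recovered_counts = {"genus_B": 5000, "genus_D": 5000}

demised_pct = relative_abundance(demised_counts)
recovered_pct = relative_abundance(recovered_counts)

# genus_C is absent from the recovered group but falls below the 0.1% cut-off.
print(exclusive_taxa(demised_pct, recovered_pct))
print(exclusive_taxa(recovered_pct, demised_pct))
```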
Microbiome signature of the COVID-19 patients
LEfSe analysis is widely used to identify biomarkers that differ between groups of samples. In this study, LEfSe analysis was therefore used to identify biomarker genera as signatures of the nasopharyngeal microbiome among the control, infected but recovered, and infected but demised COVID-19 patients. In total, 32 genera (15, 10, and 7 in the demised, recovered, and control groups, respectively) and 46 species (22, 13, and 11 in the demised, recovered, and control groups, respectively) were identified as biomarkers with an LDA score >4.0 and a p-value <0.05 (Fig. 3 and Additional file 2: Tables S7 and S8). The genera Corynebacterium (LDA score 5.51, the highest in the demised group), Staphylococcus, Serratia, Micrococcus, and Klebsiella, along with their pathogenic or opportunistic pathogenic species such as C. xerosis, S. epidermidis, S. liquefaciens, and K. pneumoniae, were identified as biomarkers for the demised COVID-19 patients. Similarly, Ochrobactrum (LDA score 5.79, the highest in the recovered group), Burkholderia (LDA score 5.29), unclassified Betaproteobacteria, and their pathogenic or opportunistic pathogenic species such as O. anthropi, O. tritici, and B. cepacia were identified as biomarkers for the recovered COVID-19 patients. Pseudomonas, with the highest LDA score (6.19) among the three groups, was identified as a biomarker for the control group. In addition, several commensals of the healthy oral and nasal cavity, including Fusobacterium (F. periodonticum), Veillonella (V. parvula), Porphyromonas (P. catoniae), and Bulleidia (Bulleidia extructa), along with some pathogenic bacteria such as Neisseria (N. flavescens), were biomarkers. Fig. 4 depicts the log-transformed abundance of the important biomarker genera.
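LEfSe screens features with a Kruskal-Wallis test across groups before estimating LDA effect sizes, and the same test underlies the three-group richness comparison reported earlier. The sketch below shows how the H statistic can be computed and, for three groups (2 degrees of freedom), how the p-value follows as exp(-H/2). The function names are hypothetical and this is not the implementation used by LEfSe or by the analysis software. As a check, the three-group richness statistics reported earlier (1.7674, 1.8536, and 5.3997) give p-values of about 0.413, 0.396, and 0.067 under this formula, in agreement with the reported values.

```python
import math
from collections import Counter

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction

def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom: exp(-H/2).

    Valid only when exactly three groups are compared.
    """
    return math.exp(-h / 2)

# Statistics reported for the three-group richness comparison.
for label, h in [("Chao1", 1.7674), ("ACE", 1.8536), ("Observed", 5.3997)]:
    print(label, round(p_value_three_groups(h), 3))
```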
Random forest classification assigned most samples to their correct group: 44/48 demised, 19/29 recovered, and 29/33 control samples were predicted correctly, giving an overall out-of-bag (OOB) error rate of 16.4% (Fig. 5). Misclassification was most frequent in the recovered group.
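The overall error rate follows directly from the per-group counts, as the short calculation below shows. It assumes that the reported fractions are the out-of-bag predictions; the variable and function names are illustrative only.

```python
# Correct predictions and group sizes as reported in the text.
correct = {"demised": 44, "recovered": 19, "control": 29}
total = {"demised": 48, "recovered": 29, "control": 33}

def class_error(group):
    """Fraction of samples in `group` assigned to a different group."""
    return 1 - correct[group] / total[group]

def overall_error():
    """Fraction of all samples that were misclassified."""
    return 1 - sum(correct.values()) / sum(total.values())

for group in total:
    print(f"{group}: {100 * class_error(group):.1f}% error")

# 92 of 110 samples are correct, so 18 of 110 are misclassified.
print(f"overall: {100 * overall_error():.1f}% error")  # overall: 16.4% error
```

The per-group view shows that most of the error comes from the recovered group, where 10 of 29 samples were misclassified.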