Molecular Phenomics of Gut Ecosystem in COVID-19 Patients

Background: Gut ecosystem has profound effects on host physiology and health. Gastrointestinal (GI) symptoms were frequently observed in patients with COVID-19. Compared with other organs, gut antiviral response can result in more complicated immune responses because of the interactions between the gut microbiota and host immunity. However, there are still large knowledge gaps in the impact of COVID-19 on gut molecular proles and commensal microbiome, hindering our comprehensive understanding of the pathogenesis of SARS-CoV-2 and the treatment of COVID-19. Results: We performed longitudinal stool multi-omics proling to systemically investigate the molecular phenomics alterations of gut ecosystem in COVID-19. Gut proteomes of COVID-19 were characterized by disturbed immune, proteolysis and redox homeostasis. The expression and glycosylation of proteins involved in neutrophil degranulation and migration were suppressed, while those of proteases were upregulated. The variable domains of Ig heavy chains were downregulated and the overall glycosylation of IgA heavy chain constant regions, IgGFc-binding protein, and J chain were suppressed with glycan-specic variations. There was a reduction of benecial gut bacteria and an enrichment of bacteria derived deleterious metabolites potentially associated with multiple types of diseases (such as ethyl glucuronide). The reduction of Ig heave chain variable domains may contribute to the increase of some Bacteroidetes species. Many bacteria ceramide lipids with a C17-sphingoid based were downregulated in COVID-19. In many cases, the gut phenome did not restore two months after symptom onset. Conclusions: Our study indicates widely disturbed gut molecular proles which may play a role in the development of symptoms in COVID-19. Our ndings also emphasis the need for ongoing investigation of the long-term gut molecular and microbial alterations during COVID-19 recovery process. Considering the gut ecosystem as a potential target could offer a valuable approach in managing the disease. Exp LC-MS; Precursor Target ALL lipid classes; Ion adducts (positive ion mode) +NH 4 , +Na, 2 and adducts (negative Ion mode) of − and Top On; Main node Main peak; m-Score Quality Filter, (A: lipid class and FA are B: lipid Lipid D: Lipid ions (H 2 loss, and other m/z and following Calculate area, On; Filter type, New Top rank Main node Main m-Score threshold, 5.0; ID quality lter, A, B and C. using KNN. Quantile normalization and pareto scaling were employed. Unsupervised multivariate data analysis was performed using principal component analysis (PCA) and hierarchical cluster analysis (HCA). Signicantly differentiated omics features between COVID-19 and control groups (should present in at least 50% of samples) were detected using Wilcoxon’s rank sum test (FDR adjusted p value (q) < 0.05). Microbial taxonomic and functional groups were normalized by total abundance. Statistical signicance of microbial taxonomic groups was calculated using Mann-Whitney U test (p value < 0.05). Protein-microbiome and metabolite-microbiome correlations were determined using Pearson correlation (q < 0.05) using R. The association between differentiating omics features with categorical confounding variables (gender and medicine) were determined using Wilcoxon’s rank sum test. The association between differentiating omics features with continuous confounding variables (age and Body Mass Index (BMI)) were determined using Pearson correlation.


Introduction
The ongoing coronavirus disease 2019 (COVID-19) outbreak caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to pose a threat to human beings. In addition to fever and cough, gastrointestinal (GI) symptoms were frequently observed in patients with COVID-19 [1][2][3]. We have provided the rst clinical and molecular evidence for GI infection of SARS-CoV-2 with viral RNA detected in both GI tissues (esophagus, duodenum and rectum) and stool samples [1,4]. Furthermore, viable virus have been isolated in patient stool samples [5]. Our studies revealed that 11.6% of patients have GI symptoms on admission, and 49.5% of patients developed GI symptoms during hospitalization [1]. Importantly, 11.6% of patients did not exhibit any imaging features of COVID-19 pneumonia but only showed GI symptoms. However, little is known about why and how SARS-CoV-2 affects GI tract. Current evidence suggests that SARS-CoV-2 uses the host receptor angiotensin converting enzyme 2 (ACE2) for cell entry and the transmembrane serine protease 2 (TMPRSS2) for Spike protein (S) priming [6].

Results
We collected a total of 53 stool samples from 13 COVID-19 patients with a range of one to nine longitudinal time-points that occurred 1-94 days post symptom onset (Fig. 1A, Table S1). Stool samples from 21 healthy subjects severed as controls. Positive RT-PCR results for SARS-COV-2 were observed in stool and/or perianal swab samples even 3 months post symptom onset in a patient with diabetes (patient 1). Furthermore, SARS-CoV-2 viral RNA can persist in stool and/or perianal swab samples long after respiratory samples have tested negative in patients 1, 4, 6, 7, 8, and 9, highlighting the susceptibility of GI tract to SARS-CoV-2 infection.
Multi-omics pro ling was performed on each sample to study the alterations of molecular phenomics of the gut ecosystem in COVID-19. Principal component analysis (PCA) showed partially or largely separated multi-omics pro les between COVID-19 patients and controls (Fig. 1B). Quality control (QC) samples of different omics data closely clustered together indicating reproducible MS measurements.
Samples from the same patients tended to cluster together indicating greater individual similarity even in the presence of the SARS-COV-2 perturbation.
We then investigated the longitudinal changes of these altered proteins in patients 5, 11, 12 and 13, who had serial stools displaying positive to negative stool SARS-CoV-2 infection. Overall, all patients showed considerable protein abundance variations, indicating an unstable gut proteome during the disease course of COVID-19 (Fig. 2B). In many cases, the protein abundance did not restore to the normal levels even several weeks after symptom onset . For patient 5, IGHV3-64D, IGLL1, CEACAM5, and CELA3A showed similar trajectory changes, whose abundance reached a high level 38-40 days after symptom onset but reduced dramatically on day 49. For patient 11, IGHV3-64D and CELA3A elevated on day 37 but decreased sharply on day 40. The longitudinal proteome changes of patient 13 were characterized by a steep fall of IGHV3-64D on days 37 and 39 and a gradually increase of CELA3A, ALPI, and PLA2G2A.

Glycosylation insight into mucosal immunological pathogenesis
To further investigate the phenomics alterations of COVID-19, we studied the protein glycosylation which plays a key role immunological regulation [19] by HILIC based enrichment (Tables S3 and S4). We rst analyzed the intact N-glycopeptides using pGlyco 2.0 because this search engine improves the identi cation accuracy by comprehensive quality control at all three levels of glycans, peptides, and glycopeptides [20]. In total, 4960 glycopeptide-spectrum matches (GPSMs) derived from 54 human proteins were identi ed with a 1% GPSM FDR (combing peptide FDR and glycan FDR) (  Table S4.3)) of major Nglycosylated sites of proteins involved in neutrophil degranulation (including ANPEP, AZU1, MGAM, CEACAM6, CEACAM8, LCN2, OLFM4, and SERPINA1) and neutrophil migration (GP2) were decreased by up to 81.6% in COVID-19. The N-glycosylation of mucins was dominated by Hex1Fuc1, the frequency of which reduced by approximately 25% (Fig. 3B). The N-glycosylation of proteases was dominated by Hex1Fuc1 and Hex3HexNAc2Fuc1 and the frequency of both glycans was reduced in COVID-19.
In contrast to the above proteins, Ig related proteins including IGHA2, FCGBP, and JCHAIN exhibited greater glycosylation heterogeneity. On the glycosite N131 of IGHA2, the frequency of glycan Hex3HexNAc4 decreased by 63.3% but that of analogue Hex3HexNAc5 (with an additional HexNAc) increased by 46.9% in COVID-19 ( Fig. 3B). On the same glycosite, the frequency of glycan Hex3HexNAc3 was comparable between two groups but the analogue Hex3HexNAc4 (with an additional HexNAc) was only detected in COVID-19. On the glycosite N205 of IGHA2, the relative frequency of glycans increased as the number of HexNAc increased. These results suggest the N-glycosylation alterations of gut IGHA2 are characterized by the conjugation of more complex glycans through the attachment of more HexNAc. The glycan speci c alteration was also observed in JCHAIN (N71), where glycan Hex3HexNAc3Fuc, with an additional Fuc compare to its counterpart, exhibited higher frequency in COVID-19. On the other hand, the frequency of the same glycan on different sites can be quite different. For instance, while Hex3HexNAc2Fuc was only detected in COVID-19 on N1063 of FCGBP, the frequency of this glycan was decreased in COVID-19 on N1317. Taken together, the overall N-glycosylation of IGHA2, FCGBP, and JCHAIN was suppressed with glycan-speci c and site-speci c variations.
We also extended our analysis to O-glycosylation and performed intensity based label-free quanti cation (Tables S4.4 and S4.5). Similar to N-glycoproteome, O-glycoproteome also revealed increased glycosylation of proteases and reduced glycosylation of IGHA2, FCGBP, ANPEP, and GP2. As shown in Fig. 3C Reduction of bene cial gut bacteria and potential hostbacteria interactions We used the metaproteomics approach, which is more accurate than sequencing methods for biomass estimates, to investigate microbial community structure and activity [21]. The relative abundance of 34 bacterial taxa were signi cantly changed between healthy subjects and patients with COVID-19, most of which were from the Firmicutes phylum (20 out of 34, 58.8%) followed by the Bacteroidetes phylum (10 out of 34, 29.4%) ( Table 5). Strikingly, the relative abundance of all 20 altered members in the Firmicutes phylum signi cantly decreased in COVID-19 (p < 0.05), the majority of which were butyrate-producers [22] belonging to the Lachnospiraceae family, such as genera Lachnoclostridium, Ruminococcus, Butyrivibrio, and Dorea, and species Blautia hansenii, Ruminococcus lactaris, and Tyzzerella nexilis (Fig. 4A). There was also a signi cant depletion of butyrate-producing genus Eubacterium in COVID-19, which also carry out bile acid and cholesterol transformations in the gut, contributing to gut and hepatic homeostasis through modulation of bile acid metabolism [23]. In addition, a recent study has found that several species of the phylum Firmicutes (such as genera Clostridium, Ruminococcus, and Eubacterium) were positively associated with memory scores, while species from the phylum Bacteroidetes mainly presented negative associations with memory scores [24]. Taken together, these data suggest a signi cant reduction of bene cial gut bacteria in COVID-19.
The relative abundance of all altered members in the Bacteroidetes phylum signi cantly increased in COVID-19 (p < 0.05), such as Bacteroides coprophilus, Bacteroides coprocola, Bacteroides graminisolvens, Bacteroides uniformis, and Bacteroides stercoris (Fig. 4A). Importantly, it has been shown Bacteroidetes and Firmicutes bacteria mainly down-regulate and up-regulate ACE2 expression in the murine gut, respectively [25]. Therefore, the enrichment of Bacteroidetes and the reduction of Firmicutes may potentially inhibit SARS-CoV-2 entry by down-regulating intestinal ACE2 expression.
Association analysis of altered host proteins and bacteria revealed potential host-microbiome interactions. Overall, bacteria groups increased in COVID-19 including B. coprophilus and B. coprocola exhibited negative correlations with host proteins, while those increased in COVID-19 such as Ruminococcus and Fusobacteria exhibited positive correlations (Fig. 4B). An exception was CEACAM6, which was positively associated with B. coprophilus. A recent study has shown CEACAM6 is critical for pathogen enterotoxigenic Escherichia coli adhesion [26]. The reduction of host proteins such as IGHV3-73 and IGHV3-64D may potentially contribute to the enrichment of Bacteroidetes phylum because of the reduced anti-bacteria Igs.

Enrichment of bacterial related deleterious metabolites
Using untargeted metabolomics, we identi ed 96 fecal metabolites signi cantly differed between control subjects and COVID-19 patients, mainly including nucleosides, nucleotides, bile acids, carboxylic acids, dipeptides, tripeptides, and acylated amino acids (Table S6). Notably, we detected an enrichment of several gut microbiome-related deleterious metabolites in COVID-19 (Figs. 5B and S1), including phenylacetyl glutamine (q = 0.01), which promotes cardiovascular disease such as platelet thrombosis [27], and salsolinol (q = 0.003), which is a potential gut bacterial neurotoxin contributing to the development of neurodegenerative diseases [28,29]. The reduction of Firmicutes phylum (such as class Clostridia, order Clostridiales, and genus Dorea) may be at least partially responsible for the increment of phenylacetyl glutamine because they were correlated inversely with each other (Figs. 5C). Longitudinal analysis indicated that the phenylacetyl glutamine level was sustained at high levels in sever patient 5 and in patient 11 who exhibited signi cant GI symptoms two months after symptom onset. In contrast, this metabolite was kept at a steady and normal level in patient 12 throughout the course of disease and restored to a normal level in patient 13 after one month following symptom onset (Fig. 5B).
We also observed elevated levels of uric acid (q = 0.002) in COVID-19, a uremic toxin playing an important role in several kidney diseases such as lithiasis, gout nephropathy, and preeclampsia. One third of endogenous uric acid is extrarenally excreted via the gut lumen, where it undergoes uricolysis by gut microbiota [30,31]. Increased fecal uric acid was positively associated with several Bacteroides species (Figs. 5C). Interestingly, although all COVID-19 patients involved in this study were non-drinkers, a signi cantly higher abundance of ethyl glucuronide in COVID-19 (q = 0.01), a metabolite of ethanol formed by glucuronidation, was observed in the COVID-19 group, which indicates a higher susceptibility of ethanol toxicity. Recent studies have demonstrated that certain gut bacteria (such as Klebsiella pneumoniae) contribute to endogenous ethanol production and promote the development of nonalcoholic fatty liver disease [32][33][34]. Furthermore, gut microbial (such as E. coli and Clostridum sordellii) β-glucuronidases could hydrolyze ethyl glucuronide, which may increase the retention of ethanol in the body by enterohepatic circulation [35]. For patient 11, both ethyl glucuronide and uric acid climbed sharply on day 35 of disease onset, when the discriminative proteins IGHV3-64D, CELA3A, ALPI, and PLA2G2A also dramatically increased (Fig. 5B).
In addition to chain length, the degree of unsaturation also in uenced the behavior of certain lipid species. Highly unsaturated Cer lipids with trihydroxy bases carrying 6 or 7 double bonds (Cer(t40:7) and Cer(t40:6)) were downregulated in COVID-19, while those upregulated Cer lipids have no more than 3 double bonds (Fig. 6B). Similarly, many highly unsaturated DG lipids carrying 5-8 double bonds, such as DG (

Increased lipid peroxidation and disturbed redox homeostasis in host and microbiome
We also observed proteomics level evidence of altered lipid features. The increased frequency of protein modi cation by reactive lipid peroxidation products 4-hydroxynonenal (HNE) and 4-oxononenal (ONE) suggests oxidative stress in COVID-19 (Fig. 6E). Indeed, human superoxide dismutase (SOD1), the major antioxidant enzyme for superoxide removal and the rst line of defense against oxidative stress, was signi cantly downregulated in COVID-19 ( Fig. 2A). Meanwhile, NADH peroxidase, which reduces peroxides, of several bacterial species and genera belonging to order Clostridiales were also down regulated (Fig. 4D). These results indicate a redox homeostasis disruption for both host and gut bacteria.
Disscussion GI tract is susceptible to SARS-COV-2 infection due to the high expression of ACE2 receptor. GI symptoms are frequently observed in patients with COVID-19. Gut's immune responses to SARS-CoV-2 necessitate greater attention because they can alter the commensal microbiome and the crosstalk between microbiota and extra-intestinal organ immunity. However, little is known about the importance of the enteric SARS-CoV-2 for the development of COVID-19-associated pathologies. Increasing evidence has shown that COVID-19 can promote cardiovascular disorders such as myocardial injury, acute coronary syndrome, and thromboembolism [42], neurologic symptoms such as myalgias, encephalopathy, and dizziness [43], and kidney manifestations such as proteinuria and dipstick hematuria [44]. Our study revealed an enrichment of gut bacteria related deleterious metabolites including phenylacetylglutamine (capable of causing cardiovascular diseases), neurotoxin salsolinol, and uremic toxin uric acid. In addition to metabolites, we observed a larger number of altered host and bacterial lipids (predominated by sphingolipids such as ceramide and hexosylceramide). Sphingolipids produced by gut bacteria can enter host metabolic pathways and impact host ceramide level [40]. Our study may provide an alternative microbiome-based molecular mechanism to explain how the gut ecosystem may play a role in the development of symptoms in COVID-19 and impact the host metabolome and lipidome.
The anti-viral response may impose an immunological off-target effects on gut microbiome in COVID-19 patients. Indeed, we observed disturbed mucosal immunological defense. The reduction of host Igs such as IGHV3-73 and IGHV3-64D may potentially contribute to the enrichment of Bacteroidetes phylum because of the reduced anti-bacteria Igs. The suppressed expression of proteins involved in neutrophil degranulation and migration can also impair the gut anti-bacteria defense system. Furthermore, there is an increased risk of colonic mucosal damage and therefore greater risk of viral and bacterial infection in COVID-19 because of the increased intestinal protease and glycosylation (indicating potential higher activity) and suppressed mucin glycosylation (important for mucin protection function). A major limitation of our study of is the limited sample size and further larger scale studies are needed.
Nevertheless, our study has demonstrated widely disturbed gut molecular pro les which may play a role in the development of symptoms in COVID-19. Considering the gut ecosystem as a potential target could offer a valuable approach in managing the disease.

Conclusions
Using metaproteomics, metabolomics, glycoproteomics, and lipidomics, our study has demonstrated widely disturbed gut molecular pro les and microbial structure in COVID-19 characterized by disturbed immune, proteolysis and redox homeostasis. Our ndings suggest that considering the gut ecosystem as a potential target could offer a valuable approach in managing the disease.

Subject details
This study involved 13 patients with COVID-19 and 21 healthy controls. The associated clinical data and metadata are provided in Table S1 and Fig. 1A. SARS-CoV-2 infection was con rmed by two consecutive real-time reverse transcription PCR (RT-PCR) tests. These patients were classi ed into three groups according to the severity of their symptoms: mild (mild clinical symptoms without pneumonia manifestations in CT imaging (7 patients)); moderate (respiratory symptoms, fever, and imaging features of COVID-19 pneumonia (5 patients); severe (respiratory distress (respiratory rate ≥ 30 breaths/min), oxygen saturation ≤ 93% and arterial oxygen tension (PaO 2 )/fractional inspired oxygen (FiO 2 ) ratio ≤ 300 mm Hg (1 patient)). We collected 53 stool samples from COVID-19 patients with a range of one to nine longitudinal time-points that occurred 1-94 days post symptom onset (Fig. 1A, Table S1). Stool samples from 21 healthy subjects severed as controls. A total of 73 stool samples were subjected to multi-omics analysis.

Glycopeptide enrichment
Glycopeptides were enriched using hydrophilic interaction liquid chromatography (HILIC) cartridges packed with the C18 plug followed by microcrystalline cellulose resins [45].

Glycoproteomics data acquisition
Peptides were trapped onto an Acclaim PepMap 100 C18 column (75 µm × 20 mm, 3 µm, 100 Å, Thermo Scienti c) at a ow rate of 8 µL/min and separated using an Acclaim PepMap 100 C18 column (75 µm × 250 mm, 2 µm, 100 Å, Thermo Scienti c) at 300 nL/min. Mobile phase solvents were 0.1% formic acid in water (A) and 0.1% formic acid in 50% acetonitrile and 40% isopropanol (B). The separation gradient was as follows: 5% B at 0-15 min, 20%-30% B at 90-100 min, and 98% B at 107 min and kept for 20 min. Data were acquired on an Orbitrap Fusion Lumos Tribrid mass spectrometer with a Nanospray Flex ion source in positive ionization mode with a spray voltage of + 2600 V using Xcalibur software (Thermo Scienti c, San Jose, CA, USA). The ion transfer tube temperature was 300°C, the vaporized temperature was 325°C, the sheath gas ow was 40 units, the auxiliary gas ow was 15 arbitrary units, and the sweep gas was 1 unit. Full scan MS spectra was acquired in the 400 − 1,600 m/z range with a maximum injection time of 50 ms and a resolution of 60K at m/z 200. MS/MS spectra were acquired using HCD with stepped NCE at 20%, 30%, and 40% to generate fragment ions of both glycan and peptide of a glycopeptide in a single spectrum MS/MS spectra. The resolution of HCD was 15K with a maximum ion injection time of 22 ms.

Metaproteomics data analysis
Peptide identi cations were performed using the search engine PEAKS DB combined with PEAKS de novo sequencing [48] (De Novo ALC(%) threshold was 15). False discovery rate (FDR) was set to 1% using the decoy fusion approach. Raw les were re ned by precursor ion mass correction and resolving chimeric MS/MS spectra. The precursor mass tolerance was set to 15 ppm and the fragment mass tolerance to 0.03 Da. Enzyme speci city was set to trypsin and up to three missed cleavage sites were allowed. The maximum number of variable posttranslational modi cations per peptide was three, including acetylation of protein N-terminus, carbamidomethylation of Cys, oxidation of Met, deamidation of Asn and Gln as well as Pyro-glu from Gln. PEAKS PTM search tool [49] was used to search for peptides with unspeci ed modi cations (313 built-in post-translational modi cations), and the SPIDER [50] search tool was used for exploring novel peptides that are homologous to peptides in the protein database.
Database search was performed using a comprehensive meta-database containing human, microbial, and dietary organism sequences [51]. The gut microbial protein database was generated by combining the following parts: (1) the integrated gene catalog of 1,267 human fecal metagenomes [52]; (2) the 1,520 reference genomes of > 6,000 cultivated human fecal bacteria isolates [53]; (3)  Glycoproteomics data analysis High-con dence identi cation of intact N-glycopeptides was performed by pGlyco 2.0 [20]. Sequences of proteins identi ed in the above metaproteomics analysis as well as the SARS-COV-2 protein sequences were used in glycopeptide identi cation. The precursor mass tolerance was set to 10 ppm and the fragment mass tolerance to 20 ppm. For N-glycopeptide analysis, a FDR of 5% at the glycan level and a FDR of 1% at the peptide level was used. Match between run and intensity based label-free quanti cation was not supported by pGlyco. To calculate the frequency of N-glycosylation at a speci c site, any sample containing a glycopeptide bearing an N-glycan at that site was considered positive regard less of the variations of glycopeptide sequences and glycans. To calculate the frequency of a speci c N-glycan at a speci c site, any sample containing a glycopeptide bearing that glycan at that site was considered positive regard less of glycopeptide sequence variations. The frequency was de ned as the number of positive samples relative to the total number of samples in each group (18 for the control and 49 for the COVID-19 groups, respectively).

Metabolomics data analysis
Metabolomics features were extracted, aligned, identi ed and quanti ed using Compound Discoverer

Statistical analysis
The raw quanti cation data matrix of different omics was imported to MetaboAnalyst [58] for further processing and analysis. Data ltering was performed using interquantile range (IQR) to remove baseline noises. Missing values were imputed using KNN. Quantile normalization and pareto scaling were employed. Unsupervised multivariate data analysis was performed using principal component analysis (PCA) and hierarchical cluster analysis (HCA). Signi cantly differentiated omics features between COVID-19 and control groups (should present in at least 50% of samples) were detected using Wilcoxon's rank sum test (FDR adjusted p value (q) < 0.05). Microbial taxonomic and functional groups were normalized by total abundance. Statistical signi cance of microbial taxonomic groups was calculated using Mann-Whitney U test (p value < 0.05). Protein-microbiome and metabolite-microbiome correlations were determined using Pearson correlation (q < 0.05) using R. The association between differentiating omics features with categorical confounding variables (gender and medicine) were determined using Wilcoxon's rank sum test. The association between differentiating omics features with continuous confounding variables (age and Body Mass Index (BMI)) were determined using Pearson correlation.

Declarations
Ethics approval and consent to participate This study was approved by the Ethics Committee of The Fifth A liated Hospital, Sun Yat-sen University (K161-1).