Patients Enrollment and Sample Preparation
Blood samples for 231 COVID-19 patients without any comorbidities were collected from Tongji Hospital and Union Hospital of Huazhong University of Science and Technology, Xiangyang Central Hospital, Hubei University of Arts and Science and Hubei Dazhong Hospital of Chinese Traditional Medicine from 19th February, 2020 to 26th April, 2020. Flowchart of patient selection for this study were shown in Extended Data Fig. 1. The demographic data and laboratory indicators were shown in Supplementary Tables 1-2. The mean age of the patients was 46.7 years old (Standard Deviation=13.5), and the ratio of male to female was 1.12:1. All these patients were diagnosed following the guidelines for COVID-19 diagnosis and treatment (Trial Version 7) released by the National Health Commission of the People’s Republic of China. The patients were classified into four groups according to their disease severity: critical, severe, mild, and asymptomatic. The critical disease was defined as fulfilling at least one of the following conditions: (1) acute respiratory distress syndrome (ARDS) requiring mechanical ventilation, (2) shock, (3) combining with other organ failure requiring ICU admission. Severe disease met at least one of the following conditions: (1) respiratory rate ≥ 30 times/min, (2) oxygen saturation ≤93% at resting state, (3) arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤300 mmHg, (4) pulmonary imaging examination showed that the lesions significantly progressed by more than 50% within 24-48 hours. Mild patients were defined as having fever, respiratory symptoms, lung imaging evidence of pneumonia. The patients with normal body temperature, without any respiratory symptoms were defined as asymptomatic. The definition of each severity was consistent with the previous article 76. All Ethylenediaminetetraacetic acid disodium salt (EDTA-2Na)-anticoagulated venous blood samples were separated by centrifuge at 3,000 rpm, room temperature for 7min after standard diagnostic tests, the whole blood cells were stored at -80°C, 200 μL aliquot of serum were added 800μL ice-cold methanol, mixed well and stored at -80°C, another 200 μL aliquot of serum were added 800μL ice-cold isopropanol, mixed well and stored at -80°C.
Nucleic Acid Extraction
A 200 μL aliquot of each thawed whole blood cells was used to extract DNA using QIAamp DNA Blood Mini Kit (51304, Qiagen), following the manufacturer’s instructions. Total RNA was extracted from another 200 μL aliquot of blood cells using QIAGEN miRNeasy Mini Kit (217004，Qiagen) according to the manufacturer’s protocol. All the extraction was performed under Level III protection in the biosafety III laboratory.
Sequencing Library Construction and Data Generation
The whole genome data was generated through the following steps: 1) DNA was randomly fragmented by Covaris. The fragmented genomic DNA were selected by Magnetic beads to an average size of 200-400bp. 2) Fragments were end repaired and then 3’ adenylated. Adaptors were ligated to the ends of these 3’ adenylated fragments. 3) PCR and Circularization. 4) After library construction and sample quality control, whole genome sequencing was conducted on MGI2000 PE100 platform with 100bp paired end reads.
Transcriptome RNA data was generated through the following steps: 1) rRNA was removed by using RNase H method, 2) QAIseq FastSelect RNA Removal Kit was used to remove the Globin RNA, 3) The purified fragmented cDNA was combined with End Repair Mix, then add A-Tailing Mix, mix well by pipetting, incubation, 4) PCR amplification, 5) Library quality control and pooling cyclization, 6) The RNA library was sequenced by MGI2000 PE100 platform with 100bp paired-end reads.
Small RNA data was generated through the following steps: 1) Small RNA enrichment and purification, 2) Adaptor ligation and Unique molecular identifiers (UMI) labeled Primer addition, 3) RT-PCR, Library quantitation and pooling cyclization, 4) Library quality control, 5) Small RNAs were sequenced by BGI500 platform with 50bp single-end reads resulting in at least 20M reads for each sample.
We detected cytokines including IL-6, IL-8, IL-10, IL-2R in serum samples of patients. Assays were conducted by using an automated analyzer (Cobas e602, Roche Diagnostics, Germany or Immulite 1000, DiaSorin Liaison, Italy) as described in the manufacturer’s instructions. IL-6 kit (#05109442190) was obtained from Roche Diagnostics (Mannheim, Germany). IL-8 kit (#LK8P1), IL-10 kit (#LKXP1), IL-2R kit (#LKIP1) were obtained from DiaSorin (Vercelli, Italy).
WGS data analysis and joint variant calling
Whole genome sequencing data was processed using the Sentieon Genomics software (version: sentieon-genomics-201911) 77. Pipeline was built according to the best practice’s workflows for germline short variant discovery described in https://gatk.broadinstitute.org/. Sequencing reads were mapped to hg38 reference genome using BWA algorithm 78. After duplicates marking, InDel realignment and base quality score recalibration (BQSR), per-sample variants were called using the Haplotyper algorithm in the GVCF mode. Then the GVCFtyper algorithm was used to perform joint-calling and generate cohort VCF. Variant Quality Score Recalibration was performed using Genome Analysis Toolkit (GATK version 4.1.2) 79. The truth-sensitivity-filter-level were set as 99.0 for both the SNPs and the Indels. Finally, variants with PASS flag and quality score ≥ 100 were selected for further analysis.
Genotype-Phenotype Association Analysis
PCA was performed using PLINK (v1.9) 80. Bi-allelic SNPs were selected based on the following criteria: minor allele frequency (MAF) ≥ 5%; genotyping rate ≥ 90%; LD prune (window = 50, step = 5 and r2 ≥ 0.5). A subset of 605,867 SNPs was used to perform PCA on the 203 unrelated individuals. We used rvtest 81 to perform genotype-phenotype association analysis for 5,082,104 bi-allelic common SNPs with MAF > 5%. Gender, age and top 10 principal components were used as covariates for all the association tests. The qqman 82 and CMplot R packages 83 were applied to generate the Manhattan plot and quantile-quantile plot. We defined genome-wide significance for single variant association test as 5e-8, suggestive significance as 1e-6.
We obtained matched proteomics, lipidomics, metabolomics, gene expression and SNP genotyping data for COVID-19 patients (n = 132). For the genotyping data, we removed outlier SNPs with MAF < 0.05. The QTL analysis (cis-eQTL analysis [local, distance < 10kb] for gene expression data, QTL analysis for proteomics, lipidomics, metabolomics data) was conducted using linear regression as implemented in MatrixEQTL 84. In this analysis, age and gender (1 for male and 2 for female) were considered as covariates. Associations with a p value less than 0.001 were kept, followed by FDR estimation using the Benjamini-Hochberg procedure as implemented in Matrix-QTL. QTL associations with an FDR-corrected p value < 5e-8 were considered significant 85.
Gene Expression Analysis
RNA-seq raw sequencing reads were filtered by SOAPnuke 86 to remove reads with sequencing adapter, with low-quality base ratio (base quality < 5) > 20%, and with unknown base ('N' base) ratio > 5%. Reads aligned to rRNA by Bowtie2 (v2.2.5) 87 were removed. Then, the clean reads were mapped to the reference genome using HISAT2 88. Bowtie2 (v2.2.5) was applied to align the clean reads to the transcriptome. Then the gene expression level (FPKM) was determined by RSEM 89. Genes with FPKM > 0.1 in at least one sample were retained. Differential expression analysis was performed using DESeq2 (v1.4.5) with gender and age as confounders. Differential expressed genes were defined as those with Benjamini Hochberg adjusted p value < 0.05 and fold change > 2. GO enrichment analysis was performed using clusterProfiler 90. GO BP terms with an FDR adjusted p value threshold of 0.05 were considered as significant 91.
Small RNA raw sequencing reads with low quality tags (which have more than four bases whose quality is less than ten, or have more than six bases with a quality less than thirteen.), the reads with poly A tags, and the tags without 3' primer or tags shorter than 18nt were removed. After data filtering, the clean reads were mapped to the reference genome and other sRNA database including miRbase, siRNA, piRNA and snoRNA using Bowtie2 87. Particularly, cmsearch 92 was performed for Rfam mapping. The small RNA expression level was calculated by counting absolute numbers of molecules using unique molecular identifiers (UMI, 8-10nt). MiRNA with UMI count lager than 1 in at least one sample were considered as expressed. Differential expression analysis was performed using DESeq2 (v1.4.5) 93 with gender and age as confounders to control for the additional variation and the detection cutoff was set as adjusted P < 0.05 and log2 of fold change ≥ 1.
Construction of mRNA-miRNA and mRNA-lncRNA Network
To investigate the post-transcriptional regulation, spearman correlation coefficients of mRNA-miRNA (Supplementary Table 7) and mRNA-lncRNA were calculated (Supplementary Table 8). Correlation pairs with coefficients < -0.5 in mRNA-miRNA or < -0.6 in mRNA-lncRNA were retained. MultiMiR was used to confirm the top pairs of mRNA-miRNA by performing miRNA target prediction 94. The mRNA-miRNA and mRNA-lncRNA networks were visualized using Cytoscape (Fig. 2d) 95.
The sera samples were inactivated at 56°C water bath for 30min and followed by processing with the Cleanert PEP 96-well plate (Agela, China). According to the manufacturer’s instructions, high-abundance proteins under a denaturing condition were removed 96. The Bradford protein assay kit (Bio-Rad, USA) was used to determine the final protein concentration. The proteins were extracted by the 8M urea and subsequently reduced by a final concentration of 10mM Dithiothreitol at 37°C water bath for 30min and alkylated to a final concentration of 55mM iodoacetamide at room temperature for 30min in the darkroom. The extracted proteins were digested by trypsin (Promega, USA) in 10 KD FASP filter (Sartorious, U.K.) with a protein-to-enzyme ratio of 50:1 and eluded with 70% acetonitrile (ACN), dried in the freeze dryer.
DIA (Data Independent Acquisition) strategy was performed by Q Exactive HF mass spectrometer (Thermo Scientific, San Jose, USA) coupled with an UltiMate 3000 UHPLC liquid chromatography (Thermo Scientific, San Jose, USA). The 1μg peptides mixed with iRT (Biognosys, Schlieren, Switzerland) were injected into the liquid chromatography (LC) and enriched and desalted in trap column. Then peptides were separated by self-packed analytical column (150μm internal diameter, 1.8μm particle size, 35cm column length) at the flowrate of 500 nL/min. The mobile phases consisted of (A) H2O/ACN (98/2,v/v) (0.1% formic acid); and (B) ACN/H2O (98/2,v/v) (0.1% formic acid) with 120 min elution gradient (min, %B): 0, 5; 5, 5; 45, 25; 50, 35; 52, 80; 55, 80; 55.5, 5; 65, 5. For HF settings, the ion source voltage was 1.9kV; MS1 range was 400-1250m/z at the resolution of 120,000 with the 50 ms max injection time(MIT). 400-1250 m/z was equally divided into 45 continuous windows MS2 scans at 30,000 resolution with the automatic MIT and automatic gain control (AGC) of 1E6. MS2 normalized collision energy was distributed to 22.5, 25, 27.5.
The raw data was analyzed by Spectronaut software (12.0.20491.14.21367) with the default settings against the self-built plasma spectral library which achieved deeper proteome quantification. The FDR cutoff for both peptide and protein level were set as 1%. Next, the R package MSstats 97 finished log2 transformation, normalization, and p-value calculation.
The 100μl sera of each sample were transferred into the 96-well plate and mixed with 10μl SPLASH LipidoMixTM Internal Standard (Avanti Polar Lipids, USA) and 10μl home-made Internal Standard mixture containing D3-L-Methionine (100 ppm, TRC, Canada), 13C9-Phenylalanine (100ppm, CIL, USA), D6-L-2-Aminobutyric Acid(100ppm, TRC, Canada), D4-L-Alanine (100ppm, TRC, Canada), 13C4-L-Threonine (100ppm, CIL, USA), D3-L-Aspartic Acid (100ppm, TRC, Canada), and 13C6-L-Arginine (100ppm, CIL, USA). The 300μl pre-chilled extraction buffer of methanol/ACN (67/33, v/v) was added to the plasma sample then vortexed for 1 min and incubated at -20°C for 2 hours. After centrifugation at 4000 RPM for 20 min, 300ul supernatants were taken and dried in the freeze dryer. The metabolites were dissolved in 150μl buffer of methanol/ACN (50/50, v/v) and centrifuged at 4000 RPM for 30min. Supernatants were injected into mass spectrometer.
Metabolomics data acquisition was completed using a same spectrometer, LC, and settings were set as lipidomics except for following parameters: the mobile phases of positive mode were (A) H2O (0.1% formic acid) and (B) methanol (0.1% formic acid). The mobile phases of negative mode were (A) H2O (10mM NH4HCO2) and (B) methanol /H2O (95/5, v/v) (10 mM NH4HCO2). Both positive and negative models used the same gradient (min, %B): 0, 2; 1, 2; 9, 98; 12, 98; 12.1, 2; 15, 2. The temperature of column was set at 45°C. MS1 range set as 70-1050m/z. MS2 stepped normalized collision energy was distributed to 20, 40, 60.
The raw data was searched by Compound Discoverer 3.1 software (Thermo Fisher Scientific, USA) with different libraries including our self-built BGI library containing more than 3000 metabolites with corresponding detailed mass spectrum data. After quantification, subsequent processing steps were finished by metaX as same as lipidomics analysis.
The 100 μl sera of each sample was transferred into the 96-well plate and mixed with 10 μl SPLASH LipidoMixTM Internal Standard (Avanti Polar Lipids, USA). The 300μl pre-chilled Isopropanol (IPA) was added to the plasma sample and vortex for 1 min and incubated at -20°C overnight. Then samples were centrifuged at 4000 RPM for 20min while proteins precipitated. The supernatants were used for MS analysis.
Lipidomics analysis was performed using Q Exactive mass spectrometer (Thermo Scientific, San Jose, USA) coupled with Waters 2D UPLC (waters, USA). The CSH C18 column (1.7μm 2.1*100mm, Waters, USA) was used for separation with following elution gradient (min, %B) consisted of (A) ACN/H2O (60/40, v/v) (10 mM NH4HCO2 and 0.1% formic acid) and (B) IPA/ACN (90/10, v/v) (10 mM NH4HCO2 and 0.1% formic acid): 0, 40; 2, 43; 2.1, 50; 7, 54; 7.1, 70; 13, 99; 13.1, 40; 15, 40. The temperature of column was set as 55°C, the injection value was set as 5μL, and the flowrate was set as 0.35mL/min. For HF settings, the samples were scanned twice in both positive and negative modes. The positive spray voltage was set as 3.80 kV and negative spray voltage was set as 3.20 kV. MS1 range was 200-2000m/z at the resolution of 70,000 with the 100ms MIT and AGC of 3e6. The top3 precursors were set as trigger MS2 scans at the resolution of 17,500 with the 50ms MIT and AGC of 1E5. MS2 stepped normalized collision energy was distributed to 15, 30, 45. The sheath gas flow rate was set as 40 and the aux gas flow rate was set as 10.
The raw data was analyzed by Lipidsearch software Version 4.1 (Thermo Fisher Scientific, USA) which finished feature detection, identification and alignment. The following settings were applied: tolerance of mass shift, 5ppm; identification grade, A-D; filters, top rank; all isomer peak, FA priority, M-score, 5; c-score, 2.0; The export quantitative data from Lipidsearch was analyzed by R package metaX 98 which finished the normalization, correction of batch effect, and imputation of missing value.
For each patient in the cohort, we computed intensity for a given lipid complex class by summing up intensity of each lipid in the class. For each lipid complex class, the intensity value of each patient was further scaled by median value of intensity from mild patient group. We applied Mann-Whitney U-test (multiple comparisons correction with Bonferroni) to test statistically significant difference of scaled intensity of each lipid complex class between severity groups.
Differential Expression of Proteins, Metabolites and Lipids
Expression data was first adjusted using robust linear model (RLM) for gender and age. The residuals following RLM were analyzed by Two-sided Mann-Whitney rank test for each pair of comparing group and p values were adjusted using Benjamini & Hochberg. Differentially expressed proteins, metabolites or lipids were defined using the criteria of adjusted P value < 0.05 and absolute value of fold change > 1.5.
Clustering was performed using the R package ‘Mfuzz’ after log2-transformation and Z-score scaling of the data. For mRNA from whole blood, genes differentially expressed in at least three out of the six comparison groups were clustered. For proteins, metabolites, lipids from sera, all the three analytes were clustered together.
To annotate the proteins and metabolites in 7 clusters, gene ontology (GO) enrichment analysis were performed to obtain the enriched GO Biological Process terms of proteins in different clusters by clusterProfiler 90. And the 7 lists of metabolites in KEGG ID were classified into pathways by the Kyoto encyclopedia of genes and genomes (KEGG) database. The KEGG annotation was finished using in-house software.
Correlation Network Analysis
Pairwise Spearman’s rank correlations were calculated using the r package ‘Hmisc’ and weighted, undirected networks were plotted with Cytoscape. Correlations with Bonferroni adjusted P values < 0.05 and absolute correlation coefficient >0.4 (Supplementary Table 8) were included and displayed via the Fruchterman-Reingold method. Nodes color indicate analytes type and their size represent the degree of the node.
Construction of gene regulatory network
The ARACNe-AP 99 was employed to construct the gene regulatory networks (GRNs) for each group. The variability of genes expression traits was evaluated by Median Absolute Deviation (MAD), and the top half of genes were recruited in the network. Mutual information 100 was introduced to represent the strength of the regulatory relationship between TFs and target genes, and only significant pairs are kept (P<1×10-8).We also executed 100 bootstraps and applied a Data Processing Inequality tolerance filter 101. The consensus network of each group was combined by statistically significant edges across all bootstrap networks (p<0.05, Bonferroni corrected), based on Poisson distribution. The degree was used to evaluate the centrality of genes in the network. To ensure the robustness of our remodeled GRN, we applied Chip-X Enrichment Analysis Version 3(ChEA3) 102 to identify TFs that target to IFN and IFN receptors, and those unrecognized were eliminated.
Quantification of cell fractions from bulk RNAseq profiles
The estimation of abundances of immune cell types in blood tissue was performed using CIBERSORTx 103 based on blood RNAseq data.
Protein interaction network construction and function enrichment analysis
Interaction network construction and biological process GO term enrichment for protein lists were conducted using STRING 104 database with default parameters.