Subjects
Enrolled subjects were 18 years or older, admitted to the intensive care units (ICUs) at NYU Langone Health from March 10th to May 10th, 2020 with a nasal swab confirmed diagnosis of SARS-CoV-2 infection by reverse transcriptase polymerase chain reaction (RT-PCR) assay and respiratory failure requiring invasive mechanical ventilation. Samples were obtained during clinically indicated bronchoscopy performed for airway clearance or for percutaneous tracheostomy placement. Surviving subjects signed informed consent to participate in this study. Samples and metadata from subjects who died or were incapacitated were de-identified and included in this study. Comprehensive demographic and clinical data were collected. We also collected longitudinal data on clinical laboratory culture results and treatment. Supplementary figure 1 shows the distribution of subjects and sampling strategy used for this study. The study protocol was approved by the Institutional Review Board of New York University.
Lower airway bronchoscopic sampling procedure
Both background and supraglottic (buccal) samples were obtained prior to the procedure, as previously described64. The background samples were obtained by passing sterile saline through the suctioning channel of the bronchoscope prior to the procedure. Bronchoalveolar lavage (BAL) samples were obtained from one lung segment as per discretion of the treating physician as clinically indicated. Samples were then transferred to a BSL3 laboratory for processing. Once there, 2 mL of whole BAL was stored in a tube prefilled with 2 mL of Zymo Research’s DNA/RNA Shield™ (R1100-250, https://www.zymoresearch.com/pages/covid-19-efforts) for RNA/DNA preservation and virus inactivation. In addition, background control samples (saline passed through the bronchoscope prior to bronchoscopy) and supraglottic aspirates were stored in the same RNA/DNA shield. A subset of samples underwent BAL cell separation by centrifugation and cells were cryopreserved in DMSO while acellular BAL fluid was aliquoted for cytokine measurements. A paired blood sample was also obtained in EDTA tubes (Becton Dickinson, ref# 366450) and PAXgene Blood RNA tubes (PreAnalytiX) ref# 762165).
Viral load detection targeting the N gene
SARS-CoV-2 viral load was measured by quantitative real-time reverse transcription polymerase chain reaction (rRT-PCR) targeting the SARS-CoV-2 nucleocapsid (N) gene and an additional primer/probe set to detect the human RNase P gene (RP). Assays were performed using Thermo Fisher Scientific (Waltham, MA) TaqPath 1-Step RT-qPCR Master Mix, CG (catalog number A15299) on the Applied Biosystems (Foster City, CA) 7500 Fast Dx RealTime PCR Instrument. Using the positive controls provided by the CDC, which are normalized to 1000 copies/mL, we converted the different Ct positive to copies/mL. This was done using the DDCT method, applying the formula: Power [2, (CT (sample, N1 gene) - CT (PC, N1 gene)] – [CT (sample, RP gene) - CT (PC, RP gene)]*1000.
SARS-CoV-2 viral viability through measurement of subgenomic transcripts
Viral subgenomic mRNA (sgRNA) is transcribed in infected cells and is not packaged into virions. Thus, presence of sgRNA is indicative of active infection of a mammalian cell in samples. We therefore measure sgRNA in all BAL samples obtained targeting the E gene as previously described.21,22 Briefly, five µl RNA was used in a one-step real-time RT-PCR assay to sgRNA (forward primer 5’- CGATCTCTTGTAGATCTGTTCTC-3'; reverse primer 5’- ATATTGCAGCAGTACGCACACA-3'; probe 5’-FAM-ACACTAGCCATCCTTACTGCGCTTCG-ZEN-IBHQ-3') and using the Quantifast Probe RT-PCR kit (Qiagen) according to instructions of the manufacturer. In each run, standard dilutions of counted RNA standards were run in parallel to calculate copy numbers in the samples.
DNA/RNA isolation, library preparation and sequencing
DNA and RNA were isolated in parallel using zymoBIOMICS™ DNA/RNA Miniprep Kit (Cat: R2002) as per manufacturer's instructions. DNA was then used for whole genome shotgun (WGS) sequencing using it as input into the NexteraXT library preparation kit following the manufacturer’s protocol. Libraries were purified using the Agencourt AMPure XP beads (Beckman Coulter, Inc.) to remove fragments below 200 bp. The purified libraries were quantified using the Qubit dsDNA High Sensitivity Assay kit (Invitrogen) and the average fragment length for each library was determined using a High Sensitivity D1000 ScreenTape Assay (Agilent). Samples were added in an equimolar manner to form two sequencing pools. The sequencing pools were quantified using the KAPA Library Quantification Kit for Illumina platforms. The pools were then sequenced on the Illumina Novaseq 6000 in one single run. For RNA sequencing, RNA quantity and integrity were tested with a BioAnalyzer 2100 (Agilent). Among bronchoscope control (BKG) samples, only 5 yielded RNA with sufficient quality and quantity to undergo library preparation and sequencing. The automated Nugen Ovation Trio Low Input RNA method was used for library prep with 3ng total RNA input of each sample. After 6 amplification cycles, samples were sequenced using 2x Novaseq 6000 S4 200 cycle Flowcells using PE100 sequencing.
Microbial community characterization using whole genome shotgun sequencing (WGS) and RNA metatranscriptome
For all metagenomic and metatranscriptomic reads, Trimmomatic v0.3684, with leading and trailing values set to 3 and minimum length set to 36, was used to remove adaptor sequences. All rRNA reads were then removed from the metatranscriptomic reads using SortMeRNA v4.2.085 with default settings. Metagenomic and filtered metatranscriptomic reads were mapped to the human genome using Bowtie2 v2.3.4.186 with default settings and all mapping reads were excluded from subsequent microbiome, mycobiome, and virome metagenomic and metatranscriptomic analysis. Technical replicates for each biological sample were pooled together for subsequent analyses. Taxonomic profiles for all metagenomic and metatranscriptomic samples were generated using Kraken v2.0.787 and Bracken v2.5 [https://doi.org/10.7717/peerj-cs.104] run with default settings. The database used for quantifying taxonomic profiles was generated using a combined database containing human, bacterial, fungal, archaeal, and viral genomes downloaded from NCBI RefSeq on January 8, 2021. Additionally, genomes for Candida auris (Genbank: GCA_003013715.2, GCA_008275145.1) and Pneumocystic jirovecii (Genbank: GCA_001477535.1) were manually added to the database. Differentially abundant bacterial and viral taxa were identified for the BAL and UA samples groups individually using DESeq2 v1.28.188 with the three group clinical outcome meta-data readouts set as the sample groupings. Significantly differentially abundant taxa contained at a minimum an aggregate of 5 reads across samples and had an FDR <0.289,90.
For functional microbial profiling, processed sequencing reads were further depleted of human-mapping reads by removing all reads classified as human by Kraken v2.0.787 using KrakenTools v0.1-alpha (https://github.com/jenniferlu717/KrakenTools). FMAP v0.1591 was run on both the metagenomic and metatranscriptomic reads to profile the metabolic pathways present in each sample. FMAP_mapping.pl paired with diamond v0.9.2492 and FMAP_quantification.pl were used with default settings to identify and quantify proteins in the Uniref90 database. Using DESeq2 v1.28.188, differentially expressed genes were identified for the BAL samples individually using the three group clinical outcome-metadata readouts for all genes that had an aggregate 5 reads across all samples.
Antibiotic resistance genes were quantified in all metagenome and metatranscriptome samples using Salmon v1.3.093 run with --keepDuplicates for indexing and --libtype A --allowDovetail --meta for quantification. Genes were filtered such that only genes that actively conferred antibiotic resistance were kept. To assess differentially expressed classes of antibiotic resistance genes, gene counts for individual antibiotic resistance genes were collapsed by their conferred antibiotic resistance.
Supplementary Figure 1 shows a summary of depth achieved with the parallel WGS and metatranscriptome approach across sample types and the number of reads assigned to different microbial subfractions (bacteria, fungi, DNA viruses, RNA viruses and phages). Further analysis was also done to identify possible contaminants in the metatranscriptome and metagenome datasets. To this end, we compared the relative abundance of taxa between background bronchoscope control and BAL samples. Taxa with median relative abundance greater in background than in BAL were identified as probably contaminant and listed in Supplementary Table 4). None of the taxa identified as possible contaminants were removed from the analyzed data but are shown for comparison with signatures identified in the rest of the analyses.
Anti-Spike SARS-CoV-2 antibody profiling in BAL
BAL samples were heat-treated at 56°C for one hour, and centrifuged at 14000g for 5 min. The supernatant was collected and diluted 50-fold in PBST containing 1% skim milk. The diluted samples were incubated at room temperature (R.T.) for 30 min with QBeads DevScreen: SAv (Streptavidin) (Sartorius 90792) that had been loaded with biotinylated Spike, biotinylated RBD or biotin (negative control) in wells of a 96 well HTS filter plate (MSHVN4550). As positive controls, we used CR3022 antibody, that recognizes SARS-CoV-2 Spike and RBD, in human IgG, IgA and IgM formats (Absolute Antibody). After washing the beads, bound antibodies were labeled with anti IgG-DyLight488, anti IgA-PE and anti IgM-PECy7, and the fluorescence intensities were measured in Intellicyt IQue3 (Sartorius). The acquired data [median fluorescence intensity (MFI)] were normalized using the MFI values of the CR3022 antibodies to compensate for variations across plates. Supplementary Figure 10 shows that the levels of these antibodies were higher in BAL samples of patients with SARS-CoV-2 than in BAL samples from 10 uninfected healthy smokers recruited for research bronchoscopy. Details of method development and validation will be described elsewhere (Koide et al. in preparation).
SARS-CoV-2 preparation and neutralization assay
icSARS-CoV-2-mNG (isolate USA/WA/1/2020, obtained from the UTMB World Reference Center for Emerging Viruses and Arboviruses) was amplified once in Vero E6 cells (P1 from the original stock). Briefly, 90-95% confluent T175 flask (Thomas Scientific) of Vero E6 (1x107 cells) was inoculated with 50 μL of icSARS-CoV-2-mNG in 5 mL of infection media (DMEM, 2% FBS, 1% NEAA, and 10 mM HEPES) for 1 hour. After 1 hour, 20 mL of infection media was added to the inoculum and cells were incubated 72 hours at 37 °C and 5% CO2. After 72 hours, the supernatant was collected and the monolayer was frozen and thawed once. Both supernatant and cellular fractions were combined, centrifuged for 5 min at 1200 rpm, and filtered using a 0.22 μm Steriflip (Millipore). Viral titers were determined by plaque assay in Vero E6 cells. In brief, 220,000 Vero E6 cells/well were seeded in a 24 well plate, 24 hours before inoculation. Ten-fold dilutions of the virus in DMEM (Corning) were added to the Vero E6 monolayers for 1 hour at 37 °C. Following incubation, cells were overlaid with 0.8% agarose in DMEM containing 2% FBS (Atlanta biologicals) and incubated at 37 °C for 72 h. The cells were fixed with 10% formalin, the agarose plug removed, and plaques visualized by crystal violet staining. All procedures including icSARS-CoV-2-mNG virus were performed using Biosafety Level 3 laboratory conditions.
For SARS-CoV-2 neutralization assays, Vero E6 cells (30,000 cells/well) were seeded in a 96 well plate 24 h before infection. Two-fold serial dilutions of BAL lysates were mixed with mixed 1:1 (vol/vol) with SARS-CoV-2 mNG virus (multiplicity of infection, MOI 0.5), and incubated for 1 h at 37 °C. After incubation, 100 μL of the mixtures of the antibody and SARS-CoV-2 mNG were added to the Vero E6 monolayers, and cells were incubated at 37°C. After 20 h, cells were fixed with 4 % formaldehyde (Electron Microscopy Sciences) at room temperature for 1 h. After fixation, cells were washed twice with PBS and permeabilized with 0.25% triton-100, stained with DAPI (Thermo), and quantified on a CellInsight CX7 High-content microscope (Thermo) using a cut-off for three standard deviations from negative to be scored as an infected cell.
Transcriptome of BAL cells
RNA-Seq was performed on bronchial epithelial cells obtained by airway brushing, as described94-96, using the Hi-seq/Illumina platform at the NYU Langone Genomic Technology Center (data available at Sequence Read Archive: # PRJNA592149). KEGG97,98 annotation was summarized at levels 1 to 3. Genes with an FDR-corrected adjusted p-value <0.25 were considered significantly differentiated, unless otherwise specified. Pathway analysis using differentially regulated genes (FDR<0.25) was done using Ingenuity Pathway Analysis, RRID:SCR_0- at least 1 count per million in at least two samples were retained. For digital cytometry with CIBERSORTx, a signature matrix derived from single-cell transcriptome of BAL cells collected from patients with COVID-1933 was first generated with the “Create Signature Matrix” module in the CIBERSORTx online tool. A maximum of 10 cells per cell type per patient were initially sampled from the original data and 20 cells per cell type were then used to build the single-cell reference with the default parameters. Then the “Impute Cell Fractions” module was used to estimate the absolute cell fraction score of different cell types in bulk transcriptomes using the single-cell signatures with “S-mode” batch correction and 100 permutations in the absolute mode. Bulk transcriptomes with a significant deconvolution p-value (≤0.05) were retained. For xCell cell type signature enrichment analysis, the enrichment scores were inferred with built-in signature of cell types detected in the BAL samples as reported previously 33. The two-tailed Wilcoxon rank sum test with Benjamini-Hochberg correction were computed between groups of samples for comparison.
Microbial and Host predictive modeling
Cox proportional hazards model was used for investigating the association between the time to death and the relative abundance of each taxon quantified using metatranscriptomic and metagenomic data separately. We first performed the univariate screening test to identify significant features associated with the time to death using the Cox proportion hazards regression model for the relative abundance of taxa from the RNA and DNA data, and log-transformed count of host transcriptome data, respectively. Within each type of data, given the p-value cutoff, the features with a p-value less than the cutoff were selected and integrated as a sub-community. For the RNA and DNA data, the alpha diversity (Shannon index) was calculated for each sample on the selected sub-community and the negative of the value was defined as the microbial risk score, because high alpha diversity indicates low risk of death. For the host transcriptome data, the log-transformed total count of all selected candidate transcriptome for each sample was defined as the risk score, since most selected candidate transcriptomes increased the risk of death. The leave-one-out cross-validation (LOOCV) was used for the predictions. The p value cutoff was set at the value which produces the largest AUC (area under the receiver operating characteristic curve) in predicting the death/survival status using the risk score we constructed over these features. The additive model was used to integrate when more than one scores are used for the prediction.
Multiscale and co-expression network analyses
Raw counts from the human transcriptome were normalized and converted to log2-counts per million using limma99/voom100 (v3.44.1 with R v4.0.0) with standard parameters. Microbiome abundance information was converted to relative abundance. Low abundance taxa were removed based on average abundance across all samples to yield a minimum of 1000 taxa for each metatranscriptome dataset. All datasets were batch adjusted. Differentially expressed genes (DEGs) and differentially abundant taxa were called using the DESeq2 package88 (v1.28.1), based on the negative binomial (i.e. Gamma-Poisson) distribution. According to the recommendation by the authors, we used non-normalized data (i.e. raw gene counts and abundance data), as DESeq2 internally corrects data and performs normalization steps. For this purpose, raw microbiome abundance data were converted to DESeq2 dds objects using the phyloseq R library (V1.32.0). Contrasts are based on outcome groups (≤ 28 days MV, > 28 days MV or death). Differentially expressed genes and differentially abundant tax with FDR of 0.2 or less are considered significant.
Multiscale Embedded Gene Co-Expression Network Analysis (MEGENA) 34 was performed to identify host modules of highly co-expressed genes in SARS-CoV-2 infection. The MEGENA workflow comprises four major steps: 1) Fast Planar Filtered Network construction (FPFNC), 2) Multiscale Clustering Analysis (MCA), 3) Multiscale Hub Analysis (MHA), 4) and Cluster-Trait Association Analysis (CTA). The total relevance of each module to SARS-CoV-2 infection was calculated by using the Product of Rank method with the combined enrichment of the differentially expressed gene (DEG) signatures as implemented: , where, is the relevance of a consensus j to a signature i; and is defined as , where is the ranking order of the significance level of the overlap between the module j and the signature.
To functionally annotate gene signatures and gene modules derived from the host transcriptome data, we performed an enrichment analysis of the established pathways and signatures¾including the gene ontology (GO) categories and MSigDB. The hub genes in each subnetwork were identified using the adopted Fisher’s inverse Chi-square approach in MEGENA; Bonferroni-corrected p-values smaller than 0.05 were set as the threshold to identify significant hubs. The correlation between modules, modules and clinical traits as well as modules and individual taxa were performed using Spearman correlation. Other correlation measures, such as Pearson correlation or the Maximal Information Coefficient (MIC)101 proved to be inferior for this task. Categorical trait data was converted to numerical values as suitable.
Data availability
Sequencing data are available in NCBI’s Sequence Read Archive under project numbers PRJNA688510 and PRJNA687506 (RNA and DNA sequencing, respectively). Codes used for the analyses presented in the current manuscript are available at https://github.com/segalmicrobiomelab/SARS_CoV2.