Patient enrollment and sample collection
This study leveraged data from the IMPACC cohort17,18, which enrolled 1286 participants from 20 hospitals across 15 medical centers in the United States between May 5th, 2020 and March 19th, 2021. Eligible participants were participants hospitalized with SARS-CoV-2 infection confirmed by RT-PCR and symptoms or signs consistent with COVID-19. Solid organ transplant (SOT) patients were identified by review of medication list for immunosuppressive medications. Patients identified as SOT recipients were confirmed by chart review to verify transplant status and organ type. We conducted a case-control study of patients within the IMPACC cohort, matching all 86 solid organ transplant (SOT) recipients 2:1 by age, sex and study site with 172 immunocompetent controls. The detailed study design and schedule for clinical data and biologic sample collection, and shared core platform assessments were previously described16,18. Detailed clinical assessments and sampling of blood and upper respiratory tract were performed within ~72 hours of hospitalization (Visit 1), and on approximately Days 4, 7, 14, 21, 28 after hospital admission. As previously described18, biological sample collection and processing followed a standard protocol utilized by every participating academic institution.
Ethics
NIAID staff conferred with the Department of Health and Human Services Office for Human Research Protections (OHRP) regarding potential applicability of the public health surveillance exception [45CFR46.102(l)(2)] to the IMPACC study protocol. OHRP concurred that the study satisfied criteria for the public health surveillance exception, and the IMPACC study team sent the study protocol, and participant information sheet for review, and assessment to institutional review boards (IRBs) at participating institutions. Twelve institutions elected to conduct the study as public health surveillance, while three sites with prior IRB-approved biobanking protocols elected to integrate and conduct IMPACC under their institutional protocols (University of Texas at Austin, IRB 2020-04-0117; University of California San Francisco, IRB 20-30497; Case Western reserve university, IRB STUDY20200573) with informed consent requirements. Participants enrolled under the public health surveillance exclusion were provided information sheets describing the study, samples to be collected, and plans for data de-identification, and use. Those that requested not to participate after reviewing the information sheet were not enrolled. Participants did not receive compensation for study participation while hospitalized, and subsequently were offered compensation during outpatient follow-up.
Common statistical analyses framework
Deidentified quality assured raw data was obtained from the IMPACC study and made publicly available17,18. All data analyses employed R v4.0.2. For each data type, we investigated the behavior of features both at Visit 1 (within 72 hours of hospital admission for most of the participants) and longitudinally for scheduled visits (Visits 1-6, up to 30 days post-hospital admission, both inpatient and outpatient samples, and excluding escalation samples). For Visit 1 analyses, we used linear modeling with age as a continuous variable and controlled for sex and baseline respiratory severity. Longitudinal measures of the WHO 7-point severity ordinal scale over time were clustered into five trajectory groups (TG) using group-based trajectory modeling, a likelihood-based approach commonly used to group time series of clinical data, as described previously16. For the severity analysis, we defined mild participants as those with trajectory group (TG) 1-3, and severe participants as those with TG 4-5.
For longitudinal analysis of SARS-CoV-2 nasal viral rpM and serum anti-Spike IgG, we used generalized additive models with mixed effects from the package gamm4 (v0.2.6) to evaluate the effects of age while controlling for sex and TG. Generalized additive modeling was preferred for these features due to their non-linear trajectories as previously reported. For all other data types, we used linear mixed effects models from the package lme4 (v1.1.25). P values in all analyses were adjusted with Benjamini-Hochberg correction.
Analysis of nasal metatranscriptomics data
Taxonomic alignments from nasal metatranscriptomics data were obtained from raw fastq files using the CZ-ID pipeline25, which first removes human sequences via subtractive alignment against human genome build 38, followed by quality and complexity filtering. Subsequently, reference-based taxonomic alignment at both the nucleotide and amino acid levels against sequences in the National Center for Biotechnology Information (NCBI) nucleotide (NT) and non-redundant (NR) databases, respectively, is carried out, followed by assembly of the reads matching each taxon. Taxa were aggregated to the genus and higher phylogenetic levels from NCBI for analyses26. For all analyses using SARS-CoV-2 viral rpM, log transformation of total reads per million (rpM) aligned to the Beta-coronavirus genus was used. Alpha diversity (Shannon Diversity Index) was calculated using the vegan package v.2.6 in R. Differential abundance analyses for Visit 1 samples were performed using linear mixed effect modeling (using the R package nlme v3.1-162) to evaluate SOT effect on individual taxon levels at the genus, family, class, order, phylum, and superphylum levels (rpMs from lower taxon levels were summed to create higher phylogenetic level rpM), using the following formula:
Taxon_abundance ~ transplant_status with random effects of (1|enrollment_site)
Principle coordinate analysis (PCoA) of Bray-Curtis dissimilarity index was performed on Visit 1 nasal metatranscriptomics samples, with significance calculated with Adonis using the R package vegan (v2.6).
Analysis of SARS-CoV-2 viral abundance
SARS-CoV-2 viral abundance was calculated as log10(rpM+1), where rpM is the reads per million of SARS-CoV-2 as measured by nasal metatranscriptomics. The viral rpM in each organ transplant type was compared using a likelihood ratio test on the null and alternative models:
Null: viral_rpm ~ 1
Alternative: viral_rpm ~ organ_type
Where viral_rpm was the log10-transformed viral rpM, and organ_type was the organ transplant type. Longitudinal analysis of SARS-CoV-2 viral rpM was performed using the gamm4 function from the gamm4 package (v0.2.6), using the following formula:
viral_rpm ~ s(event_date, bs="cr") + s(event_date, bs="cr", by=transplant) + transplant
with random effects (1|pid), ie, participant random intercept. In the formula, viral_rpm was the log10-transformed viral rpM as described above, event_date was the number of days post hospitalization, transplant is the transplant status. P-value was calculated using the Chi-squared test on the gam component’s reference degrees of freedom and F-statistics.
Analysis of SARS-CoV-2 antibody titers
Antibody levels against the recombinant SARS-CoV-2 spike protein receptor-binding domain (RBD) were measured using a research-grade enzyme-linked immunosorbent assay (ELISA) as described16. The optical density (OD) was measured and the area under the curve (AUC) was calculated, considering 0.15 OD as the cutoff. For the Visit 1 analysis, we log2-transformed the AUC values and modeled them with linear regression. For the longitudinal analysis, we also log2-transformed the AUC values, and used the linear mixed effects models (to fit the null and alternative models:
Null: z ~ event_date + transplant + (1|pid)
Alternative: z ~ event_date + transplant + event_date:transplant + (1|pid)
Where z is the log2-transformed AUC, event_date is the number of days post-admission, transplant is the transplant status, and (1|pid) is the participant random intercept. The P-values were calculated using likelihood ratio test, and adjusted with Benjamini-Hochberg correction. For visualization of longitudinal antibody levels, data was fit to a third-order polynomial.
Analysis of PBMC and nasal RNA-seq data
RNA was extracted from PBMC and inferior nasal turbinate swabs and gene expression levels were quantified by RNA-seq as previously described17. For all RNA-seq analyses, we retained protein-coding genes that had a minimum of 10 counts in at least 20% of the samples. We normalized the gene counts using the voom function (normalize.method = “quantile”) from the limma package v3.46.0, fitted a linear model for the gene expression with lmFit function (default settings), calculated the empirical Bayes statistics with eBayes function (default settings), and calculated the P values for differential expression controlling for FDR. We controlled for log-transformed viral rpM in certain analyses when indicated.
For longitudinal analyses, we accounted for repeat measures from the same individual using duplicateCorrelation from the limma package, and modelled the interaction between days post-admission and transplant status as follows:
z ~ event_date + transplant + event_date:transplant
Where z is the log-transformed normalized expression count, event_date is the number of days post-admission, and transplant is the transplant status.
Fold-change values for Visit 1 analyses, representing the fold-change of transplant patients over control patients, and longitudinal analyses, representing the interaction term of days post-admission and transplant status, were used as the input for Gene Set Enrichment Analysis (GSEA). We used the gsePathway function from the ReactomePA v1.42.0 package to search for enriched pathways in the Reactome database, with minimum and maximum geneSet sizes of 3 and 1000, respectively.
For analysis of the relationship between interferon signaling and viral rpM at Visit 1, we first subset the total PBMC and nasal RNA-Seq data to genes within the Reactome Interferon Signaling pathway (R-HSA-913531, n=308). We then split the data by transplant status and modelled the relationship between interferon signaling gene expression and log2-transformed viral rpM for controls and transplant recipients separately, using the approach described above.
Analysis of CyTOF data
PBMCs were phenotyped on the Fluidigm Helios mass cytometer using a panel of 46 surface and intracellular markers, and the cell types were annotated using an automated annotation pipeline as previously described16. Prior to analysis, we removed cells identified as red blood cells, multiplets, debris, and those that were not identifiable with high confidence. These counts were converted to proportions per sample, by dividing each cell type count by the total cell count. The minimum proportion per cell type across all samples was added to each sample prior to log2-transformation, to avoid taking the logarithm of zeros.
For the Visit 1 analysis, the log2-transformed cell type proportions were modeled with linear regression. For the longitudinal analysis, the log2-transformed cell type proportions were modelled with linear mixed effects models (to fit the null and alternative models:
Null: z ~ event_date + transplant + (1|pid)
Alternative: z ~ event_date + transplant + event_date:transplant + (1|pid)
Where z is the log2-transformed cell type proportion, event_date is the number of days post-admission, transplant is the transplant status, and (1|pid) is the participant random intercept. The P-values were calculated using likelihood ratio test, and adjusted with Benjamini-Hochberg correction.
Analysis of serum inflammatory protein (Olink) data
All samples were processed with the Olink multiplex assay inflammatory panels (Olink Proteomics), according to the manufacturer’s instructions and as previously described16. This inflammatory panel included 92 proteins associated with human inflammatory conditions. Target protein quantification was performed by real-time microfluidic qPCR via the Normalized Protein Expression (NPX) manager software. Data were normalized using internal controls in every sample, inter-plate control and negative controls, and correction factor and expressed as log2 scale proportional to the protein concentration. For additional quality control, we set any NPX measurements below the assay’s limit of detection (LOD) to zero. Next, we excluded proteins that were detected in fewer than 20% of samples, resulting in 84 proteins for analysis.
For the Visit 1 analysis, we standardized the NPX values and modeled them with linear regression, with and without adjusting for SARS-CoV-2 viral rpM. The viral rpM was also calculated as log10(rpM+1). We then fit the following linear model:
z ~ transplant + transplant:TG
Where z is the standardized protein level, transplant is the SOT status, and transplant:TG is the interaction term between SOT status and disease severity. This formulation allows us to find two separate coefficients (i.e., two separate log fold-change values) for the effects of severity, one for the SOT group and one for the control group.
For the longitudinal analysis, we also standardized the NPX values, and used the linear mixed effects models to fit the null and alternative models:
Null: z ~ event_date + transplant + (1|pid)
Alternative: z ~ event_date + transplant + event_date:transplant + (1|pid)
Where z is the standardized protein level, event_date is the number of days post-hospital admission, transplant is the transplant status, and (1|pid) is the participant random intercept. The P-values were calculated using a likelihood ratio test, and adjusted with Benjamini-Hochberg correction.