DRASTIC study participants and specimens. We enrolled 60 SARS-CoV-2 PCR-positive patients admitted to Austin Health (Victoria, Australia) and six PCR-negative patients as negative serological controls. Two COVID-19 patients and three SARS-CoV-2 PCR-negative patients died during the study. Peripheral blood was collected in heparinized or ethylenediaminetetraacetic acid (EDTA) tubes during hospitalization. Peripheral blood monocular cells (PBMCs) were isolated via Ficoll-Paque separation. Single cell suspensions were isolated from tissues as previously described44, 45. ETA samples were obtained as part of routine suctioning of the endotracheal tube airway and involved the passage of a catheter for suctioning into a sterile sputum trap. Sputum samples were spontaneously collected into a sterile container. Pleural fluid was collected by thoracentesis as part of a routine procedure. The thoracentesis involved percutaneous insertion of a catheter into the pleural space and collection of pleural fluid into a sterile container. Demographic, clinical and sampling information for COVID-19 patients are described in Supplementary Table 1.
Ethics statement. Experiments conformed to the Declaration of Helsinki Principles and the Australian National Health and Medical Research Council Code of Practice. Written informed consent was obtained from all blood donors prior to the study. The study was approved by the Austin Health (HREC/63201/Austin-2020) and the University of Melbourne (#2057366.1, #2056901.1 and #1955465.3) Human Research Ethics Committees.
Genomic sequencing and bioinformatic analysis. Extracted RNA from RT-PCR positive samples underwent tiled amplicon PCR and Illumina short-read sequencing, quality control, consensus sequence generation and alignment as previously described46. A single sequence per patient was used for phylogenetic analysis22, with a maximum-likelihood phylogenetic tree generated using IQ-Tree47 and visualized using the ggtree package in R48. Genomic clusters were defined using a hierarchical clustering algorithm; genomic transmission networks grouped multiple clusters supported by epidemiological and genomic data.
Phenotypic whole blood immune analyses. Fresh whole blood (200μl per stain) was used to measure CD4+CXCR5+ICOS+PD1+ follicular T cells (Tfh) and CD3-CD19+CD27hiCD38hi antibody-secreting B cell (ASC; plasmablast) populations as described15, 49 as well as activated HLA-DR+CD38+CD8+ and HLA-DR+CD38+CD4+ T cells, intermediate CD14+CD16+ and classical CD14+ monocytes, activated CD3-CD56+ NK cells, MAIT cells, ɣδ-T cells, as per the specific antibody panels (Supplementary Table 6; gating strategy is presented in Supplementary Fig. 4b, c). After whole blood was stained for 20 minutes at room temperature in the dark, samples were lysed with BD FACS Lysing solution, washed and fixed with 1% PFA. AccuCheck Counting Beads (Thermo Fisher Scientific) were added for calculating absolute numbers just prior to acquisition. All samples were acquired on a LSRII Fortessa (BD). Flow cytometry data were analyzed using FlowJo v10 software.
Phenotypic immune analyses in respiratory samples. Respiratory samples (ETA, sputum, or pleural fluid) were diluted in PBS and centrifuged. Supernatant was collected and the pellet was filtered through a 45μm filter prior to separation of respiratory fluid and cellular contents. The respiratory fluid was frozen at -20ºC, and the cell pellet was washed with EDTA-BSS. Washed cells were stained with FcR block (Miltenyi Biotec) for 15 minutes followed by 30 minutes staining on ice with specific antibody panels (Supplementary Table 6). After fixing with 1% PFA, the samples were acquired on a LSRII Fortessa (BD). AccuCheck Counting Beads were added for calculating absolute numbers just prior to acquisition. Flow cytometry data were analyzed using FlowJo v10 software.
SARS-CoV-2 RBD ELISA. RBD-specific ELISA for detection of IgM, IgG and IgA antibodies was performed as previously described31, 50, 51, using flat bottom Nunc MaxiSorp 96-well plates (Thermo Fisher Scientific) for antigen coating (2µg/ml), blocking with PBS (with w/v 1% BSA) and serial dilutions in PBS (with v/v 0.05% Tween and w/v 0.5% BSA). Inter- and intra-experimental measurements were normalised using a positive control plasma from a COVID-19 patient run on each plate. Endpoint titres were determined by interpolation from a sigmodial curve fit (all R-squared values >0.95; GraphPad Prism 9) as the reciprocal dilution of plasma that produced >15% (for IgA and IgG) or >30% (for IgM) absorbance of the positive control at a 1:31.6 (IgG and IgM) or 1:10 dilution (IgA). Seroconversion was defined when titres were above the mean titre (plus 2 standard deviations) of non-COVID-19 control respiratory or plasma samples.
Microneutralization assay. Microneutralization activity of serum samples was assessed as previously described52. SARS-CoV-2 isolate CoV/Australia/VIC01/202053 was propagated in Vero cells and stored at -80°C. Sera were heat-inactivated at 56°C for 30 min and serially diluted. Residual virus infectivity in the serum/virus mixtures was assessed in quadruplicate wells of Vero cells incubated in serum-free media containing 1μg/ml of TPCK trypsin at 37°C and 5% CO2. Viral cytopathic effect was read on day 5. The neutralizing antibody titer was calculated using the Reed-Muench method52.
SARS-CoV-2 surrogate virus neutralisation test (sVNT). The plasma samples were tested in neat, and the respiratory samples were tested at 1:9 dilution or at their original dilutions for more diluted samples. The sVNT blocking ELISA assay (manufactured by GenScript, NJ, USA) was carried out essentially as described51, which detects circulating neutralizing SARS-CoV-2 antibodies that block the interaction between RBD and ACE2 on the cell surface receptor of the host. HRP-conjugated recombinant SARS-CoV-2 RBD fragment bound to any circulating neutralizing antibodies to RBD preventing capture by the human ACE2 protein in the well, which was subsequently removed in the following wash step. Substrate reaction incubation time was 20 mins at room temperature and results were read spectrophotometrically. Colour intensity was inversely dependent on the titre of anti-SARS-CoV-2 neutralizing antibodies.
Coupling of carboxylated beads. As previously described25, a custom multiplex bead array was designed and coupled with SARS-CoV-2 spike 1 (Sino Biological), spike 2 (ACRO Biosystems), RBD (BEI Resources) and nucleoprotein (ACRO Biosystems), as well as SARS and hCoV (229E, NL63, HKU1, OC43) spikes and nucleoproteins (Sino Biological) (Supplementary Fig. 5). In addition, SARS-CoV-2 and HKU-1 spike trimers (kind gifts from Adam K. Wheatley), as well as SARS-CoV and NL63 spike trimers (BPS Bioscience) were also coupled. Tetanus toxoid (Sigma-Aldrich), influenza hemagglutinin (H1Cal2009; Sino Biological) and SIV gp120 (Sino Biological) were also included in the assay as positive and negative controls respectively. Antigens were covalently coupled to magnetic carboxylated beads (Bio Rad) using a two-step carbodiimide reaction and blocked with 0.1% BSA, before being resuspended and stored in PBS 0.05% sodium azide.
Luminex bead-based multiplex assay. Using the coupled beads mentioned above, a custom CoV multiplex assay was formed to investigate the isotypes and subclasses of pathogen-specific antibodies present in collected plasma samples25. Briefly, 20µl of working bead mixture (1000 beads per bead region) and 20µl of diluted plasma (final dilution 1:200) or 20µl of diluted respiratory secretions (final dilution 1:800) were added per well and incubated overnight at 4°C on a shaker. Fourteen different detectors were used to assess pathogen-specific antibodies. Single-step detection was done using phycoerythrin (PE)-conjugated mouse anti-human pan-IgG, IgG1-4 and IgA1-2 (Southern Biotech; 1.3µg/ml, 25µl/well). C1q protein (MP Biomedicals) was first biotinylated (Thermo Fisher Scientific), then tetramerized with Streptavidin R-PE (SAPE; Thermo Fisher Scientific) before dimers or tetrameric C1q-PE were used for single-step detection. For the detection of FcγR-binding, soluble recombinant FcγR dimers (higher affinity polymorphisms FcγRIIa-H131, lower affinity polymorphisms FcγRIIa-R131, FcγRIIb, higher affinity polymorphisms FcγRIIIa-V158, lower affinity polymorphisms FcγRIIIa-F158; 1.3µg/ml, 25µl/well; kind gifts from Bruce D. Wines and P. Mark Hogarth) were first added to the beads, washed, and followed by the addition of SAPE. For the detection of IgM, biotinylated mouse anti-human IgM (mab MT22; Mabtech; 1.3µg/ml, 25µl/well) was first added to beads, washed, followed by SAPE. Assays were read on the Flexmap 3D and performed in duplicates.
Data normalization. For all multivariate analysis, Tetanus, H1Cal2009, and BSA antigens (positive controls) were removed, as well as SIV (negative control). Low signal features were removed when the 75th percentile response for the feature was lower than the 75th percentile of the BSA positive control. Right shifting was performed on each feature (detector–antigen pair) individually if it contained any negative values, by adding the minimum value for that feature back to all samples within that feature. Following this, all data were log-transformed using the following equation, where x is the right-shifted data and y is the right-shifted log-transformed data: y = log10(x + 1). This process transformed the majority of the features to having a normal distribution. In all the subsequent multivariate analyses, the data were furthered normalized by mean centering and variance scaling each feature using the z-score function in Matlab. Plasma and respiratory samples were analysed separately. When analysing samples at time of hospital discharge, to adjust for the confounder of time from symptom onset, each of the features were iteratively regressed with ordinary least squares regression, using the residuals as input for the analysis54.
Feature selection using Elastic Net/PLSDA. To determine the minimal set of features (signatures) needed to predict categorical outcomes (COVID-19 diagnosis, NIH scores, drug therapies), a three-step process was developed55. First, the data were randomly sampled without replacement to generate 2000 subsets. The resampled subsets spanned 80% of the original sample size, or sampled all classes at the size of the smallest class for categorical outcomes, which corrected for any potential effects of class size imbalances during regularization. Elastic-Net regularization was then applied to each of the 2000 resampled subsets to reduce and select features most associated with the outcome variables. The Elastic-Net hyperparameter, α, was set to have equal weights between the L1 norm and L2 norm associated with the penalty function for least absolute shrinkage and selection (LASSO) and ridge regression, respectively 56. By using both penalties, Elastic-Net provides sparsity and promotes group selection. The frequency at which each feature was selected across the 2000 iterations was used to determine the signatures by using a sequential step-forward algorithm that iteratively added a single feature into the PLSDA model starting with the feature that had the highest frequency of selection, to the lowest frequency of selection. Model prediction performance was assessed at each step and evaluated by 10-fold cross-validation classification error for categorical outcomes. The model with the lowest classification error within a 0.01 difference between the minimum classification error was selected as the minimum signature. If multiple models fell within this range, the one with the least number of features was selected and if there was a large disparity between calibration and cross-validation error (over-fitting), the model with the least disparity and best performance was selected.
PLSDA. Partial least squares discriminant analysis (PLSDA), performed in Eigenvectors PLS toolbox in Matlab, was used in conjunction with Elastic-Net, described above, to identify and visualize signatures that distinguish categorical outcomes (COVID-19 diagnosis, NIH scores, drug therapies). This supervised method assigns a loading to each feature within a given signature and identifies the linear combination of loadings (a latent variable, LV) that best separates the categorical groups. A feature with a high loading magnitude indicates greater importance for separating the groups from one another. Each sample is then scored and plotted using their individual response measurements expressed through the LVs. The scores and loadings can then be cross-referenced to determine which features are loaded in association with which categorical groups (positively loaded features are higher in positively scoring groups, etc.). All models go through 10-fold cross-validation, where iteratively 10% of the data is left out as the test set, and the rest is used to train the model. Model performance is measured through calibration error (average error in the training set) as well as cross-validation error (average error in the test set), with values near 0 being best. All models were orthogonalized to enable clear visualization of results.
Hierarchical clustering. We visualized the clustering of DRASTIC respiratory and blood samples based on only SARS-CoV-2 antigens or all features using unsupervised average linkage hierarchical clustering of normalized data using MATLAB 2017b (MathWorks, Natick, MA). Euclidean distance was used as the distance metric.
Cytokine analysis. Patients’ plasma and respiratory samples were measured for IL-1β, IFN-α2, IFN-γ, TNF, MCP-1 (CCL2), IL-6, IL-8 (CXCL8), IL-10, IL-12p70, IL-17A, IL-18, IL-23 and IL-33 using the LEGENDplex™ Human Inflammation Panel 1 kit, as per manufacturer’s instructions (BioLegend, San Diego, CA, USA).
sIL-6Rα and ADAMTS4 ELISAs. Soluble protein levels were all measured using DuoSet ELISA kits for each protein (R&D Systems, Minneapolis, MN, USA) according to the manufacturer’s instructions. DuoSet ELISA ancillary reagent kit (R&D Systems) was used for respiratory fluids and in-house reagents with the same composition were used for plasma samples. In brief, 96-well R&D ELISA microplates (respiratory fluids) or 96-well Nunc Maxisorp ELISA plates (ThermoFisher, plasma) were coated with capture antibody overnight, followed by blocking with 1% w/v BSA for a minimum of 1 hour. Samples and standard proteins were added and incubated for 2 hours at room temperature, followed by detection antibody for a further 2 hours. Lastly, streptavidin-HRP, substrate solution and stop solution (2N H2SO4) were added subsequently for 20 minutes each. Plasma samples were diluted in 1:300 for sIL-6Rα ELISAs. Respiratory fluids were diluted in 1:50/1:150 for sIL-6Rα ELISA accounting in the original dilution factors and tested without further dilution for ADAMTS4 ELISA.
Computational flow cytometry analysis. Computational analysis of data was performed using the Spectre R package26 (https://github.com/ImmuneDynamics/Spectre). Samples were initially prepared in FlowJo, and populations of interest were exported as CSV files containing raw (scale value) data. In R, data were subject to arcsinh transformation and clustering using FlowSOM 27. For visualisation, cells in the myeloid panel were subjected to sample-weighted downsampling based on absolute cells/uL counts in the blood, whereas cells in the lymphoid panel were unchanged to preserve samples with low cell numbers. Cells from the lymphoid panel, and downsampled cells from the myeloid panel, were then distributed in 2D via dimensionality reduction (DR) using Fast Interpolation-based t-distributed Stochastic Neighbour Embedding (FIt-SNE)28. Data from both the lymphoid and myeloid panel were subject to two rounds of clustering and DR. The initial round of clustering and DR was used to filter out cellular debris and non-immune cells exhibiting high autofluorescence, using arcsinh transformed expression of CD45, CD3, CD4, CD8, CD14, CD16, CD19, CD64, CD66b, CD38, HLA-DR, and Live-Dead for the myeloid panel; and CD45, CD3, CD4, CD8, TCR-γδ, CD45RA, CD27, CD56, CD19, CD16, CD14, CD38, HLA-DR, PD-1, and Live-Dead for the lymphoid panel. A second round of clustering and DR was then used for detailed immunophenotyping of cells in the respiratory tract, using arcsinh transformed expression of CD4, CD8, CD14, CD16, CD38, CD45, CD64, CD66b, HLA-DR, and Live-Dead for the myeloid panel; and CD45, CD3, CD4, CD8, TCR-γδ, CD45RA, CD27, CD56, CD19, CD16, CD38, and HLA-DR for the lymphoid panel.
Immune cell lineages were manually annotated based on marker expression: neutrophils (FSCintSSCintCD66b+), monocytes (FSCintCD14+), B cells (CD19+), NK cells (CD56+), gamma-delta T cells (TCR-γδ+), CD4+ T cells (CD4+CD14-), CD8+ T cells (CD8+), and dead cells (Live-Dead+). Interestingly, CD3 expression on T cell subsets was not apparent in respiratory samples, and we did not find distinct eosinophil phenotypes. Subsequently, population subsets were manually annotated based on marker expression: neutrophils (CD16hi, CD16lo, CD16-, dead neutrophils Live-Dead+), monocytes (classical CD14+CD16- and intermediate CD14+CD16+), B cells (naïve CD27-CD38-, memory CD27+CD38-, ASC CD27+CD38+), NK cells (CD56bri and CD56dim), CD4+ T cells (naïve-like CD45RA+CD27+, EMRA-like CD45RA+CD27-, CM-like CD45RA-CD27+, and EM-like CD45RA-CD27-), and CD8+ T cells (naïve-like CD45RA+CD27+, EMRA-like CD45RA+CD27-, CM-like CD45RA-CD27+, and EM-like CD45RA-CD27-). Given that NK cells were classified as CD56+ cells, the subset might include other unconventional T cells. Subsets were evaluated for expression of CD38, HLA-DR, and PD-1 expression using manual gating in FlowJo.
Volcano plots and heatmaps were created using the Spectre R package26, where comparisons were performed using a Wilcoxon rank-sum test (equivalent to the Mann-Whitney test) with the wilcox.test function in R. Statistics displayed in volcano plots were corrected with a False Discovery Rate (FDR) adjustment.
Statistical analyses. Statistical significance was assessed using Mann-Whitney, Wilcoxon signed-rank test or Kruskal-Wallis test with Dunn’s correction for multiple comparisons in Prism 9 (GraphPad) unless stated otherwise. Correlations were assessed using Spearman’s correlation coefficient (rs) and visualized in R 3.6.2 as heatmaps using the corrplot package or using the online Morpheus heatmap software (https://software.broadinstitute.org/morpheus; the Broad Institute, MA, USA) and p-values of correlations were corrected for multiple comparisons by FDR in R 3.6.2. P-values lower than 0.05 were considered statistically significant.
Data availability. The published article includes all datasets generated or analyzed during the study.