Antibody Testing Documents the Silent Spread of SARS-CoV-2 in New York Prior to the First Reported Case

We developed and validated serologic assays to determine SARS-CoV-2 seroprevalence in select patient populations in the greater New York City area early in the epidemic. We tested "discarded" serum samples from February 24 to March 29 for antibodies against SARS-CoV-2 spike trimer and nucleocapsid protein. Using known durations for antibody development, incubation period, serial interval, and reproductive ratio for this pandemic, we determined that introduction of SARS-CoV-2 into New York likely occurred between January 23 and February 4, 2020. SARS-CoV-2 spread silently for 4-5 weeks before the first community-acquired infection was reported.


Introduction
A novel coronavirus emerged in December 2019 in Wuhan, China [1,2] and devastated Hubei Province in early 2020 before spreading to every province within China and nearly every country in the world [3]. This pathogen, now termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a global pandemic, with ~10 million cases and over 500,000 deaths reported through June 30, 2020 [3]. The first case of SARS-CoV-2 infection in the United States was identified on January 19, 2020 in a man who returned to the State of Washington from Wuhan [4]. In the ensuing months, the U.S. has become a hotspot of the pandemic, presently accounting for almost one third of the total caseload and over one fourth of the deaths [3]. The first confirmed case in New York was reported on March 1 in a traveler recently returned from Iran. The first community-acquired SARS-CoV-2 infection was diagnosed on March 3 in a 50-year-old male who lived in New Rochelle and worked in New York City (https://www1.nyc.gov/site/doh/covid/covid-19-data-archive.page). In the ensuing 18 weeks, New York City has suffered a peak daily infection number of ~4,500 (Fig. 1a) and a cumulative caseload of ~400,000 to date. The time period when SARS-CoV-2 gained entry into this epicenter of the pandemic remains unclear.
Testing by polymerase chain reaction (PCR) has been the mainstay for confirming SARS-CoV-2 infection worldwide [5][6][7]. However, traditionally, serologic testing to detect antibodies to viral pathogens has been an invaluable diagnostic adjunct. Antibody testing could identify recovered patients who are no longer virus positive [7]. In certain circumstances, the presence of virus-specific antibodies correlates with immunity [8]. In addition, serologic assays could be used to track infection retrospectively using stored serum samples [9]. Recently, numerous assays for the detection of SARS-CoV-2 antibodies have emerged [6,10,11], but so far few have been properly evaluated and validated for clinical use.
In this report, we describe the development and validation of serologic assays to detect antibodies to SARS-CoV-2 spike and nucleocapsid proteins, as well as their immediate deployment to measure seroprevalence in a random selection of discarded serum samples from New York City (Fig. 1a) and suburbs to its north (Fig. 1b), including those from a period prior to the first confirmed cases of community transmission.

Results
Performance of SARS-CoV-2 enzyme-linked immunosorbent assay (ELISA). First, to determine the specificity of our antibody assays, we tested 106 serum samples from healthy blood donors from a pre-SARS-CoV-2 era. The results showed that all but one sample had extremely low optical density (OD450) values that were indicative of non-reactivity. Using a stringent cutoff of 3 standard deviations above the mean, we noted a specificity of 99% for each of the assays to detect spike trimer-specific IgG (S-IgG), spike trimer-specific IgM (S-IgM), or nucleocapsid protein (NP)-specific IgG (NP-IgG) (Fig. 2a). We further assessed assay specificity by testing 48 serum samples from contemporaneous patients seen at Columbia University Irving Medical Center West Campus (CUIMC) for respiratory illnesses that were diagnosed as SARS-CoV-2 negative by PCR. Most of these cases were found to be positive by routine clinical tests for other viral pathogens including influenza A/B, parainfluenza, adenovirus, rhinovirus, human metapneumovirus, and common cold human coronaviruses (HKU1, OC43, and NL63). Only two samples showed OD450 readings greater than the cutoff value, yielding a specificity of 96% in the S-IgG assay (Fig. 2b). The specificity of the assay to detect IgG to NP was slightly lower at 90% (Supplemental Fig. 1a). In particular, all 6 samples from patients with common cold coronaviruses were negative in the S-IgG assay, whereas 5 of 6 were negative in the NP-IgG assay.
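The cutoff rule described above (mean plus 3 standard deviations of the negative-control readings) can be sketched as follows. This is a minimal illustration rather than the authors' code: the OD450 readings are made-up values, and the function names `od_cutoff` and `specificity` are hypothetical.

```python
from statistics import mean, stdev

def od_cutoff(negative_controls, n_sd=3.0):
    """Cutoff = mean + n_sd * sample standard deviation of negative-control OD450 values."""
    return mean(negative_controls) + n_sd * stdev(negative_controls)

def specificity(negative_readings, cutoff):
    """Fraction of known-negative samples whose reading is at or below the cutoff."""
    below = sum(1 for od in negative_readings if od <= cutoff)
    return below / len(negative_readings)

# Hypothetical OD450 readings from pre-pandemic sera (illustrative values only).
controls = [0.05, 0.06, 0.04, 0.07, 0.05, 0.06, 0.05, 0.04, 0.06, 0.05]
cutoff = od_cutoff(controls)
print(f"cutoff = {cutoff:.3f}")
print(f"specificity = {specificity(controls, cutoff):.0%}")
```

Applied to the counts in the text, 105 of 106 pre-pandemic samples falling below the cutoff corresponds to 105/106, or about 99%.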
Second, to determine the sensitivity of our antibody assays, we tested 146 serum samples from recent CUIMC cases with PCR-confirmed SARS-CoV-2 infection, at varying timepoints post onset of symptoms, for IgM or IgG to the spike trimer as well as for IgG to NP. As expected, the sensitivities of the assays to measure IgM or IgG targeting the spike trimer were low (<35%) in the first 7 days post onset of symptoms, but increased rapidly thereafter, reaching a sensitivity of 87% for both assays 22 days after onset of symptoms. By reviewing the medical records of SARS-CoV-2-positive patients who were antibody negative in the spike trimer assays, we noted that a number of them were cancer patients on rituximab treatment, renal transplant recipients on immunosuppressive therapy, or patients with aplastic anemia (colored dots in Figs. 2c and 2d). If these cases were excluded, the overall sensitivity increased to 93% for both the S-IgM and S-IgG assays. The sensitivity of the NP-IgG assay was slightly lower at 87% (Supplemental Fig. 1b).
Experiments required for the validation of these antibody assays have been carried out successfully, including testing for interference and concordance with a commercial platform (i.e., Roche, Supplemental Fig. 2a-2c).
SARS-CoV-2 Seroprevalence. The CUIMC clinical laboratory received a total of 3,096 serum specimens from 5,464 unique patient visits to the Emergency Department (ED) during the period of study. Of the serum samples slated to be discarded, 814 specimens were randomly selected for serologic testing for SARS-CoV-2 antibodies. There were no duplicate samples from the same patient. Specimens were evenly distributed between female (51%) and male (49%) patients, and 10% came from pediatric patients. Age ranged from <1 year to 96 years old. These numbers closely match the sex and age distribution of the ED visits in that period. The randomly retrieved samples were first tested by ELISA for IgM and IgG to the SARS-CoV-2 spike trimer. Any sample with an OD450 reading above the previously determined cutoff value was then retested by the same assay and by the NP-IgG assay. Only samples positive for IgG or IgM to the spike trimer and confirmed by repeat testing, which also included IgG to NP, were considered seropositive for SARS-CoV-2.
For the first week of the study (February 24 to March 1), 3 of 72 serum samples were found to be positive for SARS-CoV-2 antibodies, yielding a seroprevalence of 4.2% (95% confidence interval of 1.4% to 11.6%) (Fig. 3a). For the second week, 9 of 114 samples (7.9%) were seropositive; the prevalence of SARS-CoV-2 antibodies in the three subsequent weeks ranged from 2.7% to 7.3% (Fig. 3a). Given the expanding epidemic in New York City, one might have expected the prevalence to increase steadily. However, changes in institutional practices, such as the opening of a SARS-CoV-2 biorepository and cough-and-fever clinics (arrows in Fig. 3a), diverted many positive serum samples away from the Emergency Department and directly into a COVID biorepository instead of being discarded after five to seven days as per routine protocol. Overall, 40 seropositive cases were identified through the CUIMC-ED, including 7 who presented to the hospital before the first community-acquired SARS-CoV-2 infection in New York City was reported. Of these, two adults complained of dyspnea without fever, three children had gastrointestinal symptoms with or without fever, and two adults had unrelated conditions (loss of balance and foot swelling). Throughout the period of study, the SARS-CoV-2 seroprevalence ranged from 1.4% to 1.9% in the suburban ambulatory care population served by the CareMount clinics (Fig. 3b).
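The interval reported for the first week (3 of 72 samples) can be approximately reproduced with a Wilson score interval. The text does not state which interval method was used, so the choice of the Wilson method here is an assumption, as is the function name `wilson_interval`.

```python
import math

def wilson_interval(positives, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    p = positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# First study week: 3 seropositive samples out of 72 tested.
low, high = wilson_interval(3, 72)
print(f"prevalence = {3 / 72:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

Under this assumption the interval comes out at roughly 1.4% to 11.5%, close to the reported 1.4% to 11.6%; the small difference in the upper bound may reflect rounding or a different interval method.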
Because of the importance of the first three positive cases (Fig. 3a, period from 2/24 to 3/2) in dating the introduction of the epidemic into New York City, additional serologic testing was carried out on these serum samples. Quantitative ELISA for NP-IgG and S-IgG, as well as for antibodies to the receptor-binding domain (RBD) of the spike protein, confirmed their seropositivity (Fig. 3c), and western blot analysis further demonstrated that these sera contained antibodies directed to the SARS-CoV-2 nucleocapsid protein and RBD, as well as to the spike protein (Fig. 3d). In particular, the western blots showed that the reactivity was directed to proteins with the molecular weights matching those of the viral proteins instead of some contaminants from the cells used for protein expression. Furthermore, note that NP was produced in bacteria while RBD and spike trimer were produced in mammalian cells. Overall, the results in Figs. 3c and 3d, demonstrating reactivity to three different SARS-CoV-2 proteins by two different assay formats, leave little doubt that these three serum samples are truly seropositive.

Discussion
We developed a panel of immunoassays to detect antibodies against the SARS-CoV-2 spike trimer and nucleocapsid protein. These ELISAs have a specificity of 99% when tested on stored serum samples from an era before the pandemic (Fig. 2a). In patients presenting with non-SARS-CoV-2 respiratory illnesses largely caused by other viral pathogens, the specificity remained high, at 96% for the S-IgG assay (Fig. 2b) and 90% for the NP-IgG assay (Supplemental Fig. 1a). In non-immunosuppressed patients with confirmed SARS-CoV-2 infection, the sensitivities of the S-IgM and S-IgG assays were both 93% at day 22+ post onset of symptoms (Figs. 2c and 2d), whereas the sensitivity of the NP-IgG assay was 87%. Overall, the kinetics of IgM and IgG responses to the spike trimer were similar, and the trimer-based assays performed slightly better than the NP-based assay. These ELISAs have been approved by the Department of Health Laboratory of the State of New York for clinical use.
When these antibody assays were applied to the "discarded" serum samples, weekly seroprevalence estimates were 2.7% to 7.9% (overall 4.9%) for CUIMC-ED and 1.4% to 1.9% (overall 1.7%) for CareMount clinics (Figs. 3a and 3b). Seroprevalence was lower in the five counties north of New York City, 20 to 100 miles away and with significantly lower population density (Fig. 1b). However, these numbers should not be taken to reflect the true penetrance of SARS-CoV-2 into these select patient populations because of possible skewing of our samples from changing clinical practices, including the opening of separate cough-and-fever clinics, telemedicine, and a biorepository that siphoned off samples that were SARS-CoV-2-positive by RT-PCR.
The first observation period of February 24 to March 1, before the earliest reported case of community transmission of SARS-CoV-2 in New York on March 3, is largely unbiased and most informative. Based on the observed seroprevalence of 4.2% and the overall sensitivity and specificity of our S-IgG assay, we estimated the posterior distribution of prevalence in the ED population as 4.5% (90% credible interval of 0.3% to 11.7%), similar to our observed estimate albeit with a wider credible interval.
As shown in Fig. 4, the three earliest seropositive cases (green dots), firmly established by multiple serologic tests (Figs. 3c and 3d), were seen in the CUIMC-ED on February 25, 26, and 29, and their places of residence were not clustered (Supplemental Fig. 4). Several reports indicate that 10 days post onset of symptoms is a reasonable mid-point to use for the development of antibodies directed to SARS-CoV-2 [6,12], as our own data would indicate as well (Figs. 2c and 2d). In addition, numerous publications have provided a range for the duration of the incubation period [13][14][15][16], for which 5 days is an appropriate average. Using these figures, the mean time of infection by SARS-CoV-2 for these initial cases could be extrapolated back to February 12 (Fig. 4). Since these cases represented 4.2% of the 72 serum samples tested and a total of 1,154 serum samples were collected that week from the CUIMC-ED, we surmise that 48 infected cases (4.2% of 1,154) likely accessed care in the CUIMC-ED during that week. To extrapolate further back, we use published data on the reproductive ratio (R0) [17-20] and the serial interval (or generation time) [14,16,21]. R0 values ranging from 1.4 to 6.5 have been reported, as have serial intervals of 4 to 7 days.
Conservatively, assuming the index case infected 3 others and then using an R0 of 4 and a serial interval of 5 days, the 48 cases on February 12 would trace back to an index case for this cluster on January 28, with a 95% confidence interval for the true date of January 23 to February 4 (Fig. 4).
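The back-calculation can be illustrated with a short generation-counting sketch. It assumes, as the text does, that the index case infected 3 others, that each later generation multiplies case numbers by an R0 of 4, and that generations are 5 days apart. The function name `trace_back_to_index` is hypothetical, and the sketch does not attempt to reproduce the confidence interval.

```python
from datetime import date, timedelta

def trace_back_to_index(cases, first_gen_size=3, r0=4, serial_interval_days=5):
    """Count generations needed to grow from one index case to at least `cases`.

    Generation 1 has `first_gen_size` cases; each later generation is r0 times larger.
    Returns (number of generations, days elapsed).
    """
    generations, size = 0, 1
    while size < cases:
        size *= first_gen_size if generations == 0 else r0
        generations += 1
    return generations, generations * serial_interval_days

generations, days = trace_back_to_index(48)
index_date = date(2020, 2, 12) - timedelta(days=days)
print(generations, days, index_date.isoformat())
```

Under these assumptions the chain is 1, 3, 12, 48 cases, that is, 3 generations or 15 days, which places the index case on January 28.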
Alternatively, calculations using the growth rate of 0.28/day for the SARS-CoV-2 epidemic in the U.S. [22] would take us back to the same date, a time frame about 3-4 weeks earlier than the prediction made on the basis of the genetic drift in 84 unique SARS-CoV-2 sequences found in the New York area, with multiple introductions largely from Europe [23].
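As a rough check of the growth-rate alternative, pure exponential growth at 0.28/day can be inverted for the time needed to reach 48 cases. Starting from exactly one case and holding the rate constant are simplifications made here for illustration; the text does not spell out how its calculation was done.

```python
import math
from datetime import date, timedelta

GROWTH_RATE_PER_DAY = 0.28   # U.S. epidemic growth rate cited in the text
CASES_ON_FEB_12 = 48         # infected cases estimated in the text for February 12

# N(t) = exp(r * t)  =>  t = ln(N) / r, starting from a single index case.
days_back = math.log(CASES_ON_FEB_12) / GROWTH_RATE_PER_DAY
approx_index_date = date(2020, 2, 12) - timedelta(days=round(days_back))
print(f"{days_back:.1f} days before February 12 -> {approx_index_date.isoformat()}")
```

This gives about 14 days, or a date in late January, within a day of the January 28 estimate in the text.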
Using the above calculated 48 cases on February 12, we can also project this number forward in time using the same serial interval of 5 days and a more conservative R0 of 3 (Fig. 4), with the assumption that following the announcement of the first case some behavioral changes (e.g., mask-wearing and social distancing) would have occurred. We arrived at a number of 3,888 cases on March 3 in the CUIMC catchment. Given that the CUIMC caseload is only about 5% of that of New York City (area shaded in green in Fig. 4), we could cautiously conclude that there were already tens of thousands, and perhaps many more, SARS-CoV-2 infections in the city when the first case was recognized.
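The forward projection can be sketched in the same way: multiply the 48 cases by R0 once per complete serial interval between February 12 and March 3. The function name `project_forward` is hypothetical, and dividing by the 5% share to obtain a citywide figure is an illustrative reading of the text's argument, not a calculation the authors report.

```python
from datetime import date

def project_forward(cases, start, end, r0=3, serial_interval_days=5):
    """Multiply `cases` by r0 once per complete serial interval between start and end."""
    generations = (end - start).days // serial_interval_days
    return cases * r0 ** generations

catchment_cases = project_forward(48, date(2020, 2, 12), date(2020, 3, 3))
# The text puts the CUIMC caseload at about 5% of New York City's.
citywide_cases = catchment_cases / 0.05
print(catchment_cases, round(citywide_cases))
```

The 20 days between the two dates span 4 serial intervals, so the result is 48 × 3^4 = 3,888 cases in the catchment, as stated in the text. Scaling by the 5% share gives a figure on the order of 78,000, consistent with "tens of thousands".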
Our findings show that SARS-CoV-2 was spreading silently for 4-5 weeks in the New York metropolitan area before the first community-acquired infection was confirmed on March 3. During this seemingly quiescent period, the extensive outbreaks in China, South Korea, and Italy were widely known. Earlier implementation of surveillance for SARS-CoV-2 could have led to earlier applications of infection control measures and perhaps mitigated the devastation we now face.

Study design, population, and serum samples. This was an observational study of SARS-CoV-2 serial seroprevalence in New York City, Westchester County, and the Lower Hudson Valley from February 24 to March 29, 2020. The targeted population included patients accessing the Emergency Department at CUIMC in northern Manhattan and CareMount Medical, a network of ambulatory care clinics in Westchester County and the Lower Hudson Valley (Fig. 1b). Clinical specimens collected from patients were designated as residual when they were no longer required for the diagnostic purpose for which they were originally collected. CUIMC and CareMount retained diagnostic samples for 5 and 7 days post collection, respectively. Anonymized serum samples about to be discarded were requested from each clinical laboratory. A total of 814 serum samples from CUIMC-ED were randomly retrieved between February 24 and March 29, and each chosen vial of serum was de-identified with a label marked with a study-specific identifier. The clinical laboratory retained a link between the study-specific identifier and the patient's medical record number. Slightly later but in parallel, a total of 1,841 serum samples to be discarded were obtained from CareMount in three discrete batches between March 13 and March 28. Duplicate specimens for the same individual were identified and excluded before de-identification labels were applied. Specimens from one CareMount site in New York City were also excluded.
Assay validation. For the purpose of defining the sensitivity and specificity of our SARS-CoV-2 antibody tests, we also utilized 106 serum samples obtained from distinct healthy blood donors from New York City between 2015 and 2018, 146 serum samples from distinct patients with PCR-positive SARS-CoV-2 infection at variable time points after onset of symptoms, and 48 serum samples from contemporaneous patients with respiratory illnesses who tested negative for SARS-CoV-2 by PCR, most of whom were confirmed to be infected by another viral pathogen, although prior infection with SARS-CoV-2 cannot be absolutely excluded.
SARS-CoV-2 antigens and ELISA. The ectodomain of the SARS-CoV-2 spike trimer [24] was cloned into mammalian expression vector pCAGGS (Addgene, Watertown, MA), with a foldon tag followed by a 6xHis tag and Strep tag II at the C-terminus. This expression vector was transiently transfected into Expi293 or HEK293F cells and the spike trimer was purified from the supernatant 3 days post transfection using either Strep-Tactin XT Resin (IBA Life Sciences) followed by size-exclusion chromatography on a Superose 6 Increase 10/300 GL column (GE Healthcare). SARS-CoV-2 nucleocapsid protein (NP) was cloned into pET28a(+) vector (Millipore-Sigma, Burlington, MA) with an AAALE linker and 6xHis tag at the C-terminus. The NP construct was then used to transform E. coli Rosetta 2 (DE3) cells and the target protein was produced and purified from the bacterial lysate using PEI precipitation, affinity purification with Ni-NTA agarose beads (Thermo Fisher Scientific, Carlsbad, CA), chromatography on a HiTrap Heparin-HP column, followed by size-exclusion chromatography on a Superdex 200 10/300 GL column (Garg et al, bio-protocol 2020). SARS-CoV-2 receptor-binding domain (RBD) [24] was cloned into vector pCAGGS (Addgene, Watertown, MA), with an HRV-3C protease cleavage site and monoFc tag followed by a 6xHis tag at the C-terminus. This expression vector was transiently transfected into Expi293 cells and the protein was purified 5 days post transfection using protein A agarose (Thermo Fisher Scientific, Carlsbad, CA). The monoFc and 6xHis tags were subsequently cleaved with HRV-3C protease (Millipore-Sigma, Burlington, MA) and counter-selected with protein A agarose (Thermo Fisher Scientific, Carlsbad, CA) as indicated by the manufacturer.
SARS-CoV-2 spike trimer and NP were coated on 96-well ELISA plates at concentrations of 200 ng/well and 50 ng/well, respectively, at 4℃ overnight. The unbound proteins were then removed by two washes with 300 μl PBST (0.5% Tween-20 in PBS) per well. Afterwards, the ELISA plates were blocked with 300 μl of 1% BSA in PBS at 37℃ for 2 hours, before the plates were washed 4 times with PBST. Serum samples were diluted 400-fold (or serially diluted for the quantitative ELISA) with dilution buffer (1% bovine serum albumin and 20% bovine calf serum in PBS) and tested using 100 μl per well of each diluted serum. The ELISA plates were then incubated at 37℃ for 1 hour, followed by washing 6 times with PBST. Next, 100 μl of 10,000-fold diluted Peroxidase AffiniPure goat anti-human IgG (H+L) antibody or anti-human IgM antibody (Jackson ImmunoResearch, West Grove, PA), or both, was added to each well and incubated for 1 hour at 37℃. After washing 6 times with PBST, the TMB substrate (Sigma, St. Louis, MO) was added and the reaction was stopped using 1 M sulfuric acid. Absorbance was measured at 450 nm and expressed as an optical density, or OD450 value. What constitutes a positive antibody test is discussed below.
Western Blot. SARS-CoV-2 S trimer, NP and RBD (1.5 μg each) were separated on a 4%-12% NuPage gel (Invitrogen, Carlsbad, CA) and stained with SimplyBlue™ SafeStain (Invitrogen, Carlsbad, CA). For the western blot, 200 ng of each protein were mixed and separated on a 4%-12% NuPage gel, and then electrophoretically transferred to a PVDF membrane (Millipore-Sigma, Burlington, MA). The membrane was then blocked with 10% Blotting-Grade Blocker (Bio-Rad, Hercules, CA) in PBST for 30 min at room temperature and then incubated with a 1:400 dilution of serum overnight at 4℃. After washing in PBST, the membrane was incubated with a mixture of 10,000-fold diluted Peroxidase AffiniPure goat anti-human IgG (H+L) antibody and anti-human IgM antibody (Jackson ImmunoResearch, West Grove, PA) for 1 hr at room temperature. Finally, it was washed, treated with chemiluminescent substrate, and imaged on an iBright 1500 (Thermo Fisher Scientific, Carlsbad, CA).
Clinical information. Basic demographic data, such as age, sex, zip code, and date of onset of symptoms, were collected from the CUIMC-ED cases through the medical record number linked to the study-specific identifier log retained by the clinical pathology laboratory. For a small number of seropositive cases, the date of sample collection, the reason for visiting the ED, and additional medical history were extracted from the medical record. In parallel, the number of daily new SARS-CoV-2 cases was tracked from the websites of New York City and the counties with CareMount clinics (Fig. 1b), as well as from internal reports of CUIMC and CareMount.
Statistical analysis. We estimated the prevalence of SARS-CoV-2 infection at each clinical site by dividing the number of positive tests by the total number of samples tested for each time period, and reported 95% confidence intervals. We calculated the posterior distribution of prevalence in the ED population using the overall sensitivity and specificity of our S-IgG assay by the methodology described in Larremore et al. through their calculator (https://larremorelab.github.io/covid-calculator2) [25]. We extrapolated from the seroprevalence observed in the first week of sampling at CUIMC-ED to the total number of SARS-CoV-2-positive cases for that period based on the known number of ED visits.
Protection of human subjects and confidentiality. The study was approved by the Institutional Review Board (IRB) of CUIMC through an honest broker protocol (E. Hod, Principal Investigator) that allows the clinical laboratory to code specimens while retaining a linked key to medical record numbers from which the demographic and clinical information could be obtained. As all testing for the CareMount discarded serum specimens was done on samples that were irrevocably de-identified, the IRB determined that this component of the study did not require ethical approval.

Declarations
Data Availability
All requests for raw and analyzed data will be promptly reviewed by the corresponding author to verify whether the request is subject to any intellectual property or confidentiality obligations. Data will then be provided whenever possible.

Figure 3
Seroprevalence in discarded serum samples and testing to confirm seropositivity. Percentage of discarded serum samples testing positive by ELISA for IgG and/or IgM to SARS-CoV-2 spike trimer in a. CUIMC-ED and b. CareMount clinics. Bars denote 95% confidence intervals. Arrows denote institutional procedural changes. March 15: CareMount significantly decreases all elective procedures (peach); March 19: CUIMC SARS-CoV-2 Biorepository opens (white); March 23: CUIMC opens cough-and-fever clinics (red) and CareMount scales up telemedicine (magenta). c. Binding curves on quantitative ELISA of serum samples from the earliest positive ED samples (ED15, ED88, and ED166) and two healthy donors (ADARC02 and ADARC03) from a pre-pandemic era against SARS-CoV-2 NP, S trimer and RBD. d. Western blot analysis of serum samples from the three ED cases as well as negative (-) and positive (+) controls binding to SARS-CoV-2 NP, S trimer and RBD.

Figure 4
Imputed dates of SARS-CoV-2 infection for earliest samples detected in CUIMC discarded sera. The shaded areas on the right represent the 7-day moving average for daily new cases detected by PCR reported in New York City and Columbia University Irving Medical Center. The blue arrow denotes the first reported case of community-acquired SARS-CoV-2 in New York. The green circles denote the three cases identified in the CUIMC ED. Brown rectangles denote the estimated date of anti-SARS-CoV-2 antibody development for the earliest three cases; pink rectangles estimate the date of infection of Cases 1-3. The hatched rectangle shows the 95% confidence interval for the date range for infections for Cases 1-3, two to four serial intervals earlier.