Surface swab specimens – As part of a pilot study screening high-touch surfaces for respiratory viruses, swabs were used to sample 25 cm² areas of the outside handle of the main entrance door of a joint teaching and office building that houses the Colleges of Public Health and Health Professions, Nursing, and Pharmacy at a major Florida university. More than 300 people were estimated to pass through the entrance during a normal school week (Monday through Friday). Samples were obtained from 1 to 5 February and from 19 February to 4 March 2020; the sampling dates were chosen arbitrarily. Because the door handle was cleaned early each morning, swab samples were collected after most classroom sessions had ended, typically between 6 and 7 PM, to allow for fresh daily accumulation of hand-deposited microorganisms.
As previously reported by our group [5], flocked nylon swabs pre-moistened with phosphate-buffered saline were used for surface sampling, after which they were immersed in 1.0 mL of universal transport medium (UTM) (COPAN Diagnostics, Inc., Murrieta, CA, USA). Swab samples were immediately transported to a BSL2-plus laboratory in a nearby building, where material on the swab was extruded into the UTM and the collection tube was frozen at -80°C pending molecular and virological analyses. For molecular tests, RNA was purified using a QIAamp Viral RNA Mini Kit (Qiagen, Valencia, CA, USA). Influenza A and B virus genomic RNAs were detected by RT-PCR targeting the HA and NA genes [6]. The primers and probes for the CDC 2019-Novel Coronavirus (2019-nCoV) rtRT-PCR test and for an in-house (UF) test [7] (Table 1) were synthesized by and purchased from Integrated DNA Technologies (Coralville, Iowa, USA). For both UF primer sets, the limit of detection, determined with synthesized oligonucleotide target sequences, was approximately 5 genome equivalents per 25 µL PCR test, with a detection probability of at least 95%. Neither the UF N nor the UF RdRp primer set detects SARS-CoV or MERS-CoV genomic RNA, or human RNA sequences. Neither set detected human coronavirus OC43, NL63, or 229E genomic RNA at approximately 1 × 10⁵ genome equivalents per 25 µL PCR test, nor the corresponding synthesized HKU1 N and RdRp oligonucleotide sequences. The sensitivities of the CDC and UF tests and of the SARS-CoV-2 rtRT-PCR test developed by Zhu et al. [8] are similar (data not shown).
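The ≥95% detection probability at about 5 genome equivalents can be put in context with the standard Poisson sampling argument: if template copies are distributed randomly among reactions with mean λ, the probability that a reaction receives at least one copy is 1 − e^(−λ). The minimal Python sketch below assumes this Poisson model and perfect amplification of a single copy; it gives a theoretical floor only and is not the empirical procedure used to determine the UF limit of detection.

```python
import math


def p_at_least_one_copy(mean_copies: float) -> float:
    """Probability that a reaction contains >= 1 template copy (Poisson model)."""
    return 1.0 - math.exp(-mean_copies)


def min_mean_copies(target_prob: float) -> float:
    """Smallest mean copy number giving at least `target_prob` chance of >= 1 copy."""
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    return -math.log(1.0 - target_prob)


if __name__ == "__main__":
    # Chance that a 25 uL reaction seeded with a mean of 5 copies holds any template
    print(f"P(>=1 copy | 5 copies) = {p_at_least_one_copy(5.0):.3f}")
    # Mean copy number below which 95% detection is impossible under this model
    print(f"copies needed for 95%  = {min_mean_copies(0.95):.2f}")
```

Under this model roughly 3 copies per reaction is the lowest mean at which 95% detection is attainable at all, so an empirically determined limit near 5 genome equivalents sits plausibly above that floor.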
Madin-Darby canine kidney (MDCK) cells, developed in-house to over-express α2,6-sialylglycan receptors [9], were used to isolate influenza viruses. As previously described [7], the African green monkey kidney cell line Vero E6, obtained from the American Type Culture Collection (catalog no. ATCC CRL-1586), was used in attempts to isolate SARS-CoV-2.
For influenza virus, after about 50% of the killed cells had detached from the growth surface, virus genomic RNAs (vRNAs) were purified from virions in the cell growth medium. The vRNAs served as templates to construct a cDNA library using an NEBNext Ultra RNA Library Prep Kit (New England BioLabs® Inc.), which was then sequenced on an Illumina MiSeq sequencer using a version 3, 600-cycle kit.
The complete genome sequence of SARS-CoV-2 in the environmental sample (designated UF-11) was determined using a genome walking strategy [10]. Briefly, cDNA was synthesized using AccuScript high-fidelity reverse transcriptase (Agilent Technologies, Santa Clara, CA) and sequence-specific primers based on SARS-CoV-2 WIV04 (GenBank MN996528.1). The resulting cDNA was amplified by PCR with Q5 polymerase (New England BioLabs) and non-overlapping gene-specific primers. The 5′ and 3′ ends of the SARS-CoV-2 genome were determined using a Rapid Amplification of cDNA Ends (RACE) kit (Life Technologies, Inc., Carlsbad, CA, USA), and the resulting sequences were assembled with Sequencher DNA sequence analysis software version 2.1 (Gene Codes, Ann Arbor, MI, USA).
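Assembling genome-walking products amounts to joining ordered fragments (5′ RACE product, internal amplicons, 3′ RACE product) at their shared ends. The sketch below is a hypothetical, greatly simplified illustration of that idea. It assumes the fragments are already in 5′ to 3′ genome order, are error-free, and that neighbouring fragments share an exact terminal overlap of at least `min_overlap` bases; it is not the algorithm Sequencher uses, which also handles base-calling errors and quality scores.

```python
from typing import List


def longest_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if none)."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_ordered_fragments(fragments: List[str], min_overlap: int = 20) -> str:
    """Join fragments listed 5'->3' by exact terminal overlaps (hypothetical helper)."""
    if not fragments:
        raise ValueError("no fragments to assemble")
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contig = fragments[0].upper()
    for frag in fragments[1:]:
        frag = frag.upper()
        k = longest_overlap(contig, frag, min_overlap)
        if k == 0:
            # Without an overlap the relative placement is unknown
            raise ValueError("adjacent fragments do not overlap; gap in the walk")
        contig += frag[k:]  # append only the part not already in the contig
    return contig


if __name__ == "__main__":
    pieces = ["ACGTACGGT", "CGGTTTAC", "TTACGGA"]
    print(merge_ordered_fragments(pieces, min_overlap=4))
```

Raising an error on a missing overlap, rather than concatenating blindly, reflects the fact that a genome walk with a gap cannot be closed without an additional bridging amplicon.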
Phylogenetic analyses
SARS-CoV-2 full or nearly full genome sequences (>29,000 bp) with a collection date on or before 6 March 2020 were downloaded from GISAID on 18 August 2020. Genomes were then filtered according to the following exclusion criteria: 1) more than 150 uncertain nucleotides due to missing data and/or poor sequence quality; 2) missing sampling date; and 3) missing sampling location. After filtering, 2,439 genomes, including 17 new UF isolates (UF-1 to UF-17), were retained and aligned using MAFFT [11]. Sequences identical or highly similar to UF-11 were identified by BLAST. We found 75 sequences identical to UF-11 over a length of 29,596 bp (more than 99% of the UF-11 length), with no insertions/deletions and no nucleotide mismatches. We also found 360 highly similar genomes, defined as genomes with fewer than 6 total nucleotide mismatches relative to UF-11 (in coding regions, each long gap whose length was a multiple of three was treated as a single mutational event). The threshold for highly similar genomes (0 < nucleotide differences < 6) was chosen by calculating the 95% confidence interval of the total number of mutations expected to accumulate between January and March 2020 among UF-11 and other genomes potentially belonging to the same transmission chain. The mutational process was assumed to follow a Poisson distribution with a mean evolutionary rate of 2.4 × 10⁻⁴ nucleotide substitutions per site per year, independently estimated from a data set of 11,262 full genome sequences available in GISAID on 25 April 2020 [12].
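The download and exclusion criteria above can be expressed as a simple record filter. This Python sketch assumes each GISAID record has already been parsed into a dictionary with hypothetical keys `sequence`, `date`, and `location`. Counting every character other than A, C, G, or T as an uncertain nucleotide, and treating incomplete dates (for example, year and month only) as missing, are assumptions made for illustration rather than documented choices.

```python
import re
from datetime import date
from typing import Any, Dict, Iterable, List

CUTOFF = date(2020, 3, 6)   # collection date on or before 6 March 2020
MIN_LENGTH = 29_000         # full or nearly full genomes (> 29,000 bp)
MAX_UNCERTAIN = 150         # exclusion criterion 1
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def passes_filters(record: Dict[str, Any]) -> bool:
    """True if a parsed record (hypothetical keys) survives all inclusion/exclusion rules."""
    seq = str(record.get("sequence", "")).upper()
    if len(seq) <= MIN_LENGTH:
        return False
    # Criterion 1: any non-ACGT symbol (N or IUPAC ambiguity code) counts as uncertain
    if sum(1 for base in seq if base not in "ACGT") > MAX_UNCERTAIN:
        return False
    # Criterion 2: a complete YYYY-MM-DD sampling date is required (assumption)
    raw_date = str(record.get("date") or "")
    if not FULL_DATE.fullmatch(raw_date):
        return False
    try:
        collected = date.fromisoformat(raw_date)
    except ValueError:  # well-formed but impossible dates, e.g. "2020-02-30"
        return False
    if collected > CUTOFF:
        return False
    # Criterion 3: a sampling location is required
    return bool(record.get("location"))


def filter_records(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only records that pass every criterion."""
    return [r for r in records if passes_filters(r)]
```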
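Counting differences between an aligned genome and UF-11 under the rule stated above, where each long gap whose length is a multiple of three counts as one mutational event, can be sketched as follows. The function is a hypothetical helper, not the BLAST-based procedure actually used. It further assumes that ambiguous bases are ignored, that leading and trailing gaps reflect incomplete coverage rather than deletions, and that gaps whose length is not a multiple of three count one event per position; for simplicity it applies the gap rule across the whole alignment rather than only in coding regions.

```python
VALID = set("ACGT")


def mutational_events(a: str, b: str) -> int:
    """Count mutational events between two aligned sequences (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = a.upper(), b.upper()
    # Ignore leading/trailing gaps: compare only between the first and last
    # columns where both sequences have a residue (assumed incomplete coverage).
    shared = [i for i in range(len(a)) if a[i] != "-" and b[i] != "-"]
    if not shared:
        return 0
    i, end = shared[0], shared[-1]
    events = 0
    while i <= end:
        gap_a, gap_b = a[i] == "-", b[i] == "-"
        if gap_a != gap_b:
            # Measure the contiguous indel run (gap in exactly one sequence)
            j = i
            while j <= end and (a[j] == "-") != (b[j] == "-"):
                j += 1
            run = j - i
            events += 1 if run % 3 == 0 else run
            i = j
            continue
        if not gap_a and a[i] in VALID and b[i] in VALID and a[i] != b[i]:
            events += 1  # substitution between two unambiguous bases
        i += 1
    return events


def classify(events: int) -> str:
    """Categories used in the text: identical (0) or highly similar (1 to 5)."""
    if events == 0:
        return "identical"
    if events < 6:
        return "highly similar"
    return "other"
```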
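The threshold for highly similar genomes rests on a Poisson model of mutation accumulation, with an expected count λ = rate × genome length × elapsed time. A minimal sketch of that calculation follows. The genome length (29,903 nt, the length of the Wuhan-Hu-1 reference) and the elapsed time are illustrative assumptions, since the section does not state the exact values used, and whether divergence is counted along one or both lineages is left as a parameter; the output is therefore not expected to reproduce the published cut-off exactly.

```python
import math
from typing import Tuple


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def poisson_quantile(q: float, lam: float) -> int:
    """Smallest k such that P(X <= k) >= q for X ~ Poisson(lam)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    k = 0
    while poisson_cdf(k, lam) < q:
        k += 1
    return k


def mutation_interval(rate: float, genome_length: int, years: float,
                      lineages: int = 1, level: float = 0.95) -> Tuple[int, int]:
    """Central `level` interval for the number of mutations separating two genomes."""
    lam = rate * genome_length * years * lineages  # expected number of mutations
    tail = (1.0 - level) / 2.0
    return poisson_quantile(tail, lam), poisson_quantile(1.0 - tail, lam)


if __name__ == "__main__":
    # Hypothetical inputs: 2.4e-4 subs/site/year, 29,903 nt, about two months,
    # divergence accumulating independently along two lineages
    low, high = mutation_interval(2.4e-4, 29_903, 2 / 12, lineages=2)
    print(f"95% interval for mutations: {low}-{high}")
```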
The 2,439 aligned genomes were ranked by similarity to UF-11 by calculating pairwise Jukes-Cantor (JC) distances. Genomes identical to UF-11 were removed from the set, and the remaining genomes were randomly subsampled under the following constraints: 1) the final data set should include between 250 and 300 sequences; 2) all UF isolates should be included; and 3) the median genetic diversity of the subsample should equal that of the full data set. The subsampled data set, representative of the overall diversity of the full data set, included 289 sequences and was used to infer a maximum likelihood tree with IQ-TREE [13], using the best-fitting nucleotide substitution model and 1,000 bootstrap replicates. The presence of sufficient tree-like signal in the subsampled data set was assessed by likelihood mapping [14], also implemented in IQ-TREE. Tree branches were scaled in nucleotide substitutions per site because an accurate molecular clock could not be calibrated, given the lack of temporal signal in the phylogeny inferred from the subsampled sequences (correlation coefficient between root-to-tip distance and sampling time < 0.1).
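For reference, the Jukes-Cantor distance converts the proportion p of differing sites into an estimate of substitutions per site, d = −(3/4) ln(1 − 4p/3), which corrects for multiple substitutions at the same site. The sketch below computes it over alignment columns where both sequences carry an unambiguous base (an assumption about how gaps and ambiguity codes were handled) and ranks genomes by distance to a reference such as UF-11; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

VALID = set("ACGT")


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:  # skip gaps and ambiguity codes
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = diffs / compared
    if p >= 0.75:
        return math.inf  # saturation: the JC correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def rank_by_similarity(reference: str, genomes: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs sorted from most to least similar."""
    dists = [(name, jc_distance(reference, seq)) for name, seq in genomes.items()]
    return sorted(dists, key=lambda item: item[1])
```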
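One way to implement the three subsampling constraints is rejection sampling: draw a random subsample of allowed size that always contains the UF isolates, and accept it only when its median diversity matches that of the full set within a tolerance. The sketch below is a hypothetical illustration, not the authors' code. It interprets "median genetic diversity" as the median JC distance to UF-11, and the tolerance `tol`, the retry limit, and the fixed random seed are all assumptions.

```python
import random
import statistics
from typing import Dict, List, Sequence


def constrained_subsample(distances: Dict[str, float], mandatory: Sequence[str],
                          min_size: int = 250, max_size: int = 300,
                          tol: float = 1e-4, max_tries: int = 10_000,
                          seed: int = 1) -> List[str]:
    """Random subsample keeping `mandatory` ids and matching the full-set median distance."""
    keep = list(dict.fromkeys(m for m in mandatory if m in distances))  # de-duplicated
    keep_set = set(keep)
    pool = [k for k in distances if k not in keep_set]
    if min_size < len(keep) or max_size > len(distances):
        raise ValueError("size bounds incompatible with the data")
    target = statistics.median(distances.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    for _ in range(max_tries):
        size = rng.randint(min_size, max_size)  # constraint 1
        picked = keep + rng.sample(pool, size - len(keep))  # constraint 2
        # Constraint 3: accept only if the median matches within the tolerance
        if abs(statistics.median(distances[k] for k in picked) - target) <= tol:
            return picked
    raise RuntimeError("no subsample satisfied the median constraint")
```

Rejection sampling keeps each accepted draw uniformly random among subsamples meeting the constraints, at the cost of possibly many retries when the tolerance is tight.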
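The temporal-signal check reduces to a Pearson correlation between each tip's sampling time and its root-to-tip distance in the rooted tree. The sketch below assumes those distances have already been extracted from the maximum likelihood tree and that dates are expressed as decimal years; the helper names are hypothetical, and only the 0.1 cut-off comes from the text.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def has_temporal_signal(dates: Sequence[float], root_to_tip: Sequence[float],
                        threshold: float = 0.1) -> bool:
    """False when the date vs root-to-tip correlation falls below `threshold`."""
    return pearson_r(dates, root_to_tip) >= threshold
```

When this check fails, as reported for the subsampled phylogeny, branch lengths are left in substitutions per site instead of being converted to calendar time.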