FDA-approved COVID-19 vaccines are effective per real-world evidence synthesized across a multi-state health system

Large Phase 3 clinical trials of the two FDA-approved COVID-19 vaccines, mRNA-1273 (Moderna) and BNT162b2 (Pfizer/BioNTech), have demonstrated efficacies of 94.1% (n = 30,420; 95% CI: 89.3-96.8%) and 95.0% (n = 43,448; 95% CI: 90.3-97.6%), respectively, in preventing symptomatic COVID-19. Given the ongoing vaccine rollout to healthcare personnel and residents of long-term care facilities, here we provide a preliminary assessment of real-world vaccine efficacy in 62,598 individuals from the Mayo Clinic and its associated health system (Arizona, Florida, Minnesota, Wisconsin) between December 1, 2020 and February 8, 2021. Our retrospective analysis contrasts 31,299 individuals receiving at least one dose of either vaccine with 31,299 unvaccinated individuals who were propensity-matched on demographics, location (zip code), and number of prior SARS-CoV-2 PCR tests. Administration of two COVID-19 vaccine doses was 89.0% effective (95% CI: 69.1-97.2%) in preventing SARS-CoV-2 infection with onset at least 36 days after the first dose. Furthermore, vaccinated patients who were subsequently diagnosed with COVID-19 had significantly lower 14-day hospital admission rates than propensity-matched unvaccinated COVID-19 patients (3.7% vs. 9.2%; Relative Risk: 0.4; p-value: 0.007). Building upon the previous randomized trials of these vaccines, this study demonstrates their real-world effectiveness in reducing the rates of SARS-CoV-2 infection and COVID-19 severity among individuals at highest risk for infection.


Introduction
To date there have been over 107 million confirmed cases of COVID-19 and over 2.3 million associated deaths globally (1). From the moment that SARS-CoV-2 was identified as the causative agent of COVID-19, efforts were initiated to characterize this virus and to develop vaccines against it (2,3,4). Within months, several candidates were shown to be safe and to induce robust immune responses against SARS-CoV-2 in a series of early phase trials (5,6,7,8). More recently, multiple vaccine candidates have shown over 94% efficacy in preventing symptomatic COVID-19 infection in large phase 3 clinical trials (9,10). Of note, unlike the seasonal flu vaccine, both of these candidates are delivered as a series of two inoculations separated by three or four weeks, with maximal response believed to be achieved by one to two weeks after the second dose (5,6,9,10).
In a phase 3 trial studying BNT162b2 (9), the COVID-19 vaccine candidate developed by Pfizer/BioNTech, 50 out of 21,314 (0.23%) vaccinated patients experienced a symptomatic COVID-19 infection, with an incidence rate of 12.5 cases per 1000 person-years. In contrast, 275 of 21,258 (1.29%) patients receiving a placebo injection developed COVID-19, with an incidence rate of 69.1 cases per 1000 person-years. Thirty patients experienced severe disease, all of whom had received placebo. Seven or more days after the second dose, the difference between groups was even more pronounced, with incidence rates of 3.61 and 72.9 cases per 1000 person-years in the vaccinated and placebo groups, respectively (efficacy = 95.0%; 95% CI: 90.3-97.6%).
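As a worked check of the arithmetic behind the efficacy figure above, efficacy can be recovered from the two incidence rates as 100% × (1 − rate ratio). The helper name below is illustrative only, and the trial's published estimate may have been computed from unrounded rates.

```python
def efficacy_from_rates(rate_vaccinated: float, rate_placebo: float) -> float:
    """Vaccine efficacy (%) as 100 * (1 - rate ratio); rates must share units."""
    return 100.0 * (1.0 - rate_vaccinated / rate_placebo)


# BNT162b2 trial, >= 7 days after dose 2 (cases per 1000 person-years)
print(round(efficacy_from_rates(3.61, 72.9), 1))  # prints 95.0
```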
Similarly, in the trial studying mRNA-1273 (10), the vaccine candidate developed by Moderna, 19 of 14,550 (0.13%) vaccinated patients experienced a symptomatic infection compared to 269 of 14,598 (1.84%) patients receiving placebo. Among these symptomatic infections, there were 9 cases of severe COVID-19 in the placebo group compared to only one in the vaccinated cohort. This effect was stronger when considering infection rates 14 or more days after the second dose, with incidence rates of 3.3 and 56.5 cases per 1000 person-years in the vaccinated and placebo groups, respectively (efficacy = 94.1%; 95% CI: 89.3-96.8%).
Both BNT162b2 and mRNA-1273 are now being administered throughout the United States, with first priority given to individuals at high risk for becoming infected with SARS-CoV-2 or experiencing severe COVID-19, including healthcare workers and residents of long-term care facilities (11). While these groups were not excluded from the phase 3 trials, vaccine efficacy has not been specifically demonstrated among them. It is thus critical to analyze the outcomes of patients vaccinated to date to determine whether these vaccines are indeed effective in especially high-risk individuals.
Here we conducted a large-scale, real-world interim analysis of COVID-19 vaccination outcomes in the United States. Specifically, we assessed the rates of SARS-CoV-2 positivity and severity of COVID-19 among 31,299 individuals who received at least one dose of BNT162b2 or mRNA-1273 in the Mayo Clinic health system, including sites in Minnesota, Arizona, Florida, and Wisconsin. One challenge inherent to such real-world analyses is the lack of a built-in placebo arm, which is essential for establishing the expected infection rate during the study period and thereby for assessing vaccine efficacy. To address this shortcoming, we used 1-to-1 propensity score matching (PSM) to generate a cohort of 31,299 individuals who were not previously infected with SARS-CoV-2 and did not receive a COVID-19 vaccine by the end of the study period (Figure 1, Table 1). Between these two arms, we compared rates of SARS-CoV-2 infection and severity of COVID-19 illness during defined intervals after study enrollment.

Study design, setting and population
This is a retrospective study of individuals who underwent polymerase chain reaction (PCR) testing for suspected SARS-CoV-2 infection at the Mayo Clinic and hospitals affiliated with the Mayo health system. This research was conducted under IRB 20-003278, "Study of COVID-19 patient characteristics with augmented curation of Electronic Health Records (EHR) to inform strategic and operational decisions".
In total, there were 507,525 patients in the Mayo electronic health record (EHR) database who received a PCR test between February 15, 2020 and February 8, 2021. To obtain the study population, we defined the following inclusion criteria: (1) age of at least 18 years; (2) no positive SARS-CoV-2 PCR test before December 1, 2020; (3) residence in a locale (based on zip code) with at least 25 patients who have received BNT162b2 or mRNA-1273. This population included 249,708 patients, of whom 31,623 have received BNT162b2 or mRNA-1273 and 218,085 have no record of COVID-19 vaccination. Vaccinated patients who had tested positive for SARS-CoV-2 by PCR between December 1, 2020 and the date of their first vaccine dose were excluded, leaving 31,299 patients in the final vaccinated cohort. Propensity-matched unvaccinated cohorts for analyses of vaccine efficacy and disease severity were selected from the previously derived set of 218,085 unvaccinated patients. This patient selection algorithm and its associated counts are summarized in Figure 1. More details on the propensity score matching procedures for the vaccine efficacy and disease severity analyses are provided below.
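The selection steps above can be sketched as a filter over patient records. This is a minimal illustration, not the study's pipeline: the `Patient` fields and the `select_cohorts` helper are hypothetical, and two details the text leaves open are assumptions (the 25-patient zip threshold is counted over all records, and a positive test on the day of the first dose counts as prior infection).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

CUTOFF = date(2020, 12, 1)    # no positive PCR allowed before this date
MIN_VACCINATED_PER_ZIP = 25   # criterion (3)


@dataclass
class Patient:
    # Hypothetical EHR fields, for illustration only
    patient_id: str
    age: int
    zip_code: str
    first_positive_pcr: Optional[date]  # None if never positive
    first_dose: Optional[date]          # None if unvaccinated


def select_cohorts(patients):
    """Apply the inclusion criteria; return (vaccinated, unvaccinated_pool)."""
    # Assumption: criterion (3) counts vaccinated patients over all records.
    vaccinated_per_zip = Counter(
        p.zip_code for p in patients if p.first_dose is not None
    )
    eligible = [
        p for p in patients
        if p.age >= 18                                                    # (1)
        and (p.first_positive_pcr is None or p.first_positive_pcr >= CUTOFF)  # (2)
        and vaccinated_per_zip[p.zip_code] >= MIN_VACCINATED_PER_ZIP     # (3)
    ]
    # Exclude vaccinated patients positive between Dec 1, 2020 and their first
    # dose (assumption: a positive test on the dose date also excludes them).
    vaccinated = [
        p for p in eligible
        if p.first_dose is not None
        and (p.first_positive_pcr is None or p.first_positive_pcr > p.first_dose)
    ]
    unvaccinated_pool = [p for p in eligible if p.first_dose is None]
    return vaccinated, unvaccinated_pool
```

Unvaccinated patients who test positive after December 1, 2020 stay in the pool here; whether they are usable as matches depends on their matched partner's enrollment date, which is handled at the matching step.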

Propensity score matching to select the unvaccinated cohort for efficacy analysis
We employed 1:1 propensity score matching (12) to construct an unvaccinated cohort similar to the vaccinated cohort on key risk factors for SARS-CoV-2 infection, including geography, demographics, and records of PCR testing. First, we matched exactly on geography (zip code of the patient's residence). Next, we used propensity score matching to match approximately on demographic features (age, sex, race, ethnicity) and records of PCR testing (number of negative PCR tests taken between February 8, 2020 and November 30, 2020); the number of negative PCR tests was intended as a proxy for ongoing baseline exposure to SARS-CoV-2. To obtain the propensity scores for the matching procedure, we trained regularized logistic regression models for each zip code using the software package sklearn v0.20.3 in Python.
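The per-zip-code propensity models described above can be sketched without dependencies. The study used scikit-learn; the code below is a swapped-in, pure-Python stand-in (L2-penalized logistic regression fitted by batch gradient descent), and the function names and hyperparameters (`l2`, `lr`, `epochs`) are hypothetical. Features are assumed to be numeric and roughly standardized.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, l2=1.0, lr=0.1, epochs=2000):
    """L2-regularized logistic regression fitted by batch gradient descent.

    Minimizes mean log-loss + (l2 / (2n)) * ||w||^2; the intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Data gradient plus L2 penalty gradient, both averaged over n
        w = [wj - lr * (gj + l2 * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity_scores_by_zip(records):
    """records: list of (zip_code, features, vaccinated 0/1); one model per zip code."""
    groups = defaultdict(list)
    for idx, (zip_code, x, t) in enumerate(records):
        groups[zip_code].append((idx, x, t))
    scores = [0.0] * len(records)
    for rows in groups.values():
        w, b = fit_logistic([x for _, x, _ in rows], [t for _, _, t in rows])
        for idx, x, _ in rows:
            scores[idx] = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return scores
```

With scikit-learn, the comparable call is `LogisticRegression(C=...).fit(X, y).predict_proba(X)[:, 1]`, where `C` corresponds roughly to `1 / l2` in this parameterization.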
Using these propensity scores, we then matched each of the 31,299 patients who received at least one dose of BNT162b2 or mRNA-1273 with 1 patient out of the 218,085 unvaccinated patients, using greedy nearest-neighbor matching without replacement (13). In particular, for each vaccinated patient, we selected the unvaccinated patient living in the same zip code whose propensity score was closest to that of the vaccinated patient.
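The matching rule above (exact on zip code, then nearest propensity score, without replacement) can be sketched as follows. The function name and tuple layout are hypothetical, and the order in which vaccinated patients are processed is an assumption, since greedy results depend on it and the text does not specify one.

```python
from collections import defaultdict


def greedy_match_within_zip(treated, controls):
    """1:1 greedy nearest-neighbor matching without replacement, exact on zip code.

    treated, controls: lists of (patient_id, zip_code, propensity_score).
    Returns {treated_id: control_id}; treated patients with no control left
    in their zip code are omitted.
    """
    pool = defaultdict(list)
    for cid, zip_code, score in controls:
        pool[zip_code].append((cid, score))
    matches = {}
    for tid, zip_code, score in treated:  # input order (assumption)
        candidates = pool.get(zip_code)
        if not candidates:
            continue
        best = min(range(len(candidates)), key=lambda i: abs(candidates[i][1] - score))
        cid, _ = candidates.pop(best)  # without replacement: a control is used once
        matches[tid] = cid
    return matches
```

This linear scan is quadratic within a zip code; sorting each zip's controls and using `bisect` would scale better. The additional requirement that a matched control had no positive test before the vaccinated partner's first dose could be applied as a filter on `candidates`.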
For each vaccinated patient, the date of study enrollment (Day 0) was defined as the date of their first vaccine dose. For each unvaccinated patient, the date of study enrollment was defined as the date of the first vaccine dose for their matched vaccinated patient. The resulting cohorts are summarized in Table 1 along with the standardized mean differences (SMD) of their clinical covariates (14,15). Overall, there is no substantial difference between the two cohorts in any of the clinical covariates that were included in propensity score matching (with SMD < 0.1 for all covariates). The age distributions of patients in the vaccinated, unmatched unvaccinated, and propensity matched unvaccinated cohorts are shown in Figure S1A-B. Additional data regarding the mean follow-up time, the number of vaccine doses received, and the number of patients taking at least one SARS-CoV-2 PCR test after the study enrollment date are provided in Table S1.
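The balance criterion above (SMD < 0.1) uses the standardized mean difference. A common definition divides the difference in means by the square root of the average of the two group variances; for a binary covariate the variances are p(1 − p). The sketch below assumes these textbook forms, which may differ in detail from the exact computation of references (14,15).

```python
import math
from statistics import mean, variance


def smd_continuous(x_treated, x_control):
    """SMD for a continuous covariate, using the averaged sample variances."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def smd_binary(p_treated, p_control):
    """SMD for a binary covariate given its prevalence in each cohort."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2.0
    )
    return (p_treated - p_control) / pooled_sd
```

A covariate would then be considered balanced when `abs(smd) < 0.1`. Both helpers divide by zero if a covariate is constant in both cohorts.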

Evaluation of vaccine efficacy
To evaluate COVID-19 vaccine efficacy in a real-world clinical setting, we compared the two populations described above and summarized in Figure 1: (1) 31,299 individuals who received BNT162b2 or mRNA-1273 and did not have a prior positive SARS-CoV-2 PCR test ("vaccinated"), and (2) 31,299 propensity-matched individuals who had never received a COVID-19 vaccine and did not have a positive SARS-CoV-2 PCR test before the first vaccination date (dose 1) of their matched patient ("unvaccinated").
Cumulative proportional incidence of SARS-CoV-2 infection was compared between vaccinated and unvaccinated patients by Kaplan-Meier analysis. Cumulative proportional incidence at time t is the estimated proportion of patients who experience the outcome on or before time t, i.e., 1 minus the standard Kaplan-Meier survival estimate. We considered cumulative incidence starting at Day 1, Day 14, and Day 28 relative to the date of study enrollment (Day 0). Statistical significance was assessed with the log-rank test (16).
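The cumulative incidence curve defined above (1 minus the Kaplan-Meier survival estimate) can be sketched in a few lines. The function names are hypothetical, the log-rank test is not implemented, and the landmark rule (keep patients whose follow-up reaches the start day) is an assumed reading of "starting at Day 14 / Day 28".

```python
def km_cumulative_incidence(times, events):
    """Return [(t, 1 - S(t))] at each distinct event time (Kaplan-Meier).

    times: follow-up in days; events: 1 if a positive test occurred at that
    time, 0 if censored. Patients censored at t are still at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, 1.0 - surv))
        n_at_risk -= n_events + n_censored
    return curve


def landmark(times, events, start_day):
    """Keep patients still under follow-up and event-free at start_day (assumption)."""
    kept = [(t, e) for t, e in zip(times, events) if t >= start_day]
    return [t for t, _ in kept], [e for _, e in kept]
```

For example, `km_cumulative_incidence(*landmark(times, events, 14))` would give the curve for infections with onset from Day 14 onward.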
Efficacy was also assessed during defined intervals by computing the incidence rate ratio (IRR) of the vaccinated and unvaccinated cohorts, with efficacy defined as 100% × (1 − IRR). The time periods considered were as follows: (1) Day 1 onwards, (2) Day 15 onwards, (3) Day 29 onwards, (4) Day 36 onwards, and (5) six one-week intervals starting one day after the first dose of vaccination ("Day 1"). Only six one-week intervals were considered because too few patients contributed at-risk person-days beyond six weeks for this analysis. For each cohort in a given time period, the incidence rate was calculated as the number of patients testing positive for SARS-CoV-2 in that period divided by the total number of at-risk person-days contributed in that period. For each patient, at-risk person-days are the days in the time period during which the patient had not yet tested positive for SARS-CoV-2 or died. The IRR was calculated as the incidence rate of the vaccinated cohort divided by the incidence rate of the unvaccinated cohort, and its 95% confidence interval was computed using an exact approach described previously (17).
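The interval calculation above can be sketched end to end: count cases and at-risk person-days in a window, form the IRR, and attach an exact interval. The exact method of reference (17) is not reproduced here; as an assumption, the sketch uses the common conditional-binomial approach (given all cases, the vaccinated count is binomial, and a Clopper-Pearson interval is mapped back to the IRR scale). All function names and the day-counting convention are hypothetical.

```python
import math


def cases_and_person_days(patients, start, stop=None):
    """Positives and at-risk person-days in the window [start, stop], inclusive.

    patients: list of (last_day, positive_day) relative to Day 0; last_day is the
    last day of follow-up (death or data cutoff), positive_day is None if never
    positive. The positive-test day counts as at risk (assumption).
    """
    cases = days = 0
    for last_day, positive_day in patients:
        exit_day = last_day if positive_day is None else min(last_day, positive_day)
        end = exit_day if stop is None else min(exit_day, stop)
        days += max(0, end - start + 1)
        if positive_day is not None and start <= positive_day and (
            stop is None or positive_day <= stop
        ):
            cases += 1
    return cases, days


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )
    return min(total, 1.0)


def _bisect(f, target, increasing, iters=100):
    """Solve f(p) = target on (0, 1) for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def irr_exact_ci(x1, t1, x0, t0, alpha=0.05):
    """IRR (cohort 1 vs 0) with a conditional-binomial exact CI; requires x0 > 0."""
    n = x1 + x0
    irr = (x1 / t1) / (x0 / t0)
    to_irr = lambda p: p / (1.0 - p) * t0 / t1  # invert pi = IRR*t1 / (IRR*t1 + t0)
    if x1 == 0:
        irr_lo = 0.0
    else:  # P(X >= x1 | pi) = alpha/2, increasing in pi
        irr_lo = to_irr(_bisect(lambda p: 1.0 - binom_cdf(x1 - 1, n, p), alpha / 2, True))
    if x1 == n:
        irr_hi = math.inf
    else:  # P(X <= x1 | pi) = alpha/2, decreasing in pi
        irr_hi = to_irr(_bisect(lambda p: binom_cdf(x1, n, p), alpha / 2, False))
    return irr, irr_lo, irr_hi


def efficacy_with_ci(x1, t1, x0, t0):
    """Efficacy = 100 * (1 - IRR); the CI bounds swap order under the transform."""
    irr, lo, hi = irr_exact_ci(x1, t1, x0, t0)
    return 100 * (1 - irr), 100 * (1 - hi), 100 * (1 - lo)
```

In use, each cohort's `(cases, days)` from `cases_and_person_days(cohort, start=36)` would be passed to `efficacy_with_ci` as `(x1, t1)` for the vaccinated and `(x0, t0)` for the unvaccinated cohort.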

Propensity score matching to select the unvaccinated COVID-19 patients for disease severity analysis
Similarly, we applied 1:10 propensity score matching (12) to construct a SARS-CoV-2 positive unvaccinated cohort similar in baseline clinical covariates to the cohort of patients who were vaccinated and subsequently tested positive for SARS-CoV-2. In particular, we used propensity score matching to match approximately on demographic features (age, sex, race, ethnicity) and comorbidities (asthma, cancer, cardiomyopathy, chronic kidney disease, chronic obstructive pulmonary disease, coronary artery disease, heart failure, hypertension, obesity, pregnancy, severe obesity, sickle cell disease, solid organ transplant, stroke / cerebrovascular disease, type 2 diabetes mellitus). This list of comorbidities was derived from the list of risk factors for severe COVID-19 illness provided by the Centers for Disease Control and Prevention (18). We used deep neural networks to automatically identify comorbidities from the clinical notes, as described in the next section. To obtain the propensity scores, we trained a regularized logistic regression model with these features using the software package sklearn v0.20.3 in Python.
Based on these propensity scores, we matched each of the 263 individuals who tested positive for SARS-CoV-2 after vaccination with 10 of the 14,512 unvaccinated individuals who tested positive for SARS-CoV-2. As in the previous propensity score matching procedure, we used greedy nearest-neighbor matching without replacement (13). The resulting cohorts are summarized in Table 3, along with the SMDs for the clinical covariates that were balanced upon (14,15). Overall, there is no substantial difference between the two cohorts in any of the clinical covariates that were included in propensity score matching (with SMD < 0.1 for all covariates). The age distributions of COVID-19 patients in the vaccinated cohort, unmatched unvaccinated cohort, and propensity-matched unvaccinated cohort are shown in Figure S1C-D.
For each SARS-CoV-2 positive patient in both the vaccinated and unvaccinated cohorts, the index date for the analysis (day 0) was taken to be the date of the first positive PCR test. Clinical outcomes at 14 days were compared, including hospital admission, ICU admission, and mortality.

Evaluation of disease severity
We compare the clinical outcomes of vaccinated and propensity-matched unvaccinated SARS-CoV-2 positive patients in order to evaluate the impact of COVID-19 vaccination upon disease severity. Among the patients in each cohort with at least 14 days of follow-up after their first positive PCR test, we evaluate: (1) 14-day hospital admission rate: the proportion of patients admitted to the hospital in the two weeks following their positive PCR test, (2) 14-day ICU admission rate: the proportion of patients admitted to the ICU in the two weeks following their positive PCR test, and (3) 14-day mortality rate: the proportion of patients who died in the two weeks following their positive PCR test. For each outcome, we report the relative risk (rate in the vaccinated cohort divided by the rate in the matched unvaccinated cohort), the 95% confidence interval for the relative risk (19), and the Fisher's exact test p-value. Hospital-free and ICU-free survival were also compared via Kaplan-Meier analysis, with statistical significance assessed with the log-rank test (16).
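The per-outcome statistics above can be sketched for a 2×2 table of a events among n1 vaccinated and c events among n0 matched unvaccinated patients. The confidence-interval method of reference (19) is not specified here; as an assumption, the sketch uses the Katz log-scale interval. The Fisher's exact test sums the probabilities of all tables with the same margins that are no more likely than the observed one. Function names are hypothetical.

```python
import math


def relative_risk_ci(a, n1, c, n0, z=1.959963984540054):
    """Relative risk (a/n1)/(c/n0) with a Katz log-scale 95% CI (assumption).

    Requires c > 0; returns (rr, None, None) when a == 0, where the
    log-scale interval is undefined.
    """
    rr = (a / n1) / (c / n0)
    if a == 0:
        return rr, None, None
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def fisher_exact_two_sided(a, n1, c, n0):
    """Two-sided Fisher's exact p-value for [[a, n1 - a], [c, n0 - c]].

    Uses exact integer hypergeometric weights, so ties are compared exactly.
    """
    total, events = n1 + n0, a + c
    lo, hi = max(0, events - n0), min(events, n1)
    weight = {
        k: math.comb(events, k) * math.comb(total - events, n1 - k)
        for k in range(lo, hi + 1)
    }
    observed = weight[a]
    return sum(w for w in weight.values() if w <= observed) / math.comb(total, n1)
```

When no vaccinated patient has the outcome (as for 14-day mortality), the relative risk is 0 and this interval is undefined, while the Fisher's exact p-value is still computable.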

Deep neural networks to identify comorbidities from clinical notes
In order to identify comorbidities from the electronic health record of each patient, we used a BERT-based neural network model (20) to classify the sentiment of phenotype mentions in the clinical notes. In particular, we applied a phenotype sentiment classification model, trained on 18,500 sentences, that achieves an out-of-sample accuracy of 93.6% with precision and recall scores above 95% (21). This classification model predicts four classes: (1) "Yes": confirmed diagnosis; (2) "No": ruled-out diagnosis; (3) "Maybe": possibility of disease; and (4) "Other": alternate context (e.g., family history of disease). For each patient, we applied the sentiment model to the clinical notes in the Mayo Clinic electronic health record from December 1, 2015 to November 30, 2020. For each comorbidity phenotype, if a patient had at least one mention of the phenotype during this time period with a confidence score of 90% or greater, then the patient was labelled as having the phenotype.
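The labelling rule above can be sketched as a filter over classifier outputs. The `Mention` schema and `patient_phenotypes` helper are hypothetical, and counting only mentions classified as "Yes" is an assumption, since the text does not state which class the 90% confidence threshold applies to.

```python
from dataclasses import dataclass
from datetime import date

WINDOW_START, WINDOW_END = date(2015, 12, 1), date(2020, 11, 30)
MIN_CONFIDENCE = 0.90


@dataclass
class Mention:
    # One classifier output for a phenotype mention in a note (hypothetical schema)
    phenotype: str
    note_date: date
    label: str         # "Yes", "No", "Maybe", or "Other"
    confidence: float


def patient_phenotypes(mentions):
    """Phenotypes with >= 1 confident 'Yes' mention inside the note window."""
    return {
        m.phenotype
        for m in mentions
        if m.label == "Yes"  # assumption: only confirmed diagnoses count
        and m.confidence >= MIN_CONFIDENCE
        and WINDOW_START <= m.note_date <= WINDOW_END
    }
```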

COVID-19 vaccines reduce the incidence rate of SARS-CoV-2 infection
Over the duration of our study (see Methods), 263 of 31,299 (0.84%) vaccinated individuals had at least one positive SARS-CoV-2 PCR test compared to 661 of 31,299 (2.11%) matched unvaccinated individuals (Table 2, Figure 1). The incidence rates of positive SARS-CoV-2 tests in the vaccinated and unvaccinated cohorts were 0.33 and 0.83 cases per 1000 person-days, respectively. This corresponds to a vaccine efficacy of 60.7% (95% CI: 54.6-66.1%) over the entire study period, and a log-rank test indicates that the hazard rate is significantly lower in the vaccinated cohort over this time interval (p = 5 × 10^-40; Figure 2A). The hazard rates were also significantly lower in the vaccinated group when considering SARS-CoV-2 infections with onset at 14 days after study enrollment (p = 1 × 10^-28; Figure 2B) or 28 days after study enrollment (p = 2.4 × 10^-13; Figure 2C). For the 263 vaccinated individuals who subsequently tested positive for SARS-CoV-2, the distribution of time from first dose to first positive SARS-CoV-2 PCR test is shown in Figure 3.
Starting 36 days after study enrollment (approximately two weeks after the second dose of BNT162b2 and one week after the second dose of mRNA-1273), the vaccinated and unvaccinated incidence rates were 0.079 and 0.47 cases per 1000 person-days, respectively. This corresponds to a vaccine efficacy of 83.4% (95% CI: 60.2-94.3%) (Table 2). Importantly, we found that two of the six infections in the vaccinated cohort after day 36 occurred in individuals who had received only one vaccine dose, even though all vaccinated individuals should have received two doses by this time point per the manufacturers' guidelines. Among the properly vaccinated individuals (i.e., those who had received both doses prior to day 36), the incidence rate of a positive SARS-CoV-2 PCR test was 0.052 cases per 1000 person-days (4 cases in 76,465 person-days), indicating an efficacy of 89.0% (95% CI: 69.1-97.2%) (Table 2).
We also assessed the rates of infection and estimated vaccine efficacy in six non-overlapping 7-day intervals starting at the date of first vaccination. Even in the first seven days after study enrollment, vaccinated individuals had significantly lower infection incidence rates (0.56 cases per 1000 person-days) than unvaccinated individuals (1.26 cases per 1000 person-days), corresponding to an efficacy of 53.7% (95% CI: 41.0-63.8%) (Table 2). The vaccine efficacy then generally increased in subsequent weeks, reaching its maximum (92.5%; 95% CI: 70.2-99.1%) during the sixth week after study enrollment (days 36-42) (Table 2).
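The weekly breakdown above amounts to applying the same person-day accounting to consecutive windows Day 1-7, Day 8-14, and so on, up to Day 36-42. The sketch below is a hypothetical illustration: each patient is reduced to an exit day (first positive test, death, or end of follow-up, counted as the last at-risk day, which is an assumption) plus a flag for whether that exit was a positive test.

```python
def weekly_rates(patients, n_weeks=6, per=1000):
    """Incidence per `per` person-days in consecutive 7-day windows from Day 1.

    patients: list of (exit_day, tested_positive) relative to Day 0.
    Returns a list of (start_day, stop_day, cases, person_days, rate).
    """
    rows = []
    for w in range(n_weeks):
        start, stop = 7 * w + 1, 7 * w + 7  # week 6 covers days 36-42
        days = sum(max(0, min(exit_day, stop) - start + 1) for exit_day, _ in patients)
        cases = sum(1 for exit_day, pos in patients if pos and start <= exit_day <= stop)
        rate = per * cases / days if days else float("nan")
        rows.append((start, stop, cases, days, rate))
    return rows
```

Weekly efficacy would then be `100 * (1 - rate_vaccinated / rate_unvaccinated)` for each pair of matching rows from the two cohorts.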

COVID-19 vaccines are associated with lower hospitalization rates post SARS-CoV-2 infection
To assess whether vaccination also reduces illness severity, we compared 14-day rates of hospitalization, ICU admission, and mortality in COVID-19 patients who were vaccinated prior to diagnosis (n = 263) and 1-to-10 propensity score matched unvaccinated COVID-19 patients (n = 2,630) (see Methods and Table 3). The vaccinated population showed a significantly lower 14-day hospital admission rate (3.7% vs. 9.2%; Relative Risk = 0.4; p = 0.0074) (Figure 4A, Table 4). On the other hand, ICU admission rates were similar between these cohorts (1% vs. 1.3%; Relative Risk = 0.82; p = 1.0) (Figure 4B, Table 4). The 14-day conditional mortality rates were also not significantly different (0% vs. 0.085%; Relative Risk = 0; p = 1.0), but it is worth noting that no vaccinated patients died within 14 days of acquiring COVID-19 (Table 4). In fact, none of the vaccinated patients who were subsequently diagnosed with COVID-19 have died, including 59 with at least 28 days of follow-up (data not shown).

Discussion
Recent phase 3 trials have led to the approval of two COVID-19 vaccines in the United States, and other vaccines have been approved in other countries or show promise for approval in the near future (7,8). This study provides strong further evidence supporting the use of vaccination to prevent and reduce the severity of COVID-19. While other real-world analyses of COVID-19 vaccines are now emerging (22), a defined placebo group or adequately balanced unvaccinated cohort is difficult to establish outside of the clinical trial setting. To address this challenge, we used propensity matching to generate cohorts of vaccinated and unvaccinated patients who are balanced for demographic, geographic, and social variables, and then evaluated the effect of vaccination on the rate of SARS-CoV-2 positivity and COVID-19 severity between these cohorts. These vaccines, when administered as two serial doses, were 89.0% effective (95% CI: 69.1-97.2%) in preventing SARS-CoV-2 infection with onset at least five weeks after the first dose. This result is in line with the previously reported efficacies for both BNT162b2 (95.0%; 95% CI: 90.3-97.6%) and mRNA-1273 (94.1%; 95% CI: 89.3-96.8%) in preventing symptomatic COVID-19 with onset at least one week after the second dose (9, 10).
That the efficacy observed in our study is slightly lower than those reported in the two corresponding randomized controlled trials should be interpreted cautiously and in context, as there are several plausible reasons for this. First, the 95% confidence intervals of all three efficacy estimates overlap substantially, consistent with the true efficacies not being meaningfully different from each other. Second, due to distribution guidelines that are in place for Phase 1a of the vaccine rollout (11), individuals at high risk for acquiring COVID-19 (e.g., healthcare workers and residents of long-term care facilities) are expected to be overrepresented in this vaccinated cohort. This could lead to an underestimation of vaccine efficacy, as the propensity-matched unvaccinated group is likely composed of lower-risk individuals despite being matched for age, sex, race, ethnicity, and the number of prior SARS-CoV-2 PCR tests. Third, the likelihood of exposure to SARS-CoV-2 may depend on vaccination status to a greater extent in the real world than in the context of a randomized trial. That is, vaccinated individuals may feel more comfortable participating in social situations that pose a higher risk for infection, whereas, by design, this bias did not exist in the observer-blinded clinical trials.
The incidence rates of SARS-CoV-2 infection in both our vaccinated and unvaccinated cohorts (120 and 303 cases per 1000 person-years, respectively) are notably higher than the COVID-19 incidence rates reported in the placebo groups of both the BNT162b2 and mRNA-1273 trials (69.1 and 79.7 cases per 1000 person-years, respectively), but there are also several explanations for this observation. First and most importantly, in contrast to both clinical trials, our outcome of interest is only a positive SARS-CoV-2 PCR test (with no requirement for the presence of any clinical symptoms), whereas the phase 3 trials were designed to study symptomatic COVID-19 infections. Given that over 40% of COVID-19 cases may be asymptomatic (23, 24), the rates of SARS-CoV-2 PCR positivity in both trials would likely have been higher than the rates of symptomatic COVID-19. This discrepancy in measured outcomes may also contribute to the slight differences in estimated efficacy addressed previously. Second, due to the overrepresentation of high-risk individuals in the vaccinated cohort described above, it is reasonable to hypothesize that infection incidence was biased upward in this study.
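The person-year figures above follow from the Results rates (0.33 and 0.83 cases per 1000 person-days) by multiplying by the number of days in a year. The year length used in the study is not stated; this sketch assumes 365 days, and the published values were presumably derived from unrounded rates, so the agreement is approximate.

```python
DAYS_PER_YEAR = 365  # assumption; the study's convention is not stated


def per_1000_person_years(rate_per_1000_person_days):
    """Convert an incidence rate per 1000 person-days to per 1000 person-years."""
    return rate_per_1000_person_days * DAYS_PER_YEAR


for cohort, rate in [("vaccinated", 0.33), ("unvaccinated", 0.83)]:
    print(cohort, round(per_1000_person_years(rate)))
# vaccinated 120
# unvaccinated 303
```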
Our finding that hospitalization rates are lower in COVID-19 patients who were vaccinated prior to SARS-CoV-2 infection compared to propensity matched unvaccinated COVID-19 patients is consistent with the clinical trial results for both BNT162b2 and mRNA-1273 (9,10). In the trial of BNT162b2, 30 cases of severe COVID-19 occurred, all of which were in the placebo group; and in the trial of mRNA-1273, 9 cases of severe COVID-19 occurred in the placebo group compared to only one in the vaccinated cohort. While the ICU admission and mortality rates were not significantly lower in our vaccinated population, this may be attributable to an inadequate number of patients with these outcomes in either group to date in our study. As more patients are vaccinated and follow-up time increases, we will update our analyses to determine whether vaccination can also reduce the risk of these outcomes.
There are several important limitations to consider in this study. First, while the cohort size was even larger than those of the phase 3 trials, the mean follow-up time per patient was substantially shorter (mean = 26.9 days versus approximately 80 to 90 days). Consistent with this, approximately 45.5% of our vaccinated cohort had received only one dose of vaccine at the time of this study (Table S1). We were thus limited in the number of patients and at-risk person-days available for the critical long-term efficacy analyses. Second, we did not assess vaccine-associated adverse events, nor did we compare the clinical symptomatology of COVID-19 infections between vaccinated and unvaccinated patients. Third, it is possible that the likelihood of seeking out a SARS-CoV-2 PCR test differed between vaccinated and propensity-matched unvaccinated patients, which could introduce bias into our estimates of vaccine efficacy. Indeed, vaccinated patients may feel less compelled to undergo subsequent PCR testing, thereby reducing the number of positive tests recorded in this group. However, our data suggest that this is likely not a strong confounding factor, as the fraction of vaccinated patients with at least one PCR test after study enrollment (13.8%) was only marginally lower than the same fraction of unvaccinated patients (16.5%) (Table S1).
Our data demonstrate a strong real-world effect of COVID-19 vaccination, on par with the results reported in each randomized trial. This study also provides additional information that could not be ascertained from either trial, including the conclusions that (1) vaccination is effective in individuals who are at highest risk for acquiring COVID-19, and (2) vaccination reduces the rate of SARS-CoV-2 infection as defined by a positive PCR test alone. In summary, we emphasize that COVID-19 vaccines should be administered as broadly and rapidly as possible to the public and that the real-world efficacy of these vaccines should be continuously monitored as we move beyond Phase 1a of the distribution process.