The True Infection Mortality Rate of COVID-19 During the Spring 2020 Wave

Estimates of the infection mortality rate of COVID-19, the disease caused by the SARS-CoV-2 virus, are prone to biases due to underdiagnosis, false positives, false negatives, and the time lag between diagnosis and death. With a systematic analysis that combines epidemiological modeling of COVID-19 in Spain and in the state of New York with results of random immunological testing in the spring of 2020 in both locations, most of the bias is eliminated, and any remaining bias is evaluated and reported as an uncertainty estimate. A true infection mortality rate of 1.45 ± 0.45 % is obtained. It represents an average for the two locations, each analyzed with a different technique to minimize the effect of potential biases. In the absence of specific local data, this number can be used for the first wave of COVID-19 in OECD countries. This mortality rate estimate is sufficiently accurate to be used as a basis for policy decisions. When differences in age distribution between Spain and the state of New York are accounted for, tentative infection mortality rates of 1.18 ± 0.26 % and 1.94 ± 0.43 % are put forward for New York and Spain, respectively.


Introduction
COVID-19 is a disease caused by the SARS-CoV-2 virus. It was first observed in Wuhan, China, in late 2019 and spread across the world in the first half of 2020.
One of the main uncertainties that hampers decision making in the response to the COVID-19 epidemic is the uncertainty about the mortality of the disease. The case mortality rate (CMR) is often used as a surrogate for the intrinsic mortality. A quick look at one of the COVID-19 tracking websites indicates that the CMR ranges from 1 to 10 %, depending on the country or state. Peer-reviewed values range from 2 to 6.3 % (1-4). These are likely overestimates of the real mortality because of underreporting of cases, but the extent of overestimation is unclear to date. Random testing of the population for COVID-19 antibodies can potentially resolve this issue by allowing the infection mortality rate (IMR) to be calculated. An example is the controversial Santa Clara study that estimated the IMR at 0.12-0.2 % (5). However, a substantial bias is introduced this way when the disease is spreading rapidly, due to the time lag between infection and death. An additional bias is caused by false positives, particularly when the incidence of COVID-19 is low, as was the case in the Santa Clara study (5). These biases lead to underestimates of the true mortality of the virus. Simulations indicate that the bias caused by the time lag alone can be a factor of 10 or more when the virus spreads unchecked (6, 7). This leaves a huge uncertainty about the real mortality of the disease. The purpose of this paper is to establish an estimate of the true infection mortality rate of COVID-19 with sufficient accuracy to be useful as a basis for policy decisions. To minimize the effect of immunity loss on the infection estimate, the first wave of the epidemic (spring 2020) was used for the study.
An epidemiological model for COVID-19 (6, 7) has been fitted successfully to mortality data of Italy, France, and Iran in previous work. The model distinguishes four illness phases: infected, sick, seriously sick, and better. In the model, sick people either get better (90 %) or seriously sick (10 %). Seriously sick people either get better or die. When the model was applied to The Netherlands, it reproduced a 3 % infection rate, as determined independently with immunological testing (8), when a mortality of 1.06 % was assumed. Because of the sensitivity of the mortality rate estimate to false-positive immunological test results, this is a lower bound of the mortality of COVID-19 for The Netherlands in spring 2020. Due to lack of information, an upper estimate could not be made (7).
From April 27 to May 11, 2020, an antibody study was conducted in Spain (9). In this study, over 60,000 randomly selected people were tested with a point-of-care test, and over 50,000 of those tested had blood samples taken for a laboratory serological test. Of those tested, 4.6 % and 5.0 % were positive on the point-of-care test and the laboratory test, respectively. Based on the overlap between the results, the researchers concluded that the actual number of antibody carriers is between 3.7 and 6.2 % of the population. Assuming a population of 46.7 million, this leads to 1.73−2.90 million infected people in Spain at the time of the tests. In a separate test of confirmed positive cases, it was found that antibodies could be detected in 90 % of confirmed infected people after 14 days, indicating that the false-negative rate of the tests was 10 %.
On April 22, 2020, Governor Andrew Cuomo of New York announced that 13.9 % of the population of New York state tested positive for COVID-19 antibodies in a random sampling of the population (10). Based on a population of 19.6 million, this suggests that 2.72 million people in the state of New York had been infected by this time. The percentages of the population that tested positive for COVID-19 antibodies in this study differed when considering New York City and upstate New York, which included 19 counties. While New York City had a 21 % rate, upstate New York had only 3.9 % of the population tested positive (10).
The purpose of this study is to use the immunological data to estimate the true infection mortality rate of COVID-19 during the spring 2020 wave in Spain and New York state. To eliminate bias caused by the lag between testing and death, an epidemiological model (6, 7) is fitted to death data. The infection mortality rate is one of the parameters of the model. By requiring that both the progression of deaths as a function of time and the number of infected at the time of the testing are predicted correctly by the model, the infection mortality rate can be determined. A preliminary analysis along those lines, based on immunological data from The Netherlands (8), was conducted in previous work (7), but no information was available to determine potential biases in the immunological data. In the current study, bias due to underreporting of COVID-19 deaths is evaluated by comparing with excess total mortality during the same time span. Bias due to false positives is evaluated by using the range of infected determined in Spain as input data, and by comparing areas with widely diverging COVID-19 incidence in the state of New York. By using a different bias elimination method in Spain and New York, the risk of introducing new bias is minimized.

Results
Spain Data
Figure 1 shows a fit of the mathematical model to the death data for Spain (11). The parameter values for the model are shown in Table 1. Seroepidemiological data of Pollán et al. (9) are used to obtain a plausible range of mortality rates. Pollán et al. (9) found that the seroprevalence of SARS-CoV-2 in Spain was 3.7−6.2 % between April 27 and May 11, 2020. Assuming a population of 46.7 million, this leads to 1.73−2.90 million people infected with the virus. Model fits were conducted with a range of mortalities, and the predicted number of infected people as of May 5 was compared with the range above. As Table 1 indicates, mortality rates of 0.97 and 1.6 % correspond with 2.89 and 1.74 million infected, respectively. This leads to an uncorrected estimate of 1.285 ± 0.315 % for the infection mortality rate. This estimate already incorporates the effect of false positives of the tests. Two more corrections are still needed.
First, the two tests used in the Spanish study were evaluated for the detectability of antibodies in people who tested positive two weeks prior. Both tests produced approximately 90 % positive results, i.e., a false-negative rate of 10 %. To account for this, mortality rates should be multiplied by 0.9. A zero uncertainty is assigned to this factor. Second, the excess death numbers in Spain, i.e., the difference between the total number of deaths of all causes in Spain in 2020 and the average of the number of deaths of all causes in the preceding years, far exceed the reported COVID-19 deaths (12). It is expected that part of the excess death rate is due to displacement of people with non-COVID illnesses by COVID patients in hospitals, leading to non-optimal health outcomes and excess death in non-COVID patients. Hence, it is reasonable to assume that the reported number of COVID deaths represents a lower limit of the actual number, whereas the number of excess deaths represents an upper limit. For the period March 11 − May 5, 2020, the cumulative excess death number is 44,072, which is 72 % more than the reported COVID-19 deaths in Spain over the same period (25,613). The actual number of deaths is therefore 1 to 1.72 times the reported number, and a correction factor of 1.36 ± 0.36 is applied to the mortality rate.
Applying the two corrections, a true infection mortality rate of (1.285 ± 0.315 %) × (1.36 ± 0.36) × (0.9 ± 0) = 1.57 ± 0.80 % is obtained. The error margins are obtained by adding the relative errors linearly, a conservative choice that does not rely on the sources of error being uncorrelated.
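The arithmetic behind the Spain estimate can be retraced with a short script. This is a minimal sketch, not the author's code: the helper names `range_to_factor` and `multiply_with_errors` are introduced here for illustration, and the error rule is the linear addition of relative errors described above (combining them in quadrature would give a narrower margin).

```python
def range_to_factor(lower, upper):
    """Express a [lower, upper] range as (midpoint, half-width)."""
    return (lower + upper) / 2, (upper - lower) / 2


def multiply_with_errors(*terms):
    """Multiply (value, absolute_error) pairs, adding relative errors linearly."""
    value = 1.0
    relative_error = 0.0
    for v, e in terms:
        value *= v
        relative_error += e / v
    return value, value * relative_error


# Uncorrected estimate: the 0.97-1.6 % range of mortality rates from Table 1
uncorrected = range_to_factor(0.97, 1.6)            # (1.285, 0.315)

# Under-reporting: reported deaths are the lower limit, excess deaths the upper
excess_ratio = round(44072 / 25613, 2)              # 1.72
underreporting = range_to_factor(1.0, excess_ratio)  # (1.36, 0.36)

# False negatives: factor 0.9 with zero assigned uncertainty
false_negative = (0.9, 0.0)

imr, err = multiply_with_errors(uncorrected, underreporting, false_negative)
print(f"{imr:.2f} ± {err:.2f} %")  # 1.57 ± 0.80 %
```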
New York Data
Figure 2 shows a fit of the mathematical model to the death data for the state of New York (13). The adjustable parameter values for the model are shown in Table 2. The parameter values were adjusted with an assumed mortality rate of 1.1 % and the process was repeated with an assumed mortality rate of 1.5 %. The simulation results are identical for the two fits. The model predicts 2.71 million positive cases as of April 20, 2020, in the case of a 1.1 % mortality rate and 1.99 million positive cases in the case of a 1.5 % mortality rate. As indicated above, the number of infected people deduced from a serological study (10) is 2.72 million. This number contains false positives, whereas the modeled number estimates only real infections. From the number of real cases in a given model run, and the corresponding number of measured cases in the serological study, a false-positive rate can be estimated (see Methods section).
The resulting false-positive rates are 0.07 % and 4.17 % for the New York state model runs with assumed mortality rates of 1.1 % and 1.5 %, respectively. The relationship between the false-positive rate and the assumed mortality rate consistent with both the model and the data is shown in Figure 3. The predicted mortality rate is only weakly dependent on the proportion of false-positive cases because of the large incidence of the disease among the population.
The same procedure was followed for New York City. According to the serological study (10), the incidence of COVID-19 in New York City was 21 % in April, or 1.76 million people (real and false positives combined) assuming a total population of 8.4 million. Model fits are shown in the Supplementary Figure S1 of the Supplementary Information. Corresponding parameter estimates are given in Table 2. Simulations with mortality rates of 1.25 % and 1.5 % led to predicted numbers of positive cases of 1.72 and 1.43 million, respectively. These numbers are consistent with false positive proportions of 0.68 % and 4.80 % in the serological tests, respectively. This relationship is shown in Figure 3. Again, there is a weak relationship between the false positive proportion and the corresponding mortality rate.
By contrast, a seroprevalence of 3.9 % was found in Upstate New York. In this population, false positives have a disproportionate effect on the estimated mortality. This property is used to drastically narrow down the set of plausible false positive rates, and corresponding mortality rates. The county of Monroe, NY, was used as a representative data set with average incidence of COVID-19. The model fit to the Monroe, NY, death data is shown in the Supplementary Figure S2. The corresponding parameter estimates are given in Table 2. A mortality rate of 1 % corresponds with a false positive rate of 1.28 % whereas a mortality rate of 1.5 % corresponds with a false positive rate of 2.19 %. This relationship is shown in Figure 3.
Comparing the data of New York state, New York City, and Monroe, NY, in Figure 3, a small parameter region can be identified that is in general agreement with the three data sets. It will be defined somewhat arbitrarily as 1.3 ± 0.1 % for the mortality rate, and 1.8 ± 0.2 % for the false positive rate.
For an unbiased estimate of the true infection mortality rate, some corrections are needed. First, there is the effect of underreporting of COVID-19 deaths. Again, the reported deaths are a lower limit of the actual number of COVID-19 deaths. The excess death rate for all causes provides an upper limit of the death rate due to COVID-19.
Excess death during the COVID crisis is tracked in (14). For New York City, the cumulative number during the COVID crisis between March 15 and May 2 is 23,000, which is 27.8 % greater than the reported number of COVID-19 deaths during the same time period. It follows that the mortality is 1 to 1.278 times the estimated value, or a correction factor of 1.139 ± 0.139.
In addition, false negatives in testing may lead to an underestimation of the number of people who contracted the virus. In the absence of specific information, a range of 0−20 % false negatives is assumed. It follows that the infection mortality rate should be multiplied by a correction factor 0.9 ± 0.1.
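The New York estimate quoted in the Discussion (1.33 ± 0.41 %) follows from combining the model-consistent mortality rate of 1.3 ± 0.1 % with the two correction factors above. The worked arithmetic below is a sketch that assumes the same rule as for Spain, namely linear addition of the relative errors; the symbol $f_{\mathrm{mort,NY}}$ is introduced here for illustration only.

```latex
\[
f_{\mathrm{mort,NY}} = 1.3\,\% \times 1.139 \times 0.9 \approx 1.33\,\%
\]
\[
\frac{\Delta f_{\mathrm{mort,NY}}}{f_{\mathrm{mort,NY}}}
  \approx \frac{0.1}{1.3} + \frac{0.139}{1.139} + \frac{0.1}{0.9}
  \approx 0.077 + 0.122 + 0.111 = 0.310
\]
\[
\Delta f_{\mathrm{mort,NY}} \approx 0.310 \times 1.33\,\% \approx 0.41\,\%
\]
```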

Discussion
For Spain and New York state, true infection mortality rates of 1.57 ± 0.80 % and 1.33 ± 0.41 % are obtained. The values are not significantly different, and remarkably similar given the differences between the populations and the differences between the methodologies of the studies.
The numbers are also in agreement with the preliminary analysis of serological data in The Netherlands (7), where a lower limit of the true infection mortality rate of 1.06 % was found. This previous study indicated infection mortality rates of 1.57 % and 3.12 % under the assumption of 1 % and 2 % false positive rates, respectively.
In the Spanish study, where a point-of-care test was used in conjunction with an immunoassay, 3.7 % of the population tested positive on both tests, whereas 6.2 % of the population tested positive on at least one test. Assuming that the 2.5 % testing positive on only one test were false positives, the average false positive rate for the two tests is 1.25 %, less than the rate estimated for the New York test, but likely of the same order as the selectivity of the test used in The Netherlands. Taking into account that 10 % of people with a prior infection may have tested negative in each of the two tests, the number of people testing positive on only one test may be indicative of a false positive rate as low as 0.85 %.
Averaging the results of Spain and New York state, a value of 1.45 ± 0.45 % is obtained, assuming no correlation between the errors of both studies, i.e., a range of 1.0 to 1.9 %. This value is for the first wave of the COVID-19 pandemic, i.e., for spring 2020 and is representative of OECD countries. Improved understanding of the disease as a blood clotting disease rather than a purely respiratory disease, and associated improvement of the recommended treatment of the disease may have reduced the mortality rate of the disease since the first wave. Hence, care should be taken to extrapolate the results obtained in this study to situations where best practices exceed the prevailing practice during the first COVID-19 wave. In developing countries with limited medical resources, the mortality rate we obtained may still apply and even be exceeded. On the other hand, developing countries may have lower mortality rates due to differences in the age distribution (see next section).
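The averaged value can be checked with a few lines of Python. This sketch assumes an unweighted mean of the two estimates, with the absolute errors combined in quadrature as is usual for uncorrelated errors; the function name `average_uncorrelated` is illustrative.

```python
import math


def average_uncorrelated(estimates):
    """Unweighted mean of (value, error) pairs with errors added in quadrature."""
    n = len(estimates)
    mean = sum(value for value, _ in estimates) / n
    error = math.sqrt(sum(err ** 2 for _, err in estimates)) / n
    return mean, error


# Spain and New York state estimates, in percent
mean, error = average_uncorrelated([(1.57, 0.80), (1.33, 0.41)])
print(f"{mean:.2f} ± {error:.2f} %")  # 1.45 ± 0.45 %
```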
By comparison, seasonal influenza has a case mortality rate of 0.166 % in the United States (15). COVID-19 is roughly 10 times as deadly as the seasonal flu.

Effect of Age Distribution
Further refinements of the analysis are possible when age distributions are accounted for. It was shown that COVID-19 mortality increases exponentially with age (16, 17). As an average of studies (16) and (17), a proportional relationship with exp(0.1a) is found, where a is the age in years. In what follows, it will be assumed without justification that the incidence of COVID-19 is independent of age, i.e., the number of cases in every age group is proportional to the number of people in that age group. Hence, the results of this section are more tentative and prone to bias than the rest of this paper.
By assuming a mortality rate proportional to exp(0.1a) and the age distribution of Spain (18) and New York state (19), it is calculated that the infection mortality rate in Spain should be 1.64 times as high as in New York, assuming all else being equal. This is much more pronounced than the ratio of 1.18 found in the preceding sections. If the confidence intervals of 0.77-2.37 % and 0.92-1.74 % are accepted for Spain and New York, respectively, and a mortality ratio of 1.64 is forced onto the results, then the confidence intervals are narrowed to 1.51-2.37 % for Spain and 0.92-1.45 % for New York state. This would lead to mortalities of roughly 1.94 ± 0.43 % for Spain and 1.18 ± 0.26 % for New York state. Returning to the age distribution, this is consistent with the following equation for the mortality as a function of age:
$$f_{\mathrm{mort}} = A \exp(B a)$$
where $f_{\mathrm{mort}}$ is the mortality rate as a fraction, $A = (3.0 \pm 0.5) \times 10^{-5}$, $B = 0.1$ year$^{-1}$ and $a$ is age in years. To obtain $f_{\mathrm{mort}}$ as a percentage, $A = (3.0 \pm 0.5) \times 10^{-3}$ % can be used.
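The narrowing of the two intervals can be reproduced as an interval intersection. In this sketch, which is one possible reading of the procedure rather than the author's stated algorithm, the New York interval is scaled by the ratio 1.64, intersected with the Spain interval, and scaled back; `narrow_with_ratio` is a hypothetical helper name.

```python
def narrow_with_ratio(high_interval, low_interval, ratio):
    """Narrow two (lower, upper) intervals so that high = ratio * low holds.

    Returns the narrowed intervals for the high and the low location.
    """
    high_lo, high_hi = high_interval
    low_lo, low_hi = low_interval
    # Scale the low-location interval to the high-location scale, then intersect
    lower = max(high_lo, low_lo * ratio)
    upper = min(high_hi, low_hi * ratio)
    if lower > upper:
        raise ValueError("intervals are incompatible with the imposed ratio")
    return (lower, upper), (lower / ratio, upper / ratio)


def midpoint_and_halfwidth(interval):
    lower, upper = interval
    return (lower + upper) / 2, (upper - lower) / 2


# Intervals in percent: Spain 0.77-2.37, New York state 0.92-1.74
spain, new_york = narrow_with_ratio((0.77, 2.37), (0.92, 1.74), 1.64)
spain_mid, spain_err = midpoint_and_halfwidth(spain)
ny_mid, ny_err = midpoint_and_halfwidth(new_york)
print(f"Spain:    {spain_mid:.2f} ± {spain_err:.2f} %")   # 1.94 ± 0.43 %
print(f"New York: {ny_mid:.2f} ± {ny_err:.2f} %")         # 1.18 ± 0.26 %
```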
To evaluate how representative the average between Spain and the state of New York is in terms of age distribution, the age-dependent mortality rate obtained here is applied to age distributions of Europe, Northern America, and Oceania. Mortalities of 1.69 %, 1.46 % and 1.11 % are obtained, indicating that 1.45 ± 0.45 % reflects OECD countries well. In contrast, the values for Africa, Asia, and South America are 0.292 %, 0.710 %, and 0.806 %, respectively, and the world average is 0.777 %. This suggests that the mortality rate calculated in this study is representative of OECD countries, but not of most of the rest of the world.

Conclusion
Combining reported death data of COVID-19 in Spain and in the state of New York, as well as reported excess total death data, with epidemiological modeling, it is concluded that the true infection mortality rate of COVID-19 was 1.45 ± 0.45 % during the first wave, which occurred in the spring of 2020. This is a robust estimation that accounts for false positives and false negatives in immunological testing and underreporting of the actual number of COVID-19 deaths. This is an average that represents both jurisdictions. Similar numbers can be expected in other OECD countries.
Accounting for the different age distributions in Spain and New York, tentative values of the infection mortality rates of 1.94 ± 0.43 % and 1.18 ± 0.26 % are put forward for Spain and New York, respectively, assuming that all ages have infection numbers proportional to their size as a fraction of the population. These numbers are consistent with an age-dependent infection mortality rate (as a fraction) of $(3.0 \pm 0.5) \times 10^{-5} \exp(0.1a)$, where $a$ represents age in years. These numbers apply to the first wave of COVID-19, of spring 2020, in OECD countries.

Model Description
The model is an extension of the SIR model (20). A full description of the model is given in (7) and a summary of the essential aspects is given here. The model assumes a constant population (P) that can be divided into uninfected (U), infected, pre-symptomatic (I), symptomatic (S), seriously sick (SS), recovering or "better" (B), recovered (R), and deceased (D). Transitions between the stages occur, each with their own rate constant k, as shown in Figure 3.
The first transition (U → I) results from four parallel processes: U + I → 2I; U + S → I + S; U + SS → I + SS; U + B → I + B. The transition U + I → 2I is assumed to occur with a pseudo-first-order rate constant, $k_{11}$. This rate constant is obtained by correcting its initial value, $k_{11,0}$, by a time-dependent factor that incorporates the effect of non-pharmaceutical interventions (NPIs). At the time of each NPI, a smooth transition of $k_{11}$ from its pre-NPI to its post-NPI value is introduced (see (7) for details). All rate constants are given in (7). Only $k_{11,0}$ and its corrections through NPIs are adjustable when the mortality rate is known. The model allows for spikes of the infection rate, but spikes were not used in the analysis of any of the Spain or New York data. The mortality is set by adjusting the rate constant $k_4$ of the process SS → D. The default value, $k_4 = 0.01223$ day$^{-1}$, corresponds with a mortality rate of 1.5 %. The simulation was developed in MATLAB. The source code is given in the Supplementary Information.
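How adjusting $k_4$ sets the mortality can be illustrated with a simplified calculation. The sketch below rests on assumptions that are not confirmed by the text: every infected person passes through the symptomatic stage, 10 % of symptomatic people become seriously sick (as stated in the Introduction), and seriously sick people leave that stage through two competing first-order processes, death ($k_4$) and recovery with a hypothetical rate constant `k_rec`. The value of `k_rec` is back-calculated from the stated default pair ($k_4 = 0.01223$ day$^{-1}$, 1.5 % mortality) and is not necessarily the constant used in (7).

```python
FRACTION_SERIOUS = 0.10  # from the text: 10 % of sick people become seriously sick


def mortality_from_k4(k4, k_rec):
    """Overall mortality fraction for competing first-order death and recovery."""
    return FRACTION_SERIOUS * k4 / (k4 + k_rec)


def k4_from_mortality(f_mort, k_rec):
    """Invert mortality_from_k4: the k4 (per day) that yields a target mortality."""
    p_death = f_mort / FRACTION_SERIOUS  # probability of death once seriously sick
    return k_rec * p_death / (1.0 - p_death)


# Stated default: k4 = 0.01223 per day corresponds with 1.5 % mortality
K4_DEFAULT = 0.01223
P_DEATH_DEFAULT = 0.015 / FRACTION_SERIOUS

# Hypothetical recovery constant implied by the default pair (assumption)
k_rec = K4_DEFAULT * (1.0 - P_DEATH_DEFAULT) / P_DEATH_DEFAULT

# k4 that this simplified picture would assign to a 1.1 % mortality run
k4_for_1_1_percent = k4_from_mortality(0.011, k_rec)
print(f"k_rec = {k_rec:.4f} per day, k4(1.1 %) = {k4_for_1_1_percent:.5f} per day")
```

Under these assumptions a lower assumed mortality maps to a smaller $k_4$, while the other rate constants stay fixed.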

Data Analysis
The Spain death data up to and including July 18, 2020 were collected from the Worldometer website (11) on July 19, 2020. They are shown in the Supplementary Table S1. Cumulative daily death data for New York up to and including July 27, 2020 were collected from the New York Times COVID-19 repository (13) on July 28, 2020. The data are shown in the Supplementary Table S2.
These data were used to calibrate the model (7) summarized above. The calibration was done manually by trial and error. The model incorporated multiple non-pharmaceutical interventions (NPIs). A reopening is represented as an NPI with negative effectiveness. The starting date of each intervention was based on media reports of country-wide or state-wide initiatives to contain the epidemic. The dates chosen for the Spain model were March 15 and April 1, 2020 for the NPIs and June 21 for the reopening. An additional NPI with positive effectiveness was included to ensure a good fit with the data. The chosen date was May 12. The May 12 NPI did not improve the agreement with the reported death data so the effectiveness of this NPI was set equal to 0. The dates chosen for the New York model were March 13, March 22, and March 28, 2020 for NPIs and May 15 for the reopening. An additional NPI with negative effectiveness was needed to obtain a good fit with the data. The chosen date was April 20.
The starting date of the simulation (t = 0) is set to February 1, 2020. The initial conditions are 100 infected, 10 sick and 1 seriously sick, each multiplied by an adjustable factor labeled Correction in Tables 1 and 2. The remaining adjustable parameters are the basic infection rate, $k_{11,0}$, in the absence of NPIs, the efficiency of each NPI as a fraction of $k_{11,0}$, and the true mortality of the disease, $f_{\mathrm{mort}}$.

Effect of False Positives
It is assumed that there are no false negatives in the testing for the purpose of this calculation. The effect of false negatives is evaluated separately.
In this calculation, x is defined as the fraction of the actual number of positive cases over the population, y is the fraction of the number of false positives over the actual negative cases, and z is the fraction of the apparent positive cases over the population, which represents the sum of actual and false positives. Hence:
$$z = x + y(1 - x)$$
Solving for x (actual positives) leads to:
$$x = \frac{z - y}{1 - y}$$
Solving for y (false positives) leads to:
$$y = \frac{z - x}{1 - x}$$
For the Spain data, the authors of (9) provided a range of real positives based on the results obtained with two different tests. Their numbers were used without further processing.
Simulations were made for the entire state of New York, for New York City, and for the County of Monroe, NY (representing upstate New York). The lines representing the three sets of simulations have very nearly coinciding intersections, indicating that there is a unique combination of $f_{\mathrm{mort}}$ and the proportion of false positives that is consistent with the three data sets.
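The relationship between actual positives, false positives, and apparent positives can be written as three small functions. This is a sketch with illustrative function names, applied to the New York City run with an assumed mortality rate of 1.5 % (1.43 million modeled infections, 8.4 million inhabitants, 21 % apparent seroprevalence), for which the Results section reports a false positive proportion of 4.80 %.

```python
def apparent_fraction(x, y):
    """Apparent positives z: actual positives plus false positives among negatives."""
    return x + y * (1.0 - x)


def actual_fraction(z, y):
    """Actual positives x, given apparent positives z and false positive rate y."""
    return (z - y) / (1.0 - y)


def false_positive_rate(z, x):
    """False positive rate y, given apparent positives z and actual positives x."""
    return (z - x) / (1.0 - x)


# New York City, model run with an assumed mortality rate of 1.5 %
z_nyc = 0.21                 # apparent seroprevalence from the serological study
x_nyc = 1.43e6 / 8.4e6       # modeled infections over total population
y_nyc = false_positive_rate(z_nyc, x_nyc)
print(f"{100 * y_nyc:.1f} %")  # 4.8 %
```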

Effect of Age Distribution
The age distributions of Spain and New York state were taken from https://www.populationpyramid.net/spain/2019/ (18) and https://www.censusscope.org/us/s36/chart_age.html (19) respectively. For the New York data, the highest age category in the data is 85+ years whereas for the Spain data it is 100+ years. For consistency, all 85+ categories in the Spain data were lumped to a single group and assigned a representative age of 90 years. For Spain, this led to the same result as using the entire distribution with a representative age of 102 years for the 100+ years data. Based on (16) and (17), the mortalities in each age group are estimated as proportional to exp(0.1a), where a is the age in years. Hence, the fraction of the total population in each age group was calculated and multiplied with exp(0.1a) and the results were summed. The sums were 653 and 398 for Spain and New York, respectively. These numbers are proportional to the overall mortality rate. Hence, a ratio of 653/398 = 1.64 is expected.
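The age-weighted sum described above can be sketched as follows. The function name and input layout are assumptions made for illustration; the actual age-group data from (18) and (19) are not reproduced here, so only the reported sums (653 and 398) are used to recover the ratio.

```python
import math


def age_weighted_sum(age_groups, b=0.1):
    """Sum of population fraction times exp(b * age) over all age groups.

    age_groups: iterable of (representative_age_in_years, population_fraction),
    where the fractions are expected to sum to 1.
    """
    groups = list(age_groups)
    total_fraction = sum(fraction for _, fraction in groups)
    if abs(total_fraction - 1.0) > 1e-6:
        raise ValueError("population fractions should sum to 1")
    return sum(fraction * math.exp(b * age) for age, fraction in groups)


# Sums reported in the text for Spain and New York state
SUM_SPAIN = 653
SUM_NEW_YORK = 398
ratio = SUM_SPAIN / SUM_NEW_YORK
print(f"{ratio:.2f}")  # 1.64
```

Because the weights grow exponentially with age, the oldest age groups dominate the sum, which is why the choice of representative age for the open-ended top category is checked in the text.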
The estimation of infection mortality rate for Spain and the state of New York bounded by this ratio is given in the Discussion section. Values of 1.94 ± 0.43 % and 1.18 ± 0.26 % were found for Spain and New York, respectively, i.e. fractions of 0.0194 ± 0.0043 and 0.0118 ± 0.0026. By dividing these fractions by 653 and 398, respectively, a universal pre-exponential factor of $(3.0 \pm 0.5) \times 10^{-5}$ is found for the age-dependent infection mortality rate. This leads to the equation:
$$f_{\mathrm{mort}} = A \exp(B a)$$
where $f_{\mathrm{mort}}$ is the mortality rate as a fraction, $A = (3.0 \pm 0.5) \times 10^{-5}$, $B = 0.1$ year$^{-1}$ and $a$ is age in years.
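The pre-exponential factor can be checked directly from the numbers above. This is a minimal sketch with illustrative names; it uses only the central values and does not propagate the stated uncertainty of ± 0.5 × 10^-5.

```python
import math


def pre_exponential(overall_mortality_fraction, weighted_sum):
    """Pre-exponential factor A from an overall mortality and an age-weighted sum."""
    return overall_mortality_fraction / weighted_sum


# Central values from the text: mortality fractions and age-weighted sums
a_spain = pre_exponential(0.0194, 653)
a_new_york = pre_exponential(0.0118, 398)
print(f"Spain: {a_spain:.2e}, New York: {a_new_york:.2e}")


def f_mort(age, a=3.0e-5, b=0.1):
    """Age-dependent infection mortality rate as a fraction, A * exp(B * age)."""
    return a * math.exp(b * age)
```

Both locations give a value close to 3.0 × 10^-5, which is what allows a single pre-exponential factor to describe the two data sets.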