The importance of COVID-19 testing to assess socioeconomic fatality drivers and true case fatality rate. Facing the pandemic or walking in the dark?

Abstract


Background
The COVID-19 outbreak has disrupted economic and social life all over the world. Its scope is not yet certain, but it will definitely be deep and lasting. Governments, policymakers, politicians, physicians, medical employees, scientists and international organisations have gathered in a virtual space for collaboration to find answers to the questions raised. Apart from defeating the virus by developing a vaccine and/or finding a drug that is broadly effective for patients with COVID-19, the most important short-term concerns for governments are the impact of COVID-19 on the health system, namely the availability of health infrastructure, and finding the best strategy to reduce the economic and social effects of the pandemic as much as possible. The World Health Organization (WHO) has recommended social distancing measures to slow the spread of the virus and, in this way, prevent medical services from collapsing. However, in the long term, the WHO expects that the virus will remain present with periods of low-level infection, perhaps with seasonal increases (WHO, 2020). Therefore, governmental strategies should aim to ensure that health services are available to attend to COVID-19 patients without compromising all other health services in the medium and long terms. In the document published on 15th April by the WHO (2020), a set of recommended public policy actions is outlined, in which continuous tracking of the virus is recommended so that regional public health and social measures, so-called lockdowns, can be taken only in high-risk regions or in places where contagion rises again. At the centre of the recommendations is the importance of testing (Sanchez, 2020) and the use of serological tests in line with scientific recommendations (CDC, 2020). Likewise, the Organisation for Economic Cooperation and Development (OECD) (2020) highlights the importance of testing by presenting an analysis of the better performance observed in countries with a high number of tests per million inhabitants. It also points out that an increase in tests will help gather essential information to study the virus, especially to determine whether the population is developing antibodies, whether the virus can mutate and how to deal with COVID-19 in the following months. In addition, it is particularly important to find the asymptomatic proportion of the population, first to assess the probability of contagion from such individuals to others and second to estimate the true case fatality rate (CFR).
There is great diversity in the public health and social measures taken by each country against the pandemic, which can be grouped into three lines of action. First, ensuring a good supply of medical equipment and freeing up hospitals as much as possible. Second, social distancing measures, such as banning international travel, suspending schools and encouraging teleworking. Third, economic measures to guarantee the wellbeing of the population, with special support for firms and families. Naturally, not all countries have followed the same set of actions. In fact, there are wide differences in the economic and social distancing measures. Some countries, such as Italy, France and the United Kingdom, implemented severe restrictions once domestic contagions had increased considerably, while Peru and the United States (US) closed their international airports shortly after the first COVID-19 case was confirmed, yet this measure was not very effective, especially for the latter. Others, such as Iceland, Singapore and Korea, implemented massive testing, preventing cases from increasing exponentially (OECD, 2020). Additionally, among the countries with a larger number of applied tests is Luxembourg, which has recently been reported to be testing its entire population.
In addition, law enforcement capacity and political organisation might also have played a significant role in this regard. For instance, in Mexico and the US, sub-national governments could regulate regional social distancing measures. Meanwhile, the economic organisation, informality and the limited or null presence of the welfare state hinder social and economic lockdown (Loayza, 2020); namely, entrepreneurs and employees in the informal economy might not have access to economic aid. According to the International Labour Organization (ILO), more than 60% of employment in the world is informal. Broken down by region, informal employment accounts for 85.8% of employment in Africa, 68.2% in Asia and the Pacific, 68.6% in the Arab States, 40.0% in the Americas and 25.1% in Europe and Central Asia. In addition, according to Loayza (2020), lockdown measures are less effective in developing countries for several reasons: people will continue to work if their income is compromised, confinement in overcrowded dwellings with poor access to sanitation might increase the risk of contagion, and the displacement of people from urban to rural areas would move the spread of contagion to rural areas, which frequently have less access to medical services and sanitation.
It is important to note that there are 70 countries in the sample, and they concentrate 96% of confirmed cases worldwide. The distribution is shown in Figure 1. It is clear that the majority of cases are concentrated in developed countries, while developing economies only account for approximately 20% of the cases. Africa registered only 1% of worldwide cases. From the initial analysis presented with the Chinese experience, it has been stated that the health of individuals, as well as their age, are important drivers for virus fatalities (The Novel, 2020). However, there is still little evidence about the correlation between the aggregated indicators of population health and health infrastructure and fatalities.
In summary, the effectiveness of lockdown measures has been questioned, given that it is likely that the virus will continue to spread in the long term, while there are huge economic losses.
The likely underidentification of cases in developing nations would prevent further control.
The paper is organised as follows: in the second section, the materials and methods are explained, the third section presents the results, the fourth section presents a discussion, and the fifth section summarises the conclusions and policy implications.

Data
The data employed were taken from different sources. For COVID-19 cases and testing, the data came from ourworldindata.org, in combination with GitHub, and the data on cases, deaths and tests run up to 7 May. For health indicators, the OECD and WHO databases were consulted. The data collected correspond to the most recent data available.
For the cross-section models, the countries included are those that reported a 3-day average of 3 new deaths on at least one day. This criterion was applied to exclude from the sample the countries in which COVID-19 has not spread widely so far. Under this criterion, a sample of 71 countries was obtained; the full list is in the additional files (see Additional file 4).
A subsample for the OECD was also built. Not all OECD members were included, either due to lack of information or because they do not meet the abovementioned criterion for COVID-19 deaths. For the panel data analysis, all available information was used, yet given that many countries do not report daily figures, or their figures do not change over time, the sample is smaller, reduced to 66 countries. A full list of the countries used per model is presented in the additional files (see Additional file 4).

Ordinal Probit model specification
An ordinal probit model allows the use of an ordinal list as a dependent variable, which can be numeric or categorical. The model was estimated with Stata. The dependent variable for this model is the CFR ranking, which takes values from 1 to N, where 1 is assigned to the country with the lowest CFR.
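The construction of the ordinal dependent variable can be sketched as follows. This is a minimal illustration in Python, not the Stata procedure used for the estimates; the function name `cfr_ranking` and the example values are hypothetical, and ties (which the text does not discuss) are simply broken by input order.

```python
def cfr_ranking(cfr_by_country):
    """Map each country to a rank from 1 (lowest CFR) to N (highest CFR)."""
    # sorted() is stable, so countries with equal CFR keep their input order
    ordered = sorted(cfr_by_country, key=cfr_by_country.get)
    return {country: rank for rank, country in enumerate(ordered, start=1)}


# Hypothetical CFR values, for illustration only (not taken from the data)
example = {"A": 0.021, "B": 0.094, "C": 0.055}
ranking = cfr_ranking(example)
print(ranking)
```

The resulting ranks, rather than the CFR values themselves, would then enter the ordinal probit as the dependent variable.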
The estimation of CFR is difficult for several reasons. The first is the universe of confirmed cases. Criteria for administering tests differ widely, and in most countries tests are given only to those presenting symptoms (at least fever) or to those requiring hospitalisation. Therefore, the universe of cases is considerably underestimated. Nonetheless, there is still no agreement over the likely size of this underestimation; depending on the study, asymptomatic cases are estimated to be between 5% and 80% (Heneghan, Brassey and Jefferson, 2020). For instance, Iceland is the country with the most tests applied per million inhabitants, owing to a massive testing strategy. In this case, 50% of the positive cases were identified as asymptomatic (Heneghan, Brassey and Jefferson, 2020). In the case of the Diamond Princess cruise ship, the proportion of asymptomatic to total infected was estimated to be 17.9% (Mizumoto, K., Kagaya, K., Zarebski, A., Chowell, G., 2020). The second reason is differences in registers. Some countries also count suspected COVID-19 deaths, that is, deaths of people who lived with or were closely related to a deceased COVID-19 patient; meanwhile, other countries only count confirmed cases. Third, timing matters. It has been confirmed that, similar to other viruses, once a person is infected, it takes up to two weeks to develop symptoms. When symptoms do develop, the person may have only a mild flu-like illness; according to the first Chinese analysis, the proportion of such mild cases was estimated to be up to 81% (Novel Coronavirus Epidemiology Response, 2020). However, those entering severe and critical states might be hospitalised, and it takes several days until a fatality occurs. In view of that, the CFR obtained as the proportion of current deaths to current cases is a misleading indicator, since the actual deaths from current cases will be reported later (Battegay et al., 2020).
Following the recommendation by Battegay et al. (2020), the third problem has been addressed by estimating the CFR as the ratio of current deaths to a lagged number of total cases. This measure is larger than a current indicator, yet it might be more accurate. Figure 2 shows three different CFRs throughout the world. It is clear that the larger the lag in the total cases, the larger the CFR will become. However, it is noticeable that they tend towards convergence.
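A lagged CFR of this kind can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `lagged_cfr`, the choice of lags and the input series are assumptions made here, and the exact formula used for the estimates may differ in detail.

```python
def lagged_cfr(total_deaths, total_cases, lag):
    """CFR on day t: cumulative deaths on day t over cumulative cases on day t - lag.

    Days without a lagged observation, or with zero lagged cases, yield None.
    """
    series = []
    for t in range(len(total_deaths)):
        if t < lag or total_cases[t - lag] == 0:
            series.append(None)
        else:
            series.append(total_deaths[t] / total_cases[t - lag])
    return series


# Hypothetical cumulative series with cases doubling every day
cases = [10, 20, 40, 80, 160]
deaths = [0, 0, 1, 2, 4]

current = lagged_cfr(deaths, cases, lag=0)   # deaths over same-day cases
lagged = lagged_cfr(deaths, cases, lag=2)    # deaths over cases two days earlier
print(current[-1], lagged[-1])
```

With a growing case count, the lagged denominator is smaller than the same-day denominator, so the lagged CFR is the larger of the two, which matches the pattern described for Figure 2.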
Figure 2 CFR for the world. Source: own elaboration.
In Table 1, the values at the beginning and end of the period are shown. For the three indicators, the CFR is higher at the end of the period, and the difference among them diminished. It is also important to mention that the first reported death came on the 12th day after the first case was registered. Therefore, it is important to use a lagged number of cases for a better estimate.
In the model used, $CFR_i$ is the Case Fatality Rate ranking for country $i$ (for the full CFR per country, see Additional file 1), and $X_i$ is a vector of variables corresponding to health indicators, both on infrastructure and on population health, which could help to explain the difference in CFR across countries, such as obesity, diabetes and the presence of elderly people, among others.
In the panel models, the explanatory variables are two: the 7th lag of new tests per million inhabitants and the square of the stringency index. The seventh lag of new tests per million is used given the claims that early testing reduces the chances of greater infections (OECD, 2020). At the same time, as with the CFR, the time the virus takes to develop is considered; for instance, a person who is asymptomatic today might develop symptoms within a week.
Mizumoto et al. (2020) estimated a range of 5.5 to 9.5 days for incubation, yet it is still uncertain. There are cases in which people might show symptoms and die within a few days. Given the difficulty of determining the best lag to consider, two choices are shown, the 7th and the 15th. Regarding quarantine measures, many countries converge to similar levels of the index at the end of the period, yet squaring the variable allows us to model the fact that the index has a maximum and that its marginal effect diminishes over time.
Additionally, countries taking early measures should be able to contain the spread to a larger extent; thus, this is modelled through the initially larger marginal effect of a squared variable on the dependent variables.
In equation 5, the model has as a dependent variable the natural logarithm of the first difference in CFR.In equation 6, the dependent variable is the natural logarithm of new COVID-19 cases per million (first difference of total COVID-19 cases per million) and, in a similar fashion, the natural logarithm of new deaths per million (first difference of total COVID-19 deaths per million).By using weighted variables per million inhabitants, the population size differences across countries are addressed.
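The variable construction described above (a lag of new tests, the squared stringency index and the natural logarithm of a first difference) can be sketched for a single country as follows. This is an illustrative Python sketch, not the authors' Stata code; all function names, dictionary keys and input values are assumptions made here. Days on which the first difference is not positive have no logarithm and are dropped in this sketch, which is one possible treatment among several.

```python
import math


def first_difference(series):
    """Daily change of a cumulative series; the first day has no predecessor."""
    return [None] + [b - a for a, b in zip(series, series[1:])]


def lag(series, k):
    """Shift a series k days back in time, padding the start with None."""
    return ([None] * k + list(series))[:len(series)]


def panel_rows(total_cases_pm, new_tests_pm, stringency, test_lag=7):
    """Build one country's rows: ln(new cases), lagged tests, squared stringency.

    All three input series are assumed to have the same length.
    """
    new_cases = first_difference(total_cases_pm)
    lagged_tests = lag(new_tests_pm, test_lag)
    rows = []
    for t in range(len(stringency)):
        if new_cases[t] is None or new_cases[t] <= 0 or lagged_tests[t] is None:
            continue  # the logarithm or the lag is undefined on this day
        rows.append({
            "day": t,
            "ln_new_cases": math.log(new_cases[t]),
            "tests_lag": lagged_tests[t],
            "stringency_sq": stringency[t] ** 2,
        })
    return rows


# Hypothetical series for one country (values per million inhabitants)
total_cases = [0, 1, 3, 7, 15, 31, 63, 127, 255, 511]
new_tests = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
stringency = [50] * 10

rows = panel_rows(total_cases, new_tests, stringency)
print(rows[0])
```

Only days 7 to 9 survive in this example, because the 7th lag of new tests is undefined before day 7. The same helper could be applied to total deaths per million to obtain the second dependent variable.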
All the variables and their summary statistics are shown in Table 2. As seen in that table, the mean CFR is similar for both datasets (0.0683694 and 0.0633442), which implies that the CFR keeps its trend in the time period analysed. This is not the case, however, for the coefficient of variation, which is greater for the panel data (268.80) than for the cross section (69.15); this is explained by the different results in the period for the different countries.
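For reference, the coefficient of variation can be computed as sketched below. The sketch assumes the usual definition, the standard deviation divided by the mean and expressed as a percentage, and assumes the sample standard deviation; the function name and the input values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical values, for illustration only
cv = coefficient_of_variation([1, 2, 3])
print(cv)
```

A value above 100, as reported for the panel data, means that the standard deviation exceeds the mean.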
It is also worth noting that the maximum CFR in the panel data can be higher than 1. The reason is that in countries with very explosive growth, the total cases confirmed in one week can be fewer than the total deaths recorded the following week, by which time the confirmed cases have grown exponentially.
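As a purely illustrative calculation with made-up numbers (not taken from the data), and assuming a 7-day lag, suppose a country had 50 confirmed cases in total a week ago and has 60 deaths in total today. Writing $D_t$ for total deaths on day $t$ and $C_{t-7}$ for total cases a week earlier (both symbols are introduced here for the example):

$$CFR_t = \frac{D_t}{C_{t-7}} = \frac{60}{50} = 1.2 > 1$$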

Results
In Table 3, the results for the ordinal probit model are presented. The infrastructure variables and the population's health indicators were not statistically significant; instead, an indicator for health expenditure was used. Since health expenditure is related to infrastructure endowments and some population health indicators are related to expenditure, the variables on infrastructure/population health and expenditure are alternatively used. Full tables with all the considered variables are shown in the additional files (see Additional files 2 and 3). Columns 1 and 3 present the results for the sample with 70 countries, while columns 2 and 4 present those for the OECD members. A negative sign is shown between CFR ranking and the total tests per million; therefore, countries running more tests observed a larger probability of having a lower CFR. In contrast, countries with larger expenditures on health observed a larger probability of having a higher CFR. For the OECD subsample, only the first variable was statistically significant. Finally, the stringency index is not statistically significant in any case.
In Table 4, the results from the cross-section model are displayed. In this model, only the explanatory variables that were statistically significant in the previous model were used.
Columns 4 and 5 show that there is a positive correlation between the number of tests and the total cases, which only confirms that the countries running more tests are identifying more cases, yet this is not directly related to the number of deaths. In other words, the total tests per million did not show a significant correlation with the number of fatalities.
Health expenditure is statistically significant for all the models. This is definitely related to a problem of identifying and recording COVID-19 cases and deaths, rather than to causation. That is, higher health expenditure as a proportion of GDP cannot be a causal factor for more contagion and deaths related to COVID-19, but the positive correlation confirms that countries spending more on health are identifying more cases and deaths. For instance, this variable has a larger coefficient for OECD members, most of which are developed countries that spend more on health as a proportion of GDP. Namely, for OECD countries, the average was 8.8%, while for non-OECD countries, it was 5.32%; the difference in purchasing power parity dollars is wider, since on average OECD countries spent USD 2547 versus USD 1088 in non-OECD countries.
Finally, the results from the panel data analysis are shown in Table 5. Fixed effects were chosen over random effects using the Hausman test as the criterion. In column 9, new tests per thousand inhabitants show a negative correlation with the first difference of CFR, which means that countries applying more tests per capita showed smaller changes in CFR across the period; that is, CFR showed a downward trend. Consequently, this supports the view that the widespread application of tests has been effective in reducing the fatality rate. In addition, the CFR of countries identifying more positive cases is also expected to converge to the real CFR, given that massive testing will give the true proportion between contagions and deaths. In the same model, the Stringency index coefficient is not statistically significant, and the trend is negative, as expected, since it should be smaller over time. It is important to note that the panel data are unbalanced, and all countries with available data are included, which are mostly from Europe, Asia, North America and South America.
In columns 10 and 11, the dependent variables showed a high positive correlation with new tests, similar to the previous models. This means that the correlation between testing and new deaths and new cases is sustained over time. Meanwhile, the stringency index showed a negative coefficient; nonetheless, it is only statistically significant in column 11, with new deaths as the dependent variable. Therefore, it is confirmed that stringency measures have helped to reduce the number of COVID-19 deaths, but there is no statistical evidence that they were effective in reducing the number of new cases. The trend coefficient for new deaths is significantly positive, meaning that they are still growing. As a robustness check, a longer lag has been included, the 15th lag of new tests per million, to check whether there is any change over time. The results are very consistent: the variables kept the same sign, and they remained statistically significant. The value of R² diminished for the three models, which may be due to the smaller number of observations and countries included.

Discussion
Our results support the WHO recommendations to increase testing and tracking of COVID-19 cases in all countries, given its definitive impact on reducing the CFR. In line with Stojkoski et al. (2020), we found that the countries' expenditure on health as well as their development level is positively related to CFR, cases and deaths, which cannot be interpreted as causation, but it indicates that developing countries do not track enough cases yet. Consequently, we claim that there is an underidentification of cases given the positive correlation between cases and deaths and testing, meaning that testing is still reactive and identifies few asymptomatic cases, which is also highlighted by the OECD (2020) and the WHO (2020).
Furthermore, given the underidentification of cases, it is still very difficult to identify the country-specific drivers for contagions and CFR.
Lockdown measures, as measured by the Stringency index, were shown to be effective at reducing the number of new deaths, but not at reducing new cases or the CFR. Therefore, the results support the proposals to stop severe lockdown measures, given the heavy economic losses and burdens for governments, since these measures in turn will not significantly reduce the number of cases and the CFR.
One significant limitation of this study is the use of aggregated national data, rather than regional data, which could have helped to identify regional socioeconomic drivers for the COVID-19 spread and CFR, given that in some countries, the cases seemed to be very concentrated within a few cities or regions.

Conclusions
Testing proved to be a significant factor in decreasing CFR; thus, it should be supported as the main strategy to follow for pandemic control in the medium and long terms. The findings suggest that there is a large underidentification of COVID-19 cases, especially in developing countries, which compromises the long-term control of the pandemic. Thus, it is essential to make agreements with all nations to keep increasing testing to gain further knowledge of COVID-19 and the drivers of its spread at the national level, allowing tailored public policies.
The data behave differently in the cross-section, in which the coefficient of variation is relatively low, than in the panel data, in which the coefficient of variation is considerably higher. In this case, the panel data regression analysis captures the idiosyncratic errors in this time period, with a more precise estimation of the effects of the tests per million inhabitants.
By means of the Stringency Index, it was found that lockdown measures have been effective in reducing the number of new deaths, while they showed no impact on reducing new cases and CFR. This has public policy implications, since lockdown measures generate great economic losses and are already inducing economic crises all over the world, with greater effects on developing and less developed countries (Loayza, 2020).
Another general conclusion is that the availability of data for all countries is still very limited, which hinders further analysis of COVID-19 spread and CFR drivers at the national level. That is, the question remains unanswered whether countries with large proportions of the population aged over 65 or over 80, such as Japan or Italy, are more susceptible to greater CFR. Additionally, at the aggregate level, it was not possible to link variables such as obesity and diabetes with a higher CFR or number of deaths. Likewise, there is a significant difference in infrastructure endowments across the sample used; nevertheless, neither the CFR nor the number of deaths appeared to be statistically explained by these factors.
The pandemic is still developing, and there are countries in which the peak of contagions has not yet been reached; thus, further analysis for more targeted public policies will be needed. The current recommendation from the WHO, OECD, and other medical bodies to increase testing proved to be the wiser path to follow at the moment.

Figure 1
Proportion of cases by country by 7th May 2020. Source: own elaboration with data from Ourworldindata.org.

It is important to mention that not all the variables are included in the models at the same time, to prevent biases arising especially from the correlation among health expenditure, infrastructure and population health indicators. The number of tests per million inhabitants is also included, since it has been claimed that the only way to decrease the CFR in the long term is to massify testing (OECD, 2020). Finally, considering that quarantine measures have been considered a determinant factor for the fatality rate, the Stringency index by Thomas et al. (2020) is also added as an explanatory variable. This index is a broad indicator of the different social measures taken by governments to reduce the speed of spread, such as school closures, cancellation of public events, border closures, etc. It is available daily for several countries. It gives a weight to each measure taken, and the highest level for any given country is 100.

Cross-section model specification
These models are estimated by ordinary least squares (OLS) in Stata. The first model uses as a dependent variable the total cases per million inhabitants, and the second model uses the total deaths per million inhabitants. The aim of these models is to show a robust statistical correlation between cases and deaths and the explanatory variables that were statistically significant in the first model. The models are specified as follows:

$Cases_i = \beta_0 + \beta X_i + \varepsilon_i$ (3)
$Deaths_i = \beta_0 + \beta X_i + \varepsilon_i$ (4)

Panel Fixed Effects models
Finally, a group of panel data estimations has been made to evaluate the robustness of the models specified above. Panel data models can potentially include a larger amount of data by combining cross-section and time-series analysis. The cross-section models were used to be able to link the dependent variables, which vary daily, to annual variables by using one static picture of the data. Instead, for the panel analysis, only data varying daily are used, including cases, tests, deaths and the Stringency index. Given the type of data, these models allow the use of dynamic variables. Thus, first differences of the dependent variables are employed. Natural logarithms are used to find elasticities.
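The idea behind a fixed-effects estimate can be illustrated with the within transformation: each variable is demeaned by country, which removes any country-specific intercept before the slope is computed. The Python sketch below is a simplified illustration with a single regressor and made-up data; it is not the Stata estimation used for the results, which includes more than one explanatory variable, and the function name `within_slope` is hypothetical.

```python
def within_slope(panel):
    """Fixed-effects (within) slope for a single regressor.

    `panel` maps each country to a list of (x, y) observations; the lists
    may have different lengths, so unbalanced panels are allowed.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in panel.values():
        x_mean = sum(x for x, _ in observations) / len(observations)
        y_mean = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            # Demeaning by country removes the country-specific intercept
            numerator += (x - x_mean) * (y - y_mean)
            denominator += (x - x_mean) ** 2
    return numerator / denominator


# Hypothetical data: both countries share a slope of 2 but differ in intercept
panel = {
    "A": [(1, 12), (2, 14), (3, 16)],  # y = 10 + 2x
    "B": [(4, -2), (5, 0), (6, 2)],    # y = -10 + 2x
}
slope = within_slope(panel)
print(slope)
```

In this example, a pooled regression that ignored the country intercepts would be distorted by the level difference between the two countries, whereas the within transformation recovers the common slope.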

Table 1
CFR for the world. Source: Own estimation with data from Ourworldindata.org

Table 2
Summary statistics. Source: Own elaboration

Table 3 Estimation results from the ordinal probit model.
Source: Own elaboration

Table 4 Estimation results for cross-sectional models.
Source: own elaboration.

Table 5
Panel data estimation results. Source: Own estimation