Predictors for County Level Variations in Initial 4-week COVID-19 Incidence and Case Fatality Risk in the United States

While studies indicate differences in COVID-19 incidence and case fatality risk, few efforts have shed light on regional variations in the intensity of initial community spread. We conducted a nationwide study using county-level COVID-19 data from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. We characterized the intensity of initial community spread by calculating the incidence and case fatality risk (CFR) for the first 4-week period of COVID-19 spread in each county. We used multivariate multilevel multinomial logistic regression to estimate the association of county-level characteristics with COVID-19 incidence and CFR. Of 3,143 counties, we included 1,052 with at least 100 reported cases by June 1, 2020. Median incidence was 193.4 per 100,000 population (IQR: 94.2–397.5). Median case fatality risk was 3.6% (IQR: 1.4–7.3). Median age, rural population, population density, lower education, uninsured population, obesity, and COPD prevalence were positively associated with initial COVID-19 incidence, while population size, female sex, race (Asian, white), higher education, and excessive drinking were negatively associated. Median age, female sex, Asian race, population density, higher education, excessive drinking, Intensive Care Unit beds, and airborne infection isolation rooms were positively associated with initial COVID-19 CFR, while Hispanic ethnicity, lower education, obesity (an 'obesity paradox'), and uninsured population were negatively associated.

Furthermore, few studies have focused on the initial community spread, which may indicate regions and communities particularly vulnerable to the effects of COVID-19. The initial intensity with which a disease spreads through a community may be influenced by numerous factors, such as the virulence of the pathogen, the health behaviors of residents, the biological susceptibility of the population, or the health resources of the community. Understanding the factors responsible for variation in initial incidence and case fatality risk could help identify high-risk communities as well as targets for mitigating the spread of infection.
The primary objectives of this study were (1) to determine county-level variations in initial COVID-19 incidence and case fatality risk, indexed to the start of the epidemic in each county, and (2) to identify the predictors of these county-level variations in initial incidence and case fatality risk.

Study design and data source
We performed an ecological study examining regional variation in COVID-19 across counties in the United States. We obtained county-level data on confirmed COVID-19 cases and deaths through June 29, 2020, from the COVID-19 Data Repository of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [1,2].

Study population
We included counties with at least 100 cases by June 1, 2020, so that each county had a full 4-week observation period before the date on which we obtained the data (June 29, 2020).

COVID-19 related outcomes
The primary outcomes of the study were the incidence of COVID-19 (number of new confirmed cases per 100,000 population) and its case fatality risk [23] (CFR: number of new deaths divided by number of new confirmed cases, expressed as a percentage). We calculated both measures for the 4-week period beginning on the day each county reported at least 100 cases, so that counties were compared at the same stage of their local epidemics rather than on the same calendar dates. We focused primarily on initial community spread so as to identify high-risk communities and their characteristics.
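The index date, inclusion rule, and outcome definitions above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each county's data are complete daily cumulative counts keyed by date, that the 4-week window spans 28 days starting on the first day the cumulative count reaches 100, and that "new" cases and deaths are those reported after the index date. The constants and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

# Hypothetical constants for illustration, mirroring the study design text
THRESHOLD = 100               # cumulative cases that define a county's index date
WINDOW = timedelta(days=28)   # 4-week observation window
DATA_END = date(2020, 6, 29)  # last date of available data


def index_date(cum_cases: Dict[date, int]) -> Optional[date]:
    """First date on which cumulative confirmed cases reach THRESHOLD."""
    for d in sorted(cum_cases):
        if cum_cases[d] >= THRESHOLD:
            return d
    return None


def is_included(cum_cases: Dict[date, int]) -> bool:
    """True if the full 4-week window ends by DATA_END, which is
    equivalent to reaching THRESHOLD cases by June 1, 2020."""
    start = index_date(cum_cases)
    return start is not None and start + WINDOW <= DATA_END


def four_week_metrics(cum_cases: Dict[date, int], cum_deaths: Dict[date, int],
                      population: int) -> Optional[Tuple[float, float]]:
    """Return (incidence per 100,000, CFR in %) for the 4-week window,
    or None for excluded counties. Assumes a complete daily series."""
    if not is_included(cum_cases):
        return None
    start = index_date(cum_cases)
    end = start + WINDOW
    new_cases = cum_cases[end] - cum_cases[start]      # cases reported after the index date
    new_deaths = cum_deaths[end] - cum_deaths[start]
    incidence = new_cases * 100_000 / population
    cfr = 100 * new_deaths / new_cases if new_cases > 0 else float("nan")
    return incidence, cfr


# Synthetic county: 10 new cases and 1 new death per day from May 1, 2020
days = [date(2020, 5, 1) + timedelta(days=i) for i in range(60)]
cases = {d: 10 * i for i, d in enumerate(days)}
deaths = {d: i for i, d in enumerate(days)}
print(index_date(cases))                          # 2020-05-11
print(four_week_metrics(cases, deaths, 100_000))  # (280.0, 10.0)
```

Whether the first 100 cases themselves count toward the window is not stated in the text; subtracting the cumulative count on the index date is one reasonable reading.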

Exposure (county-level community characteristics)
County-level data on socio-demographic factors, health behaviors, prevalence of chronic medical conditions, and availability of healthcare resources were obtained from the 2020 County Health Rankings (CHR) [24], the 2018-2019 Area Health Resources File (AHRF) [25], and the 2017 Centers for Medicare & Medicaid Services (CMS) [26] report on chronic medical conditions. We linked these county-level community characteristics to the COVID-19 data using Federal Information Processing Standards (FIPS) codes. Details of the sources and definitions of the variables used can be found in Appendix I.
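Linking datasets by FIPS code requires consistent keys. A county FIPS code has five digits (two for the state, three for the county), but leading zeros are easily lost when codes are stored as numbers (for example, 1001 instead of 01001). The sketch below uses only the Python standard library and hypothetical field names (`fips`, `cases`, `obesity_pct`) with made-up values; it normalizes codes before an inner join and is an illustration, not the linkage procedure used in the study.

```python
from typing import Dict, Iterable, List, Union


def normalize_fips(raw: Union[str, int, float]) -> str:
    """Return a zero-padded 5-digit county FIPS code.

    Accepts values such as '1001', 1001, '1001.0', 1001.0 or '01001'.
    """
    return f"{int(float(raw)):05d}"


def link_by_fips(covid_rows: Iterable[dict], chr_rows: Iterable[dict]) -> List[dict]:
    """Inner-join two collections of county records on normalized FIPS codes."""
    chr_index: Dict[str, dict] = {normalize_fips(r["fips"]): r for r in chr_rows}
    linked = []
    for row in covid_rows:
        key = normalize_fips(row["fips"])
        match = chr_index.get(key)
        if match is not None:  # counties without a match are dropped
            linked.append({**row, **match, "fips": key})
    return linked


covid = [{"fips": 1001.0, "cases": 120}, {"fips": "99999", "cases": 5}]
chr_data = [{"fips": "01001", "obesity_pct": 33.0}]
print(link_by_fips(covid, chr_data))
# [{'fips': '01001', 'cases': 120, 'obesity_pct': 33.0}]
```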

Statistical analysis
We estimated descriptive statistics for COVID-19 outcomes as well as for the community characteristics of the counties included in the study. We fit multilevel multinomial logistic regression models to estimate the association of county-level factors (socio-demographics, health behaviors, air pollution level, chronic medical conditions' prevalence, and availability of healthcare resources) with the incidence and case fatality risk (CFR) of COVID-19. The dependent variables were the quartiles of COVID-19 incidence and CFR. The models also included a random intercept for each state to account for unmeasured variation among states, such as weather, social distancing norms, and the timing of stay-at-home orders. All models were adjusted for median age, sex (percent female), and race/ethnicity (Asian, Hispanic, non-Hispanic black, non-Hispanic white). All analyses were conducted at the county level. We performed all statistical analyses
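One standard way to write the random-intercept multinomial logit described above is shown below. This is a hedged sketch with our own notation: it assumes the state-level random intercepts are independent normal terms for each non-reference outcome category, a detail the text does not specify. For county $i$ in state $j$, let $Y_{ij} \in \{1,2,3,4\}$ be the quartile of incidence (or CFR), with the lowest quartile as the reference category, and let $\mathbf{x}_{ij}$ hold the county characteristic(s) of interest together with the adjustment covariates (median age, percent female, race/ethnicity).

```latex
\log \frac{\Pr(Y_{ij} = k)}{\Pr(Y_{ij} = 1)}
  = \beta_{0k} + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta}_{k} + u_{jk},
  \qquad u_{jk} \sim \mathcal{N}\!\left(0, \sigma_{k}^{2}\right),
  \qquad k = 2, 3, 4

\mathrm{aOR}_{k,m} = \exp\left(\beta_{k,m}\right)
```

Here $\beta_{k,m}$ is the element of $\boldsymbol{\beta}_k$ for characteristic $m$, so $\mathrm{aOR}_{k,m}$ is the adjusted odds of a county being in quartile $k$ rather than the lowest quartile per one-unit increase in that characteristic, conditional on the other covariates and the state intercept. Comparisons of the highest with the lowest quartile correspond to $k = 4$.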

IRB Statement
This study was considered exempt from Institutional Review Board (IRB) review as we used publicly available, population-level data.

Results
Of the 3,143 U.S. counties, we included 1,052 with at least 100 cases by June 1, 2020. The characteristics of these counties are presented in Table 1. We used multilevel multinomial logistic regression to determine the association of county-level characteristics with the quartiles of 4-week COVID-19 incidence. Median age, rural population, population density, lower education (< HS diploma), adult obesity prevalence, COPD prevalence, and uninsured population were positively associated with the highest quartile of incidence compared with the lowest quartile, whereas population, female sex, race (Asian and non-Hispanic white), higher education (HS diploma or more, 4+ years of college), and excessive drinking were negatively associated with it (Table 2).

Table 2. Association of county-level characteristics with the quartiles of 4-week COVID-19 incidence (1st quartile as the reference category).

Discussion
Ours is the first study to examine the association of multiple population-level factors with county-level variations in the initial incidence and case fatality risk of COVID-19. We focused primarily on initial community spread so as to identify populations with higher susceptibility to COVID-19 infection and fatality. We found significant variation in the incidence of COVID-19 across counties. We also identified various independent predictors of the initial incidence of COVID-19. The positive associations with higher median age, male sex, and chronic medical conditions (obesity and COPD) are consistent with the individual-level risk factors described in numerous clinical studies [6][7][8][9][10].
Older populations with a higher proportion of males and a greater chronic disease burden are therefore likely to be more susceptible to COVID-19.
Interestingly, female sex was negatively associated with higher incidence. Biological susceptibility, occupational roles, and greater adherence to public health guidelines might explain this. Excessive drinking was also found to be a strong protective factor, which could be explained by lower mobility and less social interaction in this population. On the other hand, population density was positively associated with higher incidence, supporting the role of social mobility in driving the spread of infection. All of these factors underscore the utility of social distancing in slowing the transmission of COVID-19. Additionally, higher education was negatively associated, and the percentage of uninsured population positively associated, with the highest quartile of incidence. This highlights the importance of formal academic education as well as health education (with the percentage of uninsured population as a proxy) in slowing the spread of the virus.
Furthermore, we identified independent predictors of the case fatality risk of COVID-19 during initial community spread. Higher age and female sex were the strongest predictors associated with higher CFR, as shown by other individual-level clinical studies [14][15][16][17]. We also found a significant positive association of Asian race with higher CFR, whereas Hispanic ethnicity was negatively associated. Non-Hispanic black race was not significantly associated with higher CFR. Several other studies have found a non-significant association of black race with CFR [27][28][29], while some have shown significantly higher mortality [30]. Further research is needed in this area.
Unexpectedly, we did not find an association of higher CFR with the prevalence of any of the included chronic medical conditions except adult obesity. Adult obesity was negatively associated with the highest quartile of CFR (aOR: 0.95; 95% CI: 0.90, 0.99), supporting the 'obesity paradox'. The obesity paradox, an association of obesity with lower mortality in patients with acute respiratory distress syndrome (ARDS), has been reported previously in various studies [31][32][33]. However, whether this phenomenon also holds for ARDS following COVID-19 infection is not yet clear [32,34].
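Adjusted odds ratios such as the one above are reported on the exponentiated scale. Assuming Wald intervals (the text does not state the interval method), a log-odds coefficient $\hat\beta$ with standard error $\mathrm{SE}$ maps to $\mathrm{aOR} = e^{\hat\beta}$ with 95% CI $e^{\hat\beta \pm 1.96\,\mathrm{SE}}$; an interval lying entirely below 1, as for adult obesity, indicates an inverse association that is statistically significant at the 5% level. The helper below is hypothetical and its inputs are made up, not estimates from this study.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error into
    (odds ratio, lower Wald limit, upper Wald limit)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Made-up coefficient and standard error, for illustration only
or_, lo, hi = odds_ratio_ci(beta=0.3, se=0.1)
print(f"aOR {or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")  # aOR 1.35 (95% CI 1.11, 1.64)
```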
Moreover, we found that fine particulate matter (PM2.5) was not associated with CFR. This is consistent with another nationwide cross-sectional study on the effect of air pollution, which found no significant effect of PM2.5 or ozone, but a significant effect of NO2, on COVID-19 death outcomes [22].
We also did not find an independent association of smoking with CFR. However, several meta-analyses have identified significant associations of smoking with severe complications as well as higher mortality from COVID-19 [35,36].
Surprisingly, the availability of healthcare resources, defined by the number of Intensive Care Unit beds and the number of airborne infection isolation rooms, was weakly positively associated with the highest quartile of case fatality risk, and the uninsured population was negatively associated with it. The lower disease burden and the rapidity of spread during the initial weeks of the epidemic in each county might explain this counterintuitive effect of healthcare resource availability on CFR variation. A study in China showed that the rapid escalation in the number of infections around the epicenter of the outbreak (Wuhan city) resulted in an insufficiency of healthcare resources, which adversely affected mortality in Hubei province but not in other provinces of China [37].
Our study assessed a comprehensive range of factors with potential predictive value for the spread and fatality of COVID-19. In contrast to other population-level studies on COVID-19, we were able to control for major confounding by epidemic timing and stage by identifying a common starting point for each county (i.e., the report of its 100th case). We were also able to account for unmeasured state-level factors, such as diverse weather, varied social distancing norms, and different timing of stay-at-home orders, by including a random intercept for each state.
However, we acknowledge that our study is limited in several key areas. Firstly, the data on confirmed COVID-19 cases and deaths compiled by CSSE at Johns Hopkins University are derived from publicly available data from multiple sources, such as the World Health Organization, the U.S. Centers for Disease Control and Prevention, state and national government health departments, and local media reports [1,2].
Because different organizations use different COVID-19 case definitions, there could be artificial variability in the data itself. Secondly, our case fatality risk estimate does not reflect the true rate, because there is a substantial lag between reported cases and reported deaths (deaths typically occur 2-3 weeks after hospitalization) [38]. However, this limitation applies to all population-level studies. Thirdly, because of the limited sample size, we were not able to control for all plausible confounders in our models. Fourthly, we did not examine some other potential factors, as they were beyond the scope of this study. Specifically, we could not examine the effect of important chronic medical conditions identified by other studies, such as hypertension [39], chronic heart disease [40], and cancer [41], or of other air pollutants such as NO2 and ozone [22,42,43]. Fifthly, data on some chronic medical conditions used in this study (asthma, COPD, chronic kidney disease) were obtained from CMS [26]. These data describe Medicare beneficiaries and hence may not be generalizable to the general population.
Caution should be taken when interpreting the findings with respect to these three conditions.
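The second limitation, the lag between reported cases and deaths, can be illustrated with a toy calculation. In the synthetic series below (made-up numbers, not study data), a hypothetical county reports 10 cases per day and 10% of cases die exactly 14 days after being reported. Over a 28-day window, the naive ratio of deaths to cases, which matches this study's CFR definition, gives 5%, half the true 10%, because cases reported in the last two weeks have not yet had time to die. Dividing deaths by cases reported one lag earlier, a common adjustment that was not applied in this study, recovers 10% in this idealized case. The 14-day lag and both function names are assumptions for illustration.

```python
from typing import Sequence


def naive_cfr(daily_cases: Sequence[int], daily_deaths: Sequence[int]) -> float:
    """Deaths in the window divided by cases in the same window, in %."""
    return 100 * sum(daily_deaths) / sum(daily_cases)


def lagged_cfr(daily_cases: Sequence[int], daily_deaths: Sequence[int], lag: int) -> float:
    """Deaths from day `lag` onward divided by cases reported `lag` days earlier, in %."""
    n = len(daily_cases)
    return 100 * sum(daily_deaths[lag:]) / sum(daily_cases[: n - lag])


# Hypothetical county: 10 cases/day for 28 days; 1 death/day once the 14-day lag has passed
cases = [10] * 28
deaths = [0] * 14 + [1] * 14
print(naive_cfr(cases, deaths))           # 5.0
print(lagged_cfr(cases, deaths, lag=14))  # 10.0
```

Real lags vary between patients, so the adjustment only approximately corrects the bias in practice.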
Since the beginning of the novel coronavirus pandemic, there have been numerous efforts to build better prediction models. However, the predictive performance of these models has fallen short of expectations. The predictors identified by our study could help build better models. Additionally, these findings may help identify the most susceptible and high-risk populations and focus public health interventions on these areas. Lastly, our study also highlights the importance of social distancing as well as health education.
To summarize, we identified various county-level independent predictors of the initial incidence as well as the case fatality risk of COVID-19. The findings can help build better future prediction models. The results also support targeted public health action by identifying susceptible and high-risk populations as well as counties.

Declarations
Ethics approval and consent to participate
This study was considered exempt from Institutional Review Board (IRB) review as we used publicly available, population-level data.

Consent for publication
Not applicable

Figure 1
County-level 4-week COVID-19 incidence (per 100,000 population). Depicts incidence for the first four weeks of COVID-19 spread in each county. The four-week period is defined by the date of the 100th reported case in each county. Includes counties with at least 100 cases as of June 1, 2020. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Figure 2
County-level 4-week COVID-19 case fatality risk. Depicts case fatality risk for the first four weeks of COVID-19 spread in each county. The four-week period is defined by the date of the 100th reported case in each county. Includes counties with at least 100 cases as of June 1, 2020.

Supplementary Files
This is a list of supplementary files associated with this preprint.
Appendix.docx