Data source and study population
We conducted a cross-sectional study using data from the 2020 Korea Community Health Survey (KCHS), an anonymized nationwide health interview survey initiated by the Korea Disease Control and Prevention Agency (KDCA) to establish a standardized community survey that supports the development of health projects in all local districts. The 2020 KCHS was conducted from October 16, 2020 to December 31, 2020. Detailed information on the study design and aims of the KCHS has been reported previously.[22] Trained interviewers conducted one-to-one interviews with individuals aged 19 or older across the 255 local districts, which cover all administrative districts in South Korea, yielding a representative sample of about 230,000 individuals. In the 2020 KCHS, new questions regarding COVID-19 were added, including questions on types of concerns related to COVID-19. We excluded 474 individuals who refused to answer at least one question regarding concerns related to COVID-19 and 3,106 who had at least one missing value for the covariates considered in our study; accordingly, 225,689 individuals remained as the final study population (Fig. 1).
We obtained regional data on the cumulative COVID-19 infection rate, covering the period from the date of the first reported case (January 20, 2020) to the last survey date of the KCHS (December 31, 2020), from the Public Data Portal, an open data source managed by the Ministry of the Interior and Safety (https://www.data.go.kr/en/index.do). Thus, our study reflects the COVID-19 experience of the KCHS study population as closely as possible.
We also used 2015 Korean population census data to calculate the area deprivation index (ADI), which captures material and social deprivation.[23] We used census data that were as up to date as possible to calculate the ADI, although the status of some areas may have changed between 2015 and 2020.
All personal information in these data was de-identified before distribution; therefore, the institutional review board of Yonsei University confirmed that this study was eligible for exemption from full institutional review board review.
Variables
Dependent variable: Concerns related to COVID-19 (0–16 score)
The questionnaire items on concerns related to COVID-19 were as follows: (1) “I am concerned about getting infected with COVID-19” (response rate: 99.9%; 229,202/229,269); (2) “I am concerned about dying if I get infected with COVID-19” (response rate: 99.9%; 229,077/229,269); (3) “I am concerned about being criticized or disadvantaged by others around me if I get infected with COVID-19” (response rate: 99.9%; 229,007/229,269); (4) “I am concerned about my family members who are vulnerable (e.g. elderly, infant, or patient) getting infected with COVID-19” (response rate: 99.9%; 229,192/229,269); and (5) “I am concerned about the financial loss to my family and me caused by the COVID-19 pandemic” (response rate: 99.9%; 229,145/229,269). Of the five types of concern, we excluded the fourth because that item targeted only individuals who live with a vulnerable family member (e.g. elderly, infant, or patient) and thus did not align with our study objectives. Each item is rated on a 5-point scale from 1 (most concerned) to 5 (least concerned). We reverse-coded each response to a scale from 0 (least concerned) to 4 (most concerned) and summed the four remaining items. Accordingly, the total score ranges from 0 to 16, with a higher score indicating greater concern related to COVID-19.
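The recoding and scoring rule described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the example respondent are hypothetical, and the sketch assumes the recode is the simple reversal 5 − raw response.

```python
# Hypothetical sketch of the scoring rule; names and example data are illustrative.

INCLUDED_ITEMS = (1, 2, 3, 5)  # item 4 (vulnerable family member) is excluded


def recode(raw: int) -> int:
    """Reverse a raw 1-5 response (1 = most concerned) to 0-4 (4 = most concerned)."""
    if raw not in (1, 2, 3, 4, 5):
        raise ValueError(f"raw response must be 1-5, got {raw!r}")
    return 5 - raw


def concern_score(responses: dict[int, int]) -> int:
    """Sum the recoded responses over the four included items (range 0-16)."""
    return sum(recode(responses[item]) for item in INCLUDED_ITEMS)


# Example respondent: raw answers keyed by questionnaire item number.
example = {1: 1, 2: 2, 3: 3, 4: 1, 5: 5}
print(concern_score(example))  # 4 + 3 + 2 + 0 = 9
```

The answer to item 4 is present in the example but ignored, which mirrors the exclusion of that item from the total score.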
Main independent variable: Area deprivation index (ADI)
The ADI is a summary measure of the level of material and social deprivation of a geographical area and consists of a number of standardized and weighted variables.[24] Previous research has shown that the ADI can be useful for uncovering geographically based differences in a community’s health.[25] Specifically, different levels of the ADI have been reported to be associated with the infection rate of COVID-19.[15–21] The ADI used in our study was calculated from the 2015 Korean population census data.[23] A total of eleven variables were used to capture the overall degree of area deprivation across 13 regional states and four metropolitan cities, which together make up the entire geographical area of South Korea. These variables are (1) proportion of people aged 25–64 with no high school diploma, (2) proportion of households not owning their own house, (3) proportion of households living in a monthly/yearly rental house, (4) proportion of households with overcrowded living conditions (> 1 person/room), (5) proportion of the population aged 65 or over, (6) proportion of households with a woman as head of the household, (7) proportion of separated, divorced, or widowed individuals aged ≥ 15 years, (8) proportion of households living below the minimum housing standard (house without separate kitchen, bathroom, hot-water supply system, and heating apparatus), (9) proportion of households without a motor vehicle, (10) proportion of people living alone, and (11) proportion of the population in lower social class occupations. These occupations include (a) agriculture, forestry, and fishing workers; (b) device, operation, and assembly workers; and (c) simple labor workers.
Each variable was standardized as a z-score, the standardized variables were combined into a district-specific deprivation score, and this score was linked to each participant’s residential area code. We classified the ADI into quartile groups: Quartile 1 (least deprived, 1–25%, score ≤ -3·67); Quartile 2 (26–50%, -3·67 < score ≤ -0·15); Quartile 3 (51–75%, -0·15 < score ≤ 3·61); and Quartile 4 (most deprived, 76–100%, score > 3·61).
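As a rough sketch of the standardize, combine, and classify steps, the following code uses the cut points reported in the text and treats each one as an inclusive upper bound. It is not the authors' procedure: the equal-weight sum of z-scores and the use of the population standard deviation are assumptions (the weights are not stated here), and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev

# Cut points reported in the text (upper bounds of quartiles 1-3).
CUT_POINTS = (-3.67, -0.15, 3.61)


def z_scores(values: list[float]) -> list[float]:
    """Standardize one indicator across districts (population SD is an assumption)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def composite_scores(indicators: list[list[float]]) -> list[float]:
    """Sum the z-scores of all indicators per district (equal weights assumed)."""
    standardized = [z_scores(ind) for ind in indicators]
    return [sum(district) for district in zip(*standardized)]


def adi_quartile(score: float) -> int:
    """Map a composite score to quartile 1 (least deprived) .. 4 (most deprived)."""
    for quartile, upper in enumerate(CUT_POINTS, start=1):
        if score <= upper:
            return quartile
    return 4


# Toy data: two indicators for three hypothetical districts.
toy = [[10.0, 20.0, 30.0], [1.0, 2.0, 3.0]]
print([round(s, 2) for s in composite_scores(toy)])
print([adi_quartile(s) for s in (-5.0, -1.0, 0.0, 4.2)])  # [1, 2, 3, 4]
```

Because the composite is a sum of eleven z-scores, its range is wider than that of a single z-score, which is consistent with cut points such as -3·67 and 3·61 marking quartile boundaries.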
Covariates
We included potential covariates at both the individual level and the area level. Individual-level covariates were sex, age group (19–29, 30–39, 40–49, 50–59, 60–69, or ≥ 70), monthly household income (< ₩2,000,000, ₩2,000,000–2,999,999, ₩3,000,000–3,999,999, ₩4,000,000–4,999,999, or ≥ ₩5,000,000; ₩1,000 = around $0·921), education (primary school graduate or below, middle school graduate, high school graduate, or college graduate or above), marital status (single; married and living together; or separated, divorced, or bereaved), subjective health status (good, fair, or bad), smoking status (every day, occasionally, past, or never), alcohol drinking status (more than 4 times a week, 2–3 times a week, 2–4 times a month, once a month or less, or never), diabetes (no or yes), high blood pressure (no or yes), depressive symptoms (Patient Health Questionnaire 9-item (PHQ-9) score; range 0 to 27), and daily sleep hours. Area-level covariates were the COVID-19 infection rate by region and region type (capital city, metropolitan areas, or others).
Statistical analysis
We estimated regression coefficients using a multilevel regression model with individuals (individual level) nested within the 255 districts in South Korea (area level). Because the dependent variable was continuous (range 0–16) and normally distributed, we fitted the model with the identity link. Multilevel modeling begins with a null model, which contains no predictors and partitions the variance of the dependent variable into within-area and between-area components.[26]
To test between-area variability, we calculated the intraclass correlation coefficient (ICC). The ICC is the ratio of the between-area variance to the sum of the within-area and between-area variances. In other words, the ICC is the share of the variance left unexplained by the predictors in the model that can be attributed to the grouping variable. A high ICC indicates that the between-area variance is not negligible, and thus that a multilevel model should be employed to account for differences between areas. The ICC is expressed as follows:
$$\text{ICC}=\frac{\sigma_{u_{0}}^{2}}{\sigma_{u_{0}}^{2}+\sigma_{e}^{2}}$$
where \({\sigma }_{{u}_{0}}^{2}\) is the variance of the level-2 (area-level) residuals and \({\sigma }_{e}^{2}\) is the variance of the level-1 (individual-level) residuals.
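The ICC formula can be illustrated with a short calculation. The function name and the variance components below are hypothetical and are not estimates from this study.

```python
def icc(var_between: float, var_within: float) -> float:
    """Intraclass correlation: between-area variance over total variance."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components, not estimates from this study.
print(round(icc(0.5, 9.5), 3))  # 0.05
```

In this illustrative case, 5% of the unexplained variance in the outcome would lie between areas and 95% between individuals within areas.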
After examining the null model, we added area-level deprivation (Model 1). We then added individual-level characteristics to the null model (Model 2). Finally, we included both individual- and area-level characteristics (Model 3). The multilevel model is expressed as follows:
Level 1 (Individual level)
$${Y}_{ij}={\beta }_{0j}+{\beta }_{1j}{X}_{ij}+{e}_{ij}$$
Level 2 (Area level)
$${\beta }_{0j}={\gamma }_{00}+{\gamma }_{01}{Z}_{j}+{\mu }_{0j}$$
$${\beta }_{1j}={\gamma }_{10}+{\mu }_{1j}$$
Here, \({Y}_{ij}\) represents the value of the dependent variable for the \(i\)th individual in area \(j\), while \({X}_{ij}\) and \({Z}_{j}\) are the independent variables at the two levels: \({X}_{ij}\) contains the values for individuals in area \(j\), and \({Z}_{j}\) contains the values for the areas. \({\beta }_{0j}\) and \({\beta }_{1j}\) are the individual-level intercept and slope, respectively, in area \(j\). \({e}_{ij}\) is the error term at the individual level (i.e., within-area variance). \({\gamma }_{00}\) denotes the overall intercept, that is, the average of the dependent variable \({Y}_{ij}\), controlling for the area-level variables \({Z}_{j}\); \({\gamma }_{01}\) is the slope of the area-level variables \({Z}_{j}\); and \({\gamma }_{10}\) is the overall slope at the individual level, controlling for the area-level variables \({Z}_{j}\). Lastly, \({\mu }_{0j}\) and \({\mu }_{1j}\) are the error terms at the area level (i.e., between-area variance); the variance of \({\mu }_{0j}\) corresponds to \({\sigma }_{{u}_{0}}^{2}\) in the ICC.
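For reference, substituting the level-2 equations into the level-1 equation gives a single combined (mixed) form. This is a sketch that takes the level-2 intercept equation to be \({\beta }_{0j}={\gamma }_{00}+{\gamma }_{01}{Z}_{j}+{\mu }_{0j}\) and the slope equation to be \({\beta }_{1j}={\gamma }_{10}+{\mu }_{1j}\):

$${Y}_{ij}={\gamma }_{00}+{\gamma }_{01}{Z}_{j}+{\gamma }_{10}{X}_{ij}+{\mu }_{0j}+{\mu }_{1j}{X}_{ij}+{e}_{ij}$$

The first three terms form the fixed part of the model, and the last three terms form the random part.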
All statistical tests were two-tailed, and a p-value of < 0·05 was considered statistically significant. The analyses were performed using Stata version 15.1 (StataCorp LLC, College Station, TX).