Data
In this study, we used the National Health Information Database (NHID) from 2013 to 2017. The NHID covers the entire population of Korea and is managed and provided by the National Health Insurance Service, Korea’s single health insurance provider. The NHID is composed of several databases (20). One of these, the eligibility database, contains sociodemographic information on the entire population of Korea, including sex, age, residence, and income-based insurance premiums (25). Death information is also collected at the individual level in conjunction with death certificate data from Statistics Korea (25). In a previous study, the numbers of population and deaths at the district level (the administrative level in Korea above the dong/eup/myeon level) in the national statistics database and the NHID were highly correlated (26). Prior research also compared the NHID with the NAD of the Ministry of Interior and Safety (MOIS) for calculating small-area mortality (27). The numbers of population and deaths were nearly identical between the two databases, and the estimated SMRs were highly correlated in both sexes (22). Thus, using the NHID to estimate small-area mortality is considered valid. A substantive strength of using the NHID to calculate small-area mortality is the availability of age-specific mortality data in each small area (25), which was not available from the NAD and death certificate data used in previous studies (17, 18). This strength allowed us to measure small-area mortality not only with SMR, but also with CMF and LE.
We obtained the annual population of each small area as of January 1st of each year, by 5-year age group (0, 1–4, 5–9, 10–14, …, 85+), from the NHID as aggregated data. The subjects were followed for 1 year, and those who died by the end of the year were classified as deceased. Subjects who were foreigners or who lacked gender, age, or residence information were excluded from the analysis (1.4% of total NHID subjects); most of those excluded (99.8%) were foreigners.
Unit of analysis
The unit of analysis in this study was the dong/eup/myeon, which typically had between 3,850 and 21,886 inhabitants and between 46 and 109 deaths as of 2017. The distribution of the numbers of population and deaths among all 3,377 small areas in this study is presented in Supplementary Table 2. The median population for each small area of the NHID for 2013–2017 was 111,077 (IQR = 10,244), the minimum was 10,244, and the maximum was 1,476,696. Metropolitan areas had a higher median population than urban areas but a smaller median number of deaths. Rural areas had smaller populations and fewer deaths than the other two area types, with the difference most pronounced for population. Previous studies have also used the dong/eup/myeon as the unit of analysis to calculate small-area mortality in Korea (17, 18). Because administrative districts changed over time, we adjusted the unit of analysis by treating merged or split small areas as one unit for the entire study period. Since more than 5,000 subjects are required to calculate a stable LE (6), areas with an average population of less than 1,000 per year, which corresponds to fewer than 5,000 over the 5-year study period, were merged with adjacent areas. As a result, this study reclassified the 3,500 dong, eup, and myeon areas as of December 31st, 2017 into 3,377 units (28). Eight of the 3,500 small areas as of December 31st, 2017 were excluded from the analysis because they are civilian access control areas for military purposes. There were 26 small areas with an average population of less than 1,000 during the study period, and all other adjustments were due to administrative changes. A more detailed description of the adjustment of the unit of analysis can be found in another study (27). Deidentified numbers were assigned to avoid stigma for small areas found to have high mortality rates (29).
Statistical analysis
We estimated the SMR, CMF, and LE in all small areas in Korea. In this study, only age was considered to be a confounder of the association between areas and mortality, and it was adjusted for in the calculation of the mortality metrics. Data from 2013 to 2017 consisted of a total of 64,163 cells (3,377 small areas × 19 age bands). Of those, 15,296 (23.8%) cells had 0 counts for deaths. In the metropolitan areas (n = 2,006), 6,871 (18.0%) of 38,114 cells had 0 counts for deaths. In the urban areas (n = 221), 691 (16.5%) of 4,199 cells had 0 counts for deaths, while in the rural areas (n = 1,150), 7,734 (35.4%) of 21,850 cells had 0 counts for deaths.
We used equation (1) to calculate SMR by dividing the number of observed deaths in a small area by the expected number of deaths. The expected deaths were estimated by multiplying the age-specific population in the small area by the age-specific mortality rate of the standard population. The standard population was the total population of this study. (see Equation 1 in the Supplementary Files)
where T_i = age-specific population of the standard population, D_i = age-specific number of deaths of the standard population, t_ir = age-specific population of each small area, d_ir = age-specific number of deaths of each small area, r = small area, and i = 5-year age group.
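As an illustration of the indirect standardization described above, the following is a minimal sketch in Python. The function name and the three-band toy counts are hypothetical and are not taken from the study data; the sketch assumes that every age band of the standard population is non-empty. The result is multiplied by 100, so that 100 indicates the same mortality as the standard population.

```python
def smr(area_pop, area_deaths, std_pop, std_deaths):
    """SMR x 100 for one small area (indirect standardization).

    area_pop    : age-specific population of the small area
    area_deaths : age-specific deaths of the small area
    std_pop     : age-specific population of the standard population
    std_deaths  : age-specific deaths of the standard population
    """
    observed = sum(area_deaths)
    # Expected deaths: area population times the standard age-specific rate.
    expected = sum(t * D / T for t, D, T in zip(area_pop, std_deaths, std_pop))
    return 100.0 * observed / expected


# Hypothetical toy data with three age bands (illustration only).
std_pop = [1000, 2000, 1000]
std_deaths = [1, 4, 20]
area_pop = [100, 200, 100]
area_deaths = [0, 1, 3]

smr_value = smr(area_pop, area_deaths, std_pop, std_deaths)
```

In this toy example, 4 deaths were observed against 2.5 expected, so the area has 60% more deaths than expected under the standard rates.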
We followed the method presented in the previous study for calculating the standard error (SE) and 95% confidence interval (CI) of SMR (5).
(see Equations 2 and 3 in the Supplementary Files)
CMF was calculated by dividing the expected number of deaths in the standard population by the number of observed deaths in the standard population. The expected number of deaths in the standard population was calculated by multiplying the age-specific mortality rate of each small area by the age-specific population of the standard population. The standard population used in the calculation of CMF was also the total population of this study. Equation (4) was used to calculate CMF. (see Equation 4 in the Supplementary Files)
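The direct standardization described above can be sketched in the same way. The function name and the toy counts are again hypothetical. The sketch assumes that every age band of the small area has a non-zero population, since an age-specific rate cannot be computed otherwise.

```python
def cmf(area_pop, area_deaths, std_pop, std_deaths):
    """CMF x 100 for one small area (direct standardization).

    The age-specific mortality rates of the small area are applied to the
    standard population, and the resulting expected deaths are divided by
    the deaths actually observed in the standard population.
    """
    expected_in_std = sum(
        T * d / t for T, d, t in zip(std_pop, area_deaths, area_pop)
    )
    observed_in_std = sum(std_deaths)
    return 100.0 * expected_in_std / observed_in_std


# Hypothetical toy data with three age bands (illustration only).
std_pop = [1000, 2000, 1000]
std_deaths = [1, 4, 20]
area_pop = [100, 200, 100]
area_deaths = [0, 1, 3]

cmf_value = cmf(area_pop, area_deaths, std_pop, std_deaths)
```

Because the age structure of this toy area is proportional to that of the standard population, the CMF equals the SMR computed from the same counts. The two metrics diverge when the age structures differ.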
We used equations (5), (6), and (7) to estimate the SE and 95% CI of CMF (5). (see Equations 5-7 in the Supplementary Files)
We multiplied the calculated values of SMR and CMF by 100 so that a value of 100 indicates the same mortality as the standard population, which makes the results easier to interpret and conveys more detail (5, 30).
LE is often calculated by a deterministic approach (31). Sampling variation is not an essential issue when calculating LE at national or regional levels (1). At the small-area level, however, the number of deaths is subject to stochastic variation over time, so sampling variation must be considered (1, 16). Calculating the SE of LE also helps to determine how many years of data must be combined to achieve an appropriate level of precision (1). Chiang assumed that the number of deaths follows a binomial distribution, calculated the SE of the probability of dying in each age interval, and carried it through to the LE calculation (as cited in (32)). Eayres and Williams reported that the binomial and Poisson assumptions for the distribution of deaths produced results with a high level of agreement, but argued that the binomial assumption is preferable when analyzing LE at the small-area level (33). We performed Monte Carlo simulations using the probability of dying from an abridged life table to generate binomially distributed numbers of deaths (1, 32). The simulation was performed 10,000 times for each small area. LE was calculated from each simulated set of deaths, which generated a distribution of LE for each small area. The mean value of this distribution was defined as the LE of the small area. The 2.5th and 97.5th percentiles of the distribution were defined as the lower and upper limits of the CI of LE, respectively. No imputation was conducted even if the number of deaths for a specific age band was zero (33, 34). There was no small area where the number of deaths in the final age band (85+) was zero.
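The simulation procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact implementation used in this study: the fraction of the interval lived by those who die (0.1 for age 0 and 0.5 elsewhere), the implied number at risk (observed deaths divided by the probability of dying), and the simple index-based percentiles are all assumptions made for the sketch, as are the function names and the toy counts. The open-ended final band is treated as having no sampling variation, and bands with zero deaths are left at zero without imputation.

```python
import random


def death_probabilities(pop, deaths, widths, a):
    """Probability of dying in each closed age band of an abridged life table."""
    q = []
    for p_i, d_i, n_i, a_i in zip(pop, deaths, widths, a):
        m = d_i / p_i  # age-specific death rate
        q.append(n_i * m / (1.0 + n_i * (1.0 - a_i) * m))
    return q


def life_expectancy(q, widths, a, last_rate):
    """Life expectancy at birth, using a radix of 1."""
    alive, person_years = 1.0, 0.0
    for q_i, n_i, a_i in zip(q, widths, a):
        dying = alive * q_i
        person_years += n_i * (alive - dying + a_i * dying)
        alive -= dying
    return person_years + alive / last_rate  # open-ended final band


def simulate_le(pop, deaths, widths, a, n_sim=10000, seed=0):
    """Mean and 2.5th/97.5th percentiles of simulated life expectancy."""
    rng = random.Random(seed)
    last_rate = deaths[-1] / pop[-1]
    q = death_probabilities(pop[:-1], deaths[:-1], widths, a)
    # Assumption: implied number at risk so that deaths = at_risk * q.
    at_risk = [round(d / q_i) if q_i > 0 else 0 for d, q_i in zip(deaths, q)]
    draws = []
    for _ in range(n_sim):
        q_sim = []
        for n, q_i in zip(at_risk, q):
            # Binomial draw as a sum of Bernoulli trials (slow but dependency-free).
            d_sim = sum(rng.random() < q_i for _ in range(n))
            q_sim.append(d_sim / n if n else 0.0)
        draws.append(life_expectancy(q_sim, widths, a, last_rate))
    draws.sort()
    lower = draws[int(0.025 * n_sim)]
    upper = draws[int(0.975 * n_sim) - 1]
    return sum(draws) / n_sim, lower, upper


# Hypothetical toy area with 19 age bands (0, 1-4, 5-9, ..., 80-84, 85+).
widths = [1, 4] + [5] * 16
a = [0.1] + [0.5] * 17
pop = [400, 1600] + [2000] * 16 + [600]
deaths = [1, 0, 0, 0, 1, 1, 1, 1, 2, 3, 4, 6, 9, 13, 20, 30, 48, 75, 70]

# Fewer replications than the 10,000 used in the study, to keep the toy run fast.
mean_le, lower, upper = simulate_le(pop, deaths, widths, a, n_sim=200)
```

For the full 10,000 replications per small area, a vectorized binomial sampler would be a more practical choice than the Bernoulli loop shown here.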
We set up a hypothetical situation with the same age-specific mortality rates across all small areas, applying the national age-specific mortality rates in 2015 to calculate SMR and to compare its distributions by urbanity. We also compared the ranking of areas by SMR, CMF, and LE, from the highest to lowest and from the lowest to highest. Lastly, we examined the ratio of CMF to SMR stratified by urbanity.
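The ranking comparison and the CMF-to-SMR ratio can be sketched as below. The function name and the four toy areas are hypothetical. The sketch assumes that ties are broken by position and that, for LE, the ranking is reversed so that rank 1 denotes the highest mortality (the lowest LE) in all three metrics.

```python
def rank_descending(values):
    """Rank 1 = largest value; ties are broken by original position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical metrics for four small areas (illustration only).
smr_values = [160.0, 95.0, 120.0, 80.0]
cmf_values = [150.0, 100.0, 130.0, 78.0]
le_values = [76.1, 82.0, 79.5, 84.3]

smr_rank = rank_descending(smr_values)
cmf_rank = rank_descending(cmf_values)
# Lowest LE corresponds to highest mortality, so LE is ranked on its negative.
le_rank = rank_descending([-x for x in le_values])

# Ratio of CMF to SMR for each area; values far from 1 flag areas where the
# two standardization methods disagree.
cmf_to_smr = [c / s for c, s in zip(cmf_values, smr_values)]
```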