Background and Aim. Garbage codes (GCs) in death surveillance data may distort cause-of-death statistics and the public health policies based on them. Redistributing a GC means reassigning it to a plausible underlying cause of death (UCOD). Methods include expert consultation, fixed proportional reassignment, computed proportional reassignment based on information from the cause-of-death chain, and regression models. The Global Burden of Disease (GBD) study has used several such methods to redistribute various GCs in death data sets from around the world. In this study, using heart failure as an example, we aimed to examine GC redistribution in relatively small city-level death data sets and the suitability of particular redistribution methods.
Methods. We collected cause-of-death surveillance data from two Chinese cities, Weifang and Xuanwei, and checked and improved data quality before analysis. We extracted the death records attributed to heart failure and manually corrected their UCOD based on the cause-of-death chain information, following the rules and guidelines for morbidity coding established by the World Health Organization (WHO). We then redistributed the records whose UCOD remained heart failure using two approaches: coarsened exact matching and linear regression. Finally, we calculated cause-specific mortality before and after redistribution to observe the changes.
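The matching-based redistribution step can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes that records are matched on sex and on age coarsened into 10-year bins, and that heart failure deaths in each stratum are shared among the non-garbage causes in proportion to their observed counts in that stratum. The record layout, the bin width, and the function names coarsen_age and redistribute are all hypothetical.

```python
from collections import Counter, defaultdict


def coarsen_age(age, width=10):
    """Coarsen an exact age into the lower bound of its bin (bin width is an assumption)."""
    return (age // width) * width


def redistribute(records, gc_label="heart failure"):
    """Redistribute garbage-coded deaths within matched strata.

    records: iterable of (age, sex, cause) tuples (hypothetical layout).
    Returns (fractional death counts per cause, number of unmatched GC deaths).
    """
    strata = defaultdict(Counter)  # (age_bin, sex) -> counts of non-GC causes
    gc_counts = Counter()          # (age_bin, sex) -> number of GC deaths
    for age, sex, cause in records:
        key = (coarsen_age(age), sex)
        if cause == gc_label:
            gc_counts[key] += 1
        else:
            strata[key][cause] += 1

    # Start from the observed counts of the non-GC causes.
    totals = Counter()
    for causes in strata.values():
        totals.update(causes)
    totals = {cause: float(n) for cause, n in totals.items()}

    unmatched = 0
    for key, n_gc in gc_counts.items():
        targets = strata.get(key)
        if not targets:
            # No non-GC deaths share this stratum, so nothing to match against.
            unmatched += n_gc
            continue
        denom = sum(targets.values())
        for cause, n in targets.items():
            totals[cause] += n_gc * n / denom
    return totals, unmatched


# Illustrative records only, not study data.
records = [
    (72, "M", "IHD"), (75, "M", "IHD"), (78, "M", "IHD"), (71, "M", "COPD"),
    (74, "M", "heart failure"), (76, "M", "heart failure"),
    (45, "F", "heart failure"),
]
totals, unmatched = redistribute(records)
print(totals, unmatched)
```

In this toy input, the two heart failure deaths among men aged 70 to 79 are split 3:1 between IHD and COPD, while the death in the stratum with no matching records is left unassigned. How such unmatched records are handled in practice (for example, by coarsening further) is a design choice that this sketch leaves open.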
Results. There were 1556 deaths (0.33%) with heart failure recorded as the UCOD in Weifang and 226 (0.41%) in Xuanwei. After manual correction, the UCOD remained unchanged in about 75% of these records in both cities. In Weifang, with coarsened exact matching, heart failure was mainly redistributed to ischemic heart disease (IHD, 45.31%), hypertensive heart disease (HHD, 21.56%) and chronic obstructive pulmonary disease (COPD, 8.98%), but the largest relative increases in death counts were for HHD, rheumatic heart disease (RHD) and cardiovascular diseases other than IHD, HHD, RHD and stroke (CD), at 3.288%, 2.451% and 1.619%, respectively. With linear regression, heart failure was almost entirely redistributed to IHD (91.20%), and the death counts for CD and IHD increased by 1.213% and 0.929%, respectively. In Xuanwei, with coarsened exact matching, heart failure was mainly redistributed to IHD (24.70%), diabetes mellitus and chronic kidney disease (DMCKD, 23.25%) and COPD (16.10%), but the largest relative increases in death counts were for HHD, DMCKD and RHD, at 7.786%, 4.107% and 2.156%, respectively. With linear regression, heart failure was almost entirely redistributed to COPD (94.83%), whose death count increased by 1.622%.
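The results above report two different quantities: the share of redistributed heart failure deaths that a target cause receives, and the relative increase in that cause's death count. A cause with a small baseline can show the largest relative increase even when it receives a smaller share. The sketch below illustrates the arithmetic with purely hypothetical numbers that are not taken from the study data; the function name relative_increase is likewise an assumption.

```python
def relative_increase(baseline_deaths, received_deaths):
    """Percentage increase in a cause's death count after redistribution."""
    return 100.0 * received_deaths / baseline_deaths


# Hypothetical values for illustration only.
n_redistributed = 1000                     # heart failure deaths to reassign
share = {"IHD": 0.45, "HHD": 0.22}         # fraction of those deaths each cause receives
baseline = {"IHD": 100000, "HHD": 5000}    # death counts before redistribution

increase = {
    cause: relative_increase(baseline[cause], n_redistributed * share[cause])
    for cause in share
}
for cause, pct in increase.items():
    print(f"{cause}: receives {share[cause]:.0%} of redistributed deaths, "
          f"count rises by {pct:.2f}%")
```

Under these assumed numbers, IHD receives about twice as many redistributed deaths as HHD, yet the HHD count rises roughly ten times as much in relative terms because its baseline is far smaller.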
Conclusions. In cities with 1 to 10 million permanent residents, if a given GC accounts for only a small percentage of the death data set, whether it needs to be redistributed at all is worth discussing. If it is redistributed, coarsened exact matching is probably more suitable than linear regression, as linear regression may concentrate the redistributed deaths inappropriately on a few target diseases. The fundamental way to improve the quality of death data is to improve the capacity of primary-level staff in UCOD identification and coding.