National cause-of-death data are widely used to describe the health of populations.[1] These data are exhaustive and collected in a standardized fashion, which allows international comparisons.[2] They are extracted from medical death certificates, on which certifiers (physicians or coroners) are asked to describe the causal sequence leading to death. These data have been used to assess associations between diseases in the general population,[3–9] although the difficulties of this study design have long been emphasized.[10–12] For example, the risk of suicide in patients with Parkinson’s disease was estimated in an often-cited study based on death certificate data.[5] The authors found a 10-fold lower risk of suicide in people with Parkinson’s disease than in other individuals who died. However, prospective studies instead found a two- to five-fold higher suicide risk in these patients.[13,14] The design used in the first study (estimating associations between health states in the general population from death certificates) is subject to two main types of bias, which could explain such misleading findings.
Collider bias
Studies based on death certificate data are conducted on samples that are not representative of the general population: even if all deaths are reported, no information is available on living individuals. Inclusion in the study population is therefore conditioned on death, which is a common effect of the diseases under study (defined among causes of death). Such a common effect is called a collider (Figure 1). The resulting selection bias is called “collider bias”, or “bias due to conditioning on a collider”, and can strengthen or reverse associations between the variables of interest.[15,16]
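The mechanism can be illustrated with a minimal simulation sketch. Everything in it is a hypothetical assumption chosen for illustration (two independent diseases A and B, each with 10% prevalence, each adding 0.30 to a baseline death probability of 0.02); it is not the model used in this paper. It shows only that restricting the analysis to deaths can produce an inverse association between two diseases that are unrelated in the whole population.

```python
import random


def odds_ratio_from_pairs(pairs):
    """Odds ratio between two binary variables, given a list of (a, b) pairs."""
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    return (n11 * n00) / (n10 * n01)


def simulate(n=200_000, seed=0):
    """Return (OR in the whole population, OR among those who died)."""
    rng = random.Random(seed)
    population, deaths = [], []
    for _ in range(n):
        # Hypothetical assumption: diseases A and B occur independently.
        a = rng.random() < 0.10
        b = rng.random() < 0.10
        # Hypothetical assumption: each disease adds 0.30 to the death probability.
        p_death = 0.02 + 0.30 * a + 0.30 * b
        population.append((a, b))
        if rng.random() < p_death:
            deaths.append((a, b))  # selection: only deaths are "observed"
    return odds_ratio_from_pairs(population), odds_ratio_from_pairs(deaths)


if __name__ == "__main__":
    or_population, or_deaths = simulate()
    print(f"OR in whole population: {or_population:.2f}")
    print(f"OR among deaths only:   {or_deaths:.2f}")
```

Under these assumed parameters, the odds ratio in the whole population should sit close to 1, whereas the odds ratio among deaths should be well below 1 (analytically about 0.62 × 0.02 / 0.32², roughly 0.12), even though neither disease influences the other.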
Reporting bias
Studies on death certificate data are also subject to information bias, which we hereafter refer to as “reporting bias”. This bias results from (1) the task assigned to the certifier, who must report only the diseases and events that actually contributed to death, rather than all diseases present before death, and (2) possible incompleteness in filling out death certificates (which depends, among other things, on how well the certifier knows the deceased patient and their medical history).[17]
Aim and organisation of the paper
Seminal literature that warned of the risks of using comprehensive mortality data to assess associations between diseases offered only suggestions for reducing these risks, without giving deep insight into the mechanisms of the biases involved. The general purpose of this paper was to assess to what extent associations of causes of death estimated from individual mortality data can be extrapolated to the general population, given collider and reporting biases. As an illustrative example, we estimated the association between cancer and suicide in death certificates according to cancer site and assessed the order of magnitude of the collider and reporting biases. In the first section of the paper, we describe how multiple causes-of-death data are constructed, from medical certification to medical coding of causes of death (including the international rules for the selection of the underlying cause of death). We also describe the framework for the assessment of associations between causes of death and the biases involved in such studies. In the second section, we present the methods and results of our illustrative example on cancer and suicide. Finally, we conclude with recommendations for future studies and discuss how to improve the use of multiple causes-of-death data.
Analyses of cause of death associations in death certificates
Data from death certificates
Medical certification of death is mandatory in most industrialized countries and must be performed by a physician or a coroner. The World Health Organization [WHO] has designed the structure of the international medical death certificate with two parts: Part I is dedicated to the description of the causal sequence of events that directly led to death, and Part II to the reporting of significant morbid conditions that may have contributed to death but are not involved in that sequence (Figure 2).
The WHO defines the underlying cause of death as “the disease or injury which initiated the train of morbid events leading directly to death or the circumstances of the accident or violence which produced the fatal injury”. Selection of the underlying cause of death is performed automatically by software (e.g., Iris)[18] or based on the expertise of a mortality medical coder (or “nosologist”) for the most complex cases. This selection is governed by several rules prescribed by the WHO in the tenth revision of the International Statistical Classification of Diseases and Related Health Problems [ICD-10].[19] The main rule, called the “General Principle”, states that “when more than one condition is entered on the certificate, […] the condition entered alone on the lowest used line of Part I” (i.e., the first condition mentioned in the train of morbid events leading to death) must be selected as the underlying cause of death, “only if it could have given rise to all the conditions entered above it” (i.e. to the subsequent conditions of the train of morbid events leading to death). If the General Principle does not apply, Rules 1 and 2 state that the originating cause of the immediate (or final) cause of death, mentioned first in the train of morbid events leading to death, has to be selected as the underlying cause of death. Finally, Rule 3 states that “if the condition selected by the [previous rules] was obviously a direct consequence of another reported condition, whether in Part I or Part II”, this condition has to be selected as the underlying cause of death. For instance, HIV disease and external causes of death can meet Rule 3.
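The General Principle quoted above can be sketched in a few lines of code. This is a deliberately simplified illustration, not the logic of Iris or of any coding software: the function `general_principle`, the representation of Part I as a list of lines, and the toy `TOY_CAUSAL_PAIRS` lookup are all hypothetical names introduced here, and the lookup is not an ICD-10 table. Rules 1 to 3 are not implemented; the sketch simply returns `None` when the General Principle does not apply.

```python
from typing import Callable, List, Optional


def general_principle(
    part1: List[List[str]],
    can_cause: Callable[[str, str], bool],
) -> Optional[str]:
    """Apply a simplified General Principle.

    part1 lists the lines of Part I in certificate order (line a first);
    each line is the list of conditions entered on it (empty if unused).
    Returns the selected condition, or None if the principle does not apply.
    """
    used_lines = [line for line in part1 if line]
    if not used_lines:
        return None
    lowest = used_lines[-1]
    if len(lowest) != 1:
        return None  # the condition must be entered alone on the lowest used line
    candidate = lowest[0]
    conditions_above = [c for line in used_lines[:-1] for c in line]
    # The candidate must be able to give rise to every condition entered above it.
    if all(can_cause(candidate, c) for c in conditions_above):
        return candidate
    return None


# Hypothetical toy lookup of accepted causal relations (NOT an ICD-10 table).
TOY_CAUSAL_PAIRS = {
    ("lung cancer", "pneumonia"),
    ("lung cancer", "respiratory failure"),
}


def toy_can_cause(cause: str, effect: str) -> bool:
    return (cause, effect) in TOY_CAUSAL_PAIRS


if __name__ == "__main__":
    certificate = [["respiratory failure"], ["pneumonia"], ["lung cancer"], []]
    print(general_principle(certificate, toy_can_cause))
```

With this toy lookup, the example certificate selects "lung cancer", because it is entered alone on the lowest used line and is accepted as a possible cause of both conditions above it. Replacing the last used line with a condition absent from the lookup makes the function return `None`, which is the situation in which Rules 1 and 2 would come into play.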
Framework for the assessment of associations between causes of death and the biases involved
Data from death certificates can be used to assess associations between health states (diseases and/or injuries) mentioned as causes of death. Standardised mortality ratios are a tool to assess such associations.[10,20] Multivariable logistic regression models can also be used, allowing adjustment for potential confounders. Odds ratios [OR], resulting from these models, convey information concerning both the direction of the association (the risk is higher if OR>1 or lower if OR<1) and its magnitude. When the prevalence of the assessed outcome is low, the OR is a good approximation of the relative risk and can be interpreted accordingly.[10]
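The statement that the OR approximates the relative risk when the outcome is rare can be checked with a short worked example. The two 2×2 tables below are invented numbers chosen for illustration, not data from this study; the function names `odds_ratio_2x2` and `relative_risk_2x2` are likewise introduced here.

```python
def odds_ratio_2x2(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed with/without outcome;
    c, d = unexposed with/without outcome."""
    return (a * d) / (b * c)


def relative_risk_2x2(a, b, c, d):
    """Relative risk for the same 2x2 table layout."""
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Rare outcome (invented counts): 20/10,000 exposed vs 10/10,000 unexposed.
    rare = (20, 9980, 10, 9990)
    # Common outcome (invented counts): 600/1,000 exposed vs 300/1,000 unexposed.
    common = (600, 400, 300, 700)
    for label, table in (("rare", rare), ("common", common)):
        print(label, "RR =", round(relative_risk_2x2(*table), 3),
              "OR =", round(odds_ratio_2x2(*table), 3))
```

In both tables the relative risk is 2, but the OR is about 2.002 for the rare outcome and 3.5 for the common one, which is why the OR can be read as a relative risk only when the outcome is uncommon.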
Assessment of collider bias
Collider bias arises from conditioning the study sample on death. A multistate model based on national health statistics can be used to generate populations of individuals and simulate their health states up to death. Associations between health states can then be estimated from these simulated deaths (with logistic regression models, in the same way as with observed deaths). Because the simulated deaths artificially replicate the conditioning on death, they reproduce the collider bias, which can thus be assessed. Collider bias can then be estimated from the following ratio:
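One form of this ratio that is consistent with the description above is sketched here. It is an assumption and not necessarily the authors' exact expression: the labels $OR_{\text{simulated deaths}}$ (the odds ratio estimated among simulated deaths) and $OR_{\text{simulated population}}$ (the odds ratio in the whole simulated population, before conditioning on death) are introduced for illustration.

$$
\text{Collider bias} \approx \frac{OR_{\text{simulated deaths}}}{OR_{\text{simulated population}}}
$$

Under this reading, a ratio of 1 would indicate no collider bias, and values further from 1 would indicate a stronger distortion.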

Multiplicative measures of bias are better suited to this context, in which associations are expressed on the multiplicative scale (ORs).
Assessment of reporting bias
The magnitude of reporting bias can be approximated by comparing the estimates obtained from observed death certificates (which are subject to both collider and reporting biases) with those obtained from simulated deaths (which are subject to collider bias only). Reporting bias can then be approximated from the following ratio:
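A form of this ratio consistent with the comparison just described is given here as an assumption, not necessarily the authors' exact expression. The labels $OR_{\text{observed deaths}}$ (the odds ratio estimated from observed death certificates) and $OR_{\text{simulated deaths}}$ (the odds ratio estimated from simulated deaths) are introduced for illustration.

$$
\text{Reporting bias} \approx \frac{OR_{\text{observed deaths}}}{OR_{\text{simulated deaths}}}
$$

Because both estimates are conditioned on death, the collider bias they share would cancel in this ratio, leaving an approximation of the reporting component alone.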

However, the two sources of reporting bias cannot be distinguished from one another: (1) the difference in definition between associations of diseases and associations of causes of death, and (2) the incomplete filling out of death certificates by certifiers.
Illustrative example: Association between cancer and suicide in death certificates in France
Suicide is a major public health issue, accounting for 1.4% of all deaths worldwide.[21] The impact of psychiatric diseases (notably, depression, anxiety, and psychotic disorders)[22] is well known, but somatic disorders may also play a role in the occurrence of suicide deaths. Cancer, through its impact on health, the adverse effects of treatment, and stigma, can substantially reduce quality of life and promote the onset of suicidal ideation and suicide deaths. This phenomenon can vary with the prognosis of the cancer site, notably after the diagnosis is received.[23]
Our illustrative example is based on French multiple causes-of-death data. Data from death certificates are commonly used to study suicide mortality and its determinants, with various study designs: ecological studies,[24] studies based on disease registries,[25] and analyses of cause-of-death associations.[5,6] Inclusion in our study population was structurally conditioned on death, a common effect of cancer (the exposure) and suicide (the outcome), i.e., a collider (Figure 1). We first measured the cancer/suicide association in death certificates, according to cancer site, and then used simulations to assess the magnitude of the collider and reporting biases.