Life expectancy using flexible parametric methods
Flexible parametric methods, first introduced in 2002 [9] and extended further in 2009 [10], allow a parametric form to be specified for the baseline hazard function. Whilst simple parametric models can be difficult to apply to real-world data because they cannot model fluctuations or turning points in the hazard function, flexible parametric methods use restricted cubic splines to flexibly and smoothly capture the shape of the baseline log (cumulative) hazard function over time. The spline function is defined by constrained cubic polynomial functions forced to join at a pre-selected number of joining points, called knots. The number of knots determines the complexity of the function, often expressed as degrees of freedom. Knots are usually spaced equally as percentiles across the distribution of event times. For example, the default knot placement for flexible parametric methods in the statistical package Stata (‘stpm2’) [10, 11] for four degrees of freedom is at the 0th, 25th, 50th, 75th and 100th percentile of the event distribution for a specific population. In this study, we used age as the timescale of interest as this is a natural choice for life expectancy calculations and aligns with previous research in this area [12-15]. The knots were, therefore, placed according to the distribution of age at death in the study population.
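As a rough illustration of the knot placement and spline construction described above, the following Python sketch places knots at chosen percentiles of the ages at death and builds a restricted cubic spline basis. The function names, the simulated ages at death and the use of untransformed age as the spline argument are assumptions made for illustration only; this is not the ‘stpm2’ implementation.

```python
import numpy as np

def place_knots(event_ages, percentiles=(0, 25, 50, 75, 100)):
    """Return knot positions at the given percentiles of the event ages."""
    return np.percentile(np.asarray(event_ages, dtype=float), percentiles)

def rcs_basis(x, knots):
    """Restricted cubic spline basis, constrained to be linear beyond the boundary knots.

    With K knots this returns K - 1 columns (the degrees of freedom):
    one linear term plus one term per interior knot.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    k_min, k_max = k[0], k[-1]

    def pos3(u):
        # Truncated cubic: u^3 where u > 0, otherwise 0.
        return np.clip(u, 0.0, None) ** 3

    cols = [x]
    for kj in k[1:-1]:
        lam = (k_max - kj) / (k_max - k_min)
        cols.append(pos3(x - kj) - lam * pos3(x - k_min) - (1 - lam) * pos3(x - k_max))
    return np.column_stack(cols)

# Hypothetical ages at death, simulated purely for illustration.
rng = np.random.default_rng(1)
ages_at_death = rng.normal(80, 10, size=500).clip(20, 105)

knots = place_knots(ages_at_death)                   # five knots
basis = rcs_basis(np.linspace(20, 110, 10), knots)   # four columns = four degrees of freedom
```

Five knots give four basis columns, matching the four degrees of freedom in the example above.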
The formula for the survival function for flexible parametric methods and Stata code (v16.0) for calculating life expectancy are shown in the supplementary material (Box S1 and S2). Additional years expected to live from a given age, a, to a maximum age, w, can be estimated by fitting a flexible parametric model and integrating under the survival function curve (i.e. calculating the area under the curve) from age a to age w, scaled by the survival function (i.e. proportion alive) at age a, using the formula below, where S(u) denotes the survival function at age u.
$$\text{Additional years expected to live from age } a = \frac{1}{S(a)} \int_{a}^{w} S(u)\, du$$
This can be approximated using numerical integration techniques in statistical software.
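A minimal sketch of such a numerical approximation is given here in Python, using the trapezoidal rule. The function names and the Gompertz survival function (with its parameter values) are hypothetical stand-ins for a fitted model; in practice the survival function would come from the fitted flexible parametric model.

```python
import numpy as np

def remaining_life_expectancy(survival, a, w, n_points=2001):
    """Approximate (1 / S(a)) * integral of S(u) du from a to w with the trapezoidal rule."""
    ages = np.linspace(a, w, n_points)
    s = survival(ages)
    area = np.sum((s[1:] + s[:-1]) / 2.0 * np.diff(ages))  # area under the curve from a to w
    return area / s[0]                                     # scale by the proportion alive at age a

def gompertz_survival(age, alpha=2e-5, beta=0.1):
    """Hypothetical survival function with hazard alpha * exp(beta * age)."""
    age = np.asarray(age, dtype=float)
    return np.exp(-(alpha / beta) * (np.exp(beta * age) - 1.0))

extra_years = remaining_life_expectancy(gompertz_survival, a=40, w=110)
life_expectancy_at_40 = 40 + extra_years
```

A finer grid (larger `n_points`) reduces the approximation error at the cost of more evaluations of the survival function.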
The example in Figure 1 illustrates how life expectancy of people aged 40 years in a given population can be calculated using flexible parametric methods. The figure shows the survival function for all ages after fitting a flexible parametric model with 4 degrees of freedom using age as the timescale. The survival function at age 40 years, denoted by the horizontal red line, is 0.9879. The additional years expected to live is the integral of the survival function from age 40 to the maximum age (w=110 years), conditional on surviving to age 40. The area under the curve is 41.96 years; dividing this by the survival function at age 40 gives 41.96/0.9879 ≈ 42.5 years. Therefore, people aged 40 years in this population can expect, on average, to live for an additional 42.5 years and have an overall average life expectancy of 82.5 years.
A period life expectancy estimate is given in a similar way to Chiang’s methods [3, 4], by combining the current age-specific rates of mortality in the most recent calendar year to calculate life expectancy. Parametric models are particularly good for this as we are able to model a smooth representation across age, borrowing strength accordingly.
The theoretical advantages of flexible parametric methods over traditional life expectancy approaches include the modelling of life expectancy with greater statistical precision because fewer parameters are used. For example, taking 5-year age groups from birth to 75+ years would require 15 parameters under the Chiang approach as compared with the five parameters described in the example. The methods also allow age to be modelled as exact age, which means that individual-level average risk can be estimated, and the age restrictions at the tail of the distribution do not apply because age is not truncated. Flexible parametric approaches also have the potential to allow prediction for different covariate patterns by modelling age-varying effects using interaction terms between the covariates and the spline variables for age [16].
Data sources
The case example for this work used the Clinical Practice Research Datalink (CPRD GOLD), linked (person-level) with hospital episode statistics (HES) and death registrations from the Office for National Statistics (approved study protocol number: 19_267RA3). The CPRD is an electronic health record research database of more than 11.3 million patients, broadly representative of the national population in terms of age, gender, and ethnicity [17], from general practice (GP) surgeries in the UK. The study comprised GP surgeries in England only – of which approximately 75% consent to linkage to deaths data.
The study followed the Reporting of studies Conducted using Observational Routinely‐collected health Data (RECORD) checklist [18] (see supplementary Table S1). People with intellectual disabilities were identified from a pre-agreed set of primary care Read codes, which has been described in previous research (see supplementary Table S2) [19]. Diagnostic codes for Type 2 diabetes (T2DM) were identified using previous literature [20] and are described in supplementary Table S2. The initial extract from the CPRD has been described previously [19] and was based on the following inclusion criteria: registered at the GP surgery between 1 Jan 2000 and 29 Sept 2019, and 10 years old or over to account for delays in reporting of diagnoses of intellectual disability in children [21]. An additional 23 patients with Angelman or Cockayne syndrome were added in August 2021 after an amendment to the original protocol (approved March 2020 but delayed during the COVID period). A simple random sample of people without intellectual disabilities (initially 1 million from 2000–2019 before exclusions) was used for the comparison group with the same eligibility criteria (but without a diagnosis of intellectual disability).
Statistical analyses
For the purposes of this work and to align with standard period life expectancy calculations, we restricted the observation time to a one-year period (2012) (i.e. person-time only contributed to this calendar year). Date of entry into the cohort was defined as the latest date according to the person and practice’s characteristics: 01 Jan 2012; date of registration with the GP practice; date the practice was defined as being up to standard (using the CPRD’s own quality indicators); or date the individual turned 10 years old (to align with the eligibility criteria). Because there are known delays in reporting intellectual disability diagnoses [22] and to avoid conditioning on the future, intellectual disability status was treated as an age-dependent covariate such that people with intellectual disabilities contributed to the comparison cohort prior to their first diagnosis. T2DM status was also allowed to change during the observation period. Date of exit was defined as: date of death; date of end of calendar period; date of last practice update; date of transfer out of practice; or 31 Dec 2012, whichever was first. For T2DM, the baseline measure at 2012 was taken; people in the cohort who developed T2DM after 1 January 2012 were treated as being T2DM-free during the entire one-year period.
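The entry and exit rules above amount to taking the latest of the candidate entry dates and the earliest of the candidate exit dates. A minimal Python sketch follows; the function name, argument names and example dates are hypothetical, and the handling of people whose entry date falls after their exit date (returned here as `None`) is an assumption rather than a rule stated in the text.

```python
from datetime import date

PERIOD_START = date(2012, 1, 1)
PERIOD_END = date(2012, 12, 31)

def follow_up_window(registration, up_to_standard, tenth_birthday,
                     death=None, last_practice_update=None, transfer_out=None):
    """Return (entry, exit) dates within 2012, or None if no person-time is contributed."""
    # Entry: the latest of the period start and the person/practice dates.
    entry = max(PERIOD_START, registration, up_to_standard, tenth_birthday)
    # Exit: the earliest of the period end and any recorded exit events.
    recorded = [d for d in (death, last_practice_update, transfer_out) if d is not None]
    exit_ = min([PERIOD_END] + recorded)
    if entry > exit_:
        return None  # assumption: such a person contributes no person-time in 2012
    return entry, exit_

# Hypothetical person: registered in 2005, died in August 2012.
window = follow_up_window(
    registration=date(2005, 3, 1),
    up_to_standard=date(2003, 1, 1),
    tenth_birthday=date(1990, 6, 1),
    death=date(2012, 8, 15),
)
```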
Table 1 Summary of flexible parametric methods used to estimate life expectancy in people with and without intellectual disabilities and T2DM
| Name of method | Approach to analysis |
| --- | --- |
| Method 1: fully stratified | Stratified models of intellectual disabilities and T2DM |
| Method 2: partially stratified | Model, stratified by intellectual disabilities, T2DM interaction with age |
| Method 3: modelled covariates, age interactions | Intellectual disability interaction with age, T2DM interaction with age |
| Method 4: modelled covariates, age interactions – with adapted knots to minority groups | Intellectual disability interaction with age, T2DM interaction with age – knots forced to intellectual disability population |
Table 1 summarises the models compared. We estimated life expectancy and 95% confidence intervals by intellectual disability and T2DM status using: models fully stratified by intellectual disability and T2DM status (Method 1: fully stratified); models stratified by intellectual disability with T2DM as an age-varying covariate (Method 2: partially stratified); and a model for the entire population with both T2DM and intellectual disability as age-varying covariates, also fitting an interaction term (Method 3: modelled covariates, age interactions). Follow-up started from adulthood (age 20+ years) because T2DM is known to be relatively uncommon at younger ages [23], and life expectancy was reported for people aged 40+ years only. The models were also compared with Chiang’s abridged life table approach, stratified by intellectual disability and T2DM status.
For the flexible parametric models, life expectancy estimates and confidence intervals were calculated using the Delta method after fitting models with 4 degrees of freedom using ‘stpm2’ in Stata (v16.0) [11]. The default knot placements (0th, 25th, 50th, 75th and 100th percentile) were not used in this context (with age as the timescale) owing to the sparsity of events at the tails of the distributions, which sometimes prevented the models from converging and led to poor statistical precision in the older age groups. Instead, we used the knot placements ‘where data exist’ recommended by Harrell (2016) [24] (at the 5th, 27.5th, 50th, 72.5th and 95th percentile). We also calculated life expectancy in the entire population after forcing knot placements to match the event distribution in the intellectual disability population (Method 4: modelled covariates, age interactions with adapted knots to minority groups) – see Table 1. The variance of the estimate of life expectancy was calculated on the log scale in order to stabilise the variance and avoid negative confidence limits.
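To illustrate why working on the log scale avoids negative confidence limits, the Python sketch below builds an interval for a positive estimate by applying the Delta method to its logarithm and back-transforming. The function name and the example values are hypothetical, and the sketch assumes a standard error for the life expectancy estimate is already available; it does not reproduce the Stata calculation used in the study.

```python
import math

def log_scale_ci(estimate, std_error, z=1.959964):
    """95% interval built on the log scale and back-transformed.

    By the Delta method, se(log(e)) is approximately se(e) / e.
    """
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    se_log = std_error / estimate
    lower = estimate * math.exp(-z * se_log)
    upper = estimate * math.exp(z * se_log)
    return lower, upper

# Hypothetical imprecise estimate: 5 additional years with a standard error of 3 years.
naive_lower = 5.0 - 1.959964 * 3.0      # negative on the natural scale
lower, upper = log_scale_ci(5.0, 3.0)   # both limits are positive
```

The resulting interval is asymmetric around the estimate, which is expected when the limits are computed on the log scale.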
For older individuals with intellectual disabilities with/without T2DM, sample sizes could become very small. Therefore, confidence intervals were also compared with percentile-based bootstrapped confidence intervals in people aged 80–99 years. As an additional validation of the confidence intervals derived from the model, these were compared with both percentile-based and normal-based confidence intervals after bootstrapping in the larger sample without intellectual disabilities at the older ages (95–105 yrs).
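The two bootstrap intervals mentioned above can be sketched in Python as follows. The function name, the simulated data and the use of a simple mean as the statistic are assumptions for illustration; in the study the statistic would be the model-based life expectancy, re-estimated in each resample of individuals.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_cis(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Return (percentile_ci, normal_ci) for a statistic, resampling with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.array([statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)])

    alpha = 1.0 - level
    # Percentile-based interval: quantiles of the bootstrap replicates.
    lo, hi = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    percentile_ci = (float(lo), float(hi))

    # Normal-based interval: point estimate +/- z * bootstrap standard error.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    estimate = float(statistic(data))
    se = float(reps.std(ddof=1))
    normal_ci = (estimate - z * se, estimate + z * se)
    return percentile_ci, normal_ci

# Hypothetical remaining years of life in a small group (simulated, not study data).
rng = np.random.default_rng(42)
remaining_years = rng.gamma(shape=2.0, scale=3.0, size=60)
pct_ci, norm_ci = bootstrap_cis(remaining_years, np.mean)
```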
For Chiang’s abridged life table approach, confidence intervals were calculated using the adjusted Chiang approach advocated by Eayres & Williams [7], which involves adding a correction term to the original Chiang variance to incorporate length of survival in the last age group [25]. When calculating life expectancies in small areas, Chiang’s methods with the adjusted variance are recommended over Silcocks’ methods [7].