The miscalibration in the validation dataset, together with evidence that the model was already reflecting changes in patient characteristics, indicated that the secular trend could not be explained by changes in predictor variables alone. This supported modelling the secular trend in the development cohort, to try to remove the miscalibration in the validation cohort. The same Cox model defined by Eq. (1) was fitted to the development cohort, but with cohort entry date, referred to as calendar time, included as a variable. This is denoted by T0 in Fig. 2 (DAG-2) and Eq. (2). Fractional polynomials for this variable were tested using the mfp package.(11) Five year risks were generated for the validation cohort and the calibration of the models was assessed.
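As an illustration of what a fractional polynomial transformation of a variable such as calendar time involves, the following is a minimal Python sketch. It is not the mfp implementation (which also selects powers and rescales variables), and the helper names are hypothetical. It assumes the conventional power set, with power 0 read as the natural logarithm and a repeated power adding a log factor, and it assumes the variable has been shifted to be strictly positive.

```python
import math

# Conventional set of candidate powers; power 0 is read as log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """Single fractional polynomial transform of a strictly positive x."""
    if x <= 0:
        raise ValueError("fractional polynomials need x > 0")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x, p1, p2):
    """Two-term (FP2) basis; a repeated power adds a log(x) factor."""
    first = fp_term(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_term(x, p2)


def years_since(entry_year, origin_year, shift=1.0):
    """Hypothetical rescaling of cohort entry date to a positive number."""
    return entry_year - origin_year + shift
```

A candidate model would then include, for example, `fp2_terms(years_since(2005.5, 1998), 0.5, 0.5)` as two covariates in place of untransformed calendar time. The years used here are purely illustrative.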
Developing an MSM to assess secular trend after adjusting for statin use during follow up.
MSM – overview
A major concern was that an increase in statin use over time may have caused some of the reduction in CVD incidence. If the secular trend was driven by statin use, then modelling it (which would result in lower predicted risks) would make many patients ineligible for treatment even though their risk, if they remained untreated, was > 10%. Statin use at baseline could not have been driving this secular trend, as the development cohort only included patients who were statin free at baseline. However, patients could initiate statins during follow up. The aim of this section was therefore to assess whether the secular trend was still present after adjusting for statin use during follow up.
Consider Figure 3, where k = 0 denotes baseline and k = 1, 2 denote two time points during follow up (this could be extended to any number of time points). At each time point k, the DAG contains the statin treatment status Ak, the covariate information recorded prior to time k, and the calendar time Tk. Note that A0 is not included in DAG-3, as A0 = 0 by definition of the CVD primary prevention cohort. It is possible to adjust for post-baseline changes in covariates and statin treatment using standard regression techniques (such as an interval censored Cox model). This would give an estimate of the direct effect of calendar time on CVD incidence, that is, the portion not explained by changes in covariates and statin treatment during follow up. This would be sufficient for assessing whether the secular trend remained after adjusting for statin use during follow up. However, such a model could not be used for risk prediction, as a patient's future predictor values are not known at baseline. The proposed method to answer our question was therefore an MSM.
MSMs were developed to estimate the causal effect of a time dependent exposure on an outcome in an observational setting, where the treatment and outcome are confounded by time varying covariates.(13,14) Sperrin et al.(15) have shown how MSMs can be used to adjust for ‘treatment drop in’, the issue of patients starting treatment during follow up in a dataset being used for risk prediction. In the absence of unmeasured confounding, they allow for the estimation of risk under a specified treatment course A, where A denotes the entire treatment course during follow up (here, no statin treatment throughout), as opposed to risk conditional only on treatment status at baseline. The strategy involves adjusting for variables at baseline as normal and then re-weighting the population by variables that may be on the treatment causal pathway, breaking the links from the time varying covariates to treatment during follow up. In the resulting pseudo population, the allocation of treatment during follow up happens at random (within the levels of the variables defined at baseline). This allows risk scores to be generated using data at baseline only, while also accounting for statin use during follow up. Importantly for this study, if calendar time only affected the outcome Y through increasing statin use in follow up, then under an MSM the direct effect of T0 on Y would be zero, and adjusting for calendar time at baseline would not result in a drop in the average risk score of patients in the validation cohort.
This estimator is only valid under the three identifiability assumptions of causal inference (exchangeability, consistency and positivity), together with correct specification of both the marginal structural model and the model used to calculate the weights. The viability of these assumptions in this study is discussed in the limitations.
MSM – data derivation
The CVD primary prevention cohort was used as a starting point. However, in order to derive the MSM, patient information was extracted at 10 time points, at 6 month intervals from the cohort entry date, indexed by k = 0, 1, …, 9. At each time point k, the covariate information contained all the QRISK3 predictors evaluated at time k (for test data this was the most recent value prior to time k). Statin status was coded as Ak = 1 if a patient had initiated statin treatment prior to time k, and Ak = 0 otherwise. As patients were excluded from the cohort if they had received a statin prescription prior to their cohort entry date, A0 = 0 for all patients. If a CVD event happened within 6 months of a statin initiation, the statin initiation was ignored. This was to limit the effect of poorly recorded data (the start of statins may have been triggered by the CVD event).
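The coding of statin status on the 6 month grid can be sketched as follows. This is a minimal Python illustration rather than the extraction code used in the study. The function name and its inputs (months from cohort entry to statin initiation and to the CVD event) are hypothetical, and the sketch assumes that "within 6 months of a statin initiation" means a CVD event in the 6 months after the recorded initiation.

```python
def statin_status(init_month, cvd_month, n_points=10, step=6, window=6):
    """Return [A_0, ..., A_9] on a grid of 6 month intervals.

    init_month / cvd_month: months from cohort entry to statin initiation
    and to the CVD event, or None if it never happened (hypothetical inputs).
    """
    # Assumption: an initiation is ignored when a CVD event follows it
    # within `window` months.
    if (init_month is not None and cvd_month is not None
            and 0 <= cvd_month - init_month <= window):
        init_month = None
    # A_k = 1 if statins were initiated before time point k, else 0.
    return [int(init_month is not None and init_month < k * step)
            for k in range(n_points)]
```

For example, `statin_status(7, None)` gives a patient who is untreated at the first two time points and treated from the third onwards, whereas `statin_status(7, 10)` is coded as untreated throughout because the CVD event follows the initiation by 3 months.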
A key issue in deriving the dataset was missing data. A combination of imputation techniques was implemented to maintain consistency in variable information within each patient across the 10 time points. First, where possible, last observation carried forward imputation was implemented within each patient. Then, where possible, next observation carried backwards imputation was used to impute the remaining missing data. However, data were still missing for patients who had no entries for a given variable at any of the 10 time points. For these patients, the data at baseline were extracted and missing values were imputed using a single stochastic imputation. All predictor variables, the Nelson-Aalen estimate of the baseline hazard and the outcome indicator were included in the imputation model (the same process that was used to impute the data for the standard Cox model). These imputed baseline values were then used at each following time point (last observation carried forward imputation).
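The within-patient part of this procedure can be sketched as follows, as a minimal Python illustration with hypothetical function names. Missing values are represented by `None`, and the stochastically imputed baseline value is assumed to be supplied from a separate imputation step that is not shown.

```python
def locf_then_nocb(values):
    """Fill None entries: carry the last observation forward, then the next one back."""
    filled = list(values)
    # Last observation carried forward.
    for k in range(1, len(filled)):
        if filled[k] is None:
            filled[k] = filled[k - 1]
    # Next observation carried backwards for any remaining leading gaps.
    for k in range(len(filled) - 2, -1, -1):
        if filled[k] is None:
            filled[k] = filled[k + 1]
    return filled


def fill_patient(values, imputed_baseline):
    """Use the imputed baseline value at every time point if nothing was recorded."""
    if all(v is None for v in values):
        return [imputed_baseline] * len(values)
    return locf_then_nocb(values)
```

For a variable recorded only at the second and fourth time points, `locf_then_nocb([None, 120, None, 130, None])` returns `[120, 120, 120, 130, 130]`.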
MSM – calculation of weights and specification of model
The MSM was fitted as a weighted interval censored Cox model using the coxph function from the survival package.(16) The weights themselves were calculated using the ipw package.(17) Stabilised weights were calculated, as is common practice, because they are less variable than unstabilised weights and so give more precise estimates. For individual i, the formula for the weight of interval/time period K was defined as:
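In a standard form consistent with the description that follows, the stabilised weight can be sketched as below. The notation is an assumption rather than taken from the text: X_k stands for covariate information at time point k, lower case letters for the values observed for individual i, and an overbar for history up to and including the indexed time point.

```latex
SW_i(K) = \prod_{k=1}^{K}
  \frac{P\left(A_k = a_{ik} \mid \bar{A}_{k-1} = \bar{a}_{i,k-1},\; X_0 = x_{i0}\right)}
       {P\left(A_k = a_{ik} \mid \bar{A}_{k-1} = \bar{a}_{i,k-1},\; \bar{X}_k = \bar{x}_{ik}\right)}
```

Under this form, when calendar time at baseline T0 is included in the MSM, it is treated as part of the baseline information X_0 and therefore appears in both the numerator and the denominator.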
The weights depend on the treatment history and covariate history of individual i up to each time point k. More simply put, the denominator is the probability that the individual received the treatment they did, based on time varying predictors and predictors at baseline. The numerator is the probability that the individual received the treatment they did, based on predictors at baseline only. The models used to estimate the probability of treatment when deriving the weights were interval censored Cox models. If calendar time at baseline, T0, was included in the MSM, it was also included as a stabilising factor in the calculation of the weights, as part of the baseline predictors. Detailed information on how to calculate weights is also given in the literature(14,17,18) and the formula for calculating weights (and notation for variables) matches that from the work by Sperrin et al.(15)
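The accumulation of a stabilised weight over time periods can be sketched as follows. This is a minimal Python illustration with a hypothetical function name, not the ipw implementation. It assumes the per-period treatment probabilities have already been estimated (in the study these came from interval censored Cox models) and that treatment, once initiated, is coded as continuing, so that periods after initiation contribute a factor of 1.

```python
def stabilised_weights(treated, p_num, p_den):
    """Cumulative stabilised weights for one individual.

    treated[k]: treatment status A_k for k = 0..K (A_0 = 0 by design).
    p_num[k], p_den[k]: modelled probability of being on treatment at k,
    given no treatment before k, from the baseline-only model (numerator)
    and from the model with time varying covariates (denominator).
    Index 0 is unused.
    """
    weights, w = [1.0], 1.0
    for k in range(1, len(treated)):
        if treated[k - 1] == 0:  # still at risk of initiating treatment
            num = p_num[k] if treated[k] == 1 else 1.0 - p_num[k]
            den = p_den[k] if treated[k] == 1 else 1.0 - p_den[k]
            w *= num / den
        # Once treated, both probabilities equal 1, so the factor is 1.
        weights.append(w)
    return weights
```

With illustrative probabilities, a patient who initiates in the second period, `stabilised_weights([0, 0, 1, 1], [None, 0.1, 0.2, 0.2], [None, 0.2, 0.5, 0.5])`, accumulates the factors 0.9/0.8 and then 0.2/0.5, and the weight is unchanged afterwards.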
Two MSMs were created, one that adjusted for calendar time at baseline and one that did not:
The same fractional polynomials of age, BMI, SBP and calendar time that were found to be optimal in the standard Cox models were used in the MSM, and in the models used to calculate the weights. Ideally we would have re-calculated the optimal fractional polynomials for the weighted model fitted to the interval censored data, but software was not available to do this. Using the same fractional polynomials from the standard Cox analysis was preferred to having no fractional polynomials, as removing them led to poorly calibrated models. The coefficient of statin treatment in the MSM is the average causal effect of initiating statin treatment after adjusting for all other variables. It is quite common to allow the effect of statin treatment to be modified by baseline variables, which could be achieved by including interaction terms between statin treatment and baseline variables. However, the primary aim was to account for statin use in follow up, rather than to calculate the effect of statin treatment in different subgroups, so we did not feel this was necessary.
As a comparison, unweighted interval censored Cox models using only data at baseline (i.e. equation (1) and equation (2)) were fitted to the same data as the MSM. The effect of modelling the secular trend could then be assessed when using (interval censored) Cox regression, as well as under the MSM framework. This was preferred to re-using the standard Cox models directly, which were fitted to a different dataset.
MSM – analysis of interest
The MSM was used to generate risk predictions assuming no statin treatment at baseline or during follow up. The interval censored Cox model only produced risk predictions assuming no statin treatment at baseline. The outcome of interest was the risk ratio of the average predicted risk of patients in the validation cohort, before and after adjusting for calendar time at baseline, in the MSM framework. This was compared to the corresponding risk ratio, before and after adjusting for calendar time at baseline, in the unweighted interval censored Cox models.
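The comparison can be sketched as a ratio of average predicted risks, as in the minimal Python illustration below. The function names are hypothetical, and the direction of the ratio (model adjusting for calendar time over model not adjusting) is an assumption, since the text does not fix it.

```python
def mean_risk(risks):
    """Average predicted risk across patients."""
    return sum(risks) / len(risks)


def risk_ratio(risks_with_calendar_time, risks_without_calendar_time):
    """Ratio of average predicted risks: adjusted for calendar time over unadjusted.

    Assumption: a value below 1 means that adjusting for calendar time
    lowers the average predicted risk in the validation cohort.
    """
    return mean_risk(risks_with_calendar_time) / mean_risk(risks_without_calendar_time)
```

Computing this ratio once from the two MSMs and once from the two unweighted interval censored Cox models gives the two quantities being compared. If the secular trend acted only through statin use during follow up, the ratio under the MSM framework would be expected to be close to 1.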