Case study: Health and Retirement Study data
We created a nationally representative cohort of 5,531 community-dwelling seniors enrolled in the HRS who were 70 years old or older at the time of their baseline interview in 2000. The HRS is an ongoing longitudinal survey of a representative sample of all persons in the United States over age 50 that examines changes in health and wealth (22). It is sponsored by the National Institute on Aging (grant number NIA U01AG009740) and conducted by the University of Michigan. We used the public HRS data: the Cross-Wave Tracker file (23) and the RAND HRS data file (24, 25).
The pool of predictors included 39 health-related and demographic variables measured at baseline. All the predictors were categorical variables. We used 4 clinical outcomes encompassing 15 years of follow-up: (1) time to first ADL dependence (including five ADLs: bathing, dressing, toileting, transferring, and eating), (2) time to first IADL difficulty (including two IADLs: managing money and medication), (3) time to first mobility dependence, and (4) time to death.
The Best Average BIC (baBIC) method
Our proposed method, the Best Average BIC (baBIC) method, selected the best subset of common predictors for all 4 outcomes based on the minimum averaged normalized BIC across outcomes. We compared our method with: (1) a method that selects individual subsets of predictors for each outcome (Individual Outcome method), and (2) an enhanced method that creates a best subset of common predictors based on the union of individual subsets obtained in the Individual Outcome approach (Union method).
Information criteria such as the BIC are useful for selecting the best subset of predictors because they allow fair comparisons both within sets of a fixed size and across predictor sets of varying sizes. In contrast, statistics such as the concordance (C) statistic are less useful for selection across sets with different numbers of predictors because, in general, models with more predictors tend to have higher C-statistics than models with fewer predictors. Another advantage of the BIC is that it tends to select more parsimonious models, since it penalizes larger models more heavily than, for example, the AIC.
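To make the penalty comparison concrete, the following sketch (in Python, with hypothetical log-likelihoods; the paper's analyses used SAS) contrasts the AIC's constant penalty of 2 per parameter with the BIC's ln(n) penalty, which is about 8.6 at the HRS sample size of n = 5,531:

```python
import math

def aic(log_likelihood, k):
    # AIC penalizes each parameter by a constant 2
    return -2 * log_likelihood + 2 * k

def bic(log_likelihood, k, n):
    # BIC penalizes each parameter by ln(n), which exceeds 2 once n > 7
    return -2 * log_likelihood + k * math.log(n)

# Hypothetical log-likelihoods for a smaller and a larger model (n as in the HRS cohort)
n = 5531
small = {"ll": -1000.0, "k": 10}
large = {"ll": -985.0, "k": 20}  # modest fit gain from 10 extra predictors

# AIC prefers the larger model here, while BIC's ln(5531) ~ 8.6 penalty
# per parameter prefers the smaller, more parsimonious one
print(aic(small["ll"], small["k"]), aic(large["ll"], large["k"]))
print(bic(small["ll"], small["k"], n), bic(large["ll"], large["k"], n))
```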
The BICs were obtained from survival models. For time to death, we fitted Cox proportional hazards regression models (26). For times to first ADL dependence, IADL difficulty, and mobility dependence, we fitted Fine and Gray competing-risk regression models to appropriately account for the competing risk of death (27). In the baBIC method, we used a normalized BIC (nBIC) to ensure that a given change in BIC from a more complex to a simpler model meant roughly the same thing across the multiple outcomes. The nBIC was computed by dividing the BIC of the fitted model for each outcome by the difference between the BIC of the full model and the BIC of the best individual model. That is:
See Formula 1 in Supplemental Files
Where:
See Formula 2 in Supplemental Files
L: the maximized value of the likelihood function of the fitted model
k: number of parameters estimated by the fitted model
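A minimal sketch of the nBIC computation described above, with hypothetical BIC values (in the study, the actual BICs come from the fitted Cox and Fine-Gray models):

```python
import math

def bic(log_likelihood, k, n):
    # BIC from the maximized likelihood L, number of parameters k, and sample size n
    return -2 * log_likelihood + k * math.log(n)

def normalized_bic(bic_model, bic_full, bic_best_individual):
    # nBIC = BIC(model) / (BIC(full model) - BIC(best individual model)),
    # so a change in nBIC means roughly the same across outcomes
    return bic_model / (bic_full - bic_best_individual)

def average_nbic(bics_by_outcome):
    # bics_by_outcome: one (bic_model, bic_full, bic_best_individual) tuple per outcome
    vals = [normalized_bic(*b) for b in bics_by_outcome]
    return sum(vals) / len(vals)

# Hypothetical BICs for one outcome: model under evaluation, full model, best individual model
print(normalized_bic(2100.0, 2200.0, 2150.0))  # 2100 / (2200 - 2150) = 42.0
```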
The baBIC method started with all 39 (p) predictors and selected the subset of 38 (p-1) predictors with the minimum average nBIC across the 4 outcomes; the lower the nBIC, the better the model. To select this subset, we fitted, for each outcome, all possible subsets of predictors obtained by removing 1 predictor at a time. We then computed the average of the nBICs across the 4 outcomes for each subset and selected the subset of 38 (p-1) predictors with the minimum average nBIC (Fig 1). In the next step of the backward elimination, the method started with these 38 (p-1) predictors and selected the subset of 37 (p-2) predictors that again yielded the minimum average nBIC across the 4 outcomes. The same process continued until only 2 variables were left (i.e. “Male” and “Age decile groups”), which were forced in. Lastly, the method selected as the final subset of predictors the one with the minimum average nBIC across all subset sizes from p-1 to 2 (Fig 2).
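The backward-elimination loop can be sketched as follows; `avg_nbic` is a placeholder for the expensive step of fitting the 4 survival models and averaging their nBICs, and the predictor names are illustrative (the study's implementation was in SAS):

```python
def babic_backward_elimination(predictors, forced, avg_nbic):
    """Backward elimination on the average normalized BIC.

    predictors: full candidate set; forced: variables never removed
    avg_nbic(subset) -> average nBIC across the outcomes for that subset
    (in practice obtained by fitting the survival models; here an
    arbitrary callable supplied by the caller).
    """
    current = list(predictors)
    best_subset, best_score = list(current), avg_nbic(current)
    # Remove one predictor at a time until only the forced variables remain
    while len(current) > len(forced):
        candidates = [
            [p for p in current if p != drop]
            for drop in current if drop not in forced
        ]
        # Keep the (size - 1) subset with minimum average nBIC
        scored = [(avg_nbic(s), s) for s in candidates]
        score, current = min(scored, key=lambda t: t[0])
        if score < best_score:
            best_score, best_subset = score, current
    # Final model: the subset with minimum average nBIC across all sizes
    return best_subset
```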
For the comparative Individual Outcome and Union methods, we followed a similar approach to that described above. The only difference was that the backward elimination was based on the minimum BIC for each individual outcome rather than on the minimum average nBIC across the 4 outcomes. The Union model then contained all the predictors that appeared in at least 1 of the 4 best subsets from the Individual Outcome models.
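A sketch of the Union step, assuming the 4 best Individual Outcome subsets are available as lists of predictor names (illustrative here):

```python
def union_model(individual_subsets):
    # Union model: every predictor that appears in at least one of the
    # best Individual Outcome subsets
    union = set()
    for subset in individual_subsets:
        union |= set(subset)
    return sorted(union)

# Hypothetical best subsets for the 4 outcomes
subsets = [["age", "male", "adl"],
           ["age", "male", "cancer"],
           ["age", "male"],
           ["age", "male", "smoking"]]
print(union_model(subsets))  # ['adl', 'age', 'cancer', 'male', 'smoking']
```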
For all final models, we computed the number of variables and measured predictive accuracy using Harrell’s C-statistic (28). For times to first ADL dependence, IADL difficulty, and mobility dependence, we used the adaptation of Harrell’s C-statistic to the competing-risks setting proposed by Wolbers et al. (29), in which death is recoded as censoring and the time-to-event is set to the longest possible follow-up time of any respondent (i.e. 15 years).
Simulation study
We performed a simulation study to assess the performance of the proposed baBIC method with highly correlated and with uncorrelated data. We created 4 sets of 500 simulations with the same sample size (N=5,531) and the same number of initial predictors (p=39), with the same distributions as in the HRS data. The 4 outcomes in the HRS data were highly correlated based on the pairwise Pearson correlations (range: 0.80-0.91). Thus, to test whether the correlation among the outcomes affected the selection methods, we generated simulated data with both high and low correlation among the outcomes.
Due to the computational cost of fitting competing-risk regression models in the simulation study for times to first ADL dependence, IADL difficulty, and mobility dependence, we used a modified version of the HRS data in which those who died were treated as censored at the longest possible follow-up time of any respondent (i.e. 15 years) (29). Of note, we obtained the same final subset of predictors with and without this simplification in the HRS data.
The times-to-event for correlated outcomes were obtained as follows. First, we simulated 4-variate normal random variables with means of zero, standard deviations of 1, and the correlation structure from the HRS data. Next, we converted the random values to probabilities. For uncorrelated outcomes, we used probabilities simulated from the uniform distribution. These probabilities were then used as look-up values in the observed time-to-event distributions for each of the outcome variables in the HRS data.
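The copula-style simulation of correlated times-to-event can be sketched as follows for two outcomes (the study used 4); the construction of correlated normals from a single correlation `rho` and the empirical-quantile look-up are illustrative simplifications of the multivariate setup:

```python
import math
import random

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_correlated_times(observed_times_a, observed_times_b, rho, n, seed=0):
    """Draw correlated standard normals, convert them to probabilities with
    the normal CDF, and use the probabilities as look-up values in the
    observed (empirical) time-to-event distributions."""
    rng = random.Random(seed)
    sa, sb = sorted(observed_times_a), sorted(observed_times_b)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second normal correlated with the first at level rho
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1, u2 = normal_cdf(z1), normal_cdf(z2)
        # Empirical-quantile look-up: pick the observed time at quantile u
        ta = sa[min(int(u1 * len(sa)), len(sa) - 1)]
        tb = sb[min(int(u2 * len(sb)), len(sb) - 1)]
        out.append((ta, tb))
    return out
```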
For each simulated dataset, we obtained the final baBIC, Individual Outcome, and Union models. The averages and corresponding 95% confidence intervals (CIs) of Harrell’s C-statistic and of the number of predictors were computed over the 500 simulations for each model. Additionally, we calculated the percentage of simulations in which each variable in the final baBIC model of the HRS data appeared in the simulated baBIC models (percentage of correct inclusion), and the percentage of simulations in which each variable absent from the baBIC model of the HRS data did not appear in the simulated baBIC models (percentage of correct exclusion).
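The percentages of correct inclusion and exclusion can be sketched as follows, with illustrative predictor names:

```python
def inclusion_exclusion_rates(reference_model, all_predictors, simulated_models):
    """Percentage of correct inclusion/exclusion across simulations.

    reference_model: predictors in the final baBIC model of the original data
    simulated_models: list of predictor subsets, one per simulated dataset
    """
    ref = set(reference_model)
    absent = set(all_predictors) - ref
    n = len(simulated_models)
    # Correct inclusion: how often each reference predictor was selected
    inclusion = {p: 100.0 * sum(p in m for m in simulated_models) / n for p in ref}
    # Correct exclusion: how often each non-reference predictor was left out
    exclusion = {p: 100.0 * sum(p not in m for m in simulated_models) / n for p in absent}
    return inclusion, exclusion
```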
All analyses were performed with the SAS® 9.4 (TS1M5) statistical package (Copyright © 2016 by SAS Institute Inc., Cary, NC, USA).