The CF cohort for this empirical study consists of patients with longitudinal data recorded from childhood until early adulthood (ages: 6-20 years) in the US Cystic Fibrosis Foundation Patient Registry (CFFPR) between January 1, 2003, and December 31, 2015. Patients younger than 6 years of age were excluded due to potentially unreliable pulmonary function testing. To focus the study on the most modern era of CF care, we considered available data from 2003 onward. A detailed description of this registry and its contents is provided elsewhere (20).
To build the multivariate joint model for this cohort, we focused on established outcomes of lung function and nutrition. Outcomes on nutrition and growth, available from the CFFPR and computed based on formulas from the Centers for Disease Control and Prevention, included the aforementioned percentiles for BMIp, WFA and HFA. Observed FEV1 was expressed as percentage of predicted using established reference equations for age, sex and race(21, 22). Occurrence of first PE, the event outcome, was subject to censoring and considered to have occurred if documented in the CFFPR as warranting treatment with intravenous antibiotics in the hospital.
Covariates from the aforementioned literature were considered, as well as findings from analyses specified in the most recent CFFPR report. These included genotype (F508del homozygous, heterozygous or neither/unknown), sex, Hispanic ethnicity, socioeconomic status (SES, having only state/federal or no insurance was recorded as 1, and 0 otherwise); time-varying covariates included use of pancreatic enzymes (corresponding to pancreatic insufficiency), CF-related diabetes (CFRD, with or without fasting hyperglycemia), positive cultures for methicillin-resistant staphylococcus aureus (MRSA) and Pseudomonas aeruginosa (Pa).
Baseline was defined as the time at which all longitudinal outcomes were first recorded during the eligibility period. A birth cohort variable was used to account for potential delayed entry into the registry and changes due to advancements in CF care and therapeutics approvals. Another time-varying covariate was used to account for irregular sampling bias, potentially induced by patients having a varying number of clinical encounters over time; for a given patient and encounter, this variable was the total number of encounters within the prior year. Data acquired after lung transplant were excluded.
The Institutional Review Board at Cincinnati Children’s Hospital Medical Center approved the study. Upon local IRB approval of this study, we submitted a data use proposal to the CFF Patient Registry Committee, which undertakes an established peer-review process of every request. We refined our request according to Committee feedback and, once they gave final approval, we received non-identifiable and non-traceable data through an encrypted data delivery Citrix ShareFile system.
Descriptive statistics, including median (Q1-Q3) for continuous variables and n (%) for categorical variables, were used to summarize cohort characteristics and extent of follow-up available for each patient.
Joint model setup and notation
(see Supplementary Files)
We employed Markov-Chain Monte-Carlo (MCMC) via Gibbs sampling using the ‘JMbayes’ package (Version 0.8-82) in R (Version 3.5.3., R Foundation for Statistical Computing, Vienna Austria) to obtain the parameters from the respective posterior distributions under the multivariate joint model(18). The highest posterior density (HPD) and accompanying standard errors were used to estimate each parameter of interest. Due to the large patient sample size that led to memory problems, we randomly divided the dataset into three parts (each part had similar percentage of PE at onset). Each of the datasets was fitted separately, and for the final results we combined the MCMC samples as described previously(24). Code implementation is provided as supplemental material. In total, we fit five joint models wherein PE onset was the event: i) only FEV1; others included ii) BMIp; iii) HFA; iv) WFA; v) WFA and HFA. The multivariate joint models in (ii) – (iv) were implemented to evaluate the impact of different measures of nutritional status on predicting PE onset. Due to the large number of parameters in scenario (v), we assumed a less flexible evolution for FEV1 and BMIp over time. In particular, we postulate natural quadratic splines to estimate this evolution instead of natural cubic splines assumed in the other scenarios.
Evaluating predictive performance
We calculated the area under the receiver-operator characteristic curve (AUC) to evaluate predictive accuracy of each model with respect to PE events using ten-fold cross validation. A large value of AUC indicates the preferred model. To obtain correct estimates of the AUC we performed a 5-fold cross validation procedure, wherein each time we selected 700 individuals from the large data set and we split it in five subsets. Each time we fitted the model in four of the subsets, we calculated the AUC in the subset that was excluded. The calculation of the AUC was performed at 12 and 16 years of age with prediction windows of 0.5, 1, 2 years, in order to mimic clinically meaningful age strata and prediction windows. This procedure was repeated 100 times. Appropriate diagnostics for joint models were examined. The code of the analysis and cross validation are shown in the Supplement.