Participant recruitment
Data for this analysis were drawn from two longitudinal cohorts, one in London Ontario, Canada (SYMBIOME, Systematic Merging of Biology, Mental Health and Environment, clinicaltrials.gov ID no. NCT02711085) and one in Chicago, Illinois, United States (Neuromuscular Mechanisms Underlying Poor Recovery from Whiplash Injuries, clinical trials.gov ID no. NCT02157038). Eligible participants were identified by emergency or acute-care nursing or medical clinicians, all within hours to four weeks of MSK trauma. Participants were 18 to 65 years old, had to have suffered a non-catastrophic MSK injury that did not require inpatient admission or surgical correction, and could speak and understand conversational (at least grade 8) English. The London cohort included participants with non-catastrophic acute MSK injury affecting any body region, while the Chicago cohort included only those with acute whiplash-associated symptoms about the neck arising from a motor vehicle collision. Exclusion criteria were those with one or more prior motor vehicle collisions (Chicago cohort only), any metabolic systems disorders (Chicago cohort only), and any nervous system or major systemic disorders that would be expected to otherwise impair recovery independent of the trauma (e.g. active cancer, neuromuscular disease, autoimmune diseases). Co-treatment or other chronic comorbidities were captured as part of the intake and follow-up packages. Patients in the two cohorts were not matched, but cohorts were combined based on their overall study design. The recruitment environment/process, inclusion/exclusion criteria, follow-up procedures, and analysis were similar enough to facilitate a combination of databases and potentially provide more generalizable results.
After being medically cleared, interested participants provided permission for a member of the research team to describe the study, answer questions, and obtain consent to participate before leaving the hospital. Participants were provided a package of self-report questionnaires to be completed and returned within 24 hours. While follow-up periods differed slightly between the two cohorts, consistent were follow-up within two-four weeks from inception, and again three and 12 months after injury.
Demographics and outcomes
The constructs being captured through self-reported questionnaires were similar enough to allow meaningful pooling of the two cohorts. Both studies captured demographic and social data including: age, sex, body mass index (BMI, kg/m2), work status, medicolegal status, and significant comorbidities (e.g. depression, existing pain conditions, etc.). The outcomes for defining recovery trajectory were pain severity using a 0-10 Numeric Pain Rating Scale (NPRS) (where 0 = no pain and 10 = extreme pain), and pain-related functional interference as measured by the Interference subscale of the Brief Pain Inventory (BPI (18), London cohort) or the Neck Disability Index (NDI (19), Chicago cohort). The BPI is one of the most widely used pain interference scales globally, (20) and has considerable evidence of validity across many clinical populations including MSK pain (21). The NDI is one of the most widely used region-specific scales for neck disorders and captures pain interference with function on most items. The two tools share several items including work ability, sleep, and recreation, but the NDI excludes walking interference which is less relevant to those with neck pain. Both the NDI and the BPI Interference subscale have demonstrated acceptable reliability, validity, and responsiveness for capturing interference (18, 19, 21-24) and both show a strong and similar correlation with pain severity rating scales (NDI: r = 0.64 (25), BPI: r = 0.67, current database). Both can easily be converted into a percentage of the total scale range (0% = no interference, 100% = complete interference) for meaningful pooling.
Intervention between follow-up periods occurred at the discretion of the participant and their healthcare providers without influence by the research team. Type of intervention was captured in general terms (e.g. physical therapy, chiropractic, pharmaceuticals, massage therapy, work hardening) for descriptive purposes as the balance of evidence available in the field does not consistently support the superiority of any one treatment modality over another (26-29).
Ethics approval was obtained by the respective research and hospital institutional ethics boards prior to recruiting participants into the study. Participants were reimbursed up to the equivalent of $240 Canadian dollars ($175 US dollars) for expenses and time incurred during participation across all follow-up periods.
ANALYSIS
Pre-Analysis
Participant demographics and baseline scores on the two outcomes were evaluated descriptively (frequencies, means, ranges). The primary (% Interference) and secondary (pain severity out of 10) outcomes were first explored for missing data and normality. Region of injury was coded according to the primary area of symptoms: presence of any head, neck or back injuries (regardless of any peripheral injuries) were classified as “axial” while those affecting the upper or lower extremities (shoulder, elbow, wrist, hip, knee, ankle) were classed as “peripheral”. As recovery was anticipated to occur in the majority of patients by 12 months, scores were square-root transformed across collection periods to reduce the skewness and kurtosis of the distribution to within acceptable limits for statistical modeling.
Growth Mixture Modeling
Maximum likelihood-based latent growth curve analysis (LGCA) using the GMM procedure in MPlus v6.12 software (Muthen & Muthen, Los Angeles USA) was conducted, following the steps of DiStefano and Kamphaus (30). A series of models were constructed for both % Interference and Pain Severity, starting with a base single trajectory model and increasing classes until i) the model fit no longer improved substantially, ii) the estimation could not derive a mathematically valid model, or iii) one of the classes possessed fewer than 10% of participants. The fit indicators of interest were the Akaike Information Criterion (AIC) (31-33), the Bayesian Information Criterion (BIC) (31-33), and entropy (32). While no set criteria exist for deeming an acceptable model fit (33), the cluster solution providing the lowest AIC and BIC and the highest entropy value (ideally >0.70) that also conforms to theory is generally considered optimal (34).
An additional statistical analysis was conducted using the k-means approach, where the Lo-Mendell-Rubin Adjusted Likelihood Ratio Test (LMR-LRT) (31, 33) was used to statistically compare the fit of the k cluster solution (e.g. 3) with that of the k-1 class solution (e.g. 2). When fit did not statistically improve (p>0.05) with the addition of a new class, the solution with the smaller number of classes is generally accepted for reasons of parsimony (33, 35). All models included a quadratic (non-linear) growth function, as pre-analysis revealed the quadratic growth factor was significant in the base model. ‘Region’ (axial or peripheral) was then included as a covariate in each model to control for the effect of the different cohorts and tools. After identifying the optimal model, each participant received an assignment to the most likely trajectory (termed ‘class’) based on the highest posterior probability from the modeling procedure.
Missing data
Only those participants with at least two data points were included in the modeling procedure. Maximum Likelihood Estimation (MLE)-based curve estimation is useful for missing data as it uses all available data to estimate an appropriate trajectory to describe the entire sample, if the data are missing at random (36, 37). For evidence of randomness, independent samples t-tests were conducted between those with full datasets and those with missing data. Any differences in baseline participant characteristics (i.e. age, sex, BMI, pain severity or interference) were examined for the presence of potential systematic biases in those lost to follow-up.
Model validation
To improve confidence in the model structures, we statistically compared the observed data (using all data) to the MLE-based estimated values from the non-region-adjusted modeling procedure using paired samples t-tests. No significant difference between observed and estimated values would indicate the predictive model was adequately accurate for estimating the data including missing values.
Description of trajectories
Forward-entry binary logistic regression was conducted to determine the extent to which sex, age, BMI, and region of injury (in that order) could predict trajectories using odds ratios as a standardized metric of effect size comparison across the variables. Age and BMI were dichotomized through median split (Age ≤38 years, BMI ≤25.09 kg/m2). For this analysis, the anticipated ‘worst’ (persistent problems) trajectory was coded as the state to be predicted against any other trajectories that emerged. Model fit was explored according to Peng and colleagues (38) using the Hosmer-Lemeshow (H-L) test wherein a non-significant effect indicates good fit to the data, supported by the Nagelkerke R2 and overall accuracy of the regression-predicted trajectory classifications. The supporting indices were needed as there were fewer than five groups to be predicted rendering the H-L test vulnerable to potentially biased estimates. (38)
Sample size estimation
Sample size estimates for GMM are difficult to establish through traditional effect size metrics, though previous studies investigating pain recovery trajectories using GMM and ANOVA-based approaches have identified distinct classes with a medium-sized standardized difference between the classes.(39) The planned intention to conduct between-group regression analyses led to an a priori decision that a minimum of 30 participants in the smallest class was necessary. Based on work of prior authors, we anticipated the smallest class to represent about 15% of the sample, leading us to target a total sample of 200 usable datasets.