Summary of main findings
We have developed and validated prediction models, based on a combination of a patient’s demographic details, health status and job-type, to estimate an individual’s risk of any work absence over 6 and 12 months, and their expected level of presenteeism over 6 months, following consultation in primary care for MSDs. A total of 10 predictor variables were included for predictions of 6-month absence and 6-month presenteeism, including patient demographics (age, sex, occupational class [(i) managerial/administrative/professional; (ii) intermediate; (iii) routine and manual]), pain features (multisite or single site pain, pain intensity at baseline, pain duration), comorbidities (anxiety/depression, any other comorbidity), work absence in previous six months and current performance at work. For prediction of 12-month absence, predictors regarding previous absence and comorbidities were not consistently recorded across studies, leaving just eight available predictors. These models may be implemented during a consultation with a primary care clinician, to help inform targeted interventions designed to reduce the impact of MSDs on work absence and productivity.
On IECV, the six-month absence model was well calibrated on average, with a pooled calibration slope of 0.93 (95%CI: 0.41 to 1.46, \({\tau }^{2}\)= 0.123) and a CITL of 0.03 (95%CI: -1.37 to 1.44, \({\tau }^{2}\)= 1.23), although substantial heterogeneity in calibration performance was seen across studies. In particular, considerable overestimation of absence risk was seen when the model was applied in the KAPS study alone. This is possibly due to the lower prevalence of 6-month absence in the KAPS population, in which only 3% (21 individuals) reported an absence over the 6 months following consultation. The model’s discrimination performance was more consistent, however, with a pooled C-statistic of 0.76 (95%CI: 0.66 to 0.86, \({\tau }^{2}\)= 0.004). The results across IECV cycles suggest the model can discriminate well between those who went on to take work absence and those who did not across populations, despite some variation in model calibration (see Figure S2a).
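As an illustration of the performance measures reported above, the following minimal Python sketch computes a C-statistic and calibration-in-the-large (CITL) for a set of predicted risks and observed binary outcomes. It is not the analysis code used in this study: the function names and the toy data are hypothetical, and the calibration slope (which requires a two-parameter logistic fit) is omitted for brevity.

```python
import math

def c_statistic(predicted_risks, outcomes):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied predictions count as half-concordant."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def calibration_in_the_large(predicted_risks, outcomes, max_iter=100, tol=1e-10):
    """Intercept a in logit(P(y=1)) = a + logit(p), slope fixed at 1.
    Fitted by Newton-Raphson; predicted risks must lie strictly in (0, 1)."""
    offsets = [math.log(p / (1.0 - p)) for p in predicted_risks]
    a = 0.0
    for _ in range(max_iter):
        fitted = [1.0 / (1.0 + math.exp(-(a + o))) for o in offsets]
        score = sum(y - f for y, f in zip(outcomes, fitted))  # first derivative
        info = sum(f * (1.0 - f) for f in fitted)             # minus second derivative
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a

# Hypothetical toy data, for illustration only (not study data).
toy_risks = [0.2, 0.4, 0.6, 0.8]
toy_outcomes = [0, 1, 0, 1]
toy_c = c_statistic(toy_risks, toy_outcomes)
toy_citl = calibration_in_the_large(toy_risks, toy_outcomes)
```

A CITL near zero indicates that mean predicted risk matches the observed event proportion; a negative value indicates overestimation of risk on average, as described for the KAPS study.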
Calibration performance of the model to predict continuous presenteeism at six months was good on average, with a pooled calibration slope and pooled CITL of 1.00 (95%CI: 0.82 to 1.19, \({\tau }^{2}\le\) 0.001) and 0.01 (95%CI: -0.70 to 0.72, \({\tau }^{2}\)= 0.067) respectively. While this model predicted well on average across the whole population, it is worth noting that there was still considerable variation between the predicted and observed presenteeism scores at the individual level.
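The pooled estimates and \({\tau }^{2}\) values reported above summarise performance across IECV cycles using random-effects meta-analysis. The sketch below shows one common way such pooling can be done, the DerSimonian-Laird estimator. This is an assumption for illustration: the estimator actually used is not stated in this section, and the function name is hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling of study-level estimates.
    Returns (pooled estimate, standard error, tau-squared)."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need matching lists with at least two studies")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and the method-of-moments estimate of tau-squared.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau-squared to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

A larger \({\tau }^{2}\) indicates more between-study variation in the performance measure, which widens the confidence interval around the pooled value.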
Full IECV was not possible for the 12-month absence model, as only two studies were available with the outcome measured at this time point. On internal validation, the model showed a large amount of overfitting to the model development data (heuristic shrinkage = 0.7832). After adjusting for this overfitting, the model showed reasonable discrimination across both studies, with a C-statistic of 0.737 (95%CI: 0.69 to 0.79), although it was miscalibrated at the extremes, with evidence of underestimation of risk in those at higher risk.
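For readers unfamiliar with heuristic shrinkage, the sketch below shows the widely used van Houwelingen and Le Cessie form, in which the shrinkage factor is (model chi-squared minus number of predictor parameters) divided by the model chi-squared. Whether this exact form was used here is an assumption, and the function names and input values are hypothetical rather than taken from the study.

```python
def heuristic_shrinkage(model_chi2, n_params):
    """Heuristic shrinkage factor: (chi2 - df) / chi2, where chi2 is the
    model likelihood-ratio statistic and df the number of predictor parameters."""
    if model_chi2 <= 0:
        raise ValueError("model chi-squared must be positive")
    return (model_chi2 - n_params) / model_chi2

def shrink_coefficients(coefficients, shrinkage):
    """Multiply predictor coefficients (excluding the intercept) by the
    shrinkage factor to pull predictions towards the average risk."""
    return [shrinkage * b for b in coefficients]

# Hypothetical values, for illustration only (not study data).
s = heuristic_shrinkage(model_chi2=50.0, n_params=10)
shrunken = shrink_coefficients([0.5, -1.0], s)
```

After shrinking the coefficients, the intercept is typically re-estimated so that mean predicted risk still matches the observed event proportion. A factor well below 1, such as the 0.7832 reported above, signals that unadjusted predictions would be too extreme in new individuals.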
Strengths and limitations of this work
An important strength of this study was the sample size available for development of most models. For the primary objective (to model any work absence over 6 months of follow-up), the model development sample was sufficiently large to meet current minimum sample size recommendations (30), while incorporating the desired clinically important predictors, identified through the literature (PROSPERO CRD42020219452) and through expert clinical opinion. Similarly, there were more than enough data to meet the recommendations for the secondary analysis of modelling presenteeism at six months, while including all chosen predictors (31).
Unfortunately, insufficient data were present in the available studies to meet the recommended minimum sample size for the development of a model to predict 12-month absence. The results for this model should be viewed with caution pending further external validation, as it is possible the model would not perform well in new individuals.
Model development for all outcomes used mixed populations drawn from multiple studies, all with similar recruitment regions and inclusion criteria. Study designs included one cohort study, RCTs and cluster RCTs, which means the influence of treatments should be minimised due to randomisation (see Tables 1 and 2 for study descriptions). The generalisability of RCT populations can be limited; however, the trials included in this analysis all used broad inclusion criteria, meaning the design is unlikely to greatly affect generalisability of the models. This meant that we were able to test model generalisability across similar populations through use of IECV, whilst still allowing maximum data to be used for model development. Although model generalisability has been tested, there was variability in calibration. Further external validation in populations of different compositions (e.g., different healthcare settings, or those with specific MSDs) will be important to provide information regarding the models’ transportability.
One further limitation to generalisability is that the populations included in these studies contained disproportionately high numbers of people in the higher socioeconomic status banding, compared to the population in North Staffordshire where the studies were based. Thus, it is possible that the models may not perform as well in those with a lower socioeconomic status, as they were underrepresented at model development.
While there are strengths in terms of participant numbers, there were some drawbacks to having developed these models across multiple previous studies, primarily that we were restricted to the use of predictors that were consistently collected. While some important predictors could be accommodated through multiple imputation of systematically missing information (32, 33), it is possible that some important and strong predictors of work outcomes were omitted from model development because they had not been collected or recorded consistently across enough of the included studies. For example, for the model predicting 12-month work absence, previous work absences were not recorded at consistent time-points across the studies used for model development, and so this predictor could not be included, despite being the strongest predictor of absence within 6 months.
All predictors included in these models were self-report measures; the models omit any information arising from objective patient assessment. This decision was made due to a lack of standardised assessment items for patients consulting in primary care with MSD, and because self-report items were more consistently available across our model development studies. In practice, primary care clinicians assess MSD patients in varied ways and our use of self-report predictors could enable consistent application of models, despite wide variation in objective assessments. Indeed, previous work suggests that clinical examination and imaging results add little to predictions in patients with low back pain, over predictors such as age, pain features, or depression, all of which we included in our models (42, 43). Predictor variables were reported 1–4 weeks after consultation due to recruitment methods. While the values of many of the included predictors would not be expected to vary over short periods of time, some (e.g., current pain intensity) would. It is recommended that prognostic variables are collected at consultation, as this is the point at which decisions about interventions are most commonly made; testing these models at the point of consultation is therefore a key next step to assess their usefulness in practice and to guide timing of their use (44, 45).
Regarding the prediction of 12-month absence, model development was conducted in the limited population of patients who reported they were still working one year after consultation; thus, predictions can only be assumed to apply to an individual if they were to still be in work at that time. While ensuring the population was clearly defined, this restriction resulted in our model missing those who were no longer working but had nevertheless taken an absence during the 12-month period where they had still been at work. This omits the higher risk, and arguably more relevant population, of people who stop work within the year due to their MSD.
Comparison with previous literature
Predicting work absence in patients with MSD is challenging, with other studies in this field finding that there are many and varied predictors, often with inconsistent measurement of predictor variables between studies (46–48). A number of models have been developed that also predict work outcomes; the most commonly used is the Orebro Musculoskeletal Pain Questionnaire (20, 49, 50). The full version of this questionnaire contains 25 questions (of which 21 are scored) and it has good predictive ability with a sensitivity of 89% and specificity of 65% for absenteeism. The short-form version of the Orebro questionnaire includes 10 questions and has been demonstrated to be useful as an early screening tool in primary care with a sensitivity of 0.75 and specificity of 0.78 (21). We were unfortunately unable to validate either form of the Orebro questionnaire in our populations, as necessary predictor items had not been measured; thus, we were unable to directly compare the performance of our new models with that of Orebro.
Whilst there are some similarities in the measures included in Orebro (e.g. pain intensity, anxiety/depression), there are also some differences, in particular in the populations in which each model was developed, with Orebro developed in a population with back pain only. Whilst the Orebro has been used in patients with other regional pain symptoms, specifically neck and shoulder pain, its predictive ability with broader MSDs and multisite pain is not clear. Other models also demonstrate reasonable predictive ability, but again these are lengthy and do not always include previous work absence (19, 51, 52). Furthermore, none of these models predict presenteeism, which may have a bigger impact on work than absenteeism (22). However, in line with our findings, other research has consistently identified previous sickness absence as being strongly associated with future absence, indicating that this variable is an important predictor of future work absence (51, 53–56). The Orebro questionnaires do not include previous work absence but do include a measure of “work expectation” in 3 months, and various measures of self-perceived/rated work ability have been demonstrated to predict absence over time (54).
Implications for policy and practice
Understanding which patients are at higher risk of work absence may allow preventative consultations (or targeted vocational advice and support interventions / referral to occupational health services) to be put in place, supporting the appropriate use of scarce vocational resources (57). Discrimination performance of our work absence models is consistent with that of previous models to predict work absence in more general populations (54, 58), with the benefit of being tailored specifically to the MSD population. Identification of patients at risk of work absence may be useful for targeting work-focused interventions and occupational health strategies; such models could be valuable when used as part of a wider assessment of patients with MSD to ensure appropriate support is offered.
To our knowledge, this is the first time presenteeism has been predicted; it is more commonly included as a predictor variable in prognostic models of absence (59) or has been predicted using a very narrow prognostic model focusing on one concept rather than including a wider range of prognostic factors (60). Predicting presenteeism may be useful to identify those patients who are likely to struggle with their MSD at work, allowing the patient and clinician to be aware of the potential risk of absence and to plan mitigating strategies before absence occurs.
Future research should focus on a comparison between our model and the Orebro questionnaires in addition to further validation of the 6-month absence and presenteeism models with other MSD populations in other settings, to ensure the models sufficiently and accurately predict work absence and performance across settings. Additionally, these models would need to be tested in clinical practice to explore their usability and feasibility in the practice setting, prior to considering wider implementation. The impact of the proposed models could then be considered in terms of potential for targeting interventions, and effects on patient and cost outcomes.