The initial search retrieved 2,744 publications. After removing duplicates and limiting to ‘English’ and ‘human’, the results numbered 2,019. Of these, 1,997 studies were judged, from the title and abstract, as not relevant to the scope of this review. The flow chart of study selection, including the reasons for exclusion, is presented in Figure 1.
The remaining 22 full-text articles were assessed for eligibility. Only five studies fulfilled our inclusion criteria. Seventeen studies were excluded; details of these studies and the reasons for exclusion are described in Additional file 2.
1.3.1 Study characteristics
Five studies were included in the review; two studies were conducted in Norway (19, 20), two in the Netherlands (21, 22) and one in the UK (23). These studies were published between 2002 and 2014. The characteristics of the included studies are summarised in Table 1. The total number of participants in the five studies was 1,023, with the sample size ranging from 134 to 349 people. Two of the studies only included adults in employment taking sick leave due to CLBP (20, 21). The mean age of the participants ranged from 41 to 45.5 years, with equal gender distributions. The mean duration of CLBP symptoms ranged between 5 and 8 years.
Table 1 Characteristics of the included studies
Study ID
|
Setting
|
Target population/sample size
|
Intervention
|
Comparator(s)
|
Skouen 2002 (20) Norway
|
Outpatient spine clinic
|
Patients sick-listed for at least 8 weeks, or employees not on sick leave but sick-listed for at least 2 months/year
n=195
|
Light multidisciplinary programme (assessment by physiotherapist and psychologist + individual education and feedback)
Extensive multidisciplinary programme (4-week programme, 6 hr session 5 days/week) consisting of CBT, education, exercises and workplace intervention
|
Usual care
|
Rivero-Arias 2005 (23) United Kingdom
|
Secondary care
|
18-55 years with CLBP > 1 year
n=349
|
Intensive rehabilitation programme (education + exercise) led by a physiotherapist and clinical psychologist
|
Spinal stabilisation surgery
|
Smeets 2009 (22) Netherlands
|
Primary care
|
Aged 18-65 years with CLBP > 3 months and RMD score > 3
n=172
|
Active physical treatment + graded activity + problem-solving training
|
2 comparators: active physical treatment; graded activity plus problem-solving training
|
Lambeek 2010 (21) Netherlands
|
Primary and secondary care
|
Adults (18-65) with CLBP in paid work for at least 8 hr/week on partial sick leave
n=134
|
Integrated care programme (workplace intervention + graded activity)
|
Usual care
|
Johnsen 2014 (19) Norway
|
Hospital
|
Adults with CLBP > 1 year with degenerative change in lumbosacral intervertebral disc
n=173
|
Multidisciplinary rehabilitation (outpatient programme focussed on exercise and CBT delivered by a physiotherapist and physical medicine specialist)
|
Total disc replacement surgery
|
CLBP = chronic low back pain, CBT = cognitive behavioural therapy, RMD = Roland Morris Disability.
|
A wide range of outcome measures were used in the studies included in this review. Return-to-work (RTW) was the primary outcome in three studies (20, 21, 23). Four studies used two measures for disability (19, 22-24). Regarding baseline disease severity, the mean functional disability score on the Roland Morris Disability Score (RMDS) (25), which ranges from 0 to 24, was 14 (21, 22), whereas the mean Oswestry Disability Index (ODI) (26) score, which ranges from 0 to 100, was 43 (19, 23). The mean pain intensity score from three studies was 5.8 on a scale of 0 to 10 (19, 22, 24). For all of these measures, higher scores indicate greater disability or pain. The mean EQ-5D-3L (27) score ranged from 0.26 to 0.49 (19, 22, 23). The EQ-5D-3L is a generic health status measure whose scores generally range between 0 and 1, with higher scores indicating better health status. A further generic health status measure, the Short Form 6D (28), was also used in conjunction with EQ-5D-3L in one study (19).
One study looked at costs from the patient and healthcare provider perspectives (23), while the remaining studies were conducted from the societal perspective (19-22). The length of follow-up was between 12 and 24 months.
The PMS consisted of combinations of cognitive behavioural therapy, physical therapy and workplace interventions. Two studies compared PMS with usual care (20, 21), two with surgery (19, 23) and one with two comparators, physical treatment and graded activity (a treatment that includes behavioural and cognitive methods to improve activity endurance) (22). The outcomes were return to work (20, 21), quality-adjusted life-years (QALYs) using EQ-5D-3L (19, 22, 23) and disability using RMDQ (22) and ODI (19, 23).
1.3.2 Design and description of pain management services
The studies were delivered in a secondary care setting only (19, 20, 23), primary care only (22), or a combined setting (21). All studies clearly described the service in terms of treatment modalities and the staff involved in delivering these services. However, in some studies the duration of treatment varied between people within the study (19, 21). Among the included studies, the duration ranged between one and three months and the total number of hours ranged between 3 and 75 hr. In another study, the intensity of treatment in terms of the total hours provided was not consistent between individuals in the study (23). Study participants were generally working-age adults; two studies focused on employees with CLBP (20, 21) and no studies included people above 65 years old.
The description of services provided in the included studies is summarised in Table 2.
Table 2 Service description in the included studies
Study ID
|
Staff delivering the intervention
|
Intervention description
|
Number of hours/day and duration
|
Notes
|
Skouen 2002 (20) Norway
|
PT
Nurse
Psychologist (if necessary)
|
Light intervention:
Assessment, a lecture, three individual sessions and individually-based graded exercise.
|
Assessment (1-2 hours)
Lecture (1 hour)
Total for individual sessions (45 minutes)
|
Some people were offered visits to an external PT, local National Health Insurance and workplace visits.
|
Extensive intervention:
Educational sessions, exercises and (occasional) workplace intervention
|
6 hours, five days a week for four weeks.
|
Rivero-Arias 2005 (23) United Kingdom
|
PT and clinical psychologist
|
Education
Exercises and hydrotherapy
|
5 days a week for three weeks
The total hours were 60-110 (75 hr on average)
|
One of the centres did not have a psychologist or hydrotherapy sessions.
|
Smeets 2009 (22) Netherlands
|
PT
Clinical psychologist
Social worker
|
Physiotherapy
Graded activity with problem-solving training
|
Physiotherapy: 105 min three times a week for 10 weeks
Graded activity: 19 sessions starting from 3rd week.
|
|
Lambeek 2010 (21) Netherlands
|
OT physician
PT
OT
medical specialist
|
Workplace intervention
Graded activity
|
Graded activity (26 sessions) for three months or until RTW
|
Coordinated
|
Johnsen 2014 (19) Norway
|
PT
Specialist in physical medicine
|
Lectures and individual discussions
Daily workout
|
60 hours over 3-5 weeks
|
Nurse, psychologist or social worker as needed
|
PT= Physiotherapist, OT= Occupational therapist
|
1.3.3 The comparator
Two studies, which were conducted in the Netherlands and Norway, compared PMS with “standard care” (20, 21). In Lambeek et al., standard care consisted of family physician visits, in addition to occupational therapist consultations, provided in a primary care setting (21). In Skouen et al., standard care consisted of examination in the spine outpatient clinic by a physician, followed by referral back to the GP (20).
Two studies compared the effect of PMS with surgery (19, 23). The surgical procedures were total disc replacement (19) and spinal stabilisation (23). As surgical options are usually reserved for the severest cases, the patient populations in these studies are likely to be different from those where GP/medical management is offered as standard care (20, 22, 24). This is demonstrated by the higher pain intensity and lower quality of life at baseline in the surgical studies (19, 23). The mean baseline utility scores (EQ-5D) were 0.26 and 0.49 in studies assessing surgery (19) and non-surgical treatments (22) as comparators, respectively. Similarly, the baseline pain intensity score was 6.9 in the study assessing surgery (19), while in the study assessing standard care, the baseline score was 5.7 (21). Therefore, the outcomes of referral to PMS in the surgical studies are not directly comparable with those of the other studies, owing to higher baseline pain and disability levels.
1.3.4 Methodological design
All of the economic evaluations were conducted alongside RCTs. The risk of bias assessment of the included studies is described in Table 3.
Table 3 Risk of bias assessment according to the Cochrane Back Review Group (CBRG)
Risk of bias item
|
Skouen 2002 (20)
|
Rivero-Arias 2005 (23)
|
Smeets 2009 (22)
|
Lambeek 2010 (21)
|
Johnsen 2014 (19)
|
Was the method of randomisation adequate?
|
Low
|
Low
|
Low
|
Low
|
Low
|
Was the treatment allocation concealed?
|
Low
|
Unclear risk
|
Low
|
Low
|
Low
|
Was the patient blinded to the intervention?
|
Not possible
|
Was the care provider blinded to the intervention?
|
Was the outcome assessor blinded to the intervention?
|
Was the dropout rate described and acceptable?
|
Low
|
High
|
Low
|
Low
|
Low
|
Did the analysis include intention-to-treat analysis?
|
Low
|
Low
|
Low
|
Low
|
Low
|
Are the study reports free of suggestion of selective outcome reporting?
|
Unclear risk
|
Unclear risk
|
Unclear risk
|
High
|
Unclear risk
|
Were the groups similar at baseline?
|
Unclear risk
|
Low
|
Low
|
Low
|
High
|
Were co-interventions avoided?
|
Unclear risk
|
High
|
High
|
Unclear risk
|
Unclear risk
|
Was the compliance acceptable in all groups?
|
Unclear risk
|
High
|
High
|
High
|
Unclear risk
|
Was the timing of outcome assessment similar in two groups?
|
Low
|
Low
|
Low
|
Low
|
Low
|
Summary risk of bias
|
HIGH
|
HIGH
|
LOW
|
LOW
|
HIGH
|
Unclear = item not reported clearly.
The study will be considered to have a low risk of bias if 6 or more items are satisfied, otherwise it will be considered to have a high risk of bias.
|
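The summary rule stated under Table 3 can be checked mechanically. The following is a minimal sketch, assuming that an item counts as "satisfied" only when it is rated Low and that the three blinding items (rated "Not possible") are left out of the count; the variable and function names are illustrative and not part of the CBRG tool.

```python
# Item-level judgements transcribed from Table 3, in table order, excluding
# the three blinding items (patient, care provider, outcome assessor).
L, H, U = "Low", "High", "Unclear risk"

TABLE_3 = {
    # randomisation, concealment, dropout, ITT, selective reporting,
    # baseline similarity, co-interventions, compliance, timing
    "Skouen 2002":       [L, L, L, L, U, U, U, U, L],
    "Rivero-Arias 2005": [L, U, H, L, U, L, H, H, L],
    "Smeets 2009":       [L, L, L, L, U, L, H, H, L],
    "Lambeek 2010":      [L, L, L, L, H, L, U, H, L],
    "Johnsen 2014":      [L, L, L, L, U, H, U, U, L],
}


def summary_risk(judgements, threshold=6):
    """Return 'LOW' if at least `threshold` items are rated Low, else 'HIGH'.

    Assumption: 'satisfied' is interpreted as a Low rating.
    """
    satisfied = sum(1 for judgement in judgements if judgement == L)
    return "LOW" if satisfied >= threshold else "HIGH"


summaries = {study: summary_risk(items) for study, items in TABLE_3.items()}
```

Under these assumptions the computed summaries agree with the last row of Table 3.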
Two of the five studies were considered to have low risk of bias (21, 22). The major strengths in all of the included studies were that the methods of randomisation and allocation were clear. Moreover, intention-to-treat analysis was considered in the statistical analysis for missing data.
High risk of bias was identified in three studies (19, 20, 23). Two studies reported that the intervention group received extra visits to physiotherapists and other healthcare professionals compared with the standard care arm (22, 23). Hence, the intervention group might have had better outcomes, compared with standard care, because of these additional visits. One of the important aspects in assessing the quality of RCTs is the sample size and statistical power (29). Four out of five studies were sufficiently powered (power threshold 80%) to detect a difference in functional disability using the ODI (19, 23), RMDS (22) or return to work (21).
Although all included studies incorporated RCTs, randomisation by itself does not ensure that the baseline characteristics of the study participants in the comparator groups are similar (29). Knowing this information is essential to demonstrate that the participant response to treatment is directly attributable to the intervention effect and not to other patient-related factors. Adjusting effect size for baseline characteristics should be performed using statistical methods, generally regression. In our review, only one study (22) performed regression to adjust effect size for baseline characteristics. Furthermore, two studies clearly reported that there was a significant difference between the study participants at baseline, for which they did not then adjust (19, 20).
The quality of reporting economic evaluations in terms of costs and outcomes is reported in Table 4, while the details of sensitivity analysis and the results are summarised in Table 5.
Table 4 Assessment of economic evaluations based on CHEERS criteria (inputs to economic evaluation: costs and outcomes)
Study ID
|
Currency/year
|
Direct cost
|
Indirect costs
|
Time horizon
|
Health outcome
|
Valuation of preference outcomes
|
Skouen 2002 (20)
|
Norwegian Krone, price of clinic in 1996, no inflation
|
Top down approach
|
Yes
|
26 months
|
Return to work
|
NA (utilities were not collected)
|
Rivero-Arias 2005 (23)
|
2002-2003 GBP inflated to base year (2005)
|
Bottom up approach
|
Yes. costing total hours worked by each patient
|
24 months
|
Return to paid employment, total hours worked, utility using EQ5D
|
Social tariff from representative sample of UK population
|
Smeets 2009 (22)
|
2003 Euros
|
Top-down approach and costing diaries
|
Yes, using human capital approach
|
12 months
|
Disability using RMDQ,utility using EQ5D
|
AUC, population and techniques were not described
|
Lambeek 2010 (21)
|
Index year 2007 (Euro converted to GBP)
|
Bottom up approach
|
Yes, using human capital approach
|
12 months
|
Return to work, utilities using EQ5D
|
Dutch tariff however no description of population or methods used
|
Johnsen 2014 (19)
|
Norwegian Krone with 2006 as a base year. Costs were adjusted for inflation into 2012 prices and converted to Euros using the rate 1 €=6.7 Kr2006
|
Top-down approach and costing diaries
|
Yes, using human capital approach
|
24 months
|
Utilities using EQ5D and SF-6D
|
QALY was estimated as AUC using trapezoidal method. Population and techniques were not addressed.
|
AUC = area under the curve, CE = cost effectiveness, CB = cost benefit, CU = cost utility, QALY = quality adjusted life years, EQ5D = EuroQol 5 dimensions, SF-6D = Short Form 6 dimension, GBP = British pound, NA= not applicable.
|
Table 5 Economic evaluation based on CHEERS criteria (statistical analysis and results)
Study ID
|
Statistical analysis of cost data
|
Statistical analysis for missing data
|
Incremental economic analysis reported
|
Sensitivity analysis
|
Difference in outcome
|
Difference in costs
|
Difference in outcome and costs
|
Skouen 2002 (20)
Norway
|
Not reported, only mean difference in outcome evaluated by ANOVA. RR and 95% CI for the effect of intervention versus comparator on outcome.
|
Not reported
|
productivity gain = NoK 7858100 - cost, the net productivity gain = 240900
|
Not performed
|
Yes
|
Not reported
|
No
|
Rivero-Arias 2005 (23)
UK
|
Arithmetic mean for cost and outcomes
|
Intention to treat and multiple imputation (variance correction)
|
Bootstrapping (1000 replications)
|
Yes
|
No
|
Yes
|
Yes
|
Smeets 2009 (22)
Netherlands
|
Student's t-test for change in outcomes, linear regression to adjust for baseline differences
|
Intention to treat analysis, missing data were replaced by mean score of cost and utility
|
Bootstrapping (1000 replications)
|
Yes
|
No
|
No
|
No
|
Lambeek 2010 (21)
Netherlands
|
Not reported
|
Intention to treat analysis, multiple imputation
|
Bootstrapping (5000 replications)
|
Yes
|
Yes
|
Yes
|
Yes
|
Johnsen 2014 (19)
Norway
|
Student t test for cost and utilities
|
Intention to treat analysis, multiple imputation
|
Bootstrapping (10,000 replications)
|
Yes
|
Yes
|
Yes
|
Yes
|
ANOVA = analysis of variance, RR = relative risk, CI = confidence interval, MDT = multidisciplinary treatment, NoK = Norwegian Krone
|
1.3.5 Healthcare resource use and cost
In this review, all of the studies included direct medical and indirect costs. Four studies included direct non-medical costs, such as travel expenses (19, 21-23). Four studies took the societal perspective (19-22) and one study took the healthcare provider perspective (23). Although the last study stated that they conducted their evaluation from a healthcare provider perspective, indirect costs were calculated. Although Skouen et al. stated that their study took a societal perspective (20), direct non-medical costs were not collected.
There are two methods of assessing the service costs, the top-down and the bottom-up (micro-costing) approaches (30). The top-down approach divides the total budget of a health intervention by the total number of people to give an “average” estimate of cost per patient, whereas the bottom-up approach uses patient-level resource use data to generate costs. The latter is the preferred method in economic evaluations to account for variations in costs between study participants (30). In this review, three studies used the top-down approach (19, 20, 22), while two studies used the bottom-up method (21, 23). The method of collecting costs was implied, rather than clearly stated, in three studies (20, 22, 23). Only two studies clearly reported all resource use and their unit costs (21, 22). In two studies, some unit costs were missing (19, 23) and Skouen et al. did not report unit costs (20).
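The difference between the two costing approaches can be illustrated with a short sketch. All resource items, unit costs and quantities below are hypothetical and are not taken from any of the included studies.

```python
def top_down_cost_per_patient(total_budget, n_patients):
    """Top-down: divide the total service budget by the number of patients."""
    return total_budget / n_patients


def bottom_up_costs(resource_use, unit_costs):
    """Bottom-up (micro-costing): cost each patient from their own resource use.

    resource_use: one dict per patient mapping resource item -> quantity used.
    unit_costs:   dict mapping resource item -> cost per unit.
    """
    return [
        sum(quantity * unit_costs[item] for item, quantity in patient.items())
        for patient in resource_use
    ]


# Hypothetical unit costs and patient-level resource use.
unit_costs = {"physio_session": 40.0, "gp_visit": 30.0}
resource_use = [
    {"physio_session": 10, "gp_visit": 2},
    {"physio_session": 2, "gp_visit": 1},
    {"physio_session": 0, "gp_visit": 0},
]

patient_costs = bottom_up_costs(resource_use, unit_costs)
average_cost = top_down_cost_per_patient(sum(patient_costs), len(patient_costs))
```

In this toy example both approaches imply the same mean cost, but only the bottom-up figures retain the between-patient variation that a patient-level statistical analysis requires.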
In this review, two studies used postal questionnaires to collect resource use data from people (21, 23), which might be subject to recall bias, especially if the recall period is more than three months (31). In Lambeek et al., the recall period was three months (21) while, in Rivero-Arias et al., the recall period was six months and one year (23). Two studies used costing diaries to collect resource use data (19, 22). Costing diaries aim to collect data prospectively, which reduces the risk of recall bias. To minimise the risk of incompletion, regular telephone reminders are recommended but neither of the studies using diaries reported providing reminders (19, 22).
Productivity loss due to illness can be accounted for by absenteeism, the inability to attend work, and presenteeism, the reduced functionality in terms of quality and quantity while working (32, 33). Productivity loss can be measured either objectively, by using attendance records, or subjectively, using self-report by the employee (32). These methods have some limitations; objective measures might be inaccurate for assessing presenteeism, as they only record employee attendance, with no emphasis on productivity levels in terms of quality.
All studies assessed the effect of PMS on productivity loss. Absenteeism was the only work outcome evaluated. Four studies clearly reported their methods of collecting productivity loss (20-23). Although the appropriate recall period is still inconclusive, three months’ recall for absenteeism and one week for presenteeism is recommended (32). Two studies used “monthly” self-reported methods, utilising calendars (21) and diaries (22). In another study (23), the employment status was self-reported over a relatively long period of six months and one year with insufficient information about the measurement method to assess quality. An objective measure was used in one study (20), which utilised the national health insurance registry to assess sickness absence. Johnsen et al. (19) did not report the method of data collection.
In order to value productivity loss among employees, the “human capital approach” and the “friction cost method” can be used (32). As the friction cost approach can produce lower estimates of cost, it is recommended to use both approaches to determine any methods-derived difference. Four studies used the human capital approach alone to value productivity loss (19-22). In the study by Rivero-Arias et al., productivity was assessed by calculating the total hours worked by each patient at baseline and at six, twelve and twenty-four months (23).
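The reason the friction cost method tends to give lower estimates can be shown with a simplified sketch. It assumes that the human capital approach values every day of absence at the daily wage, whereas the friction cost method values absence only up to a friction period (the time needed to replace the worker). The wage, the absence duration and the friction period are hypothetical, and refinements such as elasticity adjustments are omitted.

```python
def human_capital_cost(days_absent, daily_wage):
    """Value all lost working days at the daily wage."""
    return days_absent * daily_wage


def friction_cost(days_absent, daily_wage, friction_period_days):
    """Value lost working days only up to the friction period (simplified)."""
    return min(days_absent, friction_period_days) * daily_wage


# Hypothetical example: 120 days of sick leave at 150 per day,
# with an assumed friction period of 90 days.
hc = human_capital_cost(120, 150.0)
fc = friction_cost(120, 150.0, friction_period_days=90)
```

For absences shorter than the friction period the two methods coincide in this sketch; they diverge only for long-term absence, which is common in CLBP.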
1.3.6 Statistical analysis
The statistical analysis of patient-level cost data needs to be adjusted from standard approaches as cost data are generally “positively skewed”, because a small number of people usually require more healthcare resources (30, 34). Non-parametric tests rely on medians and distributional shape. Non-parametric bootstrapping with replacement is the preferred method to analyse cost data because it compares arithmetic means while avoiding distributional assumptions. Standard parametric tests can be used to analyse cost data only if the sample size is large, where skewness will not affect the validity of the analysis. Barber et al. reported that, for sample sizes larger than 150 participants (35), the t-test is usually robust and valid, as parametric assumptions will generally hold. In this review, two studies used non-parametric bootstrapping to test the difference in cost (21, 22), whereas two studies with larger sample sizes, 349 (23) and 173 (19), used parametric t-tests for cost analysis. Skouen et al. did not analyse differences in costs (20).
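As an illustration of non-parametric bootstrapping with replacement, the sketch below resamples each arm's costs, recomputes the difference in arithmetic means, and reads a 95% interval from the percentiles of the replicates. The cost values are hypothetical, and the percentile interval is one simple choice among several; it is not necessarily the method used in the included studies.

```python
import random
import statistics


def bootstrap_mean_difference(costs_a, costs_b, n_replications=1000, seed=0):
    """Bootstrap the difference in mean cost (arm A minus arm B).

    Returns the mean of the bootstrapped differences and a 95% percentile interval.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    differences = []
    for _ in range(n_replications):
        # Resample each arm with replacement, keeping the original arm size.
        resample_a = rng.choices(costs_a, k=len(costs_a))
        resample_b = rng.choices(costs_b, k=len(costs_b))
        differences.append(
            statistics.fmean(resample_a) - statistics.fmean(resample_b)
        )
    differences.sort()
    lower = differences[int(0.025 * n_replications)]
    upper = differences[int(0.975 * n_replications) - 1]
    return statistics.fmean(differences), (lower, upper)


# Hypothetical, positively skewed cost data: one high-cost patient per arm.
costs_intervention = [200.0, 250.0, 300.0, 320.0, 400.0, 450.0, 500.0, 5200.0]
costs_comparator = [150.0, 180.0, 220.0, 260.0, 300.0, 310.0, 350.0, 2900.0]

mean_difference, interval = bootstrap_mean_difference(
    costs_intervention, costs_comparator
)
```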
Discounting is used to estimate the present value of future outcomes and costs, on the assumption that present outcomes and costs are considered more valuable than those in the future (30). Future costs and outcomes should be discounted where follow-up is longer than one year, using nationally preferred discount rates. Lack of discounting can lead to inaccuracy in estimating the cost-effectiveness results. In this review, three studies had a time horizon of two years or more (19, 20, 23), two of which reported the discount rate according to the country-specific rates (20, 23).
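A minimal sketch of discounting follows, assuming that amounts are grouped by year, that first-year amounts are left undiscounted, and that a single annual rate applies. The 3.5% rate and the cost figures are illustrative only and are not taken from the included studies.

```python
def present_value(amounts_by_year, discount_rate):
    """Discount a stream of yearly costs (or outcomes) to present value.

    amounts_by_year[0] is incurred in the first year and is not discounted;
    amounts_by_year[t] is divided by (1 + discount_rate) ** t.
    """
    return sum(
        amount / (1 + discount_rate) ** year
        for year, amount in enumerate(amounts_by_year)
    )


# Hypothetical two-year follow-up with a cost of 1000 in each year,
# discounted at an illustrative annual rate of 3.5%.
discounted_total = present_value([1000.0, 1000.0], discount_rate=0.035)
undiscounted_total = present_value([1000.0, 1000.0], discount_rate=0.0)
```

Here the second-year cost contributes roughly 966 rather than 1000, so the undiscounted total slightly overstates the two-year cost.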
1.3.7 Dealing with uncertainty
The incremental cost effectiveness ratio (ICER) is the main summary measure of an economic evaluation and is the difference in cost divided by the difference in effect (outcome) between two interventions (30). The base case analysis generates the ICER from the preferred outcome and cost data. Sensitivity analysis is used to test the sensitivity of the ICER to variation in cost and outcome parameters used in the base case analysis (30, 34). In one-way sensitivity analysis, one parameter is changed at a time to test the results. Multiple-way analysis changes multiple parameters at the same time. Although one-way sensitivity analysis is easy and understandable, it can underestimate total uncertainty in the ICERs (36).
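The ICER definition and a one-way sensitivity analysis can be sketched as follows. The base-case inputs are the cost and QALY differences reported for Rivero-Arias et al. and the cost and RMDS differences reported for Smeets et al. in Table 6; the ±20% variation in the cost difference is a hypothetical range chosen only for illustration.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: difference in cost / difference in effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the difference in effect is zero")
    return delta_cost / delta_effect


# Base case using the differences reported in Table 6 for Rivero-Arias et al.
# (cost difference -3304 GBP, QALY difference -0.068).
base_case = icer(-3304, -0.068)

# Smeets et al., combined treatment vs active physical treatment (RMDS).
smeets_rmds = icer(-456, -1.23)

# One-way sensitivity analysis: vary the cost difference alone by a
# hypothetical +/-20%, holding the effect difference at its base-case value.
one_way = {factor: icer(-3304 * factor, -0.068) for factor in (0.8, 1.0, 1.2)}
```

Because both differences are negative in these examples, the ratio is read as the saving per unit of effect forgone rather than the cost per unit gained, so the sign of each difference should always be inspected alongside the ratio.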
Probabilistic sensitivity analysis (PSA) assumes that the values of input cost and outcome variables have a probability distribution. Probabilistic incremental economic analysis is usually carried out using bootstrapping to generate credibility intervals that provide a quantitative measure of uncertainty around ICER point estimates (“expected value”). For the graphical representation of ICERs, cost effectiveness planes are used to present the distribution of bootstrapped ICERs (30). Another common graphical presentation used in economic evaluation is the cost effectiveness acceptability curve (CEAC) (30). The CEAC is a technique for representing information on uncertainty in cost-effectiveness. A CEAC demonstrates the probability that an intervention is cost-effective compared with the substitute, given the observed data, for a range of maximum monetary thresholds that policy makers are willing to pay for a specific unit change in effect (37).
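A CEAC can be derived from bootstrapped pairs of cost and effect differences. The sketch below uses the net monetary benefit formulation, in which a replicate counts as cost-effective at threshold λ when λ × Δeffect − Δcost is positive; this is a common way to construct the curve but is an assumption here, not a description of the included studies' methods. The replicates and thresholds are hypothetical.

```python
def ceac(replicates, thresholds):
    """Probability of cost-effectiveness at each willingness-to-pay threshold.

    replicates: list of (delta_cost, delta_effect) pairs, e.g. from bootstrapping.
    thresholds: iterable of willingness-to-pay values per unit of effect.
    """
    curve = {}
    for wtp in thresholds:
        # A replicate is cost-effective if its net monetary benefit is positive.
        n_cost_effective = sum(
            1
            for delta_cost, delta_effect in replicates
            if wtp * delta_effect - delta_cost > 0
        )
        curve[wtp] = n_cost_effective / len(replicates)
    return curve


# Hypothetical bootstrapped (cost difference, QALY difference) pairs.
replicates = [(-500, 0.02), (300, 0.01), (1000, 0.03), (-200, -0.02)]
curve = ceac(replicates, thresholds=[0, 20000, 50000])
```

In practice the curve would be built from thousands of replicates and a fine grid of thresholds; four replicates are used here only to keep the arithmetic easy to follow.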
In this review, all studies carried out one-way sensitivity analysis. Four studies generated ICERs using bootstrapping (19, 21-23) and three of them presented ICERs on cost effectiveness planes (19, 21, 22). CEACs were used in these studies to present the probability of cost effectiveness (19, 21-23).
1.3.8 Cost-effectiveness of PMS
The ICERs generated by the studies are summarised in Table 6. Only one study concluded that PMS dominates usual care (more effective and less costly) (21). Skouen et al. concluded that multidisciplinary services are cost-effective in men only (20). However, this conclusion needs to be interpreted with caution given that top-down costs were used and no sensitivity or statistical analyses were reported. Two studies reported that PMS is cheaper and less effective than surgery (19, 23). Therefore, a trade-off between cost and effect needs to be considered. Smeets et al. compared PMS with active physical treatment (APT) and graded activity plus problem solving (GAP) (22). In this study, the PMS was dominated when compared with GAP, while it was cheaper and less effective when compared with APT.
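The terms "dominant", "dominated" and "trade-off" correspond to quadrants of the cost-effectiveness plane, as the sketch below shows. It uses the QALY-based differences listed in Table 6; for Smeets et al. versus GAP, the €4765 entry is read here as the cost difference, which is an assumption about the table layout. The function name and labels are illustrative.

```python
def classify(delta_cost, delta_effect):
    """Place an intervention on the cost-effectiveness plane relative to its comparator."""
    if delta_cost < 0 and delta_effect > 0:
        return "dominant"  # less costly and more effective
    if delta_cost > 0 and delta_effect < 0:
        return "dominated"  # more costly and less effective
    if delta_cost < 0 and delta_effect < 0:
        return "cheaper and less effective"  # trade-off: judge the ICER against a threshold
    if delta_cost > 0 and delta_effect > 0:
        return "more costly and more effective"  # trade-off
    return "no difference in cost or effect"


# (cost difference, QALY difference) as listed in Table 6.
table_6 = {
    "Rivero-Arias 2005 (vs surgery)": (-3304, -0.068),
    "Smeets 2009 (vs GAP)": (4765, -0.045),
    "Lambeek 2010 (vs usual care)": (-5310, 0.09),
    "Johnsen 2014 (vs surgery, EQ-5D)": (-13506, -0.34),
}

classification = {study: classify(*diffs) for study, diffs in table_6.items()}
```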
Table 6 Summary of incremental cost effectiveness analysis results
Study ID
|
Intervention/Comparator
|
Outcome measure
|
Intervention cost
|
Cost difference
|
Outcome difference
|
ICER
|
Skouen 2002 (20)
Norway
|
Multidisciplinary treatment vs usual care
|
Return to work
|
NoK 9023
|
Not reported
|
Net productivity gain NoK 7858100 (USD 924500)
|
Net productivity gain = NoK 7240900(USD 852000)
|
Rivero-Arias 2005 (23)
UK
|
Intensive rehabilitation vs surgical stabilisation
|
QALY
|
£1410
|
- £3304
|
- 0.068
|
48588 £ per QALY
(CE threshold 20,000-30,000£ per QALY)
|
Smeets 2009 (22)
Netherlands
|
Combined treatment vs active physical treatment
|
RMDS
|
Not reported
|
- €456
|
- 1.23
|
371 € per one point reduction in RMDS
|
QALY
|
0.014
|
35060 € per QALY
|
Combined treatment vs graded activity plus problem solving
|
RMDS
|
€4765
|
- 1.27
|
(dominated)
|
QALY
|
- 0.045
|
(dominated)
|
Lambeek 2010 (21)
Netherlands
|
Integrated care vs usual care
|
Return to work (days)
|
£1077
|
£217 (direct cost)
|
- 68
|
-3* £ per one day earlier return to work
|
QALY
|
- £5310 (total cost)
|
0.09
|
(dominant)
|
Johnsen 2014 (19)
Norway
|
Multidisciplinary rehabilitation vs total disc replacement
|
QALY using EQ-5D
|
€5977
|
- €13506
|
- 0.34
|
39748* € per QALY
|
QALY using SF-6D
|
0.11
|
128328 € per QALY
(CE threshold €74,600 per QALY)
|
*Cost effective, NoK = Norwegian Krone, QALY = quality adjusted life years, USD = United States Dollar
|