Collider Bias in Administrative Workers’ Compensation Claims Data: A Challenge for Cross-Jurisdictional Research

Workers’ compensation claims consist of occupational injuries severe enough to meet a compensability threshold. Theoretically, systems with higher thresholds should have fewer claims but greater average severity. For research that relies on claims data, particularly cross-jurisdictional comparisons of compensation systems, this results in collider bias that can produce spurious associations and confound analyses. In this study, I use real and simulated claims data to demonstrate collider bias and problems with methods used to account for it. Using Australian claims data, I used a linear regression to test the association between claim rate and mean disability duration across Statistical Areas. Analyses were repeated with nesting by state/territory to account for variations in compensability thresholds across compensation systems. Both analyses were then repeated on left-censored data. Simulated claims data were analysed with Cox regressions to illustrate how left-censoring can reverse effects. The claim rate within a Statistical Area was inversely associated with disability duration. However, this reversed when Statistical Areas were nested by state/territory. Left-censoring attenuated the unnested association to non-significance, while the nested association remained significantly positive. Cox regressions with simulated claims data demonstrated how left-censoring can reverse effects. Collider bias can seriously confound work disability research, particularly cross-jurisdictional comparisons. Work disability researchers must grapple with this challenge by using appropriate study designs and analytical approaches, and by considering how it affects the interpretation of results.


Introduction
Collider bias is the problem of conditioning statistical analyses on a variable that is caused by two or more other variables, resulting in spurious associations [1][2][3]. Known as conditioning on the collider, this can occur when statistical analyses include inappropriate controls or when they are conducted on a restricted sample or population [4]. Workers' compensation data, which includes only a subset of injured workers, is an example of the latter. It may be a particular problem for cross-jurisdictional comparative research, the focus of this special issue. Yet to date, collider bias has not received wider recognition in work disability research. One obstacle is that collider bias is unintuitive and often needs seemingly paradoxical examples to illustrate. In this paper, I demonstrate collider bias using simulated and real data of well-known examples from other fields, then use workers' compensation data to show how the problem can present in the study of work disability.

Motivating Examples
In the general population, beauty and talent are unrelated. Among Hollywood stars, however, they appear to be inversely related: greater beauty predicts less talent [4]. The reason is that stardom is conditional on some combination of beauty and talent. The more beautiful need less talent, and vice versa. I illustrate this using simulated data in Fig. 1.
Hollywood stardom is causally influenced by beauty and talent, which makes Hollywood stardom a collider. In the language of Directed Acyclic Graphs (DAG), this would be expressed as beauty → Hollywood stardom ← talent (for an introduction to DAGs, see Rohrer 2018 [1] or Pearl and Mackenzie 2018 [2]). Testing an association between beauty and talent among a restricted population of Hollywood stars introduces collider bias. In this example, we can observe excluded cases, which makes the bias in Fig. 1's right-hand plot obvious. It also shows how subjective and difficult-to-measure concepts like beauty and talent can nevertheless have real world effects. The next example demonstrates collider bias using real world data [5].
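The selection effect described above can be reproduced in a few lines. The following is a minimal sketch, not the simulation behind Fig. 1: the sample size, the standard normal distributions, and the rule that stardom goes to the top 5% on the sum of beauty and talent are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical population size

# Beauty and talent are drawn independently, so they are unrelated by construction.
beauty = rng.standard_normal(n)
talent = rng.standard_normal(n)

# Assumed selection rule: stardom goes to the top 5% on beauty + talent.
score = beauty + talent
is_star = score >= np.quantile(score, 0.95)

# Correlation in the full population versus the selected (conditioned) sample.
r_population = np.corrcoef(beauty, talent)[0, 1]
r_stars = np.corrcoef(beauty[is_star], talent[is_star])[0, 1]

print(f"population r = {r_population:.2f}")
print(f"stars only r = {r_stars:.2f}")
```

Under these assumptions the population correlation sits near zero, while the correlation among stars is clearly negative, even though nothing in the data-generating process links the two traits.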
Height is generally considered an advantage in basketball. Taller players should therefore score more points. But as Fig. 2 shows, this is not the case among professional players (p = 0.909) [6]. Once we understand that we are looking at a dataset that has been conditioned on extreme ability, which is necessary to become a professional player, it becomes clear that whatever advantage height provides, other skills can compensate. If skill were quantifiable and the data were available, we would almost certainly see that it is inversely associated with height and has no association with points per game among professional players. In this case, the DAG language would describe the collider as: height → professional basketball player ← skill.

Workers' Compensation Claims Data and Collider Bias
In work disability research, injured workers are often identified through claims data. However, only a fraction of workplace injuries become a claim. Of an estimated 563,000 Australians who experienced a work injury in 2017/18, only 174,000 (31%) applied for workers' compensation and 154,000 received it (27%) [7]. Compensated injuries skew towards the more severe. The most common reason injured workers gave for not lodging a claim was because the injury was too minor (43%) [7].
Each compensation system has its own compensability threshold, a term I use to refer to the formal and informal settings that determine whether an injury becomes an accepted claim. Common in American compensation systems, waiting periods are a type of formal compensability threshold that are associated with longer disability durations [8]. In Australia, most compensation systems have employer excess periods, which are similar to waiting periods except employers are obligated to pay lost wages until compensation benefits start [9]. There are also informal practices such as purposely delaying liability decisions to cause injured workers to abandon their claim [10]. Others plausibly exist but are hidden because they are unethical or even illegal.
In the DAG language, the collider that shapes workers' compensation claims data can be described as: compensability threshold → compensation ← injury severity. Unfortunately, like talent and beauty in the Hollywood example, compensability thresholds and injury severity do not lend themselves to simple quantification. While injuries are often rated for degree of impairment to determine what benefits an injured worker is entitled to, scores are highly subjective and biased [11]. This study therefore relies on objective and measurable proxies. Claim rate stands in for compensability thresholds on the assumption that higher thresholds correspond to lower claim rates and vice-versa. Disability duration stands in for injury severity, which it is strongly related to [12,13]. The DAG to be tested in this study is the following: claim rate (proxy for compensability threshold) → compensation ← disability duration (proxy for injury severity). Compensation systems with higher compensability thresholds should have fewer claims but higher average severity and therefore longer disability durations.
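The expectation that higher thresholds yield fewer but more severe claims can be illustrated with a toy simulation. This is a sketch under stated assumptions: the log-normal severity scale, the two threshold values, and the rule that an injury is compensated only when its severity meets the threshold are hypothetical and are not drawn from the claims data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_injuries = 50_000  # hypothetical number of workplace injuries per system


def simulate_system(threshold):
    """Return (claim rate, mean severity of claims) for one compensation system."""
    # Assumed severity scale: log-normal, so most injuries are minor.
    severity = rng.lognormal(mean=0.0, sigma=1.0, size=n_injuries)
    # Only injuries at or above the compensability threshold become claims.
    claims = severity[severity >= threshold]
    return claims.size / n_injuries, claims.mean()


rate_low, severity_low = simulate_system(threshold=0.5)
rate_high, severity_high = simulate_system(threshold=2.0)

print(f"low threshold:  claim rate {rate_low:.2f}, mean severity {severity_low:.2f}")
print(f"high threshold: claim rate {rate_high:.2f}, mean severity {severity_high:.2f}")
```

Both systems draw injuries from the same severity distribution, so any difference in mean severity among claims arises purely from who is admitted, which is the cohort-shaping effect.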
Collider bias makes it extremely difficult to differentiate cohort-shaping effects (who gets into the compensation system) and outcome-shaping effects (how the system changes the individuals within it). As a result, there has been some conflation in work disability research. For instance, in a study of differences in sick leave across several European countries, the authors conclude that "less strict compensation policies to be eligible for long-term (partial) benefits, contributed to sustained RTW [return to work]." While making it easier to access compensation benefits could plausibly minimise the iatrogenic effects of being on compensation [14][15][16], it would also expand system access to those who would fare better anyway. Collider bias can make it impossible to tell the difference. For some purposes, this is not much of a problem. When the aim is to reduce system costs, it matters little if savings are achieved by restricting access or improving outcomes. But if the aim is to improve injured worker outcomes, it is essential to disentangle cohort-shaping and outcome-shaping effects.
In this study, I test whether claim rates and disability duration are in fact inversely correlated. I also demonstrate what happens when variations in compensability thresholds are accounted for by treating compensation systems as fixed effects. Finally, I test the effects of left-censoring disability duration, which I have previously applied to overcome the problem of variations in compensability thresholds [17][18][19]. I also test the approach of adding arbitrary low disability durations to medical-only claims to include them in survival analysis, which has been suggested as a way to overcome biases due to waiting periods/employer excess [20].

Setting
Australia has largely devolved workers' compensation to the states and territories, each of which maintains its own system. Comcare provides coverage of federal government employees and grants self-insurance licenses to certain interstate private employers [21]. Each system is cause-based, meaning compensation is only provided if the injury or illness can be demonstrably linked to work [22]. Collectively, these cover 94% of Australia's workforce [23].

Data
Claim records are from the National Data Set for Compensation-based Statistics (NDS), a minimum dataset compiled by Safe Work Australia from each compensation system [24]. Records are limited to those lodged between 2010 and 2015. As of writing, records are updated for six years up to July 2017, providing a minimum of 1.5 years of follow-up to calculate disability duration.
As the nine state, territory and Comcare compensation systems provide insufficient data points for analysis, claims were aggregated at Statistical Area of residence at Level 4 [25]. Within each Statistical Area, I calculated the rate of claims using a labour force denominator [26] and mean disability duration with individual records capped at five years. The use of Statistical Area subunits also provided an opportunity to examine the association between claim rates and disability duration while accounting for compensability threshold fixed effects (i.e., by nesting Statistical Areas within a state or territory, a rough approximation of compensation system). The Australian Capital Territory was excluded because it is composed of a single Statistical Area at Level 4 and its labour force estimates were unavailable.
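The aggregation step can be sketched as follows. The records, area codes, and labour force counts below are invented for illustration, and two details are assumptions rather than facts from the NDS: durations are taken to be in weeks (so the five-year cap becomes 260 weeks) and the claim rate is expressed per 1,000 labour force participants.

```python
import numpy as np

CAP_WEEKS = 5 * 52  # five-year cap, assuming durations are recorded in weeks

# Hypothetical claim records: (Statistical Area code, disability duration in weeks).
claims = [
    ("SA4-101", 3.0),
    ("SA4-101", 400.0),  # exceeds the cap and is truncated to 260 weeks
    ("SA4-102", 12.0),
    ("SA4-101", 1.0),
    ("SA4-102", 30.0),
]
# Hypothetical labour force denominators per Statistical Area.
labour_force = {"SA4-101": 1_500, "SA4-102": 4_000}


def summarise(claims, labour_force):
    """Claim rate per 1,000 labour force and mean capped duration per area."""
    summary = {}
    for area, workers in labour_force.items():
        durations = np.array([d for a, d in claims if a == area])
        capped = np.minimum(durations, CAP_WEEKS)
        summary[area] = {
            "claim_rate_per_1000": 1000 * durations.size / workers,
            "mean_duration_weeks": capped.mean(),
        }
    return summary


summary = summarise(claims, labour_force)
for area, stats in summary.items():
    print(area, stats)
```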

Statistical Analyses
I used a linear regression to test the association between claim rates and disability duration. Analyses were repeated in a multi-level linear regression with Statistical Areas nested by state and territory. Claim rates were z-transformed, which mean-centred the distribution at zero and scaled variance in terms of standard deviations, to provide a more meaningful scale of effects. Disability duration was log-transformed to estimate percent rather than absolute changes.
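The transformations and the two model specifications can be sketched on synthetic data. Everything in this example is an assumption made for illustration: the number of states and areas, the effect sizes, and the use of within-state demeaning (a fixed-effects estimator) as a simple stand-in for the multi-level model. The conversion exp(b) − 1 is one common way to read a coefficient on a log outcome as a percent change; it is not necessarily the conversion used for the estimates reported here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, areas_per_state = 6, 15  # hypothetical layout

# Synthetic data: states with higher claim rates have shorter durations,
# while within a state, durations rise with the claim rate.
state = np.repeat(np.arange(n_states), areas_per_state)
state_rate = np.linspace(10, 35, n_states)[state]
claim_rate = state_rate + rng.normal(0, 2, state.size)
duration_weeks = np.exp(
    3.5
    - 0.04 * state_rate                   # between-state component (negative)
    + 0.03 * (claim_rate - state_rate)    # within-state component (positive)
    + rng.normal(0, 0.05, state.size)
)

# z-transform the predictor and log-transform the outcome.
z_rate = (claim_rate - claim_rate.mean()) / claim_rate.std(ddof=1)
log_duration = np.log(duration_weeks)


def slope(x, y):
    """Ordinary least squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


def demean_by(group, values):
    """Subtract each group's mean, leaving only within-group variation."""
    means = np.bincount(group, weights=values) / np.bincount(group)
    return values - means[group]


def to_percent(b):
    """Percent change in the outcome per unit of the predictor."""
    return 100 * (np.exp(b) - 1)


pooled = slope(z_rate, log_duration)
within = slope(demean_by(state, z_rate), demean_by(state, log_duration))

print(f"unnested: {to_percent(pooled):+.1f}% per SD of claim rate")
print(f"within-state: {to_percent(within):+.1f}% per SD of claim rate")
```

In this synthetic setup the unnested slope is negative and the within-state slope is positive, because the between-state differences dominate the pooled estimate.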
Sensitivity analyses tested the effects of left-censoring at two weeks to account for known compensability thresholds (the longest employer excess periods in Australia, found in Victoria and South Australia). The first sensitivity analysis replicated the analysis above with the left-censored data. The second analysed simulated claims data with a Cox regression model to demonstrate how left-censoring can reverse effects. A third analysis adopted an approach developed by Sears and Heagerty [20] to counter waiting periods in the Washington compensation system. In the original paper, the authors assumed many "medical-only" claims had time loss that did not exceed the waiting period. These claims were assigned arbitrary low disability duration values (0.001) to allow their inclusion in survival analysis. In this study, left-censored cases are assigned the arbitrary low disability duration value on the assumption that medical-only and left-censored claims are similar because they both have artificially suppressed disability duration.

Claim rate as a Predictor of Disability Duration
There was an inverse association between claim rates and disability durations within a Statistical Area. A standard deviation increase in the claim rate was associated with a 15.9% decrease in disability duration (95% CI −22.9% to −8.8%). However, this reversed when Statistical Areas were nested by state and territory. A standard deviation increase in the claim rate was associated with a 6.4% increase in disability duration (95% CI 2.6% to 10.1%). These effects are illustrated in Fig. 3, which also shows a distinct clustered ordering of data points by state and territory.

Sensitivity Analysis: Effects of Left-Censoring
When the above analyses were limited to claims with at least two weeks of compensated time loss, the unnested association attenuated by two-thirds and became non-significant (−5.2%; 95% CI −10.9% to 0.5%). This suggests a reduction in collider bias. However, the nested association was a 6.4% increase, the same point estimate as for all time loss claims, though confidence intervals narrowed, indicating greater precision (95% CI 4.2% to 8.6%). The clustered ordering by state and territory remained intact (see Supplementary Fig. 1).
Figure 4 illustrates the results of Cox regressions of all simulated claims data, left-censored simulated claims data, and left-censored simulated claims data with arbitrary low values. Systems A and B were assigned mean disability durations of 0.5 weeks (SD: 0.9) and 1 week (SD: 0.6), both on a log scale to reflect the heavy right skew often found in disability duration [20]. Admittedly, I tested multiple iterations to achieve the effects described below, meaning the simulation is somewhat contrived.
When analysing all simulated claims data, those in System B were 23% less likely to exit compensation at each time point (Hazard Ratio 0.77; 95% CI 0.74-0.81) relative to System A. When left-censored at two weeks, claims in System B became 30% more likely to exit compensation (HR 1.30; 95% CI 1.22-1.38), a reversal of effects. When left-censored cases were assigned an arbitrary low value to allow their inclusion in analysis, the results were similar to analysis of all claims: those in System B were 18% less likely to exit compensation at each time point. However, the confidence interval did not include the original point estimate (HR 0.82; 95% CI 0.79-0.86).
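The direction of these effects can be checked with a rough sketch. It is not the Cox regression reported above: it compares crude exit rates (exits per claim-week at risk, which is the hazard under an exponential model) as a simple stand-in for a hazard ratio. The log-scale means and standard deviations come from the text; the sample size, the seed, the reading of the parameters as log-normal, and the treatment of left-censoring as dropping claims of two weeks or less are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000        # hypothetical number of claims per system
CUTOFF = 2.0       # weeks; the left-censoring point
LOW_VALUE = 0.001  # arbitrary low duration assigned to censored claims

# Log-scale parameters as described for Systems A and B.
system_a = rng.lognormal(mean=0.5, sigma=0.9, size=n)
system_b = rng.lognormal(mean=1.0, sigma=0.6, size=n)


def exit_rate(durations):
    """Exits per claim-week at risk; every simulated claim eventually exits."""
    return durations.size / durations.sum()


def rate_ratio(a, b):
    """Exit rate in System B relative to System A (below 1: slower exit from B)."""
    return exit_rate(b) / exit_rate(a)


# 1. All claims.
ratio_all = rate_ratio(system_a, system_b)

# 2. Left-censored: only claims lasting longer than the cutoff are analysed.
ratio_censored = rate_ratio(system_a[system_a > CUTOFF], system_b[system_b > CUTOFF])

# 3. Censored claims retained with an arbitrary low duration.
ratio_low_value = rate_ratio(
    np.where(system_a > CUTOFF, system_a, LOW_VALUE),
    np.where(system_b > CUTOFF, system_b, LOW_VALUE),
)

print(f"all claims:          {ratio_all:.2f}")
print(f"left-censored:       {ratio_censored:.2f}")
print(f"arbitrary low value: {ratio_low_value:.2f}")
```

The reversal arises because System A has the wider spread on the log scale: it resolves more claims before two weeks, but the claims that survive the cutoff have a heavier tail than those in System B. The magnitudes from this crude estimator should not be expected to match the Cox hazard ratios.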

Discussion
This study demonstrates how collider bias can confound cross-jurisdictional comparative research. As predicted, there was an inverse association between claim rate and disability duration, proxies for compensability thresholds and injury severity, indicating systematic baseline differences in cohorts. This makes it extremely difficult to differentiate a compensation system's cohort-shaping and outcome-shaping effects.
The association reversed when compensation systems were treated as fixed effects in a textbook example of Simpson's Paradox, or more technically, Simpson's Reversal [42]. If the positive association can be considered the "true" relationship between claim rate and disability duration (i.e., occupational injury frequency and severity are positively associated in the real world), the results demonstrate how collider bias can mask real effects. To be clear, neither the inverse nor the positive association is inherently misleading or wrong on its own. Simpson's Paradox only highlights the importance of matching analysis to the research question [43]. In this study, the question was whether differences in compensability thresholds between compensation systems produce a spurious association between claim rates and disability duration. This makes the unnested approach the appropriate one, though the nested approach adds insight.
The clustered ordering of states/territories in Fig. 3 provides additional evidence of collider bias due to compensability thresholds, in this case employer excess periods. Victoria and South Australia have the longest excess periods in Australia at ten days/two weeks, which is twice the next longest. Correspondingly, both are situated in the upper-left quadrant of Fig. 3, denoting fewer claims and longer durations. However, the remaining order of states/territories appears unrelated to employer excess: after Victoria and South Australia comes Western Australia, which has no employer excess period, followed by Tasmania (no employer excess period), Northern Territory (part of first day), New South Wales (one week), and Queensland (around one week) [9]. This suggests other compensability thresholds are at work.

The Effects of Left-Censoring and Potential for Further Bias
Left-censoring was applied to account for employer excess periods, a known type of compensability threshold. While the inverse association between claim rate and disability duration attenuated to non-significance with left-censored data, the direction of effect remained negative. However, when Statistical Areas were nested by state and territory, which treated compensability thresholds as fixed effects, the association between claim rate and disability duration was again positive, as it had been in uncensored analysis. As above, if this is the "true" association, it remained masked in unnested analyses even when data were left-censored to account for employer excess periods.
Survival analyses demonstrated how left-censoring claims can also reverse effects. In real world settings, this could happen where one system successfully resolves many of the low-severity, easy-to-resolve cases quickly, while another delays their exit until after the censored period, a "depletion of the susceptibles" in reverse [44].
Adding an arbitrary low value to left-censored cases largely accounted for the effects of left-censoring, which is in line with the theoretical paper that proposed this approach [20]. However, the simulated data include all censored cases, an advantage that real world data likely lack. For instance, in real world data the arbitrary low value would be applied to medical-only claims, which have no recorded time loss. However, medical-only claims can have similar issues in terms of completeness. Such injuries are only recorded as claims if they are compensated for treatment. Medical care benefits also have compensability thresholds that vary across systems. For instance, as of 2019 Victoria required employers to cover the first $707 of medical costs, while Queensland required employers to pay the first $1527.80 of combined medical and income replacement costs [9]. Even for medical-only claims, compensation status can be a source of collider bias.

Alternative Approaches
There are other ways to test whether and how compensation systems improve or worsen injured worker outcomes. Randomised controlled trials avoid much of the problem of collider bias by randomly allocating exposures, theoretically balancing baseline differences between cohorts. However, these are often ethically or financially impractical and can lack external validity [1,45]. Quasi-experiments may be better suited to answering questions about the impact of system settings. These include study designs such as interrupted time series and difference-in-differences, which compare outcomes before and after an event like legislated changes, and regression discontinuity, which uses arbitrary cut-offs like wage replacement caps. They can also overcome logistical hurdles of randomised controlled trials through exogenous allocation of large populations to experimental/exposure and control conditions in ways that mimic randomisation [45,46]. In some circumstances, quasi-experiments have better external validity than randomised controlled trials because they rely on population-level data and real world settings [45].
Fig. 4 Survival curves of simulated compensation systems, both uncensored and censored at two weeks; System A has a mean of 0.5 weeks (standard deviation: 0.9 weeks) and System B a mean of 1 week (standard deviation: 0.6 weeks) on a log scale
However, quasi-experiments have important limitations. Events like legislative change are infrequent and may not modify the policies of interest, or may change the compensation system entirely, making it difficult to differentiate the effects of policy change from service disruption [47]. Legislative change may also introduce a collider if it alters compensability thresholds. For instance, when New South Wales restricted eligibility to its compensation system in 2012, the claim rate decreased while disability duration increased [48]. Such an outcome would be consistent with both a change in the cohort towards more severe and complex injuries and a system that has increased the iatrogenic effects of compensation. At the very least, analyses based on legislative change should examine whether the event affected claim rates [47,49] to proactively identify indicators of collider bias.

Strengths and Limitations
Study strengths include use of population-level claims data from workers' compensation systems with near-universal coverage of the Australian workforce and the use of simulated data with known characteristics to demonstrate how collider bias can distort statistical associations. To my knowledge, this is the first work disability study to directly engage with the problem of collider bias.
Limitations include the inability to test the proposed mechanism of compensability thresholds directly. I used labour force denominators to estimate claim rates, which are not equivalent to covered worker estimates. The proportion of the workforce insured against workplace injury could vary across Statistical Areas or compensation systems and bias claim rate estimates. Respondents were nested by state and territory of residence to account for fixed effects of compensation systems. However, while the vast majority receive benefits from the state or territory in which they reside [50], there is some crossover due to fly-in fly-out workers, border areas, and workers covered under the federal regulator, Comcare.

Conclusions
Collider bias is an under-recognised problem in work disability research. In this paper, I present evidence that compensation status is a collider between injury severity and compensability thresholds, which manifests as an inverse relationship between claim rates and disability duration. This makes it difficult to determine whether differences in disability duration between compensation systems are due to who is compensated or how the system treats them. I also show that left-censoring to account for compensability thresholds such as waiting periods and employer excess may not account for collider bias and can be an additional source of bias.
Randomised controlled trials and quasi-experiments can produce more robust causal estimates of how system factors affect injured worker outcomes, though these have their own theoretical and practical limitations. Work disability researchers must proactively engage with the problem of collider bias to improve the reliability of research that depends on compensation data and to better understand the implications of their findings.