In our study, we have shown that the unemployment mode, that is, whether unemployment is self-reported or register-based, has a considerable effect on estimates of the health effect of unemployment. It was notable that the absolute difference between the estimates based on self-reported and register-based unemployment was so substantial: 2.5–3.1% for the risk difference estimators when there was no censoring for unemployment episodes during the follow-up period, and even greater when censoring for unemployment episodes. The results for the logistic regression estimator were similar.
On the other hand, our evaluation of the assumptions in the statistical models yielded only marginal deviations in most cases. The exceptions mainly concerned the doubly-robust estimator, which is known to be sensitive to small samples, and estimates based on the current unemployment mode. Here we identified some notable differences between the full model with all its variables and the model with only the significant potential confounders. For the other comparisons between these models, the difference was no greater than 1.1%, and in most cases it was at most 0.6%.
Likewise, in the models where we evaluated the contribution of each potential confounder by adding a variable to, or removing it from, the full or the reduced model, the effect estimates were at most marginally affected. The exception was whether previous health was included in the analyses. Thus, our study shows that even if a correctly specified model is advantageous, the risk of bias is likely to be rather limited for research questions similar to those we have investigated. To avoid biases due to the model set-up, it is paramount to take health selection into unemployment into consideration.
It is interesting that the more popular logistic regression estimator seems to be more sensitive to poor model assumptions than the propensity score and G-computation methods. The comparison of our risk difference estimators, which showed that they yield mainly similar results, is very much in line with previous comparisons between propensity score methods and conventional multivariable methods. Thus, it seems that the choice of statistical method is not the main challenge in achieving estimates with low bias.
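The two points above, that previous health matters and that different risk difference estimators tend to agree, can be illustrated with a minimal sketch. It is not the analysis used in the study: the counts are invented for illustration, previous health is reduced to a single binary confounder, and the function names (crude_rd, g_computation_rd, ipw_rd) are hypothetical. With one binary confounder both adjusted estimators are fully nonparametric, so they coincide exactly here; with fitted models they would generally differ somewhat.

```python
# Illustrative sketch only: the counts below are made up and are NOT data from the study.
# Each row: (previous_poor_health, unemployed, n_persons, n_with_poor_health_at_follow_up)
STRATA = [
    (0, 0, 800, 80),
    (0, 1, 100, 15),
    (1, 0, 100, 40),
    (1, 1, 100, 45),
]


def crude_rd(strata):
    """Risk difference (unemployed minus employed), ignoring previous health."""
    def risk(a):
        n = sum(row[2] for row in strata if row[1] == a)
        cases = sum(row[3] for row in strata if row[1] == a)
        return cases / n
    return risk(1) - risk(0)


def g_computation_rd(strata):
    """Standardise stratum-specific risks to the whole sample's previous-health distribution."""
    total = sum(row[2] for row in strata)
    mean_risk = {0: 0.0, 1: 0.0}
    for z in {row[0] for row in strata}:
        n_z = sum(row[2] for row in strata if row[0] == z)
        for _, a, n, cases in (row for row in strata if row[0] == z):
            # Risk in this stratum, weighted by the stratum's share of the sample
            mean_risk[a] += (cases / n) * (n_z / total)
    return mean_risk[1] - mean_risk[0]


def ipw_rd(strata):
    """Inverse probability weighting with empirical propensities P(exposure | previous health)."""
    total = sum(row[2] for row in strata)
    weighted_cases = {0: 0.0, 1: 0.0}
    for z, a, n, cases in strata:
        n_z = sum(row[2] for row in strata if row[0] == z)
        propensity = n / n_z  # share of stratum z with exposure status a
        weighted_cases[a] += cases / propensity
    return weighted_cases[1] / total - weighted_cases[0] / total


print(round(crude_rd(STRATA), 3),
      round(g_computation_rd(STRATA), 3),
      round(ipw_rd(STRATA), 3))
```

In this invented example the crude risk difference is roughly three times the adjusted one, while the two adjusted estimators give the same value. This mirrors the pattern discussed above: omitting previous health changes the estimate far more than the choice between adjustment methods.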
Even if our study indicates that the statistical model might cause only small biases, the choice of variables is undoubtedly very important. Interestingly, in a review from 2014, only 6 of the 41 reviewed articles discussed the choice of statistical method, and discussion of how to measure unemployment was very rare, if present at all. Thus, there is a need among most researchers for a more informative way of describing potential limitations arising from the choice of statistical model.
Our interest was in the estimate for labour market status. Interestingly, when studying how other estimates were affected by the model set-up, we observed that education and occupation were in some cases not significant on their own but were significant in combination, whereas the estimate for labour market status was not as sensitive to the presence of these variables in the analysis. This behaviour of the estimates for education and occupation is contrary to what is expected when a variable is added, as collinearity is expected to lead to fewer statistically significant estimates rather than more. This finding further highlights the importance of well thought-through variable selection in the main analyses. At the same time, the rationale for including variables in the statistical analysis needs to be improved, as has been highlighted in previous research [3, 20].
The importance of a correctly specified research question for the results was one of the key issues in our study. Self-reported and register-based unemployment should address the same research question and yield similar results. In our study, the deviation between the results for register-based and self-reported labour market status was rather large. To some extent, this might be explained by the register data not being available up to the day of the survey. We do, however, think that the main explanation for the deviation is that differences in how the data are collected lead to slightly different research questions being answered.
Neither the accumulated unemployment spell during recent years nor current unemployment may fully capture how recent or current unemployment affects health in the long term. Both may be too limited: accumulated unemployment relates too much to historical unemployment, and current unemployment may reflect a very short and negligible unemployment spell for the person in question. Thus, determining how unemployment affects health is likely to be more complicated, which also underlines the importance of including health status at the time of unemployment in the analysis. The large discrepancy between the estimates from the unemployment measures used in our study highlights the importance of a well thought-through measure. Thus, the main message of our work is that researchers need a deep understanding of how unemployment data are collected. Based on our experience, we recommend accumulated self-reported unemployment measured with a detailed retrospective matrix.
We have made a thorough analysis of different aspects of the analysis of unemployment. A possible limitation of our study is that only one population has been analysed and that, despite the different research questions used, we have still had a somewhat limited focus on the consequences of long-term unemployment for health. Even so, we believe that our results can make a major contribution to the future understanding of the priorities needed, not only for this research topic but also for other research topics.