Study design and population
The evaluation of the remodelled shift schedule in the police department is part of a broader project comprising questionnaire surveys, qualitative interviews and analyses of routinely collected data (see Figure 1). In this paper, we present the results of the survey study.
We conducted a controlled before-after study during the pilot phase (2015-2016). The intervention group comprised the 6 police stations that implemented the remodelled shift schedule as of June 2015 for a period of one year. The control group comprised the 17 police stations that continued to operate with the existing shift schedule throughout the same period. Outcome parameters were assessed in both groups in May 2015 (1 month before the start of the pilot phase) and 12 months after its start (June 2016).
A follow-up survey was conducted in December 2020, 5.5 years after the implementation of the remodelled shift schedule. It had originally been scheduled for June 2020 (i.e., 5 years after the introduction of the remodelled shift schedule) but had to be postponed due to the SARS-CoV-2 pandemic. Between 2016 and 2020 the control group ceased to exist, because the remodelled shift schedule was progressively adopted by the police stations of the city. By the time of the follow-up, all police stations but one (which implemented it as of January 2021) had adopted the remodelled shift schedule. The officers working in the operations command centre had also adopted it and were therefore included in the third survey. Thus, the long-term follow-up corresponds to a prospective cohort study in which participants had different levels of exposure to the remodelled shift schedule, i.e., different lengths of time working with it.
The questionnaire was paper-based and anonymous. To match responses across the three survey waves, participants were asked to provide a self-chosen matching code consisting of a combination of letters and numbers. The data protection officer of the Department of the Interior and the police staff council approved the content of the questionnaire and the survey method. The questionnaire was distributed via internal staff mail to all police officers working according to the rotating shift schedule. Participants had four weeks to return the completed questionnaires, which were collected in locked and sealed ballot boxes set up in the police stations.
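Matching anonymous questionnaires across waves amounts to joining the responses on the self-chosen code. The Python sketch below illustrates one way to do this; the record layout, the `code` field name, the whitespace/case normalisation and the rule of discarding codes that occur twice within the same wave are assumptions for illustration, not the procedure documented for this study.

```python
from typing import Any, Dict, List, Optional

# Hypothetical layout: one dict per returned questionnaire, with the
# self-chosen matching code stored under "code".
Response = Dict[str, Any]


def normalise_code(code: Optional[str]) -> Optional[str]:
    """Remove whitespace and upper-case the code; empty or absent codes become None."""
    if code is None:
        return None
    cleaned = "".join(code.split()).upper()
    return cleaned or None


def match_waves(waves: Dict[str, List[Response]]) -> Dict[str, Dict[str, Response]]:
    """Group responses by matching code: {code: {wave_label: response}}.

    Responses without a usable code, and codes that appear more than once
    within the same wave (and so cannot be matched unambiguously), are dropped.
    """
    matched: Dict[str, Dict[str, Response]] = {}
    for wave, responses in waves.items():
        codes = [normalise_code(r.get("code")) for r in responses]
        counts: Dict[str, int] = {}
        for c in codes:
            if c is not None:
                counts[c] = counts.get(c, 0) + 1
        for c, r in zip(codes, responses):
            if c is None or counts[c] > 1:
                continue
            matched.setdefault(c, {})[wave] = r
    return matched


# Example: keep only participants who answered both T0 and T1.
waves = {
    "T0": [{"code": "ab12", "wai": 40}, {"code": "XY99", "wai": 35}],
    "T1": [{"code": "AB12 ", "wai": 41}],
}
panel = match_waves(waves)
complete = {c: w for c, w in panel.items() if {"T0", "T1"}.issubset(w)}
print(sorted(complete))  # ['AB12']
```

Dropping duplicated codes trades some sample size for certainty that no two officers' answers are merged into one record.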
We collected data on gender, age (in five-year categories), relationship status (‘living in a relationship’ / ‘not living in a relationship’), parenthood (‘yes’ / ‘no’), single parenthood (‘yes’ / ‘no’) and caring for dependents (‘yes’ / ‘no’).
We also collected data on experience with rotating shift work (‘less than 5 years’ / ‘5 to 10 years’ / ‘more than 10 years’), full-time or part-time employment, and main type of duty (‘office duty’ / ‘patrol duty’).
Work ability was measured with the German version of the Work Ability Index (WAI). The WAI consists of ten questions covering seven dimensions: current work ability compared with lifetime best (score 0-10), current work ability in relation to job demands (score 2-10), number of medical conditions from a short list of 14 (score 1-7), impairment of work performance due to illness (score 1-6), sick leave in the past 12 months (score 1-5), anticipated work ability for the next two years (score 1-7) and psychological resources (score 1-4). The WAI score ranges from 7 to 49: scores below 28 are classified as ‘critical’, 28 to 36 points as ‘moderate’, 37 to 43 points as ‘good’, and higher scores (44 to 49) as ‘very good’ work ability. The WAI can be considered reliable (Cronbach’s α = .78) and valid.
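The WAI scoring described above can be sketched as a sum of the seven dimension scores followed by a cut-off lookup. This minimal Python illustration assumes the published WAI dimension ranges (work ability in relation to job demands scored 2-10 and the number of medical conditions scored 1-7, so that the total spans 7-49); the dictionary keys are hypothetical names, and because the weighted job-demands item can produce non-integer totals, the category boundaries are applied as strict `<` cut-offs at 28, 37 and 44.

```python
from typing import Dict, Mapping, Optional, Tuple

# Assumed WAI dimension ranges (hypothetical key names); summing the
# seven dimension scores gives a total between 7 and 49.
WAI_RANGES: Dict[str, Tuple[float, float]] = {
    "current_vs_lifetime_best": (0, 10),
    "relation_to_job_demands": (2, 10),
    "medical_conditions": (1, 7),
    "illness_impairment": (1, 6),
    "sick_leave_12_months": (1, 5),
    "prognosis_two_years": (1, 7),
    "psychological_resources": (1, 4),
}


def wai_total(scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Sum the seven dimension scores; a missing dimension makes the total missing (no imputation)."""
    total = 0.0
    for dimension, (low, high) in WAI_RANGES.items():
        value = scores.get(dimension)
        if value is None:
            return None
        if not low <= value <= high:
            raise ValueError(f"{dimension}={value} is outside {low}-{high}")
        total += value
    return total


def wai_category(total: float) -> str:
    """Map a WAI total to the four work-ability categories."""
    if total < 28:
        return "critical"
    if total < 37:
        return "moderate"
    if total < 44:
        return "good"
    return "very good"


example = {dim: high for dim, (low, high) in WAI_RANGES.items()}
print(wai_total(example), wai_category(wai_total(example)))  # 49.0 very good
```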
Self-rated general health was assessed with the single question “How would you rate your health in general?” on a five-point Likert scale (‘excellent’ / ‘very good’ / ‘good’ / ‘fairly good’ / ‘poor’), as recommended by WHO. For further statistical analysis, we dichotomized the variable by merging the categories ‘excellent’, ‘very good’ and ‘good’ on the one hand and ‘fairly good’ and ‘poor’ on the other. In addition, we asked participants to rate their health on a 0-10 scale, where 0 represents the worst imaginable health.
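The dichotomization of self-rated health is a simple recoding. In this Python sketch, coding the ‘good’ group as 1 and the ‘fairly good’/‘poor’ group as 0 is an assumption; the text only states which categories were merged.

```python
from typing import Optional

# Answer options in questionnaire order, from best to worst.
SRH_LEVELS = ("excellent", "very good", "good", "fairly good", "poor")
GOOD_SRH = frozenset(SRH_LEVELS[:3])


def dichotomise_srh(answer: Optional[str]) -> Optional[int]:
    """Return 1 for 'excellent'/'very good'/'good', 0 for 'fairly good'/'poor', None if unanswered."""
    if answer is None:
        return None
    normalised = answer.strip().lower()
    if normalised not in SRH_LEVELS:
        raise ValueError(f"unexpected answer: {answer!r}")
    return 1 if normalised in GOOD_SRH else 0


print([dichotomise_srh(a) for a in ("Very good", "fairly good", None)])  # [1, 0, None]
```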
Quality of life
Quality of life was assessed with the global domain of the German version of the WHOQOL-Bref. It consists of two questions (“Over the last two weeks, how would you rate your quality of life?” and “Over the last two weeks, how satisfied were you with your health?”) answered on a five-point Likert scale from 1 = “very bad/unsatisfied” to 5 = “very good/satisfied”. The answers are transformed into a global score ranging from 0 to 100, with 100 indicating the highest quality of life. The short version of the instrument can be considered reliable (Cronbach’s α ranging from .57 to .88) and valid. In addition, we asked participants to rate their quality of life under the shift schedule on a 0-10 scale, where 0 represents the worst and 10 the best imaginable quality of life.
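The transformation to a 0-100 global score can be illustrated as follows. This Python sketch assumes a linear rescaling of the sum of the two items (possible range 2-10) onto 0-100; the exact procedure prescribed by the German WHOQOL-Bref manual is not given in the text and may differ.

```python
from typing import Optional


def whoqol_global(quality_of_life: Optional[int], health_satisfaction: Optional[int]) -> Optional[float]:
    """Rescale the two 1-5 global items to a 0-100 score (assumed linear transformation).

    The item sum ranges from 2 to 10 and is mapped linearly onto 0-100.
    If either item is unanswered, the score is treated as missing.
    """
    items = (quality_of_life, health_satisfaction)
    if any(item is None for item in items):
        return None
    for item in items:
        if item not in (1, 2, 3, 4, 5):
            raise ValueError(f"item value {item} is not on the 1-5 scale")
    return (sum(items) - 2) / 8 * 100


print(whoqol_global(3, 4))  # 62.5
```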
We did not impute missing data for any variable: unanswered items were treated as missing, and the scores depending on them were accordingly treated as missing as well. Descriptive statistics are reported as means with standard deviations (SD) for continuous variables, and as frequencies and percentages for categorical variables. All p values are two-tailed, and the significance level was set at p < 0.05. All computations were carried out with IBM® SPSS® Statistics (IBM Corp. released 2015. IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY, USA).
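The descriptive statistics described above, combined with the no-imputation rule, can be sketched in Python with the standard library. Two points are assumptions rather than statements from the text: the SD uses the sample (n - 1) denominator, and percentages are based on the participants who answered the item.

```python
import statistics
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence, Tuple


def describe_continuous(values: Sequence[Optional[float]]) -> Tuple[int, float, float]:
    """Return (n, mean, SD) over answered values; missing values are skipped, not imputed."""
    observed = [v for v in values if v is not None]
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    return len(observed), statistics.mean(observed), statistics.stdev(observed)


def describe_categorical(values: Sequence[Optional[Hashable]]) -> Dict[Hashable, Tuple[int, float]]:
    """Return {category: (frequency, percentage)} with percentages based on answered items only."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    return {category: (count, 100 * count / n) for category, count in Counter(observed).items()}


print(describe_continuous([30.0, 35.0, None, 40.0]))  # (3, 35.0, 5.0)
```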
In bivariate analyses, mean values of normally distributed scores were compared between the groups with t-tests for independent samples, both before the start of the pilot phase (T0) and 12 months later (T1). For scores showing statistically significant differences, we calculated Cohen’s d (|d|) as the effect size for the difference between the two group means. |d| < 0.2 was rated as negligible, 0.2 ≤ |d| < 0.5 as small, 0.5 ≤ |d| < 0.8 as medium and |d| ≥ 0.8 as large. For categorical variables, group differences were tested with the chi-square test of independence.
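The bivariate comparisons can be sketched with the Python standard library. Assumptions to note: Cohen’s d uses the pooled SD of both groups, the t statistic assumes equal variances (no p value is computed for it because the standard library has no t distribution), and the chi-square test is shown for a 2x2 table without continuity correction, with its p value derived from the complementary error function, which is exact for one degree of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Independent-samples t statistic (equal variances assumed) and its degrees of freedom."""
    na, nb = len(a), len(b)
    standard_error = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, na + nb - 2


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean difference divided by the pooled SD."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


def label_d(d: float) -> str:
    """Classify |d| with the thresholds 0.2, 0.5 and 0.8."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for a 2x2 table; p value for df = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With one degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


d = cohens_d([38.0, 40.0, 42.0], [36.0, 38.0, 40.0])
print(d, label_d(d))  # 1.0 large
```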
We performed multiple linear regression with the T1 scores of the outcome parameters as the dependent variables. The explanatory variables were the shift schedule (‘old schedule’ / ‘remodelled schedule’), the baseline (T0) score, gender (‘male’ / ‘female’), age group at the time of the second survey (< 35 years, 35-49 years, ≥ 50 years), parenthood (‘yes’ / ‘no’), burden due to care (‘yes’ / ‘no’) and type of duty (‘patrol’ / ‘office duty’). Burden due to care was a composite variable derived from the questions on single parenthood and on caring for dependents. We report regression coefficients with 95% confidence intervals. For binary outcomes, we performed logistic regression including the same explanatory variables as in the linear regression models, except for the baseline score.
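The baseline-adjusted linear model can be illustrated by building a dummy-coded design matrix and solving the normal equations. This is a minimal standard-library Python sketch, not the SPSS procedure used in the study: the reference categories (old schedule, male, age < 35, no children, no care burden, office duty), the age-group labels and the function names are hypothetical, and confidence intervals are omitted.

```python
from typing import List, Sequence


def design_row(remodelled: bool, baseline: float, female: bool, age_group: str,
               parent: bool, care_burden: bool, patrol: bool) -> List[float]:
    """One row of the design matrix with dummy coding.

    Hypothetical reference categories: old schedule, male, age < 35,
    no children, no care burden, office duty.
    """
    if age_group not in ("<35", "35-49", ">=50"):
        raise ValueError(f"unknown age group: {age_group}")
    return [
        1.0,                          # intercept
        float(remodelled),            # shift schedule
        baseline,                     # T0 score of the same outcome
        float(female),
        float(age_group == "35-49"),
        float(age_group == ">=50"),
        float(parent),
        float(care_burden),
        float(patrol),
    ]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least-squares coefficients via the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)
```

The coefficient in position 1 is the adjusted difference between the remodelled and the old schedule at T1, holding the baseline score and the other covariates constant.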
Data from the T2 survey were first analysed in bivariate analyses stratified by the length of time working with the remodelled shift schedule (up to 24 months, 25-48 months, ≥ 49 months). For comparisons across these “exposure” categories, we used Welch’s analysis of variance, which does not assume equal variances. Effect sizes were expressed as eta-squared (η²), with 0.01 considered a small, 0.06 a medium and 0.14 a large effect. We then performed multiple linear regression analyses with the T2 scores as dependent variables. The explanatory variables were the length of time working with the remodelled shift schedule in months, gender (male/female), age group at the time of the third survey (≤ 34 years, 35-49 years, ≥ 50 years), having children (‘yes’ / ‘no’), care burden (‘yes’ / ‘no’), police station (‘originally piloting’ / ‘non-piloting’) and type of duty (‘patrol’ / ‘office duty’). We determined the effect size f², where an f² of 0.02 corresponds to a weak, 0.15 to a medium and 0.35 to a strong effect. For binary outcomes, we performed logistic regression including the same explanatory variables as in the linear regression models.
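The T2 analyses combine an exposure stratification, Welch’s ANOVA, η² and Cohen’s f². The standard-library Python sketch below shows these calculations under stated assumptions: η² is computed from the conventional between- and total sums of squares, f² is derived from a model R² as R²/(1 - R²), and p values are omitted because the standard library has no F distribution. The category labels are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def exposure_category(months: int) -> str:
    """Exposure strata used for the T2 bivariate analyses."""
    if months <= 24:
        return "up to 24 months"
    if months <= 48:
        return "25-48 months"
    return ">=49 months"


def eta_squared(groups: Sequence[Sequence[float]]) -> float:
    """Between-group sum of squares divided by the total sum of squares."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.mean(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return ss_between / ss_total


def welch_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Welch's F statistic with numerator df and approximate denominator df."""
    k = len(groups)
    weights = [len(g) / statistics.variance(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    w_sum = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_sum) ** 2 / (len(g) - 1) for w, g in zip(weights, groups))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


def cohens_f2(r_squared: float) -> float:
    """Cohen's f² for a regression model: R² / (1 - R²)."""
    return r_squared / (1 - r_squared)
```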