Two Years of Ethics Reflection Groups About the Use of Coercion. Changes Over Time in Employees’ Normative Attitudes Regarding the Use of Coercion, User Involvement, Team Cooperation and the Handling of Disagreement

Background: Research on the impact of ethics reflection groups (ERG) or moral case deliberations (MCD) is complex and scarce. Within a larger study, ERG has been used as an intervention for stimulating critical ethical reflection and improved team cooperation while observing changes over time. Research question: Are there changes over time, during and after two years of ERGs, in employees’ normative attitudes regarding the use of coercion and in how employees perceive user involvement, team cooperation and the handling of disagreement in teams? Methods: Repeated cross-sectional survey among multidisciplinary employees at seven wards within three Norwegian mental health care institutions (T0-T1-T2). Changes in normative attitudes over time were estimated using linear mixed models. Results: In total, 817 surveys (from employees who did and did not participate in ERG) were included in the analyses. Of these, 7.6% (N=62) responded at all three points in time, 15.5% (N=127) at two points, and 76.8% (N=628) once. On average, over time, respondents who participated in ERG agreed less that coercion can be seen as a form of care or security. ERG participants more often reported that they involved users and that they handled disagreement within the team constructively. Furthermore, more frequent ERG participation was associated with a more critical attitude towards coercion and higher scores for user involvement, the coercion competence of the team and the constructive handling of disagreement within their teams. Conclusions: Structural ERGs or MCDs seem to contribute to employees reporting a more critical attitude towards coercion, more user involvement around coercion and a more constructive handling of disagreement. Differences were generally small in absolute terms, possibly due to the small amount of longitudinal data and the relatively low frequency of ERGs during the two years.
Studying changes over time in clinical practice and trying to find a relationship between CES interventions and CES outcomes is difficult yet important and needs to be further developed in future CES evaluation studies. This explorative quantitative study may be a first step from qualitative evidence towards more robust quantitative evidence of the contribution of CES to clinical practice and quality of care.


Introduction
In their continuous striving for quality of care, health care professionals inherently experience various kinds of moral challenges.
Health care professionals report that dealing with these moral challenges in a methodologically sound and constructive way is often difficult [1][2][3][4][5][6][7][8]. In order to support health care professionals in dealing more systematically with moral challenges, different types of clinical ethics support (CES), such as ethics consultants, clinical ethics committees and moral case deliberations (MCD) or ethics reflection groups (ERG) [1], have been developed [9][10][11][12][13]. Many papers on CES implicitly or explicitly state that CES not only supports professionals with respect to the handling of the specific case at hand but also contributes to the moral competency of professionals, the multidisciplinary team cooperation and, in the end, a better quality of care [14][15][16][17]. Although most participants in CES repeatedly report satisfaction with the ethics support, there is still little research on possible outcomes of CES, and little research focusing on the impact of CES on clinical practice and quality of care. In particular, there is a lack of research that measures changes over time in relevant outcomes related to the implementation of or participation in CES. Within this paper, we present the results of a study in which we aimed to report changes over time after implementing regular ERG sessions during 24 months at seven departments in three Norwegian institutions for mental health care. All sessions dealt with employees' moral challenges related to the use of coercion.
Evaluation of clinical ethics support
CES evaluation (review) studies are increasingly focusing on quite different things, such as the structure, process, content, outcomes and efficiency of CES [18][19][20][21][22][23][24][25][26][27][28][29] [2]. CES evaluation studies are of crucial importance since they may contribute to the further development of the relatively young professional domain of CES. Executing and reporting CES evaluation research can be seen as a way of exchanging lessons learned, offering input for (developing) trainings for CES staff, and evoking critical questions about justification, appropriateness, method, quality and impact of CES. When it comes to the impact of CES, both critics and advocates of CES state that evaluation research focusing on the impact of CES is needed for the transparency of the usefulness of CES (as for any intervention in health care) [30][31][32][33][34][35]. However, measuring the impact of CES is complex. CES, and ERG in particular, can easily be understood as a complex intervention of which the ingredients are often unclear or not made explicit [36][37][38][39]. There are many different ingredients of ERG and there is also a huge variety in the ingredients within ERG (depending on, among other things, the training of the ERG facilitators, the specific context in which the ERG is implemented, the conversation method used within ERG and the characteristics of the case at hand).
Indeed, high quality prospective CES evaluation studies which include baseline and follow-up measurements are rare [23,25,40-42]. A recent Cochrane review, studying the available evidence of controlled studies of the effectiveness of ethical case interventions for adult patients, included 6 articles from 4 randomised trials and concluded that it was not possible to determine the effectiveness of CES due to the low quality of the evidence presented in those studies [43]. The authors end with a plea for future research to identify and measure CES related outcomes, taking into account the different goals of different types of CES interventions. With respect to the goals, there are many different goals for ERG. Usually authors distinguish the following levels or domains of goals of ERG: a) case related goals (e.g. finding alternative actions); b) goals related to the empowering of professionals' moral competency; c) goals related to the improving of the multidisciplinary cooperation; and d) goals related to developing policy or organisational change [14,44,45]. Hence, CES evaluation studies focusing on outcomes should make explicit which goals for the specific kind of CES are at stake.
Which outcomes for Clinical Ethics Support?
Not all CES outcomes are equally important, feasible or even desirable. For example, measuring the ethicality of a specific outcome of CES according to predefined criteria of what is seen as 'ethical' is problematic since there is a wide variety of views, and even disagreement, about how to define what is assumed to be ethically right [34,46]. It is also problematic since one of the core tasks of CES is precisely to critically reflect upon how to define the ethicality of a specific outcome, which also includes questioning current understandings of 'ethicality' [35]. Another example of CES outcome research which becomes morally problematic is the focus on less medical consumption due to the use of CES. In a review of RCTs for CES evaluation, Chen and Chen [47] found three papers based on two RCT studies in the USA on the evaluation of CES outcomes. In those studies, Schneiderman et al [48][49] and Gilmer et al [50] found that "… patients who did not survive to hospital discharge, ethics consultations were significantly associated with shorter ICU stays, shorter hospital stays, less use of life-sustaining treatments and lower hospital costs" [47, p. 595]. Although these are interesting and somewhat plausible results (like the results of a more recent Asian study [51]), it could become morally problematic if these outcomes transform into one of the major aims of CES, i.e. reducing consumption of medical resources. Hence, when looking for CES outcome studies it is important to distinguish CES outcomes from CES goals and to focus on the right kind of CES outcomes, which fit the specific goals of the specific CES intervention in the specific context [35,43].
State of the art regarding Ethics Reflection Group outcome research
When it comes to studies describing or evaluating outcomes for ERG or MCD, some qualitative and self-reported evaluation studies indicate that MCD and ERG sessions can lead to improved team cooperation [10, 18-19, 21, 52-54]. A recent systematic review on the impact of MCD, covering 25 empirical evaluation papers, confirmed that MCD participants reported that MCD can bring about changes in practice, mostly for professionals in inter-professional interactions [23]. Furthermore, given the specific characteristics of MCD and ERG, i.e. learning from different viewpoints in constructive and respectful dialogues, qualitative evaluation studies reported that MCD and ERG may contribute to a more constructive handling of disagreement in teams [4,55]. Finally, based on the fact that within MCD and ERG participants are urged to think about values and norms of patients and family, and to see the moral challenges from their perspectives as well, it has been suggested that MCD and ERG contribute to a better understanding of the viewpoints of patients and next of kin [10,21,56-58].
Moral challenges related to use of coercion in mental health care
In mental health care the use of coercion is one of the most pressing ethical issues, and there are many qualitative studies reporting negative experiences of patients due to the use of coercion [59]. At the same time, quantitative research on the relationship between the use of coercive measures and patient outcomes is sparse [60]. Many express strong criticism of the use of coercion in mental health care, while others argue that a limited use of coercion is ethically acceptable when the benefits with regard to protection or treatment outweigh the negative effects on patients' autonomy, integrity and comfort [59,61,62]. Independent of how one morally thinks about the use of coercion, we think a critical reflection on the use of coercion (including the timing, alternatives, proportionality and the effectiveness of its use) is always needed since the use of coercion involves an infringement of patients' autonomy and integrity. Hence, coercion is, and should be, always an intervention with difficult value conflicts. Yet, these value conflicts are often implicit and not explicitly addressed and weighed.
There are indications that improved communication and decision-making processes in mental health care may reduce the use of coercion, for example through case discussions, clinical case review, and facilitated deliberation. For example, Donat et al [63] found that there was a reduction in the use of seclusion and restraint after the use of clinical case reviews in which critical cases were identified. Bak et al reported that systematic follow-up evaluation or review after all mechanical restraint episodes was significantly associated with low rates of mechanical restraint use [64]. Gaskin et al [65] reported a study by Mistral et al [66] which found a decrease in the use of coercion when meetings were conducted with an outside facilitator to analyse the root causes of the use of coercion.
Changing staff attitudes regarding the use of coercion and reducing a paternalistic department culture may be a key both to reducing the use of coercion and to making the coercion used more humane. Scanlan [67] writes that training to promote attitudinal change is essential, i.e. without substantial shifts in staff attitudes, efforts to reduce the use of seclusion and restraint are unlikely to be successful [68][69]. Changing the ward culture and staff attitudes is challenging. However, there are indications from various explorative research projects on the use of coercion that the use of ethics reflection (e.g. MCD and ERG sessions) can contribute to a more critical culture and attitude towards the use of coercion [55,70-72]. To our knowledge, quantitative research on how to change paternalistic ward cultures in mental health is scarce.
Our current study focuses on the correlation between structural participation in ERGs on the one hand and change in respondents' attitudes towards the use of coercion on the other. In addition, we also studied whether respondents report that they involve patients and family more around the use of coercion, that their team cooperation improved and that they handled disagreement in their teams more constructively.
Footnote: [1] MCD and ERG are synonyms for the same activity: a structured case discussion on a real case within a group, facilitated by a trained facilitator. From now on, for readability, we will use only the term ERG.

To conclude, based on what we just described about the state of the art regarding CES evaluation research, the specific ingredients, goals and outcomes of ERG and the research into changes of normative attitudes on coercion, we came to the following two main research questions for this specific study:
The way employees normatively think about the use of coercion;
The way employees report about the factual competence of the team regarding the handling of coercion;
The way employees report about the factual involvement of patients and families in situations in which the use of coercion is at stake;
The way employees think about the quality of their team cooperation;
The way employees perceive the handling of disagreement in their team?
2) In which way do general levels of the 7 outcome parameters, and changes in these outcomes, differ by department, profession, frequency of participation in ERG and frequency of presenting a case in ERG?
We did not specify or calculate specific hypotheses with respect to the above-mentioned research questions. Yet, based on the ERG evaluation literature and studies related to changing practices and attitudes regarding the use of coercion, our expectation of the implementation and evaluation of ERGs was that: ERG participants would develop a more critical view on the use of coercion; ERG participants would increase their attention for patient and family involvement in situations concerning the use of coercion; and ERG participants would report an improvement of team cooperation and the constructive handling of disagreement within the teams.

CONTEXT OF THE STUDY
The results presented in this paper are part of a larger study called "mental health care, ethics and coercion" (further referred to as "PET", based on the Norwegian abbreviation for the study: see Figure 1). This PET study, which took place from 2011 until 2016, included four sub-studies: a) a systematic literature review on evaluation of ethics support in mental health care [18], b) interviewing patients, their children and next of kin about coercion and involvement [57,73,74], c) the implementation and evaluation of ERG [4,18,55] including an enumeration of ethical challenges related to coercion [75][76], and d) a regional and a national survey among mental health care staff and patients on normative attitudes related to coercion [77][78] [1]. The results presented in this paper are from part c of the PET study.
Based on availability and motivation, seven departments from three different mental health care institutions from three different Norwegian counties joined the study. From these departments, 23 employees were trained, over a total of 5 training days, as ERG facilitators by ethicists from the Centre for Medical Ethics (CME) at the University of Oslo [2]. Usually, two newly trained facilitators facilitated one ERG session at their own department. Within each department, one person (usually also a newly trained ERG facilitator) functioned as coordinator and organized the ethics reflection groups. Each ERG took place once or twice a month.
Multidisciplinary health care professionals (i.e. nurses, socio-therapists, psychologists, psychiatrists, doctors, quality management staff, team leaders, managers) participated voluntarily in the groups. The ERG sessions lasted between 50 and 90 min; between 2 and 20 people participated in each group [74]. A stepwise ethics reflection model (the CME model) was utilised in the deliberations [6].
Various research methods were utilised in order to study the implementation and evaluation of ERGs, for example: informal notes from different meetings, three survey questionnaires (T0, T1 & T2), focus group interviews, and facilitator questionnaires (for the evaluation of each ERG session). The survey consisted of several thematic areas (e.g. staff's attitudes regarding coercion, patient and next of kin involvement when using coercion, team cooperation & dealing with disagreement, evaluation of ethics reflection groups, demographics of respondents). In this paper, we focus on the associations between ERG participation and the scores for the above-mentioned 7 parameters.

Method

Design
We performed a repeated cross-sectional survey at seven departments within three different institutions (T0-T1-T2). Some employees participated in the survey two or three times and provided longitudinal data.

Study sample
The study sample consisted of the employees from the seven participating wards. From hospital 1 we included a geriatric department, from hospital 2 an acute, community, youth and specialist department, and from hospital 3 an acute and rehabilitation department.
During this study, all of these departments had regular ERG sessions for two years. The survey used in this paper was distributed at baseline (T0), before the departments started with ERG sessions, and at 12 months (T1) and 24 months (T2) after the start of the ERG sessions. The ERGs dealt with ethical challenges experienced by the health care staff and emerging from concrete situations in which the use of coercion (and related moral issues) was at stake. The employees consisted of various health care professionals (such as nurses, auxiliary nurses, psychiatrists (including psychiatrists in training), and psychologists), team leaders and management. Employees were invited by the local study coordinator (i.e. an employee at that department) and/or the management to fill in the written questionnaire either during team or department meetings or individually by email. Temporarily employed staff and supporting staff did not participate in the study.

Research instruments
The survey consisted of the following dependent variables, independent variables and co-variates. An earlier version of the survey was piloted for clarity by various health care professionals and commented on by members of the PET Sounding Board (i.e. expert researchers in the field of coercion).

Dependent variables[1]
Staff's normative attitudes regarding the use of coercion
Each item was scored on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). For each subscale we calculated the mean of the items and used these as dependent variables in separate models. Mean scores on 'Offending' and 'Care & security' were calculated only if respondents had valid answers on at least 4 of the 6 items; the mean score on 'Treatment' was calculated only when all three items were answered validly.
Textbox 1: The 15 normative statements of the SACS [79]

Coercion competence of the team
We developed 6 'factual' statements [2] in order to find out how the respondents thought about the competence of the team in dealing with coercion. The statements were tested for clarity in a pilot study (see Textbox 2). Each item was scored on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree), and a mean score was calculated.
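The scoring rule for the subscales can be illustrated with a minimal sketch. The function name `subscale_mean`, the use of `None` for a missing answer and the example responses are our own assumptions for illustration; they are not taken from the study's analysis syntax.

```python
from typing import Optional, Sequence


def subscale_mean(items: Sequence[Optional[float]], min_valid: int) -> Optional[float]:
    """Return the mean of the valid (non-missing) items, or None if too few are valid."""
    valid = [x for x in items if x is not None]
    if len(valid) < min_valid:
        return None  # too many missing answers: no subscale score is computed
    return sum(valid) / len(valid)


# Hypothetical responses on a 1-5 Likert scale; None marks a missing answer.
offending = [4, 3, None, 5, None, 4]  # 4 of 6 items valid, so a mean is computed
treatment = [2, None, 3]              # only 2 of 3 items valid, so no score

offending_score = subscale_mean(offending, min_valid=4)  # 'Offending', 'Care & security'
treatment_score = subscale_mean(treatment, min_valid=3)  # 'Treatment' needs all 3 items
```

In this sketch a respondent with too few valid answers simply receives no score on that subscale and drops out of the corresponding model.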
Textbox 2: The 6 statements about the competence of the team regarding use of coercion

Involvement of patients and family in situations of coercion
We developed 11 factual statements in order to find out to which degree respondents thought they involve patients and family before, during and after situations of coercion (see Textbox 3). Each item was scored on a Likert scale ranging from 1 (never) through 3 (once in a while) to 5 (almost always), and a mean score was calculated.

Team cooperation
We made use of 13 factual statements from two validated questionnaires in order to ask respondents how they thought about the cooperation within their team: 10 items from the Team Reflexivity Scale [80] and 3 items from the Tolerance and Openness Scale [81] (see Textbox 4). Each item was scored on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree), and a mean score was calculated.

Constructive disagreement
We used 8 statements from the validated Constructive Confrontation Norms questionnaire [82] (see Textbox 5). Each item was scored on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree), and a mean score was calculated.

At T1 and T2, respondents were asked whether they had participated in ERGs in the last 12 months (yes/no) and, if yes, how often during the last 12 months (0 times, 1-5 times, 6-12 times, 13 or more times). Due to small numbers of respondents participating in ERGs very often, we merged the latter two categories into one: 6 or more times.

Presentation of a case in the Ethics Reflection Groups
At T1 and T2, respondents were asked whether they had presented a case in ERGs in the last 12 months (yes/no) and, if yes, how often during the last 12 months (0 times, 1 time, 2 to 4 times, more than 4 times). Due to small numbers of respondents presenting a case very often, we merged the latter two categories into one: 2 times or more.
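The merging of sparse answer categories described for ERG participation and case presentation can be sketched as a simple lookup. The dictionary and function names below are hypothetical, and the category labels are copied from the text rather than from the actual questionnaire coding.

```python
# Original answer categories mapped to the merged categories used in the analyses.
PARTICIPATION_MERGE = {
    "0 times": "0 times",
    "1-5 times": "1-5 times",
    "6-12 times": "6 or more times",
    "13 or more times": "6 or more times",
}

CASE_PRESENTATION_MERGE = {
    "0 times": "0 times",
    "1 time": "1 time",
    "2 to 4 times": "2 times or more",
    "more than 4 times": "2 times or more",
}


def merge_category(answer: str, mapping: dict) -> str:
    """Look up the merged category; reject answers outside the known categories."""
    try:
        return mapping[answer]
    except KeyError:
        raise ValueError(f"Unknown answer category: {answer!r}") from None


merged_participation = merge_category("13 or more times", PARTICIPATION_MERGE)
merged_presentation = merge_category("2 to 4 times", CASE_PRESENTATION_MERGE)
```

After merging, each predictor has three levels, with "0 times" as the natural comparison group.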

Department
The seven departments as described previously were added as dummy variables to the models. The Acute Care department of hospital 2 was used as the reference group in all analyses.

Type of profession
We categorized the respondents' professions into 5 categories: 1) 'psychologists', 2) 'psychiatrists and related medical professions' (e.g. psychiatrist in training, physician, chief-physician), 3) 'nurses & related professions' (e.g. auxiliary nurses, milieu therapist, helping assistant), 4) 'management' (unit team leader, department manager, director), and 5) 'other professions' (e.g. physiotherapist or occupational therapist). Temporary staff and supporting staff did not participate in the study. For employees who participated more than once, only their baseline profession was included. 'Psychiatrists and related medical professions' was used as the reference group in all analyses.

Demographics
Age was categorised into younger than 29, 30-49 years, and 50 years or over. Gender was coded as 1 (female) and 0 (male). Age at the first participation was used in the analyses.

Analytic strategy
First, we provide descriptive statistics of all variables for each time point separately.
Subsequently, we used linear mixed models in SPSS v22 to analyse the pooled data from the three time points [83]. This method enabled us to incorporate all available observations, including those from respondents with repeated measures, for whom dependency of these repeated measures was taken into account. Furthermore, we estimated changes in outcomes over time by adding the effect of time (T0-T1-T2) to the models and by testing interaction effects between time and other independent variables. We used the default Restricted Maximum Likelihood (REML) method to estimate the regression coefficients.
In the first set of mixed models we estimated unadjusted and adjusted models without time*predictor interaction effects. The unadjusted models included a single independent variable without any covariates. The adjusted models included either time, ERG Participation or Case Presenting, adjusted for department and profession. In these models, the effect of time shows the average change in the outcomes per year, whereas the effects of ERG Participation and Case Presenting show their association with the general level of the outcomes, regardless of time. We decided not to include age and sex as covariates in the mixed models, as we found they were not associated with ERG Participation, Case Presenting, or the outcome variables.
In the second set of mixed models we examined whether changes in outcomes over time were predicted by ERG Participation, Case Presenting, department and profession, by adding an interaction effect of time*predictor to the models. If the interaction effect is statistically significant, this indicates that the change in an outcome differs between levels of the predictor. The effects of ERG Participation and Case Presenting were adjusted for department and profession, as we found that the latter were associated with both predictors and outcomes, indicating possible confounding.
We generally used p<.05 as the cut-off point for statistical significance. However, given the lower statistical power of interaction effects, we considered these statistically significant at p<.10.
Footnote: [1] The reliability of all seven scales was satisfactory within this study, varying from a Cronbach's α of 0.62 for Constructive Disagreement to a Cronbach's α of 0.83 for Team Cooperation.
[2] 'Factual' statements are statements about how respondents perceive the facts regarding a specific phenomenon (e.g. the way employees involve patients during the use of coercion). 'Normative' statements are statements about how respondents think or judge about a topic (e.g. 'use of coercion is wrong').
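For readers less familiar with the reliability coefficient reported in footnote [1], the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k-1) × (1 - Σ item variances / variance of the sum score). The function name and the tiny data set are hypothetical; the study's own reliability values were presumably computed in a statistical package and are not reproduced here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][r] is the score of respondent r on item i.

    Assumes complete cases only (every respondent answered every item).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # sum score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: two items answered identically by four respondents,
# i.e. perfectly consistent items, for which alpha is 1 up to rounding.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```

Values closer to 1 indicate that the items of a scale vary together; uncorrelated items pull the coefficient towards 0.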

Results
Within this Results section we will first describe the results of the descriptive analyses in section I in order to give an overall impression of how the various respondents scored for the 7 outcome parameters. Then, in section II, we will present the overall results of the multivariate analyses in order to see whether there are changes over time after two years of ethics reflection groups (ERG) for the 7 outcome parameters (i.e. research question 1). Subsequently, in section III, we will present results of the bivariate analyses per specific predictor in order to see whether scores for the 7 outcome parameters and changes in these outcomes differed among professions, departments, the amount of participation in ERG and the amount of presenting a case in ERG (i.e. research question 2).

I. Descriptive analyses
An overview of descriptive statistics of the sample is provided in Table 1.
We observed some differences between T0, T1 and T2 with respect to the average scores for all seven parameters of all respondents together (i.e. those who participated in ERG and those who did not participate in ERG). Regarding respondents' attitude about coercion, respondents scored just over 3.0 on the Offending scale, with very similar scores across the three time points. This means that on average the respondents neither agreed nor disagreed with the viewpoint that coercion can be offensive. Care and Security scores varied between 4.11 at T0 and 3.99 at T2, indicating a clear agreement with justifying the use of coercion for reasons of care and security. Scores on the Treatment scale varied between 2.58 at T0 and 2.50 at T2, indicating a modest disagreement with the idea that coercion can be seen as a form of treatment. Respondents were slightly positive about the current team competence for using or preventing coercion (average scores varied between 3.65 at T0 and 3.66 at T2). Regarding user involvement in the prevention, execution and evaluation of coercion, with scores from 2.82 at T0 to 3.03 at T2, respondents reported that they once in a while involved patients or family (i.e. not 'often' and not 'almost always'). Respondents slightly, yet not fully, agreed when asked whether they had a good team cooperation within their team (scores ranging from 3.70 at T0 to 3.72 at T2). Finally, respondents slightly, yet again not fully, agreed that they handled disagreement constructively (scores between 3.57 at T0 and 3.61 at T1).

ERG Participation and outcomes on the 7 parameters (regardless of time)
A general pattern was that more frequent ERG participation was associated with lower scores for the SACS scales, and higher scores for the other outcomes. However, this association seemed not very strong. We highlight a few findings.
Unadjusted models showed that compared to those who did not participate in ERG, those who participated 1-5 times agreed less that coercion is a form of Care or Security (b=-0.08, p < .05). However, this difference was no longer statistically significant after adjustment for departments and professions. This indicates that the observed association between ERG frequency and Care/Security was, at least partly, due to differences in the composition of the ERG frequency groups in terms of professions and wards rather than to the effect of ERG frequency itself.
Furthermore, more frequent ERG participation was associated with higher perceived Coercion Competence within the team (1-5 times: b = 0.09, p < .05) and User involvement (1-5 times: b = 0.21, p < .001), although the latter also appeared to be partly confounded by department and/or profession. The differences for the group who participated in ERG six or more times pointed in the same direction as those for the 1-5 times group, but these differences were not statistically significant, possibly due to the smaller size of the group with the highest frequency of ERG participation.

Presenting a case within ERG sessions and outcomes on the 7 parameters
Compared to those who did not present a case, those who presented a case once in a year perceived coercion more as Care & Security (adjusted b = 0.20, p < .01) and reported higher Coercion Competence within the team (adjusted b = 0.21, p < .05). This was similar for the group who presented a case twice or more.

Differences and similarities in outcomes between Departments and Professions
We observed quite substantial differences in the outcomes for the 7 parameters between Departments and Professions. For example, adjusted for profession and compared to the Hospital 2 Acute Department (reference), the Hospital 3 Rehabilitation Department scored on average 0.32 lower on Offending, the Hospital 2 Specialist Department and Hospital 3 Acute Department scored 0.34 higher on User Involvement, and the Hospital 1 Geriatric Department scored 0.28 lower on Constructive Disagreement.
Furthermore, adjusted for department and compared to 'psychiatrists and related medical professions', psychologists experienced coercion more strongly as Offending (b = 0.45, p < .001), less as a form of Care/Security (b=-0.20, p < .05), and experienced less Cooperation (b=-0.24, p < .05). Managers experienced coercion less strongly as Offending (b=-0.36, p < .05) than 'psychiatrists and related medical professions'. Finally, nurses experienced substantially less User Involvement than 'psychiatrists and related medical professions' (b=-0.32, p < .001).

We will now present associations between ERG participation, ERG case presentation, Departments, and Professions, and changes in outcomes over time (see Table 3). We included one predictor at a time and held the other three predictors constant in order to see what the impact of the specific predictor was.

Note on the interpretation of interaction effects
For establishing differences in the change in outcomes over time, we needed to interpret two coefficients estimated in the mixed models. First, the main effect of time. In models including an interaction effect with time, the main effect of time expresses the yearly change in the reference category of the predictor entered in the interaction effect (e.g., no ERG participation). Second, the interaction effect, indicated as Time*[predictor]. This coefficient expresses how much stronger or weaker the yearly change in the group of interest (e.g., 1-5 times ERG participation) is, compared to the yearly change in the reference group. By adding the coefficient of the interaction effect to the main effect of time, the yearly change in the group of interest can be calculated. The p-value of the interaction effect indicates the statistical significance of the difference in yearly change between the group of interest and the reference group.
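As a reading aid, the interpretation above can be written out for a simplified model with one two-level predictor. The notation is our own, and the random intercept u_i per respondent is an assumption: the text states only that the dependency between repeated measures was taken into account, not the exact random-effects structure. The department and profession covariates are omitted for brevity.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Time}_{ij} + \beta_2\,G_{ij}
       + \beta_3\,(\mathrm{Time}_{ij} \times G_{ij}) + u_i + \varepsilon_{ij}

% Yearly change in the reference group (G = 0):   \beta_1
% Yearly change in the group of interest (G = 1): \beta_1 + \beta_3
```

Here y_{ij} is the outcome of respondent i at measurement j, G_{ij} equals 1 for the group of interest and 0 for the reference group, β1 is the main effect of time and β3 is the Time*[predictor] interaction effect.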

ERG participation and changes over time
For ERG participation, we found one statistically significant interaction effect, for Offending. The main effect in this model indicated that for those not participating in ERG, there was virtually no change in their view of coercion as Offending over time (b = 0.01). However, the interaction effect of b = 0.24 showed that compared to those who did not participate, those who participated six or more times each year increasingly felt that coercion is Offending (yearly change: b = 0.25, p < .05; see Fig. 2 for a graphical depiction of this interaction effect).

ERG case presentation and changes over time
For case presentation, we found two significant interaction effects. The first indicated that while those who did not present a case had almost no change in the extent to which they felt coercion is Offending (b = -0.03), in those who presented a case twice or more, their view of coercion as Offending tended to increase (note: from an initially lower score) (b interaction = 0.23, p < .10; yearly change in this group: b = 0.20; see Fig. 3).
Second, those who presented a case twice or more reported decreasing perceived User Involvement, whereas those who did not present a case slightly increased in User Involvement over time.
Footnote: [1] Bear in mind that these are repeated cross-sectional measurements in which only some ERG participants were included multiple times. Therefore, these averages do not directly demonstrate within-person changes in attitudes during the study period.

Discussion
This paper presents the results of a unique clinical ethics support evaluation study. By implementing structural Ethics Reflection Group (ERG) sessions about the use of coercion at seven Norwegian departments within three different mental health care institutions, we tried to study changes over time. This study was designed in response to recent insights from coercion reduction studies. These studies demonstrated that changing staff's normative attitudes regarding the use of coercion and reducing a paternalistic department culture are key both to reducing the use of coercion and to executing coercive measures in a better (…) way. In order to study changes over time, we asked employees at three successive points in time what they thought about: the use of coercion, the competence of the team regarding the handling of coercion, the way they involved patients and families in situations of coercion, the team cooperation, and the way they handled disagreement in their team. To do so, we performed a repeated cross-sectional survey at baseline and after 12 and 24 months of implementing ERG sessions about moral challenges related to coercion (T0-T1-T2). Of the 817 respondents in total, only 8% (N = 62) responded at all three points in time. In order to use all observations and to estimate the missing values, we used a mixed model analysis, which is a well-known statistical procedure [83]. This method enabled us to incorporate all available observations, including those from participants with repeated measures, for whom the dependency of these repeated measures is taken into account. However, the results of this study should be interpreted with caution.
After presenting our main results, we will give our interpretation of the results. Furthermore, we will briefly summarize and interpret the specific correlation between participating in the ERG sessions and presenting a case on the one hand and the seven parameters on the other hand, including the role of departments and professions therein. Subsequently, some methodological and normative reflections on the results will be briefly discussed. Finally, we will reflect upon the lessons learned from this research design and the methodology used, ending with some recommendations for future evaluation research regarding outcomes and changes over time due to clinical ethics support.

Main results regarding change over time
Despite the fact that ERG is a complex intervention, that its evaluation is even more complex, and that we had only limited longitudinal data, we were able to find several significant changes over time during the implementation of ERG sessions. In the multivariate analyses, taking all predictors into account, we found that the extent to which all respondents agreed that coercion can be seen as Care and Security decreased over time. This concerns a relatively small decrease in the extent to which respondents agreed that coercion could be seen as protection in dangerous situations (see Textbox 1). We also found that the extent to which respondents reported that they involved patients and families increased over time, for example by discussing with the patient if and how they experienced coercion before and now, or by asking patients what kind of alternatives to coercion employees should try (first) (see Textbox 3). Finally, respondents agreed more strongly with items resembling the constructive handling of disagreements in the team (see Textbox 5). For example, they reported that they focused more on different opinions or views on treatment issues rather than on disagreement between persons. They also reported more respect for each other's viewpoint, even when they disagreed.

Interpretation of main results regarding change over time
These three significant changes during the implementation of ERGs could have been caused by various factors of which we are not aware. For example, teams and team leaders could have changed, and/or severe coercion or team incidents might have had an impact on how respondents think about coercion, user involvement and the constructive handling of disagreement. Furthermore, the more general attention for coercion during the implementation of the ERG sessions (e.g. via other projects and educational programs) could have influenced changes in culture and attitudes. Yet, when looking at the specific intervention of implementing ERG sessions on coercion for two years, changes could also very well be caused by the ERG sessions on moral challenges related to coercion. For example, the slight decrease for Care and Security could have been caused by a more nuanced deliberation on the use of coercion within the ERG sessions. Perhaps, due to the ERG facilitator and the specific focus of the conversation method within the ERG sessions, ERG participants wondered more whether the use of coercive measures actually prevents, or in some cases could even contribute to, a dangerous situation. Or, because of the ERG sessions, perhaps respondents learned more about alternative actions for protecting care and security. The increase in User Involvement could have been caused by the fact that, due to the ERG sessions, respondents were urged to think more actively about a) the values and norms of patients and families in situations where coercive measures were used, and b) how to take their perspectives into account. Finally, the increase in Constructive Disagreement within the teams can be related to the fact that within ERG sessions both a respectful dialogue and learning from different and opposing viewpoints are at the core of the ethics reflection.

Results and interpretation of the results specifically related to Participation in ERG
For participation in ERG we found one significant change over time within the seven outcome parameters: those who participated in ERG six or more times each year perceived coercion clearly more as Offending. Repeated ethics reflection groups about the use of coercive measures may have made these respondents extra aware of the potentially offending character of coercion and of possible alternatives to the use of coercion.

Results and interpretation of the results specifically related to presenting a case in ERG
Those who presented their case in ERG more than 2 times a year also ended up seeing coercion as significantly more Offending. Perhaps those who presented their moral challenges regarding the use of coercion within the ERG session already experienced some initial moral concerns, which made them present a case within the ERG sessions. And perhaps, during the participation in ERG, they became extra aware of the fact that, and in which way, the use of coercion can be seen as offending. Finally, those who presented their case in ERG more than 2 times a year ended up with lower scores for User Involvement. Perhaps, due to the ERG sessions, they started to realize that they knew relatively little about how to involve patients and families and what their specific values, norms and perspectives are with respect to the use or the prevention of coercion.
We found more significant changes over time for the other parameters (due to Participation in ERG and Case presentation in ERG), yet they did not remain statistically significant after adjustment for departments and professions within the statistical analyses. This could indicate that the initial significant changes over time can be better explained by differences between departments and professions than by differences in ERG participation or Case presentation itself. Therefore, we will now specifically summarize and interpret the results for the various Departments and Professions.

Results and interpretation of the results for Departments
We found several significant differences among departments when looking at changes over time. We observed both a decrease and an increase in seeing coercion as Offending (for Community Care and Rehabilitation on the one hand and Acute Care of hospital 2 on the other hand, respectively). A possible explanation for the decrease is that some health care professionals in community and rehabilitation care felt they waited too long before using coercion (e.g. due to the fact that health care law does not allow the use of coercion outside the hospital). Using coercion too late might even cause harm. In contrast, health care professionals working at Acute Care departments might use coercion more often, sometimes even as a routine. Participation in ERG sessions could therefore make them more aware of the offending nature of coercion. Furthermore, respondents from Rehabilitation perceived an increase in Team Coercion Competence regarding the handling of coercion. One explanation could be that, since coercion is often not used that much at the Rehabilitation department, the deliberation about specific coercion cases in ERG sessions made them think they became more competent in handling coercion. With respect to User Involvement, Acute Care of hospital 3 demonstrated a decrease while Acute Care of hospital 2 and Specialist Care had an increase. Respondents from Youth Care and Rehabilitation thought their Team Cooperation improved. Finally, Youth Care thought their Constructive Disagreement improved.
Most of the hypotheses mentioned in the Introduction were actually realized for at least some of the seven involved departments. These significant changes and the differences among the various departments can be explained in many ways, yet it is difficult to know how plausible the explanations are. Perhaps the departments already had very different points of departure concerning their normative attitudes regarding coercion and user involvement, including different cultures for team cooperation and the handling of disagreement, when entering the study. Furthermore, the various patient categories at each department can explain differences in perception and evaluation of coercion. Also, the amount of training and courses related to the use of coercion might vary among departments.

Results and interpretation of the results for Professions
We found several significant differences among professions when looking at changes over time. When compared with the group of 'psychiatrists and related medical professions', psychologists perceived coercion significantly more as Offending and managers significantly less. Overall, when compared with 'psychiatrists and related medical professions', it seems that psychologists are more critical about the use of coercion, and managers less critical [1]. This could be explained by the fact that 'psychiatrists and related medical professions' usually have the final responsibility for, and should decide about, the use of coercion; psychologists usually do not.
Furthermore, psychologists are perhaps trained differently in ways to care for patients and to manage conflicts or possibly dangerous situations (i.e. more relational, less focused on interventions and use of medicine). Managers are perhaps more distanced from the actual context in which coercion is used.
Psychologists and 'psychiatrists and related medical professions' reported they involved patients and families more (often) when compared with what nurses and 'other professions' reported. A possible explanation could be that nurses and 'other professions' already have more frequent and intensive contact with patients and next of kin than 'psychiatrists and related medical professions', hence they did not experience that they involved them more than before. Besides response shift explanations, it could also be that specific routines or procedures for how to use coercive measures at the various departments were changed during the study. This would imply that when asking about coercion two years later, we in fact evaluated a different clinical practice of coercion than at the beginning of the study.
Another precaution concerns the way in which one normatively interprets changes over time. This of course applies to drawing normative conclusions based on empirical results in general [86], yet it certainly applies to research in which researchers aim to study both changes in normative attitudes and the normative value of outcomes or impact of ethics support services (e.g. ERG or MCD sessions). Drawing normative conclusions, e.g. whether a specific result or outcome of this study can be interpreted as morally better or as a moral improvement, is a complex matter [35]. For example, given the initial hypotheses of this study, it perhaps sounds plausible that seeing coercion as more offending, after two years of critical reflection on moral challenges regarding coercion, could be seen as a desirable and hence morally good result. Yet, after deliberation in ERG, and finding good ways of performing coercion in a more transparent and respectful way, respondents perhaps also realized that coercion can be performed in a less offending way. In order to draw normative conclusions when interpreting the results of this study, one needs complementary qualitative data (e.g. thick descriptions of specific situations in which employees use coercion, and carefully studying together with respondents how to interpret and judge the specific situation). Finally, as mentioned in the Introduction section, one should not automatically conclude that eventual positive outcomes of CES also become the primary goal of or justification for CES. Stimulating ethics reflection by means of implementing ERGs or MCDs has a value in itself. Despite the value and importance of CES evaluation studies in general, participating in ERGs and MCDs should not be instrumentalized as an intervention in which the only aim is to reach specific outcomes, because this would threaten the inherent intellectual and normative freedom of ethics reflection.

Relationship with other ERG or MCD impact evaluation studies
As mentioned earlier, this study took place within a much larger study in which qualitative analyses of transcribed focus groups were also organized with some of the respondents at every department, at both T0 and T2 [55]. In line with some of the significant changes or trends in the findings of this paper, respondents reported at T2 that ERGs increased their awareness of various examples of formal and informal coercion and that they learned to challenge 'problematic' concepts, attitudes and practices regarding coercion. This is in line with the significant changes in respondents' normative attitudes towards coercion which are described in this paper. Furthermore, Hem and colleagues mentioned that respondents reported improved professional competence and confidence, a greater trust within the team, and more constructive disagreement and room for internal critique (i.e. less judgmental reactions and more reasoned approaches) [55]. This resembles the significant changes in the Constructive Disagreement scale within this paper. Yet, this is not confirmed by changes in the Team Cooperation scale in this paper.
Furthermore, in a recent systematic literature review in which 25 empirical papers on the evaluation of ERG or MCD were analysed in order to identify various impacts of ERG, Haan and colleagues found a change in one's professional opinion or attitude and a more critical attitude towards professionals' practice. Again, this resembles our findings in which respondents became more aware of and more critical towards the use of coercion. According to Haan et al, most reported changes took place on a personal and interprofessional level [23]. For example, health care professionals experienced more feelings of relief, relatedness and confidence. They also mentioned that their understanding of the perspectives of colleagues, their own perspective and the moral issue at stake increased. In particular, Haan and colleagues mentioned that several studies found that ERG or MCD reduced conflicts and led to more solidarity, respect, tolerance, collegial support and cooperation. Again, these findings resemble the changes in Constructive Disagreement which we found in our study. Finally, Haan et al reported that MCD participants were more aware of patients' and families' rights in decision-making processes and thought more about patients' and families' perspectives, wishes, and needs. This is in line with the significant increases for User Involvement in this study. Yet, at the same time, Haan et al concluded that empirical evidence of ERG's or MCD's concrete impact on the (improvement of the) quality of patient care is limited and is mostly based on self-reports [23]. This clearly sets the agenda for future CES evaluation studies.

Strengths and limitations of the study
This study has several strengths and certainly also limitations. A unique strength is the fact that this study focuses on changes over time after two years of ERG or MCD at seven different departments within three different hospitals by means of indirect observations of reported answers (i.e. not by asking respondents how they perceived changes over time themselves). Another strength is that this study combines a specific clinically relevant topic (i.e. the use of coercion in mental health care) with more general evaluative measures of CES. The latter are potentially interesting for other CES evaluation studies. Also, the fact that we were able to compare the findings of this cross-sectional survey study with the focus group interview study and the experiences we had when working closely with the hospitals contributed to a better understanding of the quantitative data. The choice of measuring changes over time with respect to normative attitudes, team cooperation and constructive disagreement fits well with the theoretical understanding of ethics support, in particular ERG and MCD. Finally, we learned about how to develop and execute a specific research design and methodology. The latter also applies to the limitations of this study.
The small amount of actual longitudinal data underlines the importance of guiding and monitoring the response rate more intensively in future evaluation studies. The mixed model analyses helped us in this respect, yet more longitudinal data would be preferable. Furthermore, studying change over time within seven different departments in three different hospitals made it difficult to relate the changes over time to the ERG or MCD sessions themselves. Finally, despite the significant changes over time, the differences between no, little or much ERG participation were generally small in absolute terms. The fact that most respondents only participated in a few ERGs over two years could also play a role here: perhaps more participation in more frequent ERGs will contribute to stronger changes over time. A small consolation is perhaps that when measuring changes over time after the implementation of a complex intervention, one is always confronted with serious methodological challenges [43]. According to Craig et al [87], a lack of effects of a complex intervention may reflect implementation and methodological challenges rather than genuine ineffectiveness of the intervention. Yet, this is still something we need to explore. Experiences with and results of these kinds of explorative studies on the impact of CES might pave the way to new study designs with control groups and some sort of randomization in combination with qualitative research methods. Learning about the actual contribution of clinical ethics support to better health care is important for researchers, for health care professionals, for ethics support staff, and for patients. For, despite the intrinsic value of participating in ethics support activities, clinical ethics support inherently aims, and should aim, at actually improving clinical practices.
Footnote: [1] This finding is also reported in another paper from the overall PET study. In a national Norwegian survey among various mental health care professions, psychologists were more critical towards the use of coercion than psychiatrists [77].

Conclusions
Structural ERGs and MCDs at the ward seem to contribute to employees reporting a more critical attitude towards coercion, more user involvement around coercion and a more constructive handling of disagreement within their teams. However, differences were generally small in absolute terms, possibly due to the small amount of longitudinal data. Studying changes over time in clinical practice and trying to find a relationship between CES interventions and CES outcomes is difficult yet important; this needs to be further developed in future CES evaluation studies. Explorative quantitative evaluation studies, like this repeated cross-sectional survey, can be a first step from qualitative evidence towards more robust quantitative evidence with the use of control groups. Studying changes over time due to the use of ethics support contributes to the further professionalization and critical appraisal of the quality and usefulness of ethics support. It also helps to better appreciate the value of ethics support as such and the results it can bring about. After all, ethics support inherently aims at improving practices.

Declarations
Ethics approval and consent to participate
The protocol for the research project has been approved by the Norwegian Social Science Data Services where aspects of privacy protection were assessed (approval September 17, 2012, project number 31360). Informed consent was obtained from all respondents for participation and publication. Since the study does not include patients as respondents, we were not, according to Norwegian regulations, obliged to seek approval from the Regional Committee for Medical and Health Research Ethics (ACT 2008-06-20 no. 44: Act on medical and health research, § 4). All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication
All respondents in this study gave their informed consent for participation in the survey study and publication. A draft of this manuscript has been sent to the seven departments for member check.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.