Measuring the motivation of health workers: a reflection on the methodological issues and lessons learnt

Objective: In this article, we reflect on our experiences in developing and validating an instrument to measure health worker motivation in the Democratic Republic of Congo. Specifically, we recount what actually happened and identify what could have been improved at the design, sampling, data collection and analysis stages. Results: Key issues that manifested at the analytical and interpretative stage related largely to shortcomings in the preparatory phase, owing to a lack of time and resources as well as a lack of psychometric and motivation-related expertise at the outset. The main lesson learnt was that the design stage is critical, as it lays the foundations for meaningful data collection and analysis. Any shortcomings at this stage inevitably reveal themselves at the analysis stage, both in technical terms and with regard to the interpretation of results. It is therefore worth investing time and resources when developing a motivation instrument, particularly in a sound and culturally appropriate conceptualisation of motivation. By openly sharing our experience, we hope to make other public health researchers working in this area more aware of common pitfalls and thereby help them improve future practice.


Introduction
Low- and middle-income countries (LMICs) face significant challenges with respect to human resources for health, including general and regional shortages of health workers, deficiencies in skill mix and inadequate training (1). According to the World Health Organization, "developing capable, motivated and supported health workers is essential for overcoming bottlenecks to achieve national and global health goals" (2). Motivation is a key determinant of health workers' practice and behaviour (3,4). Understanding the drivers of motivation is therefore important for improving health workers' performance.
A recent paper outlines the steps required to develop a high-quality motivation scale, that is, a set of statements or questions in a survey intended to measure motivation (13). Nonetheless, few studies reflect on this methodology and on the challenges of applying it in practice (14).
This article shares lessons from developing and validating a psychometric scale to quantify health worker motivation in the Democratic Republic of Congo (DRC). Our reflections also apply to the measurement of related psychological constructs, such as satisfaction, perceptions or attitudes, among health workers and other populations.

Main Text
The motivation scale was nested in a larger health worker survey conducted as part of the baseline evaluation of a new health systems strengthening programme, Accès Aux Soins de Santé (ASSP). ASSP covered 56 health zones in the DRC and aimed to support health facilities to deliver basic primary health services. It included interventions to improve health worker motivation but also phased out performance-based financing (PBF) in 20 health zones where it had been provided by a predecessor programme.
In addition to informing the ASSP evaluation, we aimed to compare the motivation of health workers who had previously received and then experienced the withdrawal of PBF with that of workers who had never received PBF, hypothesising that motivation levels would be lower in the former group. The results of this analysis have been published elsewhere (15).

Design stage
Initially, the principal investigators selected 47 Likert-type items for the motivation scale, based on previous surveys they had used in other countries and on a literature review (16-18). These items were grounded in the Franco conceptual framework of motivation (19) (Figure 1), given its extensive use in LMICs. In this framework, motivation is determined by the congruence of worker and organisational goals ("will do" motivation) and by factors that focus on the individual's ability to execute a task ("can do" motivation).
A month before the questionnaire was finalised, the principal author joined the research team as a PhD student and had the opportunity to review, modify and add questions. Given her particular interest in measuring motivation, a further rapid literature review of motivation surveys was undertaken (10,11,14,20-24), resulting in 11 additional questions.
The tool was translated into French and pre-tested with six health workers in two non-study facilities. Additional file 1 summarises the final survey items for each dimension.

Reflections on design
Motivation is often viewed as a complex, multidimensional construct, and numerous theories and taxonomies of work motivation exist (19,25-30). A clear conceptualisation of motivation is therefore fundamental to developing a good measurement instrument (13). In retrospect, we did not give this conceptual phase enough room in the design of the study and questionnaire, because motivation had not initially been a key focus and there was limited time to finalise the survey.
The lack of time devoted to conceptual thinking at this stage had two consequences. First, it became apparent later on that the Franco framework (and the tools developed to measure its dimensions) was not ideally suited to answering key questions about health worker motivation and PBF; for example, Self-Determination Theory may have been a better framework for examining the potential crowding out of intrinsic motivation by financial incentives (6).
Second, rather than using qualitative research to inform item development customised to the framework and context, we worked with questions that had been used and validated in other contexts and then mapped them back to the Franco framework. This approach was efficient and drew on questions that had been pre-tested and used in other settings, but the questions were not explicitly adapted to the local context. Furthermore, although we could easily assign all items to their respective motivation dimensions, at the analytical stage we realised that not all dimensions had been adequately covered, either in their conceptual breadth or in the number of items per dimension. A key lesson, therefore, was to think first about the dimensions and their relevance to the intervention or theory, and then to ensure that enough items were included to measure each of them.
The tool was piloted with six health workers, all of whom had secondary-level education. However, a wider variety of cadres and grades of health workers was interviewed in the final survey. It is possible that the more complex questions were less well understood by respondents with lower educational levels, as observed in previous research (31). The pilot sample was also too small for meaningful psychometric analysis. An alternative would have been to use "think-aloud" interviews, in which participants verbalise their thoughts while completing the survey (32), to uncover difficulties in understanding and differences in how questions are interpreted. Such interviews are particularly important for assessing content validity, that is, ensuring that the instrument measures what it intends to measure.

Sampling
The motivation scale was embedded within a wider health worker survey carried out according to the protocol of the ASSP impact evaluation, which also included a health facility survey and a household survey (33). Motivation was not a primary outcome of the wider evaluation, and the sample size calculation was based on household survey outcomes. In total, 210 health facilities were to be sampled, and all health workers present on the day of the survey were to be interviewed by trained data collectors. Based on national staffing statistics, we assumed an average of four health workers per facility, yielding an expected sample of approximately 840 respondents. This exceeded the minimum of 200 observations required for confirmatory factor analysis (CFA) (34), which is discussed under "Data analysis".

Reflections on sampling
Following data collection, the total number of survey respondents was only 485, little more than half the number expected, although this still met the requirements for CFA. Reasons included larger-than-expected discrepancies between staff on the payroll and staff actually available for interview, as well as differences in staffing levels by region and facility type.
One objective of the research was to compare the motivation levels of workers who had previously received PBF with those of workers who had never received PBF. However, sampling was designed to evaluate the ASSP programme rather than according to PBF status, and the geographical overlap between PBF and ASSP was only partial. This resulted in an imbalance in sample size between the two groups; the implications are discussed under "Data analysis".
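The sampling arithmetic above can be made explicit with a small helper. This is a sketch using only the figures reported in the text (210 facilities, an assumed four workers per facility, 485 respondents achieved, a CFA minimum of 200); the function name `sample_summary` is hypothetical and not part of the study's tooling.

```python
# Planned versus achieved sample size, using the figures reported in the text.
# The 200-observation threshold is the CFA minimum cited in the protocol.


def sample_summary(facilities: int, workers_per_facility: int,
                   achieved: int, cfa_minimum: int = 200) -> dict:
    """Compare the expected sample (facilities x assumed staff per facility)
    with the number of respondents actually interviewed."""
    expected = facilities * workers_per_facility
    return {
        "expected": expected,
        "achieved": achieved,
        "shortfall": expected - achieved,
        # Proportion of the expected sample actually reached
        "share_of_expected": round(achieved / expected, 3),
        "meets_cfa_minimum": achieved >= cfa_minimum,
    }


if __name__ == "__main__":
    print(sample_summary(facilities=210, workers_per_facility=4, achieved=485))
```

With these inputs the expected sample is 840, the shortfall is 355 respondents and the share of the expected sample reached is 0.577, which is still well above the CFA minimum.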

Data analysis
We chose CFA (rather than exploratory factor analysis) to validate the scale, given that we had a priori assigned items to motivation dimensions in the Franco framework and wished to empirically confirm this assignment (13).
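Because CFA starts from an a priori assignment of items to dimensions, that assignment can be written down directly as a measurement model. The sketch below turns a dictionary of dimensions and items into lavaan-style syntax, in which `=~` reads "is measured by"; the same syntax is accepted by lavaan in R and by the Python package semopy. The dimension and item names are hypothetical placeholders, not the study's actual items, and the model-fitting step itself is not shown.

```python
def cfa_model_syntax(assignment: dict[str, list[str]]) -> str:
    """Build a lavaan-style measurement model from an a priori
    item-to-dimension assignment: one '=~' line per latent factor."""
    lines = []
    for factor, items in assignment.items():
        if not items:
            raise ValueError(f"dimension {factor!r} has no items")
        # Each item loads only on the dimension it was assigned to
        lines.append(f"{factor} =~ " + " + ".join(items))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical assignment for illustration only
    assignment = {
        "dimension_1": ["q1", "q2", "q3"],
        "dimension_2": ["q4", "q5", "q6", "q7"],
    }
    print(cfa_model_syntax(assignment))
```

Keeping the assignment as data, rather than hand-editing model strings, makes it easier to re-specify the model when items are dropped during fitting.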
CFA largely supported the assumed assignment of items, with a few modifications (Additional file 2). We then calculated a score for each motivation dimension as the unweighted mean of responses to its items, since item-factor loadings within each dimension were of approximately the same magnitude. Multiple ordinary least squares (OLS) regression models were then used to examine the determinants of these composite scores for each dimension.
We also tested measurement invariance (MI), which is possible within a CFA framework (13), to examine whether the scale had the same measurement properties for health workers previously exposed to PBF and for those who had never received PBF.
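An unweighted composite score is simply the mean of a respondent's answers to the items of one dimension. The sketch below assumes that unanswered items are skipped and that a dimension with no answered items receives no score; how missing responses were actually handled in the study is not stated, so that rule is an assumption. Item and dimension names are hypothetical.

```python
from statistics import mean
from typing import Optional


def composite_scores(responses: dict[str, int],
                     assignment: dict[str, list[str]]) -> dict[str, Optional[float]]:
    """Score one respondent on each dimension as the unweighted mean of the
    Likert responses to that dimension's items. Unanswered items are skipped
    (an assumption); a dimension with no answered items is scored as None."""
    scores: dict[str, Optional[float]] = {}
    for dimension, items in assignment.items():
        answered = [responses[item] for item in items if item in responses]
        scores[dimension] = float(mean(answered)) if answered else None
    return scores


if __name__ == "__main__":
    assignment = {"dimension_1": ["q1", "q2"], "dimension_2": ["q3", "q4", "q5"]}
    # Two respondents with different answer patterns but identical composites
    print(composite_scores({"q1": 1, "q2": 5, "q3": 2, "q4": 3, "q5": 4}, assignment))
    print(composite_scores({"q1": 3, "q2": 3, "q3": 3, "q4": 3, "q5": 3}, assignment))
```

Both respondents score 3.0 on each dimension despite answering very differently, which illustrates the item-level information lost when responses are averaged.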
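MI testing fits a sequence of increasingly constrained multi-group models: configural (same factor structure), metric (equal loadings, often called weak invariance) and scalar (equal loadings and intercepts, often called strong invariance). One commonly used decision heuristic, proposed by Cheung and Rensvold, treats a constraint as tenable if the comparative fit index (CFI) drops by no more than 0.01 relative to the preceding model. The sketch below applies that heuristic to fit indices obtained elsewhere; it assumes the configural model already fits acceptably, the CFI values are hypothetical, and the study itself may have relied on different criteria such as chi-square difference tests.

```python
# Nested invariance models in order of increasing constraint.
LEVELS = ("configural", "metric", "scalar")


def highest_invariance_level(cfi: dict[str, float], max_drop: float = 0.01) -> str:
    """Return the most constrained level whose CFI is no more than `max_drop`
    below that of the preceding, less constrained model (delta-CFI heuristic).
    Assumes the configural model itself fits acceptably."""
    supported = "configural"
    for looser, stricter in zip(LEVELS, LEVELS[1:]):
        if cfi[looser] - cfi[stricter] > max_drop:
            break  # the added constraints worsen fit too much
        supported = stricter
    return supported


if __name__ == "__main__":
    # Hypothetical fit indices for two dimensions
    print(highest_invariance_level({"configural": 0.950, "metric": 0.945, "scalar": 0.940}))
    print(highest_invariance_level({"configural": 0.950, "metric": 0.948, "scalar": 0.920}))
```

The first hypothetical dimension reaches scalar (strong) invariance; for the second only metric invariance holds, so comparing its mean scores between groups would call for caution.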

Reflections on data analysis
On average, each dimension was originally assigned four items. However, during CFA model fitting we dropped a number of items that did not group with other items as intended. For most of these items, poor model fit made us aware of suboptimal phrasing, implying that the dimensions were not well measured by the questions. In retrospect, more extensive piloting might have detected these issues before the main data collection.
Dropping items at the CFA model-fitting stage is a frequent reality of psychometric research (13,35). The risk, as in our case, is that some dimensions end up being covered by only one or two items. This is problematic not only from a psychometric perspective, where a minimum of three items per dimension is recommended, but also from a conceptual perspective, as one can question whether a complex motivation construct can be assessed in its full conceptual breadth by a single item. In hindsight, we should have anticipated the potential loss of items during analysis and included more items per dimension in the initial survey. As the survey was already very long, a separate survey exclusively measuring motivation might have been more appropriate.
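Two simple checks follow from this lesson: after model fitting, flag dimensions left with fewer than the recommended three items; and before data collection, draft enough items that the expected number retained still reaches three. The sketch below implements both. The retention rate is a planning assumption the researcher must supply (exact fractions are used to avoid rounding errors), and all dimension and item names are hypothetical.

```python
import math
from fractions import Fraction

MIN_ITEMS = 3  # recommended minimum number of items per dimension


def thin_dimensions(assignment: dict[str, list[str]], dropped: set[str],
                    min_items: int = MIN_ITEMS) -> dict[str, int]:
    """Return {dimension: retained item count} for every dimension left with
    fewer than `min_items` items after dropping poorly fitting items."""
    retained = {dim: [i for i in items if i not in dropped]
                for dim, items in assignment.items()}
    return {dim: len(items) for dim, items in retained.items()
            if len(items) < min_items}


def items_to_draft(min_items: int, expected_retention: Fraction) -> int:
    """Smallest number of drafted items whose expected retained count reaches
    `min_items`, under an assumed proportion of items surviving analysis."""
    return math.ceil(min_items / expected_retention)


if __name__ == "__main__":
    assignment = {
        "dimension_1": ["q1", "q2", "q3", "q4"],
        "dimension_2": ["q5", "q6", "q7", "q8"],
        "dimension_3": ["q9", "q10", "q11"],
    }
    print(thin_dimensions(assignment, dropped={"q2", "q5", "q6", "q9"}))
    print(items_to_draft(MIN_ITEMS, Fraction(3, 5)))
```

In this hypothetical case, two of the three dimensions fall to two items; if roughly three in five items are expected to survive, about five items per dimension would need to be drafted. An expected-value calculation like this gives no guarantee, so drafting a margin above it may be prudent.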
MI testing does not require a much larger sample than a simple CFA, provided the subgroups being compared are of similar size (35). As described above, the sampling strategy was designed for the primary purpose of the ASSP impact evaluation and yielded a much smaller sample of health workers previously exposed to PBF (118 workers) than of workers never exposed to PBF (335 workers). To address this imbalance in MI testing, we compared the previously exposed group with a random sample of workers not exposed to PBF. However, the resulting small sample sizes reduced the precision of MI testing (36). Although strong invariance was still established for a number of dimensions, two dimensions were not invariant, meaning that differences in their scores between groups needed to be interpreted with caution.
We used multiple regression to compare composite scores between respondents previously exposed and never exposed to PBF. The use of composite scores inevitably results in a loss of information at the individual item level, as variation is averaged out in the calculation. Structural equation modelling (SEM), by contrast, is a more sophisticated technique that preserves the full information in the data (37).
Using SEM for the substantive regression analyses would have required a larger sample than we had, given the large number of parameters to be estimated (38). As long as MI testing supports equal measurement properties across groups, however, comparing composite scores across groups is a good second-best strategy to SEM.
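Balancing the groups for MI testing amounts to drawing, without replacement, a random subsample of the larger group equal in size to the smaller one (here 118 of 335 workers). The sketch below does this with a seeded generator so the draw can be reproduced; the seed value and the use of integer worker identifiers are assumptions for illustration, not details reported by the study.

```python
import random
from typing import Sequence


def balanced_subsample(larger_group: Sequence[int], target_size: int,
                       seed: int = 2024) -> list[int]:
    """Draw a reproducible random subsample (without replacement) of the
    larger group so that it matches the size of the smaller group."""
    if target_size > len(larger_group):
        raise ValueError("target_size exceeds the size of the larger group")
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.sample(list(larger_group), target_size)


if __name__ == "__main__":
    never_pbf_ids = range(335)  # hypothetical identifiers for the non-PBF group
    subsample = balanced_subsample(never_pbf_ids, target_size=118)
    print(len(subsample), len(set(subsample)))
```

A single draw discards most of the larger group's data, which is why precision suffers; repeating the draw with several seeds is one way to check that conclusions do not hinge on a particular subsample.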
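The sample-size burden of SEM comes from the number of free parameters. The sketch below counts them for a simple-structure model under stated assumptions: each item loads on exactly one factor, factor (residual) variances are fixed to 1 for identification, each factor is regressed on every covariate, covariates are treated as fixed exogenous variables (so their variances are not counted), and, for multiple groups, nothing is constrained equal (a configural model). A frequently cited rule of thumb asks for at least 5, and preferably 10, observations per free parameter. The model sizes in the example are hypothetical.

```python
def sem_free_parameters(items_per_factor: list[int], n_covariates: int = 0,
                        mean_structure: bool = False, groups: int = 1) -> int:
    """Count free parameters in a simple-structure SEM under the assumptions
    stated above: loadings + residual variances + factor covariances
    + covariate-to-factor paths (+ item intercepts if means are modelled)."""
    p = sum(items_per_factor)   # number of observed items
    k = len(items_per_factor)   # number of latent factors
    per_group = p               # one loading per item
    per_group += p              # one residual variance per item
    per_group += k * (k - 1) // 2  # (residual) covariances between factors
    per_group += k * n_covariates  # regression paths from covariates
    if mean_structure:
        per_group += p          # item intercepts
    return per_group * groups   # unconstrained across groups


def observations_per_parameter(n: int, parameters: int) -> float:
    """Ratio of sample size to free parameters."""
    return n / parameters


if __name__ == "__main__":
    # Hypothetical model: 8 factors with 3 items each, 3 covariates
    q = sem_free_parameters([3] * 8, n_covariates=3)
    print(q, observations_per_parameter(485, q))
```

With these hypothetical sizes the model has 100 free parameters, so 485 respondents give fewer than 5 observations per parameter, below even the lenient end of the rule of thumb.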

Limitations
The conceptual and survey design stage is critical, as it lays the foundations for data collection and analysis. Any shortcomings and omissions at the design stage inevitably manifest themselves at the analysis and interpretation stage. It is therefore worth investing time and resources when designing a motivation measurement instrument, particularly in clarifying how motivation should be conceptualised in the specific context of application. Preliminary qualitative research is immensely helpful in this regard.
We did not give enough consideration to the number and characteristics of the workers to be sampled, or to how they would respond to the questionnaire. We recommend pre-testing tools with respondents who are as representative of the final survey sample as possible, or at least using think-aloud techniques, to limit bias and to ensure that questions are clear and understood.
Finally, although the field of health worker motivation is gaining interest, it is still not a priority outcome of large-scale impact evaluations of health programmes targeting health workers. This was evident here in the limited time available to develop the scale and in our reliance on questions from previous studies that were not adequately tailored to the context. Given the central role played by health workers in health systems, we strongly advocate giving health worker motivation and its drivers more prominence in research, as this will ultimately help to inform more effective interventions aimed at improving healthcare delivery.

Availability of data and material
The data for the study referred to in this article are available from the corresponding author upon reasonable request.

Competing interests
The lead author has recently been appointed as a health adviser for the Department for

Figure 1 Franco framework of motivation. The Franco framework conceptualises motivation as being influenced by various determinants and consequences at the individual, organisational and societal levels. Motivation is determined by the congruence of worker and organisational goals ("will do" motivation) and by factors that focus on the individual's ability to execute a task ("can do" motivation).

Supplementary Files
This is a list of supplementary files associated with the primary manuscript.