Health system workforce’s performance capacity towards integrated leading, managing and governing practices: with special reference to the application of factor analysis and ordinal logistic regression

Abstract Health organizations that are over-led and under-managed, over-managed and under-governed, or even outside these patterns remain a common phenomenon in low- and middle-income countries. The current study examines the health system workforce’s performance capacity towards integrated leading, managing, and governing practices and its predictors in Ethiopia. Eight hundred thirteen health facility employees completed a multi-item questionnaire. The data were fitted to factor analysis and ordinal logistic regression models. Factor analysis was employed to develop a scientifically reliable and empirically scalable measurement model, assembled from the items rated, the factors extracted and the error variances observed. The health system workforce’s performance capacity was then computed and labeled, and ordinal logistic regression was conducted to identify its predictors. The factor analysis yielded a four-factor measurement model with acceptable estimates, composite reliability, and average variance extracted. Eighty-four percent of the participants reported low (41.3%) or moderate (42.7%) performance capacity towards integrated leading, managing, and governing practices. Sex and responsibility were among the significantly associated predictors. Empowering the health system workforce towards integrated leading, managing and governing practices using a scientifically reliable and empirically scalable model is important, particularly in resource-limited settings. In this regard, policies and strategies should give due attention to females and service owners. The current results could provide a foundation for training and future research.

Second, these paths are preached as if they belong to an elite group of naturally gifted people, rather than discussed as everyone's job. Third, integration is poor among the three worlds: the civil service, the academic sphere and research institutes. Usually, the workforce in the civil service focuses on how to do it; those in the academic sphere emphasize what it is; and those in research institutes stress how to model it. Finally, the integration concept may be donor driven, meaning that it is not initiated from within.
Studies indicate that the health system workforce's capacity towards leading, managing and/or governing practices improves Key Performance Indicators (KPIs) [1,13,22-26]. Some other studies report significant duplication among the practices of the three dimensions [22,27]. Moreover, integrating practices that are stated separately but duplicate one another will be difficult unless the diverse L+M+G practices are amalgamated meaningfully. This, in fact, needs rethinking and redesigning beyond the rhetoric [25,28,29].
Thus, the current study levels the performance capacity of the health system workforce towards integrated L+M+G practices and identifies the predictors that affect it in northwest Ethiopia. This is done after developing and testing a measurement model, benchmarked against the Management Sciences for Health (MSH) integrated health system L+M+G practices framework [11,14], illustrated in figure 1. This framework was developed in 2015 [14] and has been introduced in many low- and middle-income countries, including Ethiopia [4,11,14]. As indicated in figure 1, the model consists of twelve practices, four of which are placed in parallel along each of the leading, managing and governing paths [28,30,31]. Nonetheless, neither how these practices correlate nor how they can be modeled has yet been reported. Why has this gone unnoticed?
Generally, this may be due to the long-standing rhetoric that leading [16,17], managing [18,19] and governing [20,21] are distinct paths. How strongly this view obstructs the search for soundly correlated and modeled L+M+G practices can be illustrated by the problems raised by a considerable number of participants who were trained in Ethiopia with the above model at the center [4]. The problems are twofold: first, the model comprises considerable jargon, which demands more investment in understanding what it is than in how to do it; second, even when the jargon is understood through such investment, the cross-duplications among the practices remain.
Thus, the results of this study could support the design of scientifically reliable and empirically scalable capacity-building policies and strategies for performing integrated health system L+M+G practices.

Study design, participants and data collection
This cross-sectional study aimed at leveling the performance capacity of the health system workforce towards integrated L+M+G practices and identifying its predictors. Eight hundred thirteen participants, aged 18 years or older, were included. They were members of the health system workforce, selected from 32 health care organizations located in northwest Ethiopia. Data were collected using a structured, self-rated, multi-item questionnaire. Each participant rated his or her respective staff capacity towards performing integrated L+M+G practices. The process was strictly anonymous, and completed questionnaires were stored in a locked cabinet.

Measures
Twenty-six items were rated on a five-point Likert scale, ranging from 1 = very low to 5 = very high. The test stimuli (psychometric properties) of the questionnaire were refined through rigorous debriefing sessions, which focused on instrument clarity, question wording and validity. Five specialists in health service management, three from the civil service and two from the academic sphere, reviewed the questions and wording. The questionnaire was pre-tested on 42 participants working in similar settings outside the actual study area. Finally, an item was retained only if the internal consistency (alpha value) was .80 or greater [32].
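The internal-consistency criterion can be illustrated with a short sketch. It assumes that the alpha value refers to Cronbach's alpha, which the text does not state explicitly, and the ratings below are hypothetical, not study data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of ratings
    (one rating per respondent, same respondent order in every item)."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score of each respondent across all items
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical five-point Likert ratings: 3 items x 6 respondents
ratings = [
    [4, 3, 5, 2, 4, 3],
    [4, 2, 5, 3, 4, 3],
    [5, 3, 4, 2, 4, 2],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}; meets the .80 criterion: {alpha >= 0.80}")
```

In practice the same calculation would be run on the pre-test responses, with the .80 cut point applied as described above.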

Analysis
Data were entered using Epi Info version 7 and analyzed using the Statistical Package for the Social Sciences (SPSS) version 22. Several techniques were employed in the analysis: descriptive statistics, exploratory factor analysis, composite reliability, average variance extracted and ordinal logistic regression.

Descriptive analysis
Information about the socio-demographic characteristics of the participants, and the central tendency of the rated items was summarized with descriptive statistics.

Exploratory factor analysis
Exploratory Factor Analysis (EFA) was used to model the relationships among the factors extracted, the items rated and the error variances observed. Here, an item was a measured variable included in the data collection questionnaire; a factor was an unobserved variable that typically could not be measured directly but was assumed to cause the observed scores on the items; and error variance was the portion of an item's variance that could not be predicted from the factors.
Five data-to-model fit indices were tested: inter-item correlations of .3 or greater; a Kaiser-Meyer-Olkin (KMO) overall Measure of Sampling Adequacy (MSA) of .5 or greater together with a significant Bartlett's test of sphericity (P<.05); intra-item consistency of .7 or greater; total variance explained of 60% or greater; and communality of .5 or greater [33][34][35].
The communality represented the proportion of each item's variance that can be explained by the factors [34,36]. Because they violated the communality rule (table 3), two items were removed, reducing the original 26-item dataset to a 24-item dataset.
EFA with this dataset was iterated to extract factors and to display factor loadings, using the principal axis factoring method with varimax rotation and an eigenvalue cut point greater than 1. Another four items (discussed elsewhere) were then removed from the 24-item dataset because they violated the rule of complex structure, that is, no item should load on more than one factor with factor loadings of .4 or greater [33]. The dataset that satisfied the necessary requirements of factor analysis was thus reduced to 20 items. From this dataset, the measurement model intended for leveling the performance capacity of the health system workforce towards integrated L+M+G practices was developed. Moreover, to make this measurement model more meaningful, the factors extracted were labeled considering the contents (scientific and empirical domains) of the items clustered within them [36][37][38].

Composite reliability and average variance extracted
Composite Reliability (CR) and Average Variance Extracted (AVE) were calculated using Microsoft Excel 2016 to test the reliability and validity [39,40] of the measurement model. The reliability of the model was tested with CR, calculated as the squared sum of the factor loadings divided by the total of the squared sum of the factor loadings and the sum of the error variances [41]. To reaffirm the reliability of the model, the CR was triangulated with the total variance explained. The validity of the model was tested using the AVE, which was triangulated with the factor correlations. The AVE was calculated as the sum of the squared factor loadings divided by the total of the sum of the squared factor loadings and the sum of the error variances [42]. The square root of the AVE was also considered: validity was supported if this value was greater than most of the correlation coefficients of the items clustered within the factor. Correlations were also tested for whether they differed significantly from zero, which supported convergent validity, and for how often an item correlated more highly with items of its own factor than with items of the other factors, which covered divergent validity. Here, the percentage of variability that two items shared was determined by squaring the correlation between them and multiplying by 100. Generally, the rule is that items should relate more strongly to their own factor than to another factor.
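The two formulas described in this subsection can be sketched in a few lines. The loadings below are hypothetical, and each error variance is taken as one minus the squared loading, which is an assumption made for the illustration rather than the values used in the study.

```python
import math

def composite_reliability(loadings, error_variances):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))

def average_variance_extracted(loadings, error_variances):
    """AVE = sum of squared loadings / (sum of squared loadings + sum of error variances)."""
    sum_of_squares = sum(l ** 2 for l in loadings)
    return sum_of_squares / (sum_of_squares + sum(error_variances))

# Hypothetical standardized loadings of four items on one factor
loadings = [0.80, 0.70, 0.75, 0.78]
# Assumption for this sketch: error variance = 1 - squared loading
errors = [1 - l ** 2 for l in loadings]

cr = composite_reliability(loadings, errors)
ave = average_variance_extracted(loadings, errors)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}, sqrt(AVE) = {math.sqrt(ave):.3f}")
```

The square root of the AVE printed at the end is the quantity compared against the item correlations when checking validity.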

Labeling the performance capacity of the health system workforce
The performance capacity of the health system workforce towards integrated L+M+G practices was computed from the 20 items indicated in the measurement model (figure 2). It was then leveled into four ordinal categories, labeled as low, moderate, high and very high, representing scores of <60, 60-79.99, 80-94.99 and ≥95, respectively, based on the performance appraisal guideline in Ethiopia (unpublished document).
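The leveling rule can be written as a small function. The sketch assumes that the score is already expressed on a 0 to 100 scale (how the 20 item ratings were converted is not described here) and that a score of exactly 95 falls in the very high category.

```python
def label_capacity(score):
    """Map a performance capacity score (assumed 0-100) to an ordinal level."""
    if not 0 <= score <= 100:
        raise ValueError("score is assumed to lie between 0 and 100")
    if score < 60:
        return "low"
    if score < 80:
        return "moderate"
    if score < 95:
        return "high"
    return "very high"  # assumption: 95 itself counts as very high

# Hypothetical scores, not study data
for s in (45.0, 60.0, 79.99, 88.5, 97.0):
    print(s, label_capacity(s))
```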

Ordinal logistic regression analysis
Ordinal Logistic Regression Analysis (OLRA) with a logit link function was conducted to model the relationship between the performance capacity of the health system workforce towards integrated L+M+G practices and its predictors: socio-demographic characteristics and the items trimmed from the factor analysis. The model fitting information, tested with the -2 log likelihood, was significant at p<.001. Besides, the consistency of the observed data with the fitted model, tested with the Pearson chi-square goodness-of-fit test, was satisfactory (p = 1).
In addition, the variance of the outcome variable explained by the predictors was assessed with a pseudo R-squared value (Nagelkerke's R² = .765), which indicated a strong association.
Moreover, the test of parallel lines, that is, the test of the proportional odds assumption that the location parameters (slope coefficients) of the predictors are the same across the outcome categories, was performed using the -2 log likelihood and was non-significant (P = .487). There was therefore no evidence to reject the parallelism hypothesis, and the slope coefficients could be treated as the same across response categories. Finally, to interpret the impact of individual predictors more clearly, odds ratios with 95% Confidence Intervals (CIs) were calculated by exponentiating the estimated coefficients.
Table 1 presents the participants' socio-demographic characteristics. Of the 813 participants, 396 (48.7%) were females and 582 (71.6%) were service owners. Their mean (± standard deviation) age was 29 (± 5) years, and ages ranged from 18 to 56 years.
Table 2 indicates the means (x̅), standard deviations (s), and correlations (r) of the measured items. The means and standard deviations were included to show the central tendencies together with the corresponding dispersions as part of the descriptive statistics.

Indices test of measuring items
The inter-correlations presented in the off-diagonal part of the table ranged from .328 to .812. When a correlation is squared and multiplied by 100, it gives the percentage of variability that the two variables share. For example, when the coefficient .328, the correlation between item "12" (row) and item "1" (column), is squared, it becomes .108, and when multiplied by 100, it is 10.8%. This shows that the two items shared 10.8% of their variability with each other.
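The arithmetic in this example is simple enough to verify directly. The sketch below applies it to the smallest and largest correlations reported for the table; the function name is only illustrative.

```python
def shared_variance_pct(r):
    """Percentage of variability two items share, given their correlation r."""
    return r ** 2 * 100

# Smallest and largest inter-item correlations reported in the text
for r in (0.328, 0.812):
    print(f"r = {r:.3f} -> shared variability = {shared_variance_pct(r):.1f}%")
```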
Likewise, the correlations on the diagonal of the table are the correlations of each item with itself, which always equal one, indicating perfect correlation.
Following the rule of thumb of removing the single item with the minimum value at a time, a series of factor analyses was run until all the items had a communality of .5 or greater. Thus, after "set annual and strategic organizational plan" was removed and the analysis iterated, the communality for "allocate adequate resources for work" became .481. The analysis was iterated again without this item, and at this point the output, particularly the values presented within brackets, showed that the remaining items had communalities of .5 or greater. At this stage, the original 26-item dataset was reduced to a 24-item dataset.
Table 4 presents the total variance explained. Before rotation, factor 1 accounted for a considerably larger share of the variance than the other three factors, that is, 52.612% compared with 6.596%, 5.070% and 4.156% for factors 2 to 4; after rotation, it accounted for only 20.572%, compared with 15.771%, 13.798% and 12.716%, respectively.
Table 5 provides the factor loadings and communality values of the 20-item dataset, displayed after trimming another four items that violated the rule of complex structure.

Factor loadings
The four items trimmed for complex structure each loaded at .4 or greater on two factors. Firstly, "provide appropriate feedback to other organization members" had factor loadings of .446 and .544 on factors 1 and 2, and "look for best practices in the last 12 months" had factor loadings of .415 and .522 on factors 3 and 4, respectively.
Secondly, "match deeds to words" had factor loadings of .508 and .457 on factors 2 and 4, and "develop a structure that provide accountability and authority" had factor loadings of .405 and .607 on factors 2 and 3, respectively.
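The complex-structure rule applied to these items can be sketched as a simple check. The loadings for the four trimmed items are those reported above; loadings below .4 were suppressed in the table and are therefore omitted, and the last entry is a hypothetical item added only for contrast.

```python
CUTOFF = 0.4  # loadings at or above this value count as salient

# Factor number -> loading, as reported in the text for the trimmed items
reported_loadings = {
    "provide appropriate feedback to other organization members": {1: 0.446, 2: 0.544},
    "look for best practices in the last 12 months": {3: 0.415, 4: 0.522},
    "match deeds to words": {2: 0.508, 4: 0.457},
    "develop a structure that provide accountability and authority": {2: 0.405, 3: 0.607},
    # Hypothetical item (not from the study) with a single salient loading
    "hypothetical clean item": {1: 0.710},
}

def salient_factors(loadings, cutoff=CUTOFF):
    """Factors on which the item loads at or above the cutoff."""
    return sorted(f for f, value in loadings.items() if abs(value) >= cutoff)

def has_complex_structure(loadings, cutoff=CUTOFF):
    """True when the item loads saliently on more than one factor."""
    return len(salient_factors(loadings, cutoff)) > 1

for item, loadings in reported_loadings.items():
    verdict = "trim" if has_complex_structure(loadings) else "keep"
    print(f"{verdict}: {item} (factors {salient_factors(loadings)})")
```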
In the factor loadings table, coefficients below .4 were suppressed to emphasize which factor loaded highly on a given item; otherwise, every factor had a loading on every item. Had every loading been displayed, the communality of each item, shown in the last column of the table, could have been checked manually [34]. Likewise, the factor loadings were sorted by size; otherwise, the table could have been presented differently.
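As a sketch of the manual check mentioned here: with an orthogonal rotation such as varimax, an item's communality equals the sum of its squared loadings across the factors, and its error variance is one minus that communality. The four loadings below are hypothetical, since the suppressed loadings are not reported.

```python
def communality(loadings):
    """Sum of squared loadings of one item across all (orthogonal) factors."""
    return sum(l ** 2 for l in loadings)

def error_variance(loadings):
    """Portion of the item's variance not explained by the factors."""
    return 1 - communality(loadings)

# Hypothetical loadings of one item on factors 1 to 4 (not study data)
item_loadings = [0.72, 0.31, 0.22, 0.18]

h2 = communality(item_loadings)
print(f"communality = {h2:.4f}, error variance = {error_variance(item_loadings):.4f}")
print("meets the .5 communality rule:", h2 >= 0.5)
```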

Factor labeling
The four factors extracted were labeled considering the contents (scientific and empirical domains) of the items clustered within each of them [36][37][38]. As indicated in table 5, the eight items that loaded highly on factor 1 appeared to reflect different aspects of organizational principles. Thus, this factor was labeled compliance with principles.
Besides, the four items that loaded highly on factor 2 reflected diverse characteristics of strategy. Hence, it was termed strategic sensitivity. Additionally, another four items that loaded highly on factor 3 appeared to relate to various features of system development. It was therefore named system building. Finally, the remaining four items that loaded highly on factor 4 appeared to relate to context. Accordingly, it was called contextual thoughtfulness.

A four-factor measurement model
Figure 2 indicates the four-factor measurement model. Reading the figure from left to right, the lines radiating from the performance capacity of the health system workforce towards integrated L+M+G practices denote the factors extracted. The lines radiating from each factor towards the items represent the degree of correlation of each item with the corresponding factor. The lines pointing at each item symbolize the error variance.
These variances were calculated as one minus the communality (values in the last column of table 5), which gives the portion of each observed item that was not predicted from the factors. A high error variance (.5 or greater) indicated that an item might not belong to any factor.
Table 6 presents the Composite Reliability (CR) and Average Variance Extracted (AVE) of each factor. For example, the CR for compliance with principles is .921, which can be interpreted as a reliability of 92.1% for this factor in the measurement model; its AVE was .598, which means that it explained 59.8% of the variance of the corresponding items in the measurement model.
Figure 3 indicates the levels of performance capacity of the health system workforce towards integrated L+M+G practices. For instance, about 41.3% of the health system workforce had a low level of performance capacity towards integrated L+M+G practices.
Table 7 displays the estimated coefficients of the ordinal logistic regression model. The estimates labeled "threshold" indicate where the latent variable was cut to form the groups observed in the table; they were not otherwise used in interpreting the results.

Predictors of performance capacity towards integrated L+M+G practices
The other estimates, labeled "location", were the ones of interest to the researchers; these were the coefficients (log odds) of the predictors.
To interpret the impact of individual predictors more clearly, proportional odds ratios with 95% CIs were calculated for each category of the predictors by exponentiating the coefficients; these are shown in the exponential (EXP) column of table 7. Judging from the observed significance levels, sex and responsibility were significantly related (P<.05) to the levels of performance capacity of the health system workforce towards integrated L+M+G practices, whereas age, educational level and service year were non-significant (P>.05). For example, the odds ratio for the male health system workforce was 1.502 (95% CI, 1.038 to 2.173), which can be interpreted as the odds of being in a higher level of performance capacity towards integrated L+M+G practices being 50.2% greater for males than for females (p = .031), holding all other predictors in the model constant. Furthermore, all six items that were trimmed from the measurement model and treated as predictors were significantly related to the levels of performance capacity (p<.05).
For instance, the odds ratio for the health system workforce with a very low rating of "look for best practices in the last 12 months" was .029 (95% CI, .011 to .080). This can be interpreted as a very low rating of "look for best practices in the last 12 months" reducing the odds of a higher level of performance capacity towards integrated L+M+G practices by 97.1% compared with a very high rating (P < .001), holding all other variables in the model constant.
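The step from coefficient to odds ratio, and from odds ratio to percentage change, can be sketched as follows. The coefficient and standard error below are back-calculated from the reported odds ratio and confidence interval for sex purely for illustration; they are not values read from table 7, and 1.96 is the usual normal critical value for a 95% interval.

```python
import math

def odds_ratio_with_ci(coefficient, std_error, z=1.96):
    """Exponentiate a log-odds coefficient and its Wald confidence limits."""
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * std_error),
        math.exp(coefficient + z * std_error),
    )

def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100

# Back-calculated (illustrative) inputs for sex: OR 1.502, 95% CI 1.038 to 2.173
coef = math.log(1.502)
se = (math.log(2.173) - math.log(1.038)) / (2 * 1.96)

or_, lower, upper = odds_ratio_with_ci(coef, se)
print(f"OR = {or_:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"male vs female: {percent_change_in_odds(1.502):+.1f}% odds")
print(f"very low vs very high rating: {percent_change_in_odds(0.029):+.1f}% odds")
```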

Discussion
The current study levels the performance capacity of the health system workforce towards integrated L+M+G practices. It also identifies predictors related to this performance capacity. Both are done after developing and testing a four-factor measurement model. This measurement model, derived from a study done in Ethiopia, may draw the attention of the health system workforce at all levels, particularly in low- and middle-income countries [4,11,12,14]. In fact, the workforce in such countries is assigned to ensure universal health coverage (UHC) in an increasingly socially, politically, economically and technologically turbulent ecosystem [4,5]. These turbulences might be overcome by empowering the health system workforce towards integrated L+M+G practices, using scientifically reliable and empirically scalable models. This concept is supported by some studies, which report that capacitating the health system workforce with the 12-practice integrated L+M+G framework improves the Key Performance Indicators (KPIs) of health services [22][23][24][25]. Nevertheless, a few other studies report significant duplication among the practices assembled in the 12-practice framework [22,28]. This might stem, on the one hand, from the rhetoric that treats these paths as separate [17,18,21] and, on the other hand, from the lack of statistically reasonable techniques in modeling the 12-practice framework.
Thus, the current study addresses the existing gaps in modeling integrated leading-managing-and-governing practices through three main actions. Firstly, the authors collected the data using a multi-item questionnaire that incorporates a representative number of items from the three non-hostile paths. Secondly, they employed a statistically reasonable technique, factor analysis with varimax rotation, to extract the factors. Finally, they developed a measurement model by assembling the four factors together with the items rated and the error variances observed.
In the meantime, to make this measurement model more meaningful, the four factors extracted (table 5) are labeled based on the contents that the factor loadings reflect [36][37][38]. The first factor is named compliance with principles. The word compliance describes the act of acquiescing to a set of rules, and the term principle denotes an accepted rule of action. Thus, compliance with principles could be stated as the ability to act in line with an accepted set of rules. The second factor is labeled strategic sensitivity. Here, strategic describes mindfulness about mission and vision, and sensitivity refers to strong attention.
Hence, strategic sensitivity might be operationalized as the intensity of mindfulness towards mission and vision [43]. Likewise, the third factor is termed system building. System means a group of interdependent components that form a unified whole [14], and building refers to improving the interactions among the components. Therefore, system building might be described as the ongoing process of improving the interactions among the components. The final factor is denoted contextual thoughtfulness. The term contextual refers to exploring conditions in the environment, and the word thoughtfulness describes deliberate thinking before doing something. Accordingly, contextual thoughtfulness can be defined as deliberate thinking in exploring conditions in the environment.
The measurement model is also tested for reliability and validity through the values of CR and AVE (table 6), respectively, which in general show that the groups of items assembled in the model load well. In addition to the CR, the reliability of the model is checked with the total variance explained (table 4), in that the value for the first factor is considerably larger than that for the next factor [41]. Similarly, the validity of the model, that is, that the variance is due to the construct and not to measurement error, is reaffirmed with the correlation coefficients (table 2), besides the AVE. At this point, the correlations are significantly different from zero, showing convergent validity, and the items correlate highly more often with items of their own factor than with items of the other factors, demonstrating divergent validity [39,40,42,44].
Generally, the current measurement model is acceptable in that all estimates are sound, all estimates are above .5, the CR for all factors is above .7, the total variance for the first factor is considerably larger, the AVE for all factors is above .5, and the items correlate highly more often within their own factor.
As noted earlier, the current study levels the performance capacity of the health system workforce towards integrated L+M+G practices as low, moderate, high, and very high.
This leveling is based on the categories indicated in the health system workforce performance appraisal guideline of Ethiopia.
Though limited in scope, some studies report that the health system workforce's capacity towards leading, managing and/or governing practices improves health service outcomes [1,12,22,23,26,27]. Nevertheless, these studies do not report the levels of capacity, nor the degree to which the three paths were considered in assessing it. These gaps might be due, respectively, to the belief that leadership, management and governance are the business of a small number of actors, perhaps only those who are legally authorized, and to the dearth of representative integrated models. This indicates that a breakthrough is needed to show the importance of performing the three paths in an integrated way, using scientifically reliable and empirically scalable models, recalling that the entire reason for being human is leadership or ruler-ship regardless of situations and hierarchies [45].
The current study also models the relationship between the outcome variable and its predictors including socio-demographic characteristics, by employing ordinal logistic regression (table 7). Sex and responsibility, as well as, the six items treated as predictors such as: (1) look for best practices in the last 12 months; (2) match deeds to words; (3) set annual and strategic organizational plan; (4) allocate adequate resources for work; (5) develop a structure that provide accountability and authority; (6) [18].
Moreover, a synthesis paper on effective governance for health revealed 10 determinants of governance: leadership, corruption, management, transparency, accountability, systems to manage data, participation of key stakeholders, political context, check and balance strategy, and financial resources [20].
The above examples clearly show that one path is even counted as a determinant of another. For instance, leadership and management are mentioned as determinants of governance. Additionally, regardless of the level of specificity, most, perhaps all, of the characteristics mentioned as determinants in one path have a twin concept in the other paths. For example, common sense, unity of direction, and participation of stakeholders are similar concepts but are mentioned as determinants of different paths. This is another reason why the current study develops and tests the above-mentioned four-factor measurement model, considering representative items from each path.
Moreover, beyond serving as hypotheses, the reports above and similar literature are of little help, particularly when people or organizations need to develop capacity-building policies and strategies based on socio-demographic characteristics whose relationships with the outcome variable have not been modeled. Hence, to illustrate the gap and provide foundational information, the current study models the relationships between the outcome variable and its predictors, particularly socio-demographic characteristics. For instance, the male workforce has a higher performance capacity towards integrated L+M+G practices than the female workforce (table 7). This difference might arise because a limited number of females are legally authorized to lead, manage and govern organizations, mainly in developing countries including Ethiopia.
In such countries, this is a historical trend, and breaking it and bringing an adequate number of females to the stage is a demanding investment. However, almost 50% of the participants in this study are females, which might indicate that a considerable share of the workforce in the health facilities is female. Thus, whatever reasons people have, without empowering half of the workforce towards integrated L+M+G practices, getting organizations to the intended stage would be all but impossible.
The other significantly associated predictors (most at P < .001), as indicated in table 7, are the six items that were trimmed from the factor analysis and fitted to the ordinal logistic regression analysis. This implies that, in designing capacity-building policies and strategies, as well as further research, nesting them within the biologically plausible factor would be more meaningful. For example, among the six items that are significantly related to the outcome variable in the current study: item 1 can be captured by contextual thoughtfulness; item 2 can be enclosed within strategic sensitivity; items 3, 4 and 5 can be nested within system building; and item 6 can be contained by compliance with principles.
Apart from these implications, the dearth of literature that develops or tests an integrated measurement model, levels the performance capacity of the health system workforce towards integrated L+M+G practices, and identifies the predictors that affect it, particularly socio-demographic characteristics, limited the depth of our discussion.

Conclusions
Empowering the health system workforce towards integrated leading-managing-and-governing practices is important, particularly in resource-limited settings. The policies and strategies in this regard should give due attention to females and service owners.
These policies and strategies should also consider the current four-factor measurement model, which was developed using a statistically recommended technique, factor analysis.

Ethics approval and consent to participate
Ethical clearance was secured from Bahir Dar University (BDU) with a protocol number 090/18-04, and written consent was obtained from each participant.

Availability of data and material
Data and materials are available from the corresponding author and will be released upon reasonable request and with the permission of BDU.

Competing interests
All the authors declare that they have no competing interests.

Funding
BDU funded the research. It had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Authors' contributions
YA conceived the original idea, designed the study and participated in all stages of the project; analyzed the data; and finalized the manuscript.
GD conceived the original idea, designed the study and participated in all stages of the project; analyzed the data; and finalized the manuscript.
DH conceived the original idea, designed the study and participated in all stages of the project; analyzed the data; and finalized the manuscript.
All authors reviewed and approved the final manuscript.

Acknowledgements
Our special thanks and sincere appreciation go to the study participants, data collectors, and data supervisors for their valuable contributions. Our gratitude also extends to BDU for funding this study, which is part of a PhD dissertation research.

Authors' information
YA, a Public Health PhD candidate, is a lecturer of health service management at BDU.
He has also conducted research and delivered community services.