We developed a new customer satisfaction measurement scale and tested its reliability and validity. Given that the quality of a study’s results is directly related to the quality of the instrument used to collect the data, the importance of collecting data with a reliable and valid instrument is clear (Andrew et al., 2001).
Our sampling was exhaustive. As a criterion for adequate sample size, Hair et al. (2009) recommend between 5 and 10 respondents for each instrument item. For Tabachnick and Fidell (2007), the validity of a factor analysis is compromised with fewer than 300 individuals. Our new instrument had 14 items in its application version, which would require a minimum sample of 70 people under the Hair et al. (2009) criterion. Our sample of 330 people met both criteria, allowing the exploratory and confirmatory validations to be performed.
Before performing a factor analysis, the literature recommends evaluating sampling adequacy with the Kaiser-Meyer-Olkin (KMO) measure and deciding whether the factor analysis should proceed by employing Bartlett’s test of sphericity (Schmidt and Hollensen, 2006, pp. 302-303). These two tests indicate the suitability of the data for structure detection. The KMO measure of sampling adequacy is a statistic that indicates the proportion of variance in the variables that might be caused by underlying factors. Kaiser’s standard for whether data are suitable for factor analysis is: KMO > 0.9, quite suitable; 0.9 > KMO > 0.8, suitable; 0.8 > KMO > 0.7, generally suitable; 0.7 > KMO > 0.6, not quite suitable; KMO < 0.5, not suitable (Ershi Qi et al., 2013). Bartlett’s test of sphericity tests the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables are unrelated and therefore unsuitable for structure detection. Small significance values (less than 0.05) indicate that a factor analysis may be useful with the data.

Table 1 shows that the KMO value of the variables was 0.934, indicating sampling adequacy: the values in the matrix were sufficiently distributed to conduct factor analysis (George and Mallery, 2016). Bartlett’s test of sphericity yielded an approximate chi-square of 8,249.985, highly significant at the p < 0.001 level, indicating that the correlation matrix differs significantly from an identity matrix and that the variables are sufficiently related for structure detection (Pallant, 2013). The results of the KMO and Bartlett’s tests were therefore satisfactory for further analysis (Table 1), and the selected variables are quite suitable for factor analysis.
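For readers who wish to reproduce these two diagnostics, the computation can be sketched as follows. This is an illustrative sketch on simulated data (330 respondents, 14 items, one shared latent factor), not the study’s code or dataset:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated 330 respondents x 14 items sharing one latent factor,
# mirroring the sample and item counts reported above (not the real data).
latent = rng.normal(size=(330, 1))
X = latent @ rng.uniform(0.6, 0.9, size=(1, 14)) + 0.5 * rng.normal(size=(330, 14))

n, p = X.shape
R = np.corrcoef(X, rowvar=False)

# Bartlett's test of sphericity: H0 is that R is an identity matrix.
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
p_value = stats.chi2.sf(chi2, df)

# KMO: squared correlations relative to squared correlations plus
# squared partial correlations (off-diagonal entries only).
inv_R = np.linalg.inv(R)
d = np.sqrt(np.diag(inv_R))
partial = -inv_R / np.outer(d, d)      # partial correlation matrix
off = ~np.eye(p, dtype=bool)
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())

print(f"Bartlett chi2 = {chi2:.1f}, df = {df:.0f}, p = {p_value:.3g}")
print(f"KMO = {kmo:.3f}")
```

With 14 items, Bartlett’s test has 14 × 13 / 2 = 91 degrees of freedom, matching the structure of the values reported in Table 1.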
In the factor analysis, Principal Component Analysis (PCA) with varimax rotation was employed because this combination maximizes variance and facilitates interpretation of the deduced constructs. Given the somewhat arbitrary nature of factor extraction, and for practicability and meaningful interpretability, the following three criteria were observed in data reduction: (1) the eigenvalue was greater than 1 and the factor contained more than 3 items; (2) factor loadings lower than 0.4 were deleted and not counted in any factor; (3) when cross-loadings occurred, decisions were made on the basis of meaningful interpretation (Xu and Liu, 2018).
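The extraction and rotation steps above can be sketched as follows. This is an illustrative implementation on simulated three-factor data (not the study’s code), using eigendecomposition of the correlation matrix for PCA, the Kaiser eigenvalue-greater-than-1 rule for retention, and a plain varimax rotation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_items = 330, 14
factors = rng.normal(size=(n, 3))
# Each block of items loads mainly on one of three latent factors.
load = np.zeros((3, n_items))
load[0, :5] = load[1, 5:10] = load[2, 10:] = 0.8
X = factors @ load + 0.4 * rng.normal(size=(n, n_items))

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int((eigvals > 1).sum())               # Kaiser criterion: eigenvalue > 1
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
explained = eigvals[:k].sum() / n_items    # cumulative contribution rate

def varimax(L, max_iter=100, tol=1e-6):
    """Rotate a loading matrix toward simple structure (varimax)."""
    p, m = L.shape
    Rot, var = np.eye(m), 0.0
    for _ in range(max_iter):
        Lr = L @ Rot
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        Rot = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return L @ Rot

rotated = varimax(loadings)
print(f"factors retained: {k}, cumulative variance: {explained:.1%}")
```

On this simulated structure, three components have eigenvalues above 1 and are retained, after which the rotated loadings can be screened against the 0.4 cutoff.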
Based on the three criteria above, three common factors were extracted from the questionnaire. Table 2 shows that the cumulative contribution rate of the three extracted common factors is 93.481%, which exceeds 85%, i.e., the extraction of common factors is effective (Huang et al., 2020). The scree plot also flattened out after the first three factors. The original 14 items can thus be integrated into three common factors. According to the principle of factor analysis, the three common factors are uncorrelated with one another, but each common factor is highly correlated with its own original variables.
The three extracted common factors were named according to the items they included. Table 3 shows the correlation coefficients between the common factors and their own original variables. As a result, it is suitable to use reliability of test results (TR), responsiveness of services (RS), and laboratory personnel’s willingness to help (LP) to represent the original variables and evaluate customer satisfaction with laboratory services.
The World Health Organization (WHO) indicates that evaluations of client satisfaction might address various aspects of the services provided: the reliability and consistency of the services, the responsiveness of the services, and the willingness of providers to meet clients’ expectations and needs (WHO, 2000). Our construct meets the WHO recommendations.
The validity, or quality, of the items composing each factor was also analyzed, based on the Comrey and Lee classification, which rates loadings of .71 or higher as excellent; .63 or higher as very good; .55 or higher as good; .45 or higher as reasonable; and .32 or higher as poor (Comrey and Lee, 1992). By this standard, 100% of our items were classified as excellent.
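The Comrey and Lee cutoffs translate directly into a small classification rule. The sketch below is our illustration (the "not interpreted" label for loadings below .32 is our own assumption; the loading values are hypothetical):

```python
def comrey_lee(loading):
    """Quality label for a standardized factor loading (Comrey & Lee, 1992)."""
    for cutoff, label in [(0.71, "excellent"), (0.63, "very good"),
                          (0.55, "good"), (0.45, "reasonable"),
                          (0.32, "poor")]:
        if abs(loading) >= cutoff:
            return label
    return "not interpreted"  # below .32: our assumed label, not from the source

# Hypothetical loadings; any loading at or above .71 is rated excellent.
ratings = [comrey_lee(x) for x in (0.92, 0.75, 0.68, 0.50)]
print(ratings)
```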
Questionnaires must be both reliable and valid for researchers to have confidence in the data collected with the instrument. Reliability, the consistency of the results obtained, concerns the extent to which the instrument yields the same results in repeated trials (Andrew et al., 2001). The most common test of a construct’s internal reliability is Cronbach’s alpha. More recently, however, composite reliability and Jöreskog’s rho have become more pertinent measures of construct reliability in studies that use Structural Equation Modeling (SEM) and Confirmatory Factor Analysis (CFA) in their data analysis. There is a consensus in the literature that a score of 0.7 or higher indicates a reliable construct (Hair, 2010; Nunnally, 1978; Malhotra, 2010). In this study, Cronbach’s alpha, Jöreskog’s rho, and composite reliability are all greater than 0.7, confirming the good reliability of the construct (Table 4).
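Both coefficients have closed-form expressions. The following sketch (an illustration on simulated scores and hypothetical loadings, not the study’s data) computes Cronbach’s alpha from raw item scores and composite reliability from standardized loadings:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def composite_reliability(loadings):
    """Composite reliability (Joreskog's rho) from standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())

rng = np.random.default_rng(1)
# Simulated scores for one 5-item construct (not the study's data).
f = rng.normal(size=(330, 1))
items = f @ np.full((1, 5), 0.8) + 0.5 * rng.normal(size=(330, 5))

alpha = cronbach_alpha(items)
cr = composite_reliability([0.82, 0.79, 0.85, 0.80, 0.77])  # hypothetical loadings
print(f"alpha = {alpha:.3f}, composite reliability = {cr:.3f}")
```

Both values can then be compared against the 0.7 threshold cited above.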
Validity, the extent to which an instrument accurately measures what it was designed to measure, helps a researcher determine whether an instrument addresses its designed purpose (Andrew et al., 2001). Testing construct validity concentrates not only on whether an item loads significantly on the factor it is intended to measure (convergent validity) but also on ensuring that it does not load significantly on other factors (discriminant validity) (Usunier and Stolz, 2016). Our results confirmed both the convergent and discriminant validity of the constructs (Table 4).
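One common way to operationalize these two checks is average variance extracted (AVE > 0.5 for convergent validity) and the Fornell-Larcker criterion (the square root of each construct’s AVE should exceed its correlations with other constructs) for discriminant validity. The sketch below uses hypothetical loadings and a hypothetical inter-construct correlation, not the study’s figures:

```python
import numpy as np

def ave(loadings):
    """Average variance extracted: mean squared standardized loading."""
    lam = np.asarray(loadings, dtype=float)
    return float((lam ** 2).mean())

# Hypothetical standardized loadings for two constructs and their correlation.
loadings = {"TR": [0.84, 0.81, 0.88], "RS": [0.79, 0.83, 0.80]}
corr_tr_rs = 0.55

aves = {name: ave(lam) for name, lam in loadings.items()}
convergent = all(v > 0.5 for v in aves.values())                  # AVE > 0.5
discriminant = all(v ** 0.5 > corr_tr_rs for v in aves.values())  # Fornell-Larcker
print(aves, convergent, discriminant)
```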
CFA was performed to compare three different models: (1) a null model; (2) a one-factor model; and (3) a two-factor model. To determine how well each specified factor model represented the data, goodness-of-fit indices were examined (Table 5). Several indices exist to assess model fit, and they fall into three groups, namely absolute fit indices, incremental fit indices, and parsimony fit indices (Frikha, 2019). The two-factor model was chosen as the best-fitting model based on the cutoff criteria for good model fit recommended by Kline (2015).
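As an illustration of how two widely reported indices are derived from the chi-square statistics of a target model and the null (independence) model, consider the following sketch. All chi-square and degrees-of-freedom values here are hypothetical, not the study’s results:

```python
def rmsea(chi2, df, n):
    """Root mean square error of approximation (absolute fit index)."""
    return (max(chi2 - df, 0.0) / (df * (n - 1))) ** 0.5

def cfi(chi2_m, df_m, chi2_0, df_0):
    """Comparative fit index relative to the null model (incremental fit index)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_0 = max(chi2_0 - df_0, d_m)
    return 1.0 - d_m / d_0 if d_0 > 0 else 1.0

# Hypothetical chi-square values for a target model and a null model,
# with n = 330 respondents as in the study.
fit_rmsea = rmsea(chi2=115.0, df=74, n=330)
fit_cfi = cfi(115.0, 74, 8000.0, 91)
print(f"RMSEA = {fit_rmsea:.3f}, CFI = {fit_cfi:.3f}")
```

Under commonly cited cutoffs (e.g., RMSEA below about .06 and CFI above about .95), these hypothetical values would indicate good fit.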
To assess the structural model, Hair et al. (2014) proposed a five-step assessment procedure: (1) assess the structural model for collinearity issues; (2) assess the significance and relevance of the structural model relationships; (3) assess the level of R2 (coefficient of determination); (4) assess the f2 effect size; (5) assess the q2 effect size. The results of the structural model analysis, shown in Table 6, meet the criteria of this assessment procedure based on Partial Least Squares Structural Equation Modeling (PLS-SEM). Thus, our three hypotheses were confirmed: the three latent variables have a positive influence on customer satisfaction.
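Step 4 of this procedure uses Cohen’s f2, which compares the model’s R2 with and without a given predictor. The sketch below uses hypothetical R2 values, not the study’s results:

```python
def f_squared(r2_included, r2_excluded):
    """Cohen's f2: change in R2 when one predictor is omitted from the model."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Hypothetical R2 values for the model with and without one latent predictor.
f2 = f_squared(0.62, 0.55)
# Conventional benchmarks: 0.02 small, 0.15 medium, 0.35 large effect.
print(f"f2 = {f2:.3f}")
```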
Limitations of the study and future research directions
This study’s limitations must be acknowledged. Since this is the first instrument of its kind to have been fully validated, there is no gold standard against which to evaluate it, so criterion-related validity cannot be established for this instrument. The major inherent limitation is the generalizability of the study’s outcome. Since the study was limited to the University Hospital of Kinshasa rather than the entire hospital market in the Democratic Republic of the Congo (DRC), attempts to generalize the results should be made with caution, as the study did not cut across the entire health system in the DRC. Future research should therefore reproduce the study in other hospitals in order to confirm our findings across the health system. Because we surveyed attending physicians only, we cannot confirm that the developed instrument is reliable or valid for patients. Additional research could develop another instrument for measuring customer satisfaction among patients and other customers who attend the clinical laboratory. Finally, the test-retest reliability of the instrument should be evaluated. Measures of reliability include the stability of an instrument over time; therefore, the short- and long-range stability of this new instrument should be further investigated using the test-retest correlation method.