Descriptive statistics
Table 1 presents the distribution of responses across the five categories for each K6 item. The symptom distributions were positively skewed: the majority of respondents reported no symptoms, while only a few reported severe symptoms. Using the 12/13 cut point, the prevalence of psychological distress in the current sample was 5.3%.
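The prevalence figure follows directly from the summed item scores. The sketch below is illustrative only: it assumes each item is scored 0 to 4 and that a 12/13 cut point classifies totals of 13 or above as distressed. The function name and the toy responses are hypothetical and are not the study data.

```python
def k6_prevalence(responses, cutoff=13):
    """Proportion of respondents whose K6 total score is at or above `cutoff`.

    `responses` is a list of rows, one per respondent, each holding six
    item scores (assumed here to be coded 0-4).
    """
    totals = [sum(row) for row in responses]
    cases = sum(1 for total in totals if total >= cutoff)
    return cases / len(totals)


# Hypothetical toy data (not the study sample).
toy_responses = [
    [0, 0, 0, 0, 0, 0],  # total 0
    [1, 0, 1, 0, 0, 1],  # total 3
    [4, 3, 2, 2, 1, 1],  # total 13, at or above the cut point
    [2, 2, 2, 2, 2, 2],  # total 12, below the cut point
]
prevalence = k6_prevalence(toy_responses)
```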
PLEASE INSERT TABLE 1 AROUND HERE
Examining factor structure
Assessment of dimensionality
The scalability of the K6 is presented in Table 2. The inter-item scalability coefficients (Hij) ranged from 0.47 to 0.68, and the item scalability coefficients (Hi) ranged from 0.57 to 0.59. The scalability coefficient for the whole K6 scale (H) was 0.58 (SE = 0.009). All scalability coefficients were significantly greater than the conventional lower bound of 0.3. These results suggest that the K6 can be considered a strong scale (H ≥ 0.5 under Mokken's conventional criteria). The internal consistency of the six items was also high (Cronbach's alpha = 0.87).
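For readers unfamiliar with these coefficients, the following sketch shows how they can be computed. It relies on the standard definition of Loevinger's H as the ratio of an observed covariance to the maximum covariance attainable given the item marginals, with the maximum obtained by sorting both items. The function names and toy data are hypothetical; the reported values came from dedicated Mokken scaling software, not from this code, and standard errors are not covered here.

```python
from itertools import combinations


def cov(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def max_cov(x, y):
    """Largest covariance attainable with the same marginals.

    Pairing the sorted scores maximises the sum of cross-products.
    """
    return cov(sorted(x), sorted(y))


def scalability(data):
    """Return (Hij, Hi, H) for a respondents-by-items score matrix.

    Every item needs some variance, otherwise a division by zero occurs.
    """
    k = len(data[0])
    cols = [[row[i] for row in data] for i in range(k)]
    c, cmax = {}, {}
    for i, j in combinations(range(k), 2):
        c[i, j] = c[j, i] = cov(cols[i], cols[j])
        cmax[i, j] = cmax[j, i] = max_cov(cols[i], cols[j])
    pairs = list(combinations(range(k), 2))
    h_ij = {p: c[p] / cmax[p] for p in pairs}
    h_i = [
        sum(c[i, j] for j in range(k) if j != i)
        / sum(cmax[i, j] for j in range(k) if j != i)
        for i in range(k)
    ]
    h = sum(c[p] for p in pairs) / sum(cmax[p] for p in pairs)
    return h_ij, h_i, h


# Hypothetical toy data: a perfect Guttman pattern, then one added error row.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
_, _, h_perfect = scalability(guttman)
_, _, h_with_error = scalability(guttman + [[0, 1, 0]])
```

A perfect Guttman pattern reaches the maximum covariance for every pair, so H equals 1; any response pattern that breaks the item ordering pulls H below 1.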
PLEASE INSERT TABLE 2 AROUND HERE
We further explored the dimensionality of the six items using the iterative automated item selection procedure (AISP). The results are presented in Table 3. Following the recommendation of Hemker et al. (1995), we varied the lower bound c from 0 to 0.75 in increments of 0.05. For 0 ≤ c ≤ 0.55, all six items were selected into a single scale. For c = 0.6, two scales emerged, consisting of items 1-3 and items 4-6, respectively. For c = 0.65, items 1 and 3 were unscalable. For c > 0.7, all items were unscalable. In practice, c is usually set at 0.3, because the solutions produced by the AISP are often hard to interpret when c ≥ 0.35 (Sijtsma & van der Ark, 2017). The AISP results therefore supported the unidimensionality of the K6.
PLEASE INSERT TABLE 3 AROUND HERE
Assessment of local independence and monotonicity
We also examined local independence and monotonicity to ensure that the data adequately fit the Mokken scale model. For local independence, no item pair was flagged as locally dependent according to the two indices (W1 and W2) calculated in the conditional association procedure [37]; that is, there was no evidence of a violation of local independence. For monotonicity, no item violated the assumption, and graphical analysis indicated that all items increased monotonically (see Figure 1).
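The logic of a monotonicity check can be illustrated with rest scores. The sketch below is a deliberate simplification and not the procedure used for the reported results: it groups respondents by rest score (total score minus the item in question), compares the mean item score of adjacent groups only, and flags decreases larger than a threshold. The threshold of 0.03, the function names, and the toy data are assumptions made for illustration; full procedures also merge small rest-score groups and test violations for significance.

```python
from collections import defaultdict


def rest_score_means(data, item):
    """Mean score on `item` within each rest-score group, ordered by rest score."""
    groups = defaultdict(list)
    for row in data:
        rest = sum(row) - row[item]  # total score excluding the item itself
        groups[rest].append(row[item])
    return [(rest, sum(v) / len(v)) for rest, v in sorted(groups.items())]


def monotonicity_violations(data, item, min_violation=0.03):
    """Adjacent rest-score groups where the item mean drops by more than `min_violation`."""
    means = rest_score_means(data, item)
    return [
        (low_rest, high_rest)
        for (low_rest, low_mean), (high_rest, high_mean) in zip(means, means[1:])
        if low_mean - high_mean > min_violation
    ]


# Hypothetical toy data (not the study sample).
consistent = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
reversed_pair = [[1, 0], [0, 1]]

no_violations = monotonicity_violations(consistent, item=0)
one_violation = monotonicity_violations(reversed_pair, item=0)
```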
PLEASE INSERT FIGURE 1 AROUND HERE
PLEASE INSERT TABLE 4 AROUND HERE
Reliability
Table 2 shows the reliability estimate obtained with the Molenaar-Sijtsma (MS) method. Table 2 also reports coefficient α = 0.87 (Cronbach, 1951) and λ2 = 0.87 (Guttman, 1945). All estimates were close to 0.9 and thus satisfactory. The corrected item-test correlations were satisfactory for all items, ranging from 0.64 to 0.70.
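As a reminder of how the two classical coefficients are defined, the sketch below computes Cronbach's α and Guttman's λ2 from the item covariance matrix using their textbook formulas. It is illustrative only: the function names and toy data are hypothetical, and the MS estimate is not covered.

```python
import math


def cov(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def alpha_and_lambda2(data):
    """Return (Cronbach's alpha, Guttman's lambda-2) for a respondents-by-items matrix."""
    k = len(data[0])
    cols = [[row[i] for row in data] for i in range(k)]
    totals = [sum(row) for row in data]
    total_var = cov(totals, totals)
    item_var_sum = sum(cov(col, col) for col in cols)
    # Off-diagonal covariances, counted over all ordered pairs i != j.
    off_diag = [cov(cols[i], cols[j]) for i in range(k) for j in range(k) if i != j]
    alpha = k / (k - 1) * (1 - item_var_sum / total_var)
    lambda2 = (
        sum(off_diag) + math.sqrt(k / (k - 1) * sum(c * c for c in off_diag))
    ) / total_var
    return alpha, lambda2


# Hypothetical toy data (not the study sample).
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [4, 3, 4], [1, 0, 1]]
alpha, lambda2 = alpha_and_lambda2(toy)
```

λ2 is never smaller than α, so agreement between the two, as reported here, indicates that the inter-item covariances are fairly homogeneous.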
Gender differences
We also conducted the same analyses separately for the male and female subgroups. A similar pattern of scalability emerged in both samples. The K6 therefore assesses psychological distress in a similar way and with similar strength in both genders.
Examining measurement invariance
Following the procedure proposed by Choi et al. [32], we conducted a differential item functioning (DIF) analysis under the hybrid iterative LR/IRT framework using the "lordif" package in R. Three nested ordinal logistic models were fitted for each item, with the item response as the outcome and the latent trait score, group membership, and their interaction as candidate predictors. Model 1 is the baseline model, including only the latent trait score as a predictor. Model 2 is the uniform DIF model, including the latent trait score and group membership. Model 3 is the non-uniform DIF model, including the latent trait score, group membership, and their interaction. DIF detection was based on the likelihood ratio (LR) χ² test at an α level of 0.01. A significant difference in log-likelihood between Model 2 and Model 1 indicates uniform DIF, while a significant difference between Model 3 and Model 2 indicates non-uniform DIF. DIF magnitude was evaluated with McFadden's pseudo-R², with values below 0.13 considered negligible, values between 0.13 and 0.26 moderate, and values above 0.26 large [33]. Within this framework, the latent trait score was estimated by fitting the graded response model (GRM), the package default.
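The decision rule described above can be sketched from the model log-likelihoods alone. The code below is a minimal illustration, not the lordif implementation: it assumes two groups, so that each nested comparison adds one parameter (df = 1), it takes the log-likelihoods as given rather than fitting the ordinal models, and it measures magnitude as the change in McFadden's pseudo-R² between nested models. All function names and numeric inputs are hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R2 relative to an intercept-only (null) model."""
    return 1.0 - ll_model / ll_null


def classify_magnitude(r2_change):
    """Apply the thresholds quoted in the text (0.13 and 0.26)."""
    if r2_change < 0.13:
        return "negligible"
    if r2_change <= 0.26:
        return "moderate"
    return "large"


def dif_summary(ll_null, ll1, ll2, ll3, alpha=0.01):
    """Flag uniform and non-uniform DIF from four log-likelihoods (df = 1 assumed)."""
    lr_uniform = max(0.0, 2.0 * (ll2 - ll1))     # Model 2 vs Model 1
    lr_nonuniform = max(0.0, 2.0 * (ll3 - ll2))  # Model 3 vs Model 2
    r2_change_uniform = mcfadden_r2(ll2, ll_null) - mcfadden_r2(ll1, ll_null)
    return {
        "uniform_p": chi2_sf_df1(lr_uniform),
        "nonuniform_p": chi2_sf_df1(lr_nonuniform),
        "uniform_dif": chi2_sf_df1(lr_uniform) < alpha,
        "nonuniform_dif": chi2_sf_df1(lr_nonuniform) < alpha,
        "uniform_magnitude": classify_magnitude(r2_change_uniform),
    }


# Hypothetical log-likelihoods (not taken from the study).
result = dif_summary(ll_null=-1000.0, ll1=-800.0, ll2=-795.0, ll3=-794.5)
```

With these made-up inputs an item would be flagged for uniform DIF by the LR test while its pseudo-R² change remains in the negligible range, which is the kind of pattern that large samples tend to produce.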
Figure 2 illustrates the latent trait distributions of males and females. Males had lower mean scores than females, but the two distributions overlapped broadly. Table 5 presents the main results of the DIF analysis. According to the LR χ² test, Item 4 and Item 5 were flagged for uniform DIF, but no item was flagged for non-uniform DIF. Diagnostic plots for the two DIF items are displayed in Figure 3 and Figure 4. Further examination of these two items revealed that, at the same latent trait score, females consistently endorsed higher frequency categories than males. For both items, the lower-left graph shows that the uniform DIF was mainly driven by the fifth category threshold value (3.31 vs. 2.9 and 2.57 vs. 2.45). However, McFadden's pseudo-R² statistics (no more than 0.0011) indicated that the magnitude of DIF was very small for each item. Figure 5 shows the impact of all items and of the DIF items on the whole scale. The left panel shows the impact of all six items, indicating a negligible difference across gender. The right panel shows the curves for the two DIF items, indicating that females scored slightly higher when group-specific parameter estimates were used.
PLEASE INSERT TABLE 5 AROUND HERE
PLEASE INSERT FIGURE 2 AROUND HERE
PLEASE INSERT FIGURE 3 AROUND HERE
PLEASE INSERT FIGURE 4 AROUND HERE
PLEASE INSERT FIGURE 5 AROUND HERE