Characteristics of the patients
A total of 3354 patients were included in the study. Table 1 showed the distribution of K10 scores. The mean K10 score was 3.34 (standard deviation 5.28) and the median was 1.00; the scale's score range was 0-40. The sample comprised 1885 females (56.2%) and 1469 males (43.8%), with a mean age of 56.0 years (range 18-91 years). Mann-Whitney U tests showed that K10 total scores differed significantly by gender and by age. Women had a higher mean psychological distress score than men, and the elderly had a higher mean psychological distress score than young people. At the item level, women had higher mean scores than men on every item except item 10. The elderly had higher mean scores than young people on items 5 and 6, and there was no significant difference between the elderly and the young on the remaining items.
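The group comparisons above rely on the Mann-Whitney U statistic. The following is a minimal sketch of how U can be computed from rank sums with tied scores sharing the mean rank. The function names and the scores are hypothetical and are not taken from the study; a full analysis would also need the p-value, which statistical packages provide.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x relative to sample y (rank-sum form)."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical K10 total scores, not data from the study.
women = [5, 3, 8, 1, 6]
men = [0, 2, 1, 4]
u_women = mann_whitney_u(women, men)
u_men = mann_whitney_u(men, women)
print(u_women, u_men)
```

The two U values always sum to the product of the two sample sizes, which is a convenient sanity check.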
Unidimensionality
Principal component analysis showed that the residual explained 60.22% (greater than 50%) of the original variance. Moreover, the ratio of the eigenvalue of the first factor to that of the second factor was 7.642, far greater than 3. The infit MNSQ value of every item (see Table 2) was acceptable (0.928-1.072). Taken together, these results provided no evidence that the K10 was multidimensional.
The goodness of fit of the Rasch model
In the Rasch model, infit MNSQ values are commonly used to evaluate the goodness of fit of the items. Following Wright and Linacre (1994), an item was considered to fit poorly when its infit MNSQ value was greater than 1.5 or less than 0.5. Rasch model analysis showed that all 10 items of the K10 scale fitted well (infit MNSQ values, 0.928-1.072), as shown in Table 2.
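As an illustration of the fit criterion, the sketch below computes an information-weighted (infit) mean square as the sum of squared residuals divided by the sum of model variances, and then applies the 0.5-1.5 range. It assumes that the model-expected scores and variances are already available from a fitted Rasch model; the function names and the numbers are hypothetical.

```python
def infit_mnsq(observed, expected, variance):
    """Infit mean square: sum of squared residuals over sum of model variances."""
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variance)


def fit_is_acceptable(mnsq, lower=0.5, upper=1.5):
    """Apply the 0.5-1.5 range used in the text to flag poor fit."""
    return lower <= mnsq <= upper


# Hypothetical responses to one item, with model expectations and variances.
observed = [0, 2, 1, 0]
expected = [0.5, 1.0, 1.5, 0.5]
variance = [0.5, 0.5, 0.5, 0.25]

value = infit_mnsq(observed, expected, variance)
print(value, fit_is_acceptable(value))
```

Values near 1.0 indicate that the observed variation matches what the model predicts.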
Threshold order and category probability curves
There was no evidence of disordered thresholds in the category probability curves, and the category thresholds increased in an orderly manner (as shown in Figure 1 and Table 2). All items had five response categories, so each item had four thresholds. Items 1, 3, 5, and 10 of the K10 showed slightly weak discrimination: P2 (a little of the time) was poorly differentiated from P1 (none of the time) and P3 (some of the time). Because these items performed less well, they could be considered for deletion in subsequent screening of indicators.
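To make the link between thresholds and category probability curves concrete, the sketch below computes the five category probabilities of one item at a given ability level. It assumes a partial credit parameterization of the polytomous Rasch model, which the text does not specify, and the threshold values are hypothetical.

```python
import math


def category_probabilities(theta, thresholds):
    """Category probabilities for one item under a partial credit model.

    Category k has log-weight sum_{j<=k} (theta - tau_j); category 0 has 0.
    """
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_are_ordered(thresholds):
    """True when every threshold is strictly larger than the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds (logits) for an item with five response categories.
thresholds = [-1.0, 0.0, 1.0, 2.0]
probs = category_probabilities(0.5, thresholds)
print([round(p, 3) for p in probs], thresholds_are_ordered(thresholds))
```

At an ability equal to a threshold, the two adjacent categories are equally likely, so ordered thresholds mean that each category is the most probable one over some range of ability.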
Reliability
The Cronbach's alpha coefficient of the K10 was 0.916 (0.907, 0.924). Removing item 5 would increase Cronbach's alpha to 0.919 (0.911, 0.927), while deleting any other item would decrease it (see Table 2).
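The alpha-if-item-deleted comparison can be sketched as follows. This is a minimal illustration using the standard formula for Cronbach's alpha with sample variances; the response matrix is hypothetical and much smaller than the study data, and the function names are not from the original analysis.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Recompute alpha with each item removed in turn."""
    k = len(rows[0])
    return [cronbach_alpha([row[:i] + row[i + 1:] for row in rows])
            for i in range(k)]


# Hypothetical scores: five respondents, three items (item 3 is noisier).
scores = [[0, 0, 3], [1, 1, 0], [2, 2, 2], [3, 3, 1], [4, 4, 4]]
alpha = cronbach_alpha(scores)
dropped = alpha_if_item_deleted(scores)
print(round(alpha, 3), [round(a, 3) for a in dropped])
```

An item whose removal raises alpha, as item 5 does in the text, contributes less consistent information than the remaining items.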
Differential item functioning (DIF)
The items of the K10 were assessed for DIF across gender and age (Table 3). We used the likelihood ratio (LR) chi-square statistic to test for DIF and set the detection level α to 0.01. Significant DIF was found on item 3 when assessing age. However, this DIF was no longer significant when α was adjusted by the Bonferroni method (0.01/10 = 0.001). No DIF was found between genders.
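The effect of the Bonferroni adjustment on the DIF decision can be sketched as below. The p-values are hypothetical placeholders chosen only to be consistent with the pattern described in the text (item 3 significant at 0.01 but not at 0.001); they are not the values in Table 3.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


def flag_dif(p_values, alpha):
    """Return the items whose LR chi-square p-value falls below alpha."""
    return [item for item, p in p_values.items() if p < alpha]


# Hypothetical p-values for age DIF on the 10 items (not from Table 3).
p_values = {1: 0.40, 2: 0.22, 3: 0.004, 4: 0.61, 5: 0.08,
            6: 0.35, 7: 0.73, 8: 0.19, 9: 0.52, 10: 0.27}

adjusted = bonferroni_alpha(0.01, len(p_values))
print(flag_dif(p_values, 0.01))      # items flagged at the unadjusted level
print(flag_dif(p_values, adjusted))  # items flagged after adjustment
```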
The box plot on the left of Figure 2a showed the dispersion of the differences in K10 scores between ages. The shaded box in the box plot was the interquartile range, representing the middle 50% of the differences (bounded by the bottom and top of the shaded box), which ranged from -0.002 logit to +0.003 logit, with a median of about 0.001 logit. The left of Figure 2b showed a box plot of the differences between genders. The interquartile range, representing the middle 50% of the differences (bounded by the bottom and top of the shaded box), ranged from -0.002 to 0.003, with a median of about 0.001.
In the graph on the right, the same difference scores were plotted against the initial scores ignoring DIF ("initial theta"), separately for younger and older individuals. Guidelines were placed at 0.0 (solid line), i.e., no difference, and at the mean of the differences (dotted line). The positive values on the left side of the graph showed that, for older individuals, the score would decrease if DIF was taken into account (i.e., the score ignoring DIF minus the score accounting for DIF was greater than 0, so the score accounting for DIF was lower than the score ignoring DIF); the opposite was true for younger individuals. Figure 2b showed that the middle 50% of the differences for gender DIF ranged from -0.010 to 0.008. The solid line (no difference) coincided with the dashed line (the mean of the differences), indicating that DIF did not differ between genders. The influence of DIF on the K10 scores of males and females could be ignored.
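The quantities summarized in the box plots can be sketched as follows: each difference score is the initial score ignoring DIF minus the score accounting for DIF, and the box spans the first to third quartiles of these differences. The function name, the quartile method, and the scores are assumptions for illustration and are not the study's estimates.

```python
from statistics import quantiles


def dif_difference_summary(initial_theta, adjusted_theta):
    """Quartiles and mean of (initial score - DIF-adjusted score)."""
    diffs = [i - a for i, a in zip(initial_theta, adjusted_theta)]
    q1, q2, q3 = quantiles(diffs, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "mean": sum(diffs) / len(diffs)}


# Hypothetical person scores in logits, ignoring and accounting for DIF.
initial = [-2.80, -2.40, -2.10, -1.50, -0.60]
adjusted = [-2.802, -2.401, -2.099, -1.503, -0.598]

summary = dif_difference_summary(initial, adjusted)
print({name: round(value, 4) for name, value in summary.items()})
```

A positive difference means that accounting for DIF lowers the person's score, which matches the reading of the right-hand graph described above.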
Person-item map
In the person-item map, the histogram of the person ability estimates θ̂ was shown at the top. The black dots on the lines at the bottom of Figure 3 were the item difficulty thresholds. On the K10 scale, the individuals' ability estimates θ̂ were between -3.00 logit and -2.00 logit, and individuals were more inclined to choose the low-scoring response categories "none of the time" and "a little of the time". In other words, the participants' ability estimates lay below the difficulty of most items, which was generally around 1.00 logit. Figure 3 showed that the difficulty of items 1, 3, and 5 was around 0.00 logit, while the difficulty of the other items was generally around 1.00 logit, so the overall difficulty of the K10 was relatively high for this sample.