3.1 Sample characteristics
Of the 20,226 questionnaires received, 798 had missing responses on one or more SF-36 items and were excluded, leaving 19,428 respondents in the analysis. The mean age of the respondents was 14.78 years (standard deviation [SD] = 1.77), and 49.4% (9,595) were boys. On both the SF-36 and the SF-12, the physical functioning (PF) domain had the highest mean score and the role emotional (RE) domain the lowest. The largest mean difference between the two scales was in the social functioning (SF) domain (11.81 points). Among corresponding domains of the two scales, the correlation was strongest for RE (r = 0.923) and weakest for vitality (VT; r = 0.645); that is, the SF-12 domains shared between 41.6% (0.645²) and 85.2% (0.923²) of their variance with the corresponding SF-36 domains (Table 1).
Table 1 Scores of SF-36 and SF-12 among adolescents (n = 19,428)
| Domain | SF-36 (mean ± SD) | SF-12 (mean ± SD) | Mean difference (SF-36 minus SF-12) | Correlation coefficient |
|---|---|---|---|---|
| PF*** | 89.10 ± 14.39 | 91.64 ± 16.85 | -2.54 | 0.800 |
| RP*** | 68.86 ± 34.28 | 68.08 ± 39.44 | 0.78 | 0.897 |
| BP*** | 79.97 ± 19.77 | 85.09 ± 19.25 | -5.12 | 0.876 |
| GH*** | 70.41 ± 19.53 | 62.72 ± 26.39 | 7.69 | 0.670 |
| VT*** | 65.04 ± 17.19 | 62.11 ± 25.90 | 2.93 | 0.645 |
| SF*** | 77.98 ± 19.07 | 66.17 ± 23.17 | 11.81 | 0.875 |
| RE*** | 54.82 ± 37.45 | 52.14 ± 40.44 | 2.68 | 0.923 |
| MH*** | 68.51 ± 17.18 | 64.86 ± 18.83 | 3.65 | 0.799 |
| PCS*** | 75.00 ± 11.10 | 70.52 ± 13.65 | 4.48 | 0.812 |
| MCS*** | 68.55 ± 14.18 | 61.32 ± 7.17 | 7.23 | 0.779 |
Abbreviations: PF, physical functioning; RP, role physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role emotional; MH, mental health; PCS, physical component summary; MCS, mental component summary. *p < 0.05; **p < 0.01; ***p < 0.00
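The mean differences and the shared variance implied by each correlation in Table 1 can be recomputed directly from the tabulated values. The sketch below assumes that the reported coefficients are Pearson correlations, so that r² is the proportion of variance an SF-12 domain shares with its SF-36 counterpart; the dictionary `TABLE1` and the helper `compare_domains` are hypothetical names used only for illustration.

```python
# Minimal sketch: recompute Table 1 mean differences and shared variance (r^2).
# Assumption: the correlation coefficients in Table 1 are Pearson's r.

# (SF-36 mean, SF-12 mean, correlation) transcribed from Table 1
TABLE1 = {
    "PF": (89.10, 91.64, 0.800),
    "RP": (68.86, 68.08, 0.897),
    "BP": (79.97, 85.09, 0.876),
    "GH": (70.41, 62.72, 0.670),
    "VT": (65.04, 62.11, 0.645),
    "SF": (77.98, 66.17, 0.875),
    "RE": (54.82, 52.14, 0.923),
    "MH": (68.51, 64.86, 0.799),
    "PCS": (75.00, 70.52, 0.812),
    "MCS": (68.55, 61.32, 0.779),
}


def compare_domains(table):
    """Return the mean difference (SF-36 minus SF-12) and r^2 for each domain."""
    out = {}
    for domain, (mean36, mean12, r) in table.items():
        out[domain] = {
            "mean_diff": round(mean36 - mean12, 2),
            "shared_variance": round(r ** 2, 3),
        }
    return out


if __name__ == "__main__":
    summary = compare_domains(TABLE1)
    largest = max(summary, key=lambda d: abs(summary[d]["mean_diff"]))
    weakest = min(summary, key=lambda d: summary[d]["shared_variance"])
    strongest = max(summary, key=lambda d: summary[d]["shared_variance"])
    print("largest mean difference:", largest, summary[largest]["mean_diff"])
    print("lowest r^2:", weakest, summary[weakest]["shared_variance"])
    print("highest r^2:", strongest, summary[strongest]["shared_variance"])
```

Run as a script, it reports SF as the domain with the largest mean difference, VT as the domain with the lowest shared variance, and RE as the domain with the highest.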
3.2 Psychometric properties in classical test theory
3.2.1 Factor analysis by EFA
For the SF-36, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was 0.884, indicating that the data were well suited to factor analysis. The communalities of all items exceeded 0.5. Factors with eigenvalues greater than 1 were extracted and rotated with the varimax method; the eight resulting components explained 69.21% of the total variance. The rotated loadings of the extracted factors are presented in Table 2. EFA did not fully support the hypothesized eight-domain structure (PF, RP, BP, GH, VT, SF, RE, and MH). The PF items split across two factors, and the BP, SF, VT, and MH items did not form separate factors: SF1 loaded together with the BP items, whereas the VT, MH, and SF2 items were distributed across two mixed factors, reflecting strong correlations between BP and SF and between VT and MH.
Table 2 Results of factor analysis of SF-36 among adolescents (n = 9,741)
| Item | 1 (PF) | 2 (PF) | 3 (RP) | 4 (BP/SF) | 5 (GH) | 6 (SF/VT/MH) | 7 (MH/VT) | 8 (RE) |
|---|---|---|---|---|---|---|---|---|
| PF01 | - | 0.794 | - | - | - | - | - | - |
| PF02 | - | 0.668 | - | - | - | - | - | - |
| PF03 | - | 0.600 | - | - | - | - | - | - |
| PF04 | 0.708 | - | - | - | - | - | - | - |
| PF05 | 0.844 | - | - | - | - | - | - | - |
| PF06 | 0.631 | - | - | - | - | - | - | - |
| PF07 | 0.342 | - | - | - | - | - | - | - |
| PF08 | 0.618 | - | - | - | - | - | - | - |
| PF09 | 0.593 | - | - | - | - | - | - | - |
| PF10 | 0.694 | - | - | - | - | - | - | - |
| RP1 | - | - | 0.689 | - | - | - | - | - |
| RP2 | - | - | 0.709 | - | - | - | - | - |
| RP3 | - | - | 0.726 | - | - | - | - | - |
| RP4 | - | - | 0.692 | - | - | - | - | - |
| BP1 | - | - | - | 0.766 | - | - | - | - |
| BP2 | - | - | - | 0.774 | - | - | - | - |
| GH1 | - | - | - | - | 0.625 | - | - | - |
| GH2 | - | - | - | - | 0.654 | - | - | - |
| GH3 | - | - | - | - | 0.723 | - | - | - |
| GH4 | - | - | - | - | 0.577 | - | - | - |
| GH5 | - | - | - | - | 0.751 | - | - | - |
| VT1 | - | - | - | - | - | - | 0.775 | - |
| VT2 | - | - | - | - | - | - | 0.660 | - |
| VT3 | - | - | - | - | - | 0.701 | - | - |
| VT4 | - | - | - | - | - | 0.746 | - | - |
| SF1 | - | - | - | 0.570 | - | - | - | - |
| SF2 | - | - | - | - | - | 0.555 | - | - |
| RE1 | - | - | - | - | - | - | - | 0.690 |
| RE2 | - | - | - | - | - | - | - | 0.725 |
| RE3 | - | - | - | - | - | - | - | 0.688 |
| MH1 | - | - | - | - | - | 0.660 | - | - |
| MH2 | - | - | - | - | - | 0.783 | - | - |
| MH3 | - | - | - | - | - | - | 0.710 | - |
| MH4 | - | - | - | - | - | 0.731 | - | - |
| MH5 | - | - | - | - | - | - | 0.706 | - |
For the SF-12, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.732, and eight components were extracted, together explaining 63.50% of the total variance. Because of strong correlations between MH and SF and between VT and MH, the SF, VT, and MH items did not separate into distinct domains: VT2 loaded together with MH3, and SF2 with MH4. The two RE items also loaded on separate factors (Table 3).
Table 3 Results of factor analysis of SF-12 among adolescents (n = 9,741)
| Item | 1 (PF) | 2 (RP) | 3 (BP) | 4 (GH) | 5 (VT/MH) | 6 (SF/MH) | 7 (RE) | 8 (RE) |
|---|---|---|---|---|---|---|---|---|
| PF02 | 0.808 | - | - | - | - | - | - | - |
| PF04 | 0.829 | - | - | - | - | - | - | - |
| RP2 | - | 0.742 | - | - | - | - | - | - |
| RP3 | - | 0.872 | - | - | - | - | - | - |
| BP2 | - | - | 0.951 | - | - | - | - | - |
| GH1 | - | - | - | 0.949 | - | - | - | - |
| VT2 | - | - | - | - | 0.696 | - | - | - |
| SF2 | - | - | - | - | - | 0.766 | - | - |
| RE2 | - | - | - | - | - | - | 0.865 | - |
| RE3 | - | - | - | - | - | - | - | 0.929 |
| MH3 | - | - | - | - | 0.872 | - | - | - |
| MH4 | - | - | - | - | - | 0.855 | - | - |
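The EFA steps described in this section (a KMO check, extraction of principal components with eigenvalues greater than 1, varimax rotation, communalities, and variance explained) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the software used in the study: it applies raw varimax without Kaiser normalization, which many statistical packages apply by default, so rotated loadings may differ slightly from package output. The function names `kmo`, `varimax`, and `pca_efa` are hypothetical, and the simulated data in the usage block are not study data.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for a correlation matrix R."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    A = -P / np.outer(d, d)                      # anti-image (partial) correlations
    off = ~np.eye(R.shape[0], dtype=bool)        # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(A[off] ** 2)
    return r2 / (r2 + a2)


def varimax(L, max_iter=100, tol=1e-6):
    """Raw varimax rotation (no Kaiser normalization) of a loading matrix L (items x factors)."""
    p, k = L.shape
    T = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # The SVD of this matrix yields the next orthogonal rotation matrix
        target = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(target)
        T = u @ vt
        crit_old, crit = crit, np.sum(s)
        if crit_old != 0 and crit < crit_old * (1 + tol):
            break
    return L @ T


def pca_efa(X, eigen_cutoff=1.0):
    """Principal-component extraction with the eigenvalue > 1 rule, then varimax rotation."""
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]             # sort eigenvalues in descending order
    eigval, eigvec = eigval[order], eigvec[:, order]
    k = int(np.sum(eigval > eigen_cutoff))       # Kaiser criterion
    loadings = eigvec[:, :k] * np.sqrt(eigval[:k])
    rotated = varimax(loadings)
    return {
        "kmo": kmo(R),
        "n_factors": k,
        "explained": eigval[:k].sum() / R.shape[0],    # share of total variance
        "communalities": np.sum(rotated ** 2, axis=1),  # unchanged by orthogonal rotation
        "loadings": rotated,
    }


if __name__ == "__main__":
    # Simulated data (not study data): two latent factors, three items each
    rng = np.random.default_rng(0)
    factors = rng.standard_normal((2000, 2))
    weights = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
    X = factors @ weights.T + 0.6 * rng.standard_normal((2000, 6))
    result = pca_efa(X)
    print("KMO:", round(result["kmo"], 3))
    print("factors retained:", result["n_factors"])
    print("variance explained:", round(result["explained"], 3))
    print(np.round(result["loadings"], 2))
```

Because varimax is an orthogonal rotation, the communalities and the total variance explained are the same before and after rotation; only the distribution of loadings across factors changes.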
3.2.2 Factor analysis by CFA
We tested two conceptual models by CFA. Conceptual Model I assumed that PCS was associated with PF, RP, BP, and GH, whereas MCS was associated with VT, SF, RE, and MH. Conceptual Model II allowed each of the 8 domains to be associated with both PCS and MCS. For both the SF-36 and the SF-12, the fit indices favored Conceptual Model I over Conceptual Model II (SF-36: RMSEA 0.061 vs. 0.075, CFI 0.94 vs. 0.70; SF-12: RMSEA 0.060 vs. 0.080, CFI 0.969 vs. 0.769; Table 4). The Model I structure has been widely used in studies in China. We therefore adopted the Model I structure to define the two summary scales (PCS and MCS) of both the SF-36 and the SF-12. Standardized parameter estimates for each CFA path are shown in Figure 1.
Table 4 Two summary scales confirmed by CFA in SF-36 and SF-12 among adolescents
Standardized loadings:

| Domain | SF-36 Model I PCS | SF-36 Model I MCS | SF-36 Model II PCS | SF-36 Model II MCS | SF-12 Model I PCS | SF-12 Model I MCS | SF-12 Model II PCS | SF-12 Model II MCS |
|---|---|---|---|---|---|---|---|---|
| PF | 0.363 | - | 0.244 | 0.394 | 0.652 | - | 0.559 | 0.179 |
| RP | 0.583 | - | 0.358 | 0.465 | 0.705 | - | 0.778 | 0.144 |
| BP | 0.663 | - | 0.656 | 0.194 | 0.572 | - | 0.697 | 0.231 |
| GH | 0.737 | - | 0.758 | 0.247 | 0.566 | - | 0.564 | 0.243 |
| VT | - | 0.909 | 0.357 | 0.839 | - | 0.334 | 0.158 | 0.579 |
| SF | - | 1.119 | 0.470 | 1.041 | - | 0.342 | 0.102 | 0.645 |
| RE | - | 0.429 | 0.280 | 0.406 | - | 0.707 | 0.210 | 0.748 |
| MH | - | 0.915 | 0.450 | 0.726 | - | 0.932 | 0.098 | 0.892 |

Fit indices for 2-factor confirmatory factor analysis (n = 9,741):

| Fit index | SF-36 Model I | SF-36 Model II | SF-12 Model I | SF-12 Model II |
|---|---|---|---|---|
| χ² statistic (df) | 6948.000 (551) | 20771.000 (551) | 3089.478 (49) | 5769.000 (49) |
| RMSEA (90% CI) | 0.061 (0.060, 0.063) | 0.075 (0.074, 0.075) | 0.060 (0.058, 0.062) | 0.080 (0.078, 0.082) |
| CFI | 0.94 | 0.70 | 0.969 | 0.769 |
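For readers comparing the two models, the sketch below gives the textbook point estimates of RMSEA and CFI, assuming maximum-likelihood chi-square statistics. It is a sketch under that assumption only: Table 4 does not report the baseline-model chi-square needed for the CFI, some programs use N rather than N - 1 in the RMSEA denominator, and programs using robust or scaled estimators report adjusted statistics, so these formulas are not expected to reproduce the Table 4 values. The function names `rmsea` and `cfi` are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of the root mean square error of approximation (ML chi-square)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index of a model relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)   # max of (chi2_b - df_b), d_model, and 0
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not the values in Table 4)
    print(round(rmsea(150.0, 50, 101), 4))
    print(round(cfi(150.0, 50, 1050.0, 60), 4))
```

Both indices reward models whose chi-square is close to its degrees of freedom: RMSEA expresses the excess per degree of freedom and per observation, and CFI expresses it relative to the excess of a model with no correlations among indicators.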
3.2.3 Validity and reliability of domains of SF-36 and SF-12
As mentioned above, the standardized CFA parameter estimates from Model I were used as factor loadings. CR and AVE were calculated according to Formulas 4 and 5.
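Formulas 4 and 5 appear elsewhere in the paper and are not reproduced in this section. The sketch below assumes they are the conventional definitions computed from standardized loadings λ, CR = (Σλ)² / [(Σλ)² + Σ(1 - λ²)] and AVE = Σλ² / n; the function names are hypothetical.

```python
from typing import Sequence


def construct_reliability(loadings: Sequence[float]) -> float:
    """CR = (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    total = sum(loadings)
    error = sum(1.0 - l ** 2 for l in loadings)  # error variance of each standardized indicator
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


if __name__ == "__main__":
    # Hypothetical loadings for three indicators (not study data)
    lam = [0.7, 0.7, 0.7]
    print(round(construct_reliability(lam), 3), round(average_variance_extracted(lam), 3))
```

For three indicators that each load 0.7, CR is about 0.742 and AVE is 0.49, which shows how CR can pass a 0.6 threshold while AVE stays below 0.5.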
Except for SF, the multi-item scales had generally acceptable internal reliability (Table 5). The low internal reliability of SF in the SF-36 (Cronbach's α = 0.211) may reflect inconsistent interpretation of its only two items, whose wording may be ambiguous or difficult for adolescents to parse (“To what extent has your physical health or emotional problems interfered with…” and “How much of the time has your physical health or emotional problems interfered with…”). Consistent with related studies, the internal reliability of MH in the SF-12 was also low (α = 0.396). In every domain where α could be computed for both instruments, internal reliability was higher for the SF-36 than for the SF-12, as expected given that reliability increases with the number of items. In the SF-36, PF, RP, BP, GH, and PCS had good construct reliability (CR > 0.6), whereas in the SF-12 only RP, PCS, and MCS exceeded this threshold.
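Cronbach's α and the Spearman-Brown prophecy formula make the link between scale length and reliability explicit. The sketch below uses the standard formulas; the function names are hypothetical, and the Spearman-Brown projection assumes parallel items, which the SF-36 items are only approximately.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the summed scale score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def spearman_brown(rho, length_factor):
    """Projected reliability when a scale is lengthened (factor > 1) or shortened (factor < 1)."""
    return length_factor * rho / (1.0 + (length_factor - 1.0) * rho)


if __name__ == "__main__":
    # SF-36 PF alpha from Table 5, projected from 10 items down to 2 (factor 0.2)
    print(round(spearman_brown(0.841, 0.2), 3))
```

Applied to the SF-36 PF α of 0.841 with the scale shortened from 10 items to the 2 retained in the SF-12, the formula projects an α of about 0.51, in the same range as the observed SF-12 PF value of 0.564.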
Criterion validity was assessed against the self-rated health item (“In general, would you say your health is….”). Criterion validity was low for most domains of both scales (the main exception being GH in the SF-36, at 0.670) and particularly low for PF, RP, and SF, suggesting that physical health was only weakly correlated with self-perceived health. For PCS, the criterion validity of the SF-12 was much higher than that of the SF-36 (0.589 vs. 0.350). In the other corresponding domains, the SF-36 values were higher, but the differences were small, except for MH (0.194) and MCS (0.176).
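PF, RP, and PCS had generally acceptable convergent validity in both the SF-36 and the SF-12 (AVE 0.371 to 0.432), whereas BP did so only in the SF-36 (0.528 vs. 0.222 in the SF-12). AVE differed little between the two instruments for PF, RP, SF, RE, MH, and PCS (absolute differences of 0.042 or less); the SF-12 values were clearly lower for BP, GH, and VT (by 0.185 to 0.306) and higher for MCS (0.399 vs. 0.299) (Table 5).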
Table 5 Validity and reliability of SF-36 and SF-12 in classical test theory
| Domain | SF-36 Cronbach's α | SF-36 CR | SF-36 criterion validity | SF-36 AVE | SF-12 Cronbach's α | SF-12 CR | SF-12 criterion validity | SF-12 AVE | Difference α | Difference CR | Difference criterion validity | Difference AVE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PF | 0.841 | 0.858 | 0.085 | 0.380 | 0.564 | 0.540 | 0.055 | 0.371 | 0.277 | 0.318 | 0.030 | 0.009 |
| RP | 0.727 | 0.728 | 0.173 | 0.405 | 0.605 | 0.602 | 0.171 | 0.432 | 0.122 | 0.126 | 0.002 | -0.027 |
| BP | 0.670 | 0.690 | 0.283 | 0.528 | - | 0.222 | 0.227 | 0.222 | - | 0.468 | 0.056 | 0.306 |
| GH | 0.766 | 0.781 | 0.670 | 0.420 | - | 0.134 | - | 0.134 | - | 0.647 | - | 0.286 |
| VT | 0.569 | 0.577 | 0.309 | 0.302 | - | 0.117 | 0.252 | 0.117 | - | 0.460 | 0.057 | 0.185 |
| SF | 0.211 | 0.329 | 0.113 | 0.146 | - | 0.112 | 0.027 | 0.112 | - | 0.217 | 0.086 | 0.034 |
| RE | 0.626 | 0.489 | 0.203 | 0.371 | 0.485 | 0.491 | 0.199 | 0.329 | 0.141 | -0.002 | 0.004 | 0.042 |
| MH | 0.625 | 0.426 | 0.243 | 0.316 | 0.396 | 0.360 | 0.049 | 0.303 | 0.229 | 0.066 | 0.194 | 0.013 |
| PCS | 0.562 | 0.935 | 0.350 | 0.430 | 0.422 | 0.719 | 0.589 | 0.392 | 0.14 | 0.216 | -0.239 | 0.038 |
| MCS | 0.609 | 0.418 | 0.476 | 0.299 | 0.429 | 0.690 | 0.300 | 0.399 | 0.18 | -0.272 | 0.176 | -0.1 |

Reliability indices: Cronbach's α and Construct Reliability (CR); validity indices: criterion validity and Average Variance Extracted (AVE). Difference = SF-36 minus SF-12.
3.3 Psychometric properties in item response theory
The item parameters and information estimated with Samejima's graded response model are shown in Table 6. Item discriminations varied widely, from 0.45 to 2.73. For every item, the difficulty (threshold) parameters increased monotonically from the lowest to the highest response category, as the model assumes. The average amount of information per item ranged from 0.05 to 1.02.
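The following sketch, assuming the logistic form of Samejima's graded response model without the 1.7 scaling constant, computes category probabilities and item information from a discrimination a and ordered thresholds b. Table 6 does not state how the "average amount of information" was obtained; averaging over an evenly spaced θ grid from -4 to 4, as done here, is a hypothetical choice, so the returned values are not expected to match Table 6. All function names are hypothetical.

```python
import numpy as np


def _boundaries(theta, a, b):
    """Cumulative probabilities P*(X >= k), padded with P*_0 = 1 and P*_{m+1} = 0."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    return np.hstack([ones, star, zeros])


def grm_category_probs(theta, a, b):
    """P(X = k | theta) for k = 0..m under the graded response model."""
    star = _boundaries(theta, a, b)
    return star[:, :-1] - star[:, 1:]


def grm_item_information(theta, a, b):
    """I(theta) = a^2 * sum_k (W_k - W_{k+1})^2 / P_k, with W = P*(1 - P*)."""
    star = _boundaries(theta, a, b)
    w = star * (1.0 - star)
    probs = star[:, :-1] - star[:, 1:]
    num = (w[:, :-1] - w[:, 1:]) ** 2
    # Skip categories whose probability underflows to zero
    ratio = np.divide(num, probs, out=np.zeros_like(num), where=probs > 0)
    return a ** 2 * ratio.sum(axis=1)


def average_information(a, b, lo=-4.0, hi=4.0, n_points=801):
    """Hypothetical summary: mean of I(theta) over an evenly spaced theta grid."""
    theta = np.linspace(lo, hi, n_points)
    return float(grm_item_information(theta, a, b).mean())


if __name__ == "__main__":
    # Parameters transcribed from Table 6 (SF-36 PF01 and BP1)
    print(round(average_information(2.73, [-1.43, 0.21]), 3))
    print(round(average_information(0.45, [-10.26, -8.08, -4.60, -1.33, 1.24]), 3))
```

With a single threshold the model reduces to the two-parameter logistic model, whose information at θ = b is a²/4; more generally, information scales with a², which is why the low-discrimination BP and SF items in Table 6 carry so little information.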
In the SF-36, the items of PF, RP, GH, and RE had acceptable discrimination (> 1), whereas the remaining domains were less discriminating, especially BP and SF (0.45 and 0.50), possibly because adolescents are relatively homogeneous with respect to bodily pain and social functioning. In the SF-12, the BP, SF, RP, and VT items had higher discrimination than the corresponding SF-36 items.
Following the relevant literature, a scale information amount > 25 indicates that the evaluation items are of good quality, and an amount < 16 indicates that they are poor [28]. Because the SF-36 has 36 items, we divided 25 and 16 by 36 to obtain per-item criteria: items with an average information amount > 0.69 (25/36) were judged excellent, and items < 0.44 (16/36) were judged poor. Likewise, for the SF-12 the criteria were > 2.08 (25/12) for excellent items and < 1.33 (16/12) for poor items. In the SF-36, all PF items except PF05 and PF09 were excellent, as were GH1, GH2, and GH5, whereas the items of RP, BP, VT, SF, RE, and MH were poor. All SF-12 items had poor average amounts of information.
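The per-item criteria are simple divisions of the scale-level cut-offs by the number of items. The sketch below makes that arithmetic explicit; the function names are hypothetical, and "intermediate" is a hypothetical label for items that fall between the two cut-offs, which the text does not name.

```python
def cutoffs(n_items, good_total=25.0, poor_total=16.0):
    """Per-item information criteria: scale-level cut-offs divided by the item count."""
    return good_total / n_items, poor_total / n_items


def classify(avg_info, n_items):
    """Label an item's average information as excellent, intermediate, or poor."""
    good, poor = cutoffs(n_items)
    if avg_info > good:
        return "excellent"
    if avg_info < poor:
        return "poor"
    return "intermediate"


if __name__ == "__main__":
    # SF-36 average information values transcribed from Table 6
    sf36 = {"PF01": 1.02, "PF05": 0.65, "GH3": 0.68, "GH4": 0.67, "RP1": 0.43, "BP1": 0.06}
    for item, info in sf36.items():
        print(item, info, classify(info, 36))
```

Applied to Table 6, GH3 (0.68) and GH4 (0.67) fall just below the 25/36 criterion, and the RP items (0.43) fall just below the 16/36 criterion.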
Table 6 Item discrimination, difficulty, and average amount of information in item response theory
| Label | SF-36 item discrimination (SD) | SF-36 item difficulty (SD) | SF-36 average amount of information | SF-12 item discrimination (SD) | SF-12 item difficulty (SD) | SF-12 average amount of information |
|---|---|---|---|---|---|---|
| Physical Functioning (PF) | | | | | | |
| PF01 | 2.73 (0.01) | -1.43 (0.01); 0.21 (0.01) | 1.02 | | | |
| PF02 | 2.73 (0.01) | -2.53 (0.05); -1.07 (0.01) | 0.74 | 2.20 (0.03) | -3.13 (0.05); -1.40 (0.02) | 0.45 |
| PF03 | 2.73 (0.01) | -2.55 (0.05); -1.14 (0.01) | 0.73 | | | |
| PF04 | 2.73 (0.01) | -2.05 (0.03); -0.87 (0.01) | 0.90 | 2.20 (0.03) | -2.60 (0.04); -1.17 (0.01) | 0.54 |
| PF05 | 2.73 (0.01) | -2.45 (0.04); -1.54 (0.02) | 0.65 | | | |
| PF06 | 2.73 (0.01) | -1.88 (0.02); -0.90 (0.01) | 0.89 | | | |
| PF07 | 2.73 (0.01) | -1.42 (0.01); -0.25 (0.01) | 0.95 | | | |
| PF08 | 2.73 (0.01) | -1.96 (0.03); -0.92 (0.01) | 0.89 | | | |
| PF09 | 2.73 (0.01) | -2.51 (0.05); -1.58 (0.02) | 0.63 | | | |
| PF10 | 2.73 (0.01) | -1.69 (0.02); -1.20 (0.01) | 0.74 | | | |
| Role Physical (RP) | | | | | | |
| RP1 | 2.17 (0.02) | 0.77 (0.01) | 0.43 | | | |
| RP2 | 2.17 (0.02) | 0.53 (0.01) | 0.43 | 2.32 (0.03) | 0.52 (0.01) | 0.46 |
| RP3 | 2.17 (0.02) | 0.65 (0.01) | 0.43 | 2.32 (0.03) | 0.63 (0.01) | 0.46 |
| RP4 | 2.17 (0.02) | 0.52 (0.01) | 0.43 | | | |
| Bodily Pain (BP) | | | | | | |
| BP1 | 0.45 (0.01) | -10.26 (0.48); -8.08 (0.31); -4.60 (0.16); -1.33 (0.06); 1.24 (0.06) | 0.06 | | | |
| BP2 | 0.45 (0.01) | 0.32 (0.05); 4.65 (0.17); 7.46 (0.28); 10.00 (0.44) | 0.05 | 1.06 (0.02) | 0.18 (0.02); 2.40 (0.04); 3.83 (0.07); 5.28 (0.12) | 0.24 |
| General Health (GH) | | | | | | |
| GH1 | 1.76 (0.01) | -3.05 (0.05); -1.11 (0.02); -0.13 (0.01); 1.2 (0.01) | 0.76 | 0.91 (0.01) | -1.80 (0.03); 0.21 (0.02); 1.72 (0.03); 4.93 (0.09) | 0.24 |
| GH2 | 1.76 (0.01) | -2.33 (0.03); -1.46 (0.02); -0.23 (0.01); 0.54 (0.01) | 0.73 | | | |
| GH3 | 1.76 (0.01) | -2.77 (0.03); -2.14 (0.02); -0.89 (0.01); 0.35 (0.01) | 0.68 | | | |
| GH4 | 1.76 (0.01) | -2.43 (0.03); -1.55 (0.02); -0.52 (0.01); 0.17 (0.01) | 0.67 | | | |
| GH5 | 1.76 (0.01) | -2.75 (0.03); -2.04 (0.02); -0.78 (0.01); 0.57 (0.01) | 0.71 | | | |
| Vitality (VT) | | | | | | |
| VT1 | 0.74 (0.00) | -2.43 (0.04); 0.29 (0.03); 1.68 (0.03); 3.33 (0.05); 4.71 (0.07) | 0.17 | | | |
| VT2 | 0.74 (0.00) | -2.74 (0.04); -0.40 (0.03); 1.22 (0.03); 2.89 (0.04) | 0.17 | 0.91 (0.01) | -2.36 (0.01); -0.35 (0.02); 1.07 (0.02); 2.50 (0.04); 3.90 (0.07) | 0.25 |
| VT3 | 0.74 (0.00) | -4.73 (0.07); -3.10 (0.04); -1.97 (0.03); -0.50 (0.03); 2.10 (0.03) | 0.16 | | | |
| VT4 | 0.74 (0.00) | -4.26 (0.06); -2.55 (0.04); -1.36 (0.03); 0.15 (0.03); 2.93 (0.04) | 0.18 | | | |
| Social Functioning (SF) | | | | | | |
| SF1 | 0.50 (0.01) | -1.68 (0.06); 2.80 (0.08); 5.92 (0.17); 8.66 (0.29) | 0.07 | | | |
| SF2 | 0.50 (0.01) | -6.35 (0.18); -4.73 (0.13); -3.48 (0.10); -2.06 (0.07); -0.01 (0.05) | 0.07 | 1.07 (0.02) | -3.42 (0.06); -2.58 (0.04); -1.92 (0.03); -1.15 (0.02); -0.02 (0.02) | 0.28 |
| Role Emotional (RE) | | | | | | |
| RE1 | 1.82 (0.02) | 0.35 (0.01) | 0.36 | | | |
| RE2 | 1.82 (0.02) | 0.23 (0.01) | 0.36 | 1.63 (0.02) | 0.24 (0.01) | 0.31 |
| RE3 | 1.82 (0.02) | -0.07 (0.01) | 0.36 | 1.63 (0.02) | -0.07 (0.01) | 0.32 |
| Mental Health (MH) | | | | | | |
| MH1 | 0.78 (0.00) | -4.35 (0.07); -2.59 (0.04); -1.53 (0.03); -0.33 (0.03); 1.50 (0.03) | 0.19 | | | |
| MH2 | 0.78 (0.00) | -4.49 (0.07); -2.99 (0.04); -2.12 (0.03); -1.01 (0.03); 0.82 (0.03) | 0.18 | | | |
| MH3 | 0.78 (0.00) | -11 (-); -2.84 (0.07); -0.42 (0.03); 1.03 (0.03); 2.72 (0.04) | 0.19 | 0.79 (0.01) | -3.94 (0.06); -2.24 (0.03); -0.82 (0.03); 0.55 (0.03); 2.91 (0.04) | 0.20 |
| MH4 | 0.78 (0.00) | -4.83 (0.08); -3.18 (0.05); -2.15 (0.03); -0.82 (0.03); 2.04 (0.03) | 0.18 | 0.79 (0.01) | -4.73 (0.07); -3.18 (0.04); -2.19 (0.03); -0.88 (0.03); 2.00 (0.03) | 0.18 |
| MH5 | 0.78 (0.00) | -11.18 (-); -1.35 (0.04); 0.84 (0.03); 2.05 (0.03); 3.48 (0.05) | 0.18 | | | |
| HT (health transition) | 0.91 (0.00) | -4.84 (0.09); -2.59 (0.04); -0.23 (0.02); 1.25 (0.03) | 0.23 | | | |
SD = standard deviation