3.1 General situation
Of the 1,468 oil workers, 1,105 were men, with a median age of 43 (38, 48) years, and 363 were women, with a median age of 44 (42, 47) years. The prevalence of metabolic syndrome among the oil workers was 40.67%. The rates of central obesity, abnormal blood glucose, abnormal triglycerides, abnormal HDL and abnormal blood pressure were 56.81%, 49.39%, 32.90%, 19.28% and 55.99%, respectively (Fig. 1).
3.2 Independent variable screening
Univariate analyses were performed on the basic conditions, diet and lifestyle, occupational exposure factors and laboratory tests of the 1,468 oil workers. Workers with and without metabolic syndrome differed significantly in age, gender, body mass index (BMI), marital status, family history of hypertension, family history of diabetes mellitus, salt intake, meat intake, smoking status, drinking status, shift work, occupational heat exposure, noise exposure, hemoglobin, uric acid (UA) and alanine transaminase (ALT), among other factors (P < 0.05), as shown in Tables 1 to 4.
Table 1
Comparison of the basic conditions of oil workers with and without metabolic syndrome
Basic conditions | Category(Unit) | MetS No, n(%)/M(P25,P75) | MetS Yes, n(%)/M(P25,P75) | χ2/Z | P |
Age | Year | 43(38,47) | 44(40,49) | -5.79 | < 0.001 |
Gender | Male | 601(69.00) | 504(84.42) | 45.26 | < 0.001 |
| Female | 270(31.00) | 93(15.58) |
BMI | kg/m2 | 23.90(21.90,25.90) | 26.80(24.90,28.80) | -16.35 | < 0.001 |
Marital status | Unmarried | 56(6.43) | 15(2.51) | 11.82 | 0.003 |
| Married | 782(89.78) | 559(93.63) | | |
| Others | 33(3.79) | 23(3.85) | | |
Education level | Junior high school and below | 133(15.27) | 104(17.42) | 9.07 | 0.011 |
| High school/technical secondary school | 374(42.94) | 290(48.58) | | |
| College and above | 364(41.79) | 203(34.00) | | |
Per capita monthly household income(Yuan) | < 2000 | 619(71.07) | 454(76.05) | 8.05 | 0.018 |
| 2000~ | 212(24.34) | 109(18.26) | | |
| 3000~ | 40(4.59) | 34(5.70) | | |
Family history of hypertension | No | 489(56.14) | 288(48.24) | 8.88 | 0.003 |
| Yes | 382(43.86) | 309(51.76) | | |
Family history of hyperlipidemia | No | 801(91.96) | 538(90.12) | 1.51 | 0.22 |
| Yes | 70(8.04) | 59(9.88) | | |
Family history of diabetes mellitus | No | 725(83.24) | 454(76.05) | 11.58 | 0.001 |
| Yes | 146(16.76) | 143(23.95) | | |
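As an illustration of how the χ2 statistics in Table 1 can be checked from the printed counts, the following minimal Python sketch recomputes the Pearson χ2 for the Gender row. It assumes that no continuity correction was applied; the function name `pearson_chi_square` is illustrative and is not taken from the study.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Gender row of Table 1; columns are MetS No / MetS Yes.
gender_counts = [
    [601, 504],  # male
    [270, 93],   # female
]
chi2_gender = pearson_chi_square(gender_counts)
print(f"chi-square = {chi2_gender:.2f}")
```

Under that assumption the result agrees with the tabulated value of 45.26 to two decimal places.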
Table 2
Comparison of diet and lifestyle of oil workers with and without metabolic syndrome
Factors | Category | MetS No, n(%)/M(P25,P75) | MetS Yes, n(%)/M(P25,P75) | χ2 | P |
Salt | Light | 221(25.37) | 88(14.74) | 26.39 | < 0.001 |
| Moderate | 381(43.74) | 276(46.23) | | |
| Salty | 269(30.88) | 233(39.03) | | |
Meat intake | Never | 23(2.64) | 13(2.18) | 9.38 | 0.025 |
| Occasionally | 198(22.73) | 101(16.92) | | |
| Regularly | 335(38.46) | 232(38.86) | | |
| Every day | 315(36.17) | 251(42.04) | | |
Fruit intake | Never | 37(4.25) | 27(4.52) | 6.59 | 0.086 |
| Occasionally | 278(31.92) | 223(37.35) | | |
| Regularly | 258(29.62) | 146(24.46) | | |
| Every day | 298(34.21) | 201(33.67) | | |
Dairy intake | Never | 127(14.58) | 103(17.25) | 119.81 | < 0.001 |
| Occasionally | 230(26.41) | 297(49.75) | | |
| Regularly | 199(22.85) | 111(18.59) | | |
| Every day | 315(36.17) | 86(14.41) | | |
Carbonated beverage intake | Never | 370(42.48) | 270(45.23) | 10.52 | 0.015 |
| Occasionally | 384(44.09) | 258(43.22) | | |
| Regularly | 79(9.07) | 31(5.19) | | |
| Every day | 38(4.36) | 38(6.37) | | |
Physical exercise | No | 307(35.25) | 259(43.38) | 9.90 | 0.002 |
| Yes | 564(64.75) | 338(56.62) | | |
Smoking status | No smoking | 524(60.16) | 262(43.89) | 39.30 | < 0.001 |
| Quit smoking | 51(5.86) | 61(10.22) | | |
| Smoking | 296(33.98) | 274(45.90) | | |
Drinking status | No drinking | 585(67.16) | 309(51.76) | 37.02 | < 0.001 |
| Alcohol withdrawal | 16(1.84) | 24(4.02) | | |
| Drinking | 270(31.00) | 264(44.22) | | |
Table 3
Comparison of occupational exposure factors of oil workers with and without metabolic syndrome
Factors | Category | MetS No, n(%)/M(P25,P75) | MetS Yes, n(%)/M(P25,P75) | χ2 | P |
Shift work situation | Never | 535(61.42) | 254(42.55) | 51.44 | < 0.001 |
| Once | 208(23.88) | 202(33.84) | | |
| Now | 128(14.70) | 141(23.62) | | |
Labour intensity | Mild | 93(10.68) | 44(7.37) | 5.36 | 0.069 |
| Moderate | 434(49.83) | 295(49.41) | | |
| Severe | 344(39.49) | 258(43.22) | | |
Occupational heat | No | 548(62.92) | 266(44.56) | 48.34 | < 0.001 |
| Yes | 323(37.08) | 331(55.44) | | |
Noise | No | 429(49.25) | 206(34.51) | 31.39 | < 0.001 |
| Yes | 442(50.75) | 391(65.49) | | |
Table 4
Comparison of laboratory tests in oil workers with and without metabolic syndrome
Biochemical indicators | MetS No, n(%)/M(P25,P75) | MetS Yes, n(%)/M(P25,P75) | Z | P |
RBC(×10^12/L) | 5.01(4.65,5.33) | 5.29(4.99,5.54) | -6.94 | < 0.001 |
MCV(fl) | 88.80(85.10,92.00) | 88.20(84.80,91.80) | -0.85 | 0.397 |
BPC(×10^12/L) | 256.00(219.50,290.75) | 251.00(211.00,284.00) | -0.55 | 0.59 |
MPV(fl) | 8.20(7.70,8.80) | 8.20(7.70,8.80) | -0.83 | 0.405 |
Hemoglobin(g/L) | 155(141,165) | 160(151,169) | -6.44 | < 0.001 |
TBIL(mmol/L) | 13.50(10.50,17.70) | 13.45(10.30,17.10) | -0.81 | 0.421 |
UA(mmol/L) | 307(242,373) | 367(304,426) | -11.13 | < 0.001 |
ALT(U/L) | 20.00(14.00,24.00) | 35.00(21.00,45.00) | -17.07 | < 0.001 |
Factors that were significant in the univariate analysis were entered into a multivariate unconditional Logistic regression analysis. The risk of metabolic syndrome increased with age, BMI, UA and ALT. Workers with a family history of diabetes, a saltier diet (moderate or salty versus light), occasional consumption of dairy products, daily consumption of carbonated beverages, current smoking, past or current shift work, and occupational heat exposure were more likely to have metabolic syndrome. Protective factors included a per capita monthly household income of 2000–3000 yuan, daily consumption of dairy products and physical exercise. Combined with the results of the literature review, the 13 significant factors in the multivariate analysis were taken as independent variables for model building, as shown in Tables 5 and 6.
Table 5
Multivariate unconditional Logistic regression analysis of factors influencing metabolic syndrome in oil workers
Factors | B | SE | Wald χ2 | P | OR | 95% CI |
Age | 0.088 | 0.012 | 55.251 | < 0.001 | 1.092 | 1.067, 1.118 |
Per capita monthly household income(2000~) | -0.77 | 0.22 | 12.244 | < 0.001 | 0.463 | 0.301, 0.713 |
Per capita monthly household income(3000~) | 0.166 | 0.388 | 0.184 | 0.668 | 1.181 | 0.552, 2.525 |
BMI | 0.273 | 0.026 | 114.091 | < 0.001 | 1.313 | 1.249, 1.381 |
Family history of diabetes mellitus | 0.373 | 0.183 | 4.129 | 0.042 | 1.452 | 1.013, 2.080 |
Salt(Moderate) | 0.86 | 0.206 | 17.429 | < 0.001 | 2.362 | 1.578, 3.536 |
Salt(Salty) | 0.555 | 0.214 | 6.759 | 0.009 | 1.742 | 1.146, 2.648 |
Dairy intake(Occasionally) | 0.676 | 0.216 | 9.771 | 0.002 | 1.966 | 1.287, 3.003 |
Dairy intake(Every day) | -1.149 | 0.261 | 19.317 | < 0.001 | 0.317 | 0.190, 0.529 |
Carbonated beverage intake(Every day) | 1.102 | 0.365 | 9.148 | 0.002 | 3.012 | 1.474, 6.153 |
Physical exercise | -0.398 | 0.152 | 6.86 | 0.009 | 0.672 | 0.499, 0.905 |
Smoking status(Smoking) | 0.431 | 0.181 | 5.675 | 0.017 | 1.539 | 1.079, 2.194 |
Shift work situation(Once) | 0.974 | 0.172 | 32.184 | < 0.001 | 2.648 | 1.892, 3.707 |
Shift work situation(Now) | 1.509 | 0.237 | 40.489 | < 0.001 | 4.522 | 2.841, 7.198 |
Occupational heat | 0.656 | 0.224 | 8.548 | 0.003 | 1.926 | 1.241, 2.989 |
UA | 0.004 | 0.001 | 27.244 | < 0.001 | 1.004 | 1.003, 1.006 |
ALT | 0.029 | 0.005 | 40.946 | < 0.001 | 1.030 | 1.020, 1.039 |
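The OR and confidence interval columns of Table 5 can be related to the coefficient and standard error columns as OR = exp(B) with limits exp(B ± 1.96 × SE), assuming the usual Wald interval was used. The sketch below applies this to the Age row; the function name `odds_ratio_with_ci` is hypothetical, and because the printed B and SE are rounded, small differences in the third decimal are possible for other rows.

```python
import math


def odds_ratio_with_ci(b, se, z_crit=1.96):
    """Return (OR, lower, upper) for a logistic coefficient b with standard error se.

    Assumes a Wald-type interval on the log-odds scale.
    """
    odds_ratio = math.exp(b)
    lower = math.exp(b - z_crit * se)
    upper = math.exp(b + z_crit * se)
    return odds_ratio, lower, upper


# Age row of Table 5: B = 0.088, SE = 0.012.
or_age, lower_age, upper_age = odds_ratio_with_ci(0.088, 0.012)
print(f"OR = {or_age:.3f}, 95% CI = ({lower_age:.3f}, {upper_age:.3f})")
```

Read this way, each additional year of age multiplies the odds of metabolic syndrome by about 1.09, holding the other factors fixed.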
Table 6
Assignment of influencing factor variables
Variable name | Variable meaning | Assignment method |
Y | MetS | 0 = No, 1 = Yes |
X1 | Age | Continuous variable (year) |
X2 | Per capita monthly household income | 1 = < 2000, 2 = 2000–3000, 3 = ≥ 3000 |
X3 | BMI | Continuous variable (kg/m2) |
X4 | Family history of diabetes mellitus | 1 = No, 2 = Yes |
X5 | Salt | 1 = Light, 2 = Moderate, 3 = Salty |
X6 | Dairy intake | 1 = Never, 2 = Occasionally, 3 = Regularly, 4 = Every day |
X7 | Carbonated beverage intake | 1 = Never, 2 = Occasionally, 3 = Regularly, 4 = Every day |
X8 | Physical exercise | 1 = No, 2 = Yes |
X9 | Smoking status | 1 = No smoking, 2 = Quit smoking, 3 = Smoking |
X10 | Shift work situation | 1 = Never, 2 = Once, 3 = Now |
X11 | Occupational heat | 1 = No, 2 = Yes |
X12 | UA | Continuous variable (mmol/L) |
X13 | ALT | Continuous variable (U/L) |
3.3 Collinearity diagnosis
Collinearity was diagnosed using the pairwise correlation coefficient r, tolerance and the variance inflation factor (VIF). The largest absolute correlation coefficient |r| was 0.31, below 0.5 (Table 7). The minimum tolerance was 0.844, well above 0.1, and the maximum VIF was 1.185, below 5 (Table 8). These results indicate that there is no serious multicollinearity among the screened independent variables.
Table 7
Correlation coefficients between the independent variables
Variable name | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 |
X1 | 1 | | | | | | | | | | | | |
X2 | 0.062* | 1 | | | | | | | | | | | |
X3 | -0.008 | -0.110** | 1 | | | | | | | | | | |
X4 | 0.068** | 0.022 | 0.014 | 1 | | | | | | | | | |
X5 | -0.021 | -0.015 | 0.141** | -0.008 | 1 | | | | | | | | |
X6 | -0.063* | -0.004 | -0.124** | -0.009 | -0.070** | 1 | | | | | | | |
X7 | -0.147** | -0.001 | 0.010 | -0.010 | 0.065* | 0.288** | 1 | | | | | | |
X8 | -0.019 | -0.016 | -0.043 | -0.034 | -0.027 | -0.034 | -0.045 | 1 | | | | | |
X9 | 0.012 | -0.137** | 0.108** | -0.012 | 0.165** | -0.093** | 0.034 | -0.130** | 1 | | | | |
X10 | 0.018 | 0.310** | 0.081** | 0.054* | 0.004 | -0.052* | 0.011 | 0.039 | -0.012 | 1 | | | |
X11 | 0.028 | -0.044 | 0.091** | 0.028 | 0.000 | -0.047 | 0.005 | 0.040 | 0.028 | 0.047 | 1 | | |
X12 | -0.055* | -0.021 | 0.169** | 0.012 | 0.043 | -0.092** | 0.035 | -0.035 | 0.109** | -0.041 | 0.066* | 1 | |
X13 | 0.084** | 0.015 | 0.168** | 0.058* | 0.026 | -0.110** | -0.042 | -0.078** | 0.049 | 0.006 | 0.090** | 0.226** | 1 |
* P < 0.05; ** P < 0.01
Table 8
Results of tolerance and variance inflation factor
Model | Tolerance | VIF |
(constant) | - | - |
Age | 0.966 | 1.036 |
Per capita monthly household income | 0.881 | 1.135 |
BMI | 0.897 | 1.115 |
Family history of diabetes mellitus | 0.985 | 1.015 |
Salt | 0.952 | 1.051 |
Dairy intake | 0.844 | 1.185 |
Carbonated beverage intake | 0.872 | 1.147 |
Physical exercise | 0.963 | 1.038 |
Smoking status | 0.922 | 1.085 |
Shift work situation | 0.897 | 1.115 |
Occupational heat | 0.975 | 1.026 |
UA | 0.907 | 1.102 |
ALT | 0.905 | 1.105 |
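Tolerance and VIF carry the same information, since VIF = 1 / tolerance by definition. As a quick consistency check on Table 8, the sketch below recomputes each VIF from the printed tolerance and compares the largest value with the threshold of 5 used in the text. The dictionary name `tolerance` and the 0.002 rounding allowance are choices made here for illustration only.

```python
# Tolerance values as printed in Table 8.
tolerance = {
    "Age": 0.966,
    "Per capita monthly household income": 0.881,
    "BMI": 0.897,
    "Family history of diabetes mellitus": 0.985,
    "Salt": 0.952,
    "Dairy intake": 0.844,
    "Carbonated beverage intake": 0.872,
    "Physical exercise": 0.963,
    "Smoking status": 0.922,
    "Shift work situation": 0.897,
    "Occupational heat": 0.975,
    "UA": 0.907,
    "ALT": 0.905,
}

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

max_name = max(vif, key=vif.get)
print(f"largest VIF: {max_name} = {vif[max_name]:.3f}")
print("all VIF below 5:", all(value < 5 for value in vif.values()))
```

Because the printed tolerances are rounded to three decimals, a recomputed VIF can differ from the tabulated one by about 0.001.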
3.4 Logistic regression model
In the training, validation and test sets, the Logistic regression model achieved accuracies of 83.45%, 80.60% and 76.72%, sensitivities of 78.47%, 69.35% and 70.00%, and specificities of 86.89%, 88.57% and 81.08%, respectively. The corresponding F1 scores were 0.79, 0.75 and 0.70; Youden’s indexes 0.65, 0.58 and 0.51; positive likelihood ratios 5.98, 6.07 and 3.70; negative likelihood ratios 0.25, 0.35 and 0.37; Kappa values 0.66, 0.59 and 0.51; and areas under the ROC curve (AUC) 0.894, 0.875 and 0.797 (Tables 9 and 10).
3.5 Random forest model
In the training, validation and test sets, the random forest model achieved accuracies of 94.21%, 81.27% and 80.66%, sensitivities of 94.62%, 77.42% and 77.50%, and specificities of 93.93%, 84.00% and 82.70%, respectively. The corresponding F1 scores were 0.93, 0.77 and 0.76; Youden’s indexes 0.89, 0.61 and 0.60; positive likelihood ratios 15.60, 4.84 and 4.48; negative likelihood ratios 0.06, 0.27 and 0.27; Kappa values 0.88, 0.61 and 0.60; and AUCs 0.987, 0.878 and 0.861 (Tables 9 and 10).
3.6 Convolutional neural network model
In the training, validation and test sets, the convolutional neural network (CNN) model achieved accuracies of 86.34%, 82.61% and 78.69%, sensitivities of 81.30%, 73.39% and 68.33%, and specificities of 89.82%, 89.14% and 85.41%, respectively. The corresponding F1 scores were 0.83, 0.78 and 0.71; Youden’s indexes 0.71, 0.63 and 0.54; positive likelihood ratios 7.99, 6.76 and 4.68; negative likelihood ratios 0.21, 0.30 and 0.37; Kappa values 0.72, 0.64 and 0.55; and AUCs 0.935, 0.872 and 0.855 (Tables 9 and 10).
Table 9
Sample classification results of the Logistic regression, random forest and convolutional neural network models in the training set, validation set and test set [n (%)]
Model | Data set | Model predictive value | Actual value: Yes | Actual value: No | Total |
Logistic regression model | Training set | Yes | 277(78.47) | 67(13.11) | 344 |
| | No | 76(21.53) | 444(86.89) | 520 |
| | Total | 353 | 511 | 864 |
| Validation set | Yes | 86(69.35) | 20(11.43) | 106 |
| | No | 38(30.65) | 155(88.57) | 193 |
| | Total | 124 | 175 | 299 |
| Test set | Yes | 84(70.00) | 35(18.92) | 119 |
| | No | 36(30.00) | 150(81.08) | 186 |
| | Total | 120 | 185 | 305 |
Random forest model | Training set | Yes | 334(94.62) | 31(6.07) | 365 |
| | No | 19(5.38) | 480(93.93) | 499 |
| | Total | 353 | 511 | 864 |
| Validation set | Yes | 96(77.42) | 28(16.00) | 124 |
| | No | 28(22.58) | 147(84.00) | 175 |
| | Total | 124 | 175 | 299 |
| Test set | Yes | 93(77.50) | 32(17.30) | 125 |
| | No | 27(22.50) | 153(82.70) | 180 |
| | Total | 120 | 185 | 305 |
CNN | Training set | Yes | 287(81.30) | 52(10.18) | 339 |
| | No | 66(18.70) | 459(89.82) | 525 |
| | Total | 353 | 511 | 864 |
| Validation set | Yes | 91(73.39) | 19(10.86) | 110 |
| | No | 33(26.61) | 156(89.14) | 189 |
| | Total | 124 | 175 | 299 |
| Test set | Yes | 82(68.33) | 27(14.59) | 109 |
| | No | 38(31.67) | 158(85.41) | 196 |
| | Total | 120 | 185 | 305 |
Table 10
Comparison of predictive performance of the three models in training set, validation set and test set
Evaluation index | Training set: Logistic regression model | Training set: Random forest model | Training set: CNN | Validation set: Logistic regression model | Validation set: Random forest model | Validation set: CNN | Test set: Logistic regression model | Test set: Random forest model | Test set: CNN |
Accuracy rate (%) | 83.45 | 94.21 | 86.34 | 80.60 | 81.27 | 82.61 | 76.72 | 80.66 | 78.69 |
Sensitivity(%) | 78.47 | 94.62 | 81.30 | 69.35 | 77.42 | 73.39 | 70.00 | 77.50 | 68.33 |
Specificity(%) | 86.89 | 93.93 | 89.82 | 88.57 | 84.00 | 89.14 | 81.08 | 82.70 | 85.41 |
F1 Score | 0.79 | 0.93 | 0.83 | 0.75 | 0.77 | 0.78 | 0.70 | 0.76 | 0.71 |
Youden’s index | 0.65 | 0.89 | 0.71 | 0.58 | 0.61 | 0.63 | 0.51 | 0.60 | 0.54 |
Positive likelihood ratio | 5.98 | 15.60 | 7.99 | 6.07 | 4.84 | 6.76 | 3.70 | 4.48 | 4.68 |
Negative likelihood ratio | 0.25 | 0.06 | 0.21 | 0.35 | 0.27 | 0.30 | 0.37 | 0.27 | 0.37 |
Kappa value | 0.66 | 0.88 | 0.72 | 0.59 | 0.61 | 0.64 | 0.51 | 0.60 | 0.55 |
Positive predictive value(%) | 80.52 | 91.51 | 84.66 | 81.13 | 77.42 | 82.73 | 70.59 | 74.40 | 75.23 |
Negative predictive value(%) | 85.38 | 96.19 | 87.43 | 80.31 | 84.00 | 82.54 | 80.65 | 85.00 | 80.61 |
AUC | 0.894 | 0.987 | 0.935 | 0.875 | 0.878 | 0.872 | 0.797 | 0.861 | 0.855 |
AUC 95% CI lower | 0.871 | 0.977 | 0.917 | 0.833 | 0.835 | 0.829 | 0.748 | 0.818 | 0.810 |
AUC 95% CI upper | 0.913 | 0.994 | 0.951 | 0.911 | 0.913 | 0.908 | 0.841 | 0.898 | 0.892 |
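The threshold-based indexes in Table 10 can all be derived from the 2 × 2 classification counts in Table 9. The following sketch, which assumes the standard textbook definitions of each index, recomputes them for the Logistic regression model in the test set (84 true positives, 35 false positives, 36 false negatives, 150 true negatives). The function name `classification_metrics` and the dictionary keys are illustrative, not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common diagnostic indexes from a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    npv = tn / (tn + fn)  # negative predictive value
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used for Cohen's kappa.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "youden": sensitivity + specificity - 1,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "ppv": ppv,
        "npv": npv,
    }


# Logistic regression model, test set (counts from Table 9).
lr_test = classification_metrics(tp=84, fp=35, fn=36, tn=150)
for name, value in lr_test.items():
    print(f"{name}: {value:.4f}")
```

Rounded to the precision used in Table 10, these values match the Logistic regression test-set column (for example, accuracy 76.72%, Youden’s index 0.51 and Kappa 0.51).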
3.7 Comparison of predictive performance of metabolic syndrome risk prediction models
In the training set, the accuracy, sensitivity, specificity, F1 score, Youden’s index, positive likelihood ratio, Kappa value, positive predictive value and negative predictive value of the random forest model were all higher than those of the Logistic regression model and the convolutional neural network model. The area under the ROC curve (AUC) of the random forest model was also larger than that of the Logistic regression model and the convolutional neural network model, and the differences were statistically significant (P < 0.001). See Table 11 and Fig. 2.
In the validation set, the accuracy, sensitivity, specificity, F1 score and other indexes of the three models were all relatively high. To further examine the relationship between sensitivity and specificity, and to judge whether the models were overfitted and robust, ROC curves were plotted and AUC values were calculated. The curves of the Logistic regression model, random forest model and convolutional neural network model nearly overlapped, with AUCs of 0.875, 0.878 and 0.872, respectively, and the pairwise differences were not statistically significant (P > 0.05). See Tables 10 and 11 and Fig. 3.
In the test set, the accuracy, sensitivity, F1 score, Youden’s index, Kappa value and negative predictive value of the random forest model were the highest. The convolutional neural network model had the highest specificity, positive likelihood ratio and positive predictive value, but the lowest sensitivity and negative predictive value. The AUC of the random forest model was larger than that of the Logistic regression model and the convolutional neural network model. In pairwise comparisons of the AUCs, the difference between the Logistic regression model and the random forest model was statistically significant (Z = 2.806, P = 0.005), as was the difference between the Logistic regression model and the convolutional neural network model (Z = 2.352, P = 0.019), whereas the difference between the random forest model and the convolutional neural network model was not statistically significant (Z = 0.320, P = 0.749). See Table 11 and Fig. 4.
Table 11
Pairwise comparison of the AUCs of the three models in the training set, validation set and test set
Model | AUC difference | SE | 95% CI lower | 95% CI upper | Z | P |
Training set | | | | | | |
Logistic regression VS Random forest | 0.094 | 0.010 | 0.074 | 0.113 | 9.419 | < 0.001 |
Logistic regression VS CNN | 0.042 | 0.008 | 0.027 | 0.057 | 5.371 | < 0.001 |
Random forest VS CNN | 0.052 | 0.007 | 0.038 | 0.066 | 7.062 | < 0.001 |
Validation set | | | | | | |
Logistic regression VS Random forest | 0.002 | 0.018 | -0.034 | 0.038 | 0.125 | 0.900 |
Logistic regression VS CNN | 0.003 | 0.014 | -0.024 | 0.031 | 0.248 | 0.804 |
Random forest VS CNN | 0.006 | 0.016 | -0.026 | 0.037 | 0.361 | 0.718 |
Test set | | | | | | |
Logistic regression VS Random forest | 0.064 | 0.023 | 0.019 | 0.109 | 2.806 | 0.005 |
Logistic regression VS CNN | 0.058 | 0.025 | 0.010 | 0.106 | 2.352 | 0.019 |
Random forest VS CNN | 0.007 | 0.020 | -0.034 | 0.047 | 0.320 | 0.749 |
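The Z and P columns of Table 11 are consistent with a two-sided normal test on the AUC difference, with the interval given by difference ± 1.96 × SE. The sketch below shows that relationship for the test-set rows under this assumption; it does not reproduce how the standard error of a difference between correlated ROC curves was estimated, which the text does not state. The function names `two_sided_p` and `wald_interval` are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def wald_interval(diff, se, z_crit=1.96):
    """Normal-approximation 95% interval for a difference with standard error se."""
    return diff - z_crit * se, diff + z_crit * se


# Test-set rows of Table 11: (comparison, AUC difference, SE, Z).
test_rows = [
    ("Logistic regression VS Random forest", 0.064, 0.023, 2.806),
    ("Logistic regression VS CNN", 0.058, 0.025, 2.352),
    ("Random forest VS CNN", 0.007, 0.020, 0.320),
]

p_values = {}
for name, diff, se, z in test_rows:
    lower, upper = wald_interval(diff, se)
    p_values[name] = two_sided_p(z)
    print(f"{name}: 95% CI ({lower:.3f}, {upper:.3f}), P = {p_values[name]:.3f}")
```

Note that dividing the printed difference by the printed SE gives a slightly different Z (for example 0.064 / 0.023 ≈ 2.78 rather than 2.806) because both columns are rounded, so the tabulated Z values are used directly here.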
[Figs. 2 to 4: ROC curves of the three models in the training set, the validation set and the test set]