By comparing the characteristics of the two groups with different thrombotic outcomes among patients with lower extremity fractures, we found that sex, age, ABO blood type, hypoalbuminemia, atrial fibrillation, total cholesterol, triacylglycerol, albumin, white blood cell ratio, calcium ion, platelets, red blood cells, fibrinogen, D-dimer, international normalized ratio (INR), C-reactive protein and fracture type were significantly different between the two groups (P < 0.05). In the XGBoost model, the importance ranking of the 10 factors, from high to low, is age, fracture type, platelets, free fatty acids, total cholesterol, D-dimer, white blood cell ratio, C-reactive protein, white blood cells and albumin. Among the prediction models built with five machine learning algorithms, the LR model has the highest AUC (0.740), but its accuracy is lower than that of the XGBoost model. When AUC, accuracy, sensitivity, specificity and F1 score are considered together, the XGBoost model has the best overall performance.
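To make the comparison of indicators concrete, the following is a minimal sketch of how AUC, accuracy, sensitivity, specificity and F1 score can be computed for a binary DVT classifier. It is not the code used in this study: the function names and the toy labels are illustrative assumptions. It also shows why the ranking of models can differ between indicators, since AUC depends only on how the predicted scores order the patients, whereas the other four depend on the chosen classification threshold.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics from binary labels (1 = DVT, 0 = no DVT)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on DVT cases
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-DVT cases
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data, not from the study cohort).
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

With a low event rate such as the one in this cohort, accuracy is dominated by the non-DVT majority, so sensitivity and F1 score are the more informative threshold-dependent indicators.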
DVT is a common complication in patients with lower extremity fractures and may lead to fatal pulmonary embolism. The incidence of DVT in this study was 4.67%. Previous studies have identified surgery as an independent risk factor for DVT[29]. This is mainly because prolonged immobility during the operation and the trauma caused by the operation slow the blood flow and damage the blood vessel wall, resulting in the formation of venous thrombosis. In addition, advanced age is also an independent risk factor for DVT[30]. In the elderly, not only is the aging of vascular endothelial cells accelerated, but vascular tension, permeability and the regulatory ability of the vascular wall are also reduced, making vascular homeostasis difficult to maintain and thrombosis more likely[31, 32].
Blood hypercoagulability is one of the mechanisms of thrombus formation, so the results of coagulation function tests are closely related to thrombus formation. Previous studies have found a correlation between fibrinogen, INR, D-dimer and thrombus[33, 34], which is consistent with our results. High blood lipids and cholesterol keep the body in a low-grade inflammatory state for a long time and increase blood viscosity[35]. Serum albumin, which inhibits platelet aggregation and has anticoagulant activity, has been found to be closely related to nutritional status and inflammatory response[36, 37]. Studies have found that serum albumin level is a predictor of left atrial thrombus in elderly patients with nonvalvular atrial fibrillation[38, 39]. However, no research has so far found a correlation between serum albumin and DVT, and the relationship between them needs further verification by higher-quality research.
Blood type is an important characteristic of DVT in patients with lower extremity fractures. Li et al.[40] found that patients with blood type B had a higher risk of DVT. In contrast, Haddad et al. claimed that blood type O does not increase the chance of DVT. However, a higher-quality systematic review suggests that ABO blood type is closely related to the occurrence of DVT in cancer patients. The reason may be that ABO blood type is closely related to the level of vWF, which acts as a protective carrier of coagulation factor VIII and can promote the formation of thrombus[41].
A growing body of evidence shows a close relationship between the inflammatory immune response and DVT[42, 43]. In addition, C-reactive protein, platelets and neutrophils play an important role in the development of DVT[44]. Many studies have also shown that severe trauma, stress, sepsis and blood loss can all lead to systemic inflammatory response syndrome, which increases the reactivity of neutrophils and platelets in the body[45–47]. Neutrophils provide the initial stimulus for DVT and recruit other cells in the coagulation process. Platelets produce circulating microparticles that accelerate blood coagulation[48]. Over the past few years, many scholars have investigated the association between inflammation and venous thrombosis. Kyril et al.[49] analyzed all thrombosis patients in their Thrombosis Research Center and found that a high platelet-to-lymphocyte ratio increased the risk of DVT threefold.
Overall, we analyzed various possible predictors of DVT and compared the predictive performance of several machine learning methods. AUC is the area under the ROC curve: the higher the AUC, the better the model discriminates between patients with and without the outcome. Although the AUC of the LR model is the largest, its accuracy is not as good as that of the XGBoost model. When AUC, accuracy, sensitivity, specificity and F1 score are considered together, the XGBoost model performs best overall. XGBoost is an optimized distributed gradient boosting library designed to be efficient, flexible and portable. It uses the second-order Taylor expansion of the loss function as a surrogate objective and minimizes it to determine the best split point and the leaf node output values of each regression tree. In addition, XGBoost addresses regularization explicitly by adding to the objective a penalty on the number of leaf nodes and on the magnitude of the leaf node output values. In terms of efficiency, XGBoost greatly improves modeling speed through approximate split-point finding and parallelized split search, together with the second-order information used in the optimization.
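The split criterion described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the XGBoost library code and not the configuration used in this study: the function names are hypothetical, the loss is assumed to be the logistic loss, `lam` and `gamma` stand for the penalties on leaf values and on the number of leaves, and the search is the exact greedy scan rather than the approximate split finding that XGBoost uses on large data.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logistic_grad_hess(y: int, margin: float) -> Tuple[float, float]:
    """First and second derivative of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y, p * (1.0 - p)


def leaf_weight(g_sum: float, h_sum: float, lam: float) -> float:
    """Leaf output that minimizes the regularized second-order objective."""
    return -g_sum / (h_sum + lam)


def leaf_score(g_sum: float, h_sum: float, lam: float) -> float:
    """Structure score G^2 / (H + lambda) of a single leaf."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(g_l: float, h_l: float, g_r: float, h_r: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Reduction in the objective from splitting one leaf into two.
    gamma is the cost charged for the extra leaf."""
    parent = leaf_score(g_l + g_r, h_l + h_r, lam)
    children = leaf_score(g_l, h_l, lam) + leaf_score(g_r, h_r, lam)
    return 0.5 * (children - parent) - gamma


def best_split(xs: Sequence[float], grads: Sequence[float], hess: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0) -> Tuple[Optional[float], float]:
    """Exact greedy scan over one feature; returns (threshold, gain).
    Returns (None, 0.0) when no split has positive gain."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best_threshold: Optional[float] = None
    best_gain = 0.0
    for pos in range(len(order) - 1):
        i, j = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if xs[i] == xs[j]:
            continue  # cannot split between identical feature values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, lam, gamma)
        if gain > best_gain:
            best_gain, best_threshold = gain, (xs[i] + xs[j]) / 2.0
    return best_threshold, best_gain


# Toy example (hypothetical values): one feature, four patients, initial score 0.
xs: List[float] = [1.0, 2.0, 3.0, 4.0]
ys: List[int] = [0, 0, 1, 1]
gh = [logistic_grad_hess(y, 0.0) for y in ys]
grads = [g for g, _ in gh]
hess = [h for _, h in gh]

threshold, gain = best_split(xs, grads, hess, lam=1.0, gamma=0.0)
left_value = leaf_weight(sum(grads[:2]), sum(hess[:2]), lam=1.0)
```

Raising `gamma` makes a split less likely to be accepted, and raising `lam` shrinks the leaf values toward zero, which is how the two penalty terms limit tree complexity.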
In the actual clinical environment, the occurrence and development of DVT result from the long-term interaction of multiple risk factors, and the progression of the disease is complex as well. Judging the impact of a predictor on the occurrence of DVT solely by whether there is a statistical difference between groups does not match the clinical situation or the pathophysiology of DVT. A machine learning algorithm can present the degree of influence of all predictors, and the influence of each predictor on the model outcome can be reflected by its importance ranking[50, 51]. Therefore, a prediction model based on machine learning can more fully reflect the impact of the predictors on DVT, and its results are closer to actual clinical diagnosis and treatment.
As shown in Table 1, we included variables possibly associated with DVT in the current algorithm to obtain accurate predictions. However, some limitations need to be considered when interpreting the findings of this study. First, detailed information on medication and lifestyle during hospitalization is lacking, such as antiplatelet, anticoagulant, antihypertensive, hypoglycemic, chemotherapy and antibiotic drugs, diet and physical activity, which may lead to some confounding effects. Second, we were unable to obtain symptoms and signs at the site of DVT, such as pain, cramping, heaviness, pruritus and varicose veins, which hinders a more comprehensive analysis. Third, although vascular ultrasonography has gradually replaced venous angiography and is widely used, it is not the "gold standard". As a result, the possibility of false-positive or false-negative DVT diagnoses cannot be ruled out in this study. In addition, it was difficult for us to include all variables that may affect the statistical results. The relationship between the variables and DVT is one of correlation, not causation. Therefore, the findings should be interpreted in the context of the clinical situation. Finally, no external datasets were used for validation in this study, and all models were validated only with cross-validation and a test set during training.