This retrospective study presents a prediction model for VTE in hospitalized COVID-19 patients, together with the demographics, clinical parameters, and incidence rate of VTE in COVID-19 inpatients. The incidence of VTE may have been underreported because radiological testing was limited during the first wave to reduce staff exposure to COVID-19 infection[18]. We observed an incidence rate of 6.68%, similar to that reported in other studies (Table 4B). Patients who developed new-onset VTE had a longer hospital LOS (12.2 vs. 8.8 days, p<0.001) and ICU LOS (3.8 vs. 1.9 days, p<0.001) than COVID-19 patients who did not develop VTE. We developed the prediction model for VTE in hospitalized patients with COVID-19 using a large multicenter database (N=3531). We included 85 variables spanning a broad range of parameters: demographics, vital signs, comorbidities, and hospital course (oxygen requirement, ICU admission, and hospital and ICU LOS). Electrolytes, renal function, blood pressure, hepatic enzymes, and inflammatory markers were indicators of VTE risk; however, further studies are needed to determine whether cutoff values for inflammatory markers can provide good sensitivity and specificity for VTE in COVID-19 infection. Physicians can assess patients' presenting signs and renal and hepatic function to identify patients at high risk of VTE and address reversible risk factors to reduce the risk of VTE during hospitalization. Notably, we used presenting data, that is, the initial values recorded when patients were admitted to the hospital. Models that do not handle missing data, such as multiple logistic regression, are fitted on smaller samples, which can affect their performance. Our MLR model had an R² of 0.2569 (p<0.0001). The R² values of the MLR and LR models are low, which is consistent with our decision to exclude laboratory values with missing data rather than impute them.
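For logistic models, R² is usually a likelihood-based pseudo-R² rather than the least-squares R² of linear regression. Assuming the reported value is McFadden's pseudo-R² (an assumption; the text does not name the variant), it compares the log-likelihood of the fitted model with that of an intercept-only model that predicts the event prevalence for every patient. A minimal Python sketch with hypothetical outcomes and predicted probabilities, not study data:

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden pseudo-R^2: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y) / len(y)  # intercept-only model predicts the observed prevalence
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1 - ll_model / ll_null


# Hypothetical example: 1 = VTE, 0 = no VTE, with model-predicted probabilities
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p = [0.05, 0.10, 0.08, 0.20, 0.04, 0.15, 0.30, 0.60, 0.10, 0.40]
print(f"McFadden pseudo-R^2: {mcfadden_r2(y, p):.3f}")
```

A model that simply predicts the prevalence scores 0, so the statistic measures improvement over that baseline. Its values are typically much smaller than an ordinary least-squares R² for a comparably useful model, so the two should not be read on the same scale.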
The decision tree had a lower R² (0.19 in the training set and 0.11 in the testing set), although R² is likely not an appropriate metric for a tree-based model. The random forest model, in contrast, had a low misclassification rate (6.87% in the training set and 8.4% in the testing set). Overall, our R² values were low. A decision tree may be less accurate than a random forest, but its structure is easy to understand and interpret: the splitting nodes identify key factors, and predictions can be traced directly through the tree. Random forests, on the other hand, are ensembles of decision trees whose predictions are aggregated across all trees, making them a "black box" that cannot be described directly. One possible explanation for the low R² values is that our study cohort has an inherently high amount of unexplained variability, which could be better addressed in future prospective studies.
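Misclassification rate is hard to interpret on its own when events are rare: a trivial rule that predicts "no VTE" for every patient misclassifies exactly the VTE cases. The Python sketch below compares the reported random forest error rates with that no-information baseline, assuming for illustration that the training and testing sets share the overall 6.68% prevalence; the function names are hypothetical.

```python
def no_information_error(prevalence):
    """Error rate of a trivial rule that always predicts the majority class."""
    return min(prevalence, 1 - prevalence)


def excess_error(model_error, prevalence):
    """Model error minus the trivial baseline (positive = worse than always predicting the majority class)."""
    return model_error - no_information_error(prevalence)


PREVALENCE = 0.0668  # overall VTE incidence reported for the cohort (assumed equal in each split)
REPORTED = {"training": 0.0687, "testing": 0.084}  # reported random forest misclassification rates

for split, err in REPORTED.items():
    base = no_information_error(PREVALENCE)
    print(f"{split}: model {err:.2%}, baseline {base:.2%}, excess {excess_error(err, PREVALENCE):+.2%}")
```

Under that assumption, both reported error rates sit slightly above the baseline, suggesting that prevalence-aware metrics such as PPV, NPV, and F1 are more informative here than misclassification rate alone.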
Of the 3532 records, only 1282 patients were included in the MLR model because of missing values in the remaining patients. Similarly, only 1282 records were used in the LR model, less than 50% (about 36%) of the records. Although IL-6, LDH, procalcitonin, ferritin, and fibrinogen were excluded from model building because of substantial missingness, we found no significant difference in these values between the VTE and non-VTE groups.
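Complete-case (listwise) deletion, which is what excluding patients with missing values amounts to, drops any patient missing even one model variable, so the analyzable sample shrinks quickly as laboratory variables are added. A minimal Python sketch; the variable names and records are hypothetical:

```python
def complete_cases(records, variables):
    """Listwise deletion: keep records with a non-missing value for every model variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical admission records; None marks a missing laboratory value
records = [
    {"age": 67, "creatinine": 1.1, "d_dimer": 2.4, "crp": 88.0},
    {"age": 54, "creatinine": 0.9, "d_dimer": None, "crp": 40.0},
    {"age": 72, "creatinine": None, "d_dimer": 5.1, "crp": None},
    {"age": 61, "creatinine": 1.4, "d_dimer": 1.2, "crp": 120.0},
]
model_vars = ["age", "creatinine", "d_dimer", "crp"]

kept = complete_cases(records, model_vars)
print(f"{len(kept)} of {len(records)} records usable")

# Same calculation with the study's reported counts
print(f"{1282 / 3532:.1%} of records retained")
```

Dropping a sparsely recorded variable from `model_vars` immediately restores the records that were missing only that value, which is the trade-off behind excluding IL-6, LDH, procalcitonin, ferritin, and fibrinogen.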
Our model can provide clinical risk stratification for VTE in COVID-19 patients and help individualize thromboprophylaxis, in line with the current consensus in international guidelines favoring individualized, risk-adapted thromboprophylaxis[19]. Four papers studied VTE in COVID-19 patients using existing prediction models (Table 4). Kampouri et al. combined the Wells score with D-dimer values to predict VTE, with a PPV of 18.2%, an NPV of 98.5%, and an accuracy of 0.905[20]. A Dutch study reported a 41.7% incidence of VTE in COVID-19 patients and built a linear regression model consisting of D-dimer >9 μg/mL and CRP >280 mg/L; the authors reported a predicted probability of 92%[21]. Taplin et al. used a modified Caprini score, which also incorporates D-dimer, with a cutoff value of 12, and reported a sensitivity of 73% and a specificity of 84% for predicting VTE[22]. Unlike our study, these studies had much smaller sample sizes and fewer events, and they included risk factors that were not analyzed in the original prediction model studies. Notably, model performance measures such as PPV and NPV depend on event prevalence. Among all studies, the Dutch study reported the highest predicted probability, in a critically ill population with a higher incidence of VTE. Across studies, the prevalence of DVT has also been reported to be higher with higher mean D-dimer values (prevalence ratio 1.04 per 1000 ng/mL increase; 95% CI: 1.01, 1.07, p=0.022).
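The dependence of predictive values on prevalence follows from Bayes' theorem: for fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases. The Python sketch below applies the sensitivity (73%) and specificity (84%) reported by Taplin et al. to two prevalences quoted in this section, 6.68% (our cohort) and 41.7% (the Dutch study). This is purely illustrative: it assumes these test characteristics would transfer unchanged across populations, which was not evaluated.

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prev):
    """Negative predictive value via Bayes' theorem."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.73, 0.84  # modified Caprini score, as reported by Taplin et al.

for label, prev in [("our cohort", 0.0668), ("Dutch study", 0.417)]:
    print(f"{label}: prevalence {prev:.1%}, PPV {ppv(SENS, SPEC, prev):.1%}, NPV {npv(SENS, SPEC, prev):.1%}")
```

With these inputs, the PPV is roughly 25% at 6.68% prevalence and roughly 77% at 41.7% prevalence, while the NPV falls from about 98% to about 81%, even though sensitivity and specificity are held fixed.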
Following a systematic review, we identified six additional studies, not discussed in the original manuscript, that reported the incidence of VTE in COVID-19 patients without prediction models (Table 4B). The 6.68% incidence of VTE in our study is consistent with three of these studies, whereas Freund et al. reported a rate of 15% and two studies reported a lower incidence of 2-3%. In these studies, critically ill COVID-19 patients admitted to the ICU had a higher incidence of VTE. Only two of the studies identified risk factors for VTE in COVID-19 patients using MLR models; these risk factors included advanced age, elevated creatinine, history of cardiovascular disease, ICU admission, elevated D-dimer, male sex, heart rate, clinical signs of DVT, and recent immobilization. Unlike other studies, we did not impute missing values, in order to build a model that better predicts VTE at the individual level.
Our study analyzed D-dimer, lactate, and inflammatory markers, including CRP, ferritin, and LDH, which are of great clinical interest and have been routinely ordered for COVID-19 patients. Although use of these laboratory tests varies, many physicians trend these markers to predict the trajectory of COVID-19 patients; however, few studies have included them in VTE analyses. Presenting CRP, IL-6, and LDH levels did not differ significantly between the VTE and non-VTE groups (Table 1), yet maximum D-dimer, CRP, and LDH values were significantly higher in the VTE group. This suggests that D-dimer, CRP, and LDH could be useful for clinical monitoring. However, further studies are needed to establish thresholds for these markers and their sensitivity and specificity.
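One common way to propose a marker threshold, offered here only as an illustration of the kind of analysis such studies might use, is to scan candidate cutoffs and maximize Youden's J statistic (sensitivity + specificity − 1). The Python sketch below uses hypothetical peak D-dimer values and VTE labels, not study data, and assumes both classes are present so neither denominator is zero.

```python
def sens_spec(values, labels, cutoff):
    """Sensitivity and specificity of the rule 'marker >= cutoff predicts VTE'."""
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
    fn = sum(1 for v, y in zip(values, labels) if y == 1 and v < cutoff)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
    fp = sum(1 for v, y in zip(values, labels) if y == 0 and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J over the observed marker values."""
    best = None
    for c in sorted(set(values)):
        se, sp = sens_spec(values, labels, c)
        j = se + sp - 1
        if best is None or j > best[1]:  # keep the first cutoff reaching the maximum
            best = (c, j)
    return best


# Hypothetical peak D-dimer values (ug/mL) and outcomes (1 = VTE)
values = [0.4, 0.8, 1.1, 1.5, 2.0, 2.7, 3.5, 4.2, 6.0, 9.5]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

cutoff, j = youden_cutoff(values, labels)
print(f"cutoff {cutoff}, Youden J {j:.3f}")
```

In practice, a cutoff chosen this way should be confirmed in an independent cohort, because maximizing J on the same data tends to overstate its performance.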
A meta-analysis of 47 studies showed that the prevalence of PE was higher across studies with higher mean D-dimer values (prevalence ratio 1.3 per 1000 ng/mL increase; 95% CI: 1.11, 1.50, p=0.002) and a higher percentage of ICU patients (1.02 per 1% increase; 95% CI: 1.01, 1.03, p<0.001). The current American Society of Hematology (ASH) guideline suggests using prophylactic-intensity rather than intermediate-intensity anticoagulation for patients with COVID-19-related critical illness who do not have suspected or confirmed VTE. Furthermore, ASH suggests that an individualized assessment of each patient's thrombotic and bleeding risk is important when deciding on anticoagulation intensity. Our model could aid physicians in such risk stratification, as VTE is a well-recognized complication of COVID-19.
We observed that 11.5% of patients without VTE (N=302) received therapeutic-dose AC, whereas 74.3% of patients with VTE (N=162) received only prophylactic AC. This suggests an unmet need for VTE risk stratification in COVID-19 patients. Vaughn et al. reported that 16.2% of patients with suspected VTE received therapeutic AC and observed increased use of treatment-dose anticoagulation for VTE prophylaxis[23]. The INSPIRATION trial showed no difference between routine empirical intermediate-dose and standard-dose prophylactic AC in ICU patients in the primary composite outcome of acute VTE, arterial thrombosis, treatment with extracorporeal membrane oxygenation, and all-cause mortality (absolute risk difference, 1.5% [95% CI: −6.6, 9.8]; OR: 1.06 [95% CI: 0.76, 1.48]; P = 0.70)[16]. The Antithrombotic Therapy to Ameliorate Complications of COVID-19 (ATTACC) randomized, multicenter, adaptive-design trial showed therapeutic anticoagulation to be beneficial in moderately ill patients, whereas it was futile in ICU patients requiring organ support[24]. It remains unclear why many patients received only prophylactic anticoagulation after a diagnosis of VTE.
Our study has both strengths and limitations. Its strengths include the large sample size, multi-institutional data, and the availability of data on a broad range of outcome events. Moreover, our VTE prediction model may be most useful in clinical settings where a definitive diagnosis of VTE is difficult to obtain, for example, in critically ill patients on mechanical ventilation who cannot undergo chest CT angiography (CTA). An important limitation is that, because this was a retrospective study using a large database from SMCRD, we could not obtain the timing of acute VTE diagnosis in our cohort, which would have allowed us to explore the temporal relationship between VTE and potential risk factors. Furthermore, although our models showed good predictive capacity, the low incidence of VTE in the study population posed significant challenges: the random forest model had a PPV of 26%, an NPV of 97%, and an F1 score of 0.36. Future studies of a composite outcome including both venous and arterial events could provide more events for analysis. In addition, the random forest model is not a penalized method and carries a risk of overfitting. Lastly, our model needs external validation.
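Because the F1 score is the harmonic mean of PPV (precision) and sensitivity (recall), the reported PPV and F1 together imply a sensitivity. The Python sketch below solves F1 = 2·PPV·Se/(PPV + Se) for Se, assuming both reported values refer to the same classification threshold. The result is a derived estimate, not a value reported in the study, and rounding in the inputs makes it approximate.

```python
def implied_sensitivity(ppv, f1):
    """Solve F1 = 2*PPV*Se / (PPV + Se) for Se (requires 2*PPV > F1)."""
    if 2 * ppv <= f1:
        raise ValueError("F1 must be below 2*PPV for a positive sensitivity")
    return f1 * ppv / (2 * ppv - f1)


def f1_score(ppv, sensitivity):
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


se = implied_sensitivity(0.26, 0.36)  # reported random forest PPV and F1
print(f"implied sensitivity ~ {se:.1%}")  # derived estimate, not a reported result
```

With PPV = 0.26 and F1 = 0.36, the implied sensitivity is about 58.5%; recomputing F1 from these two values recovers 0.36, which serves as a consistency check.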