Data Source
To explore a potential connection between thrombosis, sex, and mortality risk, we analyzed hospitalization data from over 200 geographically dispersed hospital systems. From this data we selected patients who were hospitalized for COVID-19 between March 3, 2020, and June 11, 2021. There were no exclusions for other comorbidities or underlying diseases. We limited our analysis to those patients who survived and were discharged from the hospital, those discharged to hospice care, those who died in the hospital, and those who died after being discharged to home. Patients who died soon after discharge to home or who were discharged to hospice care were treated as deceased cases due to COVID-19. COVID-19 readmission cases were excluded from our study.
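The outcome coding described above can be sketched as a small lookup. This is only an illustration: the disposition labels and the function name are hypothetical, since the text does not specify how dispositions were encoded in the data.

```python
from typing import Optional

# Hypothetical disposition labels (assumption: the source does not name its codes).
DECEASED_DISPOSITIONS = {
    "died_in_hospital",
    "discharged_to_hospice",       # treated as deceased, per the text
    "died_after_discharge_home",   # treated as deceased, per the text
}
SURVIVED_DISPOSITIONS = {"discharged_alive"}


def classify_outcome(disposition: str, is_readmission: bool = False) -> Optional[str]:
    """Return 'deceased', 'survived', or None if the record is outside the cohort."""
    if is_readmission:
        # COVID-19 readmission cases were excluded from the study.
        return None
    if disposition in DECEASED_DISPOSITIONS:
        return "deceased"
    if disposition in SURVIVED_DISPOSITIONS:
        return "survived"
    # Any other disposition falls outside the analysis set.
    return None
```

Returning `None` for records outside the cohort keeps exclusion explicit rather than silently assigning an outcome.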
The data for this study were sourced from electronic medical record (EMR) data and post-EMR coding. These EMRs were processed with a natural language processing (NLP) engine that produces a homogenized set of codified and non-codified information, including lab results, medications, symptoms, and various observational extracts of text from the EMR. The at-scale extraction of this detailed information allows insight into the clinical manifestations of the COVID-19 population. We compared the subset of patients who survived COVID-19 with the subset who expired. Within each group, we compared the incidence in males and females of receiving a thrombosis diagnosis code while hospitalized for COVID-19.
Outcomes and Study Variables
The primary outcome of interest was mortality from COVID-19. The exposure variable was biological sex, and the key variable of interest was thrombosis. We classified thrombotic diagnosis codes into four major conditions: (1) myocardial infarction (MI), (2) deep vein thrombosis/pulmonary embolism (DVT/PE), (3) stroke, and (4) peripheral artery occlusion (PAO). A full list of the ICD-10 codes used to identify thrombosis can be found in Supplementary Table e1.
Because thrombotic diagnoses are underreported in inpatients with COVID-19,14 we also examined the peak D-dimer value of the subset of hospitalized patients in whom this was measured. A D-dimer level ≥ 3.0 μg/mL was used as a surrogate for underlying thrombosis when a thrombotic diagnosis code was absent.
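As a minimal sketch of this surrogate rule, the flag below is true when a thrombotic diagnosis code is present or the peak D-dimer reaches the cutoff. The function and parameter names are hypothetical, and treating an unmeasured D-dimer as "not elevated" is an assumption made here for illustration, not something the text specifies.

```python
from typing import Optional

D_DIMER_CUTOFF_UG_ML = 3.0  # surrogate threshold stated in the text


def has_elevated_thrombotic_risk(
    has_thrombosis_code: bool,
    peak_d_dimer_ug_ml: Optional[float],
) -> bool:
    """Diagnosis code present, or peak D-dimer at or above the cutoff."""
    if has_thrombosis_code:
        return True
    if peak_d_dimer_ug_ml is None:
        # Assumption: with no measurement, there is no surrogate evidence.
        return False
    return peak_d_dimer_ug_ml >= D_DIMER_CUTOFF_UG_ML
```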
Statistical Methods
To determine whether the elevated thrombotic risk in males is associated with higher COVID-19 mortality, we used mediation analysis,15 with elevated thrombotic risk as the mediator between the exposure variable (sex) and the outcome (mortality). Mediation analysis consists of three steps, which are illustrated in Figure 1. First, we needed to confirm that male patients had a higher mortality rate than female patients in our study population. Second, we needed to confirm that males had a higher incidence of a thrombosis diagnosis than females. Third, we needed to demonstrate that both males and females with a thrombosis diagnosis had a higher mortality rate than those without one. If all three conditions are supported by statistically significant differences between groups, we can combine the regression models from the first and third steps to estimate the proportion of the effect of sex on mortality that can be explained by the elevated thrombotic risk. In addition to estimating this proportion, we examined the effect of a thrombosis diagnosis together with evidence of a D-dimer level ≥ 3.0 μg/mL, recognizing that such levels are also likely indicative of underlying thrombosis.
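One common way to write the three steps, assuming a logistic model at each step, is shown below. The notation is introduced here for illustration and is not taken from the source: Y is the mortality indicator, S is an indicator for male sex, M is the thrombosis indicator, and X is the vector of adjustment covariates. Whether the third model includes S as a covariate is also an assumption.

```latex
\begin{align*}
\text{Step 1:}\quad \operatorname{logit} P(Y = 1 \mid S, X) &= \alpha_1 + c\,S + \gamma_1^{\top} X \\
\text{Step 2:}\quad \operatorname{logit} P(M = 1 \mid S, X) &= \alpha_2 + a\,S + \gamma_2^{\top} X \\
\text{Step 3:}\quad \operatorname{logit} P(Y = 1 \mid S, M, X) &= \alpha_3 + c'\,S + b\,M + \gamma_3^{\top} X
\end{align*}
```

Under this notation, the three conditions correspond to c > 0, a > 0, and b > 0, respectively.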
Logistic regression models were used in each of the three steps. The response variables and covariates of interest are illustrated in Figure 1. We used the Wald test for regression coefficients in the logistic regression models to assess statistical significance in all steps. All hypothesis tests were one-sided at the 0.05 significance level. We report 95% confidence intervals and one-tailed p-values. We chose the more powerful one-sided test because all three associations described in Figure 1/Table 2 (male sex and higher risk of mortality, male sex and higher risk of thrombosis, and thrombosis and higher risk of mortality) are well documented in the literature, as discussed in the Introduction, and we only wanted to test whether these hypotheses held in the directions supported by the literature. All regression models were adjusted for age (a binary variable indicating age > 65) and comorbidities derived from the Charlson Index.16 These comorbidities can be found in Supplementary Material Table e2. Note that two comorbidities, myocardial infarction and cerebrovascular disease, were not adjusted for in our regression models because they overlap with our thrombosis definitions shown in Supplementary Material Table e1.
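For illustration, a one-sided Wald p-value for a single coefficient can be computed from its estimate and standard error as in the sketch below. It assumes the usual large-sample normal approximation and an alternative hypothesis that the coefficient is positive; the function name is hypothetical, and this is not the study's own code.

```python
import math


def one_sided_wald_p_value(beta_hat: float, std_err: float) -> float:
    """Upper-tail p-value for H0: beta <= 0 versus H1: beta > 0."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    z = beta_hat / std_err  # Wald z-statistic
    # Upper-tail probability of the standard normal: 1 - Phi(z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For a positive z-statistic, this one-sided p-value is half the corresponding two-sided p-value, which is the source of the additional power of the one-sided test.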
The proportion of excess male mortality explained by the elevated thrombotic risk is defined as the ratio of the mediation effect to the total effect (mediation effect + direct effect) of sex on mortality. This proportion was estimated using previously described methods,17 as implemented in the R package “mediation”.18 Our R code for mediation analysis using this package can be found in Supplementary Materials III. Because thrombotic diagnoses are underreported in inpatients, as discussed above, we estimated the proportion based on the presence of a thrombosis diagnosis only, as well as on a definition of elevated thrombotic risk that included either a D-dimer value ≥ 3.0 µg/mL19 or the presence of a thrombotic diagnosis. We also report the proportion explained by thrombosis diagnoses only in the subset of patients whose D-dimer was measured, to check that the proportion explained by elevated thrombotic risk was not subject to selection bias. The upper bound of the D-dimer normal range (0.5 µg/mL) was used to normalize D-dimer values.
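The ratio defined above can be illustrated with a short helper. The function and argument names are hypothetical, and the effect estimates themselves would come from the mediation analysis; this sketch only shows the arithmetic of the definition.

```python
def proportion_mediated(mediation_effect: float, direct_effect: float) -> float:
    """Share of the total effect (mediation + direct) attributable to the mediator."""
    total_effect = mediation_effect + direct_effect
    if total_effect == 0:
        raise ValueError("total effect is zero; the proportion is undefined")
    return mediation_effect / total_effect
```

For example, with a mediation effect of 1.0 and a direct effect of 3.0 (made-up numbers on an arbitrary scale), the proportion explained would be 0.25.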
Patient and Public Involvement
As this study represents an exploratory secondary analysis of existing, de-identified patient data, no IRB review and approval were required, and no patients or members of the public were involved in the design, conduct, reporting, or dissemination plans of our research.