Experimental Design
This was a prognostic analysis of patients who (1) were admitted to Michigan Medicine between March 10, 2020 (the date of the first case in the state) and March 31, 2022 (the cutoff date of the released EHR data), (2) tested positive for COVID-19 or were transferred in with a positive diagnosis, and (3) had at least one COVID-related chest X-ray taken. We focused on patients with X-rays because patients without imaging were generally much younger and healthier, and because imaging is valuable for triaging patients and managing resources (Jiao et al, 2021). Our outcome was the time from admission until in-hospital death, censored by discharge or the end of the study. Discharge was treated as a censoring event, with the exception of discharge to hospice: because the median survival of these patients was less than 30 days post-discharge, making hospice discharge a strong precursor to death, we considered both in-hospital death and discharge to hospice as failure events (see Supplement A).
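For concreteness, the following is a minimal sketch of how such an outcome could be constructed with pandas (used here for illustration only; it is not among the packages the study lists). The table and its column names are hypothetical placeholders, not the study's schema.

```python
import pandas as pd

# Hypothetical admissions table; column names are illustrative only.
adm = pd.DataFrame({
    "admit_time": pd.to_datetime(["2020-03-15", "2020-04-02", "2020-04-10"]),
    "end_time": pd.to_datetime(["2020-03-28", "2020-04-20", "2022-03-31"]),
    "disposition": ["in_hospital_death", "discharged_home", "discharged_hospice"],
})

# Follow-up time: days from admission to death, discharge, or the study cutoff.
adm["time"] = (adm["end_time"] - adm["admit_time"]).dt.days

# In-hospital death and discharge to hospice count as failure events;
# all other discharges and administrative end of study are censored.
adm["event"] = adm["disposition"].isin(["in_hospital_death", "discharged_hospice"])
```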
From the EHR database, we extracted and created a set of demographic, socioeconomic, and clinical risk factors (see Supplement B) identified in the literature as being related to COVID-19 (Rod et al, 2020; Wu and McGoogan, 2020; Jordan et al, 2020; Mikami et al, 2021; Kim et al, 2021; Rosenthal et al, 2020; Centers for Disease Control and Prevention, 2022b; Ebinger et al, 2020; Williamson et al, 2020; Alqahtani et al, 2020; Khan et al, 2020; Ssentongo et al, 2020; Yang et al, 2020; Wang et al, 2020; Salerno et al, 2021a). Patient demographics included age, sex, race (Black or non-Black), ethnicity (Hispanic or non-Hispanic), smoking status, alcohol use, and drug use. As patient-level socioeconomic factors were unavailable, we created four composite socioeconomic measures at the US census tract level based on patient residences. These composites, measuring affluence, disadvantage, ethnic immigrant concentration, and education, were defined as the proportion of adults within a census tract meeting the corresponding criterion (Clarke and Melendez, Ann Arbor, MI; Gu et al, 2020; Salerno et al, 2021b) and were further categorized by quartiles. For each of twenty-nine prevalent comorbid conditions commonly used in the literature (Crabb et al, 2020; Elixhauser et al, 1998; van Walraven et al, 2009; Quan et al, 2005), we defined a binary indicator flagging whether the patient had any associated ICD-10 code at admission. Lastly, we obtained physiologic measurements within 24 hours of admission, including body mass index (kg/m²), oxygen saturation, body temperature, respiratory rate, diastolic and systolic blood pressure, and heart rate.
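As an illustration of the comorbidity indicators and neighborhood measures, the sketch below derives binary comorbidity flags from admission ICD-10 codes and quartile-categorizes a census tract-level composite. The code-to-comorbidity mapping, column names, and values are hypothetical, not the study's definitions.

```python
import pandas as pd

# Hypothetical inputs: one row per (patient, ICD-10 code) recorded at admission,
# and a toy mapping from ICD-10 prefixes to comorbidity groups (placeholders only).
dx = pd.DataFrame({"patient_id": [1, 1, 2], "icd10": ["E11.9", "I10", "J44.9"]})
code_to_comorbidity = {"E11": "diabetes", "I10": "hypertension", "J44": "copd"}

# Binary indicator per patient and comorbidity: 1 if any associated code is present.
dx["comorbidity"] = dx["icd10"].str[:3].map(code_to_comorbidity)
flags = (
    dx.dropna(subset=["comorbidity"])
      .assign(flag=1)
      .pivot_table(index="patient_id", columns="comorbidity",
                   values="flag", aggfunc="max", fill_value=0)
)

# Quartile categorization of a tract-level composite (e.g., disadvantage).
tracts = pd.DataFrame({"disadvantage": [0.10, 0.30, 0.20, 0.50, 0.70, 0.40, 0.60, 0.80]})
tracts["disadvantage_q"] = pd.qcut(tracts["disadvantage"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
```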
Because multiple X-rays could be taken for one patient, we chose the one closest to the time of admission and examined its role in predicting patient survival. We pre-processed each image according to the pipeline depicted in Fig. 2. First, prior to feature extraction and selection, we retained only those images taken in the anterior-posterior or posterior-anterior position so that image orientation would be comparable. We then normalized these images so that the pixel intensities of each image conformed to a standard range of 0 (black) to 255 (white). Finally, we used histogram equalization to enhance the contrast of the images (Jain, 1989).
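A minimal sketch of these pre-processing steps is given below. It uses scikit-image for histogram equalization, and the function name and view-position handling are illustrative assumptions rather than the study's implementation.

```python
import numpy as np
from skimage import exposure  # scikit-image; assumed here, not listed in the paper

def preprocess_cxr(img, view_position):
    """Pre-process one chest X-ray as described above (illustrative sketch)."""
    # Keep only frontal (AP/PA) projections so image orientation is comparable.
    if view_position not in ("AP", "PA"):
        return None

    # Min-max normalize pixel intensities to the 0 (black) to 255 (white) range.
    img = img.astype(np.float64)
    img = 255.0 * (img - img.min()) / max(img.max() - img.min(), 1e-8)

    # Histogram equalization to enhance contrast; equalize_hist returns values
    # in [0, 1], so rescale back to 0-255.
    img = exposure.equalize_hist(img.astype(np.uint8)) * 255.0
    return img.astype(np.uint8)
```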
Broadly, there are two potential approaches to feature extraction: (1) artificial intelligence methods, which learn feature representations automatically from the data, and (2) engineered texture features, which are pre-specified and computed directly from the images. While deep learning has been shown to have high prognostic accuracy, learned features are difficult to interpret, not standardized, and often not reproducible, which may limit their reliability (Yip and Aerts, 2016). We therefore extracted a standard panel of engineered texture features according to the PyRadiomics workflow (van Griethuysen et al, 2017). Specifically, we applied six different filters (i.e., transformations) to the pre-processed images to acquire additional information (e.g., at edges or boundaries) and derive different image types (van Griethuysen et al, 2017). From the seven image types (the original plus six transformations), we extracted seven classes of features (e.g., shape features) from each image (Haralick et al, 1973; Chu et al, 1990; Thibault et al, 2013; van Griethuysen et al, 2017), resulting in 1,311 candidate image features. To obtain a short list of predictive clinical and image features, we performed feature screening by fitting a Cox proportional hazards model (Therneau and Grambsch, 2000) to each feature one at a time and retaining those significant at the 0.05 level. Finally, we selected the features with the highest importance and fit a final Cox model quantifying the adjusted associations of important clinical and radiomic features with in-hospital mortality. We used the concordance index (C-index) to assess the predictiveness of the models (Harrell Jr et al, 1996) (see Supplement C). This study was approved by the Michigan Medicine Institutional Review Board (HUM00192931), which waived informed consent given the secondary analysis of deidentified data. All analyses were conducted in accordance with relevant guidelines and regulations.
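As an illustration of the univariable screening step, the sketch below fits one Cox model per candidate feature and keeps those significant at the 0.05 level. It uses lifelines, which is our assumption; the paper does not name the package used for this step.

```python
import pandas as pd
from lifelines import CoxPHFitter  # assumed package; not named in the paper

def screen_features(df, feature_cols, alpha=0.05):
    """Fit a univariable Cox model for each feature and retain those with p < alpha.
    `df` must contain 'time' (follow-up) and 'event' (failure indicator) columns."""
    kept = []
    for col in feature_cols:
        cph = CoxPHFitter()
        cph.fit(df[["time", "event", col]], duration_col="time", event_col="event")
        if cph.summary.loc[col, "p"] < alpha:
            kept.append(col)
    return kept
```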
Statistical Analysis
We implemented five risk prediction algorithms, namely, the Cox proportional hazards model (Therneau and Grambsch, 2000), survival support vector machines (Pölsterl et al, Preprint posted online November 21, 2016), random survival forests (Ishwaran et al, 2008), survival gradient boosting (Hothorn et al, 2006), and ensemble averaging of the first four algorithms (Zhou, 2012). The Cox model, the most widely used method in survival analysis, assumes a risk function that is linear in the predictors. Survival support vector machines (Pölsterl et al, Preprint posted online November 21, 2016) can account for non-linear relationships. Both random survival forests and survival gradient boosting combine predictions from individual survival trees to achieve more powerful predictions (Ishwaran et al, 2008; Hothorn et al, 2006; Salerno and Li, Preprint posted online May 5, 2022). Ensemble averaging combines predictions from multiple models and often performs better than any individual model by averaging out their errors (Zhou, 2012). Supplement D details these methods.
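The first four learners are available in scikit-survival, which the paper reports using; a minimal sketch follows. The hyperparameters shown are defaults rather than the study's tuned values, and the standardize-then-average scheme for the ensemble is our assumption, as the paper does not state how predictions from the learners were combined.

```python
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.svm import FastSurvivalSVM
from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis

# Base learners; hyperparameters are defaults, not the study's tuned values.
base_learners = {
    "cox": CoxPHSurvivalAnalysis(),
    "svm": FastSurvivalSVM(random_state=0),
    "rsf": RandomSurvivalForest(n_estimators=200, random_state=0),
    "gbm": GradientBoostingSurvivalAnalysis(random_state=0),
}

def ensemble_risk(learners, X_train, y_train, X_test):
    """Fit each learner and average its standardized test-set risk scores.
    Standardizing before averaging is an assumption on our part."""
    scores = []
    for model in learners.values():
        model.fit(X_train, y_train)
        s = model.predict(X_test)  # model-specific risk scores
        scores.append((s - s.mean()) / s.std())
    return np.mean(scores, axis=0)
```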
We used cross-validation to estimate the predictiveness of each method without bias. We randomly split the data into 80% training and 20% testing samples, maintaining the proportion of events in the full sample within each split. We then trained the various predictive models on the training samples and computed the C-index on the testing samples. We repeated this procedure one hundred times and averaged the resulting C-indices to obtain an unbiased estimate of the C-index for each method (Uno et al, 2007). We applied each method first with the demographic and clinical predictors only, and then with the addition of the radiomic features, to assess their incremental prognostic utility via the C-index. Using ensemble averaging, which was the most predictive method (see the Results section), we developed a risk score to predict in-hospital mortality and classified patients into low- and high-risk groups using the median score as the cutoff.
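A sketch of this evaluation loop is shown below, assuming the survival outcome is stored as a scikit-survival structured array (e.g., built with sksurv.util.Surv.from_arrays, so that it has "event" and "time" fields).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.metrics import concordance_index_censored

def repeated_cindex(model, X, y, n_repeats=100, test_size=0.2):
    """Average test-set C-index over repeated 80/20 splits, stratified on the event
    indicator so each split preserves the proportion of events in the full sample."""
    scores = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y["event"], random_state=seed)
        model.fit(X_tr, y_tr)
        cindex, *_ = concordance_index_censored(y_te["event"], y_te["time"],
                                                model.predict(X_te))
        scores.append(cindex)
    return np.mean(scores)
```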
Lastly, we detail the variable selection process used to build the final Cox model. We selected clinical and image features based on their importance for prediction, defined as the absolute decrease in the C-index upon “removal” of the feature in question (Breiman, 2001). To do so, we randomly split the data into 80% training and 20% testing samples, fit the model on the training data, and calculated feature importance on the testing data (Supplement D.6). We repeated this procedure one hundred times, selected the features that were most important on average across the one hundred experiments, and included them in a multivariable Cox regression to assess their statistical associations with in-hospital mortality. All data processing and analyses were carried out with Python (version 3.8.8), NumPy (version 1.20.1), and scikit-survival (version 0.17.2).
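One common reading of this importance measure, consistent with the Breiman (2001) citation, is permutation importance: permute a feature in the test data and record the resulting drop in the C-index. The sketch below follows that interpretation, which is our assumption.

```python
import numpy as np
from sksurv.metrics import concordance_index_censored

def permutation_importance_cindex(model, X_test, y_test, seed=0):
    """Drop in test-set C-index when each column of X_test (a pandas DataFrame)
    is permuted; larger drops indicate more important features."""
    rng = np.random.default_rng(seed)
    base, *_ = concordance_index_censored(y_test["event"], y_test["time"],
                                          model.predict(X_test))
    importance = {}
    for col in X_test.columns:
        X_perm = X_test.copy()
        X_perm[col] = rng.permutation(X_perm[col].to_numpy())
        cindex, *_ = concordance_index_censored(y_test["event"], y_test["time"],
                                                model.predict(X_perm))
        importance[col] = base - cindex
    return importance
```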
We examined different subgroups to gauge how the prediction performance of the model improved with the addition of the radiomic features. Because, among the clinical factors, age and comorbidity burden were the most relevant to survival, we considered patient subgroups defined by age (> versus ≤ 65 years old) and by the number of comorbidities at admission (> versus ≤ the median of seven comorbidities). We then compared the change in prediction performance upon adding the radiomic features across these subgroups.
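A brief sketch of this subgroup comparison is given below, assuming fitted clinical-only and clinical-plus-radiomics models and a boolean mask identifying one subgroup (e.g., age > 65) in the test set; the function and argument names are illustrative.

```python
from sksurv.metrics import concordance_index_censored

def subgroup_gain(model_clin, model_full, X_clin, X_full, y, mask):
    """Change in test-set C-index from adding radiomic features within one subgroup,
    where `mask` is a boolean array selecting the subgroup's rows."""
    event, time = y["event"][mask], y["time"][mask]
    c_clin, *_ = concordance_index_censored(event, time, model_clin.predict(X_clin[mask]))
    c_full, *_ = concordance_index_censored(event, time, model_full.predict(X_full[mask]))
    return c_full - c_clin
```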