Machine Learning for COVID-19 Patient Management: Predictive Analytics and Decision Support

doi:10.21203/rs.3.rs-4368072/v1

Article

This work is licensed under a CC BY 4.0 License

Background. The global impact of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has profoundly affected economies and healthcare systems around the world, including Lebanon. While numerous meta-analyses have explored the systemic manifestations of COVID-19, few have linked them to patient history. Our study aims to fill this gap by using cluster analysis to identify distinct clinical patterns among patients, which could aid prognosis and guide tailored treatments.

Methods. We conducted a retrospective cohort study at Beirut's largest teaching hospital on 556 patients with SARS-CoV-2. We performed cluster analyses using the K-prototypes, KAMILA and LCM algorithms based on 26 variables, including laboratory results, demographics and imaging findings. Silhouette scores, the concordance index and signature variables helped determine the optimal number of clusters. Subsequent comparisons and regression analyses assessed survival rates and treatment efficacy according to cluster.

Results. Our analysis revealed three distinct clusters: "resilient recoverees" with varying disease severity and low mortality rates, "vulnerable veterans" with severe disease and high mortality rates, and "paradoxical patients" with a late severe presentation but eventual recovery.

Conclusions. These clusters offer insights for prognosis and treatment selection. Future studies should include vaccination data and various COVID-19 strains for a comprehensive understanding of the disease's dynamics.

Health sciences/Diseases/Infectious diseases/Viral infection

Health sciences/Medical research/Pre clinical studies

Biological sciences/Systems biology/Computer science

Biological sciences/Systems biology/Computer modelling

COVID-19

Clustering

Machine learning

K-prototypes

KAMILA

LCM

1. Background

Over the past four years, coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread around the world, affecting multiple economies and healthcare systems and aggravating Lebanon's already strained economy and status quo ¹. As an infectious disease of the respiratory tract, COVID-19 generally manifests with common symptoms such as fever, fatigue, headache, cough and sore throat ². However, clinical presentations and disease severity can vary considerably, depending on the circulating viral strain, comorbidities and the patient's immune constitution. Symptoms can range from none, as in over a third of infected individuals, to life-threatening, as in those with underlying acute respiratory failure ³. Predicting a patient's response to the virus and administering appropriate treatments to avoid an unfavorable and potentially fatal outcome may seem an avant-garde approach, but it is possible thanks to modern statistical algorithms for clustering patients.

Cluster analysis is a fundamental technique in data mining, designed to reveal patterns that may be hidden by the complexity of the data and to extract knowledge from them. It has applications in a variety of fields, e.g. socio-economic and medical, and has proved particularly useful for uncovering patterns in clinical data that might not be easily discerned by human analysis alone ⁴,⁵. It uses algorithms to divide data into groups of observations, or "clusters", so that observations within a cluster are as similar as possible while the clusters themselves are as dissimilar as possible ⁶. Such advances have led to a paradigm shift in medicine, where precision medicine is becoming tantamount to evidence-based medicine, particularly in areas such as cancer and metabolic diseases ⁷. For challenging diseases such as COVID-19, it may be of interest to identify distinct patient categories to enable a more personalized and rigorous approach to patient care.

We thus sought to classify patients with COVID-19 treated at the Hôtel Dieu de France Hospital in Beirut on the basis of their medical history and the biochemical and radiological results obtained on hospital presentation. The representative criteria of each resulting class were then compared, so that the clusters could be calibrated to support a proactive approach for each newly diagnosed COVID-19 patient admitted in Lebanon. Such a patient could then be matched with one of the studied clusters, each of which comes with clear, albeit probabilistic, treatment recommendations ready for implementation.

2. Methods

We conducted a single-center retrospective cohort study. We included 556 hospitalized patients with confirmed COVID-19 between September 22, 2019, and October 12, 2021. All statistical analyses were performed using R 4.3.1 (The R Foundation for Statistical Computing, Vienna, Austria) ⁸.

2.1. Data collection

The data were extracted from the Hôtel Dieu de France hospital electronic database. Only the first data values collected within 24 hours of hospital admission were used for cluster analysis (Table 1). The prognostic value of the different treatments administered was assessed by studying their effect on severity variables: contraction of nosocomial infections, development of pneumomediastinum, the composite fatal outcome (i.e., ICU admission, intubation and death), day of ICU transfer, date of intubation, occurrence of thromboembolic or hemorrhagic events and the corresponding dates, duration of hospitalization and all-cause death.

2.2. Data pre-processing and Clustering

The dataset comprises continuous and categorical/ordinal variables (Table 1). Scaling and normalization were applied to continuous variables to meet the requirements of each algorithm ⁹,¹⁰. Missing data were handled using the mice package ¹¹.
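As an illustration of the normalization step, the sketch below applies a rank-based (ordered quantile) transformation of the kind described in references 9 and 10. It is a minimal Python sketch, not the authors' R pipeline: the function names and the toy CRP values are hypothetical, and ties are assumed to receive their average rank.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def ordered_quantile_normalize(values):
    """Map each value to the standard normal quantile of (rank - 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Hypothetical, right-skewed CRP values (mg/L); not taken from the study.
crp = [3.0, 12.0, 12.0, 48.0, 250.0]
crp_normalized = ordered_quantile_normalize(crp)
```

The transformed values keep the original ordering while removing the skew, so that distance-based clustering is not dominated by a few extreme laboratory results.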

2.3. Cluster Analysis

The study used the 26 variables described in Table 1 and three clustering methods (K-prototypes, KAMILA ¹² and LCM ¹³) to cluster COVID-19 patients. The appropriate number of clusters was determined using silhouette scores ¹⁴ and Harrell's concordance index ¹⁵. The clinical relevance of the results was assessed on the basis of differences in survival and hospital stay between clusters, and the identification of signature variables likely to differentiate clusters. Further information on the clustering process can be found in Supplementary file 1.
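To make the silhouette criterion concrete for mixed data, the following Python sketch scores a cluster assignment with a K-prototypes-style dissimilarity (squared Euclidean distance on the continuous part plus a weight gamma times the number of categorical mismatches). The toy patients, the value of gamma and the function names are assumptions for illustration; the dissimilarity actually used for the silhouette in Supplementary file 1 may differ.

```python
def mixed_dissimilarity(a, b, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma * categorical mismatches."""
    num_a, cat_a = a
    num_b, cat_b = b
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def mean_silhouette(points, labels, dist):
    """Average silhouette width s = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        others = [v for c, v in mean_dist.items() if c != own]
        if own not in mean_dist or not others:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a, b = mean_dist[own], min(others)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical patients: ((scaled age, scaled CRP), (sex, diabetes)).
patients = [
    ((-1.0, -0.8), ("F", "no")), ((-0.9, -1.1), ("F", "no")),
    ((-1.2, -0.7), ("M", "no")), ((1.0, 0.9), ("M", "yes")),
    ((1.1, 1.2), ("M", "yes")), ((0.8, 1.0), ("M", "yes")),
]
coherent = mean_silhouette(patients, [0, 0, 0, 1, 1, 1], mixed_dissimilarity)
scrambled = mean_silhouette(patients, [0, 1, 0, 1, 0, 1], mixed_dissimilarity)
```

A partition that respects the structure of the data yields a silhouette close to 1, whereas an arbitrary partition yields a value near or below 0; comparing this score across candidate numbers of clusters is what guides the choice of k.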

2.4. Statistical analysis

Kaplan-Meier risk curves and logistic and Cox regressions were applied to four subclusters of patients, each defined by whether they were admitted to the ICU and whether they were superinfected or not (procalcitonin ≥ 0.5). Odds ratios (OR) and hazard ratios (HR) with 95% confidence intervals (CI) were used to assess treatment effects. Mann-Whitney U or Kruskal-Wallis tests were used for continuous variables, the latter being followed by Dunn's post-test ¹⁶. The chi-squared test (and goodness-of-fit test) and Fisher's exact test were used to compare categorical variables, followed by a post hoc test using Bonferroni correction ¹⁷.
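For readers less familiar with these quantities, the sketch below computes an odds ratio with a 95% confidence interval from a 2×2 table using the standard log (Woolf) method, and applies a Bonferroni adjustment to a set of p-values. It is an illustrative Python sketch: the counts and p-values are invented, all four cells are assumed to be non-zero, and the study's own estimates came from regression models in R rather than from this crude calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a: treated with outcome,    b: treated without outcome,
    c: untreated with outcome,  d: untreated without outcome.
    Assumes every cell is non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 10 of 40 treated vs 20 of 40 untreated reach the outcome.
or_value, ci_low, ci_high = odds_ratio_ci(10, 30, 20, 20)

# Hypothetical p-values from three pairwise cluster comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

An interval that lies entirely below 1 suggests that the treatment is associated with lower odds of the outcome, which is how the intervals reported in the regression results can be read.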

3. Results

3.1. Comparison of characteristics

After a systematic comparison of the three algorithms considered (K-prototypes, KAMILA and LCM), KAMILA proved to be the best clustering algorithm: it had the highest silhouette score and C-index and produced the highest number of signature features, indicating better clustering quality and differentiation ability than the other methods. A documented, step-by-step analysis is available in Supplementary file 1.
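The concordance index used in this comparison can be sketched as follows. This Python example implements Harrell's C for right-censored data, counting a pair as comparable when the patient with the shorter follow-up experienced the event. How a risk score was derived from cluster membership is not stated in the text, so the ordinal risk ranks below are an assumption, as are the toy follow-up times.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (days), death indicator, and cluster-based risk rank.
days = [5, 8, 12, 20, 30]
died = [1, 1, 0, 1, 0]
cluster_risk = [3, 3, 2, 1, 1]
c_index = concordance_index(days, died, cluster_risk)
```

A value of 0.5 corresponds to a clustering whose risk ordering is no better than chance, and a value of 1 to one that orders every comparable pair of patients correctly.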

Table 2 summarizes the demographic data and days since symptom onset for the whole sample, while Table 3 summarizes them for each cluster. Cluster 1 was the largest (239 patients), whereas Clusters 2 and 3 had almost the same number of patients (153 and 156, p = 0.865 for the pair 2–3). Roughly, Cluster 1 had the youngest patients, with 5 out of 6 patients being ≤ 68 years; Cluster 2 had the oldest patients, with the same ratio being ≥ 69 years; and Cluster 3 comprised the older half of Cluster 1's age range and the younger half of Cluster 2's. Cluster 1 patients had the lowest body weight, while Cluster 3 consisted mostly of the heavier patients. Cluster 1 had an almost equal distribution of genders, unlike Clusters 2 and 3, which were predominantly male. Cluster 2 patients presented to the hospital the earliest, and Cluster 3 patients the latest.

Cluster 1 had the fewest hypertensive and diabetic patients, but percentages comparable to Cluster 3 for other risk factors, while Cluster 2 had the highest rates for all risk factors. Roughly half of Cluster 1 did not require any oxygen, whereas half of Cluster 3 needed higher oxygen flows. Cluster 1 had the lowest laboratory values, including IL-6, except for lymphocyte count, LDH and ferritin, which were comparable to those of Cluster 2. While Cluster 2 exhibited the highest procalcitonin and creatinine levels, Cluster 3 had the highest leukocyte (namely neutrophil) count, CRP, LDH and ferritin but the lowest lymphocyte count. Cluster 3 had the most ground-glass opacities on serial CT scans, while Cluster 2 had the largest pulmonary artery diameter. Lastly, Cluster 1 had the lowest rates of ICU admission and intubation, Cluster 2 the highest rate of hemorrhagic events and Cluster 3 the highest rate of thromboembolic events. Relevant patient risk factors are shown in Table 4, laboratory results in Table 5 and hospital outcomes in Table 6.

3.2. Survival analysis

Survival analysis is a statistical method used to study the time until an event of interest occurs, such as patient death. In the classical approach, it compares the rate at which the event occurs: a hazard ratio (HR) > 1 indicates a higher rate of the event and an HR < 1 a lower rate. However, a modified analysis is also featured here, in which deaths are considered discharges; hence an HR > 1 indicates a positive outcome, i.e. faster discharge.
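The time-to-event logic behind the Kaplan-Meier curves mentioned in the Methods can be sketched in a few lines. The Python example below implements the product-limit estimator for right-censored data; the follow-up times and event indicators are invented, and the event may equally be read as death (classical analysis) or discharge (modified analysis).

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, probability of no event yet).

    times:  follow-up time of each patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up in days; 0 marks a censored patient.
follow_up = [2, 3, 3, 5, 8]
event_seen = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(follow_up, event_seen)
```

Censored patients contribute to the number at risk until they leave the study but never trigger a drop in the curve, which is what distinguishes this estimator from a simple proportion of events.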

The analysis suggests that Cluster 2 had the highest risk of all-cause death, while Clusters 1 and 3 did not differ in terms of mortality. Moreover, Clusters 2 and 3 had significantly higher risks of prolonged stay; in other words, Cluster 1 patients were discharged faster. The results of the two analyses are summarized in Table 7.

Based on all the previous results, the clusters will be hereafter labeled according to the population they describe. Namely, Cluster 1 will be dubbed "Resilient Recoverees", Cluster 2 "Vulnerable Veterans," and Cluster 3 "Paradoxical Patients".

3.3. Regression analysis

The association of various treatments with multiple outcomes was evaluated within each cluster to minimize complications and unnecessary interventions. Detailed treatment results, including HR and OR, are available in Supplementary file 2. Subclusters were constructed based on whether PCT at admission was ≥ 0.5 and on admission location (ICU or regular wards, as shown in Table 7). Means and SDs for treatment initiation, duration, and doses are presented in Table A (Supplementary file 2).

Non-superinfected non-ICU resilient recoverees. The use of carbapenem treatment was associated with longer hospital stays. In contrast, Tocilizumab doses of ~ 750 mg or more were associated with faster patient discharge.

Superinfected non-ICU resilient recoverees. Aminoglycosides, glucocorticoid treatment equal to or greater than ~ 6 weeks, and azithromycin at ~ 1.5 weeks from symptom onset or later were correlated with prolonged hospitalization periods and provided no benefit.

Superinfected vulnerable veterans. Non-ICU patients required oxygen therapy ranging from 4 to 8 liters, which extended their hospital stays. For patients whose glucocorticoid therapy lasted ~ 3 weeks on average and who did not receive aminoglycoside treatment, the odds of reaching the composite outcome (ICU admission, intubation, then death) were 0.220 (0.054–0.619).

Non-superinfected ICU vulnerable veterans. Deferred ICU admissions were associated with a delay of ~ 1.5 weeks from symptom onset in starting Tocilizumab and/or glucocorticoid therapy. This also applied to Hydroxychloroquine treatment if it extended beyond 3 weeks. This suggests that all 3 therapies were started after ICU admission rather than preventively beforehand. Conversely, early ICU admissions were associated with the administration of Aspirin doses ≥ 324 mg. Regardless of administration date, the therapies mentioned did not benefit the patient.

Non-superinfected non-ICU vulnerable veterans. Cephalosporins had no effect on the composite outcome and thus offered no benefit. Antibiotic therapy beyond ~ 2 weeks benefitted patients, which suggests later superinfection. Glucocorticoid use for ~ 6 weeks or more offered no advantage and only prolonged stays. Similarly, glycopeptides extended hospital stays and were associated with increased bleeding risk, suggesting that their use is a poor prognostic indicator.

Non-superinfected non-ICU paradoxical patients. An increase in mortality was significantly linked to the use of prophylactic doses of antiplatelets in comparison to alternative dosage regimens of the same drug. The administration of doxycycline and the implementation of prone positioning were associated with expedited discharges, as opposed to those subjected to glucocorticoid treatment for a duration of ~ 6 weeks or more and prednisone exceeding 25 mg.

Non-superinfected ICU paradoxical patients. Glucocorticoid therapies that lasted beyond ~ 6 weeks also delayed ICU admission. Delayed ICU admission was associated with very early and multiple courses of Tocilizumab or Baricitinib treatment lasting beyond ~ 2 weeks, but there was no advantage in starting Tocilizumab thereafter. Subsequent superinfections and abstaining from Tocilizumab were the main promoters of ICU admission. As for antibiotics, planning to administer glycopeptides or cotrimoxazole at ~ 2 weeks from symptom onset if a bacterium was identified could delay ICU admissions. During the ICU stay, neither carbapenems nor azithromycin had a positive impact on patient survival. The use of Remdesivir did not show any benefit.

4. Discussion

Over the past three years, SARS-CoV-2 has spread globally, manifesting in clinical presentations that range from a fleeting flu to critical illness ¹⁸. Few studies have linked pathophysiologic findings to identify patient patterns for personalized treatments ¹⁹,²⁰. Cluster analysis is a promising approach to categorizing patients, and our study classified patients based on medical history, biochemistry, and radiology. The KAMILA algorithm produced the best results ¹², identifying three distinct patient clusters with unique characteristics. This approach challenges the most up-to-date treatment guidelines, which rely on univariate patient data that have not been interpreted using a multivariate approach such as clustering. Specifically, it challenges the COVID-19 clinical spectrum outlined in the frequently updated NIH treatment guidelines ¹⁸. As an overview, adults are considered to have asymptomatic or pre-symptomatic infection; mild, symptomatic illness without stigmata of pneumonia; moderate illness, with signs of lower respiratory tract disease but SpO2 ≥ 94% on room air; severe illness, including patients with SpO2 < 94% on room air who are not in shock or respiratory and organ failure; or critical illness with system failure. This classification is primarily based on oxygen saturation, a measurement that can be inconclusive, particularly in people aged ≥ 50 years with high-risk comorbidities and severe outcomes. NIH recommendations on oximetry interpretation favor consideration of the patient's overall clinical presentation and history. That said, it is essential to be proactive, detecting and therefore treating patients with high-risk histories so as to avoid ICU admission, invasive mechanical ventilation, and death. CDC researchers reviewed the risk factors that favor progression to severe COVID-19 ²¹, which have been taken into account in the therapeutic management of hospitalized patients as per the NIH recommendations ²². Our study aims to refine these predictive criteria through clustering.

To illustrate, the resilient recoverees, the largest cluster with the fastest recovery and lowest mortality, can be considered a moderate-to-severe COVID-19 group. They include middle-aged, fit patients with markedly few cardiovascular risk factors, e.g. hypertension. Given the low mortality burden, they can be managed with minimal (e.g. < 8 L O2) to no therapeutic intervention. Laboratory results were roughly normal, with no markers of severe COVID-19 ²³. It is worth noting that IL-6 levels in this cluster were the lowest, but the lack of a significant difference from other clusters is unreliable because IL-6 was measured infrequently. The use of Tocilizumab may accelerate the discharge of non-ICU patients with no bacterial superinfections, but its cost-effectiveness and priority in this cluster should be weighed.

As for the vulnerable veterans, they mostly include elderly men who have multiple risk factors and multiple comorbidities. They are at higher risk of severe to critical COVID-19, with more hemorrhagic events, more superinfections and a higher mortality rate, most likely favored by their comorbidities. In addition, these patients had the largest pulmonary artery diameter and the highest serum creatinine and procalcitonin, suggesting the presence of pulmonary hypertension and the prevalence of superinfections. That said, superinfected ICU patients appeared to benefit from a ~ 3-week course of glucocorticoid therapy, which aligns with the NIH recommendations ¹⁸.

The paradoxical patient cluster consists predominantly of middle-aged to elderly men, among whom hypertension and diabetes are nearly twice as common as among resilient recoverees, and who present 1 to 2 weeks late with more thromboembolic events. Despite being classified as severe to critical COVID-19 by the NIH guidelines, their mortality rates mirror those of resilient recoverees. Paradoxically, if one were to consider merely their history, one would risk misclassifying the cluster as resilient. This cluster experiences prolonged stays akin to those of vulnerable veterans and similarly demands ICU admissions and intubations, but requires noticeably more high-flow O2 supplementation. Ground-glass opacities on CT scans predominate and are associated with elevated COVID-19 severity biomarkers ²³, notably lymphopenia but also CRP, ferritin and LDH, alluding to a “cytokine storm”. This excessive, uncontrolled response contributes to tissue damage, organ failure, and heightened mortality ²⁴. Ancillary tests should aid in distinguishing resilient from paradoxical patients, emphasizing the impact of the virus itself. For non-superinfected patients on regular wards, recent literature echoes our conclusions on the effectiveness of prone positioning ¹⁸ and doxycycline therapy ²⁵, particularly in hastening discharge. Moreover, it agrees that anti-platelet therapy offers no benefit in this group ¹⁸. As for non-superinfected ICU patients, Tocilizumab or Baricitinib given early and for long periods delays ICU admission, as does glucocorticoid therapy for more than 6 weeks.

Two studies have also attempted to cluster COVID-19 patients into groups. The first, by Han et al., used factor analysis of mixed data (FAMD) ²⁶. What differentiates that study from ours is the addition of patient-experienced symptoms. Their results showed that the patients could be divided into three distinct clusters: Cluster A, the most severe with the longest hospital stays; Cluster B, of intermediate severity with a length of stay as long as Cluster A; and Cluster C, the mildest with the shortest length of stay. Their analysis showed that Cluster A had the worst survival rate, whereas Cluster B had higher CRP, D-dimer, AST, and LDH levels, indicating a quintessential COVID-19 phenotype. Clusters A and B are thus comparable to our vulnerable veterans and paradoxical patients, respectively. Cluster C had mainly systemic and digestive symptoms and a low frequency of the typical symptoms of fever and cough; because of its low severity, it most resembles the resilient recoverees. Our study, in comparison, included imaging studies. We were also able to find a significant correlation with age. Be that as it may, old age has been shown to be associated with adverse outcomes for patients with COVID-19 ²⁷.

The second study similar to ours, by Arévalo-Lorido et al., analyzed datasets by applying a Random Forest model and Gaussian mixture model clustering ²⁸. The algorithms generated six clusters, the last three of which had high rates of all-cause mortality or intensive care admission, whereas the first three included patients who did not. The most important comorbidities were heart failure, atrial fibrillation, vascular disease, and neurodegenerative disease, which were mainly present in the last three clusters. The fifth cluster, with the poorest prognosis, included those with liver, kidney, and gastrointestinal diseases, as well as chronic obstructive pulmonary disease. From what has been described, the first three clusters converge on the resilient cluster, and the last three clusters converge on the vulnerable and paradoxical clusters, with cluster 5 in that study being the closest to the paradoxical patients. In contrast to our study, it did not include imaging data. Furthermore, KAMILA concisely separated patients into a small number of meaningful clusters, unlike the Random Forest and Gaussian mixture models, which produced multiple clusters that are difficult to interpret.

Our study has significant strengths. Specifically, we recognize the effectiveness of model-based algorithms in clustering mixed data, providing a rationale for this choice. Additionally, we incorporated imaging findings and pinpointed vulnerable age groups, all within an optimal small number of clusters. Finally, the article extends its analysis by investigating the impact of various treatments on four subtypes of patients within each cluster.

We acknowledge our study has some limitations. Firstly, the number of patients decreased significantly due to multiple stratifications, so further analysis with larger populations is recommended. Secondly, patients' symptoms were not taken into account during data collection, which could have made the classification more clinically friendly. Thirdly, the PCT threshold used to consider patients as infected could have been improved by doing serial measures not only for PCT but also other markers in conjunction (e.g. CRP and imaging). Finally, the data collection was performed before vaccination campaigns and when one COVID-19 variant dominated the cases, so it would be interesting to study the effects of vaccination on the classes and the effect of different variants on patients to find a common classification for all COVID-19 strains.

Abbreviations

ASMD: Absolute standardized mean difference

ARDS: Acute respiratory distress syndrome

BIC: Bayesian information criterion

CI: Confidence interval

COVID-19: Coronavirus disease 2019

CRN: Creatinine

CRP: C-reactive protein

CT: Computed tomography

CT-PCR: Cycle threshold value - polymerase chain reaction

HR: Hazard ratio

ICU: Intensive care unit

IL-6: Interleukin-6

KAMILA: KAy-means for MIxed LArge data

k: Number of clusters

LCM: Latent Class Models

LDH: Lactate dehydrogenase

OR: Odds ratio

PCT: Procalcitonin

PMM: Predictive mean matching

SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2

VIF: Variance inflation factor

WHO: World Health Organization

Declarations

Ethical Approval

The present study was conducted in accordance with the principles outlined in the Declaration of Helsinki. Ethical oversight for this study, specifically on the sole use of anonymized patient data, was obtained from the Institutional Review Board at Saint Joseph University affiliated hospital, Hotel-Dieu de France, in Beirut, Lebanon.

Funding Statement

The authors have no relevant financial involvement with any entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript.

Author Contribution

C.H. drafted the manuscript and analyzed the data collected and supplied by M.R. and G.S. G.M. supervised the work and analyzed the outputs. R.S. extensively revised all versions of the article and contributed to the final manuscript.

Acknowledgments

We want to acknowledge the support of our institution and the interests of the contributors who have invested their time and expertise in this project. Moreover, this research has not received any external financial assistance or grants.

Data Availability

The datasets analyzed during the current study are not publicly available due to strict regulations imposed by the Hotel Dieu de France Hospital; however, they are available from the corresponding author on reasonable request.

References

1. Khoury, P., Azar, E. & Hitti, E. COVID-19 Response in Lebanon: Current Experience and Challenges in a Low-Resource Setting. JAMA 324, 548 (2020).
2. Baj, J. et al. COVID-19: Specific and Non-Specific Clinical Manifestations and Symptoms: The Current State of Knowledge. J. Clin. Med. 9, 1753 (2020).
3. Ma, Q. et al. Global Percentage of Asymptomatic SARS-CoV-2 Infections Among the Tested Population and Individuals With Confirmed COVID-19 Diagnosis: A Systematic Review and Meta-analysis. JAMA Netw. Open 4, e2137257 (2021).
4. Larose, D. T. & Larose, C. D. Clustering. in Data mining and predictive analytics vol. IV 512 (John Wiley & Sons Inc, Hoboken, New Jersey, 2015).
5. Islam, M., Hasan, M., Wang, X., Germack, H. & Noor-E-Alam, M. A Systematic Review on Healthcare Analytics: Application and Theoretical Perspective of Data Mining. Healthcare 6, 54 (2018).
6. Pina, A., Macedo, M. P. & Henriques, R. Clustering Clinical Data in R. in Mass Spectrometry Data Analysis in Proteomics (ed. Matthiesen, R.) vol. 2051 309–343 (Springer New York, New York, NY, 2020).
7. El Hadi, C. et al. Polygenic and Network-based studies in risk identification and demystification of cancer. Expert Rev. Mol. Diagn. 22, 427–438 (2022).
8. R: The R Project for Statistical Computing. https://www.r-project.org/.
9. Peterson, R. A. Finding Optimal Normalizing Transformations via bestNormalize. R J. 13, 310 (2021).
10. Peterson, R. A. & Cavanaugh, J. E. Ordered quantile normalization: a semiparametric transformation built for the cross-validation era. J. Appl. Stat. 47, 2312–2327 (2020).
11. Buuren, S. van & Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. 45, (2011).
12. Foss, A. H. & Markatou, M. kamila: Clustering Mixed-Type Data in R and Hadoop. J. Stat. Softw. 83, (2018).
13. Marbac, M. & Sedki, M. VarSelLCM: an R/C++ package for variable selection in model-based clustering of mixed-data with missing values. Bioinformatics 35, 1255–1257 (2019).
14. Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
15. Harrell, F. E. Evaluating the Yield of Medical Tests. JAMA J. Am. Med. Assoc. 247, 2543 (1982).
16. Dunn, O. J. Multiple Comparisons among Means. J. Am. Stat. Assoc. 56, 52–64 (1961).
17. Bonferroni, C. E. Teoria statistica delle classi e calcolo delle probabilità (Seeber, Firenze, 1936).
18. COVID-19 Treatment Guidelines Panel. Coronavirus Disease 2019 (COVID-19) Treatment Guidelines.
19. Öztürk, Ş., Özkaya, U. & Barstuğan, M. Classification of Coronavirus (COVID-19) from X‐ray and CT images using shrunken features. Int. J. Imaging Syst. Technol. 31, 5–15 (2021).
20. Liao, D. et al. Haematological characteristics and risk factors in the classification and prognosis evaluation of COVID-19: a retrospective cohort study. Lancet Haematol. 7, e671–e678 (2020).
21. CDC. Healthcare Workers. Centers for Disease Control and Prevention https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-care/underlyingconditions.html (2020).
22. Kompaniyets, L. et al. Underlying Medical Conditions and Severe Illness Among 540,667 Adults Hospitalized With COVID-19, March 2020–March 2021. Prev. Chronic. Dis. 18, 210123 (2021).
23. Qin, R. et al. Identification of Parameters Representative of Immune Dysfunction in Patients with Severe and Fatal COVID-19 Infection: a Systematic Review and Meta-analysis. Clin. Rev. Allergy Immunol. 64, 33–65 (2022).
24. Jones, S. A. & Hunter, C. A. Is IL-6 a key cytokine target for therapy in COVID-19? Nat. Rev. Immunol. 21, 337–339 (2021).
25. Dhar, R. et al. Doxycycline for the prevention of progression of COVID-19 to severe disease requiring intensive care unit (ICU) admission: A randomized, controlled, open-label, parallel group trial (DOXPREVENT.ICU). PLOS ONE 18, e0280745 (2023).
26. Han, L. et al. Exploring the Clinical Characteristics of COVID-19 Clusters Identified Using Factor Analysis of Mixed Data-Based Cluster Analysis. Front. Med. 8, 644724 (2021).
27. Booth, A. et al. Population risk factors for severe disease and mortality in COVID-19: A global systematic review and meta-analysis. PLOS ONE 16, e0247461 (2021).
28. Arévalo-Lorido, J. C. et al. The importance of association of comorbidities on COVID-19 outcomes: a machine learning approach. Curr. Med. Res. Opin. 38, 501–510 (2022).

Tables 1 to 7 are available in the Supplementary Files section.

No competing interests reported.

Editorial decision: Revision requested
18 Sep, 2024
Reviews received at journal
11 Sep, 2024
Reviewers agreed at journal
26 Aug, 2024
Reviews received at journal
24 Jul, 2024
Reviewers agreed at journal
07 Jul, 2024
Reviewers invited by journal
27 May, 2024
Editor assigned by journal
27 May, 2024
Editor invited by journal
08 May, 2024
Submission checks completed at journal
08 May, 2024
First submitted to journal
04 May, 2024
