Over the past three years, SARS-CoV-2 has spread globally, manifesting in clinical presentations that range from a transient flu-like illness to critical illness 18. Few studies have linked pathophysiologic findings to identify patient patterns for personalized treatments 19,20. Cluster analysis is a promising approach to categorizing patients, and our study classified patients based on medical history, biochemistry, and radiology. The KAMILA algorithm produced the best results 12, identifying three distinct patient clusters with unique characteristics. This approach challenges the most recent treatment guidelines, which rely on univariate patient data that have not been interpreted with a multivariate approach such as clustering. Specifically, it challenges the COVID-19 clinical spectrum outlined in the frequently updated NIH treatment guidelines 18. As an overview, adults are classified as having asymptomatic or pre-symptomatic infection; mild, symptomatic illness without stigmata of pneumonia; moderate illness, with signs of lower respiratory tract disease but SpO2 ≥ 94% on room air; severe illness, including patients with SpO2 < 94% on room air who are not in shock or respiratory and organ failure; and critical illness with system failure. This classification is based primarily on oxygen saturation, a measurement that can be inconclusive, particularly in people aged ≥ 50 years with high-risk comorbidities and severe outcomes. The NIH recommendations on oximetry interpretation favor consideration of the patient's overall clinical presentation and history. It is therefore essential to be proactive in detecting and treating patients with high-risk histories so as to avoid ICU admission, invasive mechanical ventilation, and death. CDC researchers reviewed the risk factors that favor progression to severe COVID-19 21, and these have been taken into account in the therapeutic management of hospitalized patients as per the NIH recommendations 22.
Our study aims to refine these predictive criteria through clustering.
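To make the univariate nature of the NIH spectrum described above concrete, the following Python sketch encodes only the criteria summarized in this section. It is an illustration under stated assumptions, not the NIH definition: the class name, field names, and function name are hypothetical, and the full guideline includes additional criteria that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Hypothetical container holding only the criteria summarized in the text."""
    symptomatic: bool
    lower_respiratory_disease: bool   # clinical or imaging signs of pneumonia
    spo2_room_air: float              # oxygen saturation on room air, in percent
    shock_or_organ_failure: bool      # shock, respiratory failure, or organ failure


def nih_spectrum_category(p: Presentation) -> str:
    """Assign a category using the simplified thresholds quoted in this section.

    The checks run from most to least severe, so a single variable
    (SpO2 < 94%) is enough to label a patient as severe.
    """
    if p.shock_or_organ_failure:
        return "critical"
    if p.spo2_room_air < 94.0:
        return "severe"
    if p.lower_respiratory_disease:
        return "moderate"
    if p.symptomatic:
        return "mild"
    return "asymptomatic or pre-symptomatic"


# Two patients with very different histories receive the same label
# because only SpO2 drives the decision at this step.
fit_patient = Presentation(True, True, 92.0, False)
frail_patient = Presentation(True, True, 93.5, False)
print(nih_spectrum_category(fit_patient), nih_spectrum_category(frail_patient))
```

Because comorbidities, laboratory values, and imaging never enter the function, patients with different risk profiles can share a category, which is the gap a multivariate clustering approach is meant to address.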
To illustrate, the resilient recoverees, the largest cluster with the fastest recovery and lowest mortality, can be considered a moderate-to-severe COVID-19 group. They include middle-aged, fit patients with markedly few cardiovascular risk factors, e.g. hypertension. Given their low mortality burden, they can benefit from minimal (e.g. < 8 L O2) to no therapeutic intervention. Laboratory results were roughly normal, with no markers of severe COVID-19 23. IL-6 levels were lowest in this cluster, but the absence of a significant difference from the other clusters is unreliable because IL-6 was measured infrequently. Tocilizumab may accelerate the discharge of non-ICU patients without bacterial superinfection, but its cost-effectiveness and priority in this cluster should be weighed carefully.
The vulnerable veterans are mostly elderly men with multiple risk factors and comorbidities. They are at higher risk of severe to critical COVID-19, with more hemorrhagic events, more superinfections, and a higher mortality rate, most likely driven by their comorbidities. In addition, these patients had the largest pulmonary artery diameter and the highest serum creatinine and procalcitonin levels, suggesting the presence of pulmonary hypertension and a high prevalence of superinfections. That said, superinfected ICU patients appeared to benefit from a ~ 3 week course of glucocorticoid therapy, which aligns with the NIH recommendations 18.
The paradoxical patient cluster consists predominantly of middle-aged to elderly men, among whom hypertension and diabetes are nearly twice as common as among resilient recoverees, and who present 1 to 2 weeks late with more thromboembolic events. Despite being classified as severe to critical COVID-19 by the NIH guidelines, their mortality rates mirror those of resilient recoverees. Paradoxically, if one were to consider their history alone, one would risk misclassifying the cluster as resilient. This cluster experiences prolonged stays akin to those of vulnerable veterans and similarly requires ICU admission and intubation, but needs noticeably more high-flow O2 supplementation. Ground-glass opacities predominate on CT scans and are associated with elevated COVID-19 severity biomarkers 23, notably lymphopenia but also CRP, ferritin, and LDH, alluding to a “cytokine storm”. This excessive, uncontrolled response contributes to tissue damage, organ failure, and heightened mortality 24. Ancillary investigations should aid in distinguishing resilient from paradoxical patients, emphasizing the impact of the virus itself. For non-superinfected patients on regular wards, recent literature echoes our conclusions on the effectiveness of prone positioning 18 and doxycycline therapy 25, particularly in hastening discharge. Moreover, it agrees that anti-platelet therapy is of no benefit in this group 18. As for non-infected ICU patients, Tocilizumab or Baricitinib given early and for long periods delayed ICU admissions, as did glucocorticoid therapy for more than 6 weeks.
Two other studies have also attempted to cluster COVID-19 patients into groups. The first, by Han et al., used factor analysis for mixed data (FAMD) 26. What differentiates that study from ours is the inclusion of patient-experienced symptoms. Their results showed that patients could be divided into three distinct clusters: Cluster A, the most severe, with the longest hospital stays; Cluster B, of intermediate severity, with a length of stay as long as that of Cluster A; and Cluster C, the mildest, with the shortest length of stay. Their analysis showed that Cluster A had the worst survival rate, whereas Cluster B had higher CRP, D-dimer, AST, and LDH levels, indicating a quintessential COVID-19 phenotype. Clusters A and B are thus comparable to our vulnerable and paradoxical patients, respectively. Cluster C had mainly systemic and digestive symptoms and a low frequency of the typical symptoms of fever and cough; because of its low severity, it most closely resembles the resilient recoverees. Our study, in comparison, included imaging studies. We were also able to find a significant correlation with age. Indeed, older age has been shown to be associated with adverse outcomes in patients with COVID-19 27.
The second study similar to ours, by Arévalo-Lorido et al., analyzed datasets by applying the Random Forest model and clustering with the Gaussian mixture model 28. The algorithms generated six clusters: the last three had high all-cause mortality or ICU admission rates, whereas the first three did not. The most important comorbidities were heart failure, atrial fibrillation, vascular disease, and neurodegenerative disease, which were mainly present in the last three clusters. The fifth cluster, with the poorest prognosis, included patients with liver, kidney, and gastrointestinal diseases, as well as chronic obstructive pulmonary disease. Based on these descriptions, the first three clusters converge on our resilient cluster, and the last three converge on our vulnerable and paradoxical clusters, with cluster 5 of that study being the closest to the paradoxical patients. In contrast to our study, it did not include imaging data. Furthermore, KAMILA separated patients concisely into a small number of meaningful clusters, unlike the Random Forest and Gaussian mixture models, which produced a larger number of clusters that are difficult to interpret.
Our study has significant strengths. Specifically, we recognize the effectiveness of model-based algorithms in clustering mixed data, which provides the rationale for our choice of method. Additionally, we incorporated imaging findings and pinpointed vulnerable age groups, all within an optimally small number of clusters. Finally, our study extends its analysis by investigating the impact of various treatments on four subtypes of patients within each cluster.
We acknowledge that our study has some limitations. Firstly, the number of patients decreased significantly with multiple stratifications, so further analysis in larger populations is recommended. Secondly, patients' symptoms were not recorded during data collection; including them could have made the classification more clinically friendly. Thirdly, the PCT threshold used to consider patients as infected could have been improved by taking serial measurements not only of PCT but also of other markers in conjunction (e.g. CRP and imaging). Finally, data collection was performed before vaccination campaigns and while a single COVID-19 variant dominated, so it would be interesting to study the effects of vaccination and of different variants on the clusters in order to find a common classification for all COVID-19 strains.