Derivation and Validation of Generalized Sepsis-induced Acute Respiratory Failure Phenotypes Among Critically Ill Patients: A Retrospective Study

Background: Septic patients who develop acute respiratory failure (ARF) requiring mechanical ventilation represent a heterogenous subgroup of critically ill patients with widely variable clinical characteristics. Identifying distinct phenotypes of these patients may reveal insights about the broader heterogeneity in the clinical course of sepsis. We aimed to derive novel phenotypes of sepsis-induced ARF using observational clinical data and investigate their generalizability across multi-ICU specialties, considering multi-organ dynamics. Methods: We performed a multi-center retrospective study of ICU patients with sepsis who required mechanical ventilation for ≥24 hours. Data from two different high-volume academic hospital systems were used as a derivation set with N=3,225 medical ICU (MICU) patients and a validation set with N=848 MICU patients. For the multi-ICU validation, we utilized retrospective data from two surgical ICUs at the same hospitals (N=1,577). Clinical data from 24 hours preceding intubation was used to derive distinct phenotypes using an explainable machine learning-based clustering model interpreted by clinical experts. Results: Four distinct ARF phenotypes were identified: A (severe multi-organ dysfunction (MOD) with a high likelihood of kidney injury and heart failure), B (severe hypoxemic respiratory failure [median P/F=123]), C (mild hypoxia [median P/F=240]), and D (severe MOD with a high likelihood of hepatic injury, coagulopathy, and lactic acidosis). Patients in each phenotype showed differences in clinical course and mortality rates despite similarities in demographics and admission co-morbidities. The phenotypes were reproduced in external validation utilizing an external MICU from second hospital and SICUs from both centers. Kaplan-Meier analysis showed significant difference in 28-day mortality across the phenotypes (p<0.01) and consistent across both centers. The phenotypes demonstrated differences in treatment effects associated with high positive end-expiratory pressure (PEEP) strategy. Conclusion: The phenotypes demonstrated unique patterns of organ injury and differences in clinical outcomes, which may help inform future research and clinical trial design for tailored management strategies.


Introduction
Sepsis is a heterogeneous syndrome that is characterized by life-threatening organ dysfunction due to a dysregulated host response to infection [1].Despite advances in our knowledge and improvements in management strategies, sepsis continues to be one of the leading causes of death worldwide and remains a serious medical emergency [2,3].Patients with sepsis who develop acute respiratory failure (ARF) requiring mechanical ventilation represent a unique and complex subgroup [4][5][6].ARF is one of the most common complications in sepsis and one of the strongest risk factors for mortality.This subgroup of patients also has heterogeneous risk factors, etiologies, pathophysiology, and immunopathogenic responses that contribute to ARF, as well as divergent clinical courses and outcomes [7].A typical manifestation of ARF in patients with sepsis is the acute respiratory distress syndrome (ARDS).In addition to ARDS, these patients also frequently develop other extra-pulmonary organ dysfunctions that lead to increased complications and high mortality.
Clinically recognizable phenomena that are widely observed in sepsis and ARF, such as vital sign abnormalities (dyspnea, hypotension, tachypnea, oxygen desaturation, etc.) and laboratory abnormalities (lactic acidosis, hypoxemia and/or hypercapnia on arterial blood gas, etc.), are only super cial representations of complex pathophysiological and environmental interactions.Furthermore, they do not provide speci c information regarding the heterogeneous clinical trajectories and outcomes, which contribute to the ongoing challenges in developing targeted management strategies and improving outcomes in sepsis-induced ARF.However, there may be subtle patterns of physiologic data and clinical features that may be unclear to clinicians at the bedside but are uncovered with the aid of machine learning (ML) models.The ML models can help recognize these patterns as unique phenotypes within heterogeneous syndromes such as sepsis-induced ARF.While latent class analysis (LCA) techniques have been used to derive hyper-and hypo-in ammatory phenotypes in ARDS with potential differences in treatment responses, these phenotypes only apply to a speci c subset of patients with a key manifestation of ARF.In addition to ARDS, there is a need to investigate other organ dysfunctions that are related to sepsis-induced ARF.Moreover, there is a need to identify generalizable phenotypes among patients with sepsis to elucidate possible mechanisms of complex clinical courses and targetable features pertaining to certain phenotypes.
We sought to utilize pre-intubation clinical data to develop generalizable phenotypes that would correlate with not only the risk of developing ARDS as an endpoint, but also to investigate more complex multi-organ failure trajectories observed within this cohort [8].We further sought to compare the derived generalized sepsis-induced ARF phenotypes against the binarized hyper-and hypoin ammatory subphenotypes to characterize potential overlaps between these approaches.To understand the latent phenotypes in critically ill patients with sepsis-induced ARF, we developed a multi-phased unsupervised ML model to systematically identify novel phenotypes using multi-variable data collected from electronic medical records (EMR).

Study Design
This is a multi-center retrospective cohort study conducted at two high volume academic hospitals located in the southeastern United States.Adult patients (≥ 18 years) admitted to the medical or surgical intensive care units (MICU or SICU) at either of these two metropolitan academic hospitals, Emory University Hospital and Grady Memorial Hospital (Atlanta, GA) with sepsis [based on the sepsis-3 criteria [1]] between 2016-2021 and developed ARF during their hospital admission were included [9].Emory is a quaternary care hospital specializing in the care of adult critically ill patients, whereas Grady is known as a safety-net hospital.ARF was de ned as requiring ≥ 24 hours of invasive mechanical ventilation (IMV) for medical ICU patients.For surgical ICU patients, it is de ned as ≥ 24 hours of IMV even after 48 hours from surgery.

Participants
IMV patients included in our cohort were adult patients admitted to the hospital who were diagnosed with sepsis and during their hospital course required mechanical ventilation as well as adult surgical patients whose post-operative course was complicated, requiring post-surgical IMV lasting at least 24 hours.Here, post-surgical IMV refers to initial IMV, re-ventilation or remaining in IMV state after 48th hour from the surgery completion.Due to our interests in identifying the phenotypes early in the course of sepsisinduced ARF, we utilized up to 24 hours of data preceding the time of index IMV (or post-surgical IMV) in the study.Data collected from the EMR including laboratory values and vital signs, were used for phenotyping.A complete list of these clinical factors or features can be found in Table E1 in the online data supplement.All factors represent clinical values that were routinely collected and recorded in the EMR.A set of demographic variables (e.g., age, sex, race, ethnicity), mortality, and comorbidity information were also included for further analysis of derived phenotypes.We created two separate datasets corresponding to MICU and SICU patients.We de ned index time of IMV in the MICU dataset as the time at which the rst mechanical ventilation parameters (positive end-expiratory pressure (PEEP), tidal volume, and/or plateau pressure) were recorded in the EMR in patients who met the above inclusion criteria; while for SICU dataset, it is the time of rst ventilation parameters recorded from 48-hour mark after the surgery.
We excluded patients who did not meet sepsis-3 criteria, patients admitted to neurological ICUs, patients admitted to the ICU postoperatively who were ventilated only for ≤ 24 hours after 48 hours from their surgeries, or those whose EMR data did not include any hourly collected physiological data up to 24 hours prior to IMV.To derive enriched phenotypes for sepsis-induced ARF, we developed a high-delity unsupervised ML-based approach that incorporates a broad set of routinely collected clinical variables.The overall study pipeline is shown in Fig. 1.

Procedure
We adopted a multi-center derivation and validation study design by rst deriving phenotypes using medical ICU (MICU) data from Emory University Hospital, and then validating this phenotyping algorithm against MICU data collected from Grady Memorial Hospital.The phenotyping was further validated with surgical ICU (SICU) data from both the hospitals.The two hospitals serve unique and diverse patient populations located within the metropolitan southeast United States.Data used to derive and validate our algorithm were collected from the same years across the two hospitals.
We applied median aggregation to the data of our cohort, across the pre-ventilation time-window, for all routinely collected clinical features.We dropped features having more than 85% of missing values in the aggregated data.Subsequently, we used multivariate imputation by chained equations (MICE) algorithm on the training data [10] to impute missing data and correct outliers.Finally, a total of 50 robust and independent clinical features were selected after a Pearson's correlation analysis, also listed in the online data supplemental Table E1.We then normalized the data and used Uniform Manifold Approximation and Projection (UMAP) method to reduce dimensionality of the multivariate dataset and project onto a two-dimensions [11].Finally, we used a k-means (centroidbased) clustering algorithm that yielded four clusters.The derived clusters were analyzed for their most important features using SHapley Additive exPlanations (SHAP) values [12].The derived phenotypes were then examined and interpreted by physicians P.Y., C.M.D. and C.M.C.

Validation of the Phenotypes in External Dataset
We trained a logistic regression (LR) ML model for phenotype prediction on the derivation data with high accuracy.This model outperformed other ML models such as random forests, support vector machines (SVM) and Gaussian Naïve Bayes classi er (see Figures E4 and E5 in the online data supplement).Finally, we applied this trained classi er model to the validation datasets.The purpose was to evaluate the robustness of the classi er in identifying similar phenotypes in 'unseen' and external data from a different medical center.

Validation of the Sepsis-induced ARF Phenotypes Against Hyper/Hypo In ammatory ARDS Subtypes
To investigate how the contributed phenotypes compare to the existing work in the eld, we sought to compare the proposed sepsisinduced ARF phenotypes to the ARDS hyper-and hypo-in ammatory subtypes.Calfee and colleagues investigated clinical and biological data to elucidate two phenotypes in various ARDS cohorts, namely hyper-and hypoin ammatory [13][14][15].In a recent extension of these approaches, Sinha et al. proposed a clinical only-model that provided robust discrimination of the ARDS cohort into the two relevant subtypes [13].However, due to the unavailability of the pre-trained models and labeled data contributed by those groups, we were unable to directly evaluate the classi er against our own data, thus we developed a pragmatic alternative, which utilized the value-intervals of their class-de ning features for further heuristic characterization.

Statistical Analysis
Statistical analyses were performed using python libraries.Patient characteristics and endotype factors that represent continuous variables were analyzed using a Kruskal-Wallis test.Categorical variables were analyzed using a Chi-squared test.A multivariate log rank test was performed when comparing multiple variables and a p-value of ≤ 0.05 was used for statistical signi cance.We report the speci c statistical dichotomization of the hyper and hypo-in ammatory subtypes based on the ranges of each variable as provided by Sinha et al. in the supplemental documents S2-S6 [13,16].

Results
Patient Characteristics: In this retrospective study, a total of 3349 encounters from 3225 unique patients admitted to MICU at Emory University Hospital (Atlanta, GA) were selected from the derivation data for initial phenotyping.The cohort in our study consists of patients across a wide range of demographic variables, such as age (mean: 62.3 ± 15.3 years), sex (male: 53.1%), and race (Caucasian: 42.2%, African American: 47.4%).For validating our phenotyping algorithm, 867 encounters from N = 848 unique and diverse patients were selected from the MICU of Grady Memorial Hospital (Atlanta, GA).Characteristics of these cohorts are described in Tables 1 and 2  were analyzed from the pre-intubation window for each phenotype.The maximum method was used for their aggregation.
Supplemental Table E4 presents SOFA score-maps for Emory MICU data, where the ndings clearly align with our phenotype characterizations.
Boxplots were drawn to illustrate the variabilities in certain prominent features such as creatinine (renal), total bilirubin (hepatic), P/F ratio and FiO 2 (respiratory), BNP (cardiac), and platelets (coagulopathy), as shown in Figure 3 (logistic regression) classi er on the derivation dataset to predict the corresponding phenotype.Thereafter, we employed the trained model on the validation dataset to determine the phenotype for each patient encounter.We summarize the phenotype validation results in Table 2.These results that most features across the four phenotypes remain consistent in the validation dataset, highlighting the reliability and generalizability of our phenotyping approach.
For further analysis of the phenotypes, Figure E1 in the online data supplement shows radar diagrams illustrating average variations of all clinical feature values across four formed phenotypes of the derivation and validation data, where all features are normalized in the range 0-1.Additionally, radar diagrams in Figure E2 in the online data supplement presents distributions of demographic variables and mortality outcomes across various phenotypes of the derivation and validation data.Additionally, our phenotyping approach was also validated on SICU cohorts of both Emory and Grady hospitals.Their phenotyping results are listed in supplemental Tables E2 and E3.We also analyzed aggregated pre-intubated individual SOFA for these datasets, and the results are listed in Supplemental Tables E5-E7.They show consistency in earlier results obtained from the derivation data.
Short-term Survival Analysis: Trajectory of short-term outcomes can provide a better differentiation among phenotypes.For a 28-d short-term analysis, average vent-free days (VFD) were found as 10.4, 8.6, 15.4 and 8.5, respectively for phenotypes A to D of the derivation set.To evaluate the survival probability of patients in each phenotype, we plotted Kaplan-Meier curves [17] for a 28-day period following intubation, as shown in Figure 4.The analysis was performed for derivation and validation datasets, where survival traces of phenotype D of Emory SICU (N=12) and B of Grady SICU (N=5) were omitted here due to having their small sample sizes.
We observed that the mortality trends across various phenotypes were consistent for MICU and SICU of both centers (p-value for trend < 0.001), with phenotype C having the best survival followed by A, and phenotypes B and D having the poor survival rates in both centers.This suggests that our phenotyping approach is generalizable in identifying the least and the most critical phenotypes in terms of short-term survival for ARF patients with different demographic characteristics.
Exploratory Analyses of Clinical Differences among the Phenotypes: We also performed exploratory analyses to examine whether the phenotypes would demonstrate different outcomes or clinical patterns in relation to high PEEP (PEEP ≥ 10) treatments.Within phenotypes, 16.7% in A, 49.6% in B, 24% in C, and 16.2% in D were administered with PEEP ≥ 10 regime on mechanical ventilator.We conducted an analysis to estimate the effects of high PEEP (PEEP ≥ 10) using a propensity score matching scheme on 28-day short-term mortality, by considering lab-values and demographics as confounding variables.Average treatment effects (ATE) with 95% con dence intervals were obtained as 0.04 (-0.08, 0.16) for A, -0.05 (-0.13, 0.02) for B, 0.07 (-0.01, 0.15) for C, and -0.07 (-0.19, 0.05) for D. A negative ATE suggests reduced mortality outcomes for the treated group.We also plotted Kaplan-Meier curves between patients who received high PEEP and those who did not within each of the phenotypes.
In phenotype B with severe hypoxic respiratory failure, higher PEEP (≥10) was associated with signi cantly better survival than lower PEEP (<10), but the opposite association was seen in phenotype C.Among both MOD phenotypes, higher PEEP was found effective for D, whereas it was ineffective for A. However, the PEEP strategy was not signi cantly associated with survival in both these phenotypes (Figure 5).We must emphasize that this analysis was purely exploratory in nature and was carried out to examine the feasibility of further research on the treatment effects of various therapies.
Validation of the phenotypes against the Hyper/Hypo In ammatory Phenotypes: We further sought to investigate how the phenotypes derived from the results above compared to the binarized phenotypes, namely the hyperin ammatory and hypoin ammatory phenotypes.[13,14]

Discussion
In this study, we performed UMAP projection and unsupervised clustering to derive novel phenotypes of sepsis-induced ARF.The phenotypes derived using early clinical data from the pre-intubation phase of sepsis-induced ARF not only demonstrated unique patterns of organ injury, but also correlated with differences in mortality and exhibited potential differences in outcomes in relation to High vs. Low PEEP strategy.Furthermore, the characteristics of the phenotypes remained consistent in the validation datasets.
The performance was evaluated on comprehensive datasets including rich and diverse cohorts of patients with varying comorbidities across different demographics.[20].The validation was done by comparing the output of the phenotyping algorithm to a manual review, and the common causes of misclassi cation were noted [21].Unlike our work, this study does not characterize various degrees of MOD associated with ARF and the severity of such outcomes.It only characterizes patients based on the sequence of different respiratory support they received.The set of features analyzed is also limited; in contrast, this work considers an expanded set of clinical features for phenotyping.While many of these studies identify phenotypes in sepsis at large [22][23][24][25][26], they have not examined phenotypes that may exist speci cally within sepsis-induced ARF.Similarly, LCA-derived phenotypes of ARDS have been well-studied, but do not directly apply to ventilated patients with sepsis who do not satisfy the Berlin de nition of ARDS.
Although previously studies have highlighted the limitations in using EMR for phenotyping such as complex, inaccurate, and missing data problems [27,28], our approach uses a wide range of clinical features from EMR data and has been shown to work e ciently in both derivation and validation datasets.Our study phenotypes show results consistent with the corresponding MODs.For example, the high mortality of phenotype B (severe hypoxic respiratory failure) is 51%, which is close to the numbers reported (34-46%) in previous studies [3].Our phenotyping algorithm from EMR data with high richness and freedom from bias, have been shown to be generalizable and consistent across multiple patient groups from different hospitals, also suggested by prior studies [29,30].
The unique clinical characteristics of the derived phenotypes and the results of our exploratory analyses are highly informative and can be hypothesis-generating for future research.For example, phenotype A and D appear to suffer from multi-system organ failure Figure 2 using a 2-D Uniform Manifold Approximation and Projection (UMAP) representing formed clusters and variations of important clinical features across these distributions.SHAP values were used to identify the important features that distinguished one cluster from another.SHAP plots are available in Figure E3 in the online data supplement.Subsequently, critical care physician experts helped interpret and characterize the four clusters as phenotypes based on their characteristics.

Figures Figure 1
Figures

Table 1
Summary of patient characteristics of the derivation cohort (Emory MICU) and its phenotypes.For clinical variables, this table lists the medians and interquartile ranges (IQR: Q1-Q2) for each phenotype as well as for the whole cohort.The p-value is also provided for each variable to indicate the statistical signi cance of the differences among the phenotypes.For evaluating statistical signi cance, Kruskal-Wallis test was performed for continuous variables and Chi-squared test was used for categorical variables.*Mortality was computed with respect to patients (not encounters).Abbreviations used -count: total encounters, mean: average, std: standard deviation, m: median, IQR: interquartile range, PaO 2 : partial pressure of oxygen, SpO 2 : peripheral oxygen saturation level, FiO 2 : fraction of inspired oxygen, P/F: PaO 2 /FiO 2 ratio, S/F: SpO 2 /FiO 2 ratio, PaCO 2 : partial pressure of carbon dioxide in arterial blood, MAP: mean arterial blood pressure, Resp.: respiration, BNP: B-type natriuretic peptide, BUN: blood urea nitrogen, SOFA: sequential organ failure assessment, GCS: Glasgow coma scale.Measurement units -P/F ratio, PaO2, PaCO2, and MAP: mmHg; S/F ratio and FiO 2 : unitless; creatinine and bilirubin total: mg/dL; albumin: g/L; lactic acid: mmol/L; D-dimer: ng/mL; platelets: ×10 3 /µL; hemoglobin: g/dL; BNP: pg/mL; BUN: mg/dL.

Table 1
summarizes the clinical and demographic variables for each of the four derived ARF phenotypes along with their mortality .43-4.67 mmol/L) suggesting multi-system organ dysfunction.As such, we named this phenotype D (MOD-2) with severe MOD showing a high likelihood of hepatic injury, coagulopathy and lactic acidosis.From Table1, we observed that phenotype B has the highest mortality (51%), followed by phenotype D (49.6%) and phenotype A (40.9%).The relatively healthier phenotype B had a mortality of 21.4%.Phenotype D is also characterized by the highest proportion of patients with septic shock (n=651 patient encounters (79.5%), whereas C consists of the lowest proportion of septic-shock patients (450 encounters, 45.3%).Thus, our phenotypes not only identi ed distinct patterns of organ injury in patients with sepsis-induced ARF, but also different rates of mortality and septic-shock distributions.To con rm more insights of MOD pro les in phenotypes, a set of all six individual SOFA phenotype (N=689 patients) consists of patients with severe hypoxia and clinical characteristics suggestive of non-radiographic features of severe ARDS (low partial pressure of oxygen (PaO 2 ) to fraction of inspired oxygen (FiO 2 ) ratio [P/F ratio] [median: 123, IQR: 90-185 mmHg] and high FiO 2 [mean: 0.8, IQR: 0.6-1]) and has the highest mortality (51%).We called it phenotype B (severe hypoxemic respiratory failure).The third phenotype (N=959 patients) consists of patients with no evidence of organ failure other than mild hypoxia (median P/F ratio: 240 [95% CI: 185, 317.7]) and normal lactic acid levels (median: 1.42 mmol/L).We called it phenotype C (mild hypoxia).The fourth phenotype (N=806) consists of ARF patients with highest total bilirubin (median: 1.2, IQR: 0.6-4.1 mg/dL) and highest D-dimer levels, lowest platelets (median: 118, IQR: 53-202.8×10 3 /µL) and highest lactic acid (median: 2.35, IQR: 1

Table 2 :
Summary of patient characteristics of the validation cohort (Grady MICU) and its phenotypes.For evaluating statistical signi cance, Kruskal-Wallis test was performed for continuous variables and Chi-squared test was used for categorical variables.*Mortality was computed with respect to patients (not encounters).Abbreviations used -count: total encounters, mean: average, std: standard deviation, m: median, IQR: interquartile range, PaO 2 : partial pressure of oxygen, SpO 2 : peripheral oxygen saturation level, FiO 2 : fraction of inspired oxygen, P/F: PaO 2 /FiO 2 ratio, S/F: SpO 2 /FiO 2 ratio, PaCO 2 : partial pressure of carbon dioxide in arterial blood, MAP: mean arterial blood pressure, Resp.: respiration, BNP: B-type natriuretic peptide, BUN: blood urea nitrogen, SOFA: sequential organ failure assessment, GCS: Glasgow coma scale.Measurement units -P/F ratio, PaO2, PaCO2, and MAP: mmHg; S/F ratio and FiO 2 unitless; creatinine and bilirubin total: mg/dL; albumin: g/L; lactic acid: mmol/L; D-dimer: ng/mL; platelets: ×10 3 /µL; hemoglobin: g/dL; BNP: pg/mL; BUN: mg/dL.External and Multi-specialty Validation of Sepsis-induced ARF Phenotypes: To validate our phenotyping algorithm, we utilized an external hospital's MICU cohort from Grady Memorial Hospital, Atlanta, GA.Our methodology involves training a supervised learning [13,14]aring the clinical values reported, our results suggested that patients in phenotype A (MOD-1) and D (MOD-2) were most likely associated with hyperin ammation characterized by high values of total bilirubin (mean A:1.4,D:4.8mg/dL)andcreatinine(mean A:4.3, D:2 mg/dL), and low values of platelet count (mean A:191, D:147 ×10 3 /µL), bicarbonate (mean A:22.4,D:20.7 mmol/L), PaCO 2 (mean A:38.6,D:34.2mmHg) and hemoglobin (mean A:9.2, D:9 g/dL), which were consistent with the values of these markers in the hyperin ammatory subphenotype from the previous works[13,14].Phenotype B with severe hypoxemic respiratory failure demonstrated features that were consistent with neither hyper-nor hypo-in ammatory phenotype, suggesting that this phenotype could either consist of a mix of both phenotypes or represent a completely novel phenotype.On the contrary, patients in C were associated with hypoin ammatory characteristics with lowest values of total bilirubin When evaluating the derivation strategy independently during the 2020-2021 data, we found that they were consistent with that of pre-COVID-19 years, without signi cant variance in the distribution of the phenotypes.Relevant details on the sensitivity analyses are available in FigureE6in the online data supplement.