Human subjects research was conducted as approved by the institutional review boards of the recruiting sites and in accordance with the ethical principles set forth in the Declaration of Helsinki, as previously detailed [10, 20, 11]. Only de-identified clinical data and publicly available gene-expression datasets were used for the purposes of this study. Phenotyping data on organ dysfunction trajectories, based on clinical and laboratory data available between days 1 and 7 of pediatric intensive care unit (PICU) admission, were available in the derivation cohort, as previously detailed [22]. The primary comparison of interest was persistent MODS (death by day 7, persistence of ≥ 2 organ dysfunctions on day 7, or new MODS between days 1 and 7) relative to MODS resolution (≥ 2 organ dysfunctions on day 1 or day 3 with < 2 dysfunctions by day 3 or day 7, respectively), as well as to septic patients, non-septic patients with SIRS, and healthy controls. Our choice of this outcome was guided by the fact that patients who die or have persistent organ dysfunctions despite intensive organ support likely represent a subset who may benefit from targeted therapeutic approaches based on their underlying biological predisposition. We conducted analyses with and without inclusion of patients who died within the first 7 days, to test the premise that non-survivors may have a different expression signature relative to survivors with persistent organ dysfunctions. Secondary comparisons focused on those with and without cardiovascular, respiratory, and kidney dysfunction on days 3 and 7, respectively.
Statistical analysis: Demographic and clinical data were summarized as counts and percentages or medians with interquartile ranges. Differences between groups were determined by the χ2 test for categorical variables and by one-way analysis of variance (ANOVA) for continuous variables. Dunn's test was used to account for multiple comparisons, where applicable. The log-rank test was used to compare 28-day survival among patient subclasses. Logistic regression analyses were used to test the association between MODS endotypes, receipt of adjuvant steroids, and 28-day mortality and occurrence of MODS. All models included an endotype × receipt-of-steroids interaction term. A p-value threshold of 0.05 was used to determine statistical significance for all analyses.
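A logistic model with an endotype × steroids interaction term can be sketched as follows. This is an illustrative construction on synthetic data with hypothetical binary coding, not the study's actual model specification or data:

```python
# Hypothetical sketch of a logistic model with an endotype x steroids
# interaction term; the indicator coding and simulated outcome are
# assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
endotype = rng.integers(0, 2, n)   # 1 = endotype A (hypothetical coding)
steroids = rng.integers(0, 2, n)   # 1 = received adjuvant steroids

# Simulated 28-day mortality with a built-in interaction effect
logit = -2.0 + 0.8 * endotype + 0.2 * steroids + 1.0 * endotype * steroids
mortality = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Design matrix explicitly includes the endotype x steroids product term
X = np.column_stack([endotype, steroids, endotype * steroids])
model = LogisticRegression().fit(X, mortality)
coefs = dict(zip(["endotype", "steroids", "interaction"], model.coef_[0]))
```

Including the product column lets the model estimate whether the steroid effect on mortality differs by endotype, which is the question the interaction term answers.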
Propensity matching: Given the substantial demographic heterogeneity, we generated a propensity score for each patient to account for the confounding influence of age and illness severity, as determined by the PRISM III score [23], on the risk of MODS. We randomly imputed PRISM III values of 0, 3, or 5 for controls, in whom these data were not available. Matching was performed with the R package "MatchIt" using the full matching method. Each patient received a propensity score, which was used to train the machine learning (ML) models and incorporated into risk prediction models.
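The propensity score itself is a fitted probability of the outcome group given the confounders. The study used MatchIt's full matching in R; the minimal sketch below only illustrates score estimation from age and PRISM III with a plain logistic model on synthetic data:

```python
# Minimal sketch of propensity-score estimation (assumed analogue of the
# first step of R's MatchIt workflow), using synthetic age and PRISM III
# values rather than the study's data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
age = rng.uniform(0, 18, n)        # years
prism = rng.integers(0, 30, n)     # PRISM III severity score

# Simulated MODS labels influenced by age and severity
p_mods = 1 / (1 + np.exp(-(-3 + 0.05 * age + 0.15 * prism)))
mods = (rng.random(n) < p_mods).astype(int)

# Propensity score = predicted probability given the confounders;
# this score can later be appended to the ML feature set
X = np.column_stack([age, prism])
ps_model = LogisticRegression().fit(X, mods)
propensity = ps_model.predict_proba(X)[:, 1]
```

Carrying the score forward as a model feature, as described above, lets the classifier condition on the same confounders the matching step addresses.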
Gene-expression data: Microarray dataset GSE66099 [24] was downloaded from the NCBI Gene Expression Omnibus (GEO) repository [20] and served as the derivation cohort. Affymetrix probes were matched to gene symbols using the Affymetrix Human Genome U133 Plus 2.0 annotation package (hgu133plus2.db). Data pre-processing, including batch correction for year of study, is detailed in the online supplement (Tables 1 and 2, Figure 1). Differentially expressed genes (DEGs), defined by an absolute log fold change ≥ 0.5 and a Benjamini-Hochberg false-discovery-rate-adjusted p-value < 0.05, were identified using the limma package [25] in R. We used clusterProfiler [26] for functional gene enrichment, REACTOME pathway analyses [27] to visualize enriched biological pathways, and CIBERSORT [28] to estimate the abundance of various immune cell subsets.
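The DEG filter combines two thresholds: effect size (|logFC| ≥ 0.5) and a Benjamini-Hochberg-adjusted p-value < 0.05. The study used limma's moderated statistics in R; the sketch below only demonstrates the thresholding logic with ordinary t-tests on synthetic log2-scale expression data:

```python
# Illustrative re-creation of the DEG filter (|logFC| >= 0.5, BH-adjusted
# p < 0.05). Plain t-tests stand in for limma's moderated statistics; the
# spiked synthetic matrix is an assumption for demonstration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_genes, n_per_group = 200, 20
case = rng.normal(0, 1, (n_genes, n_per_group))
ctrl = rng.normal(0, 1, (n_genes, n_per_group))
case[:10] += 1.5                  # spike in 10 truly shifted genes

# On log2 data, the group mean difference is the log fold change
logfc = case.mean(axis=1) - ctrl.mean(axis=1)
pvals = stats.ttest_ind(case, ctrl, axis=1).pvalue

# Benjamini-Hochberg adjustment: scale sorted p-values by n/rank,
# then enforce monotonicity from the largest p downward
order = np.argsort(pvals)
ranked = pvals[order] * n_genes / np.arange(1, n_genes + 1)
adj = np.minimum.accumulate(ranked[::-1])[::-1]
padj = np.empty(n_genes)
padj[order] = np.clip(adj, 0, 1)

degs = np.where((np.abs(logfc) >= 0.5) & (padj < 0.05))[0]
```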
Feature selection: Given the high dimensionality of the training dataset, we used different feature selection strategies to extract a small number of highly discriminative genes distinguishing patients with MODS from those without. We used three variable selection techniques: random forests, LASSO, and minimum redundancy maximum relevance (mRMR). The genes selected by each of these methods were aggregated into a single input feature set, to which the list of DEGs was added. The propensity score for each patient was also included in the list of features used to train the classifier.
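Aggregating selector outputs into one feature set can be sketched as a union of per-method selections. The sketch below uses random forest importances and L1-penalized (LASSO-style) logistic regression on synthetic data; mRMR is omitted here since it is not part of scikit-learn, and the top-10 cutoff is an assumption:

```python
# Hedged sketch of aggregating feature-selection outputs into one input
# set (random forest importances + L1 logistic regression; mRMR omitted).
# Synthetic data; features 0 and 1 are constructed to be informative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 50))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=120) > 0).astype(int)

# Method 1: top-10 features by random forest importance
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_top = set(np.argsort(rf.feature_importances_)[-10:])

# Method 2: features with non-zero L1-penalized coefficients
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
lasso_top = set(np.flatnonzero(lasso.coef_[0]))

# Union of per-method selections forms the aggregated input feature set
selected = sorted(rf_top | lasso_top)
```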
Classification models: To counter the class imbalance in our training data, we incorporated both undersampling and oversampling techniques into our supervised learning framework, as detailed in the online supplement. Briefly, three binary classification algorithms were used: logistic regression and two tree-based classifiers (Random Forest and Extra Trees).
We applied a 5-fold cross-validation process similar to that previously published by our group [20], which involved randomly partitioning the dataset into five equal subsets in a stratified fashion. Four of the five subsets formed the training set, the remaining subset served as the test set, and the process was repeated until each fold had been evaluated as test data. In each training phase, we first integrated the features obtained from the three feature selection approaches, balanced the dataset using sampling techniques, and finally applied recursive feature elimination to arrive at the features most relevant for predicting the target variable, as summarized in online supplement figure 2. Hyper-parameter tuning was done using a cross-validated grid search on a subset of the training data over a parameter grid, with the area under the curve as the scoring function. We experimented with classification thresholds from 0 to 1 in steps of 0.001 and chose the one that provided the maximum area under the receiver operating characteristic curve (AUROC). The trained classifier was then used to obtain prediction scores on the hold-out test set. To evaluate the robustness of model training, the entire process was repeated seven times, yielding thirty-five unique train-test splits. The performances obtained during each run were averaged, and mean scores along with 95% CIs were reported. Features that were repeatedly chosen (in ≥ 80% of runs for MODS and ≥ 60% for individual organ dysfunctions) across the cross-validation experiments were recorded. Classification performance in the external validation cohorts was judged by the AUROC and the Matthews correlation coefficient (MCC), a balanced statistical measure of true positives, true negatives, false positives, and false negatives [29], as shown in online supplement figure 3.
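The core loop (stratified 5-fold splits, recursive feature elimination within each training fold, AUROC and MCC on the held-out fold) can be sketched as below. Synthetic data stands in for the cohort, and the sampling-based class balancing, grid search, and seven repeats are omitted for brevity:

```python
# Illustrative version of the stratified 5-fold scheme with per-fold
# recursive feature elimination, reporting mean AUROC and MCC. Synthetic
# data; balancing, grid search, and repeats are intentionally omitted.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)

aurocs, mccs = [], []
for train, test in StratifiedKFold(n_splits=5, shuffle=True,
                                   random_state=0).split(X, y):
    # RFE is fit only on the training fold to avoid leakage
    selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
    selector.fit(X[train], y[train])
    clf = selector.estimator_                 # refit on the selected features
    scores = clf.predict_proba(selector.transform(X[test]))[:, 1]
    aurocs.append(roc_auc_score(y[test], scores))
    mccs.append(matthews_corrcoef(y[test], (scores > 0.5).astype(int)))

mean_auroc, mean_mcc = np.mean(aurocs), np.mean(mccs)
```

Fitting the feature eliminator inside each fold, rather than once on the full dataset, is what keeps the held-out fold an honest estimate of generalization.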
External validation: We used four external datasets: 1) the E-MTAB-5882 ArrayExpress dataset, consisting of time-course gene-expression measurements from the whole blood of 70 critically injured adult patients in the hyperacute period within 2 hours of trauma [30]; 2) the E-MTAB-1548 ArrayExpress dataset, comprising 155 adult post-surgical patients with and without septic shock admitted to a Spanish ICU [31, 32]; 3) the GSE144406 GEO dataset, consisting of whole-blood bulk RNA sequencing of a total of 27 pediatric patients, including 4 healthy controls, 17 patients with MODS, and 6 patients with MODS requiring extracorporeal membrane oxygenation (ECMO) support [33]; and 4) the E-MTAB-10938 ArrayExpress dataset, consisting of 32 pediatric patients with septic shock, of whom 19 had an immunoparalysis phenotype of MODS [34]. Quality control measures similar to those applied to the derivation cohort were used during pre-processing of the validation cohorts. Different combinations of the top genes (n = 10, 20, …50) correlated with MODS in the derivation cohort were tested with numerous classifier and sampling techniques to estimate the risk of MODS in the validation cohorts. We then determined the minimal number of genes and the single classifier combination that provided consistent performance across validation cohorts. Model performance at a fixed sensitivity of 85% [35] was reported across validation cohorts.
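Reporting at a fixed sensitivity means choosing, from the ROC curve, the highest decision threshold whose true-positive rate still reaches 85%, and then reading off the specificity at that operating point. A minimal sketch on synthetic scores (not the study's data):

```python
# Sketch of fixing the operating point at 85% sensitivity: roc_curve
# returns thresholds in decreasing order, so the first index where the
# TPR reaches 0.85 is the highest qualifying threshold. Synthetic scores.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(4)
y_true = rng.integers(0, 2, 500)
# Informative scores: positives shifted upward relative to negatives
scores = np.clip(y_true * 0.3 + rng.random(500) * 0.7, 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, scores)
idx = np.argmax(tpr >= 0.85)        # first point meeting 85% sensitivity
sensitivity, specificity = tpr[idx], 1 - fpr[idx]
```

Fixing sensitivity this way makes specificity the single number that varies across validation cohorts, which simplifies cross-cohort comparison.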
Organ-specific dysfunction signatures: We determined gene signatures correlated with three major organ dysfunctions (cardiovascular, respiratory, and kidney dysfunction) at the day 3 and day 7 time points independently in the derivation cohort. Based on the presumption that the MODS signature represented the biological pathways shared among patients with ≥ 2 of cardiovascular, respiratory, renal, hepatic, hematologic, and neurologic dysfunctions, we identified organ-specific differentially expressed genes by eliminating redundancies associated with the shared MODS signature. Finally, we identified targets correlated with cardiovascular, respiratory, and renal dysfunction that featured in ≥ 60% of cross-validation experiments.
Endotype identification: Unsupervised hierarchical clustering of the top genes correlated with persistent MODS, selected based on the best stability score [36], was used to derive patient subclasses within the derivation cohort. The clinical relevance of the newly derived subclasses was determined by estimating differences in clinical outcomes, organ support, and response to adjuvant corticosteroid therapy. Finally, the newly derived MODS subclasses were compared with the previously validated septic shock endotypes [12] (A and B) available in patients with septic shock. Reactome pathway analysis was used to determine the implicated biological processes [27]. Differences in the 50 genes used to determine patient subclasses were compared between MODS endotypes.
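Deriving subclasses by hierarchical clustering amounts to building a patient-by-gene linkage tree and cutting it into a fixed number of groups. A minimal sketch on a synthetic 60-patient × 50-gene matrix, assuming Ward linkage and a two-cluster cut (the study's actual linkage choice and cluster count are not specified here):

```python
# Hedged sketch of unsupervised hierarchical clustering of patients on a
# top-gene expression matrix; Ward linkage and the 2-cluster cut are
# illustrative assumptions, and the matrix is synthetic.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(5)
expr = rng.normal(size=(60, 50))   # 60 patients x 50 top genes
expr[:30, :25] += 1.5              # embed two broad expression patterns

Z = linkage(expr, method="ward")                    # agglomerate patients
subclass = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 subclasses
```

Each patient's `subclass` label can then be cross-tabulated against outcomes, organ support, and steroid response to assess clinical relevance, as described above.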