DOI: https://doi.org/10.21203/rs.3.rs-1027861/v1
Severe intraventricular hemorrhage (sIVH) is a catastrophic event associated with serious neurocognitive impairment in preterm infants. Because sIVH is a complex multifactorial disease, determining which patients require special attention to prevent sIVH is challenging. This study aimed to evaluate an easily interpretable decision-tree model to identify extremely preterm infants at higher risk of severe intraventricular hemorrhage. All infants admitted to a single-center tertiary intensive care unit in São Paulo, Brazil, from 2012 to 2017, with a birth weight of less than 1000 grams and at least one cranial ultrasound after three days of life were included. The association of risk factors with sIVH was assessed using logistic regression. Univariate analysis, stepwise logistic regression, correlation matrix, Boruta, and XGBoost were used to select features. In this single-center, retrospective cohort of 190 extremely low birth weight infants, the mean gestational age was 27.5 (2.2) weeks and the mean birth weight was 748 (161) grams. A total of 42 newborns (22.1%) developed severe intraventricular hemorrhage. Machine learning tools identified three features (pH, base excess, and gestational age) that predict severe intraventricular hemorrhage with an AUC of 0.857. Low pH levels appear to be a key factor in identifying the great majority of cases that require additional attention. Conclusions: We suggest a simple and interpretable decision-tree model to promptly identify extremely low birth weight infants at the highest risk of severe intraventricular hemorrhage.
In preterm infants, severe intraventricular hemorrhage (sIVH) is a catastrophic event associated with serious neurocognitive impairment (1,2). Its persistently high prevalence among preterm infants is of concern (3). Because sIVH is a complex multifactorial disease, it is difficult for the neonatologist to determine which patients require special attention to prevent sIVH.
A better understanding of the most important predictors of sIVH would help neonatologists detect preterm infants who require extra caution to prevent sIVH. Furthermore, predictive models could be used as a reference for future studies evaluating the efficacy of interventions in this population. However, creating a predictive model for sIVH is challenging because the pathology is complex, with multiple proposed disease mechanisms, including disturbance in cerebral blood flow, the inherent fragility of the germinal matrix vasculature, and platelet and coagulation disturbances (4).
To increase the accuracy of predictive models in classification and regression problems with high dimensional data, feature selection to an optimal number of variables is crucial (5). The incorporation of irrelevant variables in a predictive model will result in poor prediction (6). There are a variety of feature selection strategies available, ranging from traditional methods (univariate selection, feature importance, and correlation matrix) to machine learning-based techniques (6,7).
This study aimed to apply machine learning algorithms for efficient feature selection in order to create a simple and interpretable decision-tree model that identifies preterm infants at higher risk of severe intraventricular hemorrhage.
We used data from a retrospective study that analyzed extremely low birth weight infants, born between January 2012 and December 2017, in a single-center tertiary neonatal intensive care unit in Brazil (8). All neonates with a birth weight lower than 1000 grams who had at least one cranial ultrasound after three days of life were included. Neonates who died in the first 72 hours of life or had a major malformation were excluded.
A total of 36 variables for each study participant were included in the analysis:
General characteristics: gestational age (weeks), birth weight (grams), CRIB II score (Clinical Risk Index for Babies) (9), SGA (small for gestational age, defined as birth weight below the 10th percentile on the Fenton growth chart), female gender, 5-minute APGAR score, delivery type (vaginal or C-section), antenatal corticoid, number of endotracheal intubation attempts in the delivery room, need for epinephrine in the delivery room, and lowest temperature in the first 12 hours of life. Severe intraventricular hemorrhage was defined as IVH grade III or IV according to the Papile-Burstein classification (10).

Respiratory and blood gas analysis: lowest pH, highest pCO2, lowest bicarbonate level, and lowest base excess level in the first 72 hours of life; pneumothorax requiring chest drainage and need for invasive mechanical ventilation in the first 3 days of life.

Hemodynamic characteristics: lowest systolic blood pressure, lowest diastolic blood pressure, lowest mean blood pressure, highest systolic blood pressure, highest diastolic blood pressure, highest mean blood pressure, and need for fluid bolus in the first 72 hours of life. Inotrope necessity was defined as administration of dopamine, dobutamine, epinephrine, norepinephrine, or milrinone in the first 3 days of life. Hemodynamically significant patent ductus arteriosus (hsPDA) was defined as a PDA > 1.5 mm diagnosed on echocardiography. Urine output was calculated every 24 hours, and the lowest result in the first 72 hours was considered.

Hematological and infectious characteristics: chorioamnionitis, positive blood culture in the first 3 days of life, highest C-reactive protein in the first 3 days of life, and lowest platelet level in the first 3 days of life.
Continuous variables are presented as mean and standard deviation, and categorical variables as frequencies and percentages. To compare the groups with and without sIVH, Student's t-test was used for means and the chi-square test for proportions. To create each logistic regression model, we split the dataset into a training subset (70%) and a testing subset (30%). Once the model was created, the area under the curve (AUC) was calculated.

Multivariable stepwise logistic regression (forward-backward) was performed using the R package MASS. A correlation matrix was computed using the R package corrplot, and correlated features were removed based on subjective judgment. Boruta analysis was performed using the R package Boruta (11), with maxRuns = 100. XGBoost (12) with SHAP (Shapley Additive Explanations) values was performed using the R package xgboost, with the following parameters: nround = 100, max.depth = 6, eta = 0.01, subsample = 0.8, colsample_bytree = 0.3, verbose = 1, gamma = 0. The decision-tree models were created using the R packages rpart and rpart.plot. To keep the decision trees easy to interpret, we limited maxdepth to 5 when the number of variables was greater than 5.

All statistical analyses were performed using R Statistical Software (version 4.1.1; R Foundation for Statistical Computing, Vienna, Austria). The study protocol was approved, and informed consent was waived, by the institutional ethics committee.
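The 70%/30% split and the AUC evaluation described in the statistical analysis can be illustrated with a minimal sketch. The original analysis was done in R; the Python code below is only an illustration, and the function names `split_train_test` and `auc` are hypothetical. The split shown is a plain random split without stratification, which is an assumption, since the text does not say how the split was drawn. The AUC is computed as the probability that a randomly chosen infant with sIVH receives a higher predicted score than a randomly chosen infant without sIVH, with ties counted as one half.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them (unstratified, which is an assumption)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def auc(scores, labels):
    """Chance that a random case (label 1) outscores a random non-case (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))

# 190 row indices stand in for the 190 infants of the cohort.
train, test = split_train_test(range(190))
print(len(train), len(test))  # 133 57

# Toy scores and labels, not study data: 3 of the 4 case/non-case pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.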
We included 190 extremely low birth weight infants who had at least one cranial ultrasound after three days of life. Clinical and laboratory characteristics are presented in Table 1. The mean gestational age was 27.5 (2.2) weeks and the mean birth weight was 748 (161) grams. During their stay in the neonatal intensive care unit, 42 patients (22.1%) developed severe intraventricular hemorrhage.
In the first model, all variables were subjected to univariate analysis, and those with statistical significance (p<0.05) were included in the logistic regression. The AIC (Akaike Information Criterion) of this model was 127.8 and the AUC was 0.821 (Table 1). We then built a decision-tree model using the variables that remained statistically significant after logistic regression (C-section and inotropic therapy) (eFigure 1).
In the second model, we entered all variables into a stepwise logistic regression. The resulting model is presented in eTable 1, and it has an AIC of 108 and an AUC of 0.798. The factors that remained statistically significant (C-section, highest pCO2, and inotropic requirement) were then used to develop a decision-tree model (eFigure 2).
In the third model, a correlation matrix (eFigure 3) was created to identify and remove correlated features. The logistic regression included the features retained after this step (eTable 2). This model yielded an AIC of 115.4 and an AUC of 0.739. The variables that remained statistically significant (APGAR, delivery, birth weight, worst platelet, and inotropic requirement) were then used to develop a decision-tree model (eFigure 4).
In the fourth model, we performed a stepwise logistic regression using the variables retained after feature selection with the correlation matrix. The resulting model is presented in eTable 3, and it has an AIC of 101 and an AUC of 0.768. The variables that remained statistically significant after logistic regression (APGAR, delivery, birth weight, worst platelet, and inotrope) generated the same decision-tree model as the third model (eFigure 4).
In the fifth model, we used Boruta, a feature selection algorithm. This algorithm identifies features that are either highly or weakly related to the outcome, ranking each factor as either important or unimportant. In our dataset, this algorithm classified 9 attributes as important: lowest pH, inotropic therapy, gestational age, highest pCO2, lowest base excess, delivery, fluid bolus necessity, lowest mean blood pressure, and lowest diastolic blood pressure (Figure 1). A logistic regression using those 9 features yielded an AIC of 122.1 and an AUC of 0.786 (eTable 4). We then created a decision-tree model using those 9 important variables (eFigure 5).
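The core idea behind Boruta, comparing each real feature against shuffled "shadow" copies that cannot carry real information, can be sketched in a few lines. This is a simplified illustration and not the Boruta package used in the study. Boruta scores features with random-forest importance and applies a statistical test over the runs. The sketch below swaps in the absolute Pearson correlation as a stand-in importance measure and only counts "hits". The names `abs_pearson` and `shadow_hits` are hypothetical, and the data are synthetic.

```python
import random

def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def shadow_hits(features, outcome, runs=100, seed=0):
    """Count, per feature, the runs in which it beats the best shuffled copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(runs):
        best_shadow = 0.0
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)  # shuffling breaks any real link to the outcome
            best_shadow = max(best_shadow, abs_pearson(shadow, outcome))
        for name, values in features.items():
            if abs_pearson(values, outcome) > best_shadow:
                hits[name] += 1
    return hits

# Synthetic toy data, not study data: one informative and one random feature.
rng = random.Random(1)
outcome = [i % 2 for i in range(60)]
features = {
    "signal": [y + rng.gauss(0, 0.3) for y in outcome],
    "noise": [rng.gauss(0, 1) for _ in outcome],
}
print(shadow_hits(features, outcome))
```

A feature that beats the best shadow in nearly every run is a candidate for the "important" class, while one that rarely does is a candidate for rejection.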
In the sixth model, we used XGBoost, a gradient-boosted decision-tree algorithm that minimizes a loss function and directly provides estimates of feature importance. In our dataset, the 3 features with the highest importance identified by XGBoost were lowest pH, lowest base excess, and gestational age (Figure 2). A logistic regression using those 3 features yielded an AIC of 128.5 and an AUC of 0.857 (Table 2). We then created a decision-tree model using only lowest pH, lowest base excess, and gestational age (Figure 3).
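Because each model above is reported with both an AIC and an AUC, it may help to recall how the AIC is formed. It is 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, and lower values indicate a better trade-off between fit and complexity. The sketch below uses a hypothetical helper `aic` and made-up log-likelihoods, not values from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical log-likelihoods, not values from this study.
small_model = aic(log_likelihood=-58.0, n_params=3)
large_model = aic(log_likelihood=-55.0, n_params=10)
print(small_model, large_model)  # 122.0 130.0
```

In this toy comparison the larger model fits slightly better but is penalized for its extra parameters. The AIC reflects penalized fit while the AUC reflects how well cases are ranked above non-cases, so the two measures need not order a set of models in the same way.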
Our study found that a simple and straightforward model based on pH, base excess, and gestational age may accurately predict the risk of severe intraventricular hemorrhage in infants born at extremely low birth weight. This study applied a variety of methodologies, including machine learning-based algorithms, to create and compare multiple models. The model using only these three features yielded an AUC of 0.857 and an easily interpretable decision-tree model that does not require any subjective variables, such as the use of inotropic therapy. Low pH levels appear to be a key factor in identifying the great majority of cases that require additional attention. Preterm infants whose lowest pH in the first 3 days of life was greater than 7.2 had an sIVH prevalence of only 6%. In contrast, preterm infants whose lowest pH was less than 7.2 had an sIVH prevalence of 40%.
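The pH split described above can be written as a one-line rule. The sketch below encodes only that first split with the two prevalences quoted in the text. The further splits on base excess and gestational age are not reproduced because their thresholds are not stated here. The function name `root_split_prevalence` is hypothetical, and the handling of a pH exactly equal to 7.2 is an assumption, since the text only describes values above and below the threshold.

```python
def root_split_prevalence(lowest_ph, threshold=7.2):
    """Return the sIVH prevalence reported for each side of the pH split."""
    # Assumption: a value exactly at the threshold goes to the lower-risk branch.
    if lowest_ph >= threshold:
        return 0.06
    return 0.40

print(root_split_prevalence(7.31))  # 0.06
print(root_split_prevalence(7.05))  # 0.4
```

The returned values are the group prevalences observed in this cohort, not individual predicted probabilities.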
The prevalence of severe intraventricular hemorrhage remains relatively high (13), and more effective strategies for preventing sIVH are needed (14). It has serious consequences, with over half of all affected neonates developing significant neurocognitive impairment (1). One reason the prevalence of sIVH has not declined is that it is a complex multifactorial disease with many risk factors, including changes in cerebral blood flow (hypoxia, hypercarbia, acidosis, ventilation asynchrony, patent ductus arteriosus, suctioning of the airway), high cerebral venous pressure (pneumothorax, high ventilator pressure, prolonged labor, and vaginal delivery), abnormal blood pressure (hypotension, hypertension, sepsis, dehydration), the inherent fragility of the germinal matrix vasculature (hypoxic-ischemic insult, sepsis, thrombocytopenia), and hemostatic disturbance (14). Therefore, it is difficult for a neonatologist to identify those who are at a higher risk of sIVH. Previous predictive models were created to help fill this gap. Luque et al. (15) developed a stepwise logistic regression model with an AUC of 0.78 using the following variables: gestational age, mechanical ventilation, antenatal steroid, 1-min APGAR, birth weight, cesarean section, male gender, and respiratory distress syndrome. However, essential hemodynamic and respiratory variables were left out of this model. When we applied the model proposed by Luque et al. to our population, the AUC was only 0.70. Siddappa et al. (16) developed another model, based only on a severity score (SNAPPE-II), that had an AUC of 0.78. In our population, a model based on a severity score (CRIB-II) yielded an AUC of 0.76.
The importance of classification analysis in assessing the associations between independent variables (predictors) and dependent variables (outcomes) should not be underestimated. However, a large number of predictors increases the computational complexity of the model and makes it more prone to overfitting: when a model becomes too complex, it may begin to describe random errors rather than the relationships between variables. To avoid this, the smallest set of features required to predict the outcome should be determined (17). However, because sIVH has a large number of risk factors, selecting a few key parameters to build a predictive model is challenging. In this case, machine learning algorithms can help identify the variables that matter most to the outcome: Boruta is a feature selection algorithm, and XGBoost can provide estimates of feature importance.
Using traditional feature selection methods, we were only able to create reasonable models with AUCs ranging from 0.707 to 0.823, similar to the results of Luque et al. and Siddappa et al. Furthermore, those features did not result in easily interpretable decision-tree models. To illustrate, the root node (beginning of the tree) of some decision-tree models was inotropic therapy necessity. Because there is no consensus on neonatal hemodynamic management, the use of inotropes is mostly subjective and varies between physicians (18). As a result, a decision-tree model with a subjective variable as the root node may apply to our population but fail external validation.
In predictive modeling, feature selection is crucial. We demonstrated that machine learning algorithms can help in feature selection by assisting in the identification of variables with greater value. Interestingly, only machine learning algorithms (Boruta and XGBoost) identified the lowest pH as the most important predictor of sIVH. However, our study has several limitations. First, this was a single-center retrospective study, and a multi-center prospective one is needed. Second, our results showed that the accuracy of models varied widely, implying that there must be severe intraventricular hemorrhage-related variables that we did not analyze. Finally, the results may apply to our population, but external validation is needed.
We propose a simple and interpretable decision-tree model for predicting newborns with extremely low birth weight who are most at risk of severe intraventricular hemorrhage. Feature selection is crucial in predictive modeling.
AIC – Akaike Information Criterion
AUC – Area Under the Curve
CRIB – Clinical Risk Index for Babies
hsPDA – Hemodynamically significant patent ductus arteriosus
SGA – Small for gestational age
SHAP – Shapley Additive Explanations
sIVH – Severe intraventricular hemorrhage
XGBoost – eXtreme Gradient Boosting
Acknowledgments:
We thank all our colleagues on the NICU staff at the Faculty of Medicine of the University of São Paulo.
Conflict of Interest
The authors declare that they have no conflict of interest and no funding was received for this study.
Funding: None
Availability of data and material: Data available within the article or its supplementary materials
Authors' contributions: Dr. Felipe conceptualized and designed the study, collected data, carried out the initial analyses, drafted the initial manuscript, and reviewed and revised the manuscript. Prof. Vera conceptualized and designed the study, designed the data collection instruments, collected data, and reviewed and revised the manuscript. Prof. Werther conceptualized and designed the study, coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content.
Ethics approval: Approved and informed consent was waived by the institutional ethics committee
Consent to participate: Not applicable
Consent for publication: Not applicable
Feature | No severe IVH (n = 148) | Severe IVH (n = 42) | P value (univariate analysis) | Odds Ratio (logistic regression) | 95% CI (logistic regression) | P value (logistic regression)
---|---|---|---|---|---|---
Gestational age (wk), mean (SD) | 27.9 (2.18) | 26.3 (2.2) | <0.001 | 1.180 | 0.683 – 2.028 | 0.545
Birth Weight (g), mean (SD) | 749.9 (162.9) | 741 (156) | 0.756 | - | - | -
CRIB score, mean (SD) | 11.2 (2.3) | 13 (2.9) | 0.0005 | 0.868 | 0.579 – 1.272 | 0.477
Small for gestational age, n (%) | 76 (51.4) | 12 (28.6) | 0.014 | 0.282 | 0.041 – 1.626 | 0.171
Female Gender, n (%) | 73 (49.3) | 17 (40.5) | 0.401 | - | - | -
5-minute APGAR score, mean (SD) | 7.7 (1.8) | 6.16 (2.6) | <0.001 | 0.808 | 0.576 – 1.116 | 0.199
C-section, n (%) | 135 (91.2) | 25 (59.5) | <0.001 | 0.106 | 0.016 – 0.556 | 0.010
Antenatal corticoid, n (%) | 84 (56.8) | 13 (31) | 0.005 | 0.432 | 0.117 – 1.453 | 0.184
Endotracheal tube (number of trials in delivery room), mean (SD) | 1.1 (1.3) | 1.3 (1) | 0.294 | - | - | -
Epinephrine in delivery room, n (%) | 9 (6.1) | 8 (19) | 0.021 | 5.919 | 0.668 – 65.24 | 0.120
Lowest temperature, mean (SD) | 34.8 (0.8) | 34.6 (1.1) | 0.353 | - | - | -
Lowest pH, mean (SD) | 7.22 (0.13) | 7.07 (0.13) | <0.001 | 3.387 | 0.000004 – 9273933 | 0.865
Highest pCO2, mean (SD) | 47.6 (15.5) | 62.3 (18.4) | <0.001 | 1.051 | 0.985 – 1.133 | 0.157
Lowest pCO2, mean (SD) | 30.1 (7.3) | 32.3 (8.5) | 0.135 | - | - | -
Lowest HCO3, mean (SD) | 15.8 (2.9) | 13.7 (3.7) | 0.001 | 0.953 | 0.695 – 1.328 | 0.768
Lowest base excess, mean (SD) | -9.89 (4.3) | -14.7 (5.6) | <0.001 | 0.952 | 0.675 – 1.325 | 0.774
Pneumothorax, n (%) | 5 (3.4) | 4 (9.5) | 0.213 | - | - | -
Mechanical Ventilation, n (%) | 115 (77.7) | 41 (97.6) | 0.006 | 0.427 | 0.025 – 12.47 | 0.562
C-reactive protein, mean (SD) | 12.5 (17.3) | 14.8 (13.9) | 0.384 | - | - | -
Positive blood culture, n (%) | 10 (6.8) | 4 (9.5) | 0.786 | - | - | -
Chorioamnionitis, n (%) | 14 (9.5) | 8 (19) | 0.149 | - | - | -
Diuresis, mean (SD) | 2.59 (1.6) | 3.03 (1.8) | 0.183 | - | - | -
Highest fluid overload, mean (SD) | 14.8 (14.3) | 16.1 (17.1) | 0.647 | - | - | -
Worst Platelet, mean (SD) | 108 (68) | 90.5 (66) | 0.137 | - | - | -
Lowest systolic blood pressure, mean (SD) | 42.4 (8.8) | 36.3 (6.7) | <0.001 | 1.055 | 0.901 – 1.243 | 0.503
Lowest diastolic blood pressure, mean (SD) | 21.3 (4.4) | 20 (5.9) | 0.201 | - | - | -
Lowest mean blood pressure, mean (SD) | 29.5 (5.4) | 26.3 (5.5) | 0.001 | 0.944 | 0.739 – 1.194 | 0.636
Highest systolic blood pressure, mean (SD) | 68.4 (12.2) | 62.6 (12.3) | 0.008 | 0.942 | 0.881 – 0.999 | 0.061
Highest diastolic blood pressure, mean (SD) | 40.6 (10.6) | 39.2 (7) | 0.315 | - | - | -
Highest mean blood pressure, mean (SD) | 49.4 (10.4) | 47.2 (8.1) | 0.147 | - | - | -
Inotropic therapy, n (%) | 29 (19.6) | 28 (66.7) | <0.001 | 11.09 | 2.401 – 64.66 | 0.003
Persistent Ductus Arteriosus, n (%) | 110 (74.3) | 39 (92.9) | 0.018 | 0.888 | 0.039 – 26.50 | 0.9416
Hemodynamically significant Persistent Ductus Arteriosus, n (%) | 86 (58.1) | 34 (81) | 0.011 | 0.182 | 0.016 – 1.776 | 0.145
Persistent Ductus Arteriosus size, mean (SD) | 1.65 (1.1) | 2.29 (1) | <0.001 | 2.710 | 0.971 – 8.039 | 0.059
Fluid bolus, mean (SD) | 1 (1.9) | 3.2 (3.3) | <0.001 | 1.023 | 0.823 – 1.274 | 0.827
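As a worked check of one univariate comparison in the table above, the C-section row (135 of 148 infants without sIVH versus 25 of 42 with sIVH) can be run through a chi-square test for a 2×2 table. The sketch below is a plain-Python illustration with a hypothetical function name `chi_square_2x2`. It applies Yates' continuity correction by default, which is the default behavior of R's `chisq.test` for 2×2 tables. Whether the authors used the correction is an assumption.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    stat = 0.0
    for o, r, k in zip(observed, row_totals, col_totals):
        expected = r * k / n
        diff = abs(o - expected)
        if yates:
            diff = max(0.0, diff - 0.5)
        stat += diff ** 2 / expected
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Rows: C-section yes / no. Columns: no severe IVH / severe IVH.
stat, p = chi_square_2x2(135, 25, 148 - 135, 42 - 25)
print(round(stat, 2), p < 0.001)
```

With or without the correction, the resulting p-value is below 0.001, in line with the value listed for C-section in the univariate column.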
Feature | Odds Ratio | 95% CI | P value
---|---|---|---
Lowest pH | 0.002 | 0.000005 – 0.5251 | 0.031
Gestational age | 0.785 | 0.608 – 0.987 | 0.048
Lowest Base Excess | 1.014 | 0.873 – 1.179 | 0.854
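The odds ratio for lowest pH in the table above is very small because it refers to a change of one full pH unit, which is far larger than the differences seen between infants. It can be rescaled to a more clinically meaningful step, such as 0.1 pH unit, by raising it to the power of the step size. The sketch below assumes the odds ratio was estimated per 1-unit increase of an untransformed predictor. The function name `rescale_odds_ratio` is hypothetical, and because the published value is rounded to 0.002, the rescaled figure is only approximate.

```python
import math

def rescale_odds_ratio(odds_ratio, step):
    """Convert an odds ratio per 1-unit increase into one per `step`-unit increase."""
    return math.exp(step * math.log(odds_ratio))

# Lowest pH: odds ratio per 0.1-unit increase.
print(round(rescale_odds_ratio(0.002, 0.1), 2))  # 0.54

# Gestational age: fractional reduction in odds per additional week.
print(round(1 - 0.785, 3))  # 0.215
```

Read this way, each 0.1-unit increase in the lowest pH is associated with roughly half the odds of sIVH, and each additional week of gestational age with odds about 21% lower, holding the other two variables fixed.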
Model | AUC
---|---
Using only severity score (CRIB II score) | 0.762
Using variables identified by Luque et al. | 0.707
All variables | 0.739
Stepwise using all variables | 0.784
Variables with p<0.05 in univariate analysis | 0.823
Stepwise using variables with p<0.05 in univariate analysis | 0.798
Correlation matrix | 0.739
Stepwise after correlation matrix | 0.768
Boruta | 0.786
XGBoost (eXtreme Gradient Boosting) | 0.857