A Simple and Interpretable Severe Intraventricular Hemorrhage Prediction Model for Extremely Low Birth Weight Infants Using Machine Learning

DOI: https://doi.org/10.21203/rs.3.rs-1027861/v1

Abstract

Severe intraventricular hemorrhage (sIVH) is a catastrophic event associated with serious neurocognitive impairment in preterm infants. Because sIVH is a complex multifactorial disease, determining which patients require special attention to prevent it is challenging. This study aimed to evaluate an easily interpretable decision-tree model to identify extremely preterm infants at higher risk of sIVH. All infants admitted to a single-center tertiary intensive care unit in São Paulo, Brazil, from 2012 to 2017, with a birth weight less than 1000 grams and at least one cranial ultrasound after three days of life were included. The association of risk factors with sIVH was assessed using logistic regression. Univariate analysis, stepwise logistic regression, correlation matrix, Boruta, and XGBoost were used to select features. In this single-center, retrospective cohort of 190 extremely low birth weight infants, the mean (SD) gestational age was 27.5 (2.2) weeks and the mean (SD) birth weight was 748 (161) grams. Forty-two newborns (22.1%) developed sIVH. Machine learning tools identified three features (pH, base excess, and gestational age) that predicted sIVH with an AUC of 0.857. Low pH appears to be a key factor in identifying the great majority of cases that require additional attention. In conclusion, we suggest a simple and interpretable decision-tree model to promptly identify extremely low birth weight infants at the highest risk of sIVH.

Introduction

In preterm infants, severe intraventricular hemorrhage (sIVH) is a catastrophic event associated with serious neurocognitive impairment (1,2). Its persistently high prevalence among preterm infants is of concern (3). Because sIVH is a complex multifactorial disease, it is difficult for the neonatologist to determine which patients require special attention to prevent it.

A better understanding of the most important predictors of sIVH would help neonatologists identify preterm infants who require extra caution. Furthermore, predictive models could serve as a reference for future studies evaluating the efficacy of interventions in this population. However, creating a predictive model for sIVH is challenging because the pathology is complex, with multiple proposed disease mechanisms, including disturbances in cerebral blood flow, the inherent fragility of the germinal matrix vasculature, and platelet and coagulation disturbances (4).

To increase the accuracy of predictive models in classification and regression problems with high dimensional data, feature selection to an optimal number of variables is crucial (5). The incorporation of irrelevant variables in a predictive model will result in poor prediction (6). There are a variety of feature selection strategies available, ranging from traditional methods (univariate selection, feature importance, and correlation matrix) to machine learning-based techniques (6,7).

This study aimed to apply machine learning algorithms for efficient feature selection in order to create a simple and interpretable decision-tree model that identifies preterm infants at higher risk of severe intraventricular hemorrhage.

Materials And Methods

We used data from a retrospective study that analyzed extremely low birth weight infants born between January 2012 and December 2017 in a single-center tertiary neonatal intensive care unit in Brazil (8). All neonates with a birth weight lower than 1000 grams who had at least one cranial ultrasound after three days of life were included. Neonates who died in the first 72 hours of life or had a major malformation were excluded.

A total of 36 variables for each study participant were included in the analysis:

General characteristics: gestational age (weeks), birth weight (grams), CRIB II score (Clinical Risk Index for Babies) (9), small for gestational age (SGA, defined as birth weight below the 10th percentile on the Fenton growth chart), female gender, 5-minute APGAR score, delivery type (vaginal or C-section), antenatal corticosteroid use, number of endotracheal intubation attempts in the delivery room, need for epinephrine in the delivery room, and lowest temperature in the first 12 hours of life. Severe intraventricular hemorrhage was defined as IVH grade III or IV according to the Papile-Burstein classification (10).

Respiratory and blood gas analysis: lowest pH, highest pCO2, lowest bicarbonate level, and lowest base excess level in the first 72 hours of life; pneumothorax requiring chest drainage; and need for invasive mechanical ventilation in the first 3 days of life.

Hemodynamic characteristics: lowest systolic blood pressure, lowest diastolic blood pressure, lowest mean blood pressure, highest systolic blood pressure, highest diastolic blood pressure, highest mean blood pressure, and need for fluid bolus in the first 72 hours of life. Inotrope use was defined as administration of dopamine, dobutamine, epinephrine, norepinephrine, or milrinone in the first 3 days of life. Hemodynamically significant patent ductus arteriosus (hsPDA) was defined as a PDA > 1.5 mm diagnosed on echocardiography. Urine output was calculated every 24 hours, and the lowest result in the first 72 hours was considered.

Hematological and infectious characteristics: chorioamnionitis, positive blood culture in the first 3 days of life, highest C-reactive protein in the first 3 days of life, and lowest platelet count in the first 3 days of life.

Continuous variables are presented as mean and standard deviation, and categorical variables as frequency and percentage. To compare the groups with and without sIVH, Student's t-test was used for means and the chi-square test for proportions. To create each logistic regression model, we split the dataset into a training subset (70%) and a testing subset (30%). Once the model was created, the area under the curve (AUC) was calculated. Multivariable stepwise logistic regression (forward-backward) was performed using the R package MASS. A correlation matrix was computed using the R package corrplot, and correlated features were removed by subjective judgment. Boruta analysis was performed using the R package Boruta (11), with maxRuns = 100. XGBoost (12) with SHAP (Shapley Additive Explanations) values was performed using the R package xgboost, with the following parameters: nround = 100, max.depth = 6, eta = 0.01, subsample = 0.8, colsample_bytree = 0.3, verbose = 1, gamma = 0. Decision-tree models were created using the R packages rpart and rpart.plot. To keep each decision tree easy to interpret, we limited maxdepth to 5 when the number of variables was greater than 5. All statistical analyses were performed using R Statistical Software (version 4.1.1; R Foundation for Statistical Computing, Vienna, Austria). The study protocol was approved, and informed consent was waived, by the institutional ethics committee.
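The 70/30 split and the AUC calculation described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study; the function names `split_train_test` and `auc`, the seed, and the toy scores are assumptions introduced only for this example. The AUC is computed in its pairwise (Mann-Whitney) form: the proportion of case/non-case pairs in which the case receives the higher predicted score, with ties counted as one half.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above non-case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy inputs (hypothetical, not study data): 190 row indices and four scores.
train, test = split_train_test(range(190))
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(len(train), len(test), toy_auc)
```

With a cohort of this size, the testing subset contains only a few dozen infants, so an AUC estimated on it can shift noticeably with the random split.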

Results

We included 190 extremely low birth weight infants who had at least one cranial ultrasound after three days of life. Clinical and laboratory characteristics are presented in Table 1. The mean gestational age was 27.5 (2.2) weeks and the mean birth weight was 748 (161) grams. During their stay in the neonatal intensive care unit, 42 patients (22.1%) developed severe intraventricular hemorrhage.

In the first model, all variables were subjected to univariate analysis, and those with statistical significance (p<0.05) were included in the logistic regression. The AIC (Akaike Information Criterion) of this model was 127.8 and the AUC was 0.821 (Table 1). We then built a decision-tree model using the variables that remained statistically significant after logistic regression (C-section and inotropic therapy) (eFigure 1).

In the second model, we entered all variables into a stepwise logistic regression. The resulting model is presented in eTable 1; it had an AIC of 108 and an AUC of 0.798. The factors that remained statistically significant (C-section, highest pCO2, and inotropic requirement) were then used to develop a decision-tree model (eFigure 2).

In the third model, a correlation matrix (eFigure 3) was used to identify and remove correlated features. A logistic regression was then fitted with the remaining features (eTable 2). This model yielded an AIC of 115.4 and an AUC of 0.739. The variables that remained statistically significant (APGAR score, delivery type, birth weight, lowest platelet count, and inotropic requirement) were then used to develop a decision-tree model (eFigure 4).

In the fourth model, we performed a stepwise logistic regression on the variables retained after feature selection with the correlation matrix. The resulting model is presented in eTable 3; it had an AIC of 101 and an AUC of 0.768. The variables that remained statistically significant after logistic regression (APGAR score, delivery type, birth weight, lowest platelet count, and inotropic requirement) generated the same decision-tree model as the third model (eFigure 4).

In the fifth model, we used Boruta, a feature selection algorithm. It assesses how strongly each feature is related to the outcome and classifies it as either important or unimportant. In our dataset, the algorithm classified 9 attributes as important: lowest pH, inotropic therapy, gestational age, highest pCO2, lowest base excess, delivery type, need for fluid bolus, lowest mean blood pressure, and lowest diastolic blood pressure (Figure 1). A logistic regression using those 9 features yielded an AIC of 122.1 and an AUC of 0.786 (eTable 4). We then created a decision-tree model using those 9 important variables (eFigure 5).
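The idea behind Boruta can be illustrated with a deliberately simplified sketch. Boruta compares the random-forest importance of each real feature with that of shuffled "shadow" copies and applies a statistical test over many runs. The Python sketch below swaps in two simplifications, both assumptions made for brevity: absolute Pearson correlation stands in for random-forest importance, and a plain hit count stands in for the statistical test. The data are synthetic, and the names `abs_corr` and `shadow_hits` are hypothetical.

```python
import random

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def shadow_hits(features, outcome, runs=100, seed=0):
    """Count, per feature, the runs in which it beats the best shuffled copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(runs):
        best_shadow = 0.0
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)  # shuffling breaks any link with the outcome
            best_shadow = max(best_shadow, abs_corr(shadow, outcome))
        for name, values in features.items():
            if abs_corr(values, outcome) > best_shadow:
                hits[name] += 1
    return hits

# Synthetic data (not study data): one feature tracks the outcome, one is noise.
data_rng = random.Random(1)
outcome = [i % 2 for i in range(200)]
features = {
    "informative": [y + data_rng.gauss(0, 0.3) for y in outcome],
    "noise": [data_rng.gauss(0, 1) for _ in outcome],
}
hits = shadow_hits(features, outcome)
print(hits)
```

A feature that consistently beats the best shadow is a candidate "important" attribute; one that rarely does is a candidate for rejection. The actual package makes this decision with a formal test rather than a raw count.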

In the sixth model, we used XGBoost, a gradient-boosted decision-tree algorithm that minimizes a loss function and directly provides feature importance estimates. In our dataset, the 3 features with the highest importance identified by XGBoost were lowest pH, lowest base excess, and gestational age (Figure 2). A logistic regression using those 3 features yielded an AIC of 128.5 and an AUC of 0.857 (Table 2). We then created a decision-tree model using only lowest pH, lowest base excess, and gestational age (Figure 3).

Discussion

Our study found that a simple model based on pH, base excess, and gestational age may accurately predict the risk of severe intraventricular hemorrhage in infants born at extremely low birth weight. We applied a variety of methods, including machine learning-based algorithms, to create and compare multiple models. The model using only these three features yielded an AUC of 0.857 and an easily interpretable decision tree that does not require any subjective variables, such as the use of inotropic therapy. Low pH appears to be a key factor in identifying the great majority of cases that require additional attention. Preterm infants whose lowest pH in the first 3 days of life was greater than 7.2 had an sIVH prevalence of only 6%. In contrast, preterm infants whose lowest pH was less than 7.2 had an sIVH prevalence of 40%.
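The two prevalences reported above, combined with the cohort totals (190 infants, 42 with sIVH), allow a rough back-calculation of how many cases fall below the pH threshold. The Python sketch below is an approximation under stated assumptions: the 6% and 40% figures are rounded, infants with a lowest pH of exactly 7.2 are ignored, and the single pH split is treated as if it were a stand-alone classifier, which is not the three-feature tree reported in the study. The derived counts are not reported in the source.

```python
total_infants = 190
total_cases = 42
prev_low_ph = 0.40   # sIVH prevalence when lowest pH < 7.2 (rounded, from the text)
prev_high_ph = 0.06  # sIVH prevalence when lowest pH > 7.2 (rounded, from the text)

# Solve prev_low * n_low + prev_high * (total - n_low) = total_cases for n_low.
n_low = (total_cases - prev_high_ph * total_infants) / (prev_low_ph - prev_high_ph)
n_high = total_infants - n_low
cases_low = prev_low_ph * n_low
cases_high = prev_high_ph * n_high

# Treat "lowest pH < 7.2" as a positive screen (an assumption for illustration).
sensitivity = cases_low / total_cases
specificity = (n_high - cases_high) / (total_infants - total_cases)
print(round(n_low), round(cases_low), round(sensitivity, 2), round(specificity, 2))
```

Under these assumptions, roughly 90 infants fall below the threshold and they account for about 36 of the 42 cases, which is consistent with the statement that low pH identifies the great majority of cases, at the cost of flagging many infants who do not develop sIVH.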

The prevalence of severe intraventricular hemorrhage remains relatively high (13), and more effective strategies for preventing sIVH are needed (14). It has serious consequences, with over half of all affected neonates developing significant neurocognitive impairment (1). One reason the prevalence of sIVH has not declined is that it is a complex multifactorial disease with many risk factors, including changes in cerebral blood flow (hypoxia, hypercarbia, acidosis, ventilation asynchrony, patent ductus arteriosus, suctioning of the airway), high cerebral venous pressure (pneumothorax, high ventilator pressure, prolonged labor, and vaginal delivery), abnormal blood pressure (hypotension, hypertension, sepsis, dehydration), the inherent fragility of the germinal matrix vasculature (hypoxic-ischemic insult, sepsis, thrombocytopenia), and hemostatic disturbance (14). Therefore, it is difficult for a neonatologist to identify those who are at higher risk of sIVH. Previous predictive models were created to help fill this gap. Luque et al. (15) developed a stepwise logistic regression model, with an AUC of 0.78, using the following variables: gestational age, mechanical ventilation, antenatal steroid, 1-min APGAR, birth weight, cesarean section, male gender, and respiratory distress syndrome. However, essential hemodynamic and respiratory variables were left out of this model. The AUC was only 0.70 when we applied the model proposed by Luque et al. to our population. Siddappa et al. (16) developed another model, with an AUC of 0.78, using only a severity score (SNAPPE-II). In our population, a model based on a severity score (CRIB-II) obtained an AUC of 0.76.

The importance of classification analysis in assessing the associations between independent variables (predictors) and dependent variables (outcomes) should not be underestimated. However, a large number of predictors increases the computational complexity of the model and makes it more prone to overfitting: when a model becomes too complex, it may begin to describe random error rather than the relationships between variables. To avoid this, the smallest set of features required to predict the outcome should be determined (17). However, because sIVH has a large number of risk factors, selecting a few key parameters to build a predictive model is challenging. In this case, machine learning algorithms could help identify the variables that matter most to the outcome: Boruta is a feature selection algorithm, and XGBoost can provide estimates of feature importance.
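One way to see why a small feature set matters in a cohort of this size is the events-per-variable ratio. The Python sketch below uses the whole-cohort counts from this study (42 sIVH events, 36 collected variables, and the 3-feature model). Two points are assumptions rather than statements from the source: all 36 variables are treated as candidate predictors, and the threshold of about 10 events per variable is a commonly cited heuristic that remains debated. The helper name `events_per_variable` is hypothetical.

```python
def events_per_variable(n_events, n_predictors):
    """Number of outcome events available per candidate predictor."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors

n_events = 42  # infants with sIVH in the whole cohort

epv_all = events_per_variable(n_events, 36)   # every collected variable
epv_three = events_per_variable(n_events, 3)  # pH, base excess, gestational age

print(round(epv_all, 2), round(epv_three, 2))
```

Because the models were fitted on a 70% training subset, the ratios actually available during fitting would be somewhat lower than these whole-cohort values.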

Using traditional feature selection methods, we were only able to create reasonable models, with AUCs ranging from 0.707 to 0.823, similar to the results of Luque et al. and Siddappa et al. Furthermore, those features did not result in easily interpretable decision-tree models. To illustrate, the root node (the beginning of the tree) of some decision-tree models was the need for inotropic therapy. Because there is no consensus on neonatal hemodynamic management, the use of inotropes is mostly subjective and varies between physicians (18). As a result, a decision-tree model with a subjective variable as the root node may apply to our population but fail external validation.

In predictive modeling, feature selection is crucial. We demonstrated that machine learning algorithms can support feature selection by identifying the variables with the greatest predictive value. Interestingly, only the machine learning algorithms (Boruta and XGBoost) identified the lowest pH as the most important predictor of sIVH. However, our study has several limitations. First, this was a single-center retrospective study, and a multi-center prospective study is needed. Second, the accuracy of the models varied widely, suggesting that there are variables related to severe intraventricular hemorrhage that we did not analyze. Finally, the results may apply to our population, but external validation is needed.

We propose a simple and interpretable decision-tree model for identifying extremely low birth weight newborns who are most at risk of severe intraventricular hemorrhage. Feature selection is crucial in predictive modeling.

Abbreviations

AIC – Akaike Information Criterion

AUC – Area Under the Curve

CRIB – Clinical Risk Index for Babies

hsPDA – Hemodynamically significant patent ductus arteriosus

SGA – Small for gestational age

SHAP – Shapley Additive Explanations

sIVH – Severe intraventricular hemorrhage

XGBoost – eXtreme Gradient Boosting

Declarations

Acknowledgments:

We thank all our colleagues on the NICU staff of the Faculty of Medicine of the University of São Paulo.

Conflict of Interest

The authors declare that they have no conflict of interest and no funding was received for this study.

Funding: None

Availability of data and material: Data available within the article or its supplementary materials

Authors' contributions: Dr. Felipe conceptualized and designed the study, collected data, carried out the initial analyses, drafted the initial manuscript, and reviewed and revised the manuscript. Prof. Vera conceptualized and designed the study, designed the data collection instruments, collected data, and reviewed and revised the manuscript. Prof. Werther conceptualized and designed the study, coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content.

Ethics approval: Approved and informed consent was waived by the institutional ethics committee

Consent to participate: Not applicable

Consent for publication: Not applicable

References

  1. Davis AS, Hintz SR, Goldstein RF, Ambalavanan N, Bann CM, Stoll BJ, et al. Outcomes of extremely preterm infants following severe intracranial hemorrhage. J Perinatol. 2014;34(3):203–8.
  2. Sherlock RL, Anderson PJ, Doyle LW, Callanan C, Carse E, Casalaz D, et al. Neurodevelopmental sequelae of intraventricular haemorrhage at 8 years of age in a regional cohort of ELBW/very preterm infants. Early Hum Dev. 2005;81(11):909–16.
  3. Owens R. Intraventricular Hemorrhage in the Premature Neonate. Neonatal Netw. 2005;24(3):55–71.
  4. Ballabh P. Intraventricular hemorrhage in premature infants: Mechanism of disease. Pediatr Res. 2010;67(1):1–8.
  5. Masoudi-Sobhanzadeh Y, Motieghader H, Masoudi-Nejad A. FeatureSelect: A software for feature selection based on machine learning approaches. BMC Bioinformatics. 2019;20(1):1–17.
  6. Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng. 2014;40(1):16–28.
  7. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, et al. Feature selection: A data perspective. ACM Comput Surv. 2017;50(6).
  8. Matsushita FY, Krebs VLJ, Ferraro AA, de Carvalho WB. Early fluid overload is associated with mortality and prolonged mechanical ventilation in extremely low birth weight infants. Eur J Pediatr. 2020;179(11):1665–71.
  9. Parry G, Tucker J, Tarnow-Mordi W. CRIB II: an update of the clinical risk index for babies score. Lancet. 2003;361:1789–91.
  10. Papile LA, Burstein J, Burstein R, Koffler H. Incidence and evolution of subependymal and intraventricular hemorrhage: A study of infants with birth weights less than 1,500 gm. J Pediatr. 1978;92(4):529–34.
  11. Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010;36(11):1–13.
  12. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. p. 785–94.
  13. Mukerji A, Shah V, Shah PS. Periventricular/intraventricular hemorrhage and neurodevelopmental outcomes: A meta-analysis. Pediatrics. 2015;136(6):1132–43.
  14. Ballabh P. Pathogenesis and Prevention of Intraventricular Hemorrhage. Clin Perinatol. 2014;41(1):47–67.
  15. Luque MJ, Tapia JL, Villarroel L, Marshall G, Musante G, Carlo W, et al. A risk prediction model for severe intraventricular hemorrhage in very low birth weight infants and the effect of prophylactic indomethacin. J Perinatol. 2014;34(1):43–8.
  16. Siddappa AM, Quiggle GM, Lock E, Rao RB. Predictors of severe intraventricular hemorrhage in preterm infants under 29-weeks gestation. J Matern Fetal Neonatal Med. 2021;34(2):195–200.
  17. Steyerberg EW. Overfitting and Optimism in Prediction Models. Clinical Prediction Models, Statistics for Biology and Health. Springer Nature Switzerland; 2019.
  18. Matsushita FY, Krebs VLJ, de Carvalho WB. Neonatal Hypotension: What Is the Efficacy of Each Anti-Hypotensive Intervention? A Systematic Review. Curr Treat Options Pediatr. 2019;5(4):406–16.

Tables

Table 1

Univariate analysis comparing no severe IVH vs severe IVH and logistic regression

| Feature | No severe IVH (n = 148) | Severe IVH (n = 42) | P value (univariate analysis) | Odds Ratio (logistic regression) | 95% CI (logistic regression) | P value (logistic regression) |
|---|---|---|---|---|---|---|
| Gestational age (wk), mean (SD) | 27.9 (2.18) | 26.3 (2.2) | <0.001 | 1.180 | 0.683 – 2.028 | 0.545 |
| Birth Weight (g), mean (SD) | 749.9 (162.9) | 741 (156) | 0.756 | - | - | - |
| CRIB score, mean (SD) | 11.2 (2.3) | 13 (2.9) | 0.0005 | 0.868 | 0.579 – 1.272 | 0.477 |
| Small for gestational age, n (%) | 76 (51.4) | 12 (28.6) | 0.014 | 0.282 | 0.041 – 1.626 | 0.171 |
| Female Gender, n (%) | 73 (49.3) | 17 (40.5) | 0.401 | - | - | - |
| 5-minute APGAR score, mean (SD) | 7.7 (1.8) | 6.16 (2.6) | <0.001 | 0.808 | 0.576 – 1.116 | 0.199 |
| C-section, n (%) | 135 (91.2) | 25 (59.5) | <0.001 | 0.106 | 0.016 – 0.556 | 0.010 |
| Antenatal corticoid, n (%) | 84 (56.8) | 13 (31) | 0.005 | 0.432 | 0.117 – 1.453 | 0.184 |
| Endotracheal tube (number of trials in delivery room), mean (SD) | 1.1 (1.3) | 1.3 (1) | 0.294 | - | - | - |
| Epinephrine in delivery room, n (%) | 9 (6.1) | 8 (19) | 0.021 | 5.919 | 0.668 – 65.24 | 0.120 |
| Lowest temperature, mean (SD) | 34.8 (0.8) | 34.6 (1.1) | 0.353 | - | - | - |
| Lowest pH, mean (SD) | 7.22 (0.13) | 7.07 (0.13) | <0.001 | 3.387 | 0.000004 – 9273933 | 0.865 |
| Highest pCO2, mean (SD) | 47.6 (15.5) | 62.3 (18.4) | <0.001 | 1.051 | 0.985 – 1.133 | 0.157 |
| Lowest pCO2, mean (SD) | 30.1 (7.3) | 32.3 (8.5) | 0.135 | - | - | - |
| Lowest HCO3, mean (SD) | 15.8 (2.9) | 13.7 (3.7) | 0.001 | 0.953 | 0.695 – 1.328 | 0.768 |
| Lowest base excess, mean (SD) | -9.89 (4.3) | -14.7 (5.6) | <0.001 | 0.952 | 0.675 – 1.325 | 0.774 |
| Pneumothorax, n (%) | 5 (3.4) | 4 (9.5) | 0.213 | - | - | - |
| Mechanical Ventilation, n (%) | 115 (77.7) | 41 (97.6) | 0.006 | 0.427 | 0.025 – 12.47 | 0.562 |
| C-reactive protein, mean (SD) | 12.5 (17.3) | 14.8 (13.9) | 0.384 | - | - | - |
| Positive blood culture, n (%) | 10 (6.8) | 4 (9.5) | 0.786 | - | - | - |
| Chorioamnionitis, n (%) | 14 (9.5) | 8 (19) | 0.149 | - | - | - |
| Diuresis, mean (SD) | 2.59 (1.6) | 3.03 (1.8) | 0.183 | - | - | - |
| Highest fluid overload, mean (SD) | 14.8 (14.3) | 16.1 (17.1) | 0.647 | - | - | - |
| Worst Platelet, mean (SD) | 108 (68) | 90.5 (66) | 0.137 | - | - | - |
| Lowest systolic blood pressure, mean (SD) | 42.4 (8.8) | 36.3 (6.7) | <0.001 | 1.055 | 0.901 – 1.243 | 0.503 |
| Lowest diastolic blood pressure, mean (SD) | 21.3 (4.4) | 20 (5.9) | 0.201 | - | - | - |
| Lowest mean blood pressure, mean (SD) | 29.5 (5.4) | 26.3 (5.5) | 0.001 | 0.944 | 0.739 – 1.194 | 0.636 |
| Highest systolic blood pressure, mean (SD) | 68.4 (12.2) | 62.6 (12.3) | 0.008 | 0.942 | 0.881 – 0.999 | 0.061 |
| Highest diastolic blood pressure, mean (SD) | 40.6 (10.6) | 39.2 (7) | 0.315 | - | - | - |
| Highest mean blood pressure, mean (SD) | 49.4 (10.4) | 47.2 (8.1) | 0.147 | - | - | - |
| Inotropic therapy, n (%) | 29 (19.6) | 28 (66.7) | <0.001 | 11.09 | 2.401 – 64.66 | 0.003 |
| Persistent Ductus Arteriosus, n (%) | 110 (74.3) | 39 (92.9) | 0.018 | 0.888 | 0.039 – 26.50 | 0.9416 |
| Hemodynamically significant Persistent Ductus Arteriosus, n (%) | 86 (58.1) | 34 (81) | 0.011 | 0.182 | 0.016 – 1.776 | 0.145 |
| Persistent Ductus Arteriosus size, mean (SD) | 1.65 (1.1) | 2.29 (1) | <0.001 | 2.710 | 0.971 – 8.039 | 0.059 |
| Fluid bolus, mean (SD) | 1 (1.9) | 3.2 (3.3) | <0.001 | 1.023 | 0.823 – 1.274 | 0.827 |
  
Table 2

Logistic regression using top-3 features identified by XGBoost algorithm (AIC 128.5, AUC 0.857)

| Feature | Odds Ratio | 95% CI | P value |
|---|---|---|---|
| Lowest pH | 0.002 | 0.000005 – 0.5251 | 0.031 |
| Gestational age | 0.785 | 0.608 – 0.987 | 0.048 |
| Lowest Base Excess | 1.014 | 0.873 – 1.179 | 0.854 |
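Because pH varies over a narrow range, an odds ratio expressed per full pH unit is hard to read clinically. In a logistic model the log-odds are linear in the predictor, so the odds ratio for a change of d units is the per-unit odds ratio raised to the power d. The Python sketch below rescales the Table 2 values for lowest pH to a change of 0.1 pH units. It assumes that the reported odds ratio is per 1.0 pH unit, which the source does not state explicitly; the function name `rescale_odds_ratio` is hypothetical.

```python
def rescale_odds_ratio(odds_ratio, step):
    """Odds ratio for a change of `step` units, given the odds ratio per 1 unit."""
    if odds_ratio <= 0:
        raise ValueError("odds ratios must be positive")
    return odds_ratio ** step

# Values for lowest pH taken from Table 2 (assumed to be per 1.0 pH unit).
or_per_unit = 0.002
ci_per_unit = (0.000005, 0.5251)

or_per_tenth = rescale_odds_ratio(or_per_unit, 0.1)
ci_per_tenth = tuple(rescale_odds_ratio(bound, 0.1) for bound in ci_per_unit)

print(round(or_per_tenth, 3), tuple(round(b, 3) for b in ci_per_tenth))
```

On this scale, each 0.1-unit increase in lowest pH corresponds to an odds ratio of roughly 0.54, that is, the odds of sIVH are approximately halved, holding gestational age and base excess fixed.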

 
Table 3

Predictive model’s AUC

| Model | AUC |
|---|---|
| Using only severity score (CRIB II score) | 0.762 |
| Using variables identified by Luque et al. | 0.707 |
| All variables | 0.739 |
| Stepwise using all variables | 0.784 |
| Variables with p<0.05 in univariate analysis | 0.823 |
| Stepwise using variables with p<0.05 in univariate analysis | 0.798 |
| Correlation matrix | 0.739 |
| Stepwise after correlation matrix | 0.768 |
| Boruta | 0.786 |
| XGBoost (eXtreme Gradient Boosting) | 0.857 |