We used data from a retrospective study of newborns with birth weight less than 1000 grams who were born between January 2012 and December 2017 and admitted to a single-center tertiary neonatal intensive care unit in São Paulo, Brazil (8). Briefly, we included all neonates with birth weight less than 1000 grams admitted to the unit who had no severe congenital malformation and no missing data.
For each study participant, we recorded a total of 62 variables, all of which were used in the analysis:
General characteristics: gestational age (weeks), birth weight (grams), CRIB II score (Clinical Risk Index for Babies) (9), small for gestational age (SGA, defined as birth weight below the 10th percentile on the Fenton growth chart), gender, 5-minute Apgar score, delivery type (vaginal or cesarean section), single or twin birth, antenatal corticosteroid use, number of endotracheal intubation attempts in the delivery room, lowest temperature in the first 12 hours of life, and need for epinephrine in the delivery room.
Hemodynamics: need for inotropic therapy in the first 7 days of life, lowest pH, bicarbonate, and base excess levels in the first 3 days of life, lowest and highest systolic, diastolic, and mean blood pressure in the first 3 days of life, hypotension (defined as mean blood pressure in mmHg lower than gestational age in weeks), persistent ductus arteriosus (PDA), PDA size, hemodynamically significant PDA (hsPDA, defined as PDA > 1.5 mm), and need for fluid bolus.
Respiratory: need for invasive mechanical ventilation and its duration in the first 3 days of life, highest and lowest pCO2 in the first 3 days of life, pneumothorax requiring chest drainage, and, in mechanically ventilated patients, highest peak inspiratory pressure (Ppeak), positive end-expiratory pressure (PEEP), and mean airway pressure in the first 3 days of life.
Renal: acute kidney injury (defined as an increase in serum creatinine > 0.3 mg/dL), creatinine clearance according to the Cockcroft-Gault equation (10), lowest urine output in the first 3 days of life, and highest fluid overload in the first 3 days of life.
Hematologic/Infectious: positive blood culture in the first 3 days of life, chorioamnionitis, highest C-reactive protein in the first 3 days of life, and lowest platelet and hemoglobin levels in the first 3 days of life. Thrombocytopenia was defined as a platelet count < 50,000/mm³.
Outcomes: early death (defined as death in the first 7 days of life), in-hospital mortality, length of stay in survivors, moderate or severe bronchopulmonary dysplasia classified at 36 weeks of corrected gestational age in survivors, need for home oxygen, ibuprofen treatment for pharmacological PDA closure, need for PDA surgical ligation, and intraventricular hemorrhage (IVH) and its grade. Severe IVH was defined as grade III or IV according to the Papile-Burnstein classification (11).
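Several of the variables above are binary flags derived from a threshold. As an illustration only, the definitions could be encoded as in the following minimal Python sketch. It is not the code used in the study (the analysis was run in R). The record type, field names, and function name are hypothetical, and two points are assumptions: mean blood pressure in mmHg is compared directly with gestational age in weeks, and day of life is counted from 1 on the day of birth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeonateRecord:
    """Hypothetical record; field names are illustrative, not from the study database."""
    gestational_age_weeks: float
    lowest_mean_bp_mmhg: float
    pda_size_mm: float                # 0.0 when no PDA was detected
    creatinine_increase_mg_dl: float  # rise in serum creatinine
    lowest_platelets_per_mm3: float
    ivh_grade: int                    # 0 = no IVH, 1 to 4 = grade I to IV
    day_of_death: Optional[int]       # day of life at death, None for survivors


def derive_flags(r: NeonateRecord) -> dict:
    """Apply the threshold definitions stated in the text to one record."""
    return {
        # Assumption: mean blood pressure (mmHg) below gestational age (weeks)
        "hypotension": r.lowest_mean_bp_mmhg < r.gestational_age_weeks,
        "hs_pda": r.pda_size_mm > 1.5,
        "acute_kidney_injury": r.creatinine_increase_mg_dl > 0.3,
        "thrombocytopenia": r.lowest_platelets_per_mm3 < 50_000,
        "severe_ivh": r.ivh_grade >= 3,
        # Assumption: day 1 is the day of birth, so days 1 to 7 count as early
        "early_death": r.day_of_death is not None and r.day_of_death <= 7,
    }
```

Keeping each definition in one place makes the thresholds easy to audit against the text.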
Statistical methods
We performed an agglomerative hierarchical cluster analysis (HC) in 3 different models. In this algorithm, a distance metric quantifies the dissimilarity between two patients (or groups of patients): the smaller the distance between 2 patients, the more similar they are. We selected this algorithm because the number of subgroups was not known in advance.
In the first model, we analyzed all 62 features after dimensionality reduction. We used FAMD (factor analysis of mixed data) because the data set contained both categorical and continuous variables. We then carried out HC on the principal components that together explained at least 85% of the cumulative variance, using Euclidean distance and Ward’s linkage criterion.
In the second model, all features were reviewed by clinical experts to identify variables that may drive cluster allocation. We created a new data set with the following 8 selected variables: 1) highest pCO2 level; 2) lowest base excess level; 3) need for invasive mechanical ventilation; 4) need for inotropes; 5) positive blood culture; 6) need for epinephrine in the delivery room; 7) hemodynamically significant persistent ductus arteriosus; 8) gestational age < 26 weeks. Without dimensionality reduction, we computed a distance matrix using Gower’s distance (because of the mixed data) and proceeded with HC using Ward’s linkage criterion.
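To illustrate how Gower’s distance handles mixed data, the following minimal Python sketch computes a pairwise distance matrix. It is not the code used in the study (the analysis was run in R), and the function name and toy values are hypothetical. It assumes the common form of Gower’s distance: range-scaled absolute difference for continuous variables, simple mismatch (0 or 1) for categorical variables, averaged with equal weights.

```python
def gower_matrix(rows, numeric_cols):
    """Pairwise Gower distances for rows of mixed continuous/categorical values.

    rows: list of equal-length tuples; numeric_cols: set of column indices
    holding continuous variables (all other columns are treated as categorical).
    """
    n, p = len(rows), len(rows[0])
    # Range of each continuous column, used to scale differences to [0, 1]
    ranges = {}
    for k in numeric_cols:
        values = [r[k] for r in rows]
        ranges[k] = max(values) - min(values)

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                if k in ranges:
                    if ranges[k] > 0:
                        total += abs(rows[i][k] - rows[j][k]) / ranges[k]
                else:
                    total += 0.0 if rows[i][k] == rows[j][k] else 1.0
            dist[i][j] = dist[j][i] = total / p  # equal weight per variable
    return dist


# Toy patients: (highest pCO2, lowest base excess, ventilated, inotropes)
toy_patients = [
    (60.0, -10.0, True, True),
    (40.0, -2.0, False, False),
    (50.0, -6.0, True, False),
]
toy_distances = gower_matrix(toy_patients, numeric_cols={0, 1})
```

In this toy example the first two patients differ maximally on every variable, so their distance is 1, while the third patient lies halfway between them. The resulting matrix is what an agglomerative procedure would take as input; the linkage step itself is not shown here.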
In the third model, we used the same features as in model 2 and applied FAMD to reduce dimensionality. We then carried out HC on the principal components that together explained at least 85% of the cumulative variance, using Euclidean distance and Ward’s linkage criterion.
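The 85% rule used in models 1 and 3 amounts to keeping the smallest number of leading components whose cumulative share of variance reaches the threshold. The following minimal Python sketch shows that selection step only. The function name and the eigenvalues are hypothetical, and it is not the code used in the study.

```python
def n_components_for_variance(eigenvalues, threshold=0.85):
    """Smallest number of leading components whose cumulative variance
    share is at least `threshold` (a fraction between 0 and 1)."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    cumulative = 0.0
    # Components are taken in decreasing order of explained variance
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues, for illustration only
example_eigenvalues = [5.0, 2.0, 1.0, 1.0, 0.5, 0.5]
n_kept = n_components_for_variance(example_eigenvalues)
```

With these made-up eigenvalues the cumulative shares are 50%, 70%, 80%, and 90%, so four components are retained.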
Cluster stability was assessed with bootstrapping methods. The data were resampled, and the Jaccard similarity between each original cluster and the most similar cluster in the resampled data was computed (12). The mean of these similarities across resamples was used as an index of stability, and a cluster was considered stable only if its mean was greater than 0.75.
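The bootstrap procedure can be sketched as follows in Python. This is a simplified illustration, not the code used in the study: the function names, the `cluster_fn` callable, and the toy one-dimensional clustering rule are hypothetical, and duplicate draws within a bootstrap sample are collapsed to unique patients to keep the example short.

```python
import random


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of patient ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_stability(data, cluster_fn, n_boot=100, seed=0):
    """Mean best-match Jaccard similarity per original cluster.

    cluster_fn(ids, data) must return a list of sets of ids (hypothetical API).
    """
    rng = random.Random(seed)
    ids = list(range(len(data)))
    original = cluster_fn(ids, data)
    sums = [0.0] * len(original)
    for _ in range(n_boot):
        # Resample patients with replacement; duplicates are collapsed here
        sample = set(rng.choices(ids, k=len(ids)))
        resampled = cluster_fn(sorted(sample), data)
        for c, cluster in enumerate(original):
            restricted = cluster & sample  # original cluster within the sample
            sums[c] += max((jaccard(restricted, r) for r in resampled), default=0.0)
    return [s / n_boot for s in sums]


def threshold_clusters(ids, data):
    """Toy stand-in for a clustering algorithm: split 1-D values at 50."""
    low = {i for i in ids if data[i] < 50}
    high = {i for i in ids if data[i] >= 50}
    return [c for c in (low, high) if c]


toy_values = [10, 12, 14, 11, 90, 88, 92, 91]
stability = bootstrap_stability(toy_values, threshold_clusters)
stable = [s > 0.75 for s in stability]  # threshold stated in the text
```

Because the toy values form two well-separated groups, both clusters are recovered in nearly every resample and their mean similarities fall well above the 0.75 cutoff.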
Continuous variables were tested for normality using the Kolmogorov-Smirnov test. To compare results among clusters, we used the chi-square test for categorical variables and the Kruskal-Wallis test for continuous variables. Patients with missing values were excluded from the analysis. Data were standardized before cluster analysis. All analyses were performed in RStudio, and the following packages were used: “factoextra”, “FactoMineR”, “stats”, “fpc”, “ggplot2”, and “tidyverse”. The institutional ethics committee approved the study protocol and waived informed consent.