Predicting the use of labor induction intervention: A machine learning approach for the birth-cohort registry at a tertiary hospital in northern Tanzania.

Introduction: Following an increase in the use of labor induction to prevent adverse maternal and fetal outcomes in Sub-Saharan Africa, identifying the algorithm that most accurately classifies women in need of the intervention is of paramount importance. This study aimed to compare the potential benefits of applying machine learning (ML) algorithms over the conventional logistic regression model in predicting the use of labor induction among pregnant women attending one of the tertiary hospitals in northern Tanzania for delivery. Methods: We conducted a secondary data analysis of the Kilimanjaro Christian Medical Centre (KCMC) birth registry database for women with uncomplicated pregnancies from 2000 to 2015. We excluded observations with non-vertex presentation and those with missing information on labor induction status. Model accuracy and the area under the receiver operating characteristic curve (AUC-ROC) were used to assess the discriminative ability of the selected models. We used decision curve analysis (DCA) to assess the clinical utility of the models. Results: A total of 21,578 deliveries were analyzed. Of these, 8,814 (41%) were induced during the study period. Among the selected machine learning models, the random forest algorithm exhibited the best performance in terms of accuracy [0.75; 95% CI (0.73 – 0.76)] and AUC-ROC [0.75; 95% CI (0.74 – 0.76)] compared with the other models, including logistic regression. Among the assessed maternal attributes, parity, maternal age, body mass index, gestational age and birthweight were the most important predictors of labor induction. Conclusion: The selected machine learning methods offered better predictive performance than the conventional logistic regression model in predicting the use of labor induction. The current study lends substantial support to the use of machine learning models in predicting the use of labor induction intervention.


Background
Induction of labor (IOL) is one of the most frequent obstetric interventions; it involves artificial stimulation of uterine contractions before their spontaneous onset [1-3]. The procedure can be achieved either by mechanical means or by the use of pharmaceuticals readily available on the market [4].
Mechanical induction methods include amniotomy, membrane sweeping and the use of a balloon catheter [5-6]. Major predictors of the IOL procedure may be categorized as maternal, fetal, social, or a combination of some or all of these factors [7]. Given the increasing attention to reducing perinatal morbidity and mortality, the rates of IOL have continued to rise over the past few decades [8].
Globally, the prevalence of IOL varies greatly between countries and regions, with higher rates reported in developed than in developing countries [9-10]. IOL accounts for approximately 20% of deliveries in the UK and USA, and the rates have been rising steadily over the past decade [11,12]. The rate of IOL in the Africa region currently stands at 4.4%, which confirms that the rates of this important intervention are lowest in this region [13]. The low rates of IOL in Africa may reflect the existing high perinatal and maternal mortality rates in the region, where 94% of all pregnancy-related deaths worldwide are recorded [14].
However, the World Health Organization recommends IOL as a therapeutic option only when the benefits of pregnancy termination outweigh the risks of its continuation [18]. Early identification of women in need of the intervention has been shown to positively affect pregnancy outcomes for the mother and the newborn [19]. While there is increasing clinical and administrative interest in predicting the use of IOL, no study to date has used machine learning algorithms to model this event [20]. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention [58]. We considered the machine learning (ML) approach to healthcare data important because of its wide range of applications and its ability to adapt and provide solutions to complex problems efficiently, effectively, and quickly [21-23]. However, many studies have shown that logistic regression and ML models perform similarly, and have reported a loss of interpretability in some ML models [24,25]. Selecting the best classifier for a particular healthcare intervention on the basis of performance remains an ad hoc process that relies on fundamental benchmarks such as the overall loss function and misclassification metrics. In this article, we compare the performance of ML algorithms (Boosting, Random Forest, Bagging, Artificial Neural Network and Naïve Bayes) with that of logistic regression in predicting the utilization of IOL based on social and clinical attributes. Using an efficient algorithm to predict IOL may be useful in domains such as risk management, tailored health communication, decision support systems and personalized medicine. This may also play a significant role in reshaping and optimizing the current policy governing IOL practices by channeling the intervention to the individuals most likely to need it.

Study Setting, data source and study design.
The current study was conducted at the Kilimanjaro Christian Medical Centre (KCMC), one of the four tertiary hospitals in Tanzania. The facility is located in Moshi urban district, in the northern part of Tanzania, and serves the residents of Kilimanjaro as well as the nearby regions. The hospital records more than 2000 deliveries annually; 20% of these admissions are referral cases, while the remainder are self-referrals. Since the year 2000, the facility has been recording information on pregnancy, delivery and the newborn in a dedicated database. Specially trained nurses conduct personal interviews on a daily basis after each uncomplicated delivery, or within three days in the case of complicated ones. Interviews are conducted using a standardized questionnaire.
Records from the hospital birth registry database cover socio-demographic information as well as the mother's health status before and after delivery. Socio-demographic information in this database includes maternal age, occupation, education level, marital status, tribe, religion and maternal residence. Clinical information includes parity status, use of labor induction, referral status, indications for IOL, induction methods used, history of previous pregnancies, and others. More details on the KCMC medical birth registry procedures have been described elsewhere [26]. We excluded observations with missing information on IOL status. Deliveries with missing information on the covariates under study or with data inconsistencies (for example, age below 14 years) were also excluded (Figure 1). This left 21,578 deliveries, which constituted our final sample.

Study Outcome
The primary outcome of the study was whether labor was induced by any means (mechanically or by the use of pharmaceuticals) or started spontaneously. Prompt and accurate prediction of women requiring IOL enables clinicians not only to allocate IOL resources but also to intervene in a timely manner on high-risk patients.

Statistical analyses
Data analysis was performed using R version 4.0.3. Means and standard deviations were used to summarize continuous variables, while categorical variables were summarized using frequencies and percentages. The Pearson chi-squared test was used to assess the association between each independent variable and IOL status. We compared the performance of logistic regression (Lreg), random forest (RF), naïve Bayes (NB), artificial neural network (ANN), Boosting and Bagging algorithms in predicting the use of IOL.
RF is an ensemble technique that has shown strong predictive performance and is well suited to medium and large datasets. It is a tree-based learning algorithm that aggregates multiple decision trees grown on bootstrap samples, with the aim of reducing model overfitting while improving accuracy [27]. We also used this algorithm to identify important variables for IOL intervention [28]. Prior to building the ML models, we split the data into "training" and "testing" datasets in a 70%:30% ratio. The observations were randomly shuffled before splitting to avoid any dependence on the order of the records. We estimated the out-of-bag (OOB) error (computed for each tree on the training observations left out of its bootstrap sample) and the validation error (computed on the testing data) to arrive at the best possible predictive model [29]. We used the "randomForest" package to implement this model in R.
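The shuffling and 70%:30% partition described above can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study; the function name `train_test_split`, the seed and the placeholder records are assumptions made for the example.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows, then split it into training and testing parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the registry: 1,000 record identifiers.
records = list(range(1000))
train, test = train_test_split(records)
print(len(train), len(test))  # 700 300
```

Fixing the seed makes the partition reproducible, which matters when several models are to be compared on the same held-out records.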
Naïve Bayes (NB) is a classifier belonging to a family of simple probabilistic classifiers based on Bayes' theorem, with strong assumptions of independence and equal importance among the features under observation [30,31]. It is computationally inexpensive and can handle large amounts of high-dimensional data. Even though the naïve assumptions are rarely true, the algorithm performs surprisingly well in many cases [32,33]. We used the "naiveBayes" function in R to fit the NB models.
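To make the independence assumption concrete, the following Python sketch implements a small categorical naïve Bayes classifier with add-one (Laplace) smoothing. It is an illustration only and is not the "naiveBayes" routine used in the study; the function names, the smoothing choice and the toy records are assumptions.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each feature value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values

def predict_proba(model, row):
    """Posterior class probabilities, treating features as independent given the class."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)              # log prior
        for j, value in enumerate(row):
            count = value_counts[(j, label)][value]
            # Add-one smoothing keeps unseen values from giving zero probability.
            score += math.log((count + 1) / (n_label + len(feature_values[j])))
        log_scores[label] = score
    top = max(log_scores.values())                     # normalise in a stable way
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

# Hypothetical toy records: (parity, gestational age category).
rows = [("nullip", "post"), ("nullip", "post"), ("nullip", "term"),
        ("multip", "term"), ("multip", "term"), ("multip", "post")]
labels = ["induced", "induced", "induced",
          "spontaneous", "spontaneous", "spontaneous"]
model = fit_naive_bayes(rows, labels)
posterior = predict_proba(model, ("nullip", "post"))
print(round(posterior["induced"], 3))  # 0.857
```

Training only requires counting, which is why the method scales to many features.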

Artificial Neural Network (ANN) is a computational model inspired by biological neural networks that aims to simulate the functioning of the human brain [34,35]. The algorithm learns through interconnected input, hidden and output layers that together produce the desired outputs. Backpropagation is a set of learning rules used to guide these complex networks. The input units receive information, which is passed on according to an internal weighting system; the network attempts to learn from it and eventually produce the desired results. The algorithm can learn by itself and produce output that is not limited to the input provided to it. Another fascinating feature is that, even if a neuron is not responding or a piece of information is missing, the network can detect the fault and still produce the output [36,37]. In addition, ANN can perform multiple tasks in parallel without affecting system performance. We used the "nnet" package to fit the ANN model in R.
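The flow of information from input units through a hidden layer to an output unit can be sketched as a single forward pass. The Python sketch below assumes one hidden layer with logistic activations; the weights are hypothetical values chosen for illustration, not fitted to the registry data, and training by backpropagation is not shown.

```python
import math

def sigmoid(z):
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate one observation through input -> hidden -> output."""
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical weights for 3 inputs and 2 hidden units (not fitted to any data).
hidden_weights = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = -0.1

probability = forward([1.0, 0.5, -1.0], hidden_weights, hidden_biases,
                      output_weights, output_bias)
print(round(probability, 3))
```

Backpropagation would adjust the weights above by passing the prediction error backwards through the same layers.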
Boosting is an ensemble meta-algorithm that combines weak learners over several iterations to form a strong classification rule, a process which improves prediction accuracy [38,39]. These algorithms seek to improve predictive power by training a sequence of weak models, each compensating for the weaknesses of its predecessors.
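One well-known instance of this idea is AdaBoost, in which each round re-weights the observations so that the next weak learner concentrates on earlier mistakes. The Python sketch below uses one-variable threshold rules ("stumps") as weak learners. The source does not state which boosting variant was used, so the choice of AdaBoost, the function names and the toy data are assumptions.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict polarity when x >= threshold, otherwise -polarity."""
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return the (threshold, polarity, error) stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[2]:
                best = (threshold, polarity, error)
    return best

def adaboost(xs, ys, rounds=3):
    """Train `rounds` stumps in sequence; labels must be coded +1 / -1."""
    n = len(xs)
    weights = [1.0 / n] * n                          # start with equal weights
    ensemble = []
    for _ in range(rounds):
        threshold, polarity, error = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # vote given to this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified observations gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can classify perfectly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
print(sum(predict(one_round, x) != y for x, y in zip(xs, ys)))     # 3
print(sum(predict(three_rounds, x) != y for x, y in zip(xs, ys)))  # 0
```

On this toy example a single stump misclassifies three observations, while three boosted stumps classify all ten correctly.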
Bagging, or bootstrap aggregation, also uses ensemble learning to build machine learning models. The bagging technique is useful for both regression and statistical classification [40]. It is typically used with decision trees, where it substantially improves model stability by reducing variance and helps to limit overfitting. In boosting, by contrast, the base algorithm reads the data and assigns equal weight to each observation. Thereafter, the incorrect predictions made by the base learner are identified. In the next iteration, these incorrectly predicted observations are passed to the next base learner with a higher weight. This process is repeated until the algorithm can correctly classify the output [41].
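The bootstrap-and-vote idea behind bagging can be sketched in Python as follows. For brevity the base learner is a toy threshold rule rather than a decision tree; the function names, the seed and the toy data (gestational age in weeks paired with an induced/not-induced flag) are assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_threshold_rule(sample):
    """Toy base learner: cut halfway between the two class means.

    Assumes that larger x makes the positive class (1) more likely.
    """
    positives = [x for x, y in sample if y == 1]
    negatives = [x for x, y in sample if y == 0]
    if not positives or not negatives:
        constant = 1 if positives else 0   # one-class sample: constant rule
        return lambda x: constant
    cut = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
    return lambda x: 1 if x >= cut else 0

def bagging_fit(data, n_models=25, seed=0):
    """Fit one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_threshold_rule(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Majority vote across the fitted base learners."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (gestational age in weeks, 1 = induced).
data = [(37, 0), (38, 0), (38, 0), (39, 0), (40, 0),
        (41, 1), (41, 1), (42, 1), (42, 1), (43, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 43), bagging_predict(models, 37))  # 1 0
```

Each base learner sees a slightly different sample, so averaging their votes reduces the variance of the final prediction. An odd number of models avoids tied votes in a two-class problem.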
Logistic regression is one of the simplest machine learning algorithms and is often used for binary classification problems in low-dimensional data. The algorithm uses the sigmoid function to perform the prediction task [42,43]. We used the "glm" function for generalized linear models in R to fit the logistic regression model.
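The role of the sigmoid function can be illustrated with a short Python sketch that turns a linear predictor into a probability. The intercept, coefficients and covariate values below are hypothetical and are not estimates from the study.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(intercept, coefficients, features):
    """Probability of the outcome for one observation under a fitted model."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(linear_predictor)

# Hypothetical coefficients for two standardised covariates.
intercept = -0.4
coefficients = [0.8, -0.3]
probability = predict_probability(intercept, coefficients, [1.5, 0.5])
predicted_class = 1 if probability >= 0.5 else 0
print(round(probability, 3), predicted_class)
```

A linear predictor of zero corresponds to a probability of exactly 0.5, so the 0.5 cut-off used here is equivalent to asking whether the linear predictor is non-negative.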
After training the selected models, we computed the predictive performance of each model on the testing dataset (the 30% of the data that was held out and unseen by the model) for validation. The main purpose of the testing dataset is to test the generalization ability of the trained models, that is, to assess whether a model overfits. To assess the validity and performance of the models, we used the area under the receiver operating characteristic curve (AUC-ROC). The AUC-ROC is a performance measure for classification problems that is based on the true positive and false positive rates; it represents the degree of separability and describes how well the model distinguishes between classes [44,45]. DeLong's test was used to compare the ROC curves between models [63]. We applied several techniques to minimize potential overfitting in these machine learning models, namely out-of-bag estimation, 10-fold cross-validation and the hold-out method.
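Both performance measures can be computed directly from the held-out labels and the predicted probabilities. The Python sketch below uses the pairwise (Mann-Whitney) formulation of the AUC, in which the AUC is the proportion of positive-negative pairs that the model ranks correctly, with ties counted as one half. The function names, the 0.5 classification cut-off and the toy values are assumptions for illustration.

```python
def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def accuracy(y_true, scores, threshold=0.5):
    """Share of observations classified correctly at the given cut-off."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, y_true)) / len(y_true)

# Hypothetical held-out labels (1 = induced) and predicted probabilities.
y_test = [1, 1, 0, 0, 1, 0]
p_test = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print(round(auc_roc(y_test, p_test), 3))   # 0.889
print(round(accuracy(y_test, p_test), 3))  # 0.833
```

The pairwise form is quadratic in the sample size, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large test sets.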

Variable importance, Decision curve analysis (DCA) and model validation
We identified the covariates with the most predictive power relative to the others. The "VarImp" function was used for this process. The algorithm adds randomness to the data by creating shuffled copies of all features, then trains a random forest classifier on the extended dataset and applies a feature importance measure to evaluate the importance of each feature [46,47]. At every iteration, it assesses whether a real feature has a higher importance than the other features. Important variables are the drivers of the outcome, and their values have a significant impact on overall model accuracy.
To compare the performance and clinical utility of these models at a given threshold probability, we carried out a decision curve analysis (DCA). This is a common framework in which a clinical judgement is made about the relative value of the benefits and harms associated with a prediction model. It calculates the "net benefit" as the parameter of interest for each threshold probability [48,49]. In other words, the DCA incorporates information about the benefit of correctly identifying induced deliveries (true positives) and the relative harm of incorrectly identifying them (false positives). We therefore presented the net benefit of each model across the range of threshold probabilities in a decision curve. A model is said to be superior to another at a chosen threshold probability if its net benefit exceeds that of the other model at that threshold.
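In the standard formulation of decision curve analysis, the net benefit at threshold probability pt is TP/n - (FP/n) x pt/(1 - pt), where TP and FP are the numbers of true and false positives among n observations when a prediction of at least pt counts as positive. The Python sketch below computes this quantity along with the "treat all" reference strategy; the function names and toy values are assumptions for illustration.

```python
def net_benefit(y_true, probabilities, threshold):
    """Net benefit of acting on predictions >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, probabilities)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, probabilities)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every observation as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical held-out labels (1 = induced) and predicted probabilities.
y_test = [1, 1, 0, 0, 1, 0]
p_test = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
for pt in (0.25, 0.5, 0.75):
    print(pt, round(net_benefit(y_test, p_test, pt), 3),
          round(net_benefit_treat_all(y_test, pt), 3))
```

A decision curve plots these values over a range of thresholds; the "treat none" strategy has a net benefit of zero at every threshold.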

Characteristics of study participants
During 2000-2015, KCMC hospital recorded 53,662 deliveries. Of these, we excluded entries (n=10,588) that had no information on whether labor was induced. We also left out pregnancies with non-vertex presentation (n=1891) to avoid overrepresentation of the use of IOL at the study site. The mean maternal age of study participants was 28 (SD = 6) years.
About half of the deliveries were to mothers aged 25-35 years. Our study population had a good balance between nulliparous (46%) and multiparous (54%) mothers. The majority of deliveries (78%) were at term, while post-term deliveries constituted about 8% and preterm deliveries accounted for 14% of all deliveries. The sociodemographic and clinical characteristics of study participants are shown in Table 1.

Important variables for IOL intervention
We used the "VarImp" function in the "randomForest" package to identify, among the 14 variables in the main dataset, the characteristics that play a major role in predicting the utilization of IOL. Variable importance was measured by the Mean Decrease Gini (Figure 2). Variables with high importance have a strong association with the prediction results. We found that body mass index, maternal age, gestational age, parity and birthweight were the most important features to consider when predicting IOL.
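The Mean Decrease Gini of a variable is built from the reduction in Gini impurity achieved each time a tree splits on that variable, accumulated within each tree and averaged over the forest. The Python sketch below shows the impurity reduction for a single split; the function names and toy labels are assumptions for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
             + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children

# Hypothetical node with three induced (1) and three spontaneous (0) deliveries.
parent = [1, 1, 1, 0, 0, 0]
print(round(gini_decrease(parent, [1, 1, 1], [0, 0, 0]), 3))  # 0.5
print(round(gini_decrease(parent, [1, 1, 0], [1, 0, 0]), 3))  # 0.056
```

A split that separates the classes cleanly yields a large decrease, so variables that repeatedly produce such splits accumulate high importance.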

Predicting the use of IOL intervention
The performance of the selected ML models compared with that of logistic regression is presented in Figure 3 and Table 2. There was a significant difference in AUC-ROC between the logistic regression model and the selected ML algorithms (p<0.001). Among the ML models, Naïve Bayes (NB) was outperformed by the other algorithms. The DCA (Figure 4) showed that the selected ML models had a higher net benefit than the logistic regression model over the range of threshold probabilities.

Discussion
The current study used ML models to predict the use of labor induction at a tertiary hospital using a maternal birth registry database. We observed a significant advantage of ML models over conventional logistic regression in terms of sensitivity, accuracy, and area under the curve. For a prediction model that gives a predicted probability of disease p̂, sensitivity and specificity at a given threshold probability pt are calculated by defining test positive as p̂ ≥ pt. We observed that the net benefit of the RF model surpassed that of all other models under investigation, which suggests higher accuracy in predicting the likelihood of IOL. Higher net benefits in the predictive ability of ML models have been documented elsewhere [56,57]. The main reason for the higher benefits may be that ML models can accommodate high-order nonlinear interactions among the covariates under study, a characteristic not featured in conventional logistic regression models. In addition, the logistic regression model works under strict distributional assumptions, while ML models are mostly non-parametric and hence robust [59,60]. Another attractive feature of ML models is their ability to handle overfitting efficiently. Overfitting occurs when a model learns the details and noise in the training data to the extent that its performance on the testing data suffers [61,62]. To our knowledge, the current study is the first to apply and compare the most popular ML algorithms for predicting the use of IOL in Tanzania.

Conclusion
Prediction of labor induction based on sociodemographic and clinical information may be of limited value when relying on logistic regression models. ML methods showed a greater net benefit over the range of threshold probabilities, indicating greater clinical utility than logistic regression. Therefore, ML models offer a new approach and direction for enhancing clinicians' judgement, with the aim of improving and optimizing the utilization of resources in a resource-constrained setting.

Declarations
Ethics approval and consent to participate.
This study sought and was granted ethical approval from the Kilimanjaro Christian Medical University College Research and Ethics Committee (reference number 985). The registry project obtained informed verbal consent from the study subjects during development of the medical registry database and was approved by the Ministry of Health of Tanzania and the National Ethics Committee in Norway prior to its commencement. The midwife nurse gave every woman oral information about the birth registry, the data to be collected from her, and the use of the data for research purposes. After giving consent, a woman could still opt not to answer individual questions. All consent procedures were approved by the Kilimanjaro Christian Medical Centre ethical committee, and administrative permission to access the data was provided by the KCMC hospital. Furthermore, confidentiality and privacy were assured as per the protocol of the birth registry. Patients' names were replaced by unique hospital registration numbers to ensure anonymity. We declare that all methods adopted in this research were carried out in accordance with the guidelines and regulations for research involving human participants.

Consent for publication
Not Applicable

Availability of data and materials
The data used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflict of Interests
The authors declare that they have no competing interests.

Funding
"Research on CDC-Hospital-Community Trinity Coordinated Prevention and Control System for Major Infectious Diseases", Zhengzhou University 2020 Key Project of Discipline Construction, XKZDQY202007. We declare that the funder had no influence on the study design; on the collection, analysis, and interpretation of data; or on the writing of the manuscript.