A Clinical Decision Support System for Prediction of Postpartum Hemorrhage in Vaginal Birth

neural network; PPH: Postpartum hemorrhage; TOLAC: Trial of labor after a previous cesarean section; ERCS: Elective repeated cesarean section; ACOG: the American College of Obstetricians and Gynecologists; CMQCC: the California Maternal Quality Care Collaborative; AUROC: Area under the receiver operating characteristic; WBC: White blood cells; OR: Odds ratio; EBL: Estimated blood loss; QBL: Quantitative blood loss.


Introduction
Postpartum hemorrhage (PPH) is the leading cause of maternal mortality and morbidity worldwide, accounting for 29.3% of maternal deaths and 26.7% of adverse pregnancy outcomes [1][2]. It accounts for 9.3% of maternal deaths in developed countries but 45.7% in developing countries [3]. In developed countries, postpartum hemorrhage has fallen to third place among the direct causes of maternal death, after pregnancy hypertension and thromboembolism; however, it remains the leading cause of maternal death in developing countries [2].
In China, since the introduction of the "two-child policy" in 2016, a growing number of older pregnant women have become a population at risk of postpartum hemorrhage. The increasing numbers of trials of labor after a previous cesarean section (TOLAC) and elective repeated cesarean sections (ERCS), a consequence of the previous one-child policy, also raise the risk of obstetric hemorrhage. Maternal mortality in China has been decreasing and seems to have reached a bottleneck since 1998, but mortality caused by postpartum hemorrhage is increasing: compared with 2015, the maternal mortality caused by postpartum hemorrhage increased by 15% in 2016, to 23.5%, and increased to 29.0% in 2017 [1].
The cesarean section rate in China rose from 34.1% in 2016 to 36.6% in 2019 [1], far above the 15% recommended by the World Health Organization [4]; promoting vaginal delivery is therefore an important task for Chinese obstetrics departments. Midwives, who are the decision-makers for early warning of postpartum hemorrhage during vaginal delivery, often rely on their own experience and have not yet formed a unified decision standard [5]. On the one hand, prediction and decision-making for postpartum hemorrhage in vaginal delivery is a formidable task for midwives: multiple factors have varying influence on the occurrence of hemorrhage among women with vaginal delivery, and nonlinear relations between risk factors and postpartum hemorrhage complicate it further [6]. On the other hand, most Chinese midwives hold a college or technical secondary school qualification, and the level of standardized midwife training in China is low.
The objective of this study is to develop a clinical decision support system (CDSS) to address the above problems. A clinical decision support system is designed to help physicians make appropriate decisions for patients; it uses artificial intelligence algorithms to solve semi-structured or unstructured clinical problems. It is both a method, based on knowledge reasoning and logic operations, for automatically acquiring and processing relevant patient data from electronic medical records, and an auxiliary decision-making system that provides decision-makers with valuable information at the right time through human-computer interaction [7]. Clinical decision support systems are considered an inevitable trend in the combination of medicine and artificial intelligence [8][9]. The Chinese government has issued many policies to promote the sustainable development of medical informatization [10]. The notice on national hospital informatization construction focusing on electronic medical records, issued by the National Health Commission in 2018 [11], pointed out that clinical nursing decision support systems should realize intelligent input, intelligent generation, intelligent reminders, quality control and rectification of nursing records, and other functions.
Data-driven clinical decision support systems based on large amounts of electronic medical record data can be divided into case reasoning systems and machine learning systems; a good machine learning algorithm can extract personalized features from data and summarize patterns to achieve high accuracy [12]. Current algorithms of prediction models for postpartum hemorrhage adopted by the American College of Obstetricians and Gynecologists (ACOG) and the California Maternal Quality Care Collaborative (CMQCC) include logistic regression [13], decision tree [14], random forest and extreme gradient boosting [15]. Most of them are linear predictions that divide events into possible or not possible. However, as real life has shown, most events are not so black and white. Unlike linear algorithms, an artificial neural network (ANN) is a multi-layer complex model formed by connecting neurons (perceptrons) through synapses (weights) [16]. ANN has a strong nonlinear mapping ability for high-dimensional data, and it can also reasonably predict and modify the relationship between input variables and output variables. ANN models are strongly self-adaptive, self-organizing, self-learning and fault-tolerant [17]. In recent years, ANN has been applied to medical research, such as prediction of disease diagnosis and prognosis and prediction of nursing adverse events, where it has shown superior performance [18][19][20][21].
Therefore, this study aims to build a scientific midwifery-led clinical decision support system based on the ANN algorithm for prediction of postpartum hemorrhage in vaginal delivery, which will provide a practical theoretical basis for improving the success rate of treatment and reducing the mortality of postpartum hemorrhage.

Data source and data collection
To identify potential variables, we first conducted a systematic literature review and a Delphi consultation with obstetricians and midwives. We used the following search strategy to find relevant literature: (postpartum hemorrhage OR PPH OR transfusion) AND (predictor* OR risk factor*) AND (vaginal delivery OR vaginal birth) in PubMed, Web of Science, Embase, Springer, Elsevier SD and Wiley. We read related papers published in English from 2006. Moreover, we considered current medical evidence and obstetricians' consultations. We identified 54 potential variables in this way; however, five of these 54 variables (residence, history of postpartum hemorrhage, pre-pregnancy weight, antenatal steroids, and antepartum infection) were not registered in the electronic medical records. Hence, we designed a data collection form and collected the remaining 49 variables manually (Appendix 1). The data were collected retrospectively from the medical records of women with vaginal delivery who had visited the obstetrics clinic at our hospital (the Third Affiliated Hospital of Zhengzhou University) in Zhengzhou, China from 2018 to 2020. We also tried to use the Maternal and Child Health Information Network of Henan Province to collect a larger dataset, but some essential elements were not available in that information system, so we collected complete data manually. Two trained researchers (the first and second authors) extracted the data from the electronic medical records.
Out of 28765 registered vaginal deliveries in the electronic medical records, we excluded patients with (1) antenatal fetal demise, (2) gestation under 35 weeks, (3) caesarean section, (4) late postpartum hemorrhage, (5) hematological and neoplastic disorders, (6) severe liver diseases, (7) medical records ≥ 30% incomplete, and (8) death before childbirth. According to these criteria, only 1587 of these deliveries were eligible for the study. The final dataset included the 49 potential predictor variables and the binary outcome (bleeding or not) for these cases. Next, these women were randomly divided into a training set (70%), a validation set (15%) and a test set (15%).
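The 70/15/15 partition described above can be illustrated with a minimal sketch. It assumes a stratified random split and a fixed seed (the study states only that the division was random), and `stratified_split` is a hypothetical helper name; the class counts are those reported for the included deliveries (280 with postpartum hemorrhage, 1307 without).

```python
import random

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep the class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# 1 = postpartum hemorrhage, 0 = no hemorrhage (counts taken from the study).
labels = [1] * 280 + [0] * 1307
train, val, test = stratified_split(labels)
```

Stratifying is a design choice made here so that the minority class (17.6% of cases) is represented in each subset; a plain shuffle would also satisfy the description in the text.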

Pre-processing and feature selection
The outcome event of postpartum hemorrhage was defined as blood loss of at least 500 mL within 24 hours after vaginal delivery of the fetus. This outcome was selected because it is consistent with the most recent definition in the "Guidelines for the Prevention and Management of Postpartum Hemorrhage (2014)" issued by the Obstetrics Division, Obstetrics and Gynecology Branch, Chinese Medical Association [22], the textbook "Obstetrics and Gynecology" (published by People's Medical Publishing House in 2013, 8th Edition) [23], as well as the consensus definitions of the Royal College of Obstetricians and Gynecologists [24] and the World Health Organization [25].
Some variables had missing values; we used multiple imputation to impute the missing values for variables in the model. This approach is recommended by the prediction model risk of bias assessment tool (PROBAST) [26] and has been applied in a previous study [27].
Too many variables lead to overfitting of the model and increase the clinical burden. Therefore, preprocessing techniques are frequently used to reduce dimensionality or select features. This step can eliminate irrelevant, weakly relevant, or less important features. Variable screening is an important procedure for improving the overall stability of the model. We selected variables using the "backward method" and univariate logistic regression performed in IBM SPSS software (version 20). Seven variables were finally identified as input neurons, and the standardized importance of each variable was shown on a scale with a maximum value of 100, to show the contribution of each variable in the machine learning model (Fig. 1).

Establishment of artificial neural network models and logistic regression (LR) model
The artificial neural network and LR models of postpartum hemorrhage were built on the predictors identified in the previous step.

The multi-layer perceptron (MLP) network
The MLP neural network consists of one input layer, one or more hidden layers and one output layer. The neuron is the basic component of the network; neurons include input neurons, hidden layer neurons and output neurons. MLP is a multi-layer feedforward neural network with the advantages of processing nonlinear data, good fault tolerance and strong self-learning ability [17]. The MLP network passes data between neurons in one direction and uncovers deep internal connections in the input data; its learning process mainly comprises the forward transmission of data and the back propagation of errors. This feedback learning method adjusts the weight and bias of each input so that the model obtains the best prediction ability [28]. The number of neurons in the hidden layer can be calculated automatically by the SPSS software, which yields a relatively optimal number of neurons.
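The forward transmission and error back propagation described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: it assumes one hidden layer, sigmoid activations, a cross-entropy loss, and hypothetical layer sizes, weights, and input values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward transmission: inputs -> hidden activations -> output probability."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output

def train_step(x, y, W1, b1, W2, b2, lr=0.5):
    """One back-propagation step for cross-entropy loss; returns updated parameters."""
    hidden, output = forward(x, W1, b1, W2, b2)
    delta_out = output - y  # error signal at the output neuron
    # Propagate the error back through the hidden layer (sigmoid derivative h * (1 - h)).
    delta_hid = [delta_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    new_W2 = [w - lr * delta_out * h for w, h in zip(W2, hidden)]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[w - lr * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hid)]
    new_b1 = [b - lr * d for b, d in zip(b1, delta_hid)]
    return new_W1, new_b1, new_W2, new_b2

rng = random.Random(0)
n_inputs, n_hidden = 3, 4  # hypothetical sizes, not the study's architecture
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b2 = 0.0

x, y = [0.2, -1.0, 0.5], 1  # one made-up standardized example labeled as a PPH case
_, p_before = forward(x, W1, b1, W2, b2)
W1, b1, W2, b2 = train_step(x, y, W1, b1, W2, b2)
_, p_after = forward(x, W1, b1, W2, b2)
```

After the update, the predicted probability for this positive example moves toward its label, which is the behavior the weight and bias adjustment is meant to produce.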

The Back Propagation (BP) network
The BP neural network is a multi-layer feedforward neural network based on the gradient descent algorithm. It is trained with the error back propagation algorithm, has been the most widely used neural network, and is composed of an input layer, a hidden layer and an output layer. The activation function between layers is a differentiable Sigmoid-type function; the error is propagated backward layer by layer toward the input layer, and the link weights are adjusted continually in order to realize an arbitrary nonlinear mapping between input and output [19,29,30]. The internal Sigmoid function was automatically calculated by SPSS and the relative optimal function parameters were obtained.
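For reference, the Sigmoid activation and the gradient-descent weight adjustment mentioned above are usually written as follows. The symbols are assumptions introduced here for illustration: $E$ is the network error, $w_{ij}$ the weight of the link from neuron $i$ to neuron $j$, and $\eta$ the learning rate.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr), \qquad
w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}
```

Because $\sigma'$ can be computed from $\sigma$ itself, the error term for each layer is obtained from the activations already stored during the forward pass.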

The radial basis function (RBF) network
The RBF neural network consists of an input layer, a radial basis layer and an output layer. The input layer is made up of signal source nodes; the radial basis layer is a nonlinear, radially symmetric function that decays with distance from its center point according to specific needs; the output layer is the result of the input processed by the radial basis function [21]. The RBF neural network is a kind of feedforward neural network suited to classification problems. It can approximate any continuous function with any precision, so the optimal prediction ability is achieved by adjusting the parameters of the radial basis function [30]. The internal radial basis function was automatically calculated by SPSS and the relative optimal function parameters were obtained.
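A minimal sketch of the three layers follows. It assumes Gaussian basis functions whose responses are normalized to sum to one, and an output layer with identity activation; the centers, width, and weights are made-up values, and the function names are hypothetical.

```python
import math

def rbf_activations(x, centers, width):
    """Gaussian response of each hidden unit, normalized to sum to 1."""
    raw = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
           for c in centers]
    total = sum(raw)
    return [r / total for r in raw]

def rbf_predict(x, centers, width, weights, bias):
    """Output layer: identity activation over a weighted sum of hidden units."""
    acts = rbf_activations(x, centers, width)
    return sum(w * a for w, a in zip(weights, acts)) + bias

# Two hypothetical centers: the first tied to a low output, the second to a high one.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.1, 0.9]
near_first = rbf_predict([0.0, 0.0], centers, 0.5, weights, 0.0)
near_second = rbf_predict([1.0, 1.0], centers, 0.5, weights, 0.0)
```

An input close to a center is dominated by that center's weight, which is how the radially symmetric decay translates into a local, nonlinear decision surface.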

Evaluation of the networks and development of the CDSS
First, to identify the best-performing model for the given dataset, we developed five models each for the MLP, RBF and BP networks, and also constructed a binary logistic regression model to compare the performance of the traditional statistical model with that of the artificial neural network models. Second, the receiver operating characteristic (ROC) curves of each model in the training, validation and test sets were plotted with R 3.6.3 software. We used the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy to evaluate and select the most accurate model. In addition, we conducted internal validation by 10-fold cross-validation. Finally, we designed the CDSS based on the most accurate network in MATLAB 2013b software to provide visualization.
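The four evaluation measures named above can be computed as in the following sketch. It is an illustration only: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and AUROC is computed with the rank-based (Mann-Whitney) formulation rather than with the R packages used in the study.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = postpartum hemorrhage, 0 = no hemorrhage.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = classification_metrics(y_true, scores)
auc = auroc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes discrimination across all thresholds, which is why both kinds of measure are reported.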

Ethical and legal considerations
This study was approved by the Ethics Committee of the Third Affiliated Hospital of Zhengzhou University (2019-135-01) and followed the ethical guidelines of the Declaration of Helsinki of the World Medical Association.

Identifying significant variables for predicting postpartum hemorrhage in vaginal delivery
Originally, we identified 54 variables that were likely to affect postpartum hemorrhage among women with vaginal delivery (Appendix 1). Among the 1587 deliveries included, 1307 (82.4%) had only physiological blood loss (less than 500 ml), and 280 (17.6%) were cases of postpartum hemorrhage (31 with blood loss of more than 1000 ml and 249 with blood loss of 500-1000 ml).
After performing univariable logistic regression, the fifteen most significant factors with P-value < 0.01 were selected for predicting postpartum hemorrhage (Table 1). Table 1 demonstrated the distribution of these factors in the total dataset for bleeding and non-bleeding women, respectively. Figure 1 showed the variable importance in the selected MLP model, which had the best discrimination ability. The 15 variables, ranked from most to least important, were white blood cells (WBC) during the first labor stage, newborn weight, cervical laceration, history of uterine surgery, parity, manual removal of placenta, episiotomy, placenta previa, operative vaginal delivery, uterine curettage, assisted reproduction, velamentous placenta, threatened abortion, induced labor, and placental abruption.
Table 2 demonstrated the number of neurons in the hidden layer and the performance of the five trained networks. Network 5 performed best: its accuracy, specificity, and sensitivity (in the total data) were 94.3%, 94.9%, and 90.0%, respectively, and there were 8 epochs in the hidden layer.
As shown in confusion matrix of
Table 2 revealed the number of neurons in the hidden layer and the performance of the five trained networks. Network 7 performed best: its accuracy, specificity, and sensitivity (in the total data) were 92.1%, 94.9%, and 79.3%, respectively, and there were 10 epochs in the hidden layer.
As shown in confusion matrix of

RBF structure
The radial basis function (RBF) network used Softmax activation functions in the hidden layer and Identity activation functions in the output layer. Table 2 indicated the number of neurons in the hidden layer and the performance of the five trained networks. Network 13 performed best: its accuracy, specificity, and sensitivity (in the total data) were 87.5%, 94.9%, and 46.2%, respectively, and there were 5 epochs in the hidden layer.
According to

Developing the CDSS
To compare the performance of the MLP, BP, RBF, and LR models, we plotted the ROC curves of the four models together for the training set, validation set, and test set, respectively (Fig. 2). As shown in Table 6 and Fig. 2 [...] interventions. The prediction of our model mainly focused on the time of labor and birth; therefore, we adopted the variables available during the same period. It is also appropriate to construct a prediction model employing high-risk factors known after delivery, such as cervical lacerations and newborn weight.
Importantly, to facilitate immediate clinical use, we recommend that hospital information departments integrate the decision aid into the electronic medical records, in the form of a risk calculator or automatic output of the result. When a pregnant woman is in labor under the care of an obstetrician, the midwife or another assistant can input the values of the 15 predictors into the CDSS (Fig. 3), and the predicted outcome (postpartum hemorrhage or non-postpartum hemorrhage) is output by clicking the "submit" button in the system interface. Once the system identifies a pregnant woman as at risk of postpartum hemorrhage, she should be considered carefully by the obstetricians. According to the predicted outcome, the obstetricians can make the optimal decision about personalized treatment for the pregnant woman in the next step. In this way, the obstetrician can manage the labor stage more skillfully through early identification, early warning, and early treatment, in order to mitigate adverse outcomes for these high-risk patients. For example, midwives can make pre-transfusion preparations for high-risk patients in advance, arrange cross-match blood tests, and inform the blood transfusion department to prepare blood products, particularly because fresh frozen plasma can take up to an hour to thaw. At the same time, the system supports the rational allocation of medical resources by scheduling the deliveries of women at high risk of bleeding at different times to avoid insufficient staffing.
In recent years, medical institutions and obstetric experts at home and abroad have extensively explored early warning evaluation and prediction of postpartum hemorrhage. The risk score calculators they formulated for different subgroups could indicate the possibility of postpartum hemorrhage to some extent and achieved certain results. Ana [31] developed and validated a predictive model based on binary logistic regression and ridge regression to measure the risk of excessive blood loss in 2336 women with vaginal delivery, but this analysis collected only thirteen variables and the sensitivity was low. Michelle [32] trained and tested logistic regression prediction models involving 74 variables for hemorrhage and transfusion on a data set of 63973 deliveries; however, the results were not visualized. Kartic et al. [15] used two traditional statistical models (logistic regression with and without lasso regularization) and two machine learning models (random forest and extreme gradient boosting) to predict postpartum hemorrhage; the extreme gradient boosting model performed best in both discrimination and decision curve analysis, but some variables in the model were difficult to collect, which limited its clinical practicability. Ahmadzia et al. [33] developed an online calculator for postpartum hemorrhage risk scores, Dunkerton et al. [34] established a decision tree model for postpartum hemorrhage prediction based on a non-parametric recursion algorithm, and Bingnan Chen [35] created a nomogram model to predict postpartum hemorrhage individually; however, all of them were used only for the cesarean section population. Because the discriminant criteria and risk factors differ between cesarean section and vaginal delivery, using the same warning model to predict postpartum hemorrhage for different modes of delivery has limitations.
In addition, most current prediction models for postpartum hemorrhage are linear algorithms. Logistic regression is a traditional statistical algorithm that can screen out the limited variables associated with postpartum hemorrhage and eliminate confounding variables [36]. However, when there are too many variables to observe, LR can neither detect complicated nonlinear relationships between independent and dependent variables nor address collinearity between variables, so some potentially valid variables are removed [37].
Compared with the above models, our study indicated that both neural networks and logistic regression can provide excellent discrimination for the prediction of postpartum hemorrhage; hence, the final choice of model on which to build a CDSS should rely on a combination of model performance (discrimination, calibration and net benefit), clinical applicability, and acceptability to obstetricians and expectant women [38].
The ANN prediction model established in this paper aims to provide personalized prediction results and achieve effective risk stratification. The results of our study revealed that the ANN model was more accurate in predicting postpartum hemorrhage in vaginal delivery. At present, the ANN model is widely used in disease prediction and diagnosis, chronic disease management, medical image recognition and other areas [16][17][18][19][20][21], and it has shown better performance than conventional predictive models even when employing the same input variables [16,21]. Postpartum hemorrhage, as an obstetric emergency, has various contributing factors and a complex mechanism. Predicting it from epidemiological data with a traditional linear discriminant function has great limitations. As an information processing system abstracted from biological neural networks, ANN has the ability to self-learn and to identify the relationships between variables, and it can approximate arbitrary nonlinear functions with arbitrary accuracy [17]. Previous studies implied that neural network models showed a stronger fit for complex nonlinear relationships and cost less effort to generate than traditional regression algorithms [18][19][20][21]. It is imperative that ANN techniques automatically conduct variable selection, missing-value imputation and other data preprocessing procedures.
There is considerable support in the literature for the included variables as risk factors for PPH. Almost all the variables in our model are clinically available and have been identified in previous models, including newborn weight, cervical laceration, history of uterine surgery, parity, manual removal of placenta, and episiotomy [31][32][33][34][35][38]. However, one important predictor that has not yet been considered is the WBC count during the first stage of labor. It was identified as having 100% importance in this study, which extends previous work on predicting PPH and may provide insight into the pathophysiological links between inflammation and PPH onset. In our results, the WBC count during the first stage of labor was higher in women with hemorrhage than in women without hemorrhage. Possible reasons are as follows: infection weakens pregnant women and makes them vulnerable to PPH, and labor is a state of stress that also raises the WBC count [39,40]. Further clinical laboratory studies are needed to uncover the pathophysiological mechanism linking a high WBC count during the first stage of labor and postpartum hemorrhage. The WBC count showed a 100% standardized importance in our results; this may be because the sample size was large while the morbidity of PPH was low and the number of cases in this model was small. Nevertheless, another statistical algorithm (binary logistic regression) was conducted to explore the odds ratio (OR) of the variables and confirm their statistical significance.
The second most important risk factor was newborn weight, with 75.5% standardized importance in our study, which again matched previous studies [13,15,24,31]. This may be because increased newborn weight is associated with hyperextension of uterine muscle fibers and affects uterine contractions. In addition, pregnant women with large fetal weight may have other complications, such as cephalopelvic disproportion, prolonged first and second stages of labor, shoulder dystocia, laceration of the soft birth canal, and uterine contraction [41], which also increase the incidence of postpartum hemorrhage. Newborn weight in our study was a postpartum variable, available only after the risk period of bleeding had passed; we could consider introducing a prenatal diagnosis of macrosomia or an ultrasound-based estimated fetal weight into the model. Some other risk factors for hemorrhage in our model are generally accepted, namely operative vaginal delivery, induced labor, manual removal of placenta, episiotomy and cervical tears [2,3,6,31-35]. Although their links to hemorrhage were weaker, with standardized importance below 30%, the less noticeable their effect is, the more severe the situation can become.

Limitations
Our study has several limitations. First, neural network algorithms are driven by big data and rely on a large sample size. We developed the CDSS from the electronic medical records of only one center, the sample size was far from sufficient, and the external validation was conducted only in our hospital, so the results may not be generalizable; robust internal and external validation is needed before wide promotion and application. Second, missing data is another limitation of our study. The study was restricted to the part of the original dataset with valid blood loss data, and missing covariate values accounted for a considerable proportion. Although we adopted the multiple imputation techniques recommended by PROBAST [26], the proportion of incomplete data also limited the generalization of the model. Furthermore, missing values are likely to continue to hinder the integration of the models into electronic medical records. Third, the predictors in our study were all static; hourly data such as temperature, heart rate, systolic blood pressure, oxygen saturation, and physical examination findings were not included.
In addition, although embedding all risk factors in the CDSS is promising, it may increase the clinical workload and could potentially delay intervention [42]. The impact of a model on decision making depends on multiple characteristics of health providers and circumstances, including the ability to initiate an immediate intervention and weigh the risks against the benefits, the capacity to take action, and the compliance of expectant women (or obstetricians) with the recommended measures [43]. Other environmental restrictions, such as staff, space, and facilities, are not considered in the current CDSS. Meanwhile, estimated blood loss (EBL) is considered to be inaccurate, subjective and always underestimated; there is a mismatch between the actual blood loss and the vital signs, urine volume and mental state that the individual shows. A period of physical compensation for blood loss exists in the early bleeding stage [44][45]. Our definition of postpartum hemorrhage followed the current clinical guidelines [22][23][24][25]; however, we did not evaluate other relevant clinical indicators of acute blood loss, including a deep perineal hematoma from a laceration of the birth canal, blood flow velocity and properties, whether the bleeding was fluid, turbulent or exudative, and whether it clotted or contained clots [46]. Blood loss in predictive models needs to be measured by quantitative blood loss (QBL) methods, such as the basic methemoglobin colorimetric method [47] or image spectral analysis [45], Shock Index (SI) [48] and so on.

Conclusions
In summary, we developed a CDSS based on the risk factors identified by artificial neural network algorithms. This ANN model performed better than logistic regression in predicting postpartum hemorrhage in women with vaginal delivery. In the future, these findings will enrich nursing informatics practice and promote the application of artificial intelligence technology (such as the ANN algorithm) in the field of nursing. As predictive tools become more widely used in obstetric care, they can be introduced into clinical guidelines and care pathways after further testing. Identifying mothers at high risk of postpartum hemorrhage on labor admission using the clinical decision support system will benefit accurate prenatal diagnosis and prompt intervention, which may optimize obstetric care, improve maternal prognosis, and support the rational allocation of medical resources.
Fig. 2 The ROC of the MLP, BP, RBF network and LR in the training/validation/test data