Combination of XGBoost Analysis and Rule-Based Method for Intrapartum Cardiotocograph Classification

A cardiotocograph (CTG) has two major components: uterine contraction (UC) and fetal heart rate (FHR) signals. CTG has been widely used to monitor fetal well-being over the past 50 years. The guideline provided by the National Institute of Child Health and Human Development (NICHD) classifies CTG patterns into three categories (I, II, and III) for evaluating the status of a fetus. However, manual interpretation of CTG is time-consuming and subject to inter-observer bias. In this study, we combined a rule-based method and eXtreme Gradient Boosting (XGBoost) analysis to classify CTG patterns. Because of the persistent controversies about NICHD Category II, XGBoost analysis was used to classify it into IIa and IIb. A total of 68 pregnant women were enrolled in this study. The classifications into categories I, II, and III by our algorithm were consistent with the manual interpretation by clinicians, with an average Kappa of about 0.72. The probability of fetal distress (FD) was 28.8% and 71.2% in categories IIa and IIb, respectively. These findings show that the proposed method has high potential as an assistive and warning system for monitoring fetal well-being and reducing the burden on medical staff.


Introduction
Advanced maternal age (AMA), defined as pregnancy at an age above 35, has recently become more common in high-income countries [1]. In 2018, research from the Pew Research Center in America revealed that 86% of American women aged 40 to 44 had ever given birth [2]. Meanwhile, a study by EU researchers indicated that 20% of births in England and Wales were to women above 35 years old and 4% were to women aged ≥ 40 years, compared with 6% and 1% of births, respectively, in 1980 [1]. Although this trend is attributed to advancements in assisted reproductive technologies (ART), it potentially carries clinical risks [1]. According to previous studies [3], advanced maternal age is associated with the incidence of pregnancy complications, such as abortion, gestational toxemia, gestational diabetes, and fetal growth retardation. These complications not only trouble pregnant women but also affect the health of the fetuses. Among these complications, fetal distress (FD) may lead to hypoxia of the fetus, thereby reducing fetal movements and heart rate and provoking acidemia. In addition, Hypoxic Ischemic Encephalopathy (HIE) caused by FD may lead to chronic, long-term injury or even death of the fetus [4]. The placenta is the main source of nutrition for the fetus, while the function of the uterus gradually declines with the age of the woman [5]. In 2014, Parker pointed out that advanced maternal age increases the chance of early placental exfoliation, which reduces the fetus' oxygen circulation rate. Consequently, this provokes FD [6], which may lead to growth retardation. According to statistics, the incidence of FD during delivery is three out of a thousand [7]. Therefore, in 2013, Ugwumadu et al. looked for relevant fetal compensation strategies by observing the changes in fetal heart rate during pregnancy, in the hope of lowering the incidence of FD [8].
Other studies have shown a correlation between prenatal and fetal physiological signals and FD [9][10][11][12][13][14][15]. For example, cardiotocograph (CTG) signals can assist obstetricians and nurses in clinical interpretation. CTG is mainly used as a clinical reference for fetal health and FD [9] by monitoring changes in Uterine Contraction (UC) and Fetal Heart Rate (FHR). In recent years, many experts, scholars, and organizations, such as the National Institute of Child Health and Human Development (NICHD) [10] and the Royal College of Obstetricians and Gynecologists (RCOG) [11], have established different classification rules, based on different clinical experiences, to make clinical results derived from CTG signals more objective. To improve the speed of interpretation and decrease human error, various methods of intelligent CTG interpretation have been proposed. Following the NICHD guideline, a research team from National Taiwan University developed an automated NICHD 3-tier classification system in 2014. In that study, researchers collected 62 CTG recordings before delivery and compared the automated results with the visual interpretations of 8 obstetricians. The results showed that the average kappa values of categories I, II, III were 0.89, 0.78, 0.50, and 0.79, respectively. Compared with the traditional classification method, the kappa value here was higher (0.15-0.38), although the value of category III was lower [12]. In 2016, following the RCOG guideline, a Malaysian research team classified a group of pregnant women into normal, suspicious, and pathological categories and analyzed the baseline, variability, deceleration, and acceleration of the FHR to construct an instant classification system [11]. The accuracy of the classification was confirmed against the visual interpretation of three obstetricians, with kappa values of 0.668, 0.587, and 0.630 for the 80 clinical CTG signals.
As shown above, automatic interpretation systems based on NICHD or RCOG have their advantages and are feasible. However, some classification results are clinically vague, such as Category II of NICHD and the suspicious category of RCOG. In 2016, Penfield et al. proposed a classification system for NICHD Category II signals, called the "ABC system", which divided category II tracings into IIA, IIB, and IIC subcategories. The system mainly classified the changes in fetal heart rate tracing characteristics within category II. They demonstrated that the ABC system could improve team communication and increase on-site management of category II FHR tracings by private physicians [13]. In 2017, Martí Gamboa et al. estimated the association between atypical variable decelerations and neonatal acidemia by comparing the CTG signals of 102 fetal acidemia cases (umbilical arterial cord gas pH ≤ 7.10) and 100 normal fetuses (umbilical arterial cord gas pH > 7.10), examining the correlation among atypical variable decelerations of CTG, neonatal acidemia, and the morbidity of FD [14]. The results showed that certain atypical features, such as slow return and loss of moderate variability within decelerations, are associated with neonatal acidemia. Slow return could help grade acidemia risk levels as an indicator of severity.
In view of this, this study used the NICHD classification rules with the rule-based method to classify our results into categories I, II, and III, and assigned atypical variable decelerations (e.g., slow return) directly to category III based on results in the literature [14], to improve consistency with the clinicians' interpretation. In addition, every UC during pregnancy is a kind of stimulation to the fetus; a normal fetus can recover from such stimulation in a short time. However, a fetus with FD may not be able to maintain internal regulation through compensation strategies [15].
Extreme Gradient Boosting (XGBoost) is an open-source library that provides an efficient and effective implementation of gradient boosting. Gradient boosting is a class of ensemble machine learning algorithms that can be used for classification and other predictive modeling problems. In this study, we used XGBoost as the classifier, together with sequence analysis, to classify category II data into IIa and IIb by further examining UC and FHR. This could improve the understanding of category II signals in clinical application, increase on-site management of category II by private physicians, and provide a more accurate clinical basis for FD.

Materials and Methods
The proposed system for intrapartum CTG classification was developed using the rule-based method and XGBoost analysis. Data were mainly obtained from a clinical database and then analyzed accordingly. The clinical CTG signals were divided into categories I, II, and III by the rule-based method based on their features, such as baseline, variability, acceleration, and deceleration, following NICHD's and Chen's [12] definitions. To reduce the processing time, we modified and simplified the rules as shown in Table 1.

Clinical Data Collection
The clinical CTG records were taken from the fetal monitor (Avalon FM20, Royal Philips, Eindhoven, Netherlands) in the delivery room of the National Cheng Kung University Hospital. The medical records were documented between January 2017 and April 2018. The exclusion criteria for clinical CTG records were a sinusoidal pattern of FHR (SP), multiple births, mothers with heart and lung diseases, and fetuses with congenital anomalies. The flowchart of clinical CTG selection is shown in Fig. 1. The data, including numerical data in the database and converted images, comprised the age of the pregnant woman, gestational age, FHR, UC, birthing methods, and anesthesia methods, as shown in Table 2. Sixty-eight records of UC and FHR were selected in this study, of which records with FD and Non-FD each have 30, verified by professional medical staff. The information was obtained under the Institutional Review Board (IRB) program (A-ER-105-477). Each record has a continuous measurement of at least 40 min with a 1 Hz sampling rate.

The Preprocessing of CTG Signals
During prenatal checkups, CTG records are easily affected by noise and missing data. These may be caused by movement of the pregnant woman or of the sensors, and they greatly affect the identification results of medical staff and of computerized analysis [11]. Therefore, before performing feature extraction, we followed previous research [16, 17] for CTG preprocessing, which included linear interpolation, trend removal, and data smoothing to remove noise and motion artifacts.
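As an illustration of the interpolation and smoothing steps, the following minimal Python sketch fills missing samples by linear interpolation and then applies a centred moving average. It is not the authors' MATLAB implementation: the function names, the use of `None` to mark missing samples, and the 5-sample window are assumptions made for this example, and trend removal is left out because the text does not specify how it was done.

```python
from typing import List, Optional


def interpolate_gaps(signal: List[Optional[float]]) -> List[float]:
    """Fill missing samples (None) by linear interpolation between valid neighbours.

    Gaps at the start or end are filled with the nearest valid value.
    """
    valid = [i for i, v in enumerate(signal) if v is not None]
    if not valid:
        raise ValueError("signal contains no valid samples")
    out = list(signal)
    # Edge gaps: hold the first / last valid value.
    for i in range(valid[0]):
        out[i] = signal[valid[0]]
    for i in range(valid[-1] + 1, len(signal)):
        out[i] = signal[valid[-1]]
    # Interior gaps: straight line between the two surrounding valid samples.
    for left, right in zip(valid, valid[1:]):
        step = (signal[right] - signal[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = signal[left] + step * (i - left)
    return [float(v) for v in out]


def moving_average(signal: List[float], window: int = 5) -> List[float]:
    """Centred moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed
```

With a 1 Hz signal, `moving_average(interpolate_gaps(raw_fhr))` would give a gap-free, smoothed FHR series for the feature extraction steps that follow.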

Features Extraction of FHR and UC
The features extracted from the clinical CTG data are listed in Table 1. First, features such as baseline, variability, acceleration, and deceleration were extracted from the FHR signal; the time difference between FHR and UC was also used as a feature. Five parameters were defined from the time-domain, frequency-domain, and morphological features. In this section, FHR features were extracted based on the NICHD criteria [12]. The algorithm used for feature extraction and classification followed previous results [18] and was developed in MATLAB (version 9.9.0.1495850, R2020b; The MathWorks Inc., Natick, Massachusetts).

Baseline Variability (BV) of FHR Detection
The FHR BV is a response that shows FHR changes within a short period when the fetal central nervous system is stimulated. It also reflects the normal development of fetal nerves and the adequacy of oxygen in the body [15]. Through the BV, we could observe whether the abnormal heart rate of the fetus was due to metabolic acidosis or neurological damage that may be caused by hypoxia. FHR variability with 0 beats per minute (bpm) does not exist in the living fetus.
Following previous research [12,15], an amplitude below 2 bpm is defined as absent (BVA) and an amplitude of 2-5 bpm as minimal (BVMI); both mean that the fetus may have been neurologically harmed due to hypoxia.
An amplitude of 6-25 bpm is defined as moderate (BVMO) and an amplitude of more than 25 bpm as marked (BVMA). The BV is calculated by Eq. (1) as the difference between the FHR maximum and minimum values in every minute, excluding acceleration (AC) and deceleration (DE) episodes.
$$BV = \mathrm{MAX}(FHR_n) - \mathrm{MIN}(FHR_n) \tag{1}$$
where BV represents the difference between the maximum (MAX) and minimum (MIN) values of the FHR within one minute, and n denotes the one-minute window.
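The per-minute BV calculation and its four amplitude classes can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the handling of values that fall between the stated integer bands (for example 5.5 bpm) is an assumption, and the exclusion of AC and DE samples is not implemented here.

```python
from typing import List


def bv_per_minute(fhr: List[float], fs: int = 1) -> List[float]:
    """BV (max minus min) for each complete one-minute window of the FHR signal."""
    n = 60 * fs  # samples per minute at sampling rate fs (Hz)
    return [max(fhr[i:i + n]) - min(fhr[i:i + n])
            for i in range(0, len(fhr) - n + 1, n)]


def classify_bv(bv: float) -> str:
    """Map a BV amplitude (bpm) to BVA / BVMI / BVMO / BVMA.

    Assumption: values above 5 and up to 25 bpm are treated as BVMO.
    """
    if bv < 2:
        return "BVA"   # absent
    if bv <= 5:
        return "BVMI"  # minimal
    if bv <= 25:
        return "BVMO"  # moderate
    return "BVMA"      # marked
```

For a 40-minute record sampled at 1 Hz, `bv_per_minute` returns 40 values, each of which can be passed to `classify_bv`.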

Baseline (BL) of FHR Detection
According to the NICHD definition, the BL of FHR is determined by approximating the mean FHR, rounded to increments of 5 beats per minute (bpm), during a 10-minute window. Periods of acceleration and deceleration are excluded. Normal (BLN), tachycardia (BLT), and bradycardia (BLB) mean that the BL is between 110 and 160 bpm, greater than 160 bpm, and less than 110 bpm, respectively. BL was calculated using Eq. (2).
where i represents the FHR reference line in the i-th minute, and n is the total number of times the difference before and after the FHR is less than 5 bpm.
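The NICHD-style baseline described above can be illustrated with a short Python sketch. It is a simplification and not a reconstruction of Eq. (2): it assumes the caller has already removed acceleration and deceleration samples from the 10-minute window, and the function names are hypothetical.

```python
from typing import List


def baseline(fhr_window: List[float]) -> int:
    """Mean FHR of a window, rounded to the nearest multiple of 5 bpm."""
    mean = sum(fhr_window) / len(fhr_window)
    return int(5 * round(mean / 5))


def classify_baseline(bl: float) -> str:
    """BLN for 110-160 bpm, BLT above 160 bpm, BLB below 110 bpm."""
    if bl > 160:
        return "BLT"  # tachycardia
    if bl < 110:
        return "BLB"  # bradycardia
    return "BLN"      # normal
```

For example, a window with a mean of 143 bpm yields a baseline of 145 bpm, which falls in the normal range.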

Acceleration (AC) of FHR Detection
FHR acceleration is closely related to fetal health and is usually generated when there is fetal activity. AC was also found to be an important indicator for evaluating fetal health. Thus, we compared the lowest point of acceleration with the peak position of UC and calculated the continuous acceleration time of FHR. The duration and magnitude of the acceleration during measurement were used as evaluation criteria. Generally, the duration was more than 15 s, and the amplitude was greater than 15 bpm, as shown in Eq. (3). FHR onset and FHR recovery refer to the start and the end of the continuous acceleration time, respectively. FHR acceleration represents a normal oxygen concentration in the blood of the fetus with no metabolic acidemia, implying that the fetus is currently in a normal state. However, the absence of acceleration during monitoring does not by itself mean that the fetus is abnormal; the determination must be made through FHR variability and deceleration.
where FHR and n denote the amplitude series and the acceleration period of FHR, respectively; FHR onset and FHR recovery represent the start and the end of this period.

Deceleration (DE) of FHR Detection
UC is the state in which the uterus practices childbirth during pregnancy. When the uterus contracts, it is squeezed inwardly, and the blood flow in the placenta changes. We used a valley detection method, based on an algorithm that fits a quadruple polynomial to sequential groups of data points, to determine the nadirs of FHR decelerations [13]. Here, the DE is the degree of FHR decline associated with the UC. A DE is counted when the decline lasts longer than 15 s and its depth is more than 15 bpm, as shown in Eq. (4). The nadir location of the FHR deceleration was compared with the UC peak location to classify early and late decelerations.
where FHR and n denote the amplitude series and the deceleration period of FHR, respectively; FHR onset and FHR recovery represent the start and the end of this period.
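The duration and amplitude criteria for accelerations (Eq. (3)) and decelerations (Eq. (4)) can be sketched with one Python helper. This is a simplified threshold-based illustration, not the polynomial valley detection cited in the text: an excursion is taken to be any contiguous run of samples on one side of the baseline, and the function name and tuple layout are assumptions.

```python
from typing import List, Tuple


def find_excursions(fhr: List[float], baseline: float, direction: int,
                    min_duration_s: float = 15.0,
                    min_amplitude_bpm: float = 15.0,
                    fs: int = 1) -> List[Tuple[int, int, int]]:
    """Return (onset, extremum, recovery) sample indices of qualifying excursions.

    direction=+1 looks for accelerations, direction=-1 for decelerations.
    An excursion qualifies if it lasts more than min_duration_s and its
    largest deviation from the baseline exceeds min_amplitude_bpm.
    """
    events = []
    onset = None
    # Signed deviation from baseline; the trailing 0.0 closes a run at the end.
    deviation = [direction * (v - baseline) for v in fhr] + [0.0]
    for i, d in enumerate(deviation):
        if d > 0 and onset is None:
            onset = i
        elif d <= 0 and onset is not None:
            run = deviation[onset:i]
            peak = max(run)
            if (i - onset) / fs > min_duration_s and peak > min_amplitude_bpm:
                events.append((onset, onset + run.index(peak), i - 1))
            onset = None
    return events
```

For a deceleration, the middle index of each tuple is the nadir, which is the point compared with the UC peak when distinguishing early from late decelerations.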

Early DE (EDE) and Late DE (LDE) of FHR Detection
When the uterus contracts under normal conditions, the fetus initiates a compensatory mechanism to balance the oxygen in the body. Generally, the time from the last maximum of FHR before the deceleration to the lowest point of FHR is at least 30 s. Notably, in this case there is no time difference between the position of the FHR nadir, which is the lowest position of the FHR deceleration, and the position of the maximum UC. After the end of the contraction, the FHR returns to normal. Such stimulation does not affect the fetus and belongs to the state of EDE, as shown in Eq. (5). Late deceleration reflects a condition in which the fetus may face hypoxia, so the FHR deceleration is delayed relative to the UC. LDE is defined as a deceleration in which the FHR takes at least 30 s to reach its nadir and whose time difference from when the UC starts is more than or equal to 18 s, as shown in Eq. (6).
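The early/late distinction can be illustrated with a small Python sketch. Since Eqs. (5) and (6) are not reproduced here, several choices are assumptions: the lag is measured between the FHR nadir and the UC peak (one possible reading of the text), `sync_tolerance_s` is a hypothetical parameter for how close "no time difference" must be, and the labels "other" and "indeterminate" are invented for cases the text does not cover.

```python
def classify_deceleration(onset_s: float, nadir_s: float, uc_peak_s: float,
                          min_descent_s: float = 30.0,
                          late_lag_s: float = 18.0,
                          sync_tolerance_s: float = 0.0) -> str:
    """Label a deceleration as 'EDE', 'LDE', 'other', or 'indeterminate'.

    onset_s, nadir_s: times (s) of the deceleration onset and nadir.
    uc_peak_s: time (s) of the peak of the associated uterine contraction.
    """
    # Both EDE and LDE need a gradual descent of at least 30 s to the nadir.
    if nadir_s - onset_s < min_descent_s:
        return "other"
    lag = nadir_s - uc_peak_s
    if abs(lag) <= sync_tolerance_s:
        return "EDE"  # nadir coincides with the UC peak
    if lag >= late_lag_s:
        return "LDE"  # nadir delayed by at least 18 s
    return "indeterminate"
```

In practice a non-zero `sync_tolerance_s` would probably be needed, because a sampled nadir rarely coincides exactly with the UC peak.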

UC Duration
UC feature samples are first obtained by setting a threshold value (Thr). Where the UC signal crosses the threshold, an intersection point is generated, and the time interval between the two intersection points (a, b), at which the UC rises above and falls back below the threshold, is used as the basis for identifying a UC, as shown in Eq. (7). The difference between the time at which the UC maximum occurs and the time at which the FHR decelerates can be used to determine whether the FHR deceleration is early or late.
In Eq. (7), the interval must satisfy $t_{a \to b} \ge 30$ and the contraction peak is $UC = \mathrm{MAX}(UC_{t_{a \to b}})$; VAR(UC) is the difference between the maximum and minimum values, MAX(UC) and MIN(UC) represent the highest and lowest positions, respectively, and Thr(UC) is the threshold value set for the UC.
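The threshold-crossing idea can be sketched in Python as follows. The threshold is passed in by the caller because its exact definition is not given here, and the function name is hypothetical. The default minimum duration of 30 follows the condition t a→b ≥ 30 stated in the source; reading it as seconds at the 1 Hz sampling rate is an assumption.

```python
from typing import List, Tuple


def find_contractions(uc: List[float], threshold: float,
                      min_duration_s: float = 30.0,
                      fs: int = 1) -> List[Tuple[int, int, int]]:
    """Return (a, b, peak) sample indices for each detected contraction.

    a and b are the first and last samples above the threshold, and peak is
    the index of the maximum UC value between them.
    """
    events = []
    start = None
    # Appending the threshold itself closes a contraction that reaches the end.
    padded = list(uc) + [threshold]
    for i, v in enumerate(padded):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if (i - start) / fs >= min_duration_s:
                segment = padded[start:i]
                events.append((start, i - 1, start + segment.index(max(segment))))
            start = None
    return events
```

The returned peak index is the quantity that would be compared with the FHR nadir to separate early from late decelerations, and the (a, b) interval can serve as the truncation window for a UC-based sequence analysis.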

Signal Truncation for Sequence Analysis and XGBoost Analysis
In this study, we designed an advanced analysis method for CTG category II, based on UC, to further classify the signals into IIa and IIb. Since features such as EDE and LDE require long-term evaluation in the traditional interpretation of CTG, they are not applicable within the interval of a single UC. Therefore, only the parameters BL, BV, AC, and DE in each interval were used in this study. In the sequence analysis, the CTG signal is truncated according to the UC, each segment is evaluated as normal or abnormal FHR using the above parameters, and the ratio of the two kinds of segments is used to further classify the category II signal, as shown in Fig. 2. Furthermore, we applied a machine learning classifier, XGBoost [19], to classify the CTG signals assigned to category II under the NICHD rules into IIa and IIb. The input features included the four characteristics of FHR and UC obtained by the rule-based method (BL, BV, AC, and DE) and the abnormal ratio from the sequence analysis. In addition, we used the clinicians' results as the ground truth for training the classifier. This method was chosen for its significant advantages, including handling missing values, not requiring data scaling, and implementing a computationally efficient variant of gradient boosting [19]. It has provided satisfactory results in ML competitions and has been successfully used in other studies and domains [20,21]. With the supervised learning of XGBoost, we can efficiently select feature parameters and reduce overfitting and computational complexity. After training, a model is built that predicts the results through supervised learning, using additive training to iterate and update the objective function. The objective function of XGBoost is shown in Eqs. (8) and (9). To optimize the hyper-parameters of XGBoost, we followed previous research [22] and used the Bayesian optimization method.
where l is a loss function that estimates the difference between ŷi and yi, which is the training error. Ω is a regularization term in which T denotes the number of decision nodes and γ and λ denote hyperparameters. Adjusting λ penalizes the weights of the decision nodes to avoid overfitting.
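For reference, the regularized objective in its standard form from the XGBoost literature is written out below. It is given here on the assumption that it is what Eqs. (8) and (9) express; the symbols K (number of trees), f_k (the k-th tree), and w_j (the weight of leaf j) follow the usual XGBoost notation rather than anything stated in this paper.

```latex
\begin{align}
\mathrm{Obj} &= \sum_{i=1}^{n} l\left(\hat{y}_i, y_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right) \\
\Omega(f) &= \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}
\end{align}
```

In this form, a larger γ discourages trees with many leaves, while a larger λ shrinks the leaf weights; both act against overfitting.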

Results
The data collected in this study were mainly CTG records provided by the clinical database under the IRB program. There were two kinds of CTG data, FD and Non-FD, based on the clinical medical records. After the CTG information was retrieved, the data were given to three clinicians (whose average obstetric experience was more than 10 years), who interpreted the CTG records and classified them into categories I, II, and III following the NICHD. Then, we used the rule-based method to classify the data and calculated the ratio of FD in each category. The results of the clinicians and the rule-based method were compared to evaluate the accuracy. The Kappa values for each category, of which the average was about 0.72, are shown in Table 3. There was no statistical difference between the results of the clinicians and those of the rule-based method.
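The agreement statistic can be illustrated with a minimal Python implementation of Cohen's kappa for two raters. Whether the study used this pairwise form, and how the values from the three clinicians were averaged, is not stated, so this is only a sketch of the general calculation; the function name is hypothetical and the labels in the usage note are made up for illustration.

```python
from collections import Counter
from typing import List


def cohen_kappa(rater_a: List[str], rater_b: List[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Observed agreement: fraction of records with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

As a usage example with invented labels, a clinician reading `["I", "I", "II", "II", "III", "III"]` and an algorithm output `["I", "I", "II", "III", "III", "III"]` agree on five of six records, and the chance-corrected kappa is 0.75.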
To improve the clinical evaluation of NICHD category II, we classified category II into IIa and IIb and analyzed their correlation with the probability of FD. Category IIa means that the fetus has a higher probability of returning to the category I state, while IIb means that the fetus has a higher probability of developing FD, so that surgery must be performed immediately. This processing method used not only the results of the sequence analysis but also the features from the rule-based extraction as inputs to the XGBoost analysis. The sequence analysis mainly used the UC time as the standard when extracting the FHR segments. The FD ratio was calculated through Eq. (10):
$$\text{FD ratio} = \frac{\text{Number of FD}}{\text{Quantity in the category}} \tag{10}$$
The results from the XGBoost analysis and classification are shown in Table 4. The FD ratio of IIa is significantly lower than that of IIb (p<0.05). Although I and IIa, and IIb and III, are not statistically consistent, clinicians can still take IIa and IIb as the basis for clinical care.

Discussion
In the past 50 years, CTG has been widely used to assess fetal well-being during pregnancy and labor [23]. Although it is a non-invasive and cost-effective tool, the practice relies heavily on the subjective interpretation of health personnel [24,25]. A recent Cochrane review showed that although CTG did not significantly reduce the perinatal death rate, it was associated with halving neonatal seizure rates. There was also no difference in the incidence of cerebral palsy or cord blood acidosis [9]. Based on clinical experience, clinicians diagnose the fetal status, such as FD, from CTG changes by visual analysis, which is usually dominated by the portion near the end of the CTG recording. This is not a fully objective method, because it does not consider the whole long-term CTG signal (more than 40 min) [4]. The CTG results interpreted by clinicians were compared with those of the proposed system, and the Kappa value showed that the two are statistically consistent. We further analyzed the signals on which the clinicians and the proposed system disagreed and found that the difference is usually caused by local versus global views of the CTG signal. For example, one record was placed in category I by the clinicians' visual analysis. Nevertheless, in a global view, once its variability met LDE, the proposed system classified it not as category I but as category II. Since visual analysis of a long-term CTG signal covers more than 40 min, it is time-consuming and difficult to carry out in the clinical environment.
A relatively objective assessment based on long-term CTG analysis by the proposed system is therefore necessary for clinical evaluation. The categories IIa and IIb, produced by the XGBoost analysis, improved the information about the fetus in the original category II. The FD ratio of the original category II from the rule-based method was about 65.2%, and after classification by XGBoost analysis into categories IIa and IIb, the FD ratios were 28.8% and 71.2%, respectively. This could provide more reliable information for clinical evaluation. The sequence analysis in this study not only contained the original feature analysis of the rule-based method but also gained a new feature through the abnormal ratio of all the CTG records. However, because the definition of variable DE is not clear at present, it is difficult to define a good feature extraction method for it in our system. Therefore, more clinical trials and larger data sets are required for further analysis and verification.
The management of category II FHR patterns remains the most important and challenging issue in the clinical application of CTG. It has been estimated that up to 80% of women will have a category II FHR tracing at some point during labor. However, there is no specific guidance on the management of category II FHR tracings [26]. The goal of CTG is to identify hypoxic babies who may need additional assessments or to determine whether the fetus needs to be delivered immediately. However, continuous CTG was associated with increased caesarean sections and instrumental vaginal deliveries [9]. It is also uncertain whether intrapartum oxygen supplementation could resolve fetal hypoxia in the category II group [27]. Therefore, improving the current classification of category II FHR tracings is an unmet need. In this study, we proposed an XGBoost-based algorithm that can improve the information about the fetus in the original category II. Nevertheless, further studies are required to validate whether our method can reduce unnecessary caesarean sections or improve neonatal outcomes.

Conclusions
A combination of the rule-based method and XGBoost analysis could improve the clinical evaluation of the CTG signal. For example, while most fetal heart rate tracings with the variable deceleration pattern were located in category II, fetal acidemia only occurred in a subset of cases presenting with variable decelerations [28][29][30][31]. In future studies, the proposed method can be combined with other physiological parameters to improve the detection of FD and avoid false positive signals. Overall, the proposed method has high potential as an assistive and warning system for improving the quality of care for pregnant women while reducing the burden on medical staff.