Preterm-term birth classification using EMD-based time-domain features of single-channel electrohysterogram data

Preterm birth anticipation is a crucial task that can reduce both the rate and the complications of preterm birth. Electrohysterogram (EHG), or uterine electromyogram (EMG), data have been shown to provide useful information for preterm birth anticipation. Four distinct time-domain features (mean absolute value, average amplitude change, difference absolute standard deviation value, and log detector) that are commonly applied in EMG signal processing were investigated in this study. A single channel of EHG data was decomposed into its constituent components (i.e., into intrinsic mode functions) by using empirical mode decomposition (EMD) before the time-domain features were extracted. The time-domain features of the intrinsic mode functions of the EHG data associated with preterm and term births were applied for preterm-term birth classification by using a support vector machine with a radial basis function kernel. The preterm-term birth classifications were validated by using 10-fold cross-validation. The computational results showed that excellent preterm-term birth classification can be achieved by using single-channel EHG data. They further suggested that the best overall performance was obtained when thirteen (out of sixteen) EMD-based time-domain features were applied. The best accuracy, sensitivity, specificity, and F1-score achieved were 0.9382, 0.9130, 0.9634, and 0.9366, respectively.


Introduction
Preterm birth (or premature birth) is defined by the World Health Organization (WHO) [1] as all births occurring before 37 completed weeks of gestation [2]. Preterm birth is a major health concern, as it gives the baby less time to develop in the womb [3]; additionally, preterm birth is a major cause of death for babies [2]. Preterm birth can cause both short-term and long-term complications [3]. Solutions for reducing the incidence of preterm birth include improving healthcare before, between, and during pregnancies, as well as providing effective treatments and interventions and identifying women at risk for preterm delivery [4,5]. Therefore, preterm birth anticipation is a crucial task that is useful in reducing preterm birth and its complications. The electrohysterogram (EHG), or uterine electromyogram (EMG), which measures the electrical activity of uterine muscles [6], has become a promising noninvasive tool for assessing and monitoring uterine activity [6-8]. A number of computational tools and techniques have been applied for classifying and predicting preterm birth, and several of these techniques are summarized in Table 1. A variety of time-domain and frequency-domain features, as well as quantitative features obtained by using concepts of nonlinear dynamics and chaos theory (including the Lyapunov exponent, correlation dimension, and sample entropy extracted from EHG signals), have been examined. In previous studies, the root mean square, the peak frequency, the median frequency, and the sample entropy are the most common quantitative features of EHG signals that have been extracted and applied [6,9-13]. Common time-domain features of EHG signals and the root mean square [14,15] are also typically investigated. Furthermore, Hemthanon and [16] have also examined the correlation between common time-domain features of EHG signals and gestational age.
A wide range of performances in preterm birth classification and prediction have been reported with various metrics, including accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. Preterm birth classification and prediction remain challenging, and informative features related to preterm birth continue to be explored to enhance their performance. In this study, the capability of four distinct time-domain features (the mean absolute value, the average amplitude change, the difference absolute standard deviation value, and the log detector) for preterm-term birth classification is further examined, as time-domain features are common and useful quantitative features in various applications of EMG signal processing and analysis. Previous studies, which have reported excellent performance on preterm birth classification and prediction, have generally examined all channels of EHG data. In contrast, only a single channel of EHG data was examined and independently applied for preterm birth classification in this study. Before the time-domain features were extracted and applied for preterm birth classification, empirical mode decomposition (EMD) was applied to decompose single-channel EHG data into a finite set of oscillatory components that are known as intrinsic mode functions (IMFs), which are associated with sequentially decreasing frequencies [22-24].

EHG dataset
The electrohysterogram (EHG) dataset that was examined in this study was obtained from EHG recordings collected at the Department of Obstetrics and Gynecology, Medical Centre Ljubljana, Ljubljana [6,25] from 1997 until 2006. Three hundred EHG recordings are publicly available on PhysioNet at https://www.physionet.org/content/tpehgdb/ [25]. The EHG recordings were recorded from a general population of pregnant women during regular check-ups at either around the 22nd week of gestation or around the 32nd week of gestation [6,25]. The three channels of the EHG recordings, which were obtained from four electrodes placed around the navel [6,25], are referred to as s1, s2, and s3. The EHG data were acquired by using a sampling frequency of 20 Hz.
The EHG dataset was classified into four classes (referred to as PE, PL, TE, and TL), which corresponded to two subject groups and two periods of recordings that are summarized in Table 2. The two subject groups were a group of preterm births (P), associated with delivery either before or on the 37th week of gestation, and a group of term births (T), associated with delivery after the 37th week of gestation [6,25]. The two periods of recordings were an early period (E), associated with recordings acquired before the 26th week of gestation, and a late period (L), associated with recordings acquired either during or after the 26th week of gestation.
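The class assignment described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the dataset's own labelling code; the function name `label_recording` and the assumption that both weeks are given as whole gestational-week numbers are hypothetical.

```python
def label_recording(delivery_week: int, recording_week: int) -> str:
    """Return 'PE', 'PL', 'TE', or 'TL' for one EHG recording.

    Assumption: both arguments are whole gestational-week numbers.
    """
    # Preterm (P): delivery before or on the 37th week; term (T): after it.
    group = "P" if delivery_week <= 37 else "T"
    # Early (E): recorded before the 26th week; late (L): during or after it.
    period = "E" if recording_week < 26 else "L"
    return group + period


if __name__ == "__main__":
    print(label_recording(35, 23))  # preterm delivery, early recording
    print(label_recording(40, 31))  # term delivery, late recording
```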

Data processing
EHG signals corresponding to each channel of the original EHG recordings were segmented with a length of 14,400 samples (equivalent to 720 s) from the end of the EHG recordings. The EHG segments were further divided into 3600-sample epochs with a 50% overlap. Accordingly, there were twelve EHG epochs for each EHG segment. Each EHG epoch was decomposed into four IMFs by using empirical mode decomposition [22,24,26]. The time-domain features that are commonly applied in EMG signal processing and analysis were subsequently extracted from the intrinsic mode functions of the EHG epochs. The mean absolute value (MAV), the average amplitude change (AAC), the difference absolute standard deviation value (DASDV), and the log detector constituted the four distinct time-domain features that were applied for preterm-term birth classification. Let x[n], n = 1, ..., N, denote an intrinsic mode function of length N from which the time-domain features are extracted. The time-domain features of each EHG epoch, referred to as F1, F2, F3, and F4 (the MAV, the AAC, the DASDV, and the log detector, respectively), are given as follows, using the definitions commonly applied in EMG signal processing:

$$F_1 = \frac{1}{N}\sum_{n=1}^{N}\left|x[n]\right| \tag{1}$$

$$F_2 = \frac{1}{N}\sum_{n=1}^{N-1}\left|x[n+1]-x[n]\right| \tag{2}$$

$$F_3 = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N-1}\left(x[n+1]-x[n]\right)^2} \tag{3}$$

$$F_4 = \exp\left(\frac{1}{N}\sum_{n=1}^{N}\log\left|x[n]\right|\right) \tag{4}$$
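As a minimal sketch, the four time-domain features can be computed as below, assuming the definitions commonly used in EMG signal processing (the study itself used MATLAB, and its exact implementation is not shown here). The function names, the small offset `eps` that guards the logarithm against zero-valued samples, and the ordering of the features within the vector are assumptions made for illustration. The intrinsic mode functions are assumed to have been computed beforehand by an EMD routine.

```python
import math
from typing import List, Sequence


def mean_absolute_value(x: Sequence[float]) -> float:
    """MAV: mean of the absolute sample values."""
    return sum(abs(v) for v in x) / len(x)


def average_amplitude_change(x: Sequence[float]) -> float:
    """AAC: sum of absolute first differences divided by N."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / len(x)


def difference_absolute_std(x: Sequence[float]) -> float:
    """DASDV: root of the mean squared first difference (N - 1 terms)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (len(x) - 1))


def log_detector(x: Sequence[float], eps: float = 1e-12) -> float:
    """LOG: exponential of the mean log-magnitude.

    eps is an assumed guard against log(0); it is not part of the definition.
    """
    return math.exp(sum(math.log(abs(v) + eps) for v in x) / len(x))


def feature_vector(imfs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate [MAV, AAC, DASDV, LOG] for every IMF of one epoch."""
    features: List[float] = []
    for imf in imfs:
        features.extend([
            mean_absolute_value(imf),
            average_amplitude_change(imf),
            difference_absolute_std(imf),
            log_detector(imf),
        ])
    return features
```

With four intrinsic mode functions per epoch, `feature_vector` returns sixteen values, which matches the sixteen EMD-based time-domain features examined in this study.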

Classification and evaluation
The feature vector applied for preterm-term birth classification was composed of time-domain features of the intrinsic mode functions of the EHG epochs. The number of time-domain features used to form the feature vector ranged from 2 to 16, where 16 corresponds to all of the time-domain features (four features for each of the four intrinsic mode functions). The time-domain features were selected according to their ranking scores, which were obtained from the p-value of the χ² test, i.e., −log(p). Two preterm-term birth classifications were examined in this study: (1) the PE-TE classification and (2) the PL-TL classification. A support vector machine (SVM) was applied as a binary classifier for classifying between the group of preterm births (P) (either the PE or the PL class) and the group of term births (T) (either the TE or the TL class). SVM is a supervised learning algorithm that has been widely applied and is considered to be one of the most applicable classifiers [27]. SVM is adaptable to problems via the proper selection of a kernel [28]; thus, it is flexible for both linear and nonlinear discriminatory analyses [29]. Furthermore, SVM has been reported to outperform conventional classifiers when the amount of training data is small [28]. In the computational experiments, the radial basis function (RBF) kernel was used to train the SVM classifiers.
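The ranking step and the RBF kernel can be illustrated with a short sketch. It assumes that the p-values of the univariate test are already available and strictly positive, and it uses the standard form of the RBF kernel, exp(−γ‖u − v‖²). The function names and the parameter `gamma` are hypothetical, and the kernel scale actually used in the study is not stated.

```python
import math
from typing import List, Sequence


def rank_features(p_values: Sequence[float]) -> List[int]:
    """Return feature indices ordered from highest to lowest score -log(p).

    Assumption: every p-value is strictly positive.
    """
    scores = [-math.log(p) for p in p_values]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def select_top_k(p_values: Sequence[float], k: int) -> List[int]:
    """Indices of the k best-ranked features (e.g. k = 13 out of 16)."""
    return rank_features(p_values)[:k]


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float = 1.0) -> float:
    """Standard RBF kernel exp(-gamma * ||u - v||^2); gamma is an assumed parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)
```

A smaller p-value gives a larger score, so the features most strongly associated with the class label are ranked first.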
In addition, standardization of the features was applied in the SVM models. For the evaluation of the preterm-term birth classifications, 10-fold cross-validation was applied. The performance of the preterm-term birth classifications was evaluated by using four conventional classification performance measures, namely accuracy (Ac), sensitivity (Se), specificity (Sp), and F1-score, which are, respectively, given by

$$Ac = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$Se = \frac{TP}{TP + FN} \tag{6}$$

$$Sp = \frac{TN}{TN + FP} \tag{7}$$

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN} \tag{8}$$

where TP, TN, FP, and FN denote the number of true positives, number of true negatives, number of false positives, and number of false negatives, respectively.
It should be noted that other common supervised learning classifiers, including decision trees, k-nearest neighbors, feedforward neural networks, and naive Bayes classifiers, were also applied. In general, the best performance on preterm birth classification was obtained when support vector machines were applied as the preterm birth classifier. Therefore, only the computational results that were obtained by using support vector machines are presented. All of the computational experiments were performed by using MATLAB R2020a running on a MacBook Pro with a 2.6 GHz Dual-Core Intel Core i5 processor and 8 GB memory.
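The evaluation procedure can be sketched as follows, using the conventional definitions of the four measures in terms of the confusion counts. Treating the preterm group as the positive class is an assumption, as are the function names. The fold construction shown here is a plain contiguous split without shuffling or stratification, which may differ from the partitioning used in the study.

```python
from typing import Dict, List, Tuple


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1-score from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs.

    Folds are contiguous and differ in size by at most one sample.
    """
    folds = []
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

In each of the ten rounds, the classifier would be trained on the `train` indices and scored on the held-out `test` indices, and the confusion counts accumulated over all rounds would be passed to `classification_metrics`.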

Preterm-term birth classifications using early period EHG data
The accuracy, sensitivity, specificity, and F1-score of the preterm-term birth classification obtained by using channels s1, s2, and s3 of the early period EHG data are shown in Figs. 1a-d, 2a-d, and 3a-d, respectively. The accuracy, sensitivity, specificity, and F1-score of the preterm birth classification ranged from 0.7002 to 0.9382, 0.5515 to 0.9153, 0.6774 to 0.9634, and 0.6479 to 0.9366, respectively. The performance of the preterm-term birth classification on all of the metrics tended to improve as the number of time-domain features of the intrinsic mode functions of the EHG epochs increased. The best performances that were obtained from the preterm-term birth classifications using the time-domain features of the intrinsic mode functions of the early period EHG data are summarized for each performance metric in Table 3. All of the performance metrics (the accuracy, sensitivity, specificity, and F1-score) of the preterm-term birth classification obtained by using channel s1 of the early period EHG data are higher than those obtained by using either channel s2 or s3. The best performance of the preterm-term birth classifications was generally achieved by using thirteen time-domain features of the intrinsic mode functions of channel s1 of the early period EHG data, with which the best accuracy, specificity, and F1-score were obtained. The accuracy, sensitivity, specificity, and F1-score of the preterm-term birth classification using thirteen time-domain features of the intrinsic mode functions of channel s1 of the early period EHG data were 0.9382, 0.9130, 0.9634, and 0.9366, respectively.

Preterm-term birth classifications using late period EHG data
The accuracy, sensitivity, specificity, and F1-score of the preterm-term birth classification obtained by using channels s1, s2, and s3 of the late period EHG data are shown in Figs. 4a-d, 5a-d, and 6a-d, respectively. The accuracy, sensitivity, specificity, and F1-score of the preterm birth classification ranged from 0.6842 to 0.8970, 0.5858 to 0.9176, 0.6728 to 0.8993, and 0.6498 to 0.8980, respectively. The trend of increasing accuracy, sensitivity, specificity, and F1-score of the preterm-term birth classifications as the number of time-domain features of the intrinsic mode functions of the EHG epochs increased was clearly evident for channels s1 and s3. Table 4 summarizes the best performances obtained from the preterm-term birth classifications by using the time-domain features of the intrinsic mode functions of the late period EHG data for each performance metric. Likewise, higher values of accuracy, sensitivity, specificity, and F1-score of the preterm-term birth classifications using late period EHG data were obtained for channel s1 than for channels s2 and s3. The best accuracy and F1-score of the preterm-term birth classifications (0.8970 and 0.8980, respectively) were obtained by using thirteen time-domain features of the intrinsic mode functions of channel s1 of the late period EHG data.

Discussion
The computational results showed that the performance of preterm birth classification achieved by using channel s1 of the EHG data, which records the difference in electrical activity of uterine muscles above the navel [6], is superior to that achieved by using channel s2 or s3 of the EHG data. In general, the best performance on preterm birth classification was obtained with the first thirteen ranked time-domain features of the intrinsic mode functions of single-channel EHG data. The accuracy, sensitivity, specificity, and F1-score of preterm-term birth classification using the first thirteen ranked time-domain features of the intrinsic mode functions of channel s1 of the EHG data that were acquired in the early period were 0.9382, 0.9130, 0.9634, and 0.9366, respectively, whereas the accuracy, sensitivity, specificity, and F1-score of preterm-term birth classification using the first thirteen ranked time-domain features of the intrinsic mode functions of channel s1 of the EHG data that were acquired in the late period were 0.8970, 0.9062, 0.8879, and 0.8980, respectively.
Furthermore, the performance on preterm birth classification decreased slightly when no feature selection was applied. The accuracy, sensitivity, specificity, and F1-score of preterm-term birth classification using all sixteen time-domain features of the intrinsic mode functions of channel s1 of the EHG data that were acquired in the early period were 0.9153, 0.8810, 0.9497, and 0.9123, respectively, whereas the accuracy, sensitivity, specificity, and F1-score of preterm-term birth classification using all sixteen time-domain features of the intrinsic mode functions of channel s1 of the EHG data that were acquired in the late period were 0.8696, 0.9176, 0.8215, and 0.8755, respectively. In addition, better performance was obtained from the preterm-term birth classification by using single-channel EHG data that were acquired in the early period (i.e., before the 26th week of gestation) than by using data that were acquired in the late period (i.e., during or after the 26th week of gestation).
The performance on preterm-term birth classification and discrimination of notable previous studies is summarized in Table 1 and compared in Table 5. In general, the performance on preterm birth classification has been only partially characterized in previous studies, as different studies have reported different performance metrics. The reported values of accuracy, sensitivity, and specificity of preterm birth classification have ranged from 0.8800 to 0.9700, 0.8800 to 0.9508, and 0.8400 to 0.9733, respectively. The performance on preterm birth classification that was obtained in this study by using four time-domain features of the intrinsic mode functions of a single channel of EHG data is above the average of, or comparable to, the performance that has been reported in previous studies. It should be noted that the best overall performance on preterm birth classification was reported by Acharya et al. [17], with an accuracy of 0.9625, sensitivity of 0.9508, and specificity of 0.9733, wherein eight quantitative features of the EHG components (fractal dimension, fuzzy entropy, interquartile range, mean absolute deviation, mean energy, mean Teager-Kaiser energy, sample entropy, and standard deviation), as well as the adaptive synthetic sampling approach (ADASYN), were applied.

Conclusion
The mean absolute value, the average amplitude change, the difference absolute standard deviation value, and the log detector, which are commonly applied for EMG signal processing and analysis, were extracted from the first four intrinsic mode functions of single-channel EHG data and applied for preterm birth classification. The computational results verified that excellent performance on preterm birth classification can be achieved by using only those four time-domain features, and that single-channel EHG data can also yield excellent performance on preterm birth classification. The performance on preterm birth classification was comparable to that obtained by using more complicated quantitative features and computational techniques. The first thirteen time-domain features of the intrinsic mode functions of single-channel EHG data, as ranked by using the p-value of the χ² test, were the quantitative features most favorable for preterm birth classification. Furthermore, the computational results implied that the best position for acquiring EHG data is the area above the navel, as the performance on preterm birth classification that was obtained by using channel s1 of the EHG data was generally better than that obtained by using either channel s2 or s3. Superior performance on preterm birth classification was obtained by using the EHG data that were acquired in the early period of pregnancy, compared to the data that were acquired in the late period of pregnancy. The best performance on preterm birth classification was achieved with accuracy, sensitivity, specificity, and F1-score values of 0.9382, 0.9130, 0.9634, and 0.9366, respectively, by using thirteen time-domain features of the intrinsic mode functions of channel s1 of the EHG data that were acquired in the early period.

Funding
This study was funded by a TRF Research Career Development Grant jointly funded by the Thailand Research Fund (TRF) and Ubon Ratchathani University (Contract Number RSA6180041).