An Automatic Classification Method of Sleep Apnea Events Based on EEG Frequency Sub-band Division

Sleep apnea is a sleep disorder with a high prevalence. It manifests as an abnormal cessation of breathing during sleep and is highly dangerous to human health. The purpose of this research is to find a simple and effective feature extraction method that is able to distinguish obstructive apnea events, central apnea events, and normal breathing events. Unlike conventional methods, the method presented in this study used an Infinite Impulse Response Butterworth band-pass filter to divide the electroencephalogram (EEG) signal into 5, 7, 9 or 11 frequency sub-bands and then used the Welch method to extract the power features of these sub-band signals, which were subsequently used as classifier input. Random forest, K-nearest neighbors and bagging classifiers were investigated. The results showed that, among the frequency sub-band division schemes considered, the features extracted from the EEG signal divided into 11 frequency sub-bands were the most conducive to the classification of sleep apnea events. The random forest classifier achieved the highest average accuracy, macro F1 and kappa coefficient across the three types of events: 90.43%, 90.38% and 0.88, respectively. Compared with existing methods, the method used in the present study has higher classification performance.


Introduction
Sleep apnea (SA) is a common sleep disorder characterized by a significant decrease in, or even complete interruption of, breathing airflow for more than 10 seconds during sleep [1]. SA has a high morbidity and its incidence gradually increases with age. It is estimated that, as of 2019, about 936 million adults aged 30-69 worldwide had some degree of sleep apnea syndrome (SAS) [2]. In addition to reducing quality of life, causing daytime fatigue and impairing cognitive function and memory, sleep apnea can also lead to a variety of life-threatening cardiovascular diseases [3,4]. As a common and harmful systemic sleep disease, SA has become one of the main research topics in the field of sleep medicine.
Obstructive sleep apnea (OSA), central sleep apnea (CSA) and mixed sleep apnea (MSA) are the three types of SA [5]. OSA events are manifested as the disappearance of oral-nasal airflow while abdominal and thoracic effort is still exerted; OSA is the sleep-related breathing disorder with the highest incidence and the most harmful effects. CSA events are characterized by the simultaneous disappearance of oral-nasal airflow and thoracoabdominal respiration. MSA events are a combination of the above two events, starting with a central apnea event followed by an obstructive apnea event, or vice versa [6]. The traditional SA detection method uses polysomnography (PSG) as the gold standard, but its operation is complex: more than ten sensors need to be connected to the patient to monitor sleep parameters such as the electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), electrooculogram (EOG), respiratory effort (both abdominal and thoracic) and blood oxygen saturation throughout the entire night of sleep [7]. The data collected overnight are then analyzed comprehensively by professional technicians to reach a diagnosis. In addition, this detection process can easily cause discomfort to patients and may also lead to subjective errors [8]. Therefore, many researchers are committed to obtaining SA information from a single physiological signal, or a small number of them, for simple and effective automatic SA classification, such as photoplethysmography (PPG) [9], SpO2 [10,11], ECG [12,13], the thoracoabdominal signal [14], and EMG, ECG and EEG [15].
In recent years, the use of EEG as the single physiological signal source for diagnosing sleep apnea has attracted increasing attention. The reasoning behind the choice of EEG is as follows. First, EEG is widely used in sleep staging, which can determine the sleep time of patients [16], and identifying sleep apnea events based on EEG is relatively accurate compared with PPG, SpO2, ECG and other signals. Second, ECG and PPG monitoring become less useful in patients with irregular breathing or an obviously abnormal heart rate, whereas EEG can still accurately determine the sleep state and sleep apnea events in this case, which has great clinical implications. Finally, using EEG signals instead of several physiological signals to monitor sleep apnea events can also save acquisition and computing costs.
So far, methods for the automatic classification of sleep apnea based on EEG signals have been widely studied [17][18][19][20][21][22][23][24]. Zhou et al. utilized detrended fluctuation analysis (DFA) to calculate the 30-min EEG scaling index that quantifies the power-law correlation, and then used a support vector machine (SVM) classifier to detect SAS [17]. Wafaa et al. extracted the energy and variance of the delta, theta, alpha, beta, and gamma sub-bands as features and used them as input to SVM, Artificial Neural Network, Linear Discriminant Analysis and Naive Bayes classifiers to distinguish between OSA patients and normal controls [18]. Similarly, Saha et al. used sub-band signals instead of full-frequency EEG signals to extract features. They used the entropy [19] and the energy ratio [20] of the sub-band signals as the characteristics representing the random feature difference between apnea and non-apnea events, and KNN for classification. In addition, Ahmed et al. considered the energy and the geometric and arithmetic means of the beta band to solve the same problem [21]. Shahnaz et al. proposed a method that extracts the delta sub-band power ratio as a feature to classify apnea and non-apnea events [22]. Taran et al. proposed a particle swarm optimisation-based Hermite decomposition algorithm to decompose EEG signals for SA event recognition [23]. Bhattacharjee et al. used Rician model parameters and statistical indicators to represent the different characteristics of apnea and non-apnea events [24]. However, these methods focus on the detection of SAS or apnea events and are not concerned with identifying the specific types of SA events. Therefore, it is necessary to develop an automatic method for classifying apnea event types based on EEG signals in order to determine the specific type of SA in patients.
The primary contribution of this paper is a simple and effective method for classifying sleep apnea events based on EEG frequency band division. Because each type of sleep apnea requires a different treatment, correct diagnosis is very important. At present, the same treatment methods are used in clinical practice for MSA and OSA; therefore, in this study, MSA events were treated as OSA events. The power features of the EEG sub-band signals were then extracted to distinguish between normal breathing events, OSA events and CSA events.

Results
Results were obtained through 10-fold cross-validation: the data set was randomly divided into 10 equal subsets, and in each fold one subset served as the test data while the remaining nine subsets were used for training. The process was repeated 10 times with a different test subset each time, and the average of the 10 results was taken as the final result. The performance of the proposed method is evaluated in this section by comparing the two power features, absolute power (AP) and relative power (RP), and the three classifiers.
The average AP of normal breathing (NB) events was the largest, followed by CSA and OSA (Fig. 1). For each type of breathing event, the majority of the power was concentrated in the low-frequency part. By comparison, Fig. 2 shows that the RP relationships among the three types of respiratory events were more complex, with no obvious ordering.
For the performance analysis of the proposed method, the influence of the two powers was compared first. The AP and RP of each sub-band signal were extracted, and the random forest (RF) classifier was used to evaluate the performance of these two powers. The average accuracy of AP was higher than that of RP in all cases (Fig. 3). Therefore, AP was chosen as the feature for the rest of the analysis. The impact of the number of frequency sub-bands on the classification results when AP is used as the feature is presented in Fig. 4. For a given classifier, the average accuracy over breathing events increased gradually with the number of frequency sub-bands. Therefore, the APs of the 11 frequency sub-bands were used in the rest of this study. It can also be observed from Fig. 4 that the classification results of RF were superior to those of the other classifiers in almost all cases.
The classification accuracy of the RF, KNN and bagging classifiers for each type of breathing event is given in Fig. 5. The RF classifier showed better classification results for each type of event than the other two classifiers (KNN and bagging). Specifically, its classification accuracy was 88.12% for OSA events, 85.88% for CSA events and 97.33% for NB events. The 10-fold cross-validation confusion matrices of the RF, KNN and bagging classifiers are presented in Table 1. According to the confusion matrix, the RF classifier correctly classified 221 events out of 244 test samples. In contrast, the bagging and KNN classifiers correctly classified 219 and 204 events, respectively. Table 2 provides the macro F1 and kappa coefficients of the three classification models used in this study to evaluate the classification performance of the proposed method. It can be seen from this table that the classification performance of RF is superior to that of the other two classifier models.
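The 10-fold protocol described at the start of this section can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB implementation used in the study; the function names and the fixed seed are assumptions introduced here, and only the index bookkeeping is shown, not the classifier training.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index starting at offset j.
    return [indices[j::k] for j in range(k)]

def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists, using each fold once as test data."""
    folds = k_fold_indices(n_samples, k, seed)
    for j, test in enumerate(folds):
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train, test

# Balanced data set: 812 events for each of the three classes.
n_events = 3 * 812
fold_sizes = [len(test) for _, test in train_test_splits(n_events)]
print(fold_sizes)
```

Because 2436 events do not divide evenly by 10, the folds hold 243 or 244 events each, which is in line with the 244 test samples quoted for the confusion matrix.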

Discussion
This study aims to develop a simple and effective automatic classification algorithm for SA events in order to reduce detection and medical costs. The EEG signal was therefore selected as the research object after considering the influence of SAS on various physiological signals and their corresponding characteristics. For the processing of EEG signals, several previous studies have divided EEG signals into delta, theta, alpha, beta, and gamma frequency sub-bands and then extracted features from all or some of these sub-bands for sleep apnea classification [18][19][20][21][22]. Compared with previous studies, the main contribution of the proposed method is that the features of each epoch are extracted from finer variants of the above EEG frequency sub-bands. The features of the five frequency sub-bands performed worse than those of their variants (Fig. 3 and 4). Moreover, increasing the number of frequency sub-bands increased the classification performance, which was highest with 11 frequency sub-bands. Specifically, compared with the classification results obtained with five frequency sub-bands, the average error with 11 frequency sub-bands was reduced by 2.87 percentage points (average accuracy increased from 87.56% to 90.43%, a relative error decrease of 23.14%). In addition, since only the power features of the frequency sub-band signals were extracted, the number of features is still acceptable even when the number of sub-bands is the largest (i.e., 11 sub-bands). Therefore, feature selection is unnecessary, which simplifies the entire process.
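The relationship between the absolute and the relative error reduction quoted above can be checked from the rounded accuracies. The symbols $e_5$ and $e_{11}$ (average error with 5 and 11 sub-bands) are introduced here for illustration only:

```latex
\begin{aligned}
e_5 &= 100\% - 87.56\% = 12.44\%, \qquad e_{11} = 100\% - 90.43\% = 9.57\%,\\
e_5 - e_{11} &= 2.87 \text{ percentage points},\\
\frac{e_5 - e_{11}}{e_5} &= \frac{2.87}{12.44} \approx 23.1\%.
\end{aligned}
```

With the rounded accuracies this gives roughly 23.1%, in line with the reported 23.14%; the small difference is presumably due to rounding of the underlying accuracies.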
The detection and classification of SA events can help doctors diagnose a patient's type of SA quickly and accurately, so that a corresponding treatment plan can be arranged. In addition, the use of EEG in the identification and classification of SA is of high clinical significance. However, few recent studies have used only EEG signals to classify sleep apnea events. Monika et al. combined the discrete wavelet transform (DWT) with the Hilbert transform (HT) to analyze EEG signals. The features used were mainly related to the instantaneous attributes of the HT (frequency, amplitude and amplitude-weighted frequency), and Artificial Neural Networks were then used to classify OSA events, CSA events and normal breathing events. The final average classification accuracy was 77.3%, with accuracies of 86.36% for OSA events, 74.24% for CSA events and 71.21% for normal breathing events [25]. In addition, the same method was used to find differences in the sleep apnea information contained in EEG signals from the C3-A2 and C4-A1 channels; using the data from the two channels simultaneously was more conducive to the automatic detection and classification of SA [26]. The proposed method outperformed the method of Monika et al. In our previous work, the sample entropy and variance of the five EEG sub-band signals (delta, theta, alpha, beta, and gamma) were extracted, and the optimal feature subset selected by neighborhood component analysis (NCA) was then applied to RF to classify the three types of events. An average accuracy of 88.99% was reported, with accuracies of 80.43% for OSA events, 84.85% for CSA events and 95.24% for normal breathing events [27]. In contrast, the average accuracy of the method used in this study increased by 1.44 percentage points and, most notably, the classification accuracy of OSA events increased by 7.69 percentage points.
Although this research has certain advantages and effectiveness, the following limitations need to be resolved in future research. Signal decomposition is one of the methods of effective information extraction from EEG signals. From the perspective of the frequency domain, the EEG signal contains several different frequency ranges, which are used to extract representative features, replacing the entire EEG signal for analysis [16]. The proposed method in this study decomposed the spectrum of EEG signals, but in a strict sense, time information is ignored with this decomposition method. In future research, both time domain and frequency domain information will be considered to extract more effective features and improve the classification accuracy of sleep apnea events.

Method
In the proposed method, the respiratory events of the subjects were detected frame by frame to distinguish normal breathing events, OSA events and CSA events. Fig. 6 shows the main steps of the method. First, the EEG signals of each epoch were preprocessed and four different frequency division schemes were considered. In the feature extraction phase, the power features of each sub-band were calculated, and the obtained features were then used as input to the classifiers for the classification of breathing events. The algorithm used in this research was implemented in MATLAB R2017b (The MathWorks, Natick, Massachusetts, USA). The EEG signals of all patients were collected from the C3-A2 and C4-A1 channels, the sampling rate was 100 Hz and the epoch length exceeded 10 s. All normal breathing and apnea events were derived from the EEG of the patients during sleep and had been marked by sleep experts. Among them, 2030 OSA events (including 801 MSA events) and 812 CSA events were identified. In order to obtain a data set balanced in terms of the number of samples per class, 812 OSA events and 812 normal breathing (NB) events were randomly selected and combined with the 812 CSA events to form the final data set.
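The class balancing described above amounts to randomly downsampling the larger classes to the size of the smallest one. A minimal Python sketch follows; the function name, the seed and the number of available NB events (3000) are hypothetical, since the text does not state how many NB events were available.

```python
import random

def balance_classes(events_by_class, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    n_min = min(len(events) for events in events_by_class.values())
    return {
        label: rng.sample(events, n_min)
        for label, events in events_by_class.items()
    }

# Event IDs stand in for the labelled EEG epochs.
events = {
    "OSA": list(range(2030)),   # includes the MSA events, as in the text
    "CSA": list(range(812)),
    "NB": list(range(3000)),    # hypothetical count, not given in the text
}
balanced = balance_classes(events)
print({label: len(ids) for label, ids in balanced.items()})
```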
Ethics approval and consent to participate. This study was approved by the Ethics Committee of Tianjin Chest Hospital and was conducted in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.

Conflict of Interest.
All authors declare that they have no conflict of interest.
Data preprocessing. Two tasks were performed in this part: the noise and the frequency bands outside the interest of this study were filtered out of the EEG signal, and the EEG signal was divided into sub-bands. An Infinite Impulse Response Butterworth filter was used to process the data. Several studies on the division of EEG frequency bands have been performed [28][29][30][31][32]. In addition to the common division into five sub-bands (delta, theta, alpha, beta and gamma), according to previous studies the alpha band can be divided into two parts (8-10 Hz and 10-13 Hz) [28], the beta band into two parts (13-18 Hz and 18-30 Hz) [29][30] or three parts (13-18 Hz, 18-25 Hz and 25-30 Hz) [31], and the gamma band into two parts (30-40 Hz and 40-49.5 Hz) [30] or four parts (30-36 Hz, 36-40 Hz, 40-46 Hz and 46-49.5 Hz) [32]. In this study, different frequency partitioning schemes based on the above divisions were applied to the EEG signals, yielding 5, 7, 9 and 11 frequency sub-bands, respectively. Table 3 shows the number and frequency range of the EEG sub-band signals.
Feature extraction. All the features used in this study were extracted in the frequency domain. Specifically, the Welch method was used to calculate the power of the EEG signals [33,34]. For the parameter setting, a rectangular window function with 25% overlap between segments was selected to calculate the power spectral density (PSD). The absolute power (AP) and relative power (RP) of each sub-band were then calculated from the PSD: AP was obtained by integrating the PSD, and RP was obtained by dividing the absolute power of the sub-band signal by the total spectral power [35]. The performance of these two power features is evaluated in the Results section.
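The AP and RP computation can be illustrated with a small, dependency-free Python sketch. It is not the authors' MATLAB code, and it makes a simplification that should be stated plainly: instead of band-pass filtering the epoch with a Butterworth filter and applying the Welch method to each sub-band signal, it computes a single Welch PSD (rectangular window, 25% overlap) and integrates it over each band. The delta and theta edges (0.5-4 Hz and 4-8 Hz), the 2 s segment length and all function names are assumptions introduced here.

```python
import cmath
import math

FS = 100.0  # sampling rate in Hz, as stated for the recordings

# Eleven sub-bands (Hz). The alpha, beta and gamma splits follow the text;
# the delta and theta edges are assumed conventional values.
BANDS_11 = [(0.5, 4), (4, 8), (8, 10), (10, 13), (13, 18), (18, 25),
            (25, 30), (30, 36), (36, 40), (40, 46), (46, 49.5)]

def welch_psd(x, fs, nperseg=200, overlap=0.25):
    """One-sided Welch PSD: rectangular window, fractional segment overlap.

    Assumes an even nperseg so that the last bin is the Nyquist frequency.
    """
    step = int(nperseg * (1 - overlap))
    starts = range(0, len(x) - nperseg + 1, step)
    n_bins = nperseg // 2 + 1
    psd = [0.0] * n_bins
    for s in starts:
        seg = x[s:s + nperseg]
        for k in range(n_bins):
            # Naive DFT of the segment at bin k (adequate for short epochs).
            c = sum(v * cmath.exp(-2j * math.pi * k * n / nperseg)
                    for n, v in enumerate(seg))
            # Double all bins except DC and Nyquist for a one-sided PSD.
            scale = 1.0 if k in (0, nperseg // 2) else 2.0
            psd[k] += scale * abs(c) ** 2 / (fs * nperseg)
    freqs = [k * fs / nperseg for k in range(n_bins)]
    return freqs, [p / len(starts) for p in psd]

def band_power(freqs, psd, lo, hi):
    """Absolute power: PSD bins with lo <= f < hi, summed times bin width."""
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df

def absolute_and_relative_power(x, fs=FS, bands=BANDS_11):
    """Return (AP, RP) lists with one entry per sub-band."""
    freqs, psd = welch_psd(x, fs)
    ap = [band_power(freqs, psd, lo, hi) for lo, hi in bands]
    total = band_power(freqs, psd, bands[0][0], bands[-1][1])
    rp = [p / total for p in ap]
    return ap, rp

# Synthetic 10 s epoch: a unit-amplitude 10 Hz sine (not real EEG).
epoch = [math.sin(2 * math.pi * 10 * n / FS) for n in range(1000)]
ap, rp = absolute_and_relative_power(epoch)
print([round(p, 3) for p in ap])
```

For the synthetic 10 Hz tone, essentially all of the power falls in the 10-13 Hz band, and the RP values sum to one because the bands tile the 0.5-49.5 Hz range without gaps.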
Classifiers. Three widely applicable classifiers, namely RF, KNN and the bagging classifier, were evaluated in this study in terms of performance. RF is a classifier composed of multiple decision trees. Its basic principle is the random selection of feature values, which ensures the independence of each decision tree; the prediction accuracy is then enhanced by combining these unrelated decision tree models [13]. The voting results from all decision trees determine the final output of RF (the category with the most votes is designated as the overall classification output) [36,37]. In addition, RF classification has high accuracy and can run effectively on large data sets [38]. Studies have shown that increasing the number of trees does not necessarily improve the classification performance significantly [39], and the proposed method achieved the best results with 450 trees.
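The voting step of RF can be sketched in a few lines of Python. This only illustrates how per-tree predictions are combined; the tree training itself is omitted, and the function names and the toy predictions are assumptions introduced here.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the label predicted by the most trees.

    Ties are broken in favour of the label encountered first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_predict(per_tree_labels):
    """Combine predictions, where per_tree_labels[t][i] is tree t's label for sample i."""
    return [majority_vote(votes) for votes in zip(*per_tree_labels)]

# Toy predictions of three trees for three events (not real results).
per_tree = [
    ["OSA", "CSA", "NB"],
    ["OSA", "NB", "NB"],
    ["CSA", "CSA", "NB"],
]
forest_output = forest_predict(per_tree)
print(forest_output)
```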
The KNN algorithm is a supervised classification method in which new samples are classified by measuring the distance between different feature values. The principle is that if the majority of the k nearest samples of a sample in the feature space belong to a certain category, the sample is also classified into this category [40]. The algorithm has two important parameters: the distance measure and the number of nearest neighbors. In this study, the Euclidean distance was selected to determine the distance between neighbors, and the number of nearest neighbors k was equal to 5 in all experiments.
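The KNN rule with Euclidean distance and k = 5 can be written compactly. The Python sketch below uses made-up two-dimensional feature vectors purely for illustration; in the study the feature vector would hold the sub-band powers, and the function name is an assumption.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Classify `query` by the majority label among its k nearest samples."""
    # Sort (distance, label) pairs by Euclidean distance to the query.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy training set: two well-separated clusters (not real EEG features).
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
train_y = ["NB"] * 5 + ["OSA"] * 5

print(knn_predict(train_x, train_y, (0.2, 0.3)))
print(knn_predict(train_x, train_y, (5.8, 5.9)))
```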
The bagging classifier is an ensemble learning algorithm that repeatedly samples (with replacement) from the data according to a uniform probability distribution [41]. It uses bootstrap sampling to extract T subsets, each containing m training samples, from the training set, and each subset is used to train a base classifier. The base classifiers are then combined, and the category with the majority of the votes is the final classification result [42]. Fig. 2 shows the algorithm flow chart of the bagging classifier. In our test, 300 tree learners were bagged to obtain the classification results.
Performance measures. The confusion matrix, macro F1 (m_F1) and kappa coefficient were selected as the classification performance evaluation indicators. Macro F1 is an indicator that comprehensively examines the quality of a classifier across the different categories of a multi-class problem. It is obtained by calculating the precision, recall and F1 value of each category. The macro F1 is defined as follows:

$$P_i = \frac{TP_i}{TP_i + FP_i} \tag{1}$$

$$R_i = \frac{TP_i}{TP_i + FN_i} \tag{2}$$

$$F1_i = \frac{2 P_i R_i}{P_i + R_i} \tag{3}$$

$$m\_F1 = \frac{1}{n} \sum_{i=1}^{n} F1_i \tag{4}$$

where $TP_i$, $FP_i$ and $FN_i$ represent the numbers of samples identified as true positives, false positives and false negatives for category $i$, respectively, and n is the number of sample types.
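As an illustration of the macro F1 computation, the following Python sketch derives the per-category precision, recall and F1 from a confusion matrix and averages them. The function name and the example matrix are hypothetical; the matrix is not taken from Table 1.

```python
def macro_f1(confusion):
    """Macro F1 from a matrix where confusion[i][j] counts class-i samples predicted as class j."""
    n = len(confusion)
    f1_scores = []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                          # row i, off-diagonal
        fp = sum(confusion[r][i] for r in range(n)) - tp     # column i, off-diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / n

# Hypothetical 3-class confusion matrix (rows: true OSA, CSA, NB).
example = [[70, 8, 3],
           [9, 68, 4],
           [2, 1, 79]]
print(round(macro_f1(example), 4))
```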
For classification problems, the kappa coefficient is a measure of the consistency between the predicted results of the model and the actual classification results. The value of the kappa coefficient lies between -1 and 1 and is usually greater than 0. It is defined as:

$$kappa = \frac{p_o - p_e}{1 - p_e} \tag{5}$$

where

$$p_o = \frac{TP + TN}{N} \tag{6}$$

$$p_e = \frac{(TP + FP)(TP + FN) + (TN + FN)(TN + FP)}{N^2} \tag{7}$$

where N is the total number of epochs in the sample and TN is the number of true negative samples.
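A short Python sketch of the kappa computation is given below. It uses the general multi-class form of Cohen's kappa computed from a confusion matrix (observed agreement from the diagonal, chance agreement from the row and column totals), which reduces to the expression in TP, TN, FP and FN for two classes. Whether the study used this form or a per-class variant is not stated, so this is an assumption, as are the function name and the example matrices.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a matrix where confusion[i][j] counts class-i samples predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / N^2.
    p_e = sum(
        sum(confusion[i]) * sum(confusion[r][i] for r in range(n))
        for i in range(n)
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Two-class check: TP = 1, FN = 1, FP = 0, TN = 2.
print(cohen_kappa([[1, 1], [0, 2]]))
```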

Conclusions
A new feature extraction method based on electroencephalogram (EEG) sub-band signals was proposed in this article to classify sleep apnea events. The method divided the EEG signals according to different frequency partitioning schemes, obtaining 5, 7, 9 and 11 frequency sub-bands, respectively. Two different powers of each sub-band signal, namely absolute power and relative power, were used as features. In the classification stage, three classifiers were used: Random Forest (RF), bagging and K-nearest neighbors (KNN). The results showed that the RF classifier had the best classification performance with the absolute power features of 11 frequency sub-bands. The average classification accuracy over obstructive apnea (OSA), central apnea (CSA) and normal breathing events reached 90.43%. Specifically, the accuracy for these three types of events was 88.12%, 85.88% and 97.33%, respectively. Compared with the state-of-the-art technology, the method introduced in this paper is simple and effective, and provides a high-performance means of identifying sleep apnea events.