Risk Screening of Obstructive Sleep Apnea Syndrome by Body Profiles via Random Forests Model

Obstructive Sleep Apnea Syndrome (OSAS) is a major global health concern and is typically diagnosed by in-lab polysomnography (PSG). However, this examination has high medical manpower costs, and alternative portable methods have further limitations. This paper develops a new model for screening the risk of OSAS in different age and gender groups by using body profiles. The effects of body profiles on sleep stage alteration and OSAS severity in the different subgroups are also investigated. Regression was used to evaluate the correlations between body profiles and sleep stages as well as sleep disorder indexes. To develop age- and gender-independent models, random forests (RF), an ensemble learning method with high explainability, were trained on four groups defined by gender and age (older or younger than 50 years old), with the data split 70% (training dataset) and 30% (testing dataset). Prediction performance was evaluated by sensitivity, specificity and accuracy. Variable importance was assessed by averaging the impurity decrease to account for the effect of different

In recent years, Obstructive Sleep Apnea Syndrome (OSAS) has become a source of major health concerns globally (1). A study by (2) reported that the estimated prevalence of moderate-to-severe OSAS in the United States was 10% in males between 30 and 49 years of age, rising to 17% in older males (between 50 and 70 years old). The same study noted that for females, the estimated prevalence of moderate-to-severe OSAS was 3% between 30 and 49 years of age, again increasing to 9% between 50 and 70 years. With respect to comorbidity, OSAS is also considered an independent risk factor for a wide range of ailments, including cardiovascular diseases, systemic hypertension, stroke, abnormal glucose metabolism and even cancer (3,4). Furthermore, previous studies have observed that OSAS correlates with brain damage, cognitive impairment, and dementia (5)(6)(7). OSAS therefore significantly affects an individual's quality of life.
To diagnose the severity of OSAS and thereby develop therapeutic strategies, in-lab polysomnography (PSG) is the standard examination (8). However, this examination has high associated resource costs in terms of medical manpower for continuous sleep monitoring (9). Because the examination modalities are expensive and space in sleep laboratories is often limited, waiting lists for a PSG are usually long; for example, the average wait time for receiving medical therapy after a PSG in the United States is 11.6 months (10). This limited availability delays the diagnosis of sleep disorders (11). To overcome these limitations, the Home Sleep Test (HST) has been considered as an alternative portable examination for diagnosing the severity of OSAS. However, this test also has numerous limitations. For instance, the accurate diagnosis of OSAS through an HST is curtailed if the patient suffers from other comorbidities (12). In addition, there are no established HST clinical guidelines for subjects whose body mass index (BMI) is more than 40 or for elderly subjects (over 65 years of age) (13). Besides these tests, a variety of questionnaires have been designed on the basis of clinical prediction rules, such as the Berlin questionnaire, the Epworth Sleepiness Scale, and the STOP-Bang questionnaire (14). Although these questionnaires have been validated as high-sensitivity tools for screening OSAS, low specificity has been observed in each of the severity groups (15). Hence, given these shortcomings of the current methods of evaluating OSAS, there is a need to develop a new method for accurately examining the risk of OSAS.
To do this, it is worthwhile to investigate the significant predictors related to OSAS severity. One such predictor is gender; for example, (16) noted that the prevalence of OSAS in southern Pennsylvania is nearly three times higher in males than in females. The same study reported that OSAS prevalence in postmenopausal females was 2.7%, significantly higher than the 0.6% prevalence in premenopausal females. With respect to anthropometric profiles, a higher mean BMI and a larger mean neck circumference have been observed in severe OSAS subjects (N = 25) compared with normal subjects (N = 14) in Turkey (BMI: 34.55 kg/m² versus 29.83 kg/m², p = 0.021; neck: 40.84 cm versus 36.11 cm, p < 0.001) (17). Another study indicated that for Turkish adults, the odds ratio for OSAS was 1.09 (95% confidence interval (CI): 1.014-1.17, p < 0.05) with each increase of 3.5 cm in neck circumference (18). In another study, a significantly larger mean waist circumference was observed in a severe OSAS group (N = 437) compared with a control group (N = 72) of Turkish subjects (OSAS group: 111.74 ± 12.47 cm; control group: 91.67 ± 12.00 cm, p < 0.001) (19). In a further study, which involved OSAS subjects (N = 59) recruited in the USA, significant associations were observed between the apnea-hypopnea index (AHI) and BMI (r = 0.349, p = 0.008), neck circumference (r = 0.276, p = 0.038), and waist circumference (r = 0.459, p < 0.001) (20). Despite epidemiological reports showing that larger body profiles are associated with the severity of OSAS, to the best of the authors' knowledge, there is still no applicable screening model for assessing OSAS risk that considers anthropometric data together with gender and age effects. Furthermore, the associations between anthropometric features, alterations of sleep structure and sleep-disordered indexes for different age groups and genders also remain unclear.
Therefore, this paper examines the hypothesis that body profiles, as indicators of OSAS severity, can be used to perform OSAS risk screening. The primary objective is to establish OSAS risk screening models for different age and gender groups based on body profiles. These models are established using the random forests (RF) method, which has a number of advantages over current methods of analysis. Furthermore, this paper investigates the effects of anthropometric features on sleep stage alterations and sleep-disordered indexes in subgroups by using a regression model. The paper is organized as follows. Section 2 describes the collection of the dataset, the statistical analysis and the establishment of the screening model. Section 3 presents the baseline characteristics of the subjects, the statistical outcomes between body profiles and PSG parameters, and the classification performance of the trained model. Section 4 discusses the results and compares the findings with other related studies. The paper concludes in Section 5.

Ethics
The study protocol was approved by the Ethics Committee of the Taipei Medical University-Joint Institutional Review Board (SHH: N201911007). The examination institution (Sleep center of Shuang-Ho Hospital) was qualified by the Taiwan Society of Sleep Medicine. The methods were conducted in accordance with the approved guidelines.

Study population
The data were derived from subjects who had previously undergone PSG to assess the severity of OSAS in the sleep center of Taipei Medical University Shuang-Ho Hospital (SHH, New Taipei City, Taiwan) between March 2015 and October 2019. The criteria for participant selection were as follows: (1) they were between 18 and 80 years of age; (2) none had received any invasive surgery for OSAS treatment; (3) none had regularly taken hypnotic or psychotropic medications; and finally (4) the total recording time of the PSG was more than six hours. A baseline screening questionnaire was administered to record age, gender, BMI, neck circumference, and waist circumference. Additionally, the medication usage and surgical history of each participant were obtained from their clinical registration. It is worth noting that all participants were Han-Taiwanese in ethnic origin, and craniofacial features are considered to be effective factors related to OSAS severity for Han ethnicity compared with other ethnicities (21). However, these factors were not used as predictors since this paper only recruited Han-Taiwanese subjects.

Polysomnography
The full-night PSG examination was performed using ResMed Embla N7000 and Embla MPR systems in the sleep center of SHH. The recorded PSG data comprised the following: electroencephalography (EEG), electrooculography (EOG), chin and leg electromyography (EMG), electrocardiography (ECG), nasal and oral airflow, thoracic and abdominal bands, snoring sensor, body position meter, and oxygen saturation. These data were scored by certified polysomnographic technologists using RemLogic (version 3.4.1) software. The sleep stages and respiratory events were scored according to the 2017 American Academy of Sleep Medicine scoring manual (22). The diagnosis of OSAS was determined by the frequency of apnea and hypopnea events (23). The AHI for each subject was calculated as the total number of apnea and hypopnea events divided by the total sleep time (TST) in hours. In keeping with clinical practice, subjects were recommended to undertake active intervention when their AHI was higher than 15 events per hour, which is the clinical threshold for moderate-to-severe OSAS (24).
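The AHI calculation and the 15 events/hour cut-off described above can be sketched as follows. This is a minimal illustration rather than the scoring software's implementation; the function names, the use of minutes for TST, and the strict "greater than" comparison at the threshold are assumptions made for the example.

```python
AHI_THRESHOLD = 15.0  # events per hour; moderate-to-severe cut-off used in the text


def apnea_hypopnea_index(n_apneas, n_hypopneas, total_sleep_time_min):
    """Return the AHI in events per hour of sleep.

    total_sleep_time_min is the total sleep time (TST) in minutes;
    the unit is an assumption of this sketch.
    """
    if total_sleep_time_min <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / (total_sleep_time_min / 60.0)


def is_moderate_to_severe(ahi, threshold=AHI_THRESHOLD):
    """Binary screening label: True when the AHI is higher than the threshold."""
    # Assumption: "higher than 15" is read as a strict inequality.
    return ahi > threshold


# Hypothetical subject: 40 apneas and 80 hypopneas over 6 hours of sleep.
ahi = apnea_hypopnea_index(40, 80, 360)
label = is_moderate_to_severe(ahi)
```

The boolean label produced here corresponds to the two classes (AHI above or below 15) that the screening models are trained to separate.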

Statistical Analysis
All statistical analyses were conducted using the Python module Scikit-learn (version 0.21.2). The collected PSG data that met the inclusion criteria were divided into two groups by gender. The characteristics of the two groups were compared using the independent Student's t-test for continuous variables or the chi-square test for categorical variables. To determine the correlations between anthropometric features, sleep structure alterations and sleep-disordered indices while considering gender and menopause effects, the male and female groups were each divided into subgroups by age (over the age of 50 years or under) (25). Linear regression models were used to associate the body profiles with the parameters of the PSG report in the four groups. The level of significance was set to p < 0.05.
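As an illustration of the subgrouping and the per-group linear regression described above, the following sketch assigns a subject to one of the four groups and fits a simple least-squares line. It uses only plain Python rather than the authors' Scikit-learn code; the function names, the choice to place a subject aged exactly 50 in the older group, and the toy waist and AHI values are all hypothetical and are not taken from the study data.

```python
from math import sqrt


def age_gender_group(gender, age, cutoff=50):
    """Assign a subject to one of the four subgroups.

    Assumption: age >= cutoff counts as "older"; the text does not
    state which side the boundary falls on.
    """
    age_label = "older" if age >= cutoff else "younger"
    return f"{age_label}_{gender}"


def fit_line(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, pearson_r). No p-value is computed here.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up values for one subgroup: waist circumference (cm) and AHI (events/hour).
waist_cm = [80, 90, 100, 110]
ahi = [6, 9, 16, 19]
slope, intercept, r = fit_line(waist_cm, ahi)
group = age_gender_group("male", 45)
```

In practice one such regression would be fitted per subgroup and per PSG parameter, with the significance test applied to each slope.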

Random Forests
Previous studies have used classification methods such as the genetic algorithm (26) and the support vector machine (27). In this paper, we used RF, which is an ensemble learning model that can be used to perform classification (28). Compared with other classification methods, this method has the following advantages: fast computation, high explainability through feature importance, better robustness to noise, stable performance with high accuracy, and the avoidance of overfitting. Therefore, the RF method has been widely used for diagnosis and decision support in the medical field (29,30). In this study, considering that the collected data could easily be overfitted, and in order to investigate the importance of the predictors affecting the severity of OSAS, RF was used to develop screening models for each subgroup.
The procedure for training and testing the model is illustrated in Fig. 1. The PSG data of each of the four groups were divided in a 70%-30% ratio to prepare the training and testing datasets, respectively. The model consists of a number of classification and regression trees (CART), which are trained on data selected using the bootstrap technique (28). This technique randomly samples a subset of the training dataset for each tree to decrease training time and to prevent overfitting. The training data were fed into the model iteratively to train each CART in the RF with bootstrapping. The number of CART was decided by out-of-bag (OOB) sample estimation, which can be used to evaluate the convergence of the prediction error (31,32). In this study, the maximum number of CART was set to 800 to balance stability against computational resources. Since each CART was trained on a different subset of the data, overfitting can be avoided, and the trees can be assembled to perform classification by voting. Furthermore, feature importance was computed by averaging the impurity decrease to determine the effect of the different factors (33)(34)(35).
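The three ingredients named above (bootstrap sampling with out-of-bag rows, the impurity decrease of a split, and voting across trees) can be sketched in plain Python. This is a simplified illustration, not the Scikit-learn implementation used in the study; the helper names are hypothetical, Gini impurity is assumed as the impurity measure, and the toy labels are made up.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement.

    Returns the in-bag indices used to grow one tree and the
    out-of-bag (OOB) indices that this tree never saw.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def majority_vote(tree_predictions):
    """Combine the class predictions of the individual trees by voting."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_indices(1000, rng)
oob_fraction = len(out_of_bag) / 1000

# Toy split on a hypothetical feature (1 = AHI above 15, 0 = below).
decrease = impurity_decrease([1, 1, 1, 0, 0, 0], [1, 1, 1, 0], [0, 0])
vote = majority_vote([1, 0, 1])
```

Averaging the impurity decrease attributed to each feature over all splits and all trees gives the kind of importance score described in the text, and evaluating each tree on its own OOB rows gives an error estimate that can be tracked as trees are added.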

Accuracy Evaluation
Upon completion of the training process, the testing dataset was input into the RF model to assess the model's sensitivity, specificity and accuracy. The confusion matrix, which summarizes the performance of a classifier, was computed, and the numbers of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) predictions were determined. In addition, several indexes, including sensitivity (TP/(TP + FN)), specificity (TN/(TN + FP)), and accuracy ((TP + TN)/(TP + TN + FP + FN)), were calculated to validate the trained model. The Receiver Operating Characteristic (ROC) curve was calculated to determine the optimal point, defined as the point with the most balanced sensitivity and specificity. The Area Under the ROC curve (AUC) measures the separability of the model; the closer this value is to 1, the better the separability. In addition, the positive likelihood ratio and the negative likelihood ratio were calculated to validate the performance.
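The evaluation indexes listed above can be computed directly from the confusion-matrix counts, as in the sketch below. Specificity is computed with the standard definition TN/(TN + FP). Two choices here are assumptions rather than details stated in the text: the "most balanced" ROC point is taken to be the threshold maximizing Youden's J (sensitivity + specificity - 1), and the AUC is computed with the pairwise rank formulation. The counts and scores are made-up illustrative values.

```python
def screening_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and likelihood ratios from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Likelihood ratios; guard against division by zero at the extremes.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_positive": lr_pos,
        "lr_negative": lr_neg,
    }


def youden_optimal_threshold(scores, labels):
    """Threshold maximizing Youden's J (assumed criterion for 'most balanced')."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos + (n_neg - fp) / n_neg - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


metrics = screening_metrics(tp=80, tn=70, fp=30, fn=20)
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted probabilities
labels = [0, 0, 1, 1]            # 1 = AHI above 15
threshold, j = youden_optimal_threshold(scores, labels)
area = auc(scores, labels)
```

Other definitions of the optimal ROC point exist (for example, the point closest to the top-left corner), so the threshold chosen here should be read as one reasonable option.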

Characterization of study subjects
A total of 6614 subjects, of whom 69.9% (4632) were male, were enrolled in this study, and their baseline characteristics are presented in Table 1. Several features of this sample are worthy of note. There was a significant difference in the numbers of subjects with different OSAS severities between the male and female groups (p < 0.01). The average age of the subjects in the male group (48.47 ± 12.81 years) was significantly lower than that of the subjects in the normal group (51.94 ± 12.67 years; p < 0.01). For anthropometric features, the mean BMI (26.97 ± 3.96 kg/m²), the mean neck circumference (39.25 ± 3.14 cm), and the mean waist circumference (93.73 ± 10.46 cm) in the male group were significantly higher than the mean values in the normal group (p < 0.01).
In terms of the sleep quality indexes, the mean values of AHI and desaturation index in the older groups of both genders were significantly higher than the values in the younger groups (AHI: p < 0.01; desaturation index: p < 0.05). There was no significant difference in the snoring index in the male groups, whereas a significant difference was found in the female groups (male: 149.04 ± 193.74 events/hour; female: 176.49 ± 202.58 events/hour, p < 0.01). Also, there were significant differences in arousal index between the younger and older groups in both genders (all p < 0.01). Additionally, the percentage of wake and N1, WASO, AHI, desaturation index, snoring index, and arousal index in males were significantly higher than the values in females in both the younger and older age groups (all p < 0.05). Conversely, the percentage of REM in younger males was significantly lower than the value in younger females (p < 0.05). A lower SE, a lower percentage of N2, and a lower percentage of REM were observed in older males compared with older females (all p < 0.05).

PSG results and anthropometric features
The associations between the parameters of the PSG results and body profiles in the different age groups and genders are presented in Table 3 and Table 4, respectively. The sleep efficiency (SE) in older males was negatively associated with the waist circumference, whereas the percentage of wake stage was positively associated with it (p < 0.05). Similarly, the percentage of wake stage was positively associated with neck and waist circumference in younger males (p < 0.05). There were significant positive correlations between the percentage of N1 stage and all body profiles in all groups except for the BMI of younger females.
Additionally, significant negative correlations between the percentage of N2 and both neck circumference and waist circumference were observed in males. The percentage of REM correlated negatively with all body profiles in males (p < 0.05) and with the BMI as well as the waist circumference in older females (p < 0.05). While there were positive correlations between WASO and all body profiles in all groups, they lacked statistical significance. The indexes of OSAS severity (AHI, desaturation index, snoring index and arousal index) all correlated positively with BMI, neck circumference and waist circumference in all groups (all p < 0.05).

Discussion
In this study, the associations between PSG parameters and anthropometric features have been determined while considering age and gender effects. The gender- and age-independent models based on the anthropometric features were established successfully to assess the risk of OSAS. The models were demonstrated to possess high prediction accuracy for classifying AHI as higher or lower than 15, in particular for Han-Taiwanese subjects. The importance of each anthropometric feature in affecting OSAS severity was obtained for each subgroup. Large-scale statistics, including Han-Taiwanese anthropometric features, sleep stage details and PSG results, were also provided.
For both genders, better accuracy was observed in the younger groups, and the waist circumference showed the highest importance for its effect on the AHI except in the case of younger males. It is known that visceral fat, which is body fat deposited around internal organs, is related to the AHI, and waist circumference is a distinctive indicator of visceral fat distribution. Similarly, a prior study, which used traditional statistical methods, reported that waist circumference was a better predictor of the severity of OSAS than the BMI and neck circumference (36). Another study revealed that OSAS prevalence was exacerbated in menopausal females and that waist circumference served as the main factor (37). Additionally, the BMI and waist circumference showed similar importance for affecting the AHI in the male and older female groups, but not in younger females. There were statistically significant correlations between the BMI, waist circumference and sleep stage percentage for the male and older female groups, but not for younger females. These results may be induced by the menopause effect. This effect may not lead to weight gain directly, but it may be correlated with changes in fat distribution. In perimenopausal females, increased abdominal adiposity deposition and decreased lean body mass have been observed. This change resembles the fat distribution of males, who tend to develop a greater degree of upper body obesity.
There are some limitations to this study, which should be addressed in the future. First of all, the dataset was limited to a South-East Asian population with its particular craniofacial features, rather than drawn from diverse body profiles with a wider geographical distribution. Hence the results of this paper should be viewed with this in mind, since craniofacial factors, which also affect sleep-disordered breathing, should be considered as predictors (41).
Next, the clinical standard for classifying OSAS severity requires a PSG to determine the AHI. This sleep examination is still interpreted manually, and since the PSG results were scored by different technologists, scoring variability can affect the accuracy (42). Although the data were derived from one sleep center, which regularly performed inter-scorer training, scoring variability could still have affected the results. Furthermore, the first night effect, a phenomenon on the first night of testing characterized by an altered sleep cycle and disturbed sleep physiology, can also cause inaccuracies in PSG results (43). To minimize this effect, some PSG parameters, such as sleep efficiency, should be used to exclude affected subjects and reschedule their PSG in order to avoid this bias.
Another limitation concerns the absence of some interacting factors of OSAS, given that OSAS has been recognized as a multifactorial sleep-disordered breathing condition. Several factors, including smoking, alcohol use, environmental parameters, and menopausal status, are highly associated with OSAS (44,45). To understand the influence of these background details, a questionnaire can be used to record personal habits.
OSAS is also influenced by different diseases (44), and comorbidity also affects the results of PSG. Disease-related parameters that are already available from clinical information can be obtained and can serve as significant variables for performing pre-screening classification.
In future work, a dataset with comprehensive dimensions, which include personal habits, personal comorbidity, more anthropometric features, and body compositions, will be collected for training a novel model.

Conclusion
Given the global concerns about the impacts of OSAS, this paper noted the need for a new method of assessing OSAS risk in light of the limitations of current methods. To this end, OSAS risk screening models for different age and gender groups based on body profiles were developed using data from 6614 participants from Taiwan.
Results indicate that high BMI, neck circumference and waist circumference decreased the duration of slow-wave sleep and increased the sleep disorder indices and the percentage of wake and N1.
Additionally, prediction models for the different gender and age groups, utilizing anthropometric features as predictors via RF, were established and demonstrated to have high accuracy. Feature importance was explored, with waist circumference being the highest contributing factor in females and older males, whereas the BMI was the highest contributing factor in younger males.
The authors recommend the use of the prediction models for Taiwan and indeed for those with Han-Taiwanese craniofacial features.