2.1 Datasets and Subjects
This study used data from the Multi-Ethnic Study of Atherosclerosis (MESA), initiated and collected by the National Heart, Lung, and Blood Institute (NHLBI) [28,29]. MESA recruited a total of 6,814 participants aged 45 to 84 years without CVD from four ethnic groups (African Americans, Chinese Americans, Hispanics, and Caucasians). Five examinations were conducted between 2000 and 2011, and clinical outcomes, including myocardial infarction, angina, heart failure, coronary heart disease, and death, were assessed every 9 to 12 months during this period.
During Exam 5, 2,237 MESA participants took part in an auxiliary sleep study, and 2,489 static features were recorded, including demographics, anthropometrics, medication usage, medical history, and imaging risk factors. Among them, 1,874 participants provided overnight PSG data comprising 27 continuous physiological signals recorded during sleep, such as electrocardiography, electroencephalography, pulse, nasal airflow, blood oxygen, and periodic limb movements (PLMS). The average recording duration was 12 hours per participant. Additionally, 615 sleep-related static features were extracted from sleep questionnaires, actigraphy-derived parameters, and average counts of overnight sleep events derived from the PSG recordings, giving 3,104 static features in total. The PSG data were manually annotated to identify sleep stages and events (such as arousal, hypopnea, apnea, hypoxemia, and periodic limb movements). This study focused on the 1,874 participants with complete static features and PSG recordings, of whom 175 (9.3%) experienced CVD events. The dataset was divided into training and validation sets of 1,687 and 187 subjects, respectively. The validation set proportion is 9.3%.
2.2 Study Pipeline
First, feature selection was performed on all static features using lasso logistic regression. Clustering was then conducted on the overnight average values of PSG features (named PSG static features) to identify OSA phenotypes. Subsequently, several ML and DL models with different feature selection and training strategies were built to explore the most effective method of incorporating OSA phenotypic information for CVD risk prediction. Finally, feature importance analysis was conducted for each phenotype using the best-performing model. The study pipeline is shown in Fig. 1.
2.2.1 Data Preprocessing
The following preprocessing steps were applied to the 3,104 static features (2,489 Exam 5 static features and 615 PSG static features) of all participants. First, invalid entries were removed, including blank values, duplicates, and irrelevant numerical values. Missing values were then imputed with the feature mean. Finally, z-score normalization was applied to standardize each feature. After preprocessing, each participant had 1,600 normalized static features.
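The imputation and normalization steps can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `impute_and_standardize`, the column-dictionary layout, and the rule of dropping empty or constant features are assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def _is_missing(v):
    """Treat None and NaN as missing (assumed encoding of blank values)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def impute_and_standardize(columns):
    """Mean-impute missing values, then z-score each feature column.

    `columns` maps a feature name to its list of values across participants.
    Features with no observed values or zero variance are dropped
    (an assumption; the text does not state how such features were handled).
    """
    out = {}
    for name, values in columns.items():
        observed = [v for v in values if not _is_missing(v)]
        if not observed:
            continue  # nothing to impute from
        mu = mean(observed)
        filled = [mu if _is_missing(v) else v for v in values]
        sigma = pstdev(filled)  # population SD; the mean is unchanged by mean imputation
        if sigma == 0:
            continue  # constant feature cannot be standardized
        out[name] = [(v - mu) / sigma for v in filled]
    return out


if __name__ == "__main__":
    demo = {"bmi": [20.0, None, 30.0], "constant": [1.0, 1.0, 1.0]}
    print(impute_and_standardize(demo))
```

In a predictive setting, the mean and standard deviation would usually be estimated on the training set and reused for the validation set; whether that was done here is not stated in the text.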
Our goal is to predict 5-year CVD risk, whereas multiple high-fidelity physiological signals exhibit highly complex temporal patterns throughout the entire sleep period; using them directly as model inputs may therefore fail to capture information relevant to long-term CVD risk efficiently. This study instead learned deep representations from the feature sequences of five sleep events known to contribute to CVD risk: arousal, hypopnea, apnea, hypoxemia, and PLMS. The sequences of the five sleep events were generated from the PSG labels through the following steps. First, for each participant, the period between the sleep start and end times was divided into 5-second sampling intervals. Then, the time span of each occurrence of each sleep event was extracted from the PSG labels to determine whether the event occurred within each 5-second interval. An interval was marked '1' if the event occurred within it and '0' otherwise. This produced five overnight sleep-event sequences per participant for further analysis. Fig. 2 illustrates the generation process of the overnight feature sequences of the five sleep events.
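The sequence-generation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: event annotations are taken to be (onset, offset) pairs in seconds, and the function names and the handling of events that cross the sleep boundaries are hypothetical choices for the example.

```python
import math


def event_sequence(sleep_start, sleep_end, events, step=5.0):
    """Binary sequence with one entry per `step`-second interval.

    `events` is a list of (onset, offset) times in seconds for one event type.
    An interval is set to 1 if any event overlaps it, and 0 otherwise.
    """
    n_bins = math.ceil((sleep_end - sleep_start) / step)
    seq = [0] * n_bins
    for onset, offset in events:
        # Clip the event to the sleep period (assumed boundary handling).
        onset = max(onset, sleep_start)
        offset = min(offset, sleep_end)
        if offset < onset:
            continue  # event lies entirely outside the sleep period
        first = int((onset - sleep_start) // step)
        last = max(first, math.ceil((offset - sleep_start) / step) - 1)
        for i in range(first, min(last, n_bins - 1) + 1):
            seq[i] = 1
    return seq


def build_sequences(sleep_start, sleep_end, annotations, step=5.0):
    """One binary sequence per event type, e.g. arousal, apnea, hypopnea."""
    return {
        name: event_sequence(sleep_start, sleep_end, spans, step)
        for name, spans in annotations.items()
    }


if __name__ == "__main__":
    demo = {"apnea": [(7.0, 12.0)], "arousal": []}
    print(build_sequences(0.0, 30.0, demo))
```

An apnea annotated from 7 s to 12 s overlaps the second and third 5-second intervals, so only those two entries are set to 1.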
2.2.2 Static feature selection
Lasso logistic regression was used to select a subset of the 1,600 static features using the training set. Ten-fold cross-validation was conducted to determine the regularization hyperparameter, alpha, which was varied over the values 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.015, 0.02, and 0.025 to identify the value that yields the best predictive performance on the validation set.
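For reference, a common form of the lasso logistic regression objective is given below. The notation is an assumption introduced here: x_i is the static feature vector of participant i, y_i is the CVD label, and n is the number of training subjects. The exact scaling of the penalty term depends on the software used, which the text does not specify.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\,\beta}\;
    -\frac{1}{n}\sum_{i=1}^{n}
      \bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr]
    + \alpha \,\lVert \beta \rVert_1 ,
\qquad
p_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}
```

Features whose coefficients are shrunk exactly to zero are discarded, so a larger alpha yields a smaller selected subset.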
2.2.3 OSA clustering
K-means clustering was applied to 29 PSG static features (listed in Supplementary Table 2) for OSA clustering. Cluster numbers from 2 to 6 were explored, and the optimal number was determined by the silhouette and elbow methods.
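The cluster-number search can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the deterministic farthest-point initialization is a choice made here for reproducibility, and the function names and toy data are hypothetical. The silhouette score is returned for the silhouette method and the within-cluster sum of squares (inertia) for the elbow method.

```python
import math


def farthest_point_init(points, k):
    """Deterministic initial centers: start at points[0], then repeatedly add
    the point farthest from all chosen centers (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def scan_cluster_numbers(points, ks=range(2, 7)):
    """Map each candidate k to (silhouette score, inertia)."""
    results = {}
    for k in ks:
        labels, inertia = kmeans(points, k)
        results[k] = (silhouette(points, labels), inertia)
    return results


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for k, (sil, inertia) in scan_cluster_numbers(toy).items():
        print(k, round(sil, 3), round(inertia, 3))
```

On the toy data, the silhouette score peaks at two clusters while the inertia keeps decreasing as k grows, which is why the two criteria are usually read together.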
Mann-Whitney U tests were then conducted on the 29 PSG static features between OSA phenotypes to identify the most relevant PSG features for each phenotype. A P value below 0.05 was considered significant, and a P value below 0.001 was considered highly significant.
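A minimal sketch of a two-sided Mann-Whitney U test for one PSG feature between two phenotypes is given below. It uses the normal approximation without tie or continuity correction, which is a simplification chosen for this example; the function name and toy values are hypothetical, and an established statistics package would normally be used for the actual analysis.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P value).

    Uses midranks for tied values and the normal approximation for the
    P value, without tie or continuity correction (simplifying assumptions).
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for _, g in combined[i:j] if g == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u1, p_value


if __name__ == "__main__":
    phenotype_a = [1.0, 2.0, 3.0]
    phenotype_b = [4.0, 5.0, 6.0]
    print(mann_whitney_u(phenotype_a, phenotype_b))
```

With more than two phenotypes, the test is applied to pairs of phenotypes, one feature at a time.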
Cox proportional hazards regression was also performed to estimate the hazard ratio (HR) and 95% confidence interval (CI) of CVD events occurring within five years for the different OSA phenotypes.
2.2.4 OSA phenotyping-based CVD risk prediction modeling
To assess the value of incorporating OSA phenotypic information in CVD risk prediction, several classic ML models were employed in this study, including logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting machine (GBM), and multilayer perceptron (MLP). The phenotype-agnostic method did not use any phenotypic information. For the phenotype-specific models, two strategies for integrating the phenotypic information and three feature sets were evaluated. The phenotype-specific ML models are described below:
1. Pheno_fuse_ML: The OSA phenotyping labels (Phenotypes 1, 2, 3, 4) were incorporated as a new feature along with the selected static features, serving as input for the ML models to predict CVD risk across the entire population.
2. Pheno_specific_ML: This strategy develops CVD risk prediction models specific to each phenotypic population, using three different feature sets:
(1) Pheno_Spec_ML1: Uses only the selected static features.
(2) Pheno_Spec_ML2: Combines the 29 PSG static features with the selected static features.
(3) Pheno_Spec_ML3: Fuses the phenotype-specific PSG static features representing each phenotype with the selected static features.
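The difference between the two strategies can be sketched as a difference in how the model inputs are assembled. The example below is an illustration only: the function names are hypothetical, and one-hot encoding of the phenotype label is an assumption, since the text states only that the label is added as a new feature.

```python
from collections import defaultdict


def fuse_phenotype(features, phenotypes, n_phenotypes=4):
    """Pheno_fuse_ML-style input: one model for the whole population, with a
    one-hot phenotype code appended to each feature row (encoding assumed)."""
    fused = []
    for row, ph in zip(features, phenotypes):
        one_hot = [1.0 if ph == j else 0.0 for j in range(1, n_phenotypes + 1)]
        fused.append(list(row) + one_hot)
    return fused


def split_by_phenotype(features, labels, phenotypes):
    """Pheno_specific_ML-style input: one (X, y) pair per phenotype, so that a
    separate model can be fitted to each phenotypic population."""
    groups = defaultdict(lambda: ([], []))
    for row, y, ph in zip(features, labels, phenotypes):
        groups[ph][0].append(list(row))
        groups[ph][1].append(y)
    return dict(groups)


if __name__ == "__main__":
    X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    y = [0, 1, 0]
    phenotype = [1, 3, 1]
    print(fuse_phenotype(X, phenotype))
    print(split_by_phenotype(X, y, phenotype))
```

The fused variant keeps the full sample size for a single model, whereas the per-phenotype variant trains on smaller subsets but lets the feature set differ between phenotypes, as in Pheno_Spec_ML3.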
To further explore the value of overnight sleep-event feature sequences in CVD risk prediction, a two-layer LSTM network was employed to learn deep representations from the sequences, which were then combined with the static features in a fully connected layer to predict CVD risk. We further proposed a phenotype-contrastive training strategy, Contrast_pheno_DL, to enhance model performance. To validate the effectiveness of this strategy, two other phenotype-specific DL models with different feature sets, Pheno_spec_DL1 and Pheno_spec_DL2, were implemented for comparison. The DL model architecture with the different training strategies and feature sets is illustrated in Fig. 3, and the three DL models are described below:
3. Pheno_spec_DL1: The feature set of this model includes all selected static features, the 29 PSG static features, and deep representations of the five sleep-event feature sequences.
4. Pheno_spec_DL2: The feature set comprises the selected static features, the 29 PSG static features and sleep-event feature sequences that are specifically relevant to each phenotype. The aim is to focus on features that are directly related to each phenotype.
5. Contrast_pheno_DL: This model uses the same features as Pheno_spec_DL1, but its training strategy differs: it strategically merges phenotypic populations that have distinct risk levels. The goal is to identify and learn features that discriminate between phenotypes.
2.2.5 Performance Evaluation
In this study, subjects who experienced cardiovascular events were labeled as positive samples. The evaluation metrics were accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve (AUC-ROC), and area under the precision-recall curve (AUC-PRC).
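The metrics listed above can be computed from the true labels and predicted scores as sketched below. This is a minimal illustration with hypothetical function names; the 0.5 decision threshold and the use of average precision as the estimate of AUC-PRC are assumptions, since the text does not specify either.

```python
def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, and F1 with CVD events as the positive class."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_roc(y_true, y_score):
    """Probability that a random positive scores above a random negative
    (ties count as one half). Requires at least one sample of each class."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Average precision, used here as an estimate of AUC-PRC (assumed
    estimator). The result depends on input order when scores are tied."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    tp = 0
    total = 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            tp += 1
            total += tp / rank
    return total / tp if tp else 0.0


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.4, 0.1]
    print(threshold_metrics(labels, scores))
    print(auc_roc(labels, scores), average_precision(labels, scores))
```

Because only 9.3% of the participants experienced CVD events, accuracy alone can be misleading, which is a common reason for reporting AUC-PRC alongside AUC-ROC.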
2.2.6 Feature Importance Analysis
The influence of each feature on the model's predictions was evaluated using SHapley Additive exPlanations (SHAP) values, which derive from cooperative game theory [30], to identify the key features for the different phenotypes. The SHAP value of each feature was calculated according to the following equation: