Metabolomic Profiles of Sleep-Disordered Breathing are Associated with Hypertension and Diabetes Mellitus Development: the HCHS/SOL

Sleep-disordered breathing (SDB) is a prevalent disorder characterized by recurrent episodic upper airway obstruction. In a dataset from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), we applied principal component analysis (PCA) on seven measures characterizing SDB-associated respiratory events. We estimated the association of the top two SDB PCs with serum levels of 617 metabolites, in both single-metabolite analysis, and a joint, penalized regression analysis using the least absolute shrinkage and selection operator (LASSO). Discovery analysis included n = 3,299 HCHS/SOL individuals; associations were validated in a separate dataset of n = 1,522 HCHS/SOL individuals. Seven metabolite associations with SDB PCs were discovered and replicated. Metabolite risk scores (MRSs) developed based on LASSO association results and representing metabolite signatures associated with the two SDB PCs were associated with 6-year incident hypertension and incident diabetes. MRSs have the potential to serve as biomarkers for SDB, guiding risk stratification and treatment decisions.


Introduction
Sleep-disordered breathing (SDB) is a common yet underdiagnosed disorder. It is estimated to affect 17% and 34% of middle-aged female and male individuals, respectively 1, but is diagnosed in less than 15% of individuals with clinically significant disease 2,3. SDB is characterized by recurring episodes of complete (apneas) or partial (hypopneas) upper airway obstruction, often accompanied by oxyhemoglobin desaturation and/or sleep fragmentation.
Symptoms include snoring and excessive daytime sleepiness 4. A growing body of epidemiological studies has found that SDB is associated with increased risks for vascular and metabolic diseases, including stroke, coronary heart disease, hypertension, and diabetes mellitus [5][6][7][8][9].
Underlying mechanisms proposed to link SDB with cardiometabolic conditions include chronic hypoxemia, particularly nightly exposures to intermittent hypoxemia and re-oxygenation 10; dysregulated proinflammatory responses 11; increased oxidative stress 12; an imbalanced gut microbiome 13; and hormonal imbalance 14, among others. Emerging evidence has shown that intermittent hypoxemia, especially high-frequency desaturations, modulates the inflammatory response differently from chronic sustained hypoxemia 15. While recent work has examined specific aspects of SDB that best predict incident outcomes [16][17][18], only a few studies have tried to model more complex exposures by combining multiple SDB measures 19,20. Various SDB measures, such as the frequency of obstructive events (e.g., Respiratory Event Index (REI)), sleep-apnea-specific hypoxic burden 21, minimum oxyhemoglobin saturation during sleep, apnea and hypopnea event duration 22, and others, capture different characteristics of SDB-related physiological stressors but tend to be correlated. Given the increasing recognition of the heterogeneity and complexity of SDB 23, indices that combine multiple measures of SDB while accounting for the correlation among them may provide powerful approaches both for studying SDB biology and for risk stratification for incident cardiometabolic outcomes.
Metabolites, reflective of the products and intermediates of metabolism, can provide biomarkers useful for disease prediction and subtyping 24. Studying SDB-associated metabolites may yield insights into the metabolic environment of the disorder, elucidate gender differences, and suggest SDB subtypes and related molecular mechanisms involved in the progression of cardiometabolic conditions. Untargeted metabolomic profiling is the comprehensive identification and quantification of small metabolite molecules within a biological system. It has begun to be used in sleep research to understand cellular processes such as sleep/wake regulation 25,26, as a window on peripheral molecular clocks and oscillators 27, and to detect biomarkers of sleep restriction 28 and of neurological degeneration among patients with obstructive sleep apnea (OSA) 29. In a recent study 30 we identified metabolites associated with moderate to severe OSA (defined as a Respiratory Event Index [REI] ≥ 15) and constructed an index composed of 14 metabolites that was associated with OSA cross-sectionally in two independent datasets. Another recent study 31 identified metabolites associated with SDB and metabolites whose levels changed following SDB treatment with continuous positive airway pressure, though without multiple testing correction. Further demonstrating the potential clinical utility of untargeted metabolite profiling, prediction models incorporating metabolites outperformed clinical predictors for some conditions 32. Thus, untargeted metabolomics may provide a unique opportunity both for the development of biomarkers for SDB and for using such biomarkers for SDB-related risk stratification, that is, identifying patients with increased risks for other chronic diseases.
We hypothesize that by combining SDB measures, and then identifying and combining changes in their associated metabolomic environment, we can construct new SDB biomarkers that may offer additional utility compared to standard measures for identifying individuals at high risk for progression of cardiometabolic disease (Fig. 1). We use a data-driven, unsupervised principal component (PC) analysis to first construct two SDB summary measures based on several physiological phenotypes. We then study the association of the SDB PCs with the metabolic environment in a large population-based study with a high-dimensional set of measured metabolites using two methods: (1) association analysis of individual metabolites with each SDB PC, and (2) least absolute shrinkage and selection operator (LASSO) regression to identify a subset of metabolites that together best associate with the SDB PCs. Based on the LASSO metabolite selection and estimates, we develop SDB PC-specific metabolomic risk scores (SDB-MRS). To validate our results, we use a discovery-replication approach with separate datasets of individuals sampled from the same target population. We then study the associations of the SDB PC-specific MRSs with incident hypertension and diabetes mellitus.

The Hispanic Community Health Study/Study of Latinos
The Hispanic Community Health Study/Study of Latinos (HCHS/SOL) is a prospective community-based cohort study of 16,415 Hispanic/Latino individuals aged 18-74 years at the baseline examination (2008-2011) 33. Individuals were selected into the study using multi-stage stratified random sampling from four geographic regions: Bronx NY, Chicago IL, Miami FL, and San Diego CA. The sampling strategy and study design were previously described 34. Of the study participants, 12,803 individuals were genotyped. Fasting blood samples were collected at the baseline examination. Within a week of the baseline examination in the clinic, 14,440 individuals were assessed for SDB using a validated Type 3 home sleep apnea test (ARES Unicorder 5.2; B-Alert, Carlsbad, CA) that measured nasal airflow, position, snoring, heart rate and oxyhemoglobin saturation 3. Among the baseline HCHS/SOL participants, 11,623 returned for a second clinic visit (visit 2) from 2014 to 2017, on average 6 years after the first visit.

Metabolomics profiling and quality control
Of the HCHS/SOL participants from the baseline examination who also had genetic data, 4,004 individuals were selected at random for metabolomics profiling of fasting serum samples collected at baseline (metabolomics batch 1, processed in 2017). In 2021, an additional 2,368 serum samples from 2,330 participants, also collected at baseline (see details in Supplementary Note 1), were profiled in a second metabolomics batch (batch 2). Serum samples were stored at -70 °C at the HCHS/SOL Core Laboratory at the University of Minnesota until analysis by Metabolon, Inc. (Durham, NC) in 2017 (batch 1) and 2021 (batch 2). Serum samples were then extracted and prepared using Metabolon's standard solvent extraction method. Prior to extraction, samples were split into equal parts for untargeted analysis on both the gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS) metabolomic quantification platforms 35,36. Instrument variability was determined by calculating the median relative standard deviation (SD) for the internal standards added to each sample prior to injection into the mass spectrometers. Overall process variability was determined by calculating the median relative SD for all endogenous metabolites (i.e., non-instrument standards) present in 100% of the technical replicate samples.
We took a discovery-and-replication approach using batch 1 as the discovery and batch 2 as the replication dataset. Preprocessing of the metabolomic data is described in Supplemental Figure S1. First, we removed batch 2 individuals who overlapped with batch 1 and duplicated samples from the same individuals, resulting in 2,178 remaining observations. Next, we kept metabolites that were known and available in both batches and excluded xenobiotics.
We also excluded metabolites with missing values in more than 75% of the individuals in either batch 1 or batch 2. Metabolites with missing values in 25-75% of the individuals in both batches were dichotomized as "observed" versus "not observed" (referred to as "dichotomized metabolites" henceforth). Metabolites that had a different missingness pattern between the batches (e.g., < 25% missing values in one batch and > 25% missing values in the other) were excluded. For metabolites with missing values in up to 25% of the individuals in both batches, we assumed that missing values were due to concentrations below the minimum detection limit, and therefore imputed the missing values for each metabolite with the lowest non-missing value of that metabolite across the sample within the batch. We then rank-normalized these metabolite measures in each batch separately. In the sex-stratified analysis, we used the same rank-normalized metabolites (and did not rank-normalize within sex groups).
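The missingness rules and within-batch normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names are hypothetical, exactly 25% missing is treated as "up to 25%", and the rank normalization is assumed to be a rank-based inverse normal transform with a 0.5 offset and average ranks for ties, since the text does not specify the exact variant.

```python
from statistics import NormalDist

def classify_metabolite(miss_b1, miss_b2):
    """Classify a metabolite from its missing fraction in batch 1 and batch 2."""
    if miss_b1 > 0.75 or miss_b2 > 0.75:
        return "exclude"                      # too sparse in at least one batch
    low1, low2 = miss_b1 <= 0.25, miss_b2 <= 0.25
    if low1 and low2:
        return "continuous"                   # impute, then rank-normalize
    if not low1 and not low2:
        return "dichotomize"                  # observed vs. not observed
    return "exclude"                          # missingness pattern differs by batch

def impute_min(values):
    """Replace missing values (None) with the lowest observed value in the batch."""
    floor = min(v for v in values if v is not None)
    return [floor if v is None else v for v in values]

def rank_normalize(values):
    """Rank-based inverse normal transform (assumed variant; ties share average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1        # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Applying `impute_min` and then `rank_normalize` separately to each batch mirrors the order of operations in the text; imputed values form a tie block at the bottom of the distribution.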

Sleep disordered breathing phenotypes
We selected seven correlated phenotypes capturing potentially different aspects of SDB: Respiratory Event Index 0 (REI0), the sum of all respiratory events (apneas and hypopneas with at least 50% airflow reduction for a minimum duration of 10 seconds), regardless of oxygen desaturation, divided by estimated sleep time; REI3, the sum of all respiratory events associated with ≥ 3% oxygen desaturation divided by estimated sleep time; respiratory event duration (the average length of each respiratory event); sleep-apnea-associated hypoxic burden 21; the minimum and the average oxyhemoglobin saturation during the sleep period; and the percentage of the estimated sleep period with oxyhemoglobin saturation below 90%. We then conducted sampling-weighted principal component analysis (PCA), accounting for the HCHS/SOL study design, over the complete HCHS/SOL study population with non-missing SDB measures. We rank-normalized the seven SDB measures prior to PCA because some of the measures have highly non-normal distributions and so that measures with a wide range do not dominate the PCA results. We used the PCs that explain at least 10% of the variance in the SDB measures in subsequent analyses. To interpret the SDB phenotypes captured by the PCs, we characterized the study populations defined by the lowest and highest 10% of values of each PC selected for further analysis. Characteristics include demographic (age, sex), cardiometabolic (BMI, hypertension, diabetes), and sleep (SDB and self-reported insomnia, sleep duration, sleepiness) variables.
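The rule of retaining PCs that explain at least 10% of the variance can be illustrated as follows. This sketch assumes the eigenvalues of the PCA are already available (in the study they would come from the sampling-weighted PCA); the function names and the example eigenvalues are hypothetical and are not the study's values.

```python
def explained_variance(eigenvalues):
    """Share of total variance explained by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def select_components(eigenvalues, min_share=0.10):
    """Return the 1-based indices of PCs explaining at least `min_share` of variance."""
    shares = explained_variance(eigenvalues)
    return [k + 1 for k, share in enumerate(shares) if share >= min_share]

# Hypothetical eigenvalues for seven rank-normalized measures (illustration only).
example_eigenvalues = [4.0, 1.6, 0.6, 0.4, 0.2, 0.15, 0.05]
kept = select_components(example_eigenvalues)  # PCs 1 and 2 pass the 10% threshold
```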

Model covariates
All analyses used up to three conceptual models. Model 1 (the base model) adjusted for demographic variables and body weight: age, sex, field center, Hispanic/Latino background (Mexican, Puerto Rican, Cuban, Central American, Dominican, and South American and other/multi), and body mass index (BMI). Hispanic/Latino background was included because cultural differences between groups are potentially associated with differences in diet, which is highly associated with levels of many metabolites. Model 2 further adjusted for lifestyle variables (alcohol use, cigarette use, physical activity [MET-min/day], and diet [Alternative Healthy Eating Index 2010]) in addition to the Model 1 variables. Model 3 is a lifestyle and comorbidity model that adjusted for the Model 2 variables in addition to indicators for diabetes mellitus and hypertension, and continuous measures of fasting insulin, fasting glucose, HOMA-IR, HDL, LDL, total cholesterol, triglycerides, systolic blood pressure and diastolic blood pressure.
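Because the three models are nested, the covariate sets can be written as cumulative lists. The sketch below is only illustrative: all variable names are hypothetical placeholders rather than HCHS/SOL dataset names, and the formula string simply follows the common `outcome ~ predictors` notation.

```python
# Hypothetical variable names; each model extends the previous one.
MODEL1 = ["age", "sex", "field_center", "background", "bmi"]
MODEL2 = MODEL1 + ["alcohol_use", "cigarette_use", "physical_activity", "ahei2010"]
MODEL3 = MODEL2 + [
    "diabetes", "hypertension", "fasting_insulin", "fasting_glucose", "homa_ir",
    "hdl", "ldl", "total_cholesterol", "triglycerides", "sbp", "dbp",
]

def formula(outcome, exposure, covariates):
    """Build a regression formula string for one outcome, exposure, and model."""
    return f"{outcome} ~ " + " + ".join([exposure] + covariates)
```

For example, `formula("sdb_pc1", "metabolite", MODEL1)` gives the base-model specification for one metabolite.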

Single metabolite associations (SMA) between individual metabolites and SDB PCs
Using survey-weighted generalized linear regressions, each metabolite's concentration level was regressed separately against the SDB PC outcomes, with the recognition that cross-sectional data cannot establish causal direction. We used the Benjamini-Hochberg method 37 to control the false discovery rate (FDR) for multiple testing among metabolites in all models for each SDB PC in batch 1. Metabolites were flagged for further validation if the FDR-corrected p < 0.05 in Model 1, for either SDB PC1 or PC2. In the validation analysis, we tested the associations of these flagged metabolites with the SDB PCs using linear regression in batch 2 in Models 1-3. We computed one-sided p-values guided by the estimated directions of the associations in batch 1 38, and declared replication if the one-sided p-value was < 0.05.
In a follow-up analysis, we visualized the concentrations of raw and rank-normalized metabolites from sex hormone-related pathways that are associated with SDB by gender and age strata.
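The discovery and replication rules above can be sketched as follows. The Benjamini-Hochberg adjustment is the standard step-up procedure. The one-sided p-value conversion assumes a symmetric test statistic, so that a two-sided p-value is halved when the replication estimate has the same sign as the discovery estimate; the text does not spell out this computation, so treat that part as an assumption. Function names are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def flag_for_validation(pvals, alpha=0.05):
    """Indices of metabolites whose FDR-adjusted p-value is below alpha."""
    return [i for i, q in enumerate(bh_adjust(pvals)) if q < alpha]

def one_sided_p(two_sided_p, beta_discovery, beta_replication):
    """One-sided p-value in the direction of the discovery estimate (assumed form)."""
    same_direction = beta_discovery * beta_replication > 0
    return two_sided_p / 2 if same_direction else 1 - two_sided_p / 2
```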

LASSO regression for constructing the SDB metabolomic risk scores (SDB-MRS)
For each SDB PC, we applied LASSO linear regression over all 582 continuously modeled metabolites, adjusted for the covariates from Model 1 (unpenalized), in HCHS/SOL batch 1. We selected the LASSO tuning parameter by minimizing the prediction error for the SDB PCs in 10-fold cross-validation. SDB-MRSs were calculated as a weighted sum of the normalized metabolite serum concentrations, with weights being the metabolite coefficients from the LASSO regression in batch 1. In association analyses using the MRSs, we standardized (z-scored) them to have mean 0 and variance 1 using the sample mean and variance (Supplemental Table S7).
To validate the associations of the SDB-MRSs with the SDB PCs, we constructed the SDB PC1-MRS and SDB PC2-MRS in batch 2 using the weights from the LASSO regression conducted in batch 1, then assessed their associations with the corresponding SDB PCs in Models 1-3. In secondary analyses we assessed potential sex differences via (1) sex-stratified SDB-MRSs constructed based on sex-stratified LASSO, and (2) sex-stratified association analyses for sex-specific and sex-combined SDB-MRSs. We also assessed the associations between SDB-MRS quartiles and the corresponding SDB PCs.
In a secondary analysis, we also constructed SDB-MRSs based on the SMA results, similar to our prior work 30. The SMA-based SDB-MRSs were weighted sums of metabolite levels, where the metabolites were those with FDR < 0.05 among the metabolites modeled as continuous in the SMA batch 1 discovery analysis. The weights were the metabolite coefficients from a survey-weighted, unpenalized multivariable linear regression using these metabolites jointly.
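The MRS construction described above amounts to a weighted sum followed by z-scoring. The sketch below assumes the weights are already available as a mapping from metabolite name to coefficient (for example, the nonzero LASSO coefficients from batch 1); the function names and the dictionary layout are hypothetical.

```python
from statistics import mean, stdev

def metabolite_risk_score(levels, weights):
    """Weighted sum of one individual's normalized metabolite levels.

    `weights` maps metabolite name -> coefficient; metabolites without a
    weight (not selected by the model) do not contribute to the score.
    """
    return sum(coef * levels[name] for name, coef in weights.items())

def standardize(scores):
    """Z-score the scores using the sample mean and sample standard deviation."""
    mu = mean(scores)
    sd = stdev(scores)
    return [(s - mu) / sd for s in scores]
```

Applying the batch 1 weights unchanged to batch 2 metabolite levels corresponds to the validation step in the text; which sample's mean and SD are used for standardization follows Supplemental Table S7 and is not fixed by this sketch.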

Incident outcomes
We also studied the associations of the SDB PCs and their MRSs with incident hypertension and diabetes, assessed at visit 2, among individuals free of hypertension and free of diabetes mellitus, respectively, at the baseline exam. Diabetes was defined based on the American Diabetes Association (ADA) definition or self-report of diabetes mellitus. ADA criteria are based on laboratory tests: fasting glucose ≥ 126 mg/dL, post-OGTT glucose ≥ 200 mg/dL, or A1C ≥ 6.5% 39. In a secondary analysis, incident diabetes was assessed separately among individuals with impaired glucose tolerance (fasting glucose within 100-125 mg/dL, post-OGTT glucose within 140-199 mg/dL, or A1C within 5.7%-6.5%) and among normal glycemic individuals. Hypertension was determined following the NHANES guidelines: systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 90 mmHg, or self-reported current use of antihypertensive medications 40.
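The outcome definitions above translate directly into threshold checks. The following sketch uses hypothetical function and argument names; missing laboratory values are represented as `None` and simply do not trigger a criterion, which is an assumption about how missing tests would be handled.

```python
def _at_least(value, cutoff):
    """True if a measurement is available and meets the cutoff."""
    return value is not None and value >= cutoff

def has_diabetes(fasting_glucose=None, ogtt_glucose=None, a1c=None, self_report=False):
    """ADA laboratory criteria (mg/dL, mg/dL, %) or self-reported diabetes."""
    return (
        self_report
        or _at_least(fasting_glucose, 126)
        or _at_least(ogtt_glucose, 200)
        or _at_least(a1c, 6.5)
    )

def has_hypertension(sbp, dbp, on_medication=False):
    """Systolic >= 140 mmHg, diastolic >= 90 mmHg, or antihypertensive medication use."""
    return sbp >= 140 or dbp >= 90 or on_medication

def is_incident_case(at_baseline, at_visit2):
    """Incident case: free of the condition at baseline, present at visit 2."""
    return (not at_baseline) and at_visit2
```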

Association analyses between SDB phenotypes and incident cardiometabolic outcomes
Finally, survey-weighted Poisson regressions were used to assess the associations of various SDB phenotypes with incident hypertension and diabetes mellitus in the combined batch 1 and batch 2 study sample. The SDB phenotypes included benchmark single sleep measures (i.e., REI3 and hypoxic burden), our newly developed composite measures (i.e., SDB PCs and SDB-MRSs), and our recently developed OSA-MRS, adjusting for Model 1 and Model 2 covariates, respectively. The OSA-MRS was trained using LASSO on moderate to severe OSA (defined as REI3 ≥ 15) in the HCHS/SOL cohort and previously validated in the MESA cohort 30. We combined the two batches in this analysis to increase statistical power through a larger sample size. To combine the metabolomics data of batch 1 and batch 2, we aggregated the metabolites from non-overlapping batch 1 and batch 2 individuals, after imputation and rank-normalization of each metabolite separately in each batch.
All analyses were done in R 3.6.3. svyglm was used for survey-weighted generalized linear regression models, and svyprcomp was used for sampling-weighted principal component analysis, both from the survey package 41. The glmnet R package (version 3.0) 42 was used for the LASSO linear regression.
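Poisson regression coefficients are on the log scale, so an incidence rate ratio (IRR) is obtained by exponentiating the coefficient. The sketch below shows this conversion with a Wald-type 95% confidence interval; the interval form is an assumption (the text does not state how intervals were computed), the function names are hypothetical, and the model fitting itself (done with svyglm in the study) is not reproduced here.

```python
import math

def incidence_rate_ratio(beta, se, z=1.96):
    """IRR and Wald-type confidence limits from a log-scale coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def percent_change(irr):
    """Percent difference in incidence rate implied by an IRR."""
    return (irr - 1.0) * 100.0
```

For example, an IRR of 1.43 per 1 SD increase in a score corresponds to a 43% higher incidence rate.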

Metabolomics sample characteristics
The main discovery dataset (batch 1, used for the SDB SMA analysis and LASSO regression) included 1,874 female participants (mean age = 42.8) and 1,425 male participants (mean age = 41.6), and the validation dataset (batch 2) included 960 female participants (mean age = 51.9) and 562 male participants (mean age = 51.2) (Table 1). Consistent with their older age, the prevalence of moderate to severe SDB was higher in batch 2 than in batch 1 participants (REI3 ≥ 15: 13.8% compared to 11.5% in batch 1 participants); similarly, comorbidities were more prevalent in batch 2 participants (30.1% prevalent diabetes mellitus and 45.7% prevalent hypertension, compared to 20.4% and 32.2%, respectively, in batch 1). Supplemental Table S1 shows the sample characteristics stratified by gender while accounting for sampling weights, so that means and proportions are representative of the HCHS/SOL target population.
The first two principal components of the SDB measures accounted for 79.8% of the total variance (Supplementary Figure S2). For both PCs, higher values indicate more severe hypoxemia. However, PC1 is also characterized by more frequent respiratory events, while PC2 is characterized by shorter respiratory events. Specifically, high SDB PC1 is correlated with increased REI3 (Spearman correlation coefficient ...).
To better understand the phenotypic characteristics that SDB PC1 and PC2 represent, we also compared the populations defined by the top and bottom 10% of SDB PC1 and PC2 (Table 2). The top 10% of SDB PC1, compared with the bottom 10%, comprised individuals who were on average older and had a higher BMI; were more likely to be male and to have prevalent and incident hypertension and diabetes mellitus; and were more likely to have a history of smoking. The top 10% of SDB PC2, compared to the bottom 10%, included participants who were slightly younger, less likely to be male, and more likely to be current smokers, but did not differ in rates of baseline and incident hypertension and diabetes (Table 2).
As for sleep disturbance traits, the top and bottom 10% SDB PC1 participants self-reported similar insomnia symptoms according to the Women's Health Initiative Insomnia Rating Scale and similar sleep quality (typical night's sleep in the past 4 weeks being restless or very restless), but the top 10% reported more severe excessive sleepiness, more frequent snoring and shorter sleep duration. The top 10% SDB PC2 participants reported worse insomnia symptoms and sleep quality, and were more likely to have long sleep (≥ 9 hours), compared to the bottom 10% SDB PC2.
... Table S2). Among the 15 SDB PC1 metabolites, four metabolites (pregnanolone/allopregnanolone sulfate, linoleoyl-linoleoyl-glycerol (18:2/18:2) [1], glucuronide of C10H18O2 (8), and 5alpha-pregnan-3beta,20alpha-diol monosulfate (2)) replicated (one-sided p-value < 0.05) in batch 2 in the Model 1 analysis (Fig. 4). Pregnanolone/allopregnanolone sulfate and glucuronide of C10H18O2 (8) remained associated with PC1 when adjusted for additional lifestyle and comorbidity covariates in batch 2. Three of the four metabolite associations with SDB PC2 in batch 1 replicated in batch 2 (one-sided p-value < 0.05) in Models 1 and 2, all of which were sphingomyelin lipids: sphingomyelin(d18:2/24:2), sphingomyelin(d18:2/24:1,d18:1/24:2), and sphingomyelin(d18:2/23:0,d18:1/23:1,d17:1/24:1). Full results from the sex-combined SMA analysis are provided in Supplemental Table S3.
In the sex-specific SMA, tauro-beta-muricholate, a lipid from the bile acid metabolism pathway, was associated with SDB PC1 (FDR p < 0.05) among males, while no metabolite was identified for SDB PC2 in male-only analyses after FDR correction. The association of tauro-beta-muricholate with SDB PC1 in males did not replicate in batch 2. In the female-specific discovery analysis, ten metabolites were associated with SDB PC1, of which eight were also discovered in the sex-combined SMA analysis, and two, 3-hydroxyoctanoylcarnitine (1) and 3-hydroxyoctanoylcarnitine (2), were unique to the sex-stratified analysis. A single metabolite, allantoin, was associated with SDB PC2 among females (Supplemental Table S3). Among the twelve metabolites identified in either the male- or female-specific SMA analysis, only the associations of pregnanolone/allopregnanolone sulfate and glucuronide of C10H18O2 (8) with PC1 were replicated in batch 2 among females (Supplemental Table S4). When testing for evidence of interaction with sex, only tauro-beta-muricholate had a significant interaction effect (FDR p = 0.014) (Supplemental Table S5).
Given that half of the discovered and replicated SDB PC1 metabolites were from the progesterone steroids biosynthesis pathway, we compared and visualized the concentration levels of the eight progesterone steroid sulfate metabolites with statistically significant associations with SDB PC1 after FDR correction in batch 1, by age group within each sex stratum. As age increases, we observed a decreasing trend in the levels of circulating progesterone steroid sulfate metabolites in both men and women. The patterns become more visible in the rank-normalized metabolites (Supplemental Figure S5). Sulfated metabolites of progesterone (5alpha-pregnan-3beta,20alpha-diol disulfate, 5alpha-pregnan-3beta,20alpha-diol monosulfate (2), 5alpha-pregnan-3beta,20beta-diol monosulfate (1), 5alpha-pregnan-diol disulfate, and pregnanolone/allopregnanolone sulfate) were higher among younger women compared to younger men (in age groups < 40 and 40-45), while the differences diminished in older age groups (50-55, 55-60, and > 60) that would typically include post-menopausal women. The circulating pregnenolone steroid sulfate metabolites (pregnanediol sulfate (C21H34O5S)*, pregnenetriol sulfate*, and pregnenolone sulfate) were higher in men compared to women across all age groups. The patterns were similar in the two batches.

LASSO regression for joint selection and estimation of metabolite associations with SDB PCs in HCHS/SOL batch 1
To identify a set of metabolites that were jointly associated with the SDB PCs, we also implemented a LASSO regression in HCHS/SOL batch 1 (discovery dataset), in both sex-combined and sex-stratified study samples. A total of 125 metabolites were selected for SDB PC1 and 80 metabolites for SDB PC2, with 27 metabolites overlapping between the two sets. The breakdown of the metabolites by super pathway is shown in Supplemental Figure S3, and coefficients for all metabolites from LASSO trained in sex-combined and sex-stratified samples are provided in Supplementary Table S6.
We constructed SDB PC1-MRS and SDB PC2-MRS for batch 1 and batch 2 HCHS/SOL participants based on results from the LASSO penalized regression.
Study sample means and SD used in standardizing the MRSs are provided in Supplemental Table S7. In a secondary analysis, we constructed MRSs based on SMA results. Supplemental Table S8 provides weights of these secondary SDB-MRSs. As expected by construction, all SDB-MRSs were significantly associated with their corresponding SDB PCs in batch 1 in all models. The associations replicated for both LASSO-based SDB-MRSs but not for the SMA-based SDB PC1-MRS in batch 2 (Table 3). Therefore, we move forward with the SDB-MRSs based on LASSO. The sex-specific SDB PC-MRSs, although also replicated in batch 2, did not show stronger associations with their corresponding SDB PCs.
Notes to Table 3: Model 1 adjusted for age, sex, field center, Hispanic/Latino background (Mexicans, Puerto Ricans, Cubans, Central Americans, Dominicans, and South Americans and other/multi) and body mass index (BMI); Model 2 adjusted for all Model 1 covariates and lifestyle variables: alcohol use, cigarette use, physical activity (MET-min/day), and diet (Alternative Healthy Eating Index 2010). SDB PC1 MRS: metabolite risk score calculated based on the coefficients from LASSO regression trained in both sexes combined to predict SDB PC1 in the discovery dataset (batch 1); SDB PC2 MRS: metabolite risk score calculated based on the coefficients from LASSO regression trained in both sexes combined to predict SDB PC2 in the discovery dataset (batch 1); Sex Specific SDB PC1 MRS: metabolite risk score calculated based on the coefficients from LASSO regression trained in each sex stratum to predict SDB PC1 in the discovery dataset (batch 1); Sex Specific SDB PC2 MRS: metabolite risk score calculated based on the coefficients from LASSO regression trained in each sex stratum to predict SDB PC2 in the discovery dataset (batch 1); SMA SDB PC1 MRS: metabolite risk score calculated based on coefficients from unpenalized regression of metabolites identified in single metabolite association analysis with SDB PC1 in the discovery dataset (batch 1); SMA SDB PC2 MRS: metabolite risk score calculated based on coefficients from unpenalized regression of metabolites identified in single metabolite association analysis with SDB PC2 in the discovery dataset (batch 1).

Associations with incident cardiometabolic outcomes
In the HCHS/SOL sleep study target population, SDB PC1 showed positive associations with incident diabetes mellitus and hypertension over an average of 6.1 years (4.3-9.4 years) in both Model 1 and Model 2, among the samples without diabetes or hypertension at baseline, respectively. These composite phenotypes showed stronger associations than the individual SDB measures REI3 and hypoxic burden. SDB PC2 was not significantly associated with either incident outcome (Supplemental Table S9).
In the batch-combined analysis, both SDB-MRSs were significantly associated with an increased incidence rate ratio (IRR) for incident hypertension, while only the SDB PC2 MRS was significantly associated with incident diabetes mellitus, when adjusted for demographic and lifestyle risk factors (Fig. 5 and Supplemental Table S10). The effect estimates were slightly lower for the SDB PC1-MRS when adjusting for the same covariates (Fig. 5 and Supplemental Table S10). For comparison, we also computed the OSA-MRS: a 1 SD increase in OSA-MRS was associated with a 43% [IRR: 1.43, 95% CI: 1.27-1.60, p < .0001] higher incidence rate of hypertension and a 57% [IRR: 1.57, 95% CI: 1.38-1.80, p < .0001] higher incidence rate of diabetes mellitus. None of the single-metric physiological phenotypes (i.e., REI3, HB, SDB PCs) were significantly associated with incident cardiometabolic outcomes in both models (Fig. 5 and Supplemental Tables S10-S11).
Secondary analysis was carried out by stratifying the study samples for incident diabetes mellitus into two subgroups: individuals with normal glucose regulation (n = 1,376) and individuals with impaired glucose regulation (n = 1,532) at baseline. The observed associations between SDB-PC1 MRS and incident diabetes ... Focusing on OSA-MRS, which had the strongest association with incident outcomes of all MRSs, we also compared risk for incident outcomes by quartiles.
Compared with the lowest quartile of the OSA-MRS, the top quartile showed more than a three-fold increase in incidence rate for diabetes mellitus ... (Supplemental Figure S4 and Supplemental Table S12).
There is no evidence supporting stronger associations with incident outcomes among SDB PC-MRSs trained in each sex stratum separately versus the sex-combined MRSs.

Discussion
We constructed new SDB measures based on seven correlated SDB phenotypes using PCA, weighted to represent the target population of the HCHS/SOL study. High scores for SDB PC1 appeared to characterize an SDB phenotype described by a high frequency of obstructive events and marked hypoxemia, a pattern typical of severe SDB and more often observed in men than in women. In contrast, high SDB PC2 reflected a subphenotype that correlated most strongly with shorter event duration and, to a lesser degree, with hypoxia measures, while being almost uncorrelated with traditional event frequency measures (REI0 and REI3). In the HCHS/SOL, higher SDB PC2 was more common in younger women and in individuals with more severe insomnia, self-reported poor sleep, frequent awakenings and longer sleep duration. SDB PC2 is highly correlated with shorter respiratory event duration, which has been reported in other cohorts to be more common in females and in younger individuals, and to be associated with higher arousal responses for any given change in oxygen saturation 43. Moreover, in a discovery-replication approach within distinct subsamples from the HCHS/SOL, we identified multiple metabolites individually associated with each SDB PC, as well as metabolites that are collectively associated with SDB. We used the latter set of metabolites to construct MRSs of SDB aggregating multiple metabolites. The SDB-MRSs have stronger associations with incident cardiometabolic outcomes (diabetes mellitus and hypertension) than the single SDB metrics REI3 and hypoxic burden.
Higher concentrations of multiple sulfated metabolites of progesterone and its precursor pregnenolone were associated with lower (healthier) values of SDB PC1 (FDR < 0.05) in the discovery dataset: pregnanolone/allopregnanolone sulfate and 5alpha-pregnan-3beta,20alpha-diol monosulfate (2) (both of which replicated), as well as six additional progestin steroids (highlighted in green in Fig. 6). Since progesterone in circulation is quickly metabolized by the liver and has a half-life of approximately 5 minutes 44, only the glucuronide and sulfate metabolites of progesterone steroids were measured on the Metabolon platform.
Progesterone is a female reproductive hormone that is mostly synthesized in ovaries and by the placenta during pregnancy, and to a lesser degree in adrenal cortex and other tissues in both men and women, and in testes in men 45 . All progesterone steroid sulfates were present in both men and women in our dataset (Supplemental Figure S5). The pattern of differences in these metabolites between sexes according to age suggests that sulfated metabolites of progesterone in women are of gonadal origin, whereas the sulfated metabolites of pregnenolone are of adrenal origin. Future studies will need to verify this possibility in cohorts where the date of menopause is known. If true, these data point to the possibility that different classes of steroids of different origins may be involved in the development of SDB, and its association with incident hypertension and diabetes mellitus.
The influence of progesterone- and pregnenolone-derived steroids on SDB has been a source of interest for decades given the considerable sexual dimorphism of this trait: its prevalence, severity, and physiological subtype all vary by sex. For example, while men are 3- to 4-fold more likely to have SDB than women, this sex difference attenuates after women reach menopause 46 . Women with SDB have a less collapsible airway, more hypopneas relative to apneas, and shorter event duration than men 43 . Progesterone is a proposed mechanism for protecting women from SDB. It is an antioxidant 47 and a respiratory stimulant that increases hypoxic and hypercapnic ventilatory responses (including through effects on CO2 receptors), increases genioglossus muscle tone, and decreases upper airway collapsibility [48][49][50] . Animal studies have shown important roles for nuclear and membrane progesterone receptors in mediating the stability of the breathing pattern, as well as therapeutic effects in treating apnea of prematurity in both male and female mice 51 . SDB increases substantially among postmenopausal women [52][53][54] , which may, at least in part, relate to changes in sex hormones. Two small cross-sectional studies reported inverse associations between progesterone levels and OSA 53,55 . Postmenopausal women who use hormone replacement therapy that includes both estrogen and progesterone have lower respiratory event frequencies than their counterparts who do not use this therapy 56 . On the other hand, clinically induced sex hormone deficiency in young women has not been associated with increased SDB 57 . The complexity of interpreting effects due to exogenous versus endogenous progesterone levels, bioavailability, receptor sensitivity, and the effects of other sex steroids has limited our understanding of the role of progesterone steroids in the pathogenesis of SDB.
In our study, the association with SDB PC1 suggests protective associations of progesterone steroid sulfate metabolites with the SDB phenotype characterized by a high frequency of obstructive events and marked hypoxemia; while this association was observed in both women and men, this phenotype was more severe in men.
We found that metabolic scores reflecting SDB predicted adverse cardiometabolic outcomes better than physiological phenotypes such as REI or measures of hypoxia. We developed SDB-MRSs, expanding our earlier work on OSA-MRSs 30 , and further studied the association of the MRSs with incident cardiometabolic outcomes. Previous work in HCHS/SOL demonstrated that SDB was associated with incident hypertension and diabetes, and that insomnia was associated with incident hypertension 20 . Other studies have also demonstrated associations of SDB with cardiometabolic and cardiovascular disorders 68,69 . Here, we saw that MRSs had stronger associations with incident diabetes mellitus and hypertension than measured physiological traits (REIs, hypoxia-related metrics, and SDB PCs), suggesting that serum-based SDB-related metabolites may be better markers of cardiometabolic risk than physiological metrics derived from a single overnight sleep study. Further, when examining only individuals with normal glycemic levels at baseline, the SDB-PC2 MRS exhibited a more robust association with incident diabetes than the MRS derived using a simpler OSA phenotype (a binary measure of SDB), whose association was null in this analysis. This suggests a promising role for the SDB-PC2 MRS in identifying individuals with SDB at elevated risk of developing diabetes before the onset of glucose dysregulation (i.e., early-stage diabetes). Given the null findings of many SDB intervention trials that recruited patients on the basis of physiological traits, future studies can evaluate the use of metabolic markers for identifying individuals who may benefit from SDB treatment.
Strengths of this study include the use of a large population of under-studied Hispanic/Latino adults, a large panel of measured metabolites, rigorous analysis including a replication study, and assessment of the association of constructed MRSs, as well as of traditional SDB severity measures, with incident hypertension and diabetes. The study also has a few limitations. We used PCA, a linear dimension reduction method. Despite its advantage of interpretability, it may be less flexible than non-linear techniques, and may not be an optimal method if the underlying structure among SDB phenotypes is non-linear.
Information loss may have occurred owing to rank normalization of the SDB phenotypes and the metabolite levels at a preprocessing step. The discovery and replication datasets differed in age and several health characteristics, which may have reduced the ability to replicate findings. Given the observational nature of the study, we cannot draw causal inferences. Lastly, the SDB MRSs do not include all the significantly associated metabolites, including the replicated metabolite pregnanolone/allopregnanolone sulfate, because only metabolites in continuous format can be used in the weighted sum forming the SDB MRSs.
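The weighted sum that forms an MRS can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `metabolite_risk_score`, the final standardization step, and the toy coefficients are assumptions, not the study's implementation. The idea is that metabolites with a non-zero LASSO coefficient contribute to the score in proportion to that coefficient, and metabolites with a zero coefficient are ignored.

```python
import numpy as np


def metabolite_risk_score(metabolite_levels, lasso_coefs):
    """Weighted sum of continuous metabolite levels (hypothetical helper).

    metabolite_levels: (n_individuals, n_metabolites) array of continuous,
                       already-preprocessed levels
    lasso_coefs:       (n_metabolites,) array; zeros mark unselected metabolites
    Assumption: the score is standardized to mean 0 and SD 1 for reporting.
    """
    selected = np.flatnonzero(lasso_coefs)
    raw = metabolite_levels[:, selected] @ lasso_coefs[selected]
    return (raw - raw.mean()) / raw.std()


# Toy example: 5 metabolites, of which 2 have non-zero (made-up) coefficients.
rng = np.random.default_rng(0)
toy_levels = rng.normal(size=(100, 5))
toy_coefs = np.array([0.0, 0.4, 0.0, -0.25, 0.0])
toy_mrs = metabolite_risk_score(toy_levels, toy_coefs)
```

Because the score is a linear combination of continuous levels, a metabolite recorded only as detected or not detected has no natural place in it without a separate coding decision, which is consistent with the limitation noted above.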
To summarize, using a discovery-replication study design, we identified and replicated multiple metabolites associated with SDB after correcting for multiple comparisons. We constructed SDB MRSs that exhibited stronger associations with cardiometabolic sequelae of SDB than physiologic SDB measures did, including after accounting for demographic and lifestyle factors. These findings provide a strong basis for future evaluation of MRSs for risk stratification and as biomarkers that guide diagnosis and treatment decisions.
Diabetes is defined as fasting glucose ≥126 mg/dL, post-OGTT glucose ≥200 mg/dL, A1C ≥6.5%, or use of anti-diabetic medication; baseline hypertension is defined as systolic or diastolic BP greater than or equal to 140 or 90 mmHg, respectively, or current use of antihypertensive medications.
… total cholesterol, triglycerides, systolic blood pressure and diastolic blood pressure. * indicates FDR p<0.05 in batch 1 and one-sided p<0.05 in batch 2. ** indicates FDR p<0.01 in batch 1 and one-sided p<0.01 in batch 2. Metabolites marked with * were identified based on accurate mass data, retention time, and mass spectrometry but not reference standards; their verification is therefore not as robust as that of metabolites without *. b1: batch 1; b2: batch 2.
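The replication rule stated in the notes above (FDR p<0.05 in batch 1 and one-sided p<0.05 in batch 2) can be illustrated with a short Python sketch. Several details are assumptions rather than facts from the text: the Benjamini-Hochberg procedure is used for FDR control, the one-sided test is taken in the direction of the discovery estimate, the replication test statistic is treated as a normal z-score, and all function names and numbers are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)
    return adjusted


def one_sided_p(beta_rep, se_rep, discovery_sign):
    """One-sided p-value for an effect in the discovery direction (assumed)."""
    z = discovery_sign * beta_rep / se_rep
    return 1.0 - NormalDist().cdf(z)


def replicated(p_disc, beta_disc, beta_rep, se_rep, alpha=0.05):
    """Flag metabolites passing both the discovery and replication thresholds."""
    q_disc = bh_adjust(p_disc)
    signs = np.sign(beta_disc)
    p_rep = np.array(
        [one_sided_p(b, s, sg) for b, s, sg in zip(beta_rep, se_rep, signs)]
    )
    return (q_disc < alpha) & (p_rep < alpha)


# Made-up summary statistics for four metabolites (not study results).
flags = replicated(
    p_disc=[0.0001, 0.03, 0.5, 0.8],
    beta_disc=[-0.20, 0.10, 0.05, -0.01],
    beta_rep=[-0.15, 0.10, 0.00, 0.00],
    se_rep=[0.05, 0.05, 0.05, 0.05],
)
```

Setting `alpha=0.01` in the same function corresponds to the stricter ** threshold in the notes.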