The main goal of this study was to understand how sleeping on an active temperature-controlled mattress cover (the Eight Sleep Pod; see SI Fig. 2) affected sleep, cardiovascular, and perceptual outcomes. To answer this question, subjects were instructed to sleep on the Pod with temperature regulation turned off (Pod OFF) for one week to collect baseline physiological and perceptual responses. Next, subjects slept with the Pod’s temperature regulation turned on (Pod ON) for one week to assess how temperature regulation during sleep affected sleep, cardiovascular, and perceptual metrics. Cardiovascular metrics included heart rate (HR) and heart rate variability (HRV) during sleep. Sleep metrics included time spent in each sleep stage, total sleep time (TST), sleep onset latency (SOL), and sleep efficiency (SE). Perceptual metrics included thermal comfort and sensation, sleep satisfaction, calmness, feeling refreshed, ease of falling asleep and waking up, and sleep quality. Lastly, subjects slept for two nights with Pod OFF at the end of the 16-night study to determine whether any physiological changes persisted after the one-week Pod ON intervention. See Fig. 1 for a schematic of the experimental design. To maximize ecological validity, all subjects completed the study in their own bedroom under free-living conditions (e.g., no experimental control over ambient temperature, lighting, or sleep/wake times).
Subject Characteristics
Seventy-five subjects were recruited for this study, and 69 completed it. After filtering out nights with noisy or missing EEG data and subjects who were not temperature compliant (see below for filtering criteria), 54 subjects successfully completed the 16-night study (see Results for anthropometric details). Subjects were excluded if they were unavailable to sleep on the Pod for two consecutive weeks, were under 18 years of age, or had an unsupported bed size. Subjects were also excluded if they reported any of the following: a pacemaker, restless leg syndrome, an apnea-hypopnea index (AHI) above 30, insomnia, or use of beta-blockers. Subjects were also excluded if they reported normally sleeping < 4 h per night on more than three days per week. Seven subjects reported heart or respiratory conditions not listed in the exclusion criteria: six reported a respiratory condition and one reported both a heart and a respiratory condition (Table 1). Subjects’ self-reported ethnicities were African American (1), Latino (2), Caucasian (45), multiple ethnicities (5), and other (1). The study was approved by Sterling Institutional Review Board (IRB identification number 10282), and all subjects provided written informed consent to participate.
Table 1
Population characteristics stratified by sex
Variable | Total (n = 54) | Male (n = 27) | Female (n = 27)
Age (years), mean ± SD | 36.0 ± 14.4 | 38.3 ± 14.7 | 33.7 ± 13.9
Reported use of sleep medication, n (%) | 9 (16.7%) | 2 (7.4%) | 7 (25.9%)
Heart conditions, n (%) | 1 (1.9%) | 1 (3.7%) | 0 (0%)
Respiratory conditions, n (%) | 7 (13.0%) | 4 (14.8%) | 3 (11.1%)
Global PSQI score, mean ± SD | 4.96 ± 2.33 | 4.30 ± 2.25 | 5.22 ± 2.22
Note: subjects were excluded if they were taking heart medications or had a pacemaker. Subjects with severe sleep apnea were also excluded (see Methods). Heart and respiratory conditions listed here included chronic asthma, sleep apnea (AHI < 30), and arrhythmia.
Experimental Design
Before starting the study, subjects completed a medical history survey (see SI Appendix A) and the Pittsburgh Sleep Quality Index (PSQI) [26], referred to here as the pre-PSQI. The pre-PSQI asks subjects about their usual sleep habits over the previous month (see SI Appendix B). The PSQI generates a global score and seven component scores: subjective sleep quality, sleep onset latency, sleep duration, sleep efficiency, sleep disturbances, use of sleeping medications, and daytime dysfunction.
Pod OFF Baseline (Week 1)
For the first seven nights during Pod OFF, baseline physiological and perceptual data were collected. To collect sleeping HR and HRV along with exercise data, subjects were mailed a Fitbit Versa 2 (Fitbit Inc., San Francisco, CA) to start wearing on Night 1. Subjects wore their Fitbit throughout the entire study and were asked to log any exercise sessions that they completed. Subjects were also sent a daily survey each morning which included perceptual questions about thermal comfort, sleep satisfaction, and ease of falling asleep and waking up (see SI Appendix C).
Subjects who did not already own an Eight Sleep Pod (n = 42) slept on their own mattress for Nights 1–5, until an Eight Sleep research associate installed the Pod on Night 6. Subjects who already owned a Pod (n = 12) were instructed to begin sleeping on the Pod with temperature regulation OFF three nights before Night 1 to mitigate potential carryover effects of sleeping with Pod ON. These subjects were required to have at least three consecutive nights with Pod OFF before they started wearing the Fitbit. All subjects wore the Fitbit and completed the daily survey during Nights 1–5.
For Nights 6 and 7, subjects slept on the Pod with temperature OFF. Sleep stages, SOL, SE, TST, and deep and REM sleep onset latencies were collected on these nights via a home sleep test device (HST; Zmachine Synergy, General Sleep Corp., Cleveland, OH).
Pod ON (Week 2)
For the next seven nights (Nights 8–14), subjects slept on the Pod with temperature regulation on (Pod ON). See below for details of how the Pod’s temperature regulation works. To assess short-term vs. longer-term changes in sleep with Pod ON, subjects wore the HST on Nights 8–9 and 13–14, which corresponded to the first two and last two nights of the one-week Pod ON intervention. Subjects continued to wear the Fitbit and complete the daily surveys for Nights 8–14. After Night 14, subjects completed a post-Pod PSQI (see SI Appendix D) that asked about their sleep habits during the week of Pod ON. Because the post-Pod PSQI questions were slightly altered to reflect the shorter recall period (one week vs. one month), only a subset of PSQI components could be compared from pre- to post-PSQI (see Statistical Analysis for more details).
Pod OFF End (final two nights)
On Nights 15 and 16, subjects slept with Pod OFF again to determine whether any effects of sleeping with Pod ON persisted after the Pod temperature regulation was turned OFF. Subjects wore the HST on these last two nights and continued to wear the Fitbit and complete the daily surveys. On Day 17, the study equipment was collected from subjects.
Temperature Compliance
To ensure subjects complied with the Pod ON and Pod OFF schedule, their Pod temperature data were monitored daily. The temperature data were recorded in real time and stored in a relational database (PostgreSQL), which was queried for each subject to determine their temperature values for each night of the study. A subject was considered temperature compliant if they slept with Pod OFF for their first two nights on the Pod (Nights 6 and 7), slept with Pod ON for the subsequent seven nights (Nights 8–14), and then turned the Pod OFF for the final two nights (Nights 15 and 16). Subjects who previously owned a Pod were considered temperature compliant if they had Pod OFF from at least three days before the study began through Night 7. Subjects who did not follow this schedule were asked to repeat a night. A subject who repeated a night for any reason could still be considered temperature compliant if they had more than two nights with Pod OFF at the beginning, at least six nights with Pod ON during Week 2, and two or more nights with Pod OFF at the end. Only seven subjects had to repeat a night to remain temperature compliant.
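The compliance rules for new Pod owners can be expressed as a check on the run structure of nightly Pod status. The sketch below assumes a hypothetical encoding in which each night on the Pod is labeled "OFF" or "ON", in chronological order, starting from the first night on the Pod. It encodes the repeat-night rule exactly as stated above and does not cover the separate rule for previous Pod owners.

```python
from itertools import groupby
from typing import List, Tuple


def status_runs(statuses: List[str]) -> List[Tuple[str, int]]:
    """Collapse nightly statuses ("OFF"/"ON") into (status, run length) pairs."""
    return [(status, len(list(group))) for status, group in groupby(statuses)]


def is_temperature_compliant(statuses: List[str]) -> bool:
    """statuses: hypothetical nightly Pod status, starting with the first night on the Pod."""
    runs = status_runs(statuses)
    # The schedule must be one OFF block, then one ON block, then OFF again.
    if [status for status, _ in runs] != ["OFF", "ON", "OFF"]:
        return False
    begin, on, end = (length for _, length in runs)
    # Original schedule: Nights 6-7 OFF, Nights 8-14 ON, Nights 15-16 OFF.
    original_schedule = (begin, on, end) == (2, 7, 2)
    # Rule for subjects who repeated a night, encoded literally from the text.
    repeat_rule = begin > 2 and on >= 6 and end >= 2
    return original_schedule or repeat_rule
```

A night switched back to OFF in the middle of the Pod ON week produces five runs instead of three, so that subject fails the check regardless of the run lengths.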
Physiological Data
The Eight Sleep Pod
The Eight Sleep Pod has two main capabilities: 1) continuously regulating the temperature of the water flowing through the mattress cover during the night, and 2) collecting biometric data (HR, HRV, respiratory rate, and sleep staging). Both functions operate independently on each side of the bed while the person sleeps. The Pod consists of a hub that sits beside the bed and a cover that fits over the mattress like a thick fitted sheet. Inside the cover, water flows through a water mat to heat or cool the bed according to the subject’s preference. The Pod temperature is controlled with a temperature dial in the Eight Sleep application (iOS and Android). Each subject programmed the circulating water temperature through the app. Achievable water temperatures range from ~13 to ~43°C, corresponding to settings of −10 to +10 on the app’s temperature dial. Each person sleeping on the Pod can independently program the temperature on their half of the bed. A customizable temperature profile consists of three temperature settings that cycle automatically through the night. The first setting, the Bedtime Phase, lasts from the time the person gets into bed until 15 min after persistent sleep is detected. The second, the Early Phase, lasts for four hours after the Bedtime Phase ends. The third, the Late Phase, lasts from the end of the Early Phase until waking.
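The three-Phase schedule can be illustrated as a lookup from a clock time to the Phase in effect. This is a minimal sketch that assumes the bed-entry time, the persistent-sleep detection time, and the wake time are already known as `datetime` values. The function name `phase_at` is hypothetical, and the sketch does not reproduce how the Pod itself detects persistent sleep.

```python
from datetime import datetime, timedelta
from typing import Optional

BEDTIME_PAD = timedelta(minutes=15)  # Bedtime Phase ends 15 min after persistent sleep
EARLY_LENGTH = timedelta(hours=4)    # Early Phase lasts four hours


def phase_at(t: datetime, bed_time: datetime, persistent_sleep: datetime,
             wake_time: datetime) -> Optional[str]:
    """Return the temperature Phase in effect at time t, or None outside the night."""
    if t < bed_time or t > wake_time:
        return None
    early_start = persistent_sleep + BEDTIME_PAD
    late_start = early_start + EARLY_LENGTH
    if t < early_start:
        return "Bedtime"
    if t < late_start:
        return "Early"
    return "Late"  # runs until waking
```

For example, if a subject gets into bed at 22:30 and persistent sleep is detected at 23:00, the Early Phase runs from 23:15 to 03:15 and the Late Phase follows until wake time.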
To help subjects quickly select thermally neutral settings and minimize any adjustment period on the Pod, the research associates recommended different temperature profiles for women and men. Women were recommended −1, 0, and +1, and men were recommended −2, −1, and 0. These settings correspond to water temperatures of approximately 26, 27, and 29°C vs. 25, 26, and 27°C, respectively. Although subjects received these recommendations, they were allowed to create any temperature profile they preferred during Pod ON and to keep adjusting it throughout the week as desired.
Note that although physiological data from the Pod were recorded throughout sleep, none of these data were used in this manuscript. This was to ensure that any findings during Pod ON were based on independent third-party devices (Fitbit and HST).
HST
For eight nights, subjects wore the Zmachine Synergy, an HST that records the following:
- single-channel electroencephalogram (EEG) from three electrodes;
- respiratory effort via respiratory inductance plethysmography;
- respiratory airflow via nasal cannula;
- oxygen saturation via pulse oximetry;
- heart rate via photoplethysmography;
- body position (rotation and tilt in degrees) via a tri-axis accelerometer.

The Zmachine system generates sleep stages from the EEG signal using Z-ALG, an FDA-cleared automated single-channel EEG sleep staging algorithm with a Cohen’s kappa agreement of 0.72 [27]. An Eight Sleep research associate guided subjects through proper HST setup, either during the Pod installation (for subjects without a Pod) or via Zoom (for subjects with a Pod). To maximize adherence of the EEG electrodes throughout the night, subjects were instructed to attach the three EEG electrodes to the skin (behind the ears and on the back of the neck) ~20 min before attaching the wires. Subjects fitted the HST respiratory belt over light clothing at the level of the breastbone/nipple line. They wore the nasal cannula taped to their cheeks and secured behind their ears. Subjects were also provided with General Sleep videos and manuals on proper HST setup, Zoom check-ins, and warnings about actions that could impede data collection. If subjects reported any issues with the HST in the daily survey, the research associate followed up to ask questions and help prevent repeated issues.
All HST data were exported as EDF files and preprocessed in EDFbrowser (version 2.0) [28]. Sleep stages are stored as annotations in the EDF file in 30 s epochs. The stages are wake (W), combined light sleep (N1 and N2), combined deep sleep (N3 and N4), rapid eye movement (REM; R), and inconclusive (“?”). A night with fewer than three hours of sleep stage data was classified as invalid. If a subject’s night was invalid because of HST checks or temperature compliance checks (see above), the subject was asked to repeat the night. Nine subjects repeated a night with the HST during the study. However, a repeat night was included only if it fell in the proper sequence relative to when the Pod was ON or OFF. Furthermore, within Pod ON, the repeat HST night had to occur at the beginning or end of the week, because there were three nights without HST in the middle of the week.
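The epoch-to-minute conversion and the three-hour validity rule can be sketched as follows. This sketch assumes the EDF annotations have already been read into a list of single-letter stage labels, one per 30 s epoch ("W", "L", "D", "R", "?"). That label encoding is hypothetical. The sketch also reads "sleep stage data" as any epoch not scored "?", which is an assumption.

```python
from collections import Counter
from typing import Dict, List

STAGES = ("W", "L", "D", "R", "?")  # wake, light, deep, REM, inconclusive (hypothetical labels)
EPOCHS_PER_MIN = 2                   # 30 s epochs


def stage_minutes(stages: List[str]) -> Dict[str, float]:
    """Minutes in each stage: number of 30 s epochs divided by two."""
    counts = Counter(stages)
    return {s: counts[s] / EPOCHS_PER_MIN for s in STAGES}


def is_valid_night(stages: List[str], min_hours: float = 3.0) -> bool:
    """A night is invalid if it has fewer than min_hours of scored (non-'?') stage data."""
    scored_min = sum(1 for s in stages if s != "?") / EPOCHS_PER_MIN
    return scored_min >= min_hours * 60
```

Both functions work on the label sequence alone, so any EDF reader that yields one label per epoch can feed them.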
Fitbit HR, HRV, and Exercise Data
The Fitbit was worn continuously throughout the study to collect sleeping HR and HRV, along with physical activity data including time spent in HR zones and total daily steps. The Fitbit Versa’s HR measurement has been validated against electrocardiography (ECG) (r = 0.91) [29]. For each exercise session logged by a subject or auto-detected by Fitbit, each exercise minute was assigned to an HR zone based on that minute’s HR as a percentage of the subject’s age-predicted maximum HR. Subjects were asked to enter their demographic information (age, weight, height, and sex) in the Fitbit application at the beginning of the study so that HR zones were classified accurately. Fitbit classifies the fat burn, cardio, and peak HR zones as 40–59%, 60–84%, and > 85% of maximum HR, respectively [30]. After the study, the Eight Sleep research team accessed the Fitbit data from two sources. The first was the free Fitbit web data export, accessed by logging in to each subject’s account (which the research team had created for them). The second was the Terra API (England, UK) integration in the Eight Sleep app, from which data were queried. From these two sources, we obtained daily data for each subject, including minutes spent in each HR zone, total steps, minimum sleeping HR, and median sleeping HRV. Fitbit sleep data were used only as a secondary check on HST data when substantial data were missing due to noisy EEG (see details below).
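The per-minute HR zone classification can be sketched as below. Two points are assumptions not stated in the text and are labeled in the code. First, age-predicted maximum HR is computed with the common 220 − age formula. Second, the zone boundaries are treated as half-open intervals ([40, 60), [60, 85), and ≥ 85%), so that no minute falls between zones.

```python
from collections import Counter
from typing import Iterable, Optional


def age_predicted_max_hr(age: int) -> int:
    # ASSUMPTION: the common 220 - age formula; the text does not specify one.
    return 220 - age


def hr_zone(hr: float, age: int) -> Optional[str]:
    """Classify one exercise minute by HR as a percentage of age-predicted max HR."""
    pct = 100 * hr / age_predicted_max_hr(age)
    # ASSUMPTION: half-open boundaries so that 84-85% is not left unclassified.
    if pct >= 85:
        return "peak"
    if pct >= 60:
        return "cardio"
    if pct >= 40:
        return "fat burn"
    return None  # below the fat burn zone


def minutes_per_zone(minute_hrs: Iterable[float], age: int) -> Counter:
    """Count exercise minutes in each zone; minutes below 40% are not counted."""
    zones = (hr_zone(hr, age) for hr in minute_hrs)
    return Counter(z for z in zones if z is not None)
```

For a 30-year-old (maximum HR of 190 under the assumed formula), a minute at 130 bpm is about 68% of maximum and falls in the cardio zone.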
Data Quality Checks & Data Processing
HST Post-processing
Further processing of the HST data was performed to calculate SOL, latency to persistent sleep (LPS), TST, REM sleep latency, deep sleep latency (Ldeep), and SE for the entire night. To calculate SOL, we first identified sleep onset, defined as the first occurrence of sleep in any sleep stage within the first hour containing at least 10 minutes of sleep in any sleep stage. Because subjects turned on the HST right before going to sleep, SOL was calculated as the difference in minutes between sleep onset and the start of the nightly HST recording. SOL is often calculated from the first epoch of stage 2 sleep [31], but Z-ALG returns stages 1 and 2 together as light sleep [27], so the two cannot be distinguished. To overcome this, we identified the first hour in which the individual was asleep for more than 10 minutes. Within that first hour of substantial sleep, we then found the exact time when sleep started. The 10-minute threshold avoids counting intermittent light sleep as the start of the sleep period. Two individuals blinded to Pod OFF vs. Pod ON manually reviewed this method. LPS was defined as the time from the start of the HST recording to the first occurrence of 10 consecutive minutes of sleep in any stage [32]. If the HST was removed before the recording was stopped, the file could contain upwards of 10 hours of “?” stages. The end of the file (sleep offset) was therefore defined as the last occurrence of a REM, deep, light, or wake stage. TST was calculated as the sum of all minutes spent in REM, deep, and light sleep from sleep onset until the end of the file. Minutes spent in REM, deep, light, wake, and “?” stages were calculated by dividing the total number of 30 s epochs in each stage by two. REM latency was defined as the time between sleep onset and the first occurrence of REM sleep.
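The whole-night metrics above can be sketched from a list of per-epoch stage labels. Several details are assumptions and are labeled in the code. The stage labels ("W", "L", "D", "R", "?") are hypothetical. The sleep-onset rule is read as "the first sleep epoch whose following 60 minutes contain at least 10 minutes of sleep", which is one interpretation of the hourly criterion, not necessarily the authors’ exact implementation.

```python
from typing import Dict, List, Optional

EPOCH_MIN = 0.5           # one 30 s epoch, in minutes
SLEEP = {"L", "D", "R"}   # light, deep, REM (hypothetical label encoding)


def sleep_onset(stages: List[str], window_min: int = 60, min_sleep_min: int = 10) -> Optional[int]:
    """First sleep epoch whose following hour holds >= 10 min of sleep (one reading of the rule)."""
    window = int(window_min / EPOCH_MIN)
    needed = int(min_sleep_min / EPOCH_MIN)
    for i, stage in enumerate(stages):
        if stage in SLEEP and sum(s in SLEEP for s in stages[i:i + window]) >= needed:
            return i
    return None


def latency_to_persistent_sleep(stages: List[str], run_min: int = 10) -> Optional[float]:
    """Minutes from recording start to the first 10 consecutive minutes of sleep."""
    run = int(run_min / EPOCH_MIN)
    streak = 0
    for i, stage in enumerate(stages):
        streak = streak + 1 if stage in SLEEP else 0
        if streak == run:
            return (i - run + 1) * EPOCH_MIN
    return None


def sleep_offset(stages: List[str]) -> Optional[int]:
    """Last epoch scored W, L, D, or R; trailing '?' epochs are ignored."""
    for i in range(len(stages) - 1, -1, -1):
        if stages[i] != "?":
            return i
    return None


def night_summary(stages: List[str]) -> Optional[Dict[str, Optional[float]]]:
    onset, offset = sleep_onset(stages), sleep_offset(stages)
    if onset is None or offset is None:
        return None
    night = stages[onset:offset + 1]
    first_rem = next((i for i, s in enumerate(night) if s == "R"), None)
    return {
        "SOL": onset * EPOCH_MIN,  # recording starts right before going to sleep
        "LPS": latency_to_persistent_sleep(stages),
        "TST": sum(s in SLEEP for s in night) * EPOCH_MIN,
        "REM_latency": None if first_rem is None else first_rem * EPOCH_MIN,  # from sleep onset
    }
```

Under this reading, two isolated light-sleep epochs followed by more than an hour of wake do not count as sleep onset, because the following hour contains only 1 min of sleep.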
The number of awakenings was defined as the number of awake periods following sleep onset, counted as the distinct runs of consecutive wake stages [33]. SE was defined as TST divided by the duration of the HST recording up to the calculated end of the file, with minutes spent in the “?” stage excluded from both the numerator and the denominator [34].
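The awakening count and SE definitions can be sketched with the same hypothetical stage labels. The sketch assumes that sleep onset and sleep offset epoch indices have already been computed. It also assumes that a “?” epoch ends a wake run, a detail the text does not specify.

```python
from typing import List

EPOCH_MIN = 0.5
SLEEP = {"L", "D", "R"}  # hypothetical labels for light, deep, REM


def count_awakenings(stages: List[str], onset: int, offset: int) -> int:
    """Distinct runs of consecutive 'W' epochs from sleep onset to sleep offset."""
    awakenings, in_wake = 0, False
    for stage in stages[onset:offset + 1]:
        is_wake = stage == "W"
        if is_wake and not in_wake:
            awakenings += 1   # a new wake run starts here
        in_wake = is_wake     # in this sketch, a '?' epoch also ends a wake run
    return awakenings


def sleep_efficiency(stages: List[str], onset: int, offset: int) -> float:
    """TST divided by recording duration up to sleep offset, with '?' minutes excluded."""
    recorded = stages[:offset + 1]
    tst = sum(s in SLEEP for s in stages[onset:offset + 1]) * EPOCH_MIN  # TST never includes '?'
    scored_duration = sum(s != "?" for s in recorded) * EPOCH_MIN
    return tst / scored_duration
```

The denominator starts at the beginning of the recording rather than at sleep onset, so a long SOL lowers SE, as in the definition above.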
Additionally, we calculated the following metrics separately for the Early and Late Phases (defined above): TST, total minutes in each sleep stage, the percentage of time in each sleep stage relative to TST, and the total number of awakenings. To do so, the 30 s epochs were assigned to their respective Phases. The number of minutes spent in REM, deep, light, and wake stages in each Phase was then calculated as described above. TST for each Phase was the sum of minutes spent in REM, deep, and light sleep during that Phase. The percentage of wake time was calculated as minutes in the wake stage divided by the total number of minutes in that Phase. The percentage of time in each of the other sleep stages was calculated as minutes in that stage during the Early or Late Phase divided by TST for that Phase.
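The per-Phase summaries can be sketched from epochs that have already been tagged with a Phase. The input format is a list of (stage, Phase) pairs with hypothetical labels. Two further assumptions are labeled in the code: the wake-percentage denominator includes “?” epochs, and a wake run that crosses the Early/Late boundary is counted once in each Phase.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

EPOCH_MIN = 0.5
SLEEP = ("L", "D", "R")   # hypothetical labels for light, deep, REM
STAGES = ("W",) + SLEEP


def per_phase_metrics(epochs: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """epochs: (stage, phase) per 30 s epoch; phase is 'Bedtime', 'Early', or 'Late'."""
    by_phase = defaultdict(list)
    for stage, phase in epochs:
        if phase in ("Early", "Late"):
            by_phase[phase].append(stage)
    results = {}
    for phase, stages in by_phase.items():
        counts = Counter(stages)
        minutes = {s: counts[s] * EPOCH_MIN for s in STAGES}
        tst = sum(minutes[s] for s in SLEEP)
        total = len(stages) * EPOCH_MIN  # ASSUMPTION: all minutes in the Phase, '?' included
        row = {"TST": tst, "pct_W": 100 * minutes["W"] / total}
        # ASSUMPTION: a wake run spanning the Phase boundary counts in both Phases.
        row["awakenings"] = sum(
            1 for i, s in enumerate(stages) if s == "W" and (i == 0 or stages[i - 1] != "W")
        )
        for s in STAGES:
            row[f"{s}_min"] = minutes[s]
        for s in SLEEP:
            row[f"pct_{s}"] = 100 * minutes[s] / tst if tst else float("nan")
        results[phase] = row
    return results
```

Wake is expressed relative to the full Phase, while light, deep, and REM are expressed relative to the Phase’s TST, matching the two different denominators in the text.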
Filtering Methods
Of the 69 subjects who completed the study, only those who were temperature compliant and had sufficient sleep data were retained. Subjects who were not temperature compliant (see above) were excluded from data analysis. A total of 56 subjects were temperature compliant: 49 by following the original schedule and 7 by repeating a night (as discussed above).
For both the HST and Fitbit datasets, each night was matched with the subject’s temperature status (i.e., Pod ON or Pod OFF) for that night. The temperature status categories were: 1) Pod OFF Baseline (first seven nights of baseline, with Pod OFF or not yet installed), 2) Pod ON beginning (first four nights of the Pod ON week), 3) Pod ON end (last three nights of the Pod ON week), and 4) Pod OFF End (final two nights of the experiment, with Pod OFF). Of the twelve previous Pod owners, four did not turn temperature regulation OFF at least three days before Night 1. For these four individuals, up to two nights of Fitbit and survey data were removed from the beginning of their Pod OFF Baseline data. This ensured that they had at least three nights with Pod OFF before their data were used for analyses.
Fitbit sleep sessions shorter than three hours were removed from the dataset because Fitbit does not report HR, HRV, or sleep staging for sessions under 3 h [35]. Subjects were included in the Fitbit analysis (n = 54) only if they had at least four of seven nights for Pod OFF Baseline, five of seven nights for Pod ON beginning and Pod ON end combined, and one of two nights for Post-Pod OFF. After these criteria were applied, HST sleep sessions with TST < 3 h (~7% of nights) were removed to match the 3 h Fitbit cutoff.
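The Fitbit inclusion rule can be sketched as a per-subject count of usable nights in each temperature status category. The category keys ("baseline", "pod_on_beginning", "pod_on_end", "post_off") are hypothetical names for the categories defined above. The input is assumed to be one (category, session length in hours) pair per night.

```python
from collections import Counter
from typing import List, Tuple

MIN_SESSION_H = 3.0  # Fitbit reports no HR, HRV, or staging below 3 h
MIN_NIGHTS = {"baseline": 4, "pod_on": 5, "post_off": 1}


def include_in_fitbit_analysis(nights: List[Tuple[str, float]]) -> bool:
    """nights: (hypothetical category key, sleep session length in hours) for one subject."""
    valid = Counter()
    for category, hours in nights:
        if hours < MIN_SESSION_H:
            continue  # short sessions are dropped before counting
        # Pod ON beginning and Pod ON end are pooled for the 5-of-7 rule.
        key = "pod_on" if category in ("pod_on_beginning", "pod_on_end") else category
        valid[key] += 1
    return all(valid[k] >= n for k, n in MIN_NIGHTS.items())
```

Because short sessions are dropped first, one sub-3 h night can push a subject below a category minimum even though the subject wore the Fitbit that night.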
Several HST nights had a substantial portion of the night scored as the “?” stage (due to loose electrodes or noisy EEG data), so further filtering was applied to keep only nights with reliable sleep staging. For each night, we calculated the longest run of consecutive minutes in the “?” stage. This longest run was divided by the total duration of the HST recording (including sleep and wake time) to obtain a percentage of missing (“?”) data. Any night with > 16% missing data was removed. The 16% threshold was chosen by visually examining the percentage of time spent in deep and REM stages. Aggregate statistics for the percentage of time in deep and REM stages differed little between filtering out nights with > 10% vs. > 16% missing data. However, nights with more than 16% missing data showed significant differences in time spent in deep and REM stages compared with Fitbit’s sleep staging percentages. After all filtering steps, subjects were included in the final analyses only if they had at least one HST night in each of the four temperature status categories (listed above). After these filtering methods were applied, 44 of 56 subjects were included in the HST analysis and 54 of 56 subjects in the Fitbit analyses (see Table 1).
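The missing-data filter and the subject-level HST inclusion rule can be sketched as follows, using hypothetical stage labels and category keys. One assumption is labeled in the code: the “total duration of the HST recording” is taken up to the last scored epoch, so that a long unscored tail after the device is removed is not counted as missing data within the night.

```python
from typing import Dict, List

CATEGORIES = ("baseline", "pod_on_beginning", "pod_on_end", "post_off")  # hypothetical keys


def longest_unscored_run(stages: List[str]) -> int:
    """Length, in epochs, of the longest run of consecutive '?' epochs."""
    best = current = 0
    for s in stages:
        current = current + 1 if s == "?" else 0
        best = max(best, current)
    return best


def missing_fraction(stages: List[str]) -> float:
    """Longest '?' run divided by recording length up to the last scored epoch."""
    last = max((i for i, s in enumerate(stages) if s != "?"), default=-1)
    recording = stages[:last + 1]  # ASSUMPTION: trailing '?' after sleep offset is trimmed
    if not recording:
        return 1.0
    return longest_unscored_run(recording) / len(recording)


def keep_night(stages: List[str], threshold: float = 0.16) -> bool:
    return missing_fraction(stages) <= threshold  # nights with > 16% missing are removed


def include_subject_hst(nights: Dict[str, List[List[str]]]) -> bool:
    """nights: category -> per-night stage sequences; need one kept night per category."""
    return all(any(keep_night(n) for n in nights.get(c, [])) for c in CATEGORIES)
```

Using the longest contiguous run, rather than the total number of “?” epochs, means that many short dropouts spread across a night do not disqualify it, while one long gap that could hide an entire sleep cycle does.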
Statistical Analysis
For Pod ON nights, each subject’s temperature in each Phase (Bedtime, Early, and Late) was summarized as a time-weighted mean. If a subject woke up in the middle of a Phase and changed their temperature, the weighted mean better represented the overall temperature they experienced because it accounted for the time spent at each temperature during that Phase. Pod OFF nights had no temperature settings, but time spent in each sleep stage was still binned into Early and Late Phases using the same time cutoffs as for Pod ON (defined above). Cool vs. warm Pod temperatures were defined by dichotomizing Pod temperatures at the sex-specific median for each of the three Phases (Bedtime, Early, and Late). Temperatures below the median were defined as cool, and those above the median as warm. The median Pod temperatures during Pod ON for women vs. men were 25.8°C vs. 23.4°C at Bedtime, 26.0°C vs. 25.4°C in the Early Phase, and 28.6°C vs. 26.8°C in the Late Phase.
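The time-weighted mean and the median split can be sketched as below. The input format is hypothetical: a sorted list of (minute the setting took effect, temperature in °C) pairs whose first entry is at the start of the Phase. The text does not say how temperatures exactly at the median were handled, so this sketch leaves them unassigned.

```python
from statistics import median
from typing import List, Optional, Tuple


def weighted_mean_temp(settings: List[Tuple[float, float]], phase_end: float) -> float:
    """settings: sorted (minute, temp_C) pairs, first at Phase start; weights are durations."""
    boundaries = [t for t, _ in settings[1:]] + [phase_end]
    weighted = sum(temp * (end - start) for (start, temp), end in zip(settings, boundaries))
    return weighted / (phase_end - settings[0][0])


def dichotomize(temps: List[float]) -> List[Optional[str]]:
    """Split one sex's temperatures for one Phase at their median."""
    m = median(temps)
    # Values exactly at the median are left unassigned (None); the text does not say.
    return ["cool" if t < m else "warm" if t > m else None for t in temps]
```

For example, a subject who starts a 240 min Early Phase at 25°C and raises the setting to 27°C after 60 min has a weighted mean of (25 × 60 + 27 × 180) / 240 = 26.5°C. A simple mean of the two settings would give 26°C.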
Linear mixed models were used to evaluate differences in physiological data (HR and HRV) and in time spent in each sleep stage between Pod OFF and Pod ON. All models met the assumptions of linearity, normality of errors, and homoscedasticity, which were evaluated visually using Q-Q plots, histograms of residuals, and plots of residuals against the outcome. Pod temperature was analyzed in three separate models: 1) as a binary variable (ON vs. OFF); 2) as a categorical variable with Pod OFF Baseline as the reference, compared with sleeping with Pod ON or Post-Pod OFF; and 3) as a second categorical variable with all Pod OFF nights as the reference, compared with sleeping at cool vs. warm Pod temperatures during Pod ON. Models evaluating cool vs. warm temperatures were stratified by sex. Sleep-stage-specific analyses were restricted to subjects who had at least one night of HST data in both Pod OFF Baseline and Post-Pod OFF, and at least two nights of HST data during Pod ON (n = 44; see detailed explanation above).
Cumulative link mixed models (CLMMs) were used to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the relationship of Pod temperature with the ordinal daily perceptual questions and with six PSQI components. The proportional odds assumption was evaluated by comparing the CLMMs with multinomial models, which do not assume proportional odds. Models were compared using the Akaike information criterion (AIC), a measure of model goodness of fit [36]. The CLMMs produced lower AICs than the multinomial regressions, indicating that the CLMM was the better fit for the data. The PSQI analysis evaluated changes in six of the seven PSQI components: sleep quality, sleep onset latency, sleep duration, sleep efficiency, sleep medication use, and daytime dysfunction. Changes in the sleep disturbances component and the global PSQI score were not evaluated because their validated scoring requires responses covering one month, which could not be assessed after one week of Pod ON. Pod temperature was analyzed in separate models as both a binary variable (Pod ON vs. OFF) and a categorical variable. In the categorical models, Pod OFF Baseline was the reference, compared with sleeping at a cool vs. warm Pod temperature. Each model included a random intercept for subject to account for between-subject differences and the resulting correlation of repeated responses within subjects. For each PSQI component, a higher score indicates a worse sleep outcome. Therefore, an OR < 1 for any PSQI component indicated a sleep improvement compared with Pod OFF Baseline. Conversely, for the daily perceptual questions (except the thermal sensation question), a higher score indicated a better sleep outcome, so an OR > 1 indicated an improvement in sleep compared with Pod OFF Baseline.
For the daily survey question on thermal sensation, an OR < 1 indicated that, on average, subjects felt warmer during sleep, whereas an OR > 1 indicated that subjects felt cooler during sleep, compared with Pod OFF. In sensitivity analyses, the 12 previous Pod owners were excluded from the daily survey perceptual analyses. Previous owners may be accustomed to the Pod and may perceive Pod OFF differently from subjects who had never slept on the Pod.
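The odds ratio interpretation and the AIC comparison above rest on two short formulas, sketched below. The sketch assumes the common cumulative-link parameterization logit P(Y ≤ j) = θj − xβ (the form used by R’s `ordinal::clmm`), under which OR = exp(β) > 1 shifts responses toward higher categories. It also assumes Wald-type 95% intervals. The coefficient, standard error, and log-likelihood values in the usage lines are hypothetical and are not results from this study.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def odds_ratio_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """OR and Wald 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, 2k - 2 ln L; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical values for illustration only.
or_, lo, hi = odds_ratio_ci(beta=0.5, se=0.2)   # roughly 1.65 (1.11, 2.44)
clmm_aic = aic(log_likelihood=-500.0, n_params=8)
multinomial_aic = aic(log_likelihood=-495.0, n_params=20)
```

In this hypothetical comparison, the multinomial model has the higher log-likelihood, but its extra parameters give it the larger AIC (1030 vs. 1016). The proportional-odds model would therefore be preferred.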
To evaluate the mind-body connection, we examined how changes in nightly sleep metrics from Pod OFF to Pod ON related to perceptual responses the following day. In the subset of individuals with HST data (n = 44), CLMMs were used to evaluate the relationship between the daily perceptual questions and changes in HST-measured sleep metrics. For each individual and temperature Phase, change scores were calculated by subtracting the average time spent in each sleep stage during Pod OFF from each night’s sleep stage duration with Pod ON. These values were dichotomized: a change score > 0 indicated an improvement in sleep with Pod ON, and a change score < 0 indicated a worsening. For wake time and number of awakenings, this interpretation was reversed, so negative change scores (less wake or fewer awakenings with Pod ON) were considered an improvement. The goal of these analyses was to determine whether improvements in HST-measured sleep metrics were reflected in perceived sleep improvements.
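The change-score dichotomization can be sketched for one subject, one Phase, and one metric. The metric names are hypothetical. Zero change scores, which the text does not classify, are returned as `None`.

```python
from statistics import mean
from typing import List, Optional

LOWER_IS_BETTER = {"wake_min", "awakenings"}  # hypothetical metric names


def classify_changes(off_values: List[float], on_values: List[float],
                     metric: str) -> List[Optional[str]]:
    """Label each Pod ON night relative to this subject's mean Pod OFF value."""
    baseline = mean(off_values)  # average Pod OFF value for this subject and Phase
    labels = []
    for value in on_values:
        change = value - baseline
        if change == 0:
            labels.append(None)  # a zero change is not classified in the text
        elif metric in LOWER_IS_BETTER:
            labels.append("improved" if change < 0 else "worsened")
        else:
            labels.append("improved" if change > 0 else "worsened")
    return labels
```

For example, with Pod OFF deep sleep of 60 and 70 min (mean 65 min), a Pod ON night with 75 min of deep sleep is labeled an improvement. For wake time, the same arithmetic applies but a decrease counts as the improvement.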
For all models, alpha was set at 0.05. All models were evaluated for effect modification by sex using a multiplicative interaction term. All statistical analyses and visualizations were performed in R (version 4.2.2) [37], and all data pre-processing was performed in Python (version 3.7).