False Atrial Fibrillation Alerts from Smartwatches are Associated with Decreased Perceived Physical Well-being and Confidence in Chronic Symptoms Management

Wrist-based wearables have been FDA approved for AF detection. However, the health behavior impact of false AF alerts from wearables on older patients at high risk for AF is not known. In this work, we analyzed data from the Pulsewatch (NCT03761394) study, which randomized patients (≥50 years) with history of stroke or transient ischemic attack to wear a patch monitor and a smartwatch linked to a smartphone running the Pulsewatch application vs to only the cardiac patch monitor over 14 days. At baseline and 14 days, participants completed validated instruments to assess for anxiety, patient activation, perceived mental and physical health, chronic symptom management self-efficacy, and medicine adherence. We employed linear regression to examine associations between false AF alerts and change in patient-reported outcomes. Receipt of false AF alerts was related to a dose-dependent decline in self-perceived physical health and levels of disease self-management. We developed a novel convolutional denoising autoencoder (CDA) to remove motion and noise artifacts in photoplethysmography (PPG) segments to optimize AF detection, which substantially reduced the number of false alerts. A promising approach to avoid negative impact of false alerts is to employ artificial intelligence driven algorithms to improve accuracy.


Introduction
Early detection of atrial fibrillation (AF) can prevent the devastating consequences of strokes [1][2][3]. Nearly 20% of those who suffer ischemic strokes associated with AF are first diagnosed with the arrhythmia at the time of the stroke or soon after, highlighting the need for timely AF detection [4,5]. The American Heart Association recommends heart rhythm monitoring for undiagnosed AF in patients who suffered an embolic stroke of undetermined source [6,7]. Non-invasive adhesive monitors, such as the Zio® Patch, Cardiac Insight patch, or mobile cardiac outpatient telemetry (MCOT™) devices are often prescribed for this purpose [8][9][10]. The typical wear time for Zio® Patch or Body Guardian is 14 days, whereas MCOT™ devices can be used for up to 30 days of monitoring. Implantable cardiac monitors (ICM), which provide AF monitoring over years, have been shown to detect higher rates of AF than conventional monitoring in the 12 months after a cryptogenic stroke, suggesting that prolonged monitoring after stroke can lead to higher rates of AF detection [11]. Despite the longer duration of surveillance with ICM, the higher cost and invasiveness of these devices limit their broad adoption [12]. The Pew Research Center reports that one in five U.S. adults wears a smartwatch with a fitness tracker in their daily lives. Many such devices are now capable of AF detection [13]. Several commercially available smartwatches, including from Apple, Samsung, and Fitbit, have been shown to be highly sensitive and specific for AF detection and have both pulse plethysmographic and electrocardiographic algorithms capable of detecting rhythm irregularity that are cleared by the FDA for incident AF detection [14][15][16][17]. To date, studies exploring smartwatches' ability to detect AF include lower-risk populations, predominantly young smartwatch owners [18].
The usability and accuracy of wrist-based wearables to identify AF in populations at high risk for arrhythmias, including older stroke survivors, have not been thoroughly evaluated [19]. Older adults report less familiarity with digital health technology and thus may be particularly impacted by notifications from smartwatches prescribed for AF monitoring. The Apple Heart study demonstrated that of the 450 participants who received notifications for irregular pulse, only 34% had AF diagnosed with subsequent ECG patch monitor [18]. We hypothesize that false AF alerts may adversely impact quality of life and patient self-perceived well-being. We explore this hypothesis by analyzing data from the Pulsewatch study, a randomized clinical trial (NCT03761394) that examined the accuracy and acceptability of a smartwatch-smartphone app dyad for AF detection among stroke survivors [20].

Study Design and Population
We analyzed data from the Pulsewatch study, a randomized controlled trial in which participants were randomized 3:1 into intervention and control groups. In Phase I of the study, the intervention group was randomized to use the Pulsewatch system, an Android OS smartwatch-smartphone app dyad (Samsung Gear S3 or Samsung Galaxy Watch 3) capable of AF detection, and to wear a standard-of-care ECG patch monitor (Cardea Solo, Cardiac Insight, Seattle WA) for 14 days [20,21]. The control group was asked to wear the ECG patch monitor but did not receive a study smartwatch-smartphone dyad. The second phase of the Pulsewatch study focused on adherence to the smartwatch-smartphone app dyad and did not include contemporaneous patch monitoring or AF adjudication. As the focus of the present analysis was on the impact of false positive alerts, defined as alerts not triggered by AF as determined by ECG patch monitor and cardiologists, we used only information gathered from the intervention group from Phase I [20]. Eligible participants were recruited from neurology and cardiology ambulatory clinics at UMass Memorial Health from 2018 to 2021. Participants had to (1) be over 50 years old, (2) have had an ischemic stroke in the last decade, and (3) be willing to use the Pulsewatch system (smartwatch-smartphone app dyad) and undergo patch monitoring over the course of the study. Exclusion criteria included having contraindications to long-term anticoagulation, inability to provide informed consent, contraindication for wearing an ECG patch monitor (e.g., sensitivity or allergy to medical adhesives, implantable pacemaker), or having a life-threatening arrhythmia that required immediate analysis and in-patient monitoring. The methods were performed in accordance with relevant guidelines and regulations and approved by UMass Chan Institutional Review Board (H00009953).

Study Procedures
Eligible patients were identified through review of the electronic medical record. Once identified, an invitation letter was mailed to briefly describe the study, including a study number to call if they had additional questions or wanted to opt out of further contact. These potential participants were then approached at the time of their clinic appointment. If they chose to enroll in the study, they provided written informed consent and filled out a baseline study questionnaire assessing several sociodemographic and psychosocial domains. At baseline and 14 days, participants completed validated questionnaires, including the generalized anxiety disorder-7 (GAD-7) scale, consumer health activation index (CHAI), the SF-12 (physical and mental health) survey, and the general disease management scale [22][23][24][25]. All participants were given a comprehensive set of instructions about proper watch and patch monitor use as well as in-person training at enrollment on operation of all study devices. Participants who were randomized to use the Pulsewatch system were coached to wear the watch as much as possible and to engage with the Pulsewatch phone app to log symptoms or review their data (including any alerts). Since the first phase of the study focused on accuracy, the study staff called intervention participants on the third and seventh days of the study to encourage watch wear and help troubleshoot any technical challenges they might experience. In our study, participants wore the Samsung smartwatch with a PPG sensor program that ran every 10 min [26]. The duration of the sensor-on stage was 5 min, but it could be extended based on PPG findings (Figure 1). During the sensor-on stage, should there be 1.5 min of AF detected, participants were asked to "please hold still" for a further 1.0 min of analysis. An "Abnormality" alert appeared if AF was detected in the 1.0 min monitoring period.
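The alert sequence described above can be written as a small decision rule. The following Python sketch is illustrative only and is not the Pulsewatch implementation: the function name, its arguments, and the choice to treat 1.5 min as an inclusive threshold are assumptions made for the example, while the 1.5 min and 1.0 min values come from the text.

```python
from typing import List

AF_TRIGGER_MIN = 1.5       # minutes of AF in the sensor-on stage that trigger the prompt (from the text)
CONFIRM_WINDOW_MIN = 1.0   # length of the follow-up analysis window in minutes (from the text)


def notifications_for_cycle(af_minutes_in_sensor_on: float,
                            af_in_confirmation_window: bool) -> List[str]:
    """Return the notifications shown during one sensor-on cycle.

    Hypothetical helper: the inputs stand in for the output of the watch's
    AF detector, which is not reproduced here.
    """
    shown: List[str] = []
    # Assumption: the prompt appears once at least 1.5 min of AF has been detected.
    if af_minutes_in_sensor_on >= AF_TRIGGER_MIN:
        shown.append("please hold still")
        # The alert appears only if AF is detected again in the 1.0 min window.
        if af_in_confirmation_window:
            shown.append("Abnormality")
    return shown


print(notifications_for_cycle(2.0, True))
print(notifications_for_cycle(1.5, False))
print(notifications_for_cycle(0.5, True))
```

The two-stage structure means that a noisy segment alone prompts the participant to hold still, and an alert is issued only when the quieter follow-up window also indicates AF.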
In our study, we considered any participant with both a "Please stay still" and "Abnormality" notification as having an AF alert. Smartwatch alerts for AF that did not correspond with AF identified on their Cardiac Insight ECG patch and cardiologist overread were considered to be false alerts. Sociodemographic, psychosocial, and clinical characteristics were compared between participants who received false alerts and those who did not. Chi-square tests and t-tests were used to examine between-group differences for categorical and continuous variables, respectively. We used linear regression models to examine the association between receipt of false alerts and changes from baseline in anxiety, patient activation, and self-rated physical and mental health status over the 14-day period. We did not adjust for confounding variables due to our modest sample size. A P-value < 0.05 was considered to be statistically significant and all statistical analyses were completed using SAS 9.3.
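To make the regression concrete, the sketch below fits an unadjusted linear model of a change score (14-day value minus baseline) on a 0/1 false-alert indicator. It is a minimal illustration in plain Python, not the SAS code used in the study; the function name and the six data points are invented for the example. With a single binary predictor, the slope equals the difference in mean change between the two groups.

```python
from math import sqrt
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of y on one predictor x with an intercept.

    Returns (slope, standard error of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, se_slope


# Invented illustrative data, NOT study data:
# change in a health score for four participants without and two with false alerts.
false_alert = [0, 0, 0, 0, 1, 1]
change = [1.0, -2.0, 0.5, 3.0, -8.0, -6.0]

beta, se = simple_ols(false_alert, change)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
```

In this toy data the mean change is 0.625 without false alerts and -7.0 with them, so the fitted slope is -7.625. A negative slope indicates a larger decline among those who received false alerts.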

Artificial Intelligence Driven Algorithms to Improve Accuracy
To increase data usability for AF detection, we propose reconstruction of artifact removed PPG segments using convolutional denoising autoencoder (CDA) [27]. Most prior studies typically used only one CDA for removing motion and noise artifact (MNA) from ECG with NSR [28]. Moreover, some studies have used CDA trained on NSR to remove MNA on data with AF. More recently, CDA was trained using equal number of AF and NSR data, but both approaches resulted in sub-optimal performances [29,30]. Our approach differs from prior CDA approaches in that we employ two distinct CDA models that were trained on either data with AF or data without AF ("non-AF"). In order to know which model to use on a segment with moderate MNA, however, we first had to classify the segment as either AF or non-AF using a deep learning (DL)-based classifier so that the appropriately trained CDA model for either AF or non-AF could be used to remove motion and noise artifact. The rationale for using two separate CDA models for either AF or non-AF is that, based on our investigation, we found that a specific CDA model trained for AF performed better on segments with AF than did a model that combined both AF and non-AF rhythms. Likewise, a specific CDA model trained for non-AF performed better on non-AF segments.
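The classify-then-denoise routing can be sketched as follows. This is a structural illustration only: `classify_is_af`, `cda_af`, and `cda_non_af` are hypothetical callables standing in for the trained deep learning classifier and the two CDA models, and the stub functions below are placeholders with no clinical meaning.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Sequence[float]
Denoiser = Callable[[Segment], List[float]]


def denoise_segment(segment: Segment,
                    classify_is_af: Callable[[Segment], bool],
                    cda_af: Denoiser,
                    cda_non_af: Denoiser) -> Tuple[str, List[float]]:
    """Classify a PPG segment, then denoise it with the rhythm-specific model."""
    if classify_is_af(segment):
        return "AF", cda_af(segment)
    return "non-AF", cda_non_af(segment)


# Placeholder stand-ins (assumptions for the example, not real models).
def stub_classifier(segment: Segment) -> bool:
    # Arbitrary rule so the example runs; a real classifier would be a trained network.
    return max(segment) - min(segment) > 1.0


def stub_cda_af(segment: Segment) -> List[float]:
    return [round(v, 1) for v in segment]


def stub_cda_non_af(segment: Segment) -> List[float]:
    return [round(v, 2) for v in segment]


label, cleaned = denoise_segment([0.11, 0.52, 0.48], stub_classifier,
                                 stub_cda_af, stub_cda_non_af)
print(label, cleaned)
```

Passing the classifier and both denoisers in as arguments keeps the routing step independent of how each model is implemented or trained.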

Results
Of the eighty-five participants who were randomized to receive smartwatch-smartphone dyads, 15 received AF alerts from Samsung smartwatches (Supplemental Table 1-3). Ten of the 15 participants (67%) who received alerts did not have AF detected on a contemporaneous ECG patch monitor. The average age of the 80 participants (10 who received false alerts and 70 who received no alerts) in our study was 64.3 ± 9.1 years old. Eighty-six percent were white and 43% were female. Ten out of eighty participants (12%) had false AF alerts. Five participants received ≤ 2 alerts and five participants received > 2 alerts, with the most being 13 false alerts received by one participant (Figure 2). Of the total 35 false alerts that occurred, 19 had underlying sinus rhythm with noisy PPG signals, and 11 were caused by arrhythmias, such as premature atrial or ventricular contractions as well as sinus arrhythmia (Figure 3). There were 5 alerts for which we were not able to determine the underlying cause, as ECG data was corrupted and not available. There were no significant differences in the baseline characteristics of those with false alerts as compared with those free from false alerts during the 14-day follow-up period (Table 1). Baseline blood pressure, heart rate, and BMI were similar among both groups. Additionally, there were no differences in psychosocial characteristics or technology engagement characteristics between the two groups (Table 2).

Change in Patient Reported Outcomes during the 14-day follow-up Period
Participants who received false AF alerts did not have statistically significant change in self-reported anxiety (GAD-7, β = −1.11 (1.28), P = 0.39, Table 3), patient activation (CHAI, β = −6.04 (4.14), P = 0.15, Table 3), or medication adherence (β = −1.06 (0.99), P = 0.29, Table 3) compared to those not exposed to false alerts over the study period. Similarly, participants with a false positive alert did not experience a significant decline in mental health compared to those free from an alert (β = 1.14 (2.65), P = 0.67, Table 3). Notably, participants with a false AF alert reported a statistically significant decrease in self-reported physical health (β = −7.53 (3.20), P < 0.02, Table 3) compared to those free from an alert over the study period. Furthermore, those who received > 2 alerts reported a greater decrease in self-reported physical health than did those who received ≤ 2 alerts (β = −14.08 (4.24), P = 0.001 vs β = −0.99 (4.24), P = 0.82, respectively, Table 4). Participants who received a false AF alert also reported less confidence in symptom self-management than did those who did not receive a false AF alert (β = −8.32 (2.81), P = 0.004, Table 3). Those participants who received > 2 alerts reported a greater decrease in their confidence in symptom self-management than did those who received ≤ 2 alerts (β = −12.32 (3.80), P = 0.002 vs β = −4.32 (3.80), P = 0.26, respectively, Table 4). Of note, we did not observe a statistically significant impact of true positive AF alerts on patient-reported outcomes, though this could be due to our modest sample size (Supplemental Tables 4-6).

Artificial Intelligence Driven Algorithms to Improve Accuracy
We developed a deep learning based approach that uses a convolutional denoising autoencoder to improve PPG signal quality for subsequent AF detection [27]. Employing this artificial intelligence algorithm successfully reduced the number of participants with false positives from ten to two participants (Figure 4). The total number of false positive alerts was also reduced from 35 to 6 (Figure 4).
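The relative reductions implied by these counts can be checked with a short calculation. The helper name below is an assumption introduced for the example; the counts are those reported above.

```python
def percent_reduction(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


alert_reduction = percent_reduction(35, 6)        # false positive alerts
participant_reduction = percent_reduction(10, 2)  # participants with false positives

print(f"alerts: {alert_reduction:.1f}%")              # about 82.9%, i.e. roughly 83%
print(f"participants: {participant_reduction:.1f}%")  # 80.0%
```

The alert-level figure rounds to 83%, and the participant-level reduction is 80%.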

Discussion
Several smartwatches are FDA cleared for AF detection and these devices may have particular value among older adults who survive an embolic stroke of undetermined source [31][32][33]. Older adults at highest risk for AF are also at highest risk for other arrhythmias (e.g., premature atrial beats, atrial ectopy, atrial tachycardia) and conditions (e.g., tremors) that may decrease the accuracy of wrist-based wearables for AF detection [34][35][36][37][38]. There have been reports of anxiety among smartwatch users who receive false AF alerts and studies are needed to better understand the performance and acceptability of wrist-based wearables for AF detection in older populations [39,40]. The use of smartwatches has not to date directly been associated with harms, but falsely abnormal results may be associated with anxiety and potentially have a negative impact on the psychological health of participants. The Screening for Atrial Fibrillation in the Elderly (SAFE) study was a multicenter trial that randomized several clinical practices to screening (systematic or opportunistic) vs. no screening for atrial fibrillation [41,42]. Anxiety scores were not significantly different between systematic and opportunistic AF screening arms. The study, however, did not collect any anxiety data points from participants in the no screening group, and no comparative analysis was possible between the screening and no screening groups.
Using data from the Pulsewatch study, we sought to understand the connections between false AF alerts and patient-reported outcomes that relate to overall patient well-being and healthcare utilization. We observed that false alerts occurred in 67% of participants who received any alert. This is consistent with other real-world studies involving smartwatch users, including the Apple Heart study, which demonstrated that only 34% of the 450 participants with irregular pulse alerts on smartwatch had AF diagnosed on subsequent ECG patch monitor [18]. In our study, the most common cause of a false alert was a poor quality PPG signal. The embedded motion artifact detection algorithm in the smartwatch was based on statistical features and a threshold value derived from the time-frequency representation using the preliminary data collected from 37 subjects wearing a non-commercially-available Samsung smartwatch, Simband, in a clinical environment [21]. The data collection duration was 14 minutes and the protocol involved subjects performing limited daily activities. The accuracy of the AF detection algorithm was constrained by the limited memory of the smartwatch, the modest training dataset, and the need to maximize near real-time calculations. The second cause of false alerts was non-AF arrhythmias, such as sinus arrhythmia, premature atrial complexes (PACs), and premature ventricular complexes (PVCs). Our work is consistent with previous findings of Bashar and colleagues that false positives are related to noise artifact, PAC, and PVC [21]. While we tried to account for PACs and PVCs, the training data contained only 6 subjects with these rhythms [43]. More importantly, there were only 46 segments from these subjects with which to train for PAC/PVC, which is not sufficient to account for the different dynamics of these rhythms. Hence, it is not surprising that our embedded algorithm was not able to accurately differentiate PAC/PVC from AF beats.
When we applied our deep learning approach for AF detection, we reduced the number of false positive alerts by 83%. Our deep learning approach developed for offline analysis was trained with 60 times more segments than the number used in our embedded rule-based AF detection [44]. As the results showed successful reduction of the number of false positive alerts using deep learning, we concluded that having a sufficiently large training dataset is paramount to account for various complicated rhythms for both deep learning and statistical rule-based AF detection algorithms. Although we did not identify physiological, psychosocial, socio-demographic, or other characteristics associated with receipt of a false alert in our sample of older stroke survivors, we observed that health-related quality of life and confidence in symptom management decreased significantly in participants receiving false AF alerts. Participants who received false alerts might have reported worsened quality of life and lower confidence in their disease management due to physical manifestations of PACs or PVCs, albeit a rather low proportion of false alerts was attributed to those arrhythmias (less than 11 out of 35 false alerts, Figure 3). It is also possible that these alerts contributed to heightened awareness or worry independent of their cardiac rhythm. An alternative explanation might be that participants who received a false alert may have experienced AF outside of the monitoring window (for example at nighttime) and therefore their symptoms were attributable to undetected cardiac arrhythmias. Consistent with findings of decreased physical health perceptions, participants who received false alerts also reported decreased confidence in chronic symptom management. Participants who receive alerts may feel overwhelmed and lose self-confidence as they do not know how to manage alerts and have little power to stop the alarms. 
Of note, there was no change in self-reported medication adherence among those who received a false alert. There appears to be a dose-response relationship with increasing number of false alerts and decreasing self-reported physical health and self-reported confidence in chronic symptom management. In comparison to those who receive ≤ 2 false alerts, those who received > 2 false alerts reported greater reduction in perceived physical health and confidence in chronic symptom management during study period. Results suggest a potential threshold after which alerts can cause significant decline in a patient's perceived wellbeing. Our findings are consistent with previous reports that false positive alerts are associated with negative short-term psychosocial consequences, affecting self-perception, and decreasing short-term quality of life [45][46][47][48]. Here, we address several gaps in our understanding of the impact of false alerts in a population wearing contemporaneous watches and patch monitors. Consistent with our previously published work, we found that smartwatch alerts for AF (both true and false positives) cause significant decline in self-reported physical health in a dose-dependent manner [48]. Our findings suggest that clinicians should consider the stress and potential adverse impact of false alerts before recommending commercial wearables for AF detection and should educate patients about what to do should they receive an AF alert. Previously, Ding et al. found that health care providers report difficulty interpreting tracings from commercial wearables and a lack of knowledge about the appropriate workup of patients with a possible AF alert [49]. Wang et al. reported that healthcare utilization, including ablation procedures, was higher among patients with AF and wearables as compared to patients with AF and no wearables, even when controlling for their baseline heart rate [7]. 
Further studies are needed to optimize AF detection algorithms for long-term monitoring in older populations to minimize the potential negative impact of false positives alerts.
Our findings suggest that clinicians should educate their patients about the limitations of commercial wearables and discuss the potential risks and benefits of these devices. More real-world studies, like HEARTLINE (NCT04276441), are needed to examine the clinical impact of smartwatch prescription for patients with or at risk for AF. Finally, society guidelines for clinicians and patient-education materials are needed to help healthcare providers and their patients navigate the increasingly complex area of mobile health and arrhythmia surveillance.

Strength and Limitations
Our study has multiple strengths. It is a multifaceted randomized clinical trial to evaluate the accuracy and health behavior impact of wearables for AF detection in stroke survivors. Participants in this study are well-defined with respect to sociodemographic, clinical, and psychosocial characteristics. We applied validated instruments including PEPPI, GAD-7, CHAI, and SF-12 at two time-points to examine changes in patient-physician interaction, anxiety, patient activation, and health-related quality of life among participants, increasing the generalizability and likely reproducibility of our study findings. There are some limitations that should be considered when interpreting our findings. Our sample size is modest with relatively short follow-up. It is not designed or powered to evaluate the effects of false alerts on measured outcomes in long-term use of smartwatches for AF detection. Furthermore, our cohort is relatively homogeneous with respect to race, ethnicity, and socioeconomic status, and only includes stroke survivors, limiting the generalizability of our findings to populations not represented in our study cohort. Finally, we only examined the impact of alerts from one smartwatch-smartphone system and it is possible that other alert systems do not elicit the same response from users.

Conclusions
We observed that a modest proportion of individuals received a false AF alert in our randomized trial of older stroke survivors over a two-week period. We noted a significant, dose-dependent negative impact of false AF alerts on both health-related quality of life and chronic disease self-management. A promising approach to avoid negative impact of false alerts is to employ artificial intelligence driven algorithms to improve accuracy. Further study is warranted to improve the AI algorithms of commercial wearables for AF detection, and new tools are needed to help guide healthcare providers, patients, and caregivers to make informed decisions about smartwatch use given the potential adverse impact of false alerts in this population.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Funding
The Pulsewatch Study is funded by R01HL137734 from the National Heart, Lung,

Data Availability
Data from the Pulsewatch study will be made available upon request, subject to approval of the University of Massachusetts Chan Medical School Institutional Review Board. All questions can be addressed to khanh-van.tran@umassmemorial.org.

Table 2. Baseline psychosocial and technology engagement characteristics of participants who received false alerts compared to those who received no alerts.