Several smartwatches are FDA-cleared for AF detection, and these devices may have particular value among older adults who survive an embolic stroke of undetermined source.31–33 However, older adults at highest risk for AF are also at highest risk for other arrhythmias (e.g., premature atrial beats, atrial ectopy, atrial tachycardia) and conditions (e.g., tremors) that may decrease the accuracy of wrist-based wearables for AF detection.34–38 There have been reports of anxiety among smartwatch users who receive false AF alerts, and studies are needed to better understand the performance and acceptability of wrist-based wearables for AF detection in older populations.39,40
The use of smartwatches has not, to date, been directly associated with harms, but falsely abnormal results may be associated with anxiety and may negatively affect the psychological health of participants. The Screening for Atrial Fibrillation in the Elderly (SAFE) study was a multicenter trial that randomized several clinical practices to screening (systematic or opportunistic) vs. no screening for atrial fibrillation.41,42 Anxiety scores were not significantly different between the systematic and opportunistic AF screening arms. The study, however, did not collect anxiety data from participants in the no-screening group, so no comparison between the screening and no-screening groups was possible. Using data from the Pulsewatch study, we sought to understand the associations of false AF alerts with patient-reported outcomes that relate to overall patient well-being and healthcare utilization.
We observed that false alerts occurred in 67% of participants who received any alert. This is consistent with the rate of false positives seen in other real-world studies involving smartwatch users, including the Apple Heart study, in which only 34% of the 450 participants with irregular pulse alerts on a smartwatch had AF diagnosed on a subsequent ECG patch monitor.18 In our study, the most common cause of a false alert was a poor-quality PPG signal. The motion artifact detection algorithm embedded in the smartwatch was based on statistical features and a threshold value derived from the time-frequency representation, using preliminary data collected from 37 subjects wearing a non-commercially available Samsung smartwatch, Simband, in a clinical environment.21 The data collection duration was 14 minutes, and the protocol involved subjects performing limited daily activities. The accuracy of the AF detection algorithm was constrained by the limited memory of the smartwatch, the modest training dataset, and the need to perform calculations in near real time.
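As a quick arithmetic check on the comparison above, the following sketch converts the Apple Heart figure quoted in the text (AF confirmed in 34% of 450 alerted participants) into the complementary share of alerted participants without confirmed AF and places it beside the 67% observed in our sample. The function and variable names are illustrative assumptions, and this is a rough comparison rather than a formal analysis.

```python
def unconfirmed_share(confirmed_share: float) -> float:
    """Share of alerted participants without AF confirmed on follow-up monitoring."""
    if not 0.0 <= confirmed_share <= 1.0:
        raise ValueError("confirmed_share must be between 0 and 1")
    return 1.0 - confirmed_share


# Figures quoted in the text above.
APPLE_HEART_ALERTED = 450           # participants with irregular pulse alerts
APPLE_HEART_CONFIRMED_SHARE = 0.34  # AF diagnosed on subsequent ECG patch
PULSEWATCH_FALSE_ALERT_SHARE = 0.67 # participants with any alert who had a false alert

# Approximate number of alerted Apple Heart participants with confirmed AF.
apple_confirmed_n = round(APPLE_HEART_ALERTED * APPLE_HEART_CONFIRMED_SHARE)

# Complement: alerted participants without confirmed AF.
apple_unconfirmed_share = unconfirmed_share(APPLE_HEART_CONFIRMED_SHARE)

print(f"Apple Heart, AF confirmed: about {apple_confirmed_n} of {APPLE_HEART_ALERTED}")
print(f"Apple Heart, AF not confirmed: {apple_unconfirmed_share:.0%}")
print(f"Present study, false alerts: {PULSEWATCH_FALSE_ALERT_SHARE:.0%}")
```

Under these figures, roughly two thirds of alerted participants in both settings did not have AF confirmed, which is the sense in which the two rates are described as consistent.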
The second most common cause of false alerts was non-AF arrhythmias, such as sinus arrhythmia, premature atrial complexes (PACs), and premature ventricular complexes (PVCs). Our work is consistent with the previous findings of Bashar and colleagues that false positives are related to noise artifact, PACs, and PVCs.21 While we tried to account for PACs and PVCs, the training data contained only 6 subjects with these rhythms.43 More importantly, only 46 segments from these subjects were available to train PAC/PVC detection, which is not sufficient to capture the varied dynamics of these rhythms. Hence, it is not surprising that our embedded algorithm was not able to accurately differentiate PAC/PVC beats from AF.
When we applied our deep learning approach for AF detection, we reduced the number of false positive alerts by 83%. Our deep learning approach developed for offline analysis was trained with 60 times more segments than the number used in our embedded rule-based AF detection.44 As the results showed successful reduction of the number of false positive alerts using deep learning, we concluded that having a sufficiently large training dataset is paramount to account for various complicated rhythms for both deep learning and statistical rule-based AF detection algorithms.
Although we did not identify physiological, psychosocial, socio-demographic, or other characteristics associated with receipt of a false alert in our sample of older stroke survivors, we observed that health-related quality of life and confidence in symptom management decreased significantly in participants receiving false AF alerts. Participants who received false alerts might have reported worsened quality of life and lower confidence in their disease management due to physical manifestations of PACs or PVCs, although only a small proportion of false alerts was attributed to those arrhythmias (fewer than 11 of 35 false alerts, Fig. 3). It is also possible that these alerts contributed to heightened awareness or worry independent of participants' cardiac rhythm. An alternative explanation is that participants who received a false alert may have experienced AF outside of the monitoring window (for example, at nighttime), so that their symptoms were attributable to undetected cardiac arrhythmias.
Consistent with findings of decreased physical health perceptions, participants who received false alerts also reported decreased confidence in chronic symptom management. Participants who receive alerts may feel overwhelmed and lose self-confidence as they do not know how to manage alerts and have little power to stop the alarms. Of note, there was no change in self-reported medication adherence among those who received a false alert.
There appears to be a dose-response relationship between an increasing number of false alerts and decreasing self-reported physical health and self-reported confidence in chronic symptom management. Compared with those who received ≤ 2 false alerts, those who received > 2 false alerts reported a greater reduction in perceived physical health and confidence in chronic symptom management during the study period. These results suggest a potential threshold after which alerts can cause a significant decline in a patient's perceived wellbeing.
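The grouping behind this comparison can be sketched as follows. This is a minimal illustration, assuming each participant record holds a false-alert count and a change score (follow-up minus baseline) for an outcome such as perceived physical health. The records, field names, and the `mean_change_by_alert_group` helper are hypothetical: the values are made up for illustration and are neither Pulsewatch data nor the study's analysis code.

```python
from statistics import mean


def mean_change_by_alert_group(records, threshold=2):
    """Mean change score for participants at or below vs. above a false-alert threshold."""
    low = [r["change"] for r in records if r["false_alerts"] <= threshold]
    high = [r["change"] for r in records if r["false_alerts"] > threshold]
    return {
        "low": mean(low) if low else None,    # <= threshold false alerts
        "high": mean(high) if high else None, # > threshold false alerts
    }


# Made-up illustrative records (NOT study data).
example_records = [
    {"false_alerts": 1, "change": -1.0},
    {"false_alerts": 2, "change": 0.5},
    {"false_alerts": 3, "change": -4.0},
    {"false_alerts": 6, "change": -6.0},
]

result = mean_change_by_alert_group(example_records, threshold=2)
print(result)
```

A more negative mean in the `high` group than in the `low` group would correspond to the pattern described above. A formal analysis would also need a statistical test and adjustment for baseline scores, which this sketch omits.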
Our findings are consistent with previous reports that false positive alerts are associated with negative short-term psychosocial consequences, affecting self-perception and decreasing short-term quality of life.45–48 Here, we address several gaps in our understanding of the impact of false alerts in a population wearing contemporaneous watches and patch monitors. Consistent with our previously published work, we found that smartwatch alerts for AF (both true and false positives) cause a significant decline in self-reported physical health in a dose-dependent manner.48 Our findings suggest that clinicians should consider the stress and potential adverse impact of false alerts before recommending commercial wearables for AF detection and should educate patients about what to do should they receive an AF alert.
Previously, Ding et al. found that health care providers report difficulty interpreting tracings from commercial wearables and a lack of knowledge about the appropriate workup of patients with a possible AF alert.49 Wang et al. reported that healthcare utilization, including ablation procedures, was higher among patients with AF and wearables than among patients with AF and no wearables, even when controlling for their baseline heart rate.7 Further studies are needed to optimize AF detection algorithms for long-term monitoring in older populations and to minimize the potential negative impact of false positive alerts. Our findings suggest that clinicians should educate their patients about the limitations of commercial wearables and discuss the potential risks and benefits of these devices. More real-world studies, like HEARTLINE (NCT04276441), are needed to examine the clinical impact of smartwatch prescription for patients with or at risk for AF. Finally, society guidelines for clinicians and patient-education materials are needed to help healthcare providers and their patients navigate the increasingly complex area of mobile health and arrhythmia surveillance.
Strengths and limitations:
Our study has multiple strengths. It is a multifaceted randomized clinical trial evaluating the accuracy and health behavior impact of wearables for AF detection in stroke survivors. Participants in this study are well-defined with respect to sociodemographic, clinical, and psychosocial characteristics. We applied validated instruments, including PEPPI, GAD-7, CHAI, and SF-12, at two time-points to examine changes in patient-physician interaction, anxiety, patient activation, and health-related quality of life among participants, increasing the generalizability and likely reproducibility of our study findings. There are some limitations that should be considered when interpreting our findings. Our sample size is modest, and follow-up was relatively short. The study was not designed or powered to evaluate the effects of false alerts on measured outcomes in long-term use of smartwatches for AF detection. Furthermore, our cohort is relatively homogeneous with respect to race, ethnicity, and socioeconomic status and includes only stroke survivors, limiting the generalizability of our findings to populations not represented in our study cohort. Finally, we examined the impact of alerts from only one smartwatch-smartphone system, and it is possible that other alert systems do not elicit the same response from users.