What Matters When Exploring Fidelity in Interventions Using Health IT to Reduce Disparities in Language-diverse Populations?


Background: Evidence-based interventions often develop strategies to engage diverse populations while also attempting to maintain external validity. When using health IT tools to deliver patient-centered health messages, systems-level requirements are often at odds with 'on-the-ground' tailoring approaches employed in patient-centered care, particularly with regard to ensuring that equity is achieved in linguistically diverse populations.
Methods: STAR MAMA is a 5-month bilingual (English and Spanish) intervention adapted from the Diabetes Prevention Program and examined in a pilot RCT conducted among 181 post-partum women with recent gestational diabetes. Fidelity to pre-determined 'core' intervention components (e.g., systems integration), as well as to important 'modifiable' components focused on population equity (e.g., health coaching responsiveness and variation in outcomes by language), was assessed using an adapted implementation fidelity framework. Evaluation data included participant-level surveys, systems-level databases of message delivery, call completion, and health coaching notes.
Results: Participant mean age was 31.5 years; 96.6% of participants were Latina and 80.9% were born outside the US. Among those receiving the STAR MAMA calls, 55 (61%) received the calls in Spanish and 35 (39%) in English. Of those in the call arm, 81 women (90%) completed all 20 weeks of the program. There were many more systems errors at the beginning of the program than later on. Health coaching triggers were also more widespread in the first several weeks of the STAR MAMA intervention, notably among Spanish speakers. Although Spanish speakers had more triggers than English speakers, the difference was not statistically significant. Of the calls that triggered a health coach follow-up, a call-back attempt was made for 85.4% (n=152) of the English call triggers and for 80.0% (n=279) of the Spanish call triggers (NS). Of those with attempted calls, health coaching calls were complete for 55.6% (n=85) of English-language call triggers and for 56.6% of Spanish-language call triggers (NS). Some differences in acceptability were noted by language, with Spanish speakers reporting higher satisfaction with prevention content (p<0.01) and English speakers reporting that health coaches were less considerate of their time (p=0.03).
Conclusions: Implementation fidelity for health IT interventions involving health coaching should address moderating factors, such as language, as well as systems-level factors.
Trial Registration: National Clinical Trials registration number: CT02240420. Registered September 15, 2014. ClinicalTrials.gov


Introduction
To improve evidence-based practice, practice-based interventions must balance adaptations to local circumstances with attempts to maintain external validity. It is implied but not always explicitly described that for an intervention or evidence-based practice (EBI) to be considered evidence-based, findings need to be replicated with fidelity, even while adaptations occur [1]. According to Carroll et al., fidelity refers to "the degree to which an intervention is delivered as intended" and is how one determines to what extent an intervention has been adequately 'replicated' [2]. Monitoring and assuring fidelity are critical for planning to replicate an intervention in a variety of settings and populations, and are thus seen as instrumental to determining factors associated with implementation success and failure, one of the cornerstones of implementation science [3]. Fidelity most often involves attention to content, dose and duration, which can be thought of as general measures of protocol adherence [4,5].
Increasingly, there is more attention to context in discussions of fidelity [6], and to adaptations of interventions so as to be responsive to local conditions. To further understand the relationship between context and fidelity, investigators have called for a more comprehensive approach to clarify the concept of fidelity and the function of factors moderating fidelity, including context, participant responsiveness, and intervention complexity, among others [6,7,8]. Increasing attention to the conceptualization, measurement and documentation of adaptations means that there is more information about the relationship between fidelity, adaptations and outcomes [7,8]. In addition, recent theoretical work in this area [8] has proposed reviewing both fidelity and adaptation in the context of a 'value equation', which focuses on the final desired outcomes beyond intervention effects and links three concepts: 1) the end product should emphasize overall value rather than only the intervention effects, 2) implementation strategies are a means to create 'fit' between EBIs and context, and 3) transparency is required. As many factors operate as potential moderators of the relationship between intervention implementation and intended outcomes, it is essential to determine how to understand moderators, even if doing so does not completely resolve the tension between fidelity and adaptation.
When using health IT tools as an implementation strategy, as with interventions focused on self-management support or prevention, systems-level technology requirements are often at odds with 'on-the-ground' approaches that employ patient-centered adaptations to individual circumstances [2,9], and thus are an important place to explore the relationship between adaptation and fidelity. In addition, given the focus on ensuring that health equity is improved, not worsened, when applying health IT to vulnerable populations [10,11], such as those with limited English proficiency and poor access to healthcare, attention to fidelity and adaptation in this context is important.
In the United States, patients with limited health literacy and limited English proficiency disproportionately experience suboptimal diabetes care and poor health outcomes, including a higher prevalence of type 2 diabetes [12,13]. Language-concordant care, delivered through patient counselling or health coaching, is a critical predictor of improved diabetes self-management outcomes [14,15,16]. Technology-assisted diabetes self-management and prevention programs, including those that provide patient-centered supports, have expanded significantly in the last decade with a myriad of approaches, including web-based programs [17,18], smartphone applications or apps [19,20], and telephone-based automated call programs (often referred to as Automated Telephonic Self-Management Support, 'ATSM', or Interactive Voice Response), which blend narrative content with queries that require patients to respond via touch tone, with the information going to a central location for review [21,22,23,24,25]. Each of these program archetypes provides an efficient platform to deliver multi-lingual content, content tailored to local context, and a range of additional modular features. However, studies have noted that language-concordant availability is often limited [26]. Recent literature has called attention to the importance of evaluating health IT interventions in ways that ensure the digital divide is not exacerbated for low-income populations, for those with limited English proficiency, and for ethnic minority groups. In the context of expanding implementation of health IT-delivered diabetes care and diabetes prevention support [27,28], it is important to understand how an expanded view of moderators of implementation fidelity, such as language of delivery, interacts with specified core components, such as intervention dose, consistency or timing, when considering program sustainability, dissemination, and equity.
In this paper we explore implementation outcomes and moderating factors for a health IT-enabled health coaching program focusing on diabetes prevention behaviors called STAR MAMA (Support via Telephone Advice and Resources / Sistema Telefonico de Apoyo y Recursos) [29,30,31]. We build on the value outcome concept described above in relationship to fidelity, in that the end product maps back to the goals, using an equity lens to interpret the value equation for the STAR MAMA evaluation. By focusing attention on equity as well as on patient-centered outcomes, we explore moderators of implementation fidelity, focusing on understanding variability, participant responsiveness, acceptability, quality of delivery, and cultural or language-concordant tailoring (as well as other equity-focused moderators).

Study Summary
STAR MAMA is an ATSM-based program that combines automated 3-5 minute weekly calls, including queries and narratives, with 'live' follow-up calls from a language-concordant health coach (plus opt-in text messages) to encourage diabetes prevention behaviors among post-partum women with recent gestational diabetes (Fig. 1). We apply an established implementation framework for fidelity evaluation [6,7,9] and a range of data sources in order to explore the reach and engagement of intervention content to linguistically diverse populations.
The STAR MAMA pilot tested a 20-week bilingual (English and Spanish) ATSM diabetes prevention program and evaluated its impact on 9-12 month outcomes, using a type 1 hybrid implementation-effectiveness study and a randomized clinical trial design [32]. Women were individually randomized during a baseline visit at the end of their pregnancies to either STAR MAMA calls or an education-only arm. Health outcomes were evaluated for effectiveness at 9-12 months post-partum using structured interviews and medical records review, and included weight loss (BMI reduction), breast-feeding duration, the percentage of women actively engaged in chronic disease risk reduction behaviors (such as increased physical activity and decreased consumption of sugar-sweetened beverages), and program acceptability for those in the intervention arm. The trial enrolled 181 post-partum women receiving health care in safety net settings in the San Francisco Bay Area between 2014 and 2018. Study sites included Zuckerberg San Francisco General Hospital (ZSFGH), SF-Women Infant Child Programs, Sonoma County-Women Infant Child Programs, and a Federally Qualified Health Center. Women were enrolled during prenatal visits and had a clinician-confirmed gestational diabetes diagnosis at 32 weeks of pregnancy. All study procedures were approved by the University of California, San Francisco Committee on Human Research. Participants were given gift cards valued at $135 total as reimbursement for their participation in baseline and follow-up interviews.
STAR MAMA was developed using a theory-informed approach, applying the Capability Opportunity and Motivation (COM-B) model and related Behavior Change Wheel [33], as well as Social Cognitive Theory, alongside a stakeholder engagement process to improve the relevance and reach of the intervention content for the linguistically diverse populations receiving it [29,30]. The bilingual program was adapted from the Diabetes Prevention Program (DPP) content [34]. Based on stakeholder input, STAR MAMA included content focusing on health at the individual (participant) level, but also included content on infant care and on socio-ecological drivers affecting health behavior, such as food insecurity and social support/social isolation (Appendix). The content includes a mix of narrative storytelling showcasing supportive messaging about challenges often encountered in the post-partum period (e.g., stress, mood, fussy babies); questions about behaviors, with responses reviewed by the coaches (e.g., "Are you having trouble breastfeeding? Press 1 if yes and 0 if no"); and tips, in the form of recipes, text links to exercise videos, and community resources. Topics focused on behaviors related to diabetes prevention (weight loss, healthy eating, physical activity, glucose screening, breastfeeding, stress and mental health) and on key areas of infant health in the first 6 months (vaccination timing, breastfeeding, fussiness, sleep, introduction of food). The intervention was delivered weekly, beginning at 6 weeks post-partum at a day and time selected by the participant and lasting 20 weeks, after which a follow-up interview was completed over the phone or in person.
The structure of STAR MAMA includes both a "push" of diabetes prevention messages to women, directed at improving adherence to diabetes prevention-related behaviors, and a "pull" that engages participants with health coaching call-backs, based on participant responses to behavioral questions (e.g., "How many sugar sweetened drinks did you have in the last 7 days? Enter the number of drinks") and predetermined trigger thresholds for health coaching call-backs (e.g., reporting more than 1 day drinking sugar-sweetened beverages, or 'yes' to difficulty with breastfeeding). Primary health outcomes from the study regarding weight loss, physical activity, breastfeeding and diabetes-relevant behaviors will be reported elsewhere.
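The trigger logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the STAR MAMA system's actual implementation: the class `WeeklyResponse`, its field names, the function `callback_reasons`, and the constant `SSB_DAYS_THRESHOLD` are all hypothetical, and only the two example thresholds quoted in the text are encoded.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the example in the text: reporting more
# than 1 day of sugar-sweetened beverages triggers a coach call-back.
SSB_DAYS_THRESHOLD = 1


@dataclass
class WeeklyResponse:
    """Touch-tone answers from one automated call (field names are hypothetical)."""
    participant_id: str
    language: str                   # "en" or "es"
    ssb_days: int                   # days with sugar-sweetened beverages
    breastfeeding_difficulty: bool  # True if the participant pressed 1 ("yes")


def callback_reasons(resp: WeeklyResponse) -> list[str]:
    """Return the reasons, if any, that this call should trigger a call-back."""
    reasons = []
    if resp.ssb_days > SSB_DAYS_THRESHOLD:
        reasons.append("sugar-sweetened beverages")
    if resp.breastfeeding_difficulty:
        reasons.append("breastfeeding difficulty")
    return reasons


# Illustrative input only; not study data.
example = WeeklyResponse("P001", "es", ssb_days=3, breastfeeding_difficulty=False)
print(callback_reasons(example))
```

Returning the list of reasons, rather than a single yes/no flag, would let a coach see why a call was flagged; an empty list means no call-back is needed.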
Fidelity Analysis-Overview of Fidelity-Related Outcomes
The goals of the fidelity analysis are to determine to what extent the STAR MAMA program was delivered as intended for core intervention components related to: (1) System Integration: completeness and correct timing of the STAR MAMA delivery system, such that women first received their calls as intended beginning 6 weeks post-partum, at their preferred day and time; (2) Intervention Delivery: correct sequencing of the weekly calls, the "push" of the intervention; (3) Call Consistency: for calls over the intervention period; and (4) Health Coach Responsiveness: for attempted call-backs for call triggers generated by the STAR MAMA system (Table 1; Fig. 2). All measures were evaluated for variation by language as a potential equity moderator of fidelity. As we were interested specifically in potential moderators, we also explored the impact of moderators, such as health coaching call consistency over time and language, on fidelity outcomes (Fig. 2).
Acceptability was also included in the fidelity assessment as a moderator for women in the STAR MAMA intervention arm. The rationale was that participant engagement in the intervention could affect the health coaches' responses, and that it would be important to understand which program aspects had higher and lower acceptability, and how this varied by language.
Integration of the participant enrollment registry with the STAR MAMA delivery system was estimated as the percent of enrolled women who were subsequently uploaded to the health IT intervention delivery platform prior to the target start date of the women's calls, beginning 6 weeks post-partum. This time-sensitive activity involved site-level identification of eligible women with GDM (with confirmation at 30 weeks gestation) through review of weekly clinic trackers and WIC databases, contacting women post-partum to determine their preferences for call dates and times, uploading preferences, and activating the STAR MAMA call initiation.
System delivery of the STAR MAMA call content was measured by counting the number of calls with the correct content (vs. "incorrect"), delivered in the correct sequence (vs. "skipped"), and in the correct language for the patient based on weekly system generated reports. We also measured the consistency of call delivery over time, evaluating whether all 20 weeks of STAR MAMA calls were delivered, and whether any errors were consistently appearing over the 20 weeks (such as missed weeks), and variation by language.
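As an illustration of how such delivery measures could be tallied from weekly system reports, the sketch below counts call statuses by language and lists the weeks with no correctly delivered call for a participant. The row layout `(participant_id, language, week, status)` and the function names are assumptions made for this example; the format of the actual STAR MAMA reports is not described in the text, and the sample rows are invented.

```python
from collections import Counter

EXPECTED_WEEKS = 20  # program length stated in the text


def delivery_summary(rows):
    """Count call statuses per language and the share delivered correctly.

    Each row is an assumed tuple (participant_id, language, week, status),
    where status is "correct", "incorrect", or "skipped".
    """
    counts = {}
    for _pid, language, _week, status in rows:
        counts.setdefault(language, Counter())[status] += 1
    summary = {}
    for language, c in counts.items():
        total = sum(c.values())
        summary[language] = {
            "correct": c["correct"],
            "incorrect": c["incorrect"],
            "skipped": c["skipped"],
            "pct_correct": round(100 * c["correct"] / total, 1),
        }
    return summary


def missing_weeks(rows, participant_id):
    """Weeks (1 to 20) with no correctly delivered call for one participant."""
    delivered = {week for pid, _lang, week, status in rows
                 if pid == participant_id and status == "correct"}
    return sorted(set(range(1, EXPECTED_WEEKS + 1)) - delivered)


# Invented rows for illustration only; not study data.
rows = [
    ("P001", "es", 1, "correct"),
    ("P001", "es", 2, "skipped"),
    ("P002", "en", 1, "correct"),
    ("P002", "en", 2, "incorrect"),
]
print(delivery_summary(rows))
print(missing_weeks(rows, "P001"))
```

Keeping the summary keyed by language makes the equity comparison direct: the same counts are produced for each language group and can be compared week by week.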

Fidelity Analysis-Moderators
We explored potential moderators of the quality of delivery including: health coach consistency of [...]
Relevant reporting guidelines based on the StaRI checklist, Standards for Reporting Implementation Studies, were completed for this study. See Additional Files for more information.

Results
Of the 181 women who were recruited, 90 were randomized to the STAR MAMA ATSM calls and 91 to the education-only arm (see CONSORT diagram).

Delivery of the STAR MAMA program: System Integration
There were no errors in the system integration components evaluated: all participants were correctly uploaded to the platform, and calls were activated to begin at 6 weeks after the confirmed delivery date.

Delivery of the STAR MAMA program: Intervention Delivery Completeness
We separate program call completion assessments into two categories: system-driven and participant-driven. Of the 81 participants who completed some or all of the 20 weeks, there were a total of 1,620 calls programmed to be pushed by the ATSM system. English speakers (n = 31, 38% of participants) should have received 620 calls and Spanish speakers (n = 50, 62% of participants) should have received 1,000 calls (see Fig. 5). Of the calls in English, 29% (n = 178) triggered a health coach follow-up, while of the calls in Spanish, 35% (n = 349) triggered a health coach follow-up. There were many more triggers in the first several weeks of the STAR MAMA intervention than later on in the program, especially among Spanish speakers. Although Spanish speakers had more triggers than did English speakers, the difference was not statistically significant. Of the calls that triggered a health coach follow-up, a call-back attempt was made for 85.4% (n = 152) of the English call triggers and for 80.0% (n = 279) of the Spanish call triggers. Of those with attempted calls, health coaching calls were complete for 55.6% (n = 85) of English-language call triggers and for 56.6% of Spanish-language call triggers. Again, there were no differences by language in attempted or completed health coaching call-backs. Additionally, attempted call-backs were consistent over time and by language of call trigger (Fig. 6).
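The call counts and percentages reported above can be cross-checked with simple arithmetic. The sketch below assumes that the English-language group is the remaining 31 of the 81 participants (81 minus the 50 Spanish speakers) and that every participant was scheduled for 20 calls. It also applies a pooled two-proportion z-test to the call-back attempt rates purely as an illustration: the text does not state which test the authors used, and this test treats calls as independent, ignoring the clustering of calls within participants.

```python
import math

CALLS_PER_PARTICIPANT = 20  # one call per week for 20 weeks

participants = {"English": 31, "Spanish": 50}  # 31 is derived as 81 - 50
triggers = {"English": 178, "Spanish": 349}    # calls that triggered a follow-up
attempts = {"English": 152, "Spanish": 279}    # triggers with a call-back attempt


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


expected_calls = {lang: n * CALLS_PER_PARTICIPANT for lang, n in participants.items()}

for lang in participants:
    print(lang,
          "expected calls:", expected_calls[lang],
          "| trigger rate (%):", pct(triggers[lang], expected_calls[lang]),
          "| attempt rate (%):", pct(attempts[lang], triggers[lang]))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


z, p_value = two_proportion_z(attempts["English"], triggers["English"],
                              attempts["Spanish"], triggers["Spanish"])
print("attempt rates, English vs Spanish: z =", round(z, 2), "p =", round(p_value, 3))
```

Under these assumptions the expected call totals sum to the 1,620 programmed calls and the whole-number trigger rates match the reported 29% and 35%. The unadjusted test gives a p-value above 0.05, in line with the reported absence of a language difference in attempts, although an analysis that accounts for repeated calls per participant could differ.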

Acceptability
Overall acceptability was high for STAR MAMA calls (Table 2) and in general did not differ by language, with a few notable exceptions: Spanish speakers reported higher agreement that the program provided "useful information on diabetes prevention and baby care" (p < 0.01), and English speakers were less likely to report that the health coaches were considerate of their time (p = 0.03). Ninety percent of women interviewed reported they would do the program again. Over half (55.6%) had shared the program ideas with friends and 75.8% had engaged a partner in some of the STAR MAMA content. Two-thirds reported that the number of weeks was 'fine', with a third indicating the program was "too long".

Discussion
In this paper we report core fidelity metrics for the STAR MAMA study, examine the relationship of language and other moderators to the fidelity outcomes assessed, and describe several inconsistencies over time at the systems level. Although there were few effects of a language differential in the evaluation, there were some trends in differences by language in systems-level problems as well as in health coaching interactions. We believe that it is critical to determine to what extent efforts to increase the participation of diverse populations in health IT interventions are well adapted to the local context, and to this end it is important to evaluate the errors inherent in any automated processes designed to reach a wider range of participants.
Technology can be a great enabler of care delivery but, if left unchecked, can also cause fidelity failure. Some authors support the view that [...] To explore this topic, we evaluated language as a moderator across a wide range of fidelity outcomes, for both systems delivery and in-person touches. Based on these findings, we recommend that language equity be included as a moderator in multi-lingual health IT interventions, as it concerns whether an intervention is delivered equally across all populations (in our case, between Spanish-speaking and English-speaking participants) over time. This work builds on existing implementation research on how technology is implemented by exploring the impact on multilingual populations [35]. This study also highlights an approach to making existing fidelity frameworks more concrete, with a step-wise approach outlined in the conceptual model. We hope this work can guide exploration of fidelity for health IT interventions, in particular those that include an automated 'push' along with a 'personalized' follow-up by a health coach or other health professional.
For low-income populations such as the women enrolled in this study, the underlying contributions of social determinants and structural barriers (such as limited economic resources, language barriers, or limited healthcare access) may impede engagement with health coaching programs if participants are not able to prioritize addressing their prevention-focused health needs in real time in addition to the other demands they face. It is critical to explore to what extent adapted multi-lingual interventions, especially those with health coaching components, are acceptable and to what extent modifiers may impact core fidelity measures. Some authors indicate that adaptation is needed in order to achieve users' involvement and ownership for successful implementation [36]; this was one of the intervention development steps involved in adapting STAR MAMA and may have contributed to the high levels of engagement and acceptability. That we identified greater engagement and acceptability in the non-English-speaking group is consistent with other work we and others have done regarding language and health coaching engagement for each group [37,38,39].
There are several limitations to this study. As mentioned above, information on high and low adopters by language would have provided critical insights into necessary modifications. We did find high acceptability across language groups, but drivers of dissatisfaction are less well specified in the quantitative descriptive analysis. Additionally, modeling to explore the relationship between fidelity and health outcomes was out of scope for this study, since it was a pilot with a relatively small intervention arm sample size. Also, it is possible that there are complex relationships between moderators, as suggested by Carroll 2007 [2]. Just as more facilitation strategies do not necessarily mean better implementation (because of the level of complexity), more "equitable" delivery does not necessarily mean better implementation. Understanding each population's variability, for example through in-depth interviews exploring high and low adopters, can move towards an assessment of social determinants and suggest recommendations for intervention adjustments that do not violate core components but can address the context of the needs of each particular group.
Availability of data and materials: The datasets that support the findings of this study are not publicly available due to information that could compromise research participant consent and privacy, but can be made available from the corresponding author (MH) with appropriate precautions and upon reasonable request.