Use of Mobile Technology to Identify Behavioral Mechanisms Linked to Mental Health Outcomes In Kenya: Protocol for Development and Validation of a Predictive Model

Objective: This study proposes to identify and validate weighted sensor stream signatures that predict near-term risk of a major depressive episode and future mood among healthcare workers in Kenya. Approach: The study will deploy a mobile app platform and use novel data science analytic approaches (Artificial Intelligence and Machine Learning) to identifying predictors of mental health disorders among 500 randomly sampled healthcare workers from five healthcare facilities in Nairobi, Kenya. Expectation: This study will lay the basis for creating agile and scalable systems for rapid diagnostics that could inform precise interventions for mitigating depression and ensure a healthy, resilient healthcare workforce to develop sustainable economic growth in Kenya, East Africa, and ultimately neighboring countries in sub-Saharan Africa. This protocol paper provides an opportunity to share the planned study implementation methods and approaches. Conclusion: A mobile technology platform that is scalable and can be used to understand and improve mental health outcomes is of critical importance.


Introduction
According to the World Health Organization, mental disorders account for the largest burden of illness among all types of health problems [1].Major depressive disorder is the second leading cause of disability, affecting more than 350 million people worldwide [2].In low-and-middle-income countries (LMICs), resources are particularly limited for surveillance, diagnosis, and treatment.In Kenya, a report prepared by a national mental health taskforce [3] highlighted the fact that the severe state of neglect of mental health services, infrastructure and research has led to a "national crisis" and implored the mobilization of immediate and effective solutions to address mental health issues.The Taskforce highlighted that research institutions and universities should: 1) collect relevant data to better de ne the mental illness gaps (in detection and interventions), burden and determinants; 2) focus particular attention on escalating interventions to reduce suicidal acts and clinical depression.We propose a novel mobile infrastructure in Kenya that has previously been deployed in the United States of America (USA), to test its validity and predictive capability in a different cultural setting [4,5].
The proposed study builds up on the Intern Health Study which is a prospective longitudinal cohort study of stress and depression among training physicians in the USA and China [6,7].This model allows for the same individuals to be followed, rst under normal conditions and then under high-stress conditions [8].
The Intern Health Study employs a unique mobile app platform, the Intern App, speci cally designed for healthcare workers to collect and integrate active and passive data on mood, sleep, and activity.Intern Health Study has demonstrated that mobile monitoring can facilitate the prospective, real-time monitoring of continuous, passive measures and effectively predict short-term risk for depressive episodes in a large group of individuals (See Table 1 and Fig. 1).Mobile technology coupled with predictive models can help in triggering early warning systems e.g., signs of depression [8].We would like to adapt the App and make it contextually relevant and deploy it among healthcare workers within Kenyan urban settings.
The objective of the proposed study is to demonstrate the feasibility of a novel methodology in developing predictive models for mood and depression among Kenyan healthcare workers.The speci c objectives of the research are: i) To adapt, validate and re ne previously developed Arti cial Intelligence/Machine Learning (AI/ML) based prediction models of depression and mood among Kenyan healthcare workers ii) To collect and evaluate individual active and passive data that may predict depression and mood among Kenyan healthcare workers

Design
This will be a longitudinal cohort study.

Study Settings
The setting will be in ve urban healthcare facilities in the Kenyan capital city, Nairobi, where healthcare workers are directly involved with the patient's care.Facility 1: (Aga Khan University Hospital Nairobi) This is a private facility with a bed capacity of 254 with 14 different departments.
Facility 2: (Pumwani Hospital) This is a public maternity facility with 396 healthcare staff spread across 13 different cadres (physicians, nurses, nutritionists, and medical students) and 25 different departments [9].Facility 3: (Mama Lucy Hospital) This is a public facility with 700 healthcare workers who attend to more than 800 outpatients daily and has an ever-growing bed capacity.Facility 4: (Kenyatta National Hospital)This is public facility which has a bed capacity of 1800 and a staff population of 6000 [10].These staff give medical care to patients in 50 inpatient wards, 24 theaters, 22 outpatient clinics and 1 accident and emergency center.Facility 5: (Kenyatta University Teaching Referral and Research Hospital) This is a public facility which has an over 650 bed capacity and there are around 665 hospital staff [11].

Sample size determination
The Cochran formula to estimate a representative sample for proportions is utilized [12].Using a 95% con dence level, proportion = 0.5 (maximum variability) of +/-5%, the minimum sample size required will be 462 participants after an attrition rate of 20%.n= (Z^2 p q)/e^2 where n is the sample size, Z is the statistic corresponding to 95% level of con dence, P is expected proportion, and d is precision (corresponding to effect size).Thus, we will recruit 500 participants, which will translate to 100 participants from each of the ve healthcare facilities.

Instruments description
A socio-demographic questionnaire in the App incorporates elements such as age, gender and education background.Each participant will complete the following standardized instruments: 1) Neuroticism -NEO-Five Factor Inventory [13] is a 60-item measurement that is designed to assess personality in the domains of neuroticism, extraversion, openness, consciousness, and agreeableness.Each item is scored with a 5-point Likert scale ranging from "strongly disagree, disagree, neutral, agree and strongly agree.Internal consistency reliability [13] of the tool has been found to be good α =.86.
PANSI-NSI and PANSI-PI examine the frequency of speci c negative thoughts (e.g., failure to accomplish something important) or positive thoughts (e.g., excited about doing well at school or work) related to suicidal behavior [14].Participants will use a Likert scale ranging from 1 (i.e., "none of the time") to 5 (i.e., "most of the time") to assess the frequency they experience suicidal ideation.Higher scores indicate more positive or negative suicide ideation, depending on the item's particular subscale.
6) Generalized Anxiety Disorder -7 (GAD-7) [21] will measure anxiety based on seven items which are scored from zero to three.The scale score can range from 0 to 21 and cut-off scores for mild, moderate and severe anxiety symptoms are 5, 10 and 15 respectively [21].GAD-7 has demonstrated good internal consistency and convergent validity in heterogeneous samples [22] and has been used within Kenyan settings [23,24].

7) MyDataHelps App -Online Survey
MyDataHelps is an application (available on iOS, Android, and web) developed by CareEvolution which provides step-by-step instructions for collecting data from participants via their smartphones [25].The app acts as a tool for consent, data collection, delivery of study questionnaires, automated noti cations, and reminders.

8) Mobile Monitoring
In addition to completing surveys, the participants will also be asked to wear Fitbit Inspire 2™(www.tbit.com) which is a wearable tness tracker wristband to track daily activity levels, daily mood rating and sleep.The device automatically connects via Bluetooth and transfers data to a mobile platform via a dedicated App.Fitbit Inspire 2™ allows tracking of sleep stages (minutes spent awake, in "light", "deep", and "REM" [Rapid Eye Movement] sleep) in addition to sleep/wake states.During set-up, the participants will be prompted to allow MyDataHelps to send them noti cations.A noti cation will be automatically generated on their phone daily and will appear as follows: "On a scale of 1 to 10 what was your average mood today?"They will then respond in the App.We will ask them to complete this daily rating for a period of one year.After completing the consenting process and initial survey via the study App, participants will be given Fitbit Inspire 2. They will be asked to sync the tness tracker with their phone and wear it daily through to the end of their study year for the purpose of collecting objective data on their daily activity.On the App activities page, participants will be able to track their recent activity data (e.g., how many days they entered mood over the past week, last tness tracker sync) (see "MyDataHelps App -Dashboard-Activities Screen").Healthcare workers will also be able to view graphs of their mood, sleep, and step data at any time on the App dashboard.(See Figure 1)

Sampling Procedure
The healthcare workers will be sampled using random sampling technique on the basis that they meet the following inclusion criteria: voluntary consent to participate in the study; enrolled at the sampled facility as a healthcare worker (including residents) and own a smart phone.Exclusion criteria will be: decline consent to participate in study; has a non-smartphone and HCW's on locum.This sampling technique is preferred due to the large number of healthcare workers who rotate in these sites.Potential participants will be randomly selected from a list with staff names and contacts.They will then be contacted via email or smartphone.

Data collection procedure
We will administer surveys and collect active and passive mobile data on mood and affect over a 12-month period in a Kenyan cohort of 500 healthcare workers, recruited from ve health facilities.The frequency of the data collection will vary i.e., daily, monthly and quarterly.The initial data collection day will be the same as the consenting day.Right after the participants give consent through the App (MyDataHelps), a unique identi er will be generated.All consented healthcare workers will then be encouraged to complete the baseline survey.In addition to demographic and baseline psychological information (Neuroticism using the NEO-FFI and Early Family Environment using the Risk Family Questionnaire), the baseline survey will assess anxiety symptoms using GAD-7, suicidal symptoms using PANSI and depressive symptoms using PHQ-9 [14][15][16][17][18]. Participants will also be asked for permission for us to collect active and passive data through their smartphone.On submission of the survey, the healthcare workers will receive a message directing them where they will pick Fitbit Inspire 2.
During the 12-month data collection period, participants will respond daily to a validated one-question measure of mood valence via the mobile App (scale of 1 to 10).At 3-month intervals (quarterly) they will also be assessed via the App using a shorter questionnaire designed to assess: 1) current depressive symptoms measured by the PHQ-9, 2) current suicidal symptoms measured by the PANSI, 3) current anxiety symptoms measured by the GAD-7, 4) non-work life stress, 5) COVID-19 experiences, 6) work hours, and 7) perceived medical errors (Table 1).If participants do not respond to the surveys and have not indicated that they do not wish to participate in the study, a reminder will be sent after 3 days of no response (See gure 2).The healthcare workers will be compensated (Kenya Shillings 500.00 or USD $4.06 equivalent) for each quarterly survey.Compensation has been shown to reduce attrition rate [26].

Data analysis and presentation
Data will be collected, stored in the cloud database, and checked for completeness daily.Whenever we come across missing data, a reminder will be sent to the respondents after three consecutive days of missed entries, urging them to complete the data entry process.When they opt out of the study, reminders will no longer be sent.Once the study is over, the data will be stored in accordance with guidelines from Kenya's Data Protection Act 2019 [27], National Institute of Mental Health and Aga Khan University Hospital.For aim 2, we will adapt and test models using mobile data in the domains that are most promising as predictors (sleep behaviors, mood, anxiety) of depression and future mood from the Intern Health Study.This will be used to determine if the predictors identi ed in the Intern Health Study dataset suggest similar correlates in our Kenyan sample, which would be essential in adapting, developing, and validating individualized risk prediction models for depression and mood.We will identify data driven behavioral phenotypes, derived from mobile data elements, which predict short-term risk for mood changes and depressive episodes.
Modeling approach.The framework for this model was built in 1999 [28, 29], and has been constantly updated.
Circadian phase, sleep drive and performance.We will use our modeling framework to estimate circadian phase and sleep drive in individual subjects.We will compare the Fitbit estimates of individuals both asleep and awake with our predicted sleep drive with the hypothesis typically used in the eld.The predicted relationship between these variables by our mathematical models will be compared with these datasets, aggregated over the population to re ne parameters for the mathematical model, which will represent the behavior of a typical individual.These estimates of circadian and sleep drive have the potential to inform shift work.
Behavioral phenotyping for Mood Prediction.We will make inferences about which behavioral phenotypes are most predictive and evaluate their time-delayed impact upon mood at different time lags.Granger causality [30] is a quantitative framework for assessing the relationship between time series, and has been widely used in econometrics, neuroscience, and genomics to study the temporal causal relations among multiple economic events [30,31] directional interactions of neurobiological signals [32][33][34] and gene time-course expression levels [35][36][37], respectively.Given two time series and , the temporal structure of the data is used to assess whether the past values of , are predictive of future values of , beyond what the past of can predict alone; if so, is said to Granger cause .Consider the two regression models: where A is a vector with j-th element is the lag-j effect upon and B is a vector with j-th element representing the effect of j-th lag of X upon Y. Then X is said to be Granger-causal for Y only if model (a) results in signi cant improvement over model (b).We will use the multivariate extension of the notion of Granger causality between two variables to p variables, referred to as graphical Granger models (GGM) [38].GGM uncovers a sparse set of Granger causal relationships amongst the individual univariate time series, for example, to establish the temporal relationship between the daily sleep hours and mood scores.In the GGM, on a given day, a behavioral phenotype is said to be Granger causal for another if the corresponding coe cient for this phenotype and day is signi cant.
Feature and temporal order selection.We will estimate the strength of association among the mobile phenotypes and their respective associations with the daily mood scores.For example, we will construct the sequence of sleep hours averaged over the past 1, 2 or 7 days and study their respective association with self-reported mood scores.These estimates will directly address two questions: 1) "Which phenotypes and their temporal features predict the daily mood scores?", and 2) "If a phenotype is predictive of daily mood scores, at what time lags do we observe the strongest predictive strength?"We address these questions via Lasso-type estimates in the context of Granger causality to estimate the effects of variables on each other in a sparse graphical model framework [36].Important questions including the direction (positive/negative) and the strength of the effect of, say sleep hours on mood scores, can then be answered.Such methods have been known to reduce false positive and false negative rates [36] in inferring the relations among mobile phenotypes and between the mobile phenotypes and the daily mood scores.Nonlinearity extensions.We will also likely observe nonlinear relationships between the mobile phenotypes and the daily mood scores.We will use kernelbased regression methods to explicitly relax the linear dependence of the outcome upon predictors [39].
Latent variable model for integrating mood scores and PHQ-9 responses.In addition to the daily mood scores, we will also use PHQ-9 scores to assess the study participants' depressive symptoms.We will use dynamic latent variable models [40] to address the different frequencies at which daily mood scores and PHQ-9 data are collected.We will introduce a daily latent variable that represents the true mood for every subject.The true daily mood will then be informed by both the self-reported mood score and the proximal PHQ-9 survey responses.This approach has been applied to the study of complex disease etiology where multiple measurements with different errors and frequencies are integrated to provide extra statistical precision [41,42].The integrated analysis has the advantage of increasing the power of identifying important mobile phenotypes that are predictive of the true mood represented by the latent variable.

Study Limitations
Selection and Missing Data Bias.As with in any longitudinal study, missing data may be a problem.We will perform a secondary analysis after accounting for missing data to ensure the absence of missing data biases and to maximize power [43].
Self-Reported Depressive Symptoms.We cannot diagnose major depressive disorders (MDD) using self-reported assessments.However, our self-reporting assessment has several advantages.For instance, the PHQ-9 instrument that we will utilize to measure depressive symptoms has a sensitivity of 88% and a speci city of 88% for a MDD diagnosis with a cut-off of 10 points or higher [44,45].Further, there is evidence that in the population of young healthcare workers, maintaining anonymity is of paramount importance to the accurate assessment of sensitive personal information such as depressive symptoms, suggesting that self-reporting may be the most accurate ascertainment method in this population [46].
Accuracy of Mobile Data.The smartphone and wearable devices utilized in this study rely on emerging technology and do not measure parameters of sleep and activity as accurately as gold-standard laboratory tests [47,48].While these mobile tools can be used at a scale and in settings that traditional assessment cannot, preliminary evidence suggests that mobile tools produce meaningful and important data [49], we will rigorously monitor their accuracy and upgrade to better technology as it becomes available.

Figure 1 App
Figure 1 App Dashboard

Table 1 :
Summary of longitudinal data to be collected in the study