mHealth app characteristics by financing stage
At the study publication date (Fig. 2), 16% of the app companies were early-stage startups (Pre-seed, Seed and Series A; 6/38), 29% were scale-ups (Series B–F; 11/38), and 39% were acquired or public (15/38); the remaining 16% of apps were developed by universities or government groups (6/38). Companies for which financing data were not found (9/47) were excluded from this categorisation.
We found a significant association (p < 0.001) between the financing stage and the team conducting the study (Fisher’s exact test; Supplementary Table 5), with early-stage startup (n = 6) and scale-up (n = 11) research both being conducted by external groups (no declared conflict of interest; n = 3 and n = 5, respectively) and mixed teams (internal employees and external collaborators; n = 3 and n = 6, respectively); acquired or public companies (n = 15) being exclusively researched by external groups; and university or government-developed app research (n = 6) being conducted mainly by internal teams (n = 3), with some efforts led by external groups (n = 2) and mixed teams (n = 1).
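The association test above can be approximately reproduced from the counts given in the text. The sketch below is a minimal pure-Python Fisher's exact test for an r × c table (the Freeman-Halton extension), which enumerates every table sharing the observed row and column totals. The function names (`row_weight`, `compositions`, `fisher_exact_rxc`) are illustrative, and the zero counts in the first three rows are inferred from the reported group totals rather than stated explicitly. The software used for the original analysis is not specified here, so this is not necessarily the authors' procedure.

```python
from math import comb


def row_weight(row):
    """Multinomial coefficient: ways to split sum(row) items into these cell counts."""
    weight, remaining = 1, sum(row)
    for cell in row:
        weight *= comb(remaining, cell)
        remaining -= cell
    return weight


def compositions(total, caps):
    """Yield every tuple x with sum(x) == total and 0 <= x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def fisher_exact_rxc(table):
    """Two-sided exact p-value: total probability of all tables with the
    observed margins that are no more likely than the observed table."""
    row_totals = [sum(row) for row in table]
    col_totals = tuple(sum(col) for col in zip(*table))

    # Integer weight proportional to the probability of the observed table.
    observed = 1
    for row in table:
        observed *= row_weight(row)

    def walk(i, cols_left, weight):
        # The last row is fully determined by the remaining column totals.
        if i == len(row_totals) - 1:
            w = weight * row_weight(cols_left)
            return w if w <= observed else 0
        total = 0
        for row in compositions(row_totals[i], cols_left):
            left = tuple(c - x for c, x in zip(cols_left, row))
            total += walk(i + 1, left, weight * row_weight(row))
        return total

    # Dividing by the weight summed over all tables turns it into a probability.
    return walk(0, col_totals, 1) / row_weight(col_totals)


# Rows: financing stage; columns: external, mixed, internal study teams.
# Counts transcribed from the text; zeros are inferred from the group totals.
team_by_stage = [
    [3, 3, 0],   # early-stage startups (n = 6)
    [5, 6, 0],   # scale-ups (n = 11)
    [15, 0, 0],  # acquired or public (n = 15)
    [2, 1, 3],   # university or government (n = 6)
]
p_value = fisher_exact_rxc(team_by_stage)
print(f"p = {p_value:.2e}")
```

If the counts are as transcribed, the resulting p-value should be small, in line with the reported significant association. Exact enumeration is practical here only because the table is small (38 studies in a 4 × 3 layout).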
No association was found between study design and financing stage using Fisher’s exact test (p = 0.55; Supplementary Table 6). However, early-stage startups were more apt to use pilot (n = 2) or full-scale RCTs (n = 2); scale-ups showed a higher inclination towards pre-post studies (n = 5) and full-scale RCTs (n = 4); acquired or public companies employed more diverse study designs, including pilot (n = 6) and full-scale (n = 3) RCTs, pre-post studies (n = 4), and alternative designs (including a micro-randomised trial and a non-randomised open label controlled trial; n = 2); and university or government-developed app research teams tended to use pre-post studies (n = 3), full-scale RCTs (n = 2), and a 2x2 randomised, mixed factorial design (n = 1).
mHealth app categories
Most apps (94%, 44/47) fell under the category of wellness management. Among the wellness management apps, 32% (14/44) targeted three or more health outcome sub-categories. The most prevalent subcategories included diet and nutrition (55%, 24/44), exercise and fitness (55%, 24/44), mental health (52%, 23/44), and sleep (41%, 18/44; Fig. 3 and Supplementary Table 7). Of the few health condition management apps (6%, 3/47), all were therapeutic tools designed for self-monitoring specific health conditions (including diabetes, hypertension, and allergic rhinitis; Fig. 3).
Nearly one-third of the wellness management apps (32%, 14/44) were associated with companion devices offered by the same or alternative companies. Of these, eleven apps were integrated with smartwatches, two apps were linked to smart scales, and one app was linked to a monitoring system for urine excretion. Among these companion devices, only the smart scales had received FDA 510(k) clearance. Two of the three health condition management apps were accompanied by companion devices with FDA 510(k) clearance: a capillary glucose reader for individuals with diabetes [65] and a blood pressure monitor for individuals with hypertension [66].
Numerous wellness management studies focused on distinct populations to explore various health outcomes. Twenty-seven percent focused on overweight and obese individuals to evaluate dietary and physical activity improvements (12/44), 16% involved cancer patients, examining enhancements in diet, exercise, mental health, and sleep (7/44), and 11% targeted women with conditions such as breast cancer and postpartum depression (5/44). However, the largest group of wellness management apps (36%, 16/44) did not target a specific condition or disease and instead aimed to improve wellness outcomes in the general population.
Study characteristics of the included studies
A variety of research designs were employed in the evaluation of the included mHealth apps (Table 1 and Fig. 3), with the majority (64%, 30/47) being RCTs. Among these, 57% were full-scale RCTs (17/30), characterised by medium-sized sample groups (median 107, range 28–1573), moderate intervention durations (median 2.5 months, range 0.3–24.0 months), and relatively high retention rates (mean 79.6%, SD 18.5). Pilot RCTs (37%, 11/30) had smaller samples (median 54, range 25–142), longer intervention durations (median 4.5 months, range 1.4–12.0 months), and higher retention rates (mean 86.3%, SD 9.7). Full-scale and pilot RCTs employed a range of control conditions, including standard care, waitlist (delayed access to treatment), partial access to treatment, alternative treatments, or no treatment. Novel RCT approaches constituted a minor portion (7%, 2/30). The micro-randomised trial featured a large sample of 1565 participants over a six-month study period and used a partial treatment control group; its retention rate was not reported. The mixed factorial (2x2) study involved a smaller sample of 52 participants over a one-week study period, used an alternative treatment control, and achieved a 100% retention rate.
Pre-post studies accounted for 32% (15/47), split between non-pilot (40%, 6/15) and pilot (60%, 9/15) studies. The non-pilot pre-post studies featured larger sample sizes (median 129, range 61–416) and longer study durations (median 2.76 months, range 0.69–12.0 months), but had lower retention rates (mean 68.3%, SD 22.3). In comparison, the pilot pre-post studies had smaller sample sizes (median 27, range 8–90) and shorter durations (median 1.8 months, range 1.0–2.8 months), and exhibited higher retention rates (mean 84.6%, SD 16.0). The majority of pre-post studies used a before/after single group design (87%, 13/15), and only two used a non-randomised comparative design (with intervention and control groups).
Finally, the two non-randomised open label trials (4%, 2/47) had sample sizes of 19 and 75, intervention lengths of 1.8 and 2.76 months, and retention rates of 45.0% and 87.0%.
Table 1
Frequency table of study designs and associated study characteristics.
Study designs (n = 47) | Design | n (%) | Median (range) sample size | Median (range) study intervention length in months | Mean (SD) retention rate in % |
RCT (n = 30) | Full-scale RCT (including 2- and 3-arm) | 17 (36.2%) | 107 (28–1573) | 2.5 (0.3–24.0) | 79.6% (18.5) |
RCT (n = 30) | Pilot RCT | 11 (23.4%) | 54 (25–142) | 4.5 (1.4–12.0) | 86.3% (9.7) |
RCT (n = 30) | Micro-randomised trial | 1 (2.1%) | 1565* | 6* | Not reported |
RCT (n = 30) | Mixed factorial design (2x2) | 1 (2.1%) | 52* | 0.23* | 100%* |
Pre-post studies (n = 15) | Pre-post | 6 (12.8%) | 129 (61–416) | 2.76 (0.69–12.0) | 68.3% (22.3) |
Pre-post studies (n = 15) | Pilot pre-post | 9 (19.1%) | 27 (8–90) | 1.8 (1.0–2.8) | 84.6% (16.0) |
Non-randomised open label trials (n = 2) | Non-randomised open label trial | 2 (4.3%) | 19 and 75* | 1.8 and 2.76* | 45% and 87%* |
*For designs with two or fewer studies, the actual values are reported because a median (range) or mean (SD) is not applicable.
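As a quick arithmetic check on the n (%) column of Table 1, the percentages can be recomputed from the study counts, taking all 47 included studies as the denominator. The snippet below is a minimal sketch; the names `design_counts` and `percent_of_total` are illustrative and do not come from the study.

```python
# Study counts per design, transcribed from Table 1.
design_counts = {
    "Full-scale RCT": 17,
    "Pilot RCT": 11,
    "Micro-randomised trial": 1,
    "Mixed factorial design (2x2)": 1,
    "Pre-post": 6,
    "Pilot pre-post": 9,
    "Non-randomised open label trial": 2,
}
TOTAL_STUDIES = 47


def percent_of_total(count, total=TOTAL_STUDIES):
    """Share of all included studies, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# The design counts should add up to the number of included studies.
assert sum(design_counts.values()) == TOTAL_STUDIES

percentages = {name: percent_of_total(n) for name, n in design_counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {design_counts[name]} ({pct}%)")
```

The same helper can be reused with a different `total` (for example 30 for the RCT subgroup) to check the within-group percentages quoted in the text.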
Participant demographics
The studies were conducted in 15 countries (Fig. 3, Supplementary Table 8). The majority (62%, 29/47) of the studies were conducted in the USA. Other countries were represented by one or two studies and no global or multi-country studies were found.
The majority of the studies (72%, 34/47) targeted adults aged 18 years and older, 11% focused on children under 18 years of age (5/47), and the remaining studies (17%, 8/47) focused on adults aged 40 years and older (Fig. 3). Eight studies were gender/sex-specific: five exclusively enrolled female participants in the context of breast cancer, pre- and post-partum depression, and premenstrual syndrome, and three solely included male participants, focusing on esophageal cancer and obesity. The remaining studies exhibited a wide range in the proportion of female and male participants at baseline, varying from 21–95% and 5–78%, respectively (Fig. 4). Overall, 77% (36/47) of studies included a majority of female participants. Notably, only one study reported inclusion of individuals outside of a sex or gender binary [73].
Ethnicity was reported by 57% (27/47) of included studies (Figs. 3 and 5). Sixty-seven percent of these studies reported a majority of White/Caucasian participants (18/27). Two studies conducted in the USA targeted Hispanic/Latino adults [35, 71], one study conducted in the USA researched an underserved community with 95% Black/African participants [66], and one study conducted in Singapore reported exclusively Asian/Asian descent participants (92% Chinese, 0.6% Malay, 4.5% Indian, 2.9% Other) [72]. Excluding the studies that targeted specific ethnicities, the median (range) representation of each ethnic group among included studies was: 62% (4%–98%) White/Caucasian, 7% (0%–50%) Black/African descent, 0.4% (0%–17%) Asian/Asian descent, 7% (0%–48%) Hispanic/Latinx, 9% (0%–60%) Biracial/Multiracial, and 0.3% (0%–14%) Indigenous Groups (Fig. 5).
Measurement tools for evaluating mHealth apps
Various measurement tools were used to assess the effect of the apps on health outcomes (Supplementary Table 9). Five commonly employed measurement tools were identified: the Short Form Health Survey (SF-12 or SF-36) [77] for measuring health-related quality of life (7/47), the Patient-Reported Outcomes Measurement Information System (PROMIS) [78] for evaluating physical, mental and social health (6/47), the Perceived Stress Scale (PSS) [79] for measuring individual stress levels (6/47), the Five Facet Mindfulness Questionnaire Short Form (FFMQ-SF) [80] for assessing the five facets of mindfulness (4/47), and the Hospital Anxiety and Depression Scale (HADS) [81] for measuring anxiety and depression among patients in hospital settings (4/47). These measurement tools were employed to evaluate wellness apps and health promotion apps.