3.3 Functions of the mHealth Application
The functions of the mHealth apps in the selected studies fell into four categories: wellness management (n = 40), disease management (n = 33), health-care services (n = 19), and social contact (n = 4). Nearly half of the mHealth apps were used to improve the general wellness of older adults rather than targeting specific diseases, covering a variety of solutions that included fall prevention [40, 41], fitness [42, 43], lifestyle modification [44, 45], medication adherence [46, 47], health monitoring [39, 48], nutrition [49, 50], and cognitive stimulation [51, 52]. Approximately one-third of the mHealth apps focused on the management of chronic diseases in the elderly, such as diabetes [53, 54], mental illness [55, 56], and cardiovascular disease [57, 58]. To support continuity of care, mHealth apps served as a useful tool for delivering health-care services and empowering users to manage their health during care transitions [59], including postoperative care [60], stroke rehabilitation therapy [61], and medical advisory services [62, 63]. Four mHealth apps aimed to reduce social isolation and loneliness in older adults by encouraging social participation and strengthening ties with family members [64–67]. In addition, the target users of the mHealth apps in 78 (81.4%) articles were the elderly (aged 50/55/60/65 years or older), while the remaining apps were aimed mostly at people with chronic diseases and were tested to determine whether they were suitable for use by older people. The complete range of functions, health conditions, and target users can be found in Multimedia Appendix 2.
3.5 Critical Measures of Usability Evaluation for mHealth Applications
Following the usability definitions of ISO 9241-11, ISO 25010, and Nielsen, nine critical measures of usability evaluation were extracted from the selected articles: effectiveness, efficiency, satisfaction, learnability, memorability, errors, attractiveness, operability, and understandability [15, 68, 69]. It is worth noting that effectiveness, efficiency, and satisfaction focus on the outcomes users experience when interacting with the system, whereas the remaining measures concern characteristics of the system itself and whether these can compensate for the decline of intrinsic capacity in the elderly. As shown in Table 3, the two most frequently evaluated measures were satisfaction and learnability, consistent with the dimensions of the System Usability Scale (SUS) [70], which was applied in 40 papers. The aspects of usability considered least often in the reviewed articles were errors and memorability. The proportion of articles assessing some measures differed significantly across the three stages of evaluation: satisfaction and learnability were assessed more often in the third stage than in the first two, whereas operability and understandability received less attention in the third stage than in the first two.
Table 3
The critical measures of usability evaluation for mHealth applications

| Critical measure | n (%) | Definition | Stage 1 (n = 61), n (%) a | Stage 2 (n = 30), n (%) a | Stage 3 (n = 19), n (%) a | P value b |
|---|---|---|---|---|---|---|
| Satisfaction | 74 (77.1) | The extent to which the user’s physical, cognitive, and emotional responses resulting from use of the app meet the user’s needs and expectations; expressed as interest in the app, willingness to continue using it, and initiative to share it. | 43 (70.5) | 25 (83.3) | 19 (100) | .018 |
| Learnability | 60 (62.5) | The app should be easy to learn for its class of users, reflected in introduction/instruction documents that help users reach a reasonable level of proficiency within a short time. | 34 (55.7) | 22 (73.3) | 16 (84.2) | .04 |
| Operability | 50 (52.1) | The app should be easy to operate and control, expressed as being navigable and manipulable on the touchscreen to address the decline of cognitive ability, dexterity, and muscle control in the elderly. | 36 (59.0) | 19 (63.3) | 5 (26.3) | .02 |
| Understandability | 40 (41.7) | The interaction information of the app should be easy to understand, embodied in the clarity of the provided explanations and the graphical interface to compensate for the cognitive decline of the elderly. | 33 (54.1) | 9 (30.0) | 4 (21.1) | .01 |
| Attractiveness | 38 (39.6) | The interface of the app should enable pleasing and satisfying interaction for the user, for example in color use and graphic design, to meet the aesthetic needs of the elderly and accommodate their age-related perceptual resources. | 29 (47.4) | 12 (40.0) | 4 (21.1) | .12 |
| Efficiency | 33 (34.4) | The extent to which external resources, including time, human effort, money, and materials, are consumed in achieving goals with the app. | 24 (39.3) | 9 (30.0) | 6 (31.6) | .63 |
| Effectiveness | 26 (27.1) | The extent to which actual outcomes match intended outcomes, measured by the accuracy and completeness with which users achieve specified goals using the app. | 21 (34.4) | 8 (26.7) | 4 (21.1) | .48 |
| Errors | 22 (22.9) | The app should have a low error rate and protect users against making errors, for example by providing error messages or help documentation that tell users how to fix problems. | 18 (29.5) | 4 (13.3) | 4 (21.1) | .22 |
| Memorability | 13 (13.5) | The operational flow of the app should be easy to remember, embodied by reducing the demand on working memory through supporting recognition rather than recall. | 11 (18.0) | 4 (13.3) | 3 (15.8) | .94 |

a First stage: combining components; second stage: integrating the system into the setting; third stage: routine use.
b Chi-squared tests were conducted to assess the statistical significance of the intergroup differences.
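As an illustration of footnote b, the sketch below reproduces the stage comparison for the satisfaction row of Table 3 with a standard Pearson chi-squared test of independence on the assessed/not-assessed counts; this is a minimal reconstruction assuming the stage totals reported above, and the published analysis may have used a variant (e.g., an exact test for cells with small expected counts).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Satisfaction counts from Table 3: articles that did / did not assess
# satisfaction in each of the three evaluation stages (n = 61, 30, 19).
assessed = np.array([43, 25, 19])
not_assessed = np.array([61, 30, 19]) - assessed   # -> [18, 5, 0]

observed = np.vstack([assessed, not_assessed])     # 2 x 3 contingency table

chi2, p, dof, expected = chi2_contingency(observed)  # Pearson chi-squared, df = 2
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")  # p ≈ .018, matching Table 3
```

The same two-row layout (assessed vs. not assessed per stage) applies to every other row of the table.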
3.6 Empirical Methods of Usability Evaluation for mHealth Applications
Usability evaluation approaches can be classified into two categories: usability inspection and usability testing. Usability inspection is a general name for a set of methods in which experienced practitioners inspect the system against predetermined principles with the aim of identifying usability problems [71]. In contrast, usability testing involves observing and recording the objective performance and subjective opinions of target users as they interact with the product in order to diagnose usability issues or establish benchmarks [72].
Usability inspection methods
Fifteen articles used usability inspection methods to assess mHealth applications, which included two approaches: heuristic evaluation (n = 14) and cognitive walkthrough (n = 2).
The heuristic evaluation method requires one or more reviewers to compare the app against a list of principles that should be taken into account during design and to identify where the app does not follow those principles [73]. In the 14 heuristic evaluation articles, the evaluators usually had different research backgrounds, such as human-computer interaction, gerontology, and specific disease areas, so that a multidisciplinary perspective could be obtained [53, 64]. The number of evaluators ranged from 2 to 8, generally following Nielsen’s suggestion that ‘three to five evaluators can identify 85% of the usability problems’ [68]. The heuristics could be divided into two types: generic and specific. Six studies used Nielsen’s ten principles, the most widely used generic heuristics [32, 39, 68]. However, the traditional generic heuristics were not created for small touchscreen devices, which were the main platform for these apps, and did not consider design features appropriate for older adults to address their age-related functional decline in perception, cognition, and movement [73]. To ensure that usability issues in these specific domains were not overlooked, the remaining eight studies extended the generic heuristics by adding usability requirements specific to the elderly, such as dexterity, navigation, and visual design, and established new heuristic checklists to evaluate apps targeting older adults [53, 64]. Nevertheless, these tools lacked reliability analysis and expert validation, with the exception of a checklist developed by Silva [74].
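To make the idea of an extended checklist concrete, the sketch below pairs Nielsen’s ten generic heuristics with a few elderly-specific additions of the kind described above (dexterity, navigation, visual design) and records one finding with a conventional 0–4 severity rating. The heuristic names are standard; the elderly-specific items, the data structure, and the example finding are purely illustrative and are not taken from any reviewed checklist.

```python
from dataclasses import dataclass

# Nielsen's ten generic heuristics [68] plus illustrative elderly-specific
# extensions (hypothetical wording, not from the reviewed studies).
HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
    # Elderly-specific extensions (illustrative):
    "Touch targets large enough for reduced dexterity",
    "Shallow, linear navigation structure",
    "High-contrast text and large default font size",
]

@dataclass
class Finding:
    heuristic: str   # violated heuristic
    screen: str      # where the problem was observed
    severity: int    # conventional scale: 0 = not a problem ... 4 = usability catastrophe
    note: str

# Example finding recorded by one evaluator (hypothetical):
finding = Finding(
    heuristic="Touch targets large enough for reduced dexterity",
    screen="Medication reminder setup",
    severity=3,
    note="Time picker arrows are very small and frequently mis-tapped.",
)
```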
Cognitive walkthrough involves one or more evaluators working through a series of tasks with the app and describing their thought process while doing so, as if they were first-time users [75]. The focus of this method is on understanding the app’s learnability for new users [30]. The evaluators in the two studies using this method were usability practitioners and health-care professionals [76, 77]. Before the assessment, the researchers prepared user personas and task lists [77]. During the walkthrough, the evaluators were encouraged to think aloud, and their performance was recorded with usability metrics such as task duration and completion rate [76].
Usability testing methods
Almost 93% (89/96) of the studies used usability testing to evaluate the mobile applications. Test participants were the target users of the apps, all of whom were older adults. Some studies (n = 52) also assessed participants’ experience with mobile devices or their level of eHealth literacy so that testing results could be compared among experts, intermediates, and novices [40, 46, 57]. The number of participants varied according to the stage and purpose of the evaluation. The average sample sizes in the first two stages were 22.8 (range 2 to 189) and 15.2 (range 3 to 50), respectively, with the purpose of identifying usability problems in the laboratory or in a real-life environment. Most of these studies followed Nielsen’s recommendation, which approaches the maximum benefit-cost ratio: test three to five subjects, modify the application, and then iteratively retest three to five new subjects until no new major problems are identified [78]. Some studies determined their sample sizes according to the type of study design, including randomized controlled trials and qualitative research [56, 79, 80]. In stage three, usability testing was usually part of a feasibility or pilot study, and the sample size was therefore based on these design types, with an average of 60.1 (range 8 to 450) [58, 81, 82].
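Nielsen’s “three to five users” rule of thumb rests on a simple cumulative discovery model; the sketch below reproduces it assuming the commonly cited average detection probability of roughly 31% per participant (the reviewed studies do not report their own detection rates, so the value of L here is an assumption).

```python
# Cumulative proportion of usability problems found after n participants,
# using the model found(n) = 1 - (1 - L)**n with L ≈ 0.31 per participant.
L = 0.31  # assumed probability that one participant reveals a given problem

for n in range(1, 9):
    found = 1 - (1 - L) ** n
    print(f"{n} participants -> {found:.1%} of problems found")
# With 5 participants the model gives roughly 84-85%, which motivates the
# iterative "test 3-5 users, fix, retest" procedure described above [78].
```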
During usability testing, the objective performance and subjective opinions of the participants were collected with corresponding data collection methods. Thirty-four studies presented objective performance data derived from observation of operational behavior, body movements, and facial expressions, collected through performance metrics, behavioral observation logs, screen recordings, and eye tracking [46, 55, 76, 83]. Eighty-five studies gathered participants’ subjective opinions, covering their experience with the app and their design preferences for each part of the interface, investigated by means of concurrent thinking aloud, retrospective thinking aloud, questionnaires, interviews, and feedback logs [36, 40, 51, 57, 84]. The details and descriptive statistics of each data collection method are presented in Table 4.
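As a small illustration of the performance metrics listed in Table 4, the sketch below computes task completion rate (effectiveness) and mean task duration (efficiency) from a hypothetical per-task log; the task names and values are invented for illustration only.

```python
import statistics

# Hypothetical per-task observations for one participant, of the kind logged
# as performance metrics in the reviewed studies (effectiveness/efficiency).
tasks = [
    {"name": "Log blood pressure", "completed": True, "seconds": 74, "errors": 1, "assists": 0},
    {"name": "Set medication reminder", "completed": True, "seconds": 132, "errors": 2, "assists": 1},
    {"name": "Share report with caregiver", "completed": False, "seconds": 180, "errors": 4, "assists": 2},
]

completion_rate = sum(t["completed"] for t in tasks) / len(tasks)  # effectiveness
mean_duration = statistics.mean(t["seconds"] for t in tasks)       # efficiency
total_assists = sum(t["assists"] for t in tasks)                   # requests for help

print(f"completion rate: {completion_rate:.0%}, mean time: {mean_duration:.0f} s, assists: {total_assists}")
```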
The most frequently used collection method was the questionnaire (n = 68). Of these studies, 51 used well-validated usability questionnaires, which are flexible enough to assess a wide range of technology interfaces. Frequently used usability questionnaires were the SUS (n = 40), the NPS (n = 4), and the NASA-TLX (n = 3). However, because standardized tools lack specificity, 24 studies used self-designed questionnaires without reliable psychometric analysis to assess the unique features of their apps, including navigation, interface layout, and font size [44, 56, 85]. A combination of the two types of questionnaires was employed in 8 studies [56, 64, 86].
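Because the SUS was by far the most common instrument (n = 40), a minimal sketch of its standard scoring procedure is given below; the ten item responses shown are made up for illustration.

```python
def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale scoring [88].

    `responses` are the 10 item answers on a 1-5 Likert scale, in order.
    Odd-numbered items (positively worded) contribute (answer - 1),
    even-numbered items (negatively worded) contribute (5 - answer);
    the sum is multiplied by 2.5 to give a 0-100 score.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    raw = sum((r - 1) if i % 2 == 0 else (5 - r)   # i is 0-based, so even i
              for i, r in enumerate(responses))    # corresponds to odd items
    return raw * 2.5

# Illustrative (made-up) answers from one older participant:
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # -> 85.0
```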
Table 4
Data collection methods for usability testing

| Data collection method | n (%) | Description | Metrics/Tools | Comments |
|---|---|---|---|---|
| Performance metrics | 25 (26.1) | Collecting quantifiable measurements of participants’ actions during the test to understand the impact of usability issues, usually focusing on effectiveness and efficiency. | Effectiveness: number of errors, number of tasks completed successfully. Efficiency: task duration, number of times asking for assistance or hints, time spent recovering from errors. | These quantitative indicators can be compared between young adults and seniors to reflect differences in performance. [79, 83] |
| Behavioral observation log | 14 (14.6) | Observing and recording the participant’s mood and body gestures during the test. | Sometimes the observation is structured and based on predefined classifications of user behavior, such as a delay or pause of > 5 s in locating the answer button. [42] | This method is often used in conjunction with thinking aloud and performance metrics to improve triangulation. [32] |
| Screen recording | 3 (3.1) | Capturing the touches and actions performed on the mobile device. | Screen recording software and video coding software (Behavioral Observation Research Interactive Software). | — |
| Eye tracking | 1 (1.1) | Monitoring and recording the visual activity of the participants by tracing pupil movement within the eye. | Fixations: the number of views of the area of interest. Saccades: the number of repeated visits to a specific area. [52] | Because of drooping eyelids in the elderly, the eye tracker may not track their pupils accurately. |
| Concurrent thinking aloud | 25 (26.1) | Encouraging the participants to continuously verbalize their ideas, beliefs, expectations, doubts, and discoveries while performing tasks, in order to understand their thoughts as they interact with the app. | — | This method relies heavily on the cognitive capacities of participants, which decline with age; thus, it may cause reporter bias. [85] |
| Retrospective thinking aloud | 1 (1.1) | Asking the participants to view the recording of their actions and verbalize their thoughts about the tasks and the difficulties they encountered in completing them. | — | 1. This method increases the overall length of the evaluation and may cause the elderly to lose focus. [87] 2. This method does not increase the cognitive load of the elderly compared with concurrent thinking aloud. [85] |
| Questionnaire | 68 (70.8) | Gathering the participants’ opinions about, preferences for, and satisfaction with the user interface on a predefined scale after they have completed the tasks. | Validated questionnaires a: SUS, USE, UEQ, ASQ, NASA-TLX, NPS, Health-ITUES, QUIS, PSSUQ, ICF-US, MARS, Ruland’s eight-item adaptation of Davis’ ease-of-use survey; self-designed questionnaires based on the unique features of a specific app. | 1. A larger sample size can be investigated with this method. [81] 2. Some items have to be answered by an expert rather than the elderly because they are either beyond the scope of the test or based on experiencing rare occurrences. [87] 3. To reduce the response burden on the elderly and improve the understandability of the questionnaire, some items are removed or the language is modified. [42, 82, 84] |
| Interview | 36 (37.5) | Collecting data in the form of face-to-face oral conversations with the participants, including individual interviews and focus group interviews. | The interview outline: opinions on unique features, product satisfaction, difficulties encountered during the test, and suggestions for improvement. | 1. This method can obtain new insights from the participants. 2. This method is often combined with a questionnaire to collect explanations of the questionnaire answers. |
| Feedback log | 1 (1.1) | Asking the participants to record their experiences on a provided form while using the app. | — | This method is suitable for long-term usability testing, as it can record the participant’s experience dynamically. [51] |

a SUS: System Usability Scale [88], USE: Usefulness, Satisfaction, and Ease of Use questionnaire [89], UEQ: User Experience Questionnaire [90], ASQ: After-Scenario Questionnaire [91], NASA-TLX: National Aeronautics and Space Administration Task Load Index [92], NPS: Net Promoter Score [81], Health-ITUES: Health Information Technology Usability Evaluation Scale [93], QUIS: Questionnaire for User Interaction Satisfaction [94], PSSUQ: Post-Study System Usability Questionnaire [95], ICF-US: International Classification of Functioning based Usability Scale [96], MARS: Mobile Application Rating Scale [97]
The intersection of these methods is presented in Fig. 4. Seven studies conducted both usability inspection and usability testing. Thirty studies analyzed the testing results based on both objective performance and subjective perceptions. Figure 5 shows the distribution of the three types of evaluation methods in each stage of the mHealth app usability evaluation framework. Across the three stages, most of the studies captured subjective opinions during or after the user testing process, which was most prominent in the “routine use” stage (90.5%). The objective performance of the users was also collected at all stages and accounted for the highest proportion in the “combining components” stage (29.3%). Usability inspection conducted by experts was applied only in the first stage (16.3%). Table 5 provides descriptive statistics for each evaluation approach across the three stages.
Table 5
Usability evaluation approaches in the three stages of the mHealth app usability evaluation framework

| Evaluation approach | Stage 1 (n = 61), n (%) | Stage 2 (n = 30), n (%) | Stage 3 (n = 19), n (%) |
|---|---|---|---|
| Heuristic evaluation | 14 (23.0) | 0 (0) | 0 (0) |
| Cognitive walkthrough | 2 (3.3) | 0 (0) | 0 (0) |
| Performance metrics | 19 (31.1) | 5 (16.7) | 2 (10.5) |
| Behavioral observation log | 10 (16.4) | 5 (16.7) | 0 (0) |
| Screen recording | 3 (4.9) | 0 (0) | 0 (0) |
| Eye tracking | 1 (1.6) | 0 (0) | 0 (0) |
| Concurrent thinking aloud | 22 (36.1) | 4 (13.3) | 0 (0) |
| Retrospective thinking aloud | 1 (1.6) | 0 (0) | 0 (0) |
| Questionnaire | 34 (55.7) | 22 (73.3) | 18 (94.7) |
| Interview | 22 (36.1) | 16 (53.3) | 3 (15.8) |
| Feedback log | 0 (0) | 1 (3.3) | 0 (0) |