A comparison between the HE and CW evaluation methods was performed on the nursing module of Shafa HIS (developed by Tirajeh Rayaneh Co.). The system is installed in more than 200 hospitals across the country. Its capabilities include admitting patients and allocating beds and rooms in the inpatient ward, registering requests for paraclinical services and monitoring the results, registering surgery requests, registering requests to transfer a patient to another ward, and registering the patient's information at discharge. This system is fully integrated into the daily routines of the nurses and secretaries of inpatient wards. The nursing module was selected for this comparative evaluation since it encompasses the largest group of users and is the most critical clinical module in any HIS.
In this study, the usability evaluation of the nursing module of Shafa HIS was carried out in the Laboratory of Health Information Technology at Kashan University of Medical Sciences using the HE and CW evaluation methods. The results of these expert-based evaluation methods were then compared.
The HE method was first introduced by Nielsen and Molich. In the Nielsen approach, a summarized list of heuristic principles is provided to the evaluators as a guideline, and each evaluator independently examines the user interface and identifies usability problems. Nielsen developed ten heuristic principles that must be observed in user interface design: visibility of system status; match between system and the real world; user control and freedom; consistency and standards; help users recognize, diagnose, and recover from errors; error prevention; recognition rather than recall; flexibility and efficiency of use; aesthetic and minimalist design; and help and documentation. According to this method, evaluators should not communicate with one another before the evaluation is completed. A single evaluator may miss a large number of problems, whereas different evaluators working independently can recognize a wide range of unique problems. Therefore, more comprehensive results can be obtained by combining the findings of several HE evaluations. If time and resources are limited, this method can rapidly and economically identify usability problems with the participation of three to five evaluators [21, 34, 35].
Cognitive Walkthrough Evaluation
The CW is one of the most common expert-based methods and emphasizes the ease of learning the system [22, 36]. This method is especially suitable where users have to master a new application or function by learning through exploration. To carry out a CW, the following are determined: a precise description of the system's user interface design, the task scenarios, clear assumptions about the users and the scope of use, and the series of actions that users take to successfully accomplish a given task. An evaluator or group of evaluators then simulates the cognitive processes that users follow while performing the series of actions needed to accomplish specific tasks. During the CW, the evaluators try to identify actions that seem difficult for ordinary users by understanding the interface behavior and its effects on the user. Therefore, this evaluation method can be applied in the early phases of system development to meet user needs.
Since three to five evaluators are adequate to perform HE and CW evaluations, the maximum number (five evaluators) was selected in a random and purposeful fashion to participate in this study. Three evaluators were PhD students in health information management and two evaluators had an M.Sc. in health information technology. Furthermore, the evaluators had experience in the field of HE and CW evaluations and were familiar with various healthcare information systems.
In this study, the CW evaluation was performed first in order to prevent the adverse effect of system learning on the CW evaluation results. For the HE method, Nielsen's usability principles were provided to the evaluators, who were asked to evaluate the user interface, in accordance with the checklist, within the same range of tasks evaluated in the CW method. Therefore, performing the CW evaluation did not influence the HE results. In addition, in the study by Khajouei et al., which was conducted in two rounds to prevent the possible effect of system learning on the results obtained from the HE and CW evaluation methods, no significant difference was found between the number of problems recognized in the first and second rounds.
A method suggested by Polson and Lewis was used to perform the CW evaluation. Five scenarios were identified according to the daily routine tasks performed by nurses and secretaries of inpatient wards using the nursing module, based on nurses' opinions and the approval of the head nurse. For each scenario, the users’ goals and sub-goals, the series of actions for each task, and the system responses were defined. Table 1 shows an example of a scenario and its tasks and actions. Then, the evaluators independently carried out the series of actions for each task and, assuming the role of real users, reported any potential problems to the researcher. The comments, questions, and ambiguities raised by the evaluators, as well as each problem and its location, were recorded by the researcher, who also acted as the observer. At the end of the evaluation process, the evaluators reviewed their problem lists and, if necessary, either added a comment or corrected a previously given comment. Then, in a meeting with the researcher and the evaluators, all lists were compared, duplicate problems were eliminated, and a list of unique usability problems was prepared. At the end of the meeting, this list was provided to the evaluators, who independently determined the severity of each problem on a scale of 0 to 4 according to the frequency of the problem, its impact on users, and its persistence [13, 37]. Problem severity was graded as follows:
0 = No problem: I don’t agree that this is a usability problem at all.
1 = Cosmetic: No need for correction unless more time is available for the project.
2 = Minor: Correcting this problem is a low priority.
3 = Major: Important to correct; therefore, it is a high priority.
4 = Catastrophe: Correcting this problem is necessary before the product can be released [13, 37, 38].
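The consolidation step described above (comparing the evaluators' lists and removing duplicates) can be illustrated with a minimal sketch. In the study, duplicates were identified by discussion in a meeting; matching on normalized text, the `Problem` class, the `merge_problem_lists` helper, and the example data are all assumptions made here for illustration only.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """The 0 to 4 severity scale listed above."""
    NO_PROBLEM = 0
    COSMETIC = 1
    MINOR = 2
    MAJOR = 3
    CATASTROPHE = 4


@dataclass
class Problem:
    location: str
    description: str
    found_by: set = field(default_factory=set)  # evaluators who reported it


def merge_problem_lists(lists_by_evaluator):
    """Merge per-evaluator lists into one entry per (location, description).

    Assumption of this sketch: two reports are duplicates when their
    location and description match after trimming and lower-casing.
    """
    merged = {}
    for evaluator, problems in lists_by_evaluator.items():
        for location, description in problems:
            key = (location.strip().lower(), description.strip().lower())
            entry = merged.setdefault(
                key, Problem(location.strip(), description.strip())
            )
            entry.found_by.add(evaluator)
    return list(merged.values())


# Hypothetical example data, not taken from the study.
lists_by_evaluator = {
    "E1": [("Surgery request window", "Save button label is unclear")],
    "E2": [
        ("surgery request window", "save button label is unclear "),
        ("Ward window", "Patient icons are hard to tell apart"),
    ],
}
unique_problems = merge_problem_lists(lists_by_evaluator)
```

Each entry in `unique_problems` would then be rated independently by every evaluator using a `Severity` value.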
The recognized usability problems were categorized according to the ISO and Nielsen usability attributes [12, 14]. The usability attributes of both ISO and Nielsen are shown in Table 2 [14, 29]. The evaluators then independently assigned each recognized problem to one of the usability attributes.
Table 1. An example of a scenario and the related tasks and actions.
Scenario: Recording a surgery request for femoral fracture reduction for a patient in the orthopedic ward
Task: Determining a patient
Action: Click on inpatient ward name (orthopedic ward).
System response: Icons and patients' names are displayed.
Action: Right-click on the particular patient’s icon.
System response: The drop-down menu is displayed next to the patient’s icon.
Task: Entering a request for a surgical procedure
Action: Choose “send to operating room waiting list” from the drop-down menu.
System response: A list of available operating rooms is displayed.
Action: Click on “general operating room”.
System response: The general operating room window is displayed.
Action: Patient’s surgery information is added.
Task: Submitting a surgery request to the operating room
Action: Click on the Save button.
System response: A message titled “Information was saved successfully” is displayed.
Action: Click on the Confirm button.
System response: “Patients in the surgery waiting list” is displayed.
Action: Click on the Return button.
System response: Inpatient ward window is displayed.
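A scenario of this kind (a goal, its tasks, and each action paired with the expected system response) can be represented as a small data structure. The following is a minimal sketch that encodes only the first task of the example scenario; the class names and the `walkthrough_prompts` helper are hypothetical and are not part of the study's materials.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Step:
    action: str
    system_response: Optional[str] = None  # None when no response is listed


@dataclass
class Task:
    name: str
    steps: List[Step]


@dataclass
class Scenario:
    goal: str
    tasks: List[Task]


# Only the first task of the example scenario is encoded here.
scenario = Scenario(
    goal=(
        "Recording a surgery request for femoral fracture reduction "
        "for a patient in the orthopedic ward"
    ),
    tasks=[
        Task(
            name="Determining a patient",
            steps=[
                Step(
                    "Click on inpatient ward name (orthopedic ward).",
                    "Icons and patients' names are displayed.",
                ),
                Step(
                    "Right-click on the particular patient's icon.",
                    "The drop-down menu is displayed next to the patient's icon.",
                ),
            ],
        ),
    ],
)


def walkthrough_prompts(s: Scenario) -> Iterator[str]:
    """Yield one line per action, in the order an evaluator would perform them."""
    for task in s.tasks:
        for number, step in enumerate(task.steps, start=1):
            yield f"{task.name}, step {number}: {step.action}"
```

An observer could iterate over `walkthrough_prompts(scenario)` and record any comment or problem raised at each step together with its location.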
Table 2. The usability attributes according to ISO and Nielsen [14, 29]
Effectiveness: How well do the users reach the goals set with the usage of the system?
Efficiency: How much of each resource (e.g. time and mental effort) is required so that the goals can be obtained by users?
Satisfaction: How pleasant is the use of the system for users?
Learnability: How easy is it for users to do basic tasks when using the system for the first time?
Memorability: When the system has not been in use for a while, how easily can users remember how to use it?
Errors: When using the system, how many errors are made, how severe are these errors, and how easily can users recover from them?
Two weeks after the CW evaluation was completed, the five evaluators were asked to independently evaluate the user interface of the nursing module using Nielsen’s usability principles checklist, which was based on the Xerox heuristics checklist. This checklist was organized according to Nielsen’s ten usability principles and included 254 multiple-choice items with “yes”, “no”, and “not applicable” as the available answers. The validity and reliability of the checklist were previously confirmed in a study by Rezaei-Hachesu et al. The recognized problems were then listed in the problem report form, which consisted of a four-column table with “problem title”, “problem description”, “location of the problem”, and “violated usability principle” as the column headings.
Subsequently, the recognized problems were examined in a meeting with the five evaluators, where duplicate problems were discarded and a comprehensive list of unique problems was developed. Disagreements about the recognized problems were resolved by discussion in this meeting. Finally, as in the CW evaluation, the evaluators independently determined the severity of each problem on a scale of 0 to 4 and assigned each recognized problem to a specific usability attribute.
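Because each problem receives five independent severity ratings and five attribute assignments, these have to be combined into one value per problem. A minimal sketch, assuming the mean rating is used for severity and the most frequently chosen attribute is used for categorization; the `aggregate_problem` helper, the ratings, and the attribute labels are illustrative assumptions, not data from the study.

```python
from collections import Counter
from statistics import mean


def aggregate_problem(severities, attributes):
    """Return (mean severity, most frequent attribute).

    The attribute is None when two or more attributes are tied for the
    highest count; such ties are left for the evaluators to resolve.
    """
    average = mean(severities)
    counts = Counter(attributes).most_common()
    top_count = counts[0][1]
    tied = [name for name, n in counts if n == top_count]
    attribute = tied[0] if len(tied) == 1 else None
    return average, attribute


# Hypothetical ratings from five evaluators for a single problem.
severities = [3, 2, 3, 4, 3]
attributes = ["Efficiency", "Efficiency", "Errors", "Efficiency", "Learnability"]
avg_severity, attribute = aggregate_problem(severities, attributes)
```

Returning `None` on a tie keeps the decision with the evaluators instead of picking an attribute arbitrarily.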
The data were analyzed with SPSS Statistics for Windows, version 20.0 (SPSS Inc., Chicago, Ill., USA) using both descriptive and inferential statistical techniques. The average severity of each usability problem was calculated. Then, each usability problem was assigned to one of the five categories shown in Table 3, according to its average severity [13, 41]. Furthermore, each problem was associated with the usability attribute to which it had been assigned most frequently. When a problem had the same frequency in more than one attribute, it was associated with the most relevant usability attribute according to the evaluators' opinion. The Chi-squared test was used to compare the number of usability problems recognized by the two methods and the coverage of the usability attributes within each method. Moreover, the Chi-squared test was used to compare the ratio of usability attributes between the two evaluation methods. The average severity of the problems recognized by the two methods was compared using the Mann-Whitney U test. A significance level of 0.05 was considered in this study.
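The study's analysis was run in SPSS; purely as an illustration of the two tests named above, the following sketch implements them with the Python standard library. It assumes a goodness-of-fit form of the Chi-squared test (two counts against equal expected counts, one degree of freedom) and a two-sided Mann-Whitney U test using the normal approximation with tie correction and no continuity correction. The function names are hypothetical, and the exact test variants used in the study are not stated in the text.

```python
import math


def chi2_equal_counts(count_a, count_b):
    """Chi-squared goodness-of-fit test of two counts against equal expectation."""
    expected = (count_a + count_b) / 2
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For one degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(
        (value, group) for group, sample in ((0, x), (1, y)) for value in sample
    )
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every value is identical, so there is no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value
```

For example, `chi2_equal_counts(30, 10)` compares two hypothetical problem counts, and `mann_whitney_u(he_severities, cw_severities)` would compare two hypothetical lists of average severities; a p-value below 0.05 would be read as significant under the threshold stated above. For small samples, an exact test would be preferable to the normal approximation.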
Table 3. Problem categories according to their average severity [13, 41]