High-Fidelity Simulation to Assess Task Load Index and Performance: A Prospective Observational Study

Objective: The NASA Task Load Index is a questionnaire widely used in aviation. This index might help for attesting the quality of a scenario in high-fidelity simulation in healthcare. The main purpose of this study was to observe whether NASA Task Load Index for critical care-simulated scenarios, designed for residents, was consistent with the literature. The second purpose was to describe relationships between NASA Task Load Index, performance and generated stress during high-fidelity simulation. Methods: All residents in anaesthesia and intensive care undergoing high-fidelity simulation were included. The primary endpoint was the task load generated by each scenario assessed by NASA Task Load Index. Based on the literature, the NASA Task Load Index scores between 39 and 61 were considered an acceptable level. Stress level (Visual Analogue Scale) and specific technical and non-technical skills performances (Team Emergency Assessment Measure) were also assessed. Results: Totally 53 residents actively participated in 1 of 10 different scenarios, between June and December 2017. The median NASA Task Load Index score of scenarios was 61 [48-65]. Five scenarios generated acceptable task load levels. There was no association between the NASA Task Load Index score and technical or non-technical skills performance scores, but an association between NASA Task Load Index and the stress level (rho = 4.7, P  = .001) was observed. Conclusion: Simulation scenarios generate different task loads for residents; the task load was deemed acceptable for half of the scenarios. The NASA Task Load Index could be considered as a tool to assess the pedagogic adequacy of scenarios. Scenario and generated stress level, but not task load, can modify residents’ performance during simulation. This should be considered when planning normative simulation.


Introduction
High-fidelity simulation (HFS) is used as an effective teaching method to enhance the acquisition of required competencies in anaesthesiology and intensive care. 1,2 By taking part in HFS scenarios, residents are actively involved in relevant critical situations. 3 High-fidelity simulation increases stress, mobilizing a level of mental and physical resources which may be expressed as the task load. 4,5 The relationship between stress and efficient memorization was reported to look like an inverted U shape function. 6 Too little stress would not help optimal learning, whereas too much stress would prevent the student from memorizing relevant information. 7 Therefore, the scenario should be assessed for its potential task load generation and associated increase in stress level. Similarly, one may hypothesize that the optimal task load would be associated with higher performance, and the performance may be reduced in situations when the task load is too low or too high. [8][9][10] Different behavioural and subjective metrics are used in operational environments (aviation, nuclear power plants, and medicine) to assess the task load of operators. [8][9][10][11] The NASA Task Load Index (NASA-TLX), initially developed for flight training, is the most commonly applied tool to assess procedural workload in healthcare and has been specifically evaluated in the surgical field. 12 This tool evaluates perceived task load immediately after the performed task through mental, emotional, and physical dimensions. We hypothesized that the task load in anaesthesia and intensive care residents generated by the scenarios was consistent with the expected normal values reported in the literature. The main objective of this study was to explore the task load (with the NASA-TLX) of each scenario during HFS for anaesthesia and intensive care residents. Secondary objectives were to describe the relationships between NASA-TLX, scenario performance, and associated stress level.

Design
This observational prospective cohort study was conducted at the university medical simulation centre of Lyon (Lyon University, France). The study obtained approval from the Hospices Civil de Lyon institutional ethics committee (June 27, 2017) and has been pre-registered on clinicaltrial.gov (protocol ID: NCT03175484). Informed written consent was obtained from all enrolled participants. This research has been carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki).

Population and Simulation Setting
This study involved all residents in anaesthesia and intensive care undergoing HFS sessions between June and December 2017. No exclusion criterion was applied. The scenarios simulated crisis situations occurring in the operating room, in the intensive care unit, or during intra-hospital patient transport. Scenarios were developed by instructors based on national guidelines. High-fidelity simulation sessions followed the standard repetitive sequences of briefing, scenario, and debriefing.

Experimental Protocol
The demographic data of participants at the beginning of the HFS session were collected (age, gender, previous HFS participation, and post-graduate year). We used the NASA-TLX questionnaire to assess the task load of each participant during each HFS scenario. NASA-Task Load Index score (0 point: no task load and 100 points: maximal task load) included 6 dimensions (mental demand, physical demand, temporal demand, performance, effort, and frustration). A detailed description and the questionnaires used are reported in Appendix A. As suggested in the literature, the NASA-TLX score was considered consistent with those values if situated between 25th and 75th percentile (39 < NASA-TLX < 61). 13 We considered those NASA-TLX values as "acceptable." We measured the individual quantitative stress level using a Visual Analogue Scale for stress translated on a 100mm numeric scale for stress (0: no stress and 100: maximal stress) immediately after each scenario. We also measured both technical performance, using specific technical skills scoring grid (previously described, 14 0: low performance and 100: maximal performance) and non-technical skills performance by the Team Emergency Assessment Measure scale (0: low performance and 44: maximal performance). 15,16 Two investigators (C.B., M.L.) independently evaluated the resident performance using video recording. The timeline of the study is presented in Figure 1.

Endpoints
The primary endpoint was the assessment of individual NASA-TLX score for each scenario. Secondary endpoints were the stress level at the end of the scenario measured by the Visual Analogue Scale for stress and the technical and the Team Emergency Assessment Measure performance score.

Statistical Analysis
Continuous variables were described using median (25th-75th percentile) and were compared using the Wilcoxon or the Student's t test as appropriate. The Kruskal-Wallis test was used for more than 2 group comparisons. A multivariate linear regression analysis was performed to explore the interaction between NASA-TLX, scenario, stress, and technical performance. The correlation between technical performance and non-technical performance skills scores was evaluated using Spearman's (rho) correlation index. All tests were 2-tailed, and P < .05 was considered statistically significant. Statistical analysis was performed on a per-protocol basis using MedCalc software version 9.6.4.0 (MedCalc, Mariakerke, Belgium).
The technical skills performance was correlated with the non-technical skills performance (rho = 0.6; P < .001). In a bivariate fit analysis, NASA-TLX score was not associated with technical skills performance and Team Emergency Assessment Measure scores but was associated with stress level (4.7 points of increment for 1 point increment of NASA-TLX score, P = .001, Figure 2).
An elevated Visual Analogue Scale for stress predicted decreased technical skills performance (−15 points decrement for each 10 points Visual Analogue Scale for stress increment, P = .03). No significant association between Visual Analogue Scale for stress and the Team Emergency Assessment Measure score was observed.
In multivariate analysis, the Visual Analogue Scale for stress incurred by HFS (F ratio = 4.16, P = .048) and the type of scenario (F ratio = 17.6, P < .0001) predicted technical skills performance. For the Team Emergency Assessment Measure performance, only the type of scenario (F ratio = 7.99, P < .0001) was predictive (Appendix B).

Discussion
The task load level was assessed with the NASA-TLX score in 53 anaesthesia and intensive care residents undergoing HFS training. The median NASA-TLX score of the 10 scenarios was 61 [48 -65]. Among the scenarios tested, one half generated an adapted task load and another half generated high task load level. Such scenarios are of great educational value but should be used with caution and may be suited to more experienced residents.
To the best of our knowledge, no study reporting a relationship between NASA-TLX and HFS performance scores in healthcare has been published. Different scales are available to measure task load. The subjective workload assessment technique uses 3 levels (low, medium, and high) for each of 3 dimensions of time load, mental effort load, and psychological stress. Overall workload allows the subjects to rate on a unidimensional scale from 0 to 100 points. Studies that compared all of these validated scales demonstrated the reliability and validity of the NASA-TLX in comparison to other workload measures, 17 probably due to the 6 subscales which allow more precision in the task load assessment. Hart et al 18 showed that NASA-TLX was the most used subjective scale to assess task load. Previous validation in the medical field has been published providing information to help interpret the obtained scores for each scenario. 13 However, to the best of our knowledge, no study validated objective methods to assess task load in the medical field.
The use of task load assessment by NASA-TLX was first described in aviation. It was demonstrated that reducing the task load significantly improved performance in aircraft. 19 In healthcare, an increasing number of studies focus on task load measurement since overload has been identified as a significant cause of errors. 20,21 Increased response time during simulated crisis situations has also been reported with task overload. 22,23 Residents might experience high task load during scenarios and this may harm the process of learning. By contrast, scenarios with too low task load might result in poor involvement. Task load analysis becomes important when the cognitive load theory is considered. This theory assumes that working memory capacity is limited. In some complex learning cases, reducing task load will help increase working memory capacity. 24 In the present study, no extreme NASA-TLX scores (>77 points) were observed, a level reported as clear overload in the literature. 13 An association between NASA-TLX and stress level was observed, while no association was observed with the technical or the non-technical performance. Those results suggest that NASA-TLX could be used as a marker of scenarios' educational quality but not as a single tool to assess residents' performance during HFS. Similar results were found in a surgical simulation study exploring NASA-TLX during laparoscopy, 25 which might be explained by the fact that performance and task load are 2 separate dimensions of the scenario. High-fidelity simulation scenario might be perceived as a challenge with a need to mobilize new working resources, such as providing leadership during critical situations. These tasks will affect the task load without systematically affecting performance skills. The performance scores obtained during HFS result from multifactorial and complex factors that affect the participants before and during the scenario. The performance during the scenario is not a pedagogical objective of the HFS but rather a tool to achieve better further performance in a real setting (helped with the information provided during the debriefing). Thus, we might assume that performance during HFS is not, in this study, a valuable independent marker for the scenario's educational quality. Moreover, the perceived performance is only one component of the 6 scales of NASA-TLX questionnaire and this might explain in parts why performance was not associated with NASA-TLX. The performance NASA-TLX subscale reflects the subjective self-evaluation of residents that could be influenced by several psychometric or emotional factors (self-efficacy feeling, fear of negative evaluation, social anxiety, and reaction of participants to the announcement of the end of the scenario). 5,[26][27][28] This auto-appreciation does not fit with the performance scale relying on specific objectives rated by investigators.
The association between task load and stress was previously described in surgery residents. NASA-Task Load Index was positively correlated with objective stress levels measured by sympathetic activity (heart rate and blood pressure). 29 These changes may be explained by specific NASA-TLX subscales as frustration and temporal demand, which are the feelings of being overpassed that could increase stress levels.
This study has several limitations. The number of residents was 53 and could have precluded the inference of NASA-TLX and performance. Ten scenarios for 3 different postgraduate years were included offering variability that might have influenced the results. Moreover, post-graduate years of residency might influence the stress levels during HFS in complex and multifactorial ways explaining the unpredictability of that relationship. 7 Then, as Hart et al 18 reported previously, we noticed that the main limitation of the NASA-TLX is the interpretation of the score. Thresholds used in this study were given by the analysis of the vast amounts of data published in the medical field. 13 This analysis, mainly in simulated endoscopic surgery and emergency room situations, did not match the setting for the participants (alone or in a team), and the performance did not impact any certification. All of these differences could have influenced the task load and it is very difficult to establish a universal and reliable threshold in the medical field. However, identifying the relative task load for each scenario might help instructors to identify the specific interest of the scenario in regard to the task load provided. Further studies are needed to assess the potential of error productions associated with perceived scenario-specific task load and to further evaluate the hypothetical difficulty of each scenario. There is a need for precision on the optimal NASA-TLX threshold to define locally what is an acceptable high or low workload, in order to use this tool effectively to enhance the pedagogical values of HFS. NASA-Task Load Index might be used as a tool to identify scenarios with outliers or marginal scores in order to select, upgrade, or adapt scenarios to the pedagogical objectives of HFS. The performance has been explored with specific technical skills evaluation grid and one non-technical skill evaluation grid.
Although several studies have been reported to observe the difference in performance score (technical and non-technical skills), still some performance may not be covered by these performance grids. 30,31 Therefore, one might suspect that the NASA-TLX explored some other information that could impact performance in a way that is not observed by a singular performance grid. Although no association between performance and NASA-TLX was observed here, a deeper exploration of task load level impact on performance should be further explored to confirm the lack of clear association. 32

Conclusion
To conclude, simulation scenarios generate different task loads in residents and NASA-TLX could be considered as an additional tool to help instructors to assess the pedagogic adequacy of HFS scenarios to learners. Scenario and generated stress level, but not task load, can modify residents' performance during simulation. This should be considered when planning normative simulation. anaesthesiology and intensive care residents from Lyon 1 University for their participation.

Declaration of Interests:
The authors declare that they have no competing interests.
Funding: This study received no funding