Research design and organizational background
This cross-sectional, quasi-experimental study examined a VR and MR training sequence for paramedic trainees, followed by an online questionnaire on media effect factors, motivation, and the experience of the highly immersive environment.
Data collection
Data were collected from the two educational Emergency Medical Service (EMS) project partners in the ViTAWiN project during a VR or MR simulation. The two educational partners provide vocational EMS training in the German states of Lower Saxony, Bremen, and Hesse. The study was conducted at two separate sites in Lower Saxony (11.10.22) and Hesse (22.02.22). Eight months elapsed between the two study dates because the trainees undergo dual training and attend the "school" learning site (1,960 hrs. in total) only in blocks, alternating with the hospital (720 hrs.) and the rescue station (1,960 hrs.). In addition, problems with the robustness of the manikin tracking had to be resolved between the two dates. These stemmed mainly from the extremely low tolerance between the actual position of the manikin in the room and its virtualized position: even a small tracking offset (< 1 cm) noticeably distracts the participants.
As additional resource – “Roles and responsibilities at the study site”: A study supervisor controlled and documented the complete situation on site, enabling the simulation technicians and educational professionals to concentrate on their roles while documentation and quality assurance were ensured throughout. The supervisor also instructed the participants and subsequently obtained the consent forms. The simulation technicians were primarily tasked with expertly setting up the XR technology and ensuring constant operational readiness of the system. The pedagogical professionals instructed the participants in the EMS scenario ("mission report") and conducted a short debriefing. They were also the point of contact for any questions from the participants during the simulation.
Technical setup & hygiene measures
The simulation can be displayed as a two-dimensional desktop variant, but it is primarily intended as a so-called highly immersive learning environment using head-mounted displays (HMD), operating elements (controllers), and position-tracking technology for capturing and processing the movements of the participants in the VLE. Although a simulation in a sitting position would have been technically possible, all simulations were performed with free movement in the room. During the entire simulation, the participants wore hairnets to improve the hygienic conditions of the HMD, which is partly fixed with fabric straps. In addition, disposable gloves and FFP2 masks (comparable to N95 and KN95) were used due to the COVID-19 pandemic.
As additional resource – “Technical details”: A free play area of 3 × 4 meters proved sufficient but could be adapted to local conditions. In this study, we used the Valve Index® VR set (Valve Corporation), which offers a resolution of 1,440 × 1,600 pixels per eye and a field of view of up to 130 degrees. The positions of the HMD and the controllers are detected by laser sensors in so-called lighthouses. This position information is transmitted to and processed by a computer, which mirrors the updated VLE back to the HMD. In the classroom, the VLE can additionally be displayed on a screen from different perspectives (first person, third person, or a stationary angle) so that other learners can observe the virtualized patient care. To project the real manikin into the VLE, predefined optical markers were attached to the manikin, allowing its position in the play area to be calculated automatically from pre-simulation photographs. During the simulation, the position of the markers was captured by the built-in cameras of the "VR glasses".
In the environment, participants only needed to move around the virtualized patient with a few steps. As locomotion techniques, participants could use teleportation and joystick movement in single-player mode. When multiple participants shared the same play area, these techniques were intentionally disabled to avoid potentially serious injuries. Jumping, running up and down stairs, overcoming obstacles, or complex movements of the entire body were not necessary.
Experimental manipulation
In the comparison group (VR), prehospital patient care was completed without a manikin (Fig. 3), while in the experimental group (MR), a patient manikin was integrated as a haptic element (Fig. 4). In the experimental group, the patient manikin was integrated into the virtualized environment by a simulation technician (ST). The ST had extensive knowledge in the field of computer science and was specifically familiar with this process through their own programming and co-design of the interface. In addition, the ST assisted the educational professional in controlling the simulation technology. The same applied to the comparison group, except for the involvement of the manikin, which was not required there. This ensured that the pedagogical professionals could fully concentrate on the simulation and the participants. The experimental design meant that the participants of the experimental group could "feel" the patient on a rudimentary level, enabling examination steps such as palpating the pulse. The comparison group did not have this possibility and could rely only on the optically virtualized patient. A haptic reference was therefore missing, such as the resistance of the patient's body against which the stethoscope chest piece is placed during auscultation of the lungs.
The participants had to assess the scene in terms of safety and had to provide initial assessment and structured care to a burn patient in the MR/VR environment using medical guidelines and regional protocols (Standard Operating Procedures) [14].
Assignment method
Assignment to an experimental condition was made in a non-randomized manner on a class-by-class basis due to the extensive technical setting. Assignment at the individual level was not feasible for reasons of research economy, as it would have required substantially more time and personnel. The increased resource requirements are mainly explained by the technical setup and the room arrangement, and especially by the still very complex integration and calibration of the patient manikin tracking.
Inclusion and exclusion criteria
The participants were verbally interviewed about their health status on the day of the simulation. Only participants who did not report being acutely ill, injured, or in any way acutely impaired were included in the evaluation. All participants were alert, showed no conspicuous problems with coordination or vision, and were sure-footed. They were given verbal and written information about the study and had the option of not participating, dropping out, or withdrawing consent.
Participants’ characteristics
The participants were students of the German three-year paramedic training program. Unfortunately, due to staffing difficulties during the pandemic, we were not able to recruit emergency nurses from our educational partner. Prior to MR/VR exposure, experienced instructors from the educational partners gave the participants the opportunity to familiarize themselves with the hardware and software as well as the elements of the virtual environment. The questionnaire was not filled out in the simulation room, but in a separate classroom. Age and gender of the complete sample and subgroups are shown in Table 5. The participants had little or no prior experience with VR.
Base population: According to federal health reporting [15], 78,000 people were employed in the German rescue service in 2020, 52,000 of them full-time. 53,000 employees were male and 25,000 female; no information is available on non-binary persons. The age groups are shown in Table 1.
Table 1
Age groups of the population

| Under 30 years | 30–40 years | 40–50 years | 50–60 years | 60 years and older |
| --- | --- | --- | --- | --- |
| 25,000 | 20,000 | 15,000 | 12,000 | 5,000 |
Instrumentation
Immediately after the VR/MR exposure, the participants were asked to complete an online questionnaire. This comprised the collection of sociodemographic data as well as the Situational Motivation Scale (SIMS) [16], the System Usability Scale (SUS) [17, 18], and the Igroup Presence Questionnaire (IPQ) [19]. The Simulator Sickness Questionnaire (SSQ) [20] was completed immediately after the evaluation. Since stricter hygiene requirements had to be observed due to the COVID-19 pandemic, approximately three to five minutes elapsed between the end of the VR/MR exposure and the start of the post-test questionnaire. The instruments used are described below:
Based on self-determination theory [21], the SIMS measures the constructs of Intrinsic Motivation, Identified Regulation, External Regulation, and Amotivation. The higher the score, the more pronounced the construct. High scores on Intrinsic Motivation and Identified Regulation are to be evaluated positively, high scores on External Regulation only to a limited extent, and pronounced Amotivation is to be evaluated negatively. Table 2 presents a brief description of the SIMS constructs.
Table 2
Description of the SIMS constructs

| SIMS Construct | Amotivation | External Regulation | Identified Regulation | Intrinsic Motivation |
| --- | --- | --- | --- | --- |
| Motivation | Absence of motivation | Externally determined | Rather autonomous | Intrinsic |
| Motivation control | Not applicable | External control variables, e.g., constraint by curriculum, high reward promises | A high personal value for action outcomes and action consequences is expected | Completely voluntary; goals are integrated into the sense of self |
The instrument was tested with moderate to excellent reliability (Cronbach’s alpha: intrinsic motivation = .95, identified regulation = .85, external regulation = .62, amotivation = .83) and factor validity. “Results of the confirmatory factor analysis showed that the χ2 was significant, χ2(98, n = 907) = 856.50, p < .05, and the NNFI (.89) was somewhat lower than the .90 cut-off value. However, the CFI (.90) was satisfactory. […] All hypothesized factor loadings, covariances, error residuals, and factor residuals were found to be significant (z > 1.96).” [16]
The IPQ addresses questions representing Involvement, Spatial Presence, Realism, and a General Factor with responses using seven-point rating scales (0–6), with high scores being positive for each construct. The IPQ was tested with good reliability (Cronbach’s alpha > .7 for all subscales) and factor validity. The subscale items load on their respective subscales fairly highly, mostly above 0.6; for details, refer to Schubert et al. [19].
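A common convention for questionnaires of this kind is to score each subscale as the mean of its items; the sketch below illustrates this in Python. The item identifiers and their assignment to subscales are placeholders for illustration, not the published IPQ item key.

```python
# IPQ-style scoring sketch: items are rated 0-6; a subscale score is
# taken here as the mean of its items. The item-to-subscale mapping
# below is a placeholder, not the official IPQ key.
def subscale_means(responses, key):
    """responses: {item_id: rating 0-6}; key: {subscale: [item_ids]}"""
    return {scale: sum(responses[i] for i in items) / len(items)
            for scale, items in key.items()}

key = {"spatial_presence": ["SP1", "SP2"],
       "involvement": ["INV1", "INV2"],
       "realism": ["REAL1", "REAL2"],
       "general": ["G1"]}
responses = {"SP1": 5, "SP2": 4, "INV1": 3, "INV2": 4,
             "REAL1": 2, "REAL2": 3, "G1": 5}
print(subscale_means(responses, key))
# e.g. spatial_presence 4.5, involvement 3.5, realism 2.5, general 5.0
```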
There are objective and subjective methods for assessing the severity of simulator sickness, the most popular being self-report using Kennedy’s SSQ [22]. The SSQ includes three constructs – Disorientation, Nausea, and Oculomotor – which are answered using a four-point rating scale. The participants were asked about any symptoms and, if applicable, their severity. Each construct, as well as the total score, has its own weighting factor by which the summed item scores are multiplied. The use of the SSQ in the VR domain is critically debated because it was developed for flight simulators rather than the latest generation of HMD-VR [23, 24]. Nevertheless, because the SSQ has been used in numerous studies to report VR-related adverse events [22, 25], it was used here to allow comparability. Weighted scores (see Table 3) < 5 are considered "negligible", 5–10 "minimal", 10–15 "significant", and 15–20 "of concern"; a total score above 20 is considered "bad" [24]. Recent work also shows psychophysiological changes, associated with specific EEG abnormalities, that correlate with the SSQ [26]. To date, a variety of theories and assumptions exist that try to explain simulator sickness in VR [11]. The symptoms can last from minutes to hours and increase with the duration of VR exposure, although it is not currently possible to derive a general pattern [22]. Studies in VR settings have shown that the probability of symptoms is heterogeneous: for example, Chen et al. [27] reported a frequency of 30%, while Kim et al. [26] reported a frequency above 80%.
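The SSQ weighting can be sketched as follows. The weighting factors are the conventional ones from the original Kennedy publication (not stated in the text above); the item groupings in the example are illustrative, not the full 16-item mapping.

```python
# SSQ scoring sketch: symptom ratings (0-3) are summed per construct,
# then multiplied by the conventional Kennedy weights. The weights are
# the standard published factors; the item lists passed in are
# illustrative placeholders, not the complete SSQ item key.
NAUSEA_W, OCULO_W, DISOR_W, TOTAL_W = 9.54, 7.58, 13.92, 3.74

def ssq_scores(nausea_items, oculomotor_items, disorientation_items):
    n_raw = sum(nausea_items)
    o_raw = sum(oculomotor_items)
    d_raw = sum(disorientation_items)
    return {
        "nausea": n_raw * NAUSEA_W,
        "oculomotor": o_raw * OCULO_W,
        "disorientation": d_raw * DISOR_W,
        # The total score weights the sum of all three raw scores.
        "total": (n_raw + o_raw + d_raw) * TOTAL_W,
    }

# Example: one mild nausea symptom and one mild oculomotor symptom
print(ssq_scores([1, 0, 0], [0, 1, 0], [0, 0, 0]))
```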
The System Usability Scale (SUS) is a simple, technology-independent questionnaire for evaluating the usability of a system. It comprises ten statements, which are answered on a rating scale and converted to an overall score between 0 and 100. Rule of thumb: “[…] products that scored in the 90s were exceptional, products that scored in the 80s were good, and products that scored in the 70s were acceptable. Anything below a 70 had usability issues that were cause for concern.” [28] The SUS was tested with excellent reliability (Cronbach’s alpha = .91) and acceptable factorial validity (for eigenvalues and factor loadings, see Bangor et al. [28]).
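The standard SUS conversion (from Brooke's original formulation, not detailed in the text above) can be sketched in Python: odd-numbered, positively worded items contribute (response − 1), even-numbered, negatively worded items contribute (5 − response), and the sum is scaled by 2.5.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten 1-5 Likert responses,
    given in item order (item 1 first)."""
    assert len(responses) == 10
    # i = 0 corresponds to item 1 (odd-numbered, positively worded)
    contrib = [(r - 1) if i % 2 == 0 else (5 - r)
               for i, r in enumerate(responses)]
    return sum(contrib) * 2.5  # raw 0-40 range scaled to 0-100

# Best possible answers: 5 on odd items, 1 on even items
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```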
Table 3: Constructs, scale, and sample items
Measures and covariates, power and precision
All statistical operations were performed using RStudio (version 2022.02.1 + 461). First, the data was prepared, then a pattern analysis of missing values and a descriptive-statistical evaluation were performed.
The subgroups were compared using Welch’s t test (α = .05, two-sided) after testing the dependent variable for extreme deviation from the normal distribution (R package “stats”). Correlations were tested using the Pearson correlation (R package “rstatix”). The p values were adjusted for multiple testing according to Holm. The sample sizes of the two subgroups (n = 16 and n = 20) result in a beta error probability of 0.43 for the t test for two independent means (effect size d = 0.5) and 0.49 for the Pearson correlation coefficient (effect size q = 0.6) for independent samples (G*Power, version 3.1). Considering that the sizes of the two subgroups are not extremely unbalanced, it can be assumed that Welch’s t tests and the Pearson correlation are robust for this sample [29]. The three genders were compared using the Tukey-Kramer test (α = .05) with adjustments for multiple comparisons (“stats” package). This test uses the harmonic mean of the sample sizes and can therefore be used when the sample sizes differ [30]. To test interrelatedness, Cronbach’s alpha and McDonald’s omega were computed with the R package “psych” (Table 4).
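The analysis was run in R; purely as an illustration, the two central steps (Welch's t test and the Holm adjustment) can be sketched in Python with SciPy, using invented data matching the subgroup sizes. The group means and the companion p values are assumptions for the example, not study results.

```python
import random
from scipy import stats

random.seed(0)
# Invented illustration data with the reported subgroup sizes
vr = [random.gauss(4.0, 1.0) for _ in range(16)]  # comparison group (VR)
mr = [random.gauss(4.5, 1.0) for _ in range(20)]  # experimental group (MR)

# Welch's t test: equal_var=False drops the equal-variance assumption,
# mirroring the default behaviour of R's t.test()
t, p = stats.ttest_ind(mr, vr, equal_var=False)

def holm_adjust(pvals):
    """Holm step-down adjustment: multiply the k-th smallest p value by
    (m - k + 1), enforce monotonicity, and cap at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Adjust a hypothetical family of three p values including the test above
print(round(p, 3), holm_adjust([p, 0.020, 0.300]))
```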
Table 4: Reliability measurements
The raw data is publicly available at the FORDATIS research data repository [31].