Quantitative Influence and Performance Analysis of VR Laparoscopic Surgical Training System

Background: VR-based surgical training is becoming a trend in clinical education. Many studies have validated the effectiveness of VR-based surgical simulators for training surgeons. However, most existing studies rely on subjective methods to assess improvements in residents' surgical skills, and few investigate how to substantially improve surgical skills on specific dimensions.
Methods: In this paper, we use physiological approaches to objectively and quantitatively analyze the influence and performance of a VR laparoscopic surgical training system for medical students. Forty-one participants were recruited from a pool of medical students. They completed four tasks in a training box as a pre-test and again as a post-test. Between the pre-test and the post-test, they were trained on VR laparoscopic surgery simulators (VRLS). During the pre-test and post-test, their operation process and physiological data (heart rate and electroencephalogram) were recorded. Their performance was graded by senior surgeons using newly designed hybrid standards for the fundamental tasks and the GOALS standards for the colon resection task. Finally, the participants filled in questionnaires about their cognitive load and flow experience.
Results: The results show that VRLS could substantially improve medical students' performance (p < 0.01), especially in depth perception, and enable the participants to obtain flow experience with a lower cognitive load.
Conclusion: Quantitative physiological analysis shows that the performance of participants is negatively correlated with cognitive load. This might provide a new way of assessing skill acquisition.


Background
As one of the modern minimally invasive procedures, laparoscopic surgery has become popular primarily because of its small wounds and rapid recovery. However, residents generally need a long training period (at least 6 years) to become qualified laparoscopic surgeons [1]. In the narrow operating space, mistakes can easily occur if procedures are not handled properly. Traditional laparoscopic surgical training usually uses a training box with in vitro animal or cadaver organs, which has drawbacks such as high cost, low reusability, and related ethical issues [2].
The advent of VR surgical simulators has changed the way surgeons learn. A simulator can reproduce surgery in its visual, auditory, and tactile aspects. It not only reconstructs the real surgical environment and procedures but can also be reused for a variety of designed training tasks without surgical risk [3]. As a result, training on laparoscopic surgical simulators based on virtual reality (VR) is gradually becoming a standard in Europe. Many studies have validated the effectiveness of VR-based laparoscopic surgery simulators (VRLS) for training surgeons [4,5,6]. To our knowledge, most of them employ subjective methods to study the improvement of medical students' surgical skills through VRLS [7], and few investigate how to substantially improve surgical skills on specific dimensions, for example from physiological and psychological perspectives.
In this paper, we use physiological approaches to quantitatively measure the influence of VRLS on medical students in terms of cognitive load and flow experience.
Cognitive load theory (CLT) builds on established models of memory, comprising sensory, working, and long-term memory subsystems. John Sweller first put forward a systematic study in 1988 and established a theoretical hypothesis [8]. He held that cognitive load refers to the total amount of mental activity imposed on an individual's cognitive system during a specific period of operation. CLT is an important learning theory that is receiving increasing attention in medical education. Medicine is a complex knowledge field. Medical workers need to integrate a variety of knowledge, skills and behaviors at a specific time and place, and make quick responses and decisions, which makes them prone to excessive cognitive load or even overload. For novice physicians, complex tasks such as surgery can lead to excessive cognitive load (CL), which can have a negative impact on learning [9].
Rasiah assessed the relationship between cognitive load and dexterity parameters and found that reduced cognitive load significantly affected learning outcomes [10].
Flow experience is the state of mind in which a person is fully engaged in an activity and reaches an extreme level of pleasure [11]. It was first proposed by Csikszentmihalyi in the 1960s. He found that when people push their physical and mental states to the limit, they often achieve an optimal experience [11]. In simulated laparoscopic surgery, flow experience is expressed as "working all-out".
Studies have shown that improving flow experience during laparoscopic surgery can improve the effectiveness of the operation, thereby improving patient safety [12].
To our knowledge, no research has directly and quantitatively explored the relationship between flow and total cognitive load. However, some studies have shown that learners with good academic performance have higher flow experience, lower extraneous (external) cognitive load, and higher germane (related) cognitive load [13,14]. Chang's research has shown that flow experience is related to three different types of cognitive load, confirming that media richness and game interaction can improve learners' flow experience, reduce extraneous cognitive load, and promote germane cognitive load [15].
In the medical field, the methods for measuring the cognitive load of medical workers generally follow established cognitive load measurement techniques [16]. These methods can be classified into three categories: subjective measurement, task performance, and physiological measurement. All three approaches were used in our study. At present, the main cognitive load measurement method is subjective measurement [17].
Some studies have found that combining subjective mental effort indicators and objective behavioral performance indicators into a comprehensive indicator can reveal important information about cognitive load [18,19,20]. At present, the classic scales are the Paas scale [20], the SWAT scale [21] and the NASA-TLX scale [22].
The main advantages of the psychophysiological measurement of cognitive load are its objectivity, its sensitivity to different cognitive processes, its non-interference with the task, and its implicitness and continuity [23]. EEG is a physiological indicator that can be used as an online, continuous cognitive load measure to detect subtle fluctuations in instantaneous load. Changes in the alpha and theta brainwave rhythms reflect what happens in the participant's information processing, even if the participant is unaware of these changes or cannot express them in words [24,25]. As cognitive load increases, α activity is gradually suppressed, and as the difficulty of the task increases, so does the amount of θ activity [26,27,28].
The most commonly used methods to measure flow experience are retrospective questionnaires and interviews [29,30]. The Flow State Scale compiled by Jackson was developed mainly from the nine elements proposed by Csikszentmihalyi [31]. Many scholars have developed new flow scales based on the characteristics and needs of VR learning situations, but a sense of control, immersion, clear goals and feedback remain indispensable dimensions in the measurement of flow experience [32,33,34]. At present, few studies evaluate flow experience in virtual reality based on physiological indicators [30]. Flow experience can be expressed in physical and physiological characteristics, which serve as objective flow indicators [15]. In terms of EEG, some studies indicate a correlation between EEG and flow experience under peak performance conditions, and that inducing flow experience can improve performance in workplaces and sports fields. Frederick [35] proposed that EEG is an objective method of measuring flow that is more accurate than subjective behavioral measures.
With the rapid development of wearable devices, physiological data of different modalities can now be obtained. The most easily acquired physiological signal is heart rate. In addition, we measure cognitive load using EEG. Three hypotheses are designed as follows:
H1: Training on VRLS could improve the performance of medical students in some dimensions.
H2: Training on VRLS could improve the flow experience and lower the cognitive load for medical students in some dimensions.
H3: The performance is positively related to flow experience and negatively related to the cognitive load.
These hypotheses are tested through the following experiments and user studies. Forty-one participants were recruited from a pool of medical students that contains both undergraduates and graduates. They completed four tasks in the training box as a pre-test and again as a post-test. Between the pre-test and the post-test, they were trained on VRLS. During the pre-test and post-test, their operation process and physiological data (heart rate and electroencephalogram) were recorded. Their performance was graded by senior surgeons using newly designed hybrid standards for the fundamental tasks and the Global Operative Assessment of Laparoscopic Skills (GOALS) standards [36] for the colon resection task. Finally, the participants filled in questionnaires about their cognitive load and flow experience.
This work makes two main contributions: (1) using multimodal sensing data (EEG and heart rate), we design a physiological approach to quantitatively measure the influence of VRLS on medical students; (2) our correlation analysis reveals a negative correlation between the skill performance of trainees and their cognitive load. This research can identify the potential benefits of VRLS and its improvement opportunities in laparoscopic procedure training. The remainder of this paper is organized as follows. We introduce our experiments, including participants, platform and procedure, in Sect. 2. The results are documented and analyzed in Sect. 3. Section 4 provides the discussion. Section 5 concludes the paper with future work.

Participants
In this study, we recruited 41 participants between 17 and 27 years old (21.10 ± 2.79). Our experiments were approved by our hospital ethics committee. There were 15 male and 26 female medical students. Two were left-handed and the others were right-handed. Thirty-one participants had never experienced VR, and 10 had played VR or AR games once or twice before this study. Fourteen participants (34.15%) played games on a PC or mobile phone every day. Just 4 (12.20%) participants rarely played games in their daily life.
None of the participants had used a VR-based laparoscopic surgery simulator before this study.

Platform
The pre-test and post-test were carried out using a training box. The commercial laparoscopic physical training box (38×27×27 cm) is illustrated in Fig. 2(a). A high-resolution camera is mounted on the inner top of the training box, so the participants' entire operation process could be recorded. The participants were required to conduct four tests (three fundamental laparoscopic surgery skill training tasks and one colon resection task), illustrated in Fig. 3. The three fundamental surgery skill training tasks are peg transfer, picking beans and threading. The electroencephalography (EEG) data were collected using a four-channel dry-electrode headset (Muse 2, InteraXon Inc.), shown in Fig. 2(b). The sensors are located at TP9 (left ear), AF7 (left forehead), AF8 (right forehead) and TP10 (right ear). Compared with traditional brain-monitoring equipment such as fMRI, PET and 20/32/64-electrode EEG headsets, Muse 2 is portable and easy to use. Previous studies [37, 38] reported that the accuracy of Muse is good enough for research, even though it has only four electrodes.

The VRLS was developed by the State Key Laboratory of Virtual Reality Technology and Systems at Beihang University. The simulator consists of two major components. The first component is the computation module, a high-performance PC connected to a touch-screen monitor (1920×1080 pixels). The hardware is as follows: an Intel(R) Core(TM) i5-8500 CPU @ 3.00 GHz with 6 cores and an NVIDIA GeForce GTX 1060 with 6 GB of memory. The software runs on the 64-bit professional version of Windows 10. The second component is the simulation module, which contains two surgical handles connected to haptic devices (Geomagic Touch, 3D Systems, US) and a navigation camera in a box. Two foot pedals were used to activate electrosurgical coagulation during surgery training.
The whole experiment consists of three main steps. First, the participants were required to conduct a pre-test on a training box. Before this step, we explained the purpose of the study and how to operate the equipment. The pre-test contains three fundamental surgery skill tasks (Pre-FT) and one colon resection task (Pre-CRT). During the operation, we recorded heart rate and EEG data using a Polar H10 heart rate monitor chest strap and a Muse 2 brain wave monitor, respectively. The whole procedure was also recorded on video. Second, the participants were asked to conduct the same kinds of tasks on VRLS. Everyone had to complete 4 trials within a week, and each trial lasted about 30 minutes. Finally, we required our participants to conduct the post-test. The post-test procedure (Post-FT and Post-CRT) is the same as the pre-test. After finishing all experiments, participants were asked to complete four questionnaires regarding cognitive load and flow experience. Each questionnaire reflects the experimental results along different dimensions. To learn more about the influence of VRLS on participants, we used three scales (Paas [39], NASA-TLX [40], WP Scale [41]) to measure cognitive load. To measure flow experience during the experiments, we combined two scales, the EGame scale [34] and Cheng's scale [42], and redesigned the questions to fit our experiments.

Data Collection
In this study, we obtained three types of data. The first is the performance scores computed from the recorded videos according to the GOALS standards for the colon resection task and our own measurement rules (e.g. completion time, number of mistakes) for the fundamental surgery skill tasks. The second is the self-reported scores, including cognitive load scores and flow experience scores computed from the questionnaires. The third is the physiological data extracted from heart rate and EEG recordings. The performance scores and physiological data need to be processed before they yield meaningful information. For performance scores, the fundamental surgery skill tasks and the colon resection task are measured along different dimensions. We measured the performance of fundamental surgery skills along 7 dimensions: completion time, the number of failures in peg transfer, the number of failures in picking beans, the number of times the rope was dropped, motion smoothness, depth perception and bimanual dexterity. The score of each item is normalized and scaled to [0,10], and the final performance score is the sum of all item scores. The GOALS standard measures laparoscopic skills along 4 aspects: depth perception, bimanual dexterity, efficiency, and tissue handling and autonomy. Each dimension is in the range [0,10].
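The scoring of the fundamental tasks described above can be sketched as follows. This is a minimal illustration rather than the scoring procedure actually used: the normalization method is not stated in the text, so min-max scaling across participants is assumed, and the item names, the set of "lower is better" items, and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical item names. Assumption: for these items a smaller raw value
# means better performance, so their scaled score is inverted.
LOWER_IS_BETTER = {"completion_time", "peg_failures", "bean_failures", "rope_drops"}


def scale_item(values: List[float], lower_is_better: bool) -> List[float]:
    """Min-max scale one item across participants to [0, 10] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All participants tied on this item: assign the midpoint to everyone.
        return [5.0 for _ in values]
    scaled = [10.0 * (v - lo) / (hi - lo) for v in values]
    if lower_is_better:
        scaled = [10.0 - s for s in scaled]
    return scaled


def total_scores(items: Dict[str, List[float]]) -> List[float]:
    """Sum the scaled item scores to get one performance score per participant."""
    n = len(next(iter(items.values())))
    totals = [0.0] * n
    for name, values in items.items():
        for i, s in enumerate(scale_item(values, name in LOWER_IS_BETTER)):
            totals[i] += s
    return totals
```

With seven items scaled this way, the final score of each participant would lie in [0, 70]. Scaling pre-test and post-test values together, rather than separately, would be needed for the two phases to remain comparable.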
For the physiological data, we first filtered the recordings. During the experiment, we found that the heart rate monitor chest strap measured heart rate steadily and robustly. However, the brain-sensing headband is sensitive to head motion and to the contact between skin and electrodes. Thus, we filtered the EEG data according to four data quality indicators (1 = Good, 2 = Medium, 4 = Bad). In addition, the electrooculography (EOG) signal has a strong disturbing effect. Fortunately, it can be measured and recorded using the frontal electrodes AF7 and AF8. Thus, we could easily remove interference signals such as blinks and jaw clenches.
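The quality-based filtering step can be sketched as below. The record layout, the field names, the assumption that there is one quality code per electrode, and the rule of keeping only samples in which every electrode reports 1 (Good) are illustrative assumptions; the exact threshold used in the study is not stated.

```python
from dataclasses import dataclass
from typing import List, Tuple

GOOD, MEDIUM, BAD = 1, 2, 4  # quality codes as listed in the text


@dataclass
class EegSample:
    """One hypothetical EEG record; electrode order is TP9, AF7, AF8, TP10."""
    timestamp: float
    values: Tuple[float, float, float, float]
    quality: Tuple[int, int, int, int]  # assumed: one quality code per electrode


def keep_clean(samples: List[EegSample], worst_allowed: int = GOOD) -> List[EegSample]:
    """Keep samples whose worst electrode quality code does not exceed the threshold."""
    return [s for s in samples if max(s.quality) <= worst_allowed]
```

Passing `worst_allowed=MEDIUM` would retain more data at the cost of noisier samples, which is a trade-off worth reporting alongside the results.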

Statistical Analysis
After pre-processing all raw data, we used SPSS (Statistics V.25) to analyze the computed scores. We used the classical paired-samples t-test to test the differences between pre-test and post-test tasks. A p-value < 0.05 was considered statistically significant.
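For readers without SPSS, the statistic behind the paired-samples t-test can be sketched in a few lines. This is an illustration of the standard formula, t = mean(d) / (sd(d) / sqrt(n)) with d the per-participant post-minus-pre differences, not the analysis script used in the study; the function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With 41 participants the test has 40 degrees of freedom, for which the two-sided critical value at the 0.05 level is approximately 2.02; the p-value itself comes from the t distribution and is what a statistics package reports.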

Results
In this section, we investigate the influence of VRLS on participants from two aspects: performance (Sec.

Performance and VRLS
Task completion time is a simple measure of surgical skill proficiency. We found that, after training on VRLS, the time to complete the same task dropped sharply (p < 0.01). The first column of Table 1 shows the time required for each participant to complete each task. The upper-left panel of Fig. 7 presents these times as a histogram. The efficiency of participants improved by factors of 1.6 and 5.4 for the fundamental surgery tasks and the colon resection task, respectively. The performance scores for the fundamental surgery skill tasks (FT) and the colon resection task (CRT) are shown in Fig. 5 and Fig. 6. The pre-test fundamental surgery skill task score is significantly lower than the post-test score (p < 0.01), and the same holds for the colon resection task. In summary, these results indicate that training on VRLS increases the performance of medical students. Furthermore, we inspected the performance change in the colon resection task in each dimension of the GOALS standards. We found that the participants' performance improved in all four dimensions: depth perception (p < 0.001), bimanual dexterity (p < 0.001), efficiency (p < 0.001), and tissue handling and autonomy (p < 0.001). For depth perception, the average score

Flow Experience
Flow experience is related to a moderate heart rate (HR) [45]. Thus, we investigated the flow experience of the participants through self-reported psychological flow questionnaires and heart rates. Fig. 7 and Table 1 show the general change in heart rate between the pre-test and the post-test. The average heart rate decreased significantly (p < 0.05). In particular, the maximum heart rate decreased in the post-test procedure (p < 0.05). Interestingly, the minimum heart rate did not increase significantly, and the average heart rate in the post-test fundamental surgery skill tasks even increased slightly. We think this might indicate that the participants' heart rate varied more steadily after VRLS training. The results of the self-reported flow experience are shown in Table 2. The goals of the tasks were clear (4.02/5), the participants had a positive attitude toward the whole experiment, and they had a strong sense of involvement (4.19/5). On the flow dimension, the score is 3.95/5.0 ± 0.96, which is relatively high. We might conclude that the participants had a good experience during the whole experiment.

Cognitive Load
The overall cognitive load was measured using the Paas scale [39] (M = 6.09/10, SD = 1.32). The overall task difficulty was relatively low (M = 3.80/10, SD = 1.44). Table 3 (NASA-TLX) shows the self-reported mental workload of our designed tasks. The overall cognitive load of the 41 participants is lower than the midpoint of the full range (0 to 10). According to the table, mental demand (M = 5.00, SD = 1.67), physical demand (M = 5.34, SD = 2.44) and effort (M = 6.00, SD = 1.48) were the main components affecting participants' cognitive load. Most participants thought they could accomplish the task well (M = 2.27, SD = 1.27) and had a pleasant experience (M = 3.05, SD = 1.80). This suggests they were confident in their surgical skills after training.
In addition, we studied the participants' cognitive load along five perception dimensions, shown in Table 3 (WP Scale). The results show that the most important ability needed in laparoscopic surgery is depth perception (M = 8.12, SD = 1.82), followed by visual processing ability (M = 7.98, SD = 1.51) and haptic sensing ability (M = 7.80, SD = 1.86). Audio processing is the least used perception capability (M = 4.32, SD = 2.56). Completing the tasks also requires considerable attention from participants (M = 7.19, SD = 1.44). Cognitive load can also be computed from EEG [41]. As noted there, the α and θ brain waves are more informative than other bands for computing cognitive load, so we considered only the α and θ frequency spectra. Figure 8 compares the cognitive load scores between the pre-test (M = 0.17, SD = 0.11) and post-test (M = 0.14, SD = 0.09) fundamental surgery skill tasks. The post-test cognitive load is significantly lower than that of Pre-FT (p = 0.04 < 0.05). Figure 9 shows that the cognitive load scores also decreased significantly in the colon resection task (p < 0.01).
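To make the idea of an α/θ-based load score concrete, the sketch below computes band power with a plain discrete Fourier transform and takes the θ/α power ratio. This is only one commonly used index and is an assumption here: the formula behind the scores reported above is not given in the text, and the band limits (θ: 4 to 8 Hz, α: 8 to 13 Hz), the function names and the sampling rate passed in are illustrative choices.

```python
import math
from typing import List, Tuple

THETA_BAND = (4.0, 8.0)   # Hz, assumed conventional limits
ALPHA_BAND = (8.0, 13.0)  # Hz, assumed conventional limits


def band_power(signal: List[float], fs: float, band: Tuple[float, float]) -> float:
    """Sum of squared DFT magnitudes over bins with band[0] <= f < band[1]."""
    n = len(signal)
    offset = sum(signal) / n
    x = [v - offset for v in signal]  # remove the DC component
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if band[0] <= freq < band[1]:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += re * re + im * im
    return power


def theta_alpha_ratio(signal: List[float], fs: float) -> float:
    """Hypothetical load index: theta power divided by alpha power."""
    alpha = band_power(signal, fs, ALPHA_BAND)
    if alpha == 0:
        raise ValueError("no alpha-band power in this window")
    return band_power(signal, fs, THETA_BAND) / alpha
```

The ratio rises when θ activity grows or α activity is suppressed, which matches the direction of change described in the Background. A production pipeline would use an FFT library and windowed segments instead of this direct transform.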

Correlation Analysis
Conrad et al. observed a negative relationship between average EEG alpha and flow experience in a pilot study [46]. However, the limited number of participants in their study might not support a convincing conclusion. In our study, we tried to reveal the correlation between three kinds of scores.
First, the relation between performance and the corresponding cognitive load extracted from EEG is shown in Table 4. In all four tasks, we found that cognitive load has a negative influence on the participants' performance. For Pre-FT, the cognitive load score is negatively related to the performance score (R² = 0.79, p = 0.1). For Post-FT, the relation is also negative but not significant (R² = 0.74, p = 0.3).
For Pre-CRT, the cognitive load score is significantly negatively related to the performance score (R² = 0.74, p < 0.001). The negative relation also appears in Post-CRT (R² = 0.61, p = 0.05).
Second, we evaluated the relation between the cognitive load scores extracted from the scales (CLS) and the cognitive load scores computed from EEG (CLE). We found that CLS and CLE are positively related. The regression R² between CLS and pre-test colon resection CLE is 0.68 (p < 0.05). The regression R² between CLS and post-test colon resection CLE is 0.70 (p < 0.1). Furthermore, we explored the correlation between performance scores and flow experience. We found no significant correlation between performance scores and self-reported flow experience scores (p = 0.34 > 0.05). Performance scores are also not significantly correlated with heart rates (p = 0.06 > 0.05).
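The quantities reported in this section can be illustrated with a short sketch. For a simple linear regression with one predictor, R² equals the square of the Pearson correlation coefficient r, so R² by itself does not carry the direction of the relation; the sign of r does. The function names below are hypothetical and the code is not the analysis used in the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a one-predictor linear regression, computed as r squared."""
    return pearson_r(x, y) ** 2
```

A negative `pearson_r` between cognitive load and performance scores, together with its R², corresponds to the downward arrows and R² values summarized in Table 4.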

Discussion
Training medical students on VRLS has been considered a promising direction because of its relatively low cost, absence of risk and high reusability. In addition, VRLS can simulate physiological phenomena (e.g. breathing) and provide interactive guides to increase goal clarity. In this study, we quantitatively explored the influence of VRLS on medical students in terms of surgical skill acquisition and physical-psychological aspects. We also examined the correlation between them.

Surgery Skill Acquisition
The performance scores indicate that VRLS could significantly improve the acquisition of surgical skills. After training on VRLS, the participants could accomplish the same tasks in a shorter time. This means that their surgical proficiency was substantially enhanced. In the pre-test phase, the worst-performing dimension was depth perception. After VRLS training, depth perception became the best of the four dimensions. The dimension that changed least was bimanual dexterity.

Cognitive Load and Flow Experience
According to the participants' feedback, the principal source of cognitive load is the false perception of depth during the operation. This result is consistent with the observation in Sec. 4.1. Although depth perception improved significantly, it is still a major influencing factor. Introducing an immersive training environment might be an alternative. However, that could raise the problem of dizziness and cause an uncomfortable training experience [47].
The results demonstrate that the participants' surgical performance is related to their physical-psychological state. Developed skills might be indicated by lower cognitive load, moderate heart rate and flow experience. This gives us additional options for quantitatively measuring task skills: performance could be evaluated by monitoring physiological data such as EEG and heart rate. Compared with the traditional subjective method of evaluating after the experiments, this approach could identify, in real time, the features that increase the participants' cognitive load. With a physiological data detection device integrated, the training course designer could optimize the settings of the experiment environment and adjust the experiment procedure. In addition, when medical students experience a high cognitive load, the system could help them with guiding information.
However, flow experience shows no clear relation to heart rate or to the performance of participants. We observed only a slight negative relation between heart rate and flow experience score (p = 0.06). The improvement in skill proficiency may increase the automation of operation. The participants might then regard our tasks as easier to achieve (frustration = 3.05 in NASA-TLX), which would hinder the acquisition of flow experience. We think flow experience might be a state that is not easy to reach in our study. To promote flow experience, we should enhance our training system with more attractive elements (e.g. guidance information with AR glasses).

Conclusion
Training surgeons on VRLS has been considered a promising direction because of its relatively low cost, absence of risk and high reusability. In this paper, we quantitatively investigated the influence of VRLS on medical students from three aspects: performance evaluation, physiology (heart rate and EEG), and self-reported cognitive load and flow experience. Forty-one participants were recruited to conduct a pre-test and a post-test in training boxes. Between the pre-test and the post-test, they were trained on VRLS. Their operation videos and physiological data (heart rate and electroencephalogram) were recorded. Their performance was then graded by senior surgeons using a set of rules. Finally, the participants filled in questionnaires about their cognitive load and flow experience. The experimental results demonstrate that VRLS could substantially improve medical students' performance and enable the participants to obtain flow experience with a lower cognitive load.
Nevertheless, our work has limitations. So far, we have only revealed the correlation between performance and cognitive load. We have not investigated the exact functional form of the relation, such as linear, non-linear or exponential. Many researchers have used machine learning to measure and classify cognitive load [48]. This could be a research topic for future work.

Declarations
Ethics approval and consent to participate
Ethics approval for the conduct of the research was obtained from the Beijing Normal University Research Ethics Committee in November 2020. The participants (medical students) volunteered to take part in our study. They gave us permission to freely distribute the results of the experiments. However, identifiable data, including personal images and physiological data, may not be published. All information related to an identifiable person was anonymized.

Consent to publish
Yes, the participants and authors consent to the publication of the content in this manuscript.

Availability of data and materials
The materials described in the manuscript, including all relevant experimental data, will be freely available to any scientist wishing to use them for non-commercial purposes, without breaching participant confidentiality. The analysis data are available in our manuscript. The raw data, such as operation videos and physiological data, involve much personal information and therefore cannot be made publicly available.

Authors' Contributions
Peng Yu designed the experiments, analyzed the experiment data and wrote the article.
Junjun Pan took part in the design of this study and the writing of this manuscript.
Zhaoxue Wang took part in the design of the experiment, recruitment of participants, reference checking and raw data processing.
Yang Shen took part in the design of the experiment and reference checking.
Jialun Li participated in conducting the experiment and was responsible for arranging the experiment schedule.
Aimin Hao took part in the design of this experiment.
Haipeng Wang was in charge of the recruitment of participants and medical guidance during our experiment.

Table 1 The statistical results of heart rates (HR) for the pre-test and post-test of the two kinds of experiments.
Table 2 Self-reported flow scale after all experiments (1-5; a higher score means higher flow experience).
Table 3 Self-reported mental workload (NASA-TLX scale) after all experiments (0-10; a higher score means higher mental workload).
Table 4 The correlation of performance and cognitive load computed from EEG data. (The downward arrow ↓ means negative correlation; **: significant at the 0.01 level; *: significant at the 0.05 level; NA: not available.)
Figure 1 The participants operated on a training box during the pre-test and post-test (Left). The heart rate and EEG were recorded and quantitatively analyzed. Between the pre-test and post-test, they were trained on VRLS (Middle: screenshot of picking small balls. Right: operation illustration).