Objective performance indicators of cardiothoracic residents are associated with vascular injury during robotic-assisted lobectomy on porcine models

Surgical training relies on subjective feedback on resident technical performance by attending surgeons. A novel data recorder connected to a robotic-assisted surgical platform captures synchronized kinematic and video data during an operation to calculate quantitative, objective performance indicators (OPIs). The aim of this study was to determine whether OPIs during the initial task of a resident's robotic-assisted lobectomy (RL) correlated with bleeding during the procedure. Forty-six residents from the 2019 Thoracic Surgery Directors Association Resident Boot Camp completed RL on an ex vivo perfused porcine model while continuous video and kinematic data were recorded. For this pilot study, RL was segmented into 12 tasks and OPIs were calculated for the initial major task. Cases were reviewed for major bleeding events, and OPIs of residents who had bleeding were compared with those of residents who did not. Data from 42 residents were complete and included in the analysis. 10/42 residents (23.8%) encountered bleeding: 10/40 residents who started with superior pulmonary vein exposure and 0/2 residents who started with pulmonary artery exposure. Twenty OPIs for both hands were assessed during the initial task. Six OPIs related to instrument usage or smoothness of motion were significantly associated with bleeding. Differences were statistically significant for both hands (p < 0.05). OPIs showing bimanual asymmetry indicated lower proficiency. This study demonstrates that kinematic and video analytics can establish a correlation between objective performance metrics and bleeding events in an ex vivo perfused lobectomy. Further study could assist in the development of focused exercises and simulation on objective domains to help improve overall performance and reduce complications during RL.


Introduction
Surgical training historically has been dependent on the observation of technique and operative management of novice or resident surgeons by experienced or attending surgeons. Based on this subjective interpretation of performance, feedback is generated and advice on improvement is offered. For generations this educational paradigm has served surgeons well; however, skill can be difficult to assess objectively without the potential for internal or inherent bias or inter-observer variability [1][2][3]. From a surgical education perspective, this conflict has created a desire for surgical performance evaluation tools that rely less on attending surgeon observation and feedback while adding a more objective, individualized experience for the learner.
In the era of computer-assisted or robotic surgery, the digitization of the surgeon's movements combined with video may allow access to quantitative data that may be used to develop more objective and nuanced insights into surgical training. Using a novel data and video recorder to draw analytics off of the surgeons' robotic cases, Hung and colleagues have shown significant improvement in assessment using this approach compared to traditional metrics [1]. To date, this training technology has not been applied in thoracic surgery procedures. In this study we sought to calculate objective performance indicators (OPIs) to assess surgical technical skills during a robotic-assisted lobectomy (RL) performed by thoracic surgery trainees on a perfused ex vivo porcine lobectomy model. We hypothesized that differences in OPIs could be determined between trainees who caused adverse bleeding events compared to those who did not.

Methods
The study population consisted of cardiothoracic residents who participated in the 2019 Society of Thoracic Surgeons (STS)/Thoracic Surgery Directors Association (TSDA) Resident Boot Camp, which is an annual program designed to provide an intense, concentrated exposure to key technical skills of cardiothoracic surgery to new cardiothoracic surgery trainees over 4 days. During the program, each cardiothoracic surgical resident received 2 h of supervised experience performing a robotic left upper lobectomy on a perfused porcine tissue simulator (KindHeart, Inc, Chapel Hill, NC). Residents were either 1st year traditional track residents (PGY-6) or 4th year integrated track residents (PGY-4). The Institutional Review Board (IRB) approved the study protocol and publication (Western IRB, work order number 1-1112682-1 on September 26, 2018). The participants provided informed consent to participate. Participants were asked to complete a brief survey on their prior exposure to thoracoscopic and robotic surgery, as well as open lobectomy. They were also required to watch a video recording of an attending performing a four-arm left upper robotic lobectomy on the standardized ex vivo perfused porcine model to become familiar with the anatomy and choreography of a robotic left upper lobectomy performed on this model.
Once at the station, console virtual reality (VR) simulation was also provided to ensure technical familiarity with the robot as well as to provide a warmup. The trainees were instructed to complete the left upper RL on the perfused model with attending faculty coaching that consisted of telestration and verbal advice. Port placement and robotic setup were standardized. Trainees had approximately 90 min to complete the RL. A recorder was connected to the da Vinci Xi surgical system (Intuitive, Inc., Sunnyvale, CA, USA) that simultaneously captured video (60 frames/second) as well as kinematic data (i.e., instrument movements collected at 50 Hz) and event data (i.e., button presses on the system such as camera clutch and energy application). After completion of the case, an expert-trained human reviewer, working under attending surgeon supervision, annotated the video to divide the RL procedure into 12 component tasks (Table 1).
Annotations were normalized to a common timeline by setting the first clinical task's start time to zero. The trainees were instructed to perform the superior pulmonary vein dissection as their first task of the RL and this task was analyzed for OPIs as representative of the trainee's technical performance. Since some trainees took longer to progress through the lobectomy, there was a decrease in number of completed tasks towards the completion of the lobectomy. By focusing on the first task, we were able to maximize the data analysis.
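The timeline normalization described above can be illustrated with a minimal sketch. The `Annotation` record, its field names, and the example timestamps are hypothetical and are not taken from the study's annotation software; the only idea borrowed from the text is that the first clinical task's start time is shifted to zero.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One annotated task segment (hypothetical record layout)."""
    task: str
    start_s: float  # task start, seconds on the recorder clock
    end_s: float    # task end, seconds on the recorder clock


def normalize_to_first_task(annotations):
    """Shift all annotations so the earliest task start becomes t = 0."""
    if not annotations:
        return []
    t0 = min(a.start_s for a in annotations)
    return [Annotation(a.task, a.start_s - t0, a.end_s - t0) for a in annotations]


# Illustrative values only; these are not study data.
case = [
    Annotation("superior pulmonary vein dissection", 312.0, 1010.5),
    Annotation("subsequent task (placeholder)", 1010.5, 1500.0),
]
normalized = normalize_to_first_task(case)
```

After this shift, task start and end times from different trainees are expressed on the same clock, which is what allows task durations and event times to be compared across cases.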
Twenty OPIs were calculated from the kinematic and system data during the first task. These OPIs reflect bimanual dexterity, energy use, console events, instrument movement, smoothness, time, and wrist articulation (Table 2).
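As a rough illustration of how instrument-usage OPIs might be derived from kinematic samples, the sketch below computes total distance traveled and idle time from a list of instrument-tip positions. Only the 50 Hz sampling rate comes from the Methods; the function names, the position format, and the idle speed threshold are assumptions made for this example and are not the study's definitions.

```python
import math

SAMPLE_RATE_HZ = 50.0  # kinematic sampling rate reported in the Methods


def path_length(positions):
    """Total distance traveled: sum of distances between consecutive samples."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))


def speeds(positions, rate_hz=SAMPLE_RATE_HZ):
    """Speed over each sampling interval (distance times sampling rate)."""
    return [math.dist(p, q) * rate_hz for p, q in zip(positions, positions[1:])]


def idle_time(positions, speed_threshold=0.001, rate_hz=SAMPLE_RATE_HZ):
    """Seconds spent below a speed threshold (threshold value is an assumption)."""
    idle_intervals = sum(1 for v in speeds(positions, rate_hz) if v < speed_threshold)
    return idle_intervals / rate_hz


# Synthetic trajectory in meters: 1 s stationary, then 1 s of steady motion along x.
tip = [(0.0, 0.0, 0.0)] * 51 + [(0.01 * k, 0.0, 0.0) for k in range(1, 51)]
distance_m = path_length(tip)
idle_s = idle_time(tip)
```

Computing each quantity separately for the left and right instruments would give the per-hand values that the Results compare between groups.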
These OPIs were chosen based on prior publications in other procedures and specialties [1, 4-10]. Significant bleeding was identified by reviewing the video of the entirety of the procedure for all trainees and was defined as bleeding that interrupted the task and required holding pressure or repairing the injury. The OPIs of the residents who had bleeding and those who did not were compared using Mann-Whitney or Welch's t test.
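For readers unfamiliar with the two tests named above, the following sketch computes their test statistics in plain Python. It is not the authors' analysis code: the sample values are invented, the function names are hypothetical, and the Mann-Whitney p-value uses a normal approximation without tie or continuity correction, which is crude for groups as small as those in this study.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_p_normal(a, b):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Invented OPI values for illustration only; these are not study data.
bleeding = [4.0, 5.0, 6.0]
no_bleeding = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(bleeding, no_bleeding)
u_stat = mann_whitney_u(bleeding, no_bleeding)
p_approx = mann_whitney_p_normal(bleeding, no_bleeding)
```

Welch's test does not assume equal variances in the two groups, and the Mann-Whitney test does not assume normality, which is presumably why the choice between them depended on the distribution of each OPI.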

Results
A total of 46 residents participated in the porcine RL. All trainees had prior console virtual simulation experience and had completed a mean of 7.4 h of simulation experience prior to arriving at the course (range 0.5-30 h). Almost all trainees had prior clinical robotic-assisted surgery experience (43/46, 94%). Of those who had prior experience, 6/46 had experience limited only to bedside assisting. The other 37/46 had console experience with a mean of 22 cases (range 1-63 cases) with the last case occurring a mean of 8 weeks (range 0.5-56 weeks) prior to the study. None of the trainees had prior experience on the ex vivo porcine lobectomy model.
Complete data were available on 42/46 trainees and 4/46 cases were excluded due to incomplete data from the recorder. Of the 42 trainees with complete data, 40 started with dissection of the superior pulmonary vein (SPV) and 10/40 had at least one bleeding event at some point during the RL. The remaining 2 trainees began with dissection of the apical branch pulmonary artery (PA) and had no bleeding events. Therefore, a total of 10/42 residents (23.8%) had significant bleeding events during the left upper RL and comprised the bleeding group. The bleeding events did not show any pattern in terms of time of injury and appeared random (Fig. 1).
When bleeding and non-bleeding cases were compared over time, the group that had no bleeding showed greater economy of completing the task, while the group that had bleeding showed a greater degree of difficulty completing the task in an efficient manner (Fig. 2).
After assessing 20 OPIs for both the right and left hands during the initial task, six showed statistically significant differences in both hands (p < 0.05) between residents who had bleeding and those who did not. Three OPIs were related to instrument usage and included idle time, total instrument distance traveled, and wrist articulation (Fig. 3).
The additional three significant OPIs were related to smoothness of motion and included arc length, speed peaks, and dimensionless jerk (Fig. 4). Bimanual proficiency was also assessed and those who had bleeding had greater asymmetry of utilization of both hands compared to those who did not have bleeding (Figs. 3, 4 and 5).
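To make the smoothness OPIs more concrete, the sketch below computes two of them, speed peaks and a log dimensionless jerk, from a sampled speed profile. The formulas follow a common form from the motor-control literature and are an assumption here, since the study does not spell out its exact definitions; the function names and the synthetic profiles are likewise hypothetical.

```python
import math


def count_speed_peaks(speed):
    """Number of strict local maxima in a sampled speed profile."""
    return sum(
        1 for i in range(1, len(speed) - 1)
        if speed[i - 1] < speed[i] > speed[i + 1]
    )


def log_dimensionless_jerk(speed, rate_hz=50.0):
    """Negative log of duration^3 / peak_speed^2 * integral(jerk^2).

    Jerk is approximated by the second finite difference of speed.
    Values closer to zero (less negative) indicate smoother motion.
    """
    dt = 1.0 / rate_hz
    jerk = [
        (speed[i + 1] - 2.0 * speed[i] + speed[i - 1]) / dt ** 2
        for i in range(1, len(speed) - 1)
    ]
    duration = (len(speed) - 1) * dt
    integral = sum(j * j for j in jerk) * dt
    return -math.log(duration ** 3 / max(speed) ** 2 * integral)


# Synthetic 2 s profiles at 50 Hz: one bell-shaped, one with added oscillation.
smooth = [math.sin(math.pi * k / 100) for k in range(101)]
jerky = [s + 0.05 * math.sin(2 * math.pi * k / 10) for k, s in enumerate(smooth)]

peaks_smooth, peaks_jerky = count_speed_peaks(smooth), count_speed_peaks(jerky)
ldlj_smooth, ldlj_jerky = log_dimensionless_jerk(smooth), log_dimensionless_jerk(jerky)
```

In this toy comparison the oscillating profile has more speed peaks and a more negative jerk score than the bell-shaped one, which is the direction of difference one would expect between less smooth and smoother instrument motion.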
There were two additional OPIs that had a significant difference for only one hand (normalized speed, movement arrest period ratio) (see Table 2). Total duration was also significant but could not be broken down into right and left hand (Fig. 2) and was excluded.
Table 2 (excerpt)
The average linear instrument speed divided by the maximum speed. Right 0.06490; Left 0.02546 [7,9]
Spectral arc length: Negative arc length of the amplitude and frequency-normalized Fourier magnitude spectrum of the speed profile. Right 0.00094; Left 0.00085 [7,9]
Negative log non-dimensional jerk: A dimensionless measure of jerk. Right 0.00775; Left 0.00158 [7,9]
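The verbal description of spectral arc length in Table 2 corresponds to a standard definition from the movement-smoothness literature, written out below. This is offered as a likely reading rather than the authors' stated formula; the symbols $v(t)$ (instrument speed), $V(\omega)$ (its Fourier magnitude spectrum), and $\omega_c$ (the cutoff frequency) are introduced here for illustration.

```latex
% Amplitude normalization: divide the magnitude spectrum by its zero-frequency value
\hat{V}(\omega) = \frac{V(\omega)}{V(0)}

% Spectral arc length: negative arc length of the normalized spectrum,
% with the frequency axis normalized by the cutoff \omega_c
\mathrm{SAL} = -\int_{0}^{\omega_c}
  \sqrt{\left(\frac{1}{\omega_c}\right)^{2}
        + \left(\frac{d\hat{V}(\omega)}{d\omega}\right)^{2}}\; d\omega
```

Under this definition a smooth movement has a compact, simple spectrum and a value close to $-1$, while a movement with many sub-movements has a more convoluted spectrum and a more negative value.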

Discussion
Using a novel data and video recorder connected to a robotic surgical system, this study demonstrates the feasibility of calculating objective performance indicators (OPIs) from system and kinematic data during RL performed on an ex vivo perfused lobectomy model. Analysis of OPIs demonstrates significant differences between trainees who had significant bleeding and those who did not. This is the first study to our knowledge using OPIs rather than human observer metrics [e.g., Global Evaluative Assessment of Robotic Skills (GEARS) and Objective Structured Assessment of Technical Skill (OSATS)] to evaluate technical skill during thoracic robotic surgery. These findings reflect the unique capability within computer-assisted surgery to capture the surgeon's movements and activities, quantify them, and potentially translate such information into actionable insights for trainees or learning surgeons. Numerous prior studies in thoracic surgery using subjective evaluations have attempted to measure competency with mixed results [5,10,11]. The methods for these evaluations usually depend on trained experts in thoracic surgery evaluating the performance of residents. The most commonly adopted and encompassing example is the Accreditation Council for Graduate Medical Education (ACGME) milestones program, which was created by an expert thoracic panel with the purpose of evaluating residents "in the context of their participation in ACGME-accredited residency programs" [12]. The limitation of this tool, for better or worse, is that it favors the assessment of a resident's knowledge as it relates to surgical ability as opposed to strict technical ability and intra-operative knowledge.
To help assess the surgical performance aspect of resident progression, OSATS was designed to focus more directly on surgical skill assessment and is predicated on direct observation of resident performance by expert observers using an operation-specific checklist and a detailed global rating system [13]. As minimally invasive surgery has become more extensively adopted in surgical training, OSATS has been modified to evaluate robotic surgery-specific skill through the GEARS assessment tool, a standardized scoring report based on 6 areas. Since OSATS and GEARS are predicated on the same subjective methods, they are subject to the same vulnerabilities: their data rely entirely on expert observers, and they do not completely eliminate inter- and intraobserver variability [1][2][3].
The shortcomings of subjective expert analysis compared to a completely digital and objective evaluation based on OPIs were analyzed in a study by Hung et al., who demonstrated a significant discordance between 3 expert reviewers (each reviewer had a median experience of over 300 cases reviewed) during seminal vesicle dissection and anastomosis during robotic radical prostatectomy [1]. Intraclass correlation among the reviewers was 0.6-0.7 (ideal is 1.0), and more importantly, kinematic metrics or OPIs from the novel data recorder had low association with GEARS metrics, indicating that GEARS is not a sensitive or specific tool for certain aspects of surgery such as bimanual dexterity or smoothness.
Fig. 1 Timing of bleeding relative to completion of dissection of the superior pulmonary vein. The orange bar represents the timepoint during the task where bleeding occurred, relative to the blue bar that represents the entire task length. (ID identification of trainee, min minute). For ID 4, the bleeding event occurred near the end of the task, while for ID 5, it occurred near the beginning. There was no identifiable pattern to when a resident caused bleeding during this task
Fig. 2 Time to completion of the superior pulmonary vein dissection comparing those who had bleeding and those who did not. (Count number of trainees who completed the task at a certain time, min minutes to complete task). The distribution of the task duration for the bleeding group (blue) and no bleeding group (orange), with overlaps shown in gray. Overall, the no bleeding group showed a trend toward being more efficient in completing the initial task. Total task time does not include time spent holding pressure or repairing the injury
The OPIs analyzed in this study were based on work from other studies and have been validated in other applications beyond robotic surgery [14,15]. The authors deliberately removed any a priori considerations assumed to be related to bleeding prior to the selection of the OPIs analyzed in the study as this could have contributed to unintentional bias. The OPIs that were different between the bleeding and no bleeding group reflect differences in instrument efficiency and smoothness. Trainees who had bleeding had more idle time with both hands and required approximately twice the distance traveled of the instruments to complete the task, which could reflect a lack of confidence or familiarity with using the robotic system and/or the anatomy. Future work could establish benchmarks for these OPIs, and if a trainee exceeds certain limits of idle time or distance traveled, further practice could be prescribed.
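The benchmark idea raised in this part of the discussion could, for example, take the form of a simple threshold check. The sketch below is purely illustrative: the OPI names, the limit values, and the `flag_for_practice` helper are hypothetical, and no benchmark values have been established by this study.

```python
def flag_for_practice(opis, limits):
    """Return the names of OPIs whose value exceeds its benchmark limit.

    OPIs missing from the trainee's record are not flagged.
    """
    return sorted(
        name for name, limit in limits.items()
        if name in opis and opis[name] > limit
    )


# Hypothetical limits and trainee values; not derived from study data.
limits = {"idle_time_s": 120.0, "distance_traveled_m": 15.0}
trainee = {"idle_time_s": 180.0, "distance_traveled_m": 12.0}
needs_practice = flag_for_practice(trainee, limits)
```

A check of this kind only makes sense for OPIs where larger values indicate lower proficiency, as the study suggests for idle time and distance traveled; other OPIs would need their own direction and limits.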
One somewhat unexpected finding is that the bleeding group used more wrist articulation to dissect the vein. In previous studies based in urology, experienced surgeons used more wrist articulation, but these tasks were based on suturing [6]. In this study, we observed that less wrist articulation for vascular dissection was associated with better performance. This could reflect differences in technique: the more proficient trainee may dissect the vessel directly, whereas a less proficient trainee may try to bend the wrist and [...]
The other category of OPIs that differed between the bleeding group and no bleeding group is instrument smoothness. The 3 OPIs in this category calculate smoothness in different ways, but all were significantly different. It is unclear whether moving instruments smoothly and consistently is a primary skill that can be practiced or a secondary trait that improves with less idle time, more efficiency, and less wrist articulation. In either situation, smoothness OPIs could be utilized as a global reflection of proficiency with benchmarks.
This study has several limitations. First, this is a pilot study with a limited number of participants with large variations in level of experience and training. Second, we limited our analysis to OPIs during the first task only, and future analysis could compare this to OPIs throughout different tasks of the entire procedure. Third, the study population consisted only of trainees; a reference dataset of experts was not used but is being analyzed for future comparison. As an example, this could help measure how bimanual dexterity should be benchmarked during tasks. In addition, we were unable to determine left- or right-hand dominance, which may have affected bleeding outcomes. Lastly, slower (and perhaps technically less agile) residents sometimes did not finish the RL and, therefore, may not have had a representative experience to cause bleeding at every step.
The current work has several important future implications in patient safety and surgical education. First, additional validation is necessary to compare OPIs to traditional subjective assessments in thoracic procedures. Second, correlation of OPIs to clinical outcomes (intra-operative or postoperative) will be important to validate this ex vivo perfused model study. Third, OPIs could be useful for proficiency-based progression during training, and benchmark values for OPIs in specific procedures could be used for summative evaluations or credentialing.

Conclusion
We identified specific OPIs based on instrument usage and smoothness during performance of the first task of RL that correlated with a bleeding event in a perfused ex vivo porcine heart-lung model. The development of exercises and simulation focused on these objectively identified domains could in the future help improve overall performance while reducing major complications during a resident's learning curve. To our knowledge, this is the first study to demonstrate a link between objective performance metrics and an adverse event in thoracic surgery.
Author contributions All the authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by JFL, KB, SY, and DO. The first draft of the manuscript was written by JFL and all the authors commented on previous versions of the manuscript. All the authors read and approved the final manuscript.
Funding A grant from Intuitive, Inc. helped fund this research including the recorders to perform this research.

Conflict of interest
The authors have the following financial disclosures: Manu Sancheti, Stephen Yang, Jules Lin, John Lazar, and Desmond D'Souza. Daniel Oh is an Intuitive employee as well as an academic surgeon at the University of Southern California.
Figure caption: Each dot and line combination shows the value and amount of change. A decrease of more than 25% is shown in red and shifts in any direction of less than 25% are shown in gray. Economy of motion and bimanual symmetry were significantly (p < 0.05) better in the Non-bleeding group when compared to the Bleeding group in relation to the total distance the instruments traveled. The left- and right-hand plots of the Non-bleeding group show a tighter grouping indicating greater refinement of movement and coordination
IRB approval Western IRB, Work Order Number 1-1112682-1 approved September 26, 2018.