Active control time: an objective performance metric for trainee participation in robotic surgery

Trainee participation and progression in robotic general surgery remain poorly defined. Computer-assisted technology offers the potential to provide and track objective performance metrics. In this study, we aimed to validate the use of a novel metric, active control time (ACT), for assessing trainee participation in robotic-assisted cases. Performance data from da Vinci Surgical Systems were retrospectively analyzed for all robotic cases involving trainees with a single minimally invasive surgeon over 10 months. The primary outcome metric was percent ACT (%ACT): the amount of trainee console time spent in active system manipulations over total active time from both consoles. Kruskal–Wallis and Mann–Whitney U statistical tests were applied in analyses. A total of 123 robotic cases with 18 general surgery residents and 1 fellow were included. Of these, 56 were categorized as complex. Median %ACT was statistically different between trainee levels for all case types taken in aggregate (PGY1s 3.0% [IQR 2–14%], PGY3s 32% [IQR 27–66%], PGY4s 42% [IQR 26–52%], PGY5s 50% [IQR 28–70%], and fellow 61% [IQR 41–85%], p < 0.0001). When stratified by complexity, median %ACT was higher in standard versus complex cases for PGY5 (60% vs. 36%, p = 0.0002) and fellow groups (74% vs. 47%, p = 0.0045). In this study, we demonstrated an increase in %ACT with trainee level and with standard versus complex robotic cases. These findings are consistent with our hypotheses, providing validity evidence for ACT as an objective measurement of trainee participation in robotic-assisted cases. Future studies will aim to define task-specific ACT to guide further robotic training and performance assessments.


Introduction
Robotic-assisted laparoscopic surgery has become increasingly prevalent across surgical specialties in the United States [1,2]. As the need to adapt training is recognized, many surgical training programs have started to develop curricula for structured robotic skill development that incorporate didactics and simulation training [3][4][5][6]. As residents progressively participate in robotic cases, many educational benefits have been suggested from use of the dual-console robotic da Vinci Surgical System (Intuitive, Sunnyvale, CA) between faculty-trainee pairs [7,8].
The dual-console system equips trainees to participate at a designated secondary robotic console in parallel with faculty surgeons, theoretically allowing graded responsibility and greater involvement in robotic cases. Nonetheless, there have been concerns regarding poor educational value for residents in robotic cases secondary to limited case participation [9][10][11]. "Participation" in these cases, however, remains poorly defined, often simply describing if the resident was assisting at bedside versus sitting at the console. Lack of trainee autonomy has been previously demonstrated in open and laparoscopic case types [12,13], and this same potential concern may exist for robotic cases as well. Additionally, robotic surgery learning curves have been heterogeneously described, further suggesting a need for more reliable, quantitative methods for assessing surgeon participation and progression as well as providing valuable objective performance feedback [14,15].

In this context, robotic surgery offers a unique opportunity to provide objective metrics on trainee performance. The da Vinci "robot" is more aptly described as a computer-assisted surgical system. As such, it can collect objective performance metrics (OPMs), such as instrument use, camera and energy device use, and active console time specified by user. These metrics may be able to provide more direct measurement of robotic surgeon and trainee performance with increased accuracy and less bias [16], potentially mitigating the subjectivity inherent in current evaluation methods. Use of OPMs has been studied in simulated scenarios with trainees [17] and applied in live case settings within urologic surgery [18,19] but has not been well studied with trainees in live general surgery case settings. Before such metrics can be applied for performance feedback and benchmarking, further validation of their use in this specific setting is necessary.
The aim of this study was to assess validity of a novel objective metric entitled active control time (ACT) for evaluating general surgery trainee participation and progression in robotic-assisted surgery. We hypothesized that ACT would increase with trainee level and decrease with case complexity.

Participants
This study was exempted by our Institutional Review Board (IRB #202108019). All dual-console robotic cases involving general surgery trainees and one minimally invasive surgery (MIS) fellow with a single MIS faculty (M.A.) at our institution from September 2020 through July 2021 were included. Cases with trainees from all post-graduate years (PGY) were included except for the PGY2 residents as they did not rotate on the MIS service during the 2020-2021 academic year. Cases involving more than one active trainee were excluded except for those cases involving PGY1 residents, as there were otherwise no case examples involving exclusively PGY1 residents at the trainee console.
The MIS faculty was included in the study as a high-volume robotic surgeon with over 1500 lifetime robotic surgery cases who regularly involves trainees on the dual console. The trainees included in the study had completed the existing robotic training curriculum at our institution including introductory didactics, completion of online Intuitive training modules, hands-on bedside assist training, and completion of da Vinci Simulator modules.

Data collection
Trainee performance metrics that had been automatically generated from the da Vinci Surgical System were retrospectively reviewed for all included robotic cases performed on an Xi robotic system. The primary performance metric of interest was percent active control time (%ACT) defined as the amount of trainee console time spent in active robotic instrument manipulations over the total active time from both the trainee and faculty consoles combined. This metric was chosen as our focus instead of total console time in order to minimize non-active time in the calculation, i.e., time spent by the trainee observing with head in at the console but not actually manipulating the robotic system.
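The definition of %ACT above can be written as a short calculation. The following is a minimal sketch, not the system's own computation: the function name `percent_act` and the use of seconds as the unit are assumptions made for illustration, and the export format of the da Vinci performance data is not described here.

```python
def percent_act(trainee_active_s: float, faculty_active_s: float) -> float:
    """Percent active control time (%ACT) for one case.

    trainee_active_s: time the trainee console spent actively manipulating
        instruments (hypothetical unit: seconds).
    faculty_active_s: the same quantity for the faculty console.
    Non-active console time (head in, not manipulating) is excluded from
    both inputs by definition.
    """
    total_active = trainee_active_s + faculty_active_s
    if total_active <= 0:
        raise ValueError("total active time must be positive")
    return 100.0 * trainee_active_s / total_active


# Illustrative values only: 30 min of trainee active time, 70 min of faculty.
print(percent_act(1800, 4200))  # 30.0
```

Because the denominator is the combined active time from both consoles, a trainee who sits at the console for the whole case but only observes would still score close to 0%.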
Operative reports from the electronic medical record were retrospectively reviewed to define case complexity. Reports that included explicit complexity modifiers were categorized as complex cases. Cases without complexity statements were categorized as standard.

Statistics
GraphPad Prism statistical software (version 9.2.0) was used for data analysis and figure development. Descriptive statistics were evaluated, and data were expressed as raw scores, percentages, or medians, as noted in each section. Nonparametric Kruskal-Wallis tests were applied in comparison of %ACT across trainee levels. Mann-Whitney U tests were applied for evaluation of standard versus complex cases. A one-tailed p-value was used for all analyses and considered statistically significant if < 0.05.
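For readers unfamiliar with the two rank-based tests, the test statistics can be sketched in a few lines. This is an illustration only and not the analysis pipeline used in the study, which was run in GraphPad Prism. The function names are hypothetical, the Kruskal-Wallis H shown here omits the tie correction, and p-values are not computed (a statistics package would supply them).

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) across k groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Made-up %ACT values, for illustration only.
standard = [60, 72, 55]
complex_cases = [36, 40]
print(mann_whitney_u(standard, complex_cases))  # 6.0
```

In this toy example every standard case has a higher %ACT than every complex case, so U reaches its maximum of 3 × 2 = 6.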

Active control time by case type
Median %ACT for all trainees taken in aggregate differed by case type. Trainees participated the least in Heller myotomy cases with median %ACT of 32% (interquartile range 13-52%), then inguinal hernia repair cases with median %ACT of 41% (IQR 31-71%), followed by hiatal hernia repair cases with median %ACT of 47% (IQR 26-62%). Trainees participated the most in robotic cholecystectomies with median %ACT of 75% (IQR 70-100%). These medians were statistically different when analyzed in aggregate (p = 0.0008). Post hoc analysis revealed specific differences in median %ACT between the Heller myotomy and cholecystectomy cases (p = 0.0004) and the hiatal hernia and cholecystectomy cases (p = 0.0052) (Fig. 1). The difference in median %ACT between inguinal hernia repair cases and cholecystectomies did not reach statistical significance (p = 0.2176).
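The per-case-type summaries reported above (median with interquartile range) can be reproduced from case-level data along the following lines. This is a sketch under assumptions: the record layout and function name are hypothetical, and quartile conventions differ between software packages, so the quartiles from Python's "inclusive" method may not match those from GraphPad Prism exactly.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_case_type(records):
    """Map each case type to (median, Q1, Q3) of its %ACT values.

    records: iterable of (case_type, percent_act) pairs (hypothetical layout).
    Each case type needs at least two cases for quartiles to be defined.
    """
    grouped = defaultdict(list)
    for case_type, pct in records:
        grouped[case_type].append(pct)

    summary = {}
    for case_type, values in grouped.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[case_type] = (median(values), q1, q3)
    return summary


# Made-up values, for illustration only.
records = [
    ("cholecystectomy", 70), ("cholecystectomy", 75), ("cholecystectomy", 100),
    ("Heller myotomy", 13), ("Heller myotomy", 32), ("Heller myotomy", 52),
]
for case_type, (med, q1, q3) in summarize_by_case_type(records).items():
    print(f"{case_type}: median {med}% (IQR {q1}-{q3}%)")
```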

Active control time by case complexity
Of the 123 cases included, 56 (45.5%) were categorized as "complex" robotic cases. When stratified by case complexity, median %ACT was significantly higher in standard compared to complex cases for PGY5s (n = 42 vs. n = 37, 60% vs. 36%, p = 0.0002) and the fellow (n = 12 vs. n = 15, 74% vs. 47%, p = 0.0045) (Fig. 3). There were only 4 complex cases performed by residents at the PGY4 level and below. Similar analyses for standard compared to complex cases were performed for these groups but were not statistically significant.

Discussion
In this study, we demonstrated an increase in %ACT with trainee level as well as an increase in %ACT with standard compared to complex robotic cases. These findings are consistent with our hypotheses, providing evidence for the validity of ACT as an objective tool for measuring trainee participation in robotic cases. In our case cohort, the majority of robotic cases involved senior trainees. This is in line with previous studies showing increased robotic operative exposure for senior level residents compared to their junior counterparts [9,20]. In both the aggregate case cohort and the hiatal hernia subset, %ACT increased with trainee level. When stratified by complexity level, senior trainees also had significantly lower %ACT in complex compared to standard robotic cases. Similar autonomy and performance trends based on trainee level and case complexity have been previously reported in the literature [21,22], but these studies included a range of case types with faculty raters instead of focusing exclusively on the objective robotic experience.
Further, we observed wide variability in %ACT within each trainee level and within each case type. We overtly chose not to exclude outliers from the analysis in order to fully capture this characteristic. Some of the variability may be attributed to factors such as the timing of case performance within the course of a given rotation or trainees' previous robotic case experience overall. We did, however, control for faculty variation by inclusion of only one MIS faculty in the study. Similarly, all trainees in the study had completed the same standard robotic training curriculum at our institution. Despite this, there remained a wide range of participation experiences. This is consistent with previous work suggesting an often heterogeneous learning progression in robotic surgery that differs significantly by procedure type [14,15].
The use of %ACT as an objective tool in this setting has immense value. Previously, the cases included in our cohort would have simply been characterized with "trainee at console", as has been described in previous robotic studies [9,11]. Up until January of 2022, there had been no formalized method for even denoting these general surgery cases as "robotic" in the ACGME Case Log system. These participation categorizations are vague and have limited potential for targeted educational intervention. Though ACT measurements for isolated cases may be inadequate to provide a full understanding of trainee proficiency, these measures, when tracked over time, allow specific identification of the robotic case types in which trainees are not progressing. Further, they may help to identify specific trainees who need additional, focused robotic practice with simulator curricula and exercises. Historically, assessments of trainee autonomy and performance have relied heavily on subjective faculty evaluation tools such as the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) or the System for Improving and Measuring Procedural Learning (SIMPL) application [23,24]. Though the value of such assessment methods has been demonstrated, sustained and consistent utilization across programs has been challenging with increasing administrative and clinical demands [25]. Several studies have similarly validated the use of specific assessment tools within robotic surgery, including the Robotic Modification of the O-SCORE (RO-SCORE) [5], Global Evaluative Assessment of Robotic Skills (GEARS) [26], and Assessment of Robotic Console Skills (ARCS) [27], but again, these require faculty participation and are subject to variability of rating behaviors.
Though the use of multiple trained raters to review trainee procedure videos could help to mitigate individual faculty biases, this still requires time to train reviewers on the proper use of the respective assessment tools and time to complete video reviews. These barriers have limited more routine and widespread use of such assessment rubrics across training programs.
Though faculty assessment remains critical for the provision of operative feedback and identification of targets for operative performance improvement, the use of objective performance metrics such as ACT can be complementary in several ways. First, such metrics are automatically captured via the computer-assisted robotic technology, providing a neutral and less burdensome measure of trainee case participation [16]. Second, performance metrics and additional case data are now being organized in an increasingly accessible way via mobile application, allowing more frequent and convenient review. At our institution, for example, trainees have been progressively enrolled in the My Intuitive application, increasing access to their personal data insights [28].
Our study has several limitations. First, all cases involved a single experienced surgical faculty from our institution with a specific practice pattern and entrustment behaviors. These factors affected the included case types and observed trend of higher trainee ACT in advanced robotic cases such as hiatal hernia repairs compared to core procedures such as inguinal hernia repairs. Though use of this advanced robotic surgeon in our study minimized the faculty learning curve confounder, single faculty participation may still limit the potential generalizability of our findings. Additional studies including surgical faculty with different robotic specialties are ongoing to generate further validity evidence for use of this performance metric in other case settings. Further, we did not capture previous robotic experience or case numbers of trainees. Additionally, we did not account for case timing within individual service rotations or the academic year as a whole. These variables may affect representative ACT for trainees and will be included in future data collection.
While %ACT as analyzed in our study is a valuable tool for measuring trainee participation, it is one component of many that contributes to a full picture of trainee autonomy and skill acquisition. Defining ACT as the amount of time that trainees spend actively and independently manipulating robotic instruments provides a fundamental context into which further task discrimination can be incorporated. Delineating what specific operative steps took place during trainee control time and how efficiently those tasks were completed will provide a more comprehensive picture of trainee performance and autonomy in robotic cases in future. Automated methods for determining efficiency of relevant robotic surgical tasks and steps are currently in development [29,30], and we plan to include this next layer of analysis in future studies of trainee participation during robotic cases.
In this study, we generated validity evidence for the use of ACT as a measure of trainee participation. We see this as one of the initial steps toward routine incorporation of OPMs in our robotic surgical training programs. This type of specific, objective feedback could ultimately be used to provide performance tracking of an individual trainee over time as he or she progresses through training as well as comparative metrics within their peer trainee group. Targeted performance goals can then be tailored to the individual. Our hope is that the use of ACT analysis along with other performance metrics will provide actionable feedback to trainees in order to facilitate skill development and ultimately safer surgical training.

Conclusions
In summary, we investigated the use of ACT as an objective tool for measuring trainee participation in robotic surgery cases. Our findings were in line with predicted participation trends, providing validity evidence for its use. Future studies will aim to define procedure-specific and task-specific ACT trends in order to guide future robotic training efforts and objective performance assessments.