Validation of a Newly Developed Competency Assessment Tool for the Posterior Sagittal Anorectoplasty

Abstract
Introduction The correction of an anorectal malformation (ARM) is complex and relatively infrequent. Simulation training and subsequent assessment may result in better clinical outcomes. Assessment can be done using a competency assessment tool (CAT). This study aims to develop and validate a CAT for the posterior sagittal anorectoplasty (PSARP) on a simulation model.
Materials and Methods The CAT-PSARP was developed after consultation with experts in the field. The PSARP was divided into five steps, while tissue and instrument handling were scored separately. Participants of pediatric colorectal hands-on courses in 2019 and 2020 were asked to participate. They performed one PSARP procedure on an ARM simulation model, while being assessed by two objective observers using the CAT-PSARP.
Results A total of 82 participants were enrolled. A fair interobserver agreement was found for general skills (intraclass correlation coefficient [ICC] = 0.524, p < 0.001), a good agreement for specific skills (ICC = 0.646, p < 0.001), and overall performance (ICC = 0.669, p < 0.001). The experienced group scored higher on all steps (p < 0.001), except for “anoplasty” (p = 0.540), compared with an inexperienced group.
Conclusion The CAT-PSARP is a suitable objective assessment tool for the overall performance of the included steps of the PSARP for repair of an ARM on a simulation model.


Introduction
In surgical training programs, the development of surgical skills in the operating theater plays a pivotal role. 1 To adequately assess the development of these skills, the format of surgical training is structured as a competency-based curriculum. This results in a strong need for educational tools that provide objective assessment of surgical performance to evaluate predefined competency goals. [2][3][4][5][6] Over the past decades, various assessment methods have been developed to fulfil this goal, including tracking devices, virtual reality simulators, observational instruments, and computer games. [7][8][9][10] Currently, skills of surgeons in training are often assessed in the clinical setting by experts using the Objective Structured Assessment of Technical Skills (OSATS) form based on the overall performance. [11][12][13] However, tools like OSATS and its derivatives are not specifically designed to provide information on the separate skills that are trained, and there is no clear correlation between the OSATS score and the outcome of the specific procedure that the resident or surgeon has performed. 14 The competency assessment tool (CAT) has advantages over this method, including a description of the specific steps of a given procedure and evaluation of both performance (e.g., tissue handling) and the quality of the end product. The CAT has been successfully applied to improve the quality of training in the English National Training Program for laparoscopic colorectal surgery. 15 It has also been further developed and used for the assessment of the laparoscopic cholecystectomy 13,16 and for laparoscopic suturing. 17 There are currently no published assessment tools that specifically assess the performance of pediatric surgical procedures. One procedure in which technical proficiency correlates with clinical outcome is the posterior sagittal anorectoplasty (PSARP) in infants with an anorectal malformation (ARM). 
Due to the rarity of this congenital malformation, surgical trainees have limited exposure to this complex surgical procedure. 18,19 The aim of this study is to develop and validate a new CAT for the objective assessment of the completion of a PSARP for the correction of an ARM.

Development of the Competency Assessment Tool-Posterior Sagittal Anorectoplasty
The CAT-PSARP was developed with a structured approach, based on the previously developed CATs. 13,[15][16][17] The focus of this CAT-PSARP form is on the performance of each step, as well as the general skills separately.
Multiple experts in the field of pediatric colorectal surgery were consulted to assess the steps used in the form and the separate assessment items. The five steps were standardized and agreed on before they were subdivided into gradings. The grading system was discussed with the experts to reach consensus on the form. The PSARP was divided into the following five component steps, which were incorporated in the CAT-PSARP as the task-specific skills: Step 1: Sagittal opening in midline.
Step 4: Reconstruction of sphincter complex.
The general opinion was that the task-specific skills and incorporation of a structured approach to the PSARP procedure were important in correction of an ARM. A score on the general skills (instrument handling and tissue handling) during the entire procedure was included as well. Because the general skills are scored as an overall score, these are doubled (2-4 to 6-8) compared with the scores of each component step (1-2 to 3-4) to ensure equal weight of general skills and specific skills. The higher the score, the better the performance. In the previous CATs, a lower score indicated a better performance, but the experts considered this confusing and all preferred a higher score for a better performance. ►Fig. 1 shows the final assessment tool and grading system.
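The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' scoring sheet: the function name `cat_psarp_total` and the returned keys are hypothetical, and it assumes that each of the five component steps is scored as an integer from 1 to 4 and each of the two general skills as an integer from 2 to 8, which gives the maximum total of 36 mentioned in the Discussion.

```python
from typing import Dict, Sequence

# Assumed score ranges, read from the grading description in the text.
STEP_MIN, STEP_MAX = 1, 4        # each of the five component steps
GENERAL_MIN, GENERAL_MAX = 2, 8  # instrument handling and tissue handling (doubled scale)
N_STEPS = 5
N_GENERAL = 2


def cat_psarp_total(step_scores: Sequence[int],
                    general_scores: Sequence[int]) -> Dict[str, int]:
    """Sum the task-specific and general scores of one completed form (hypothetical helper)."""
    if len(step_scores) != N_STEPS:
        raise ValueError("expected five component step scores")
    if len(general_scores) != N_GENERAL:
        raise ValueError("expected two general skill scores")
    if any(not STEP_MIN <= s <= STEP_MAX for s in step_scores):
        raise ValueError("component step scores must lie between 1 and 4")
    if any(not GENERAL_MIN <= g <= GENERAL_MAX for g in general_scores):
        raise ValueError("general skill scores must lie between 2 and 8")

    specific = sum(step_scores)    # at most 5 * 4 = 20
    general = sum(general_scores)  # at most 2 * 8 = 16
    return {"specific": specific, "general": general, "total": specific + general}
```

Under these assumptions, a form with the highest grade on every item gives 20 points for specific skills and 16 for general skills, so the two parts carry similar but not identical weight in the total.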

Anorectal Malformation Simulation Model
The ARM simulation model with perineal fistula used in this study was a validated, partly reusable, low-budget model which consisted of a wooden laser-cut casing with a disposable perineal body (PediaTrickBoxx, ►Fig. 2). 20,21 The casing is reusable and made of lightweight material. The perineal body is a single-use, replaceable model consisting of layered sponges and silicone with a double-layered balloon to represent the rectoperineal fistula. Each participant received their own perineal body to complete the procedure.

Participants
The participants of the hands-on workshops during the "Regional pediatric surgery course," Nijmegen, the Netherlands, October 2019; "Pediatric Colorectal and Pelvic Reconstruction Congress," Columbus, Ohio, November 2019; the "12th European Pediatric Colorectal Congress," Vienna, December 2019; and the "Pediatric Colorectal Course," Nijmegen, March 2020 were asked to participate in this study. Participation in this study was voluntary and independent of the supervision and training during the workshop. According to local law and legislation, ethical board approval was waived; written informed consent was obtained from all participants. It was explained that the assessment score was for research purposes only, was not used during the course, and was processed anonymously.
The participants were divided into groups based on their experience, which were used for further subanalyses. Participants were included in the experienced group if they had performed ≥ 20 colorectal reconstructions and ≥ 5 PSARP procedures in their surgical career. Participants were included in the inexperienced group if they had performed ≤ 5 PSARP procedures and ≤ 5 colorectal reconstructions in their surgical career. The remainder of the participants formed the intermediate group, whose members had a background in pediatric surgery but were not experienced in pediatric colorectal surgery.
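The grouping rule above can be written as a small function. This is a sketch under stated assumptions: the comparison symbols were lost in the extracted text, so the thresholds are read here as "at least 20 colorectal reconstructions and at least 5 PSARP procedures" for the experienced group and "at most 5 of each" for the inexperienced group. The function name `experience_group` is hypothetical.

```python
def experience_group(colorectal_reconstructions: int, psarp_procedures: int) -> str:
    """Assign a participant to an experience group (hypothetical helper).

    The inclusive comparisons (>= and <=) are an assumption; the source
    text does not preserve the original comparison symbols.
    """
    if colorectal_reconstructions < 0 or psarp_procedures < 0:
        raise ValueError("procedure counts cannot be negative")
    if colorectal_reconstructions >= 20 and psarp_procedures >= 5:
        return "experienced"
    if colorectal_reconstructions <= 5 and psarp_procedures <= 5:
        return "inexperienced"
    # Everyone else has some experience but does not meet the expert thresholds.
    return "intermediate"
```

Checking the experienced criteria first keeps the two rules from overlapping, because a participant with exactly 5 PSARP procedures can only be classed as experienced when the count of colorectal reconstructions is also at least 20.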

Protocol
The participants of the study all completed a short questionnaire on their demographics and previous clinical experience, particularly their pediatric colorectal experience. All participants were shown an instructional video of the steps of the PSARP that should be performed on the model. Additionally, a poster was developed of the steps to be assessed to guide the trainees during the training on the model. This was also printed on the back of the CAT-PSARP form as an aid for the assessor. The participants received a short introduction on the CAT-PSARP form to acquaint them with the demands of each step of the procedure.
All participants performed one PSARP procedure on the model, while two independent objective observers assessed the component steps during their performance using the CAT-PSARP. The observers were selected based on availability from a pool of five experts who alternated to keep bias as low as possible. The CAT-PSARP forms were coded with the participant and observer number to be able to connect these to the demographics but were anonymized for the researchers. The demographic data of the participants were not available to the experts before, during, or after the assessment to avoid bias in the scoring of the performance. All data were processed anonymously and all participants provided approval for inclusion in the study.

Statistical Analysis
All statistical analyses were performed with IBM SPSS Statistics v.25. The interobserver agreements for the component steps, as well as for the general and total scores between the two observers, were assessed using the intraclass correlation coefficient (ICC), with a two-way random effects model, at a two-tailed significance level of p < 0.05. An ICC value of < 0.4 was considered a poor agreement, ≥ 0.4 and < 0.6 a fair agreement, ≥ 0.6 and < 0.8 a good agreement, and ≥ 0.8 an excellent agreement. 22 For construct validity, the scores of inexperienced, intermediate, and experienced participants were compared with a one-way analysis of variance (ANOVA). Equality of variances was assumed if the p-value of Levene's test for equality of variance was > 0.05. This process was conducted by an independent researcher who was not involved in the scoring process, using the completed CAT-PSARP forms of the objective observers. The aim was to include at least 30 participants for the interobserver reliability. 23
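To make the interobserver analysis concrete, the following Python sketch computes an ICC from a participants-by-observers table of scores and maps it onto the agreement categories used here. It is not the authors' SPSS procedure. The text specifies a two-way random effects model but not whether single or average measures, or absolute agreement or consistency, were used, so the sketch assumes the single-rater, absolute-agreement form, often written ICC(2,1). The function names are hypothetical, and the lower bound of each category is assumed to be inclusive.

```python
from typing import Sequence


def icc_two_way_random(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per participant and one column per observer.
    """
    n = len(ratings)                  # number of participants
    k = len(ratings[0]) if n else 0   # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


def agreement_label(icc: float) -> str:
    """Map an ICC value onto the categories used in this study."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc < 0.8:
        return "good"
    return "excellent"
```

For example, `agreement_label(icc_two_way_random(scores))` would label a table `scores` in which each row holds the two observers' totals for one participant. Choosing a different ICC form (for instance average measures) would change the numerical value, so results from this sketch are not expected to reproduce the reported coefficients exactly.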

Results
A total of 82 participants completed the PSARP on the PediaTrickBoxx ARM model and were scored independently by two objective observers (observers A and B). The largest group (48%) were pediatric surgeons, 33% were surgical residents, and 15% were fellows of pediatric surgery. The male-to-female ratio was balanced, the mean age was 36 years, and the majority (76%) of participants were European, including 29% Dutch and 18% German participants (►Table 1). Of the total group, 41% had never performed a colorectal reconstruction and 52% had never performed a PSARP procedure. In contrast, five participants had performed > 50 colorectal reconstructions and two participants had performed > 50 PSARP procedures. The detailed pediatric colorectal experience is shown in ►Table 2.

Interobserver Reliability
When looking at the outcomes of the component steps separately (►Table 3), the ICC revealed a poor interobserver agreement for "opening in the midline" (ICC = 0.289) and "dissection of the fistula" (ICC = 0.148). A fair interobserver agreement was found for "placing traction sutures on the fistula" (ICC = 0.557). A good agreement was found for the steps "closure of the perineal body and sphincter complex" (ICC = 0.602) and "anoplasty" (ICC = 0.605). Combining the results of the component steps resulted in a good interobserver agreement with an ICC of 0.646 (p < 0.001).
When focusing on the general skills rated on the CAT-PSARP, these showed a fair interobserver agreement (instrument handling, ICC = 0.475 and tissue handling, ICC = 0.574), with also a fair interobserver agreement for the "total score of general skills" (ICC = 0.523, p < 0.001). The evaluation of the CAT-PSARP is aimed at the total score of all items on the form, which demonstrated a good interobserver agreement with an ICC of 0.669 (p < 0.001) as shown in ►Table 3.
Validation of a Newly Developed Competency Assessment Tool Joosten et al.

Construct Validity
To evaluate the construct validity of this assessment form, the results of the three expertise groups (inexperienced, intermediate, and experienced) were compared. The scores of the three groups were significantly different for all items (p = 0.038 to p < 0.001) as shown in ►Table 4. A subsequent post hoc analysis was performed to evaluate these differences. As seen in ►Table 5 and ►Fig. 3, the total score, overall score of the component steps, and general skills were significantly different between all three groups. The experienced group scored significantly higher than the inexperienced group for all separate items except for step 5 "anoplasty" (p = 0.054). The experienced group also scored significantly higher than the intermediate group, who had some pediatric colorectal experience but still did not perform as well as the experienced participants, with significant differences in the overall scores but also in the separate general skills (p < 0.001) and component steps "dissection of the fistula" (p < 0.001) and "anoplasty

Discussion
This CAT-PSARP has been shown to be a suitable scoring tool for the PSARP procedure in a simulation setting with a good overall interobserver reliability. The majority of component steps showed a good interobserver agreement, as well as the general performance skills. By comparing the experienced with the inexperienced participants, significant differences in performance were found, establishing construct validity of this assessment tool for use on a simulation model. The differences in the performance between experienced and inexperienced participants were three points on both the specific and the general skills, with a total difference of six points on average. This indicates a possibility for training toward a score goal of 30 (out of a possible total of 36). By adding the intermediate group, which was the real target group for assessments, we showed that small differences in performances could also be assessed with this CAT-PSARP form. There were significant differences between all three expertise groups in total scores of the form, which indicates that this is a potent tool for the assessment of pediatric surgery trainees throughout their training on simulation models. A poor interobserver agreement was found for the steps "opening in midline" and "dissection of the fistula"; however, this could be related to the difficulty in simulating these steps on this low-budget PediaTrickBoxx ARM simulation model. Currently, this is the only validated, inanimate ARM model available. Possibly, a high-fidelity model would be able to simulate the difficulty of these steps even better, leading to a higher interobserver agreement. However, the increased cost might limit its usability in clinical practice. The vast majority of the participants scored the maximum of four points on both of these component steps because they were fairly easy in comparison with the real procedure. 
In the clinical setting, the distribution of the scores for these steps will likely differ more among the trainees, which may result in an improvement of the intraclass correlation. 24 Overall, this CAT-PSARP had slightly lower interobserver reliability scores compared with the other previously developed CATs. 13,[15][16][17] There are several possible explanations for this. First, this is a CAT for open surgery and assessments are done in real time. In contrast, when assessing minimally invasive surgery (MIS), it is possible to use a video of the performance for the assessment, which allows a more meticulous assessment by pausing or replaying the video. In addition, this newly developed CAT-PSARP was used for assessment of all participants of the course at the same time, up to a maximum of 10 participants simultaneously. If the CAT-PSARP is used in the clinical setting, this could be an advantage because the supervisor (who has seen the whole procedure) can complete the CAT-PSARP if the trainee has done (part of) the procedure. Further research will evaluate whether implementation of the CAT-PSARP in the clinical setting is feasible and whether it ultimately results in an improved performance of pediatric surgical trainees over time. Second, based on the consensus of pediatric colorectal experts, this CAT-PSARP focuses on the component steps of the PSARP, and tissue and instrument handling are scored separately. This is in contrast to previous CATs in which each step was only scored on tissue and instrument handling. This is because it is believed that the task-specific skills and incorporation of a structured approach in the PSARP procedure are as important in performing a technically proficient PSARP as perfect instrument and tissue handling alone. Therefore, general skills and task-specific skills were weighted almost equally in the total score of overall performance. 
It has previously been suggested that procedure-specific assessments are more useful for trainees than a general assessment. Ahmed et al. stated that the combination of global skills (e.g., tissue and instrument handling) and task-specific skills (such as the steps of the PSARP) may provide more comprehensive and concise feedback to the trainee than a general scoring system that evaluates general skills only. 7

Limitations
The assessment was done while participants practiced the PSARP procedure on a simulation model. The next step is to validate this assessment tool in the clinical setting as well. In addition, the component steps were defined after discussion with experts in the field. However, there may be other aspects of the repair of an ARM with another type of approach (such as an anterior-sagittal anorectoplasty [ASARP]) that were not addressed in this CAT. The two independent objective observers were selected from a group of five objective observers to minimize risk of bias; however, this could not be completely eliminated. All received the same brief explanation on how to use the CAT-PSARP shortly before starting the assessment. This design limits the risk of bias and is true to the clinical setting where trainees are assessed by different supervisors as well. 25

Conclusion
The CAT-PSARP showed a good overall interobserver agreement, as well as a good construct validity, for use on a simulation model, which makes it a potent assessment tool. The combination of assessment of both specific component tasks of the procedure and general skills, including tissue and instrument handling, proved to be the most relevant in the assessment of the repair of an ARM by posterior sagittal approach.

Ethics Approval and Consent to Participate
Written informed consent was obtained from all participants. According to national law and legislation, ethical board approval of the ethics committee of Arnhem and Nijmegen was deemed unnecessary. Approval of the ethics committee of the institution Radboudumc was waived according to national regulations. 26 The study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Availability of Data and Materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.