Rating of physiotherapy student clinical performance in a paediatric setting: is it possible to gain assessor consistency?

Background: During workplace based clinical placements, best practice in assessment dictates that students should expect consistency between assessors rating their performance. To assist clinical educators (CEs) to provide a consistent assessment of physiotherapy student performance, nine paediatric vignettes depicting various standards of student performance, as assessed by the Assessment of Physiotherapy Practice (APP), were developed. The project aimed to evaluate the consistency of physiotherapy educators assessing student competence in a paediatric setting using video vignettes alongside the APP. Methods: Thirty-six Results: Conclusions: Experienced educators demonstrate consistency in identifying a not adequate from adequate or better performance when assessing a one-off student performance using the APP. These validated video vignettes will be a valuable training tool to improve educator consistency when assessing student performance in paediatric physiotherapy.

Quality clinical education in paediatric physiotherapy is integral to the development of competent health professional graduates. In physiotherapy programs across Australia and New Zealand, students are assessed on their ability to deliver entry-level physiotherapy services to the paediatric population across a variety of clinical settings. The minimally competent standard is defined within the Physiotherapy Practice Threshold Statements (1).
However, to assess performance effectively, consistency independent of the clinical area and setting is essential to maintain grade integrity (2, 3).
Grade integrity defines a grade as "representing the quality, breadth and depth of the level of achievement a student reaches" (4). Essentially, the grade applied to a student performance should represent the actual performance, so that the grade accurately reflects the student's level of competence. Competence is an ongoing state and therefore assessment needs to include actual performance as well as the demonstrated ability to adapt to change and seek new information (5). Within the paediatric setting, physiotherapy students are assessed using the Assessment of Physiotherapy Practice (APP) (6, 7). The APP defines a rating of adequate on the global rating scale (GRS) as the minimum acceptable standard for an entry-level physiotherapist. A not adequate rating indicates that the student did not demonstrate the minimum acceptable standard, and a good or excellent rating indicates that the student performed at a level above the minimum standard. However, anecdotally, clinicians within the paediatric setting report additional challenges in interpreting the APP due to the nuances of the paediatric environment. Furthermore, there is a lack of resources and training to support assessment in this clinical setting.
Evidence supports the benefit of consensus moderation to deliver consistent, accurate and effective assessment (8). This is supported by the fact that learning resources related to assessment using the APP are available for other clinical settings. A study conducted by (9) demonstrated the potential variability that may exist in clinical assessment and how training and resources using video vignettes can augment assessment practices.
In response to a gap in the literature, the primary aim of this study was to determine the level of consistency among paediatric clinical educators when assessing a student performance using the APP. The secondary aim was to collate qualitative data on assessment decisions to aid in the development of training resources to improve student assessment in the paediatric physiotherapy setting.

Ethical approval was granted by the Human Research Ethics Committee of Queensland Children's Hospital (protocol number HREC/16/QRCH/362) and Griffith University (protocol number 2016/941) prior to the study's commencement. All methods were carried out in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants as per the ethics guidelines.

Development of the vignettes:
Three paediatric scenarios representing the core area of neurodevelopment across three age ranges (infant, toddler and adolescent) were developed and scripted into performances that Once agreement was reached for all nine scripts, each video was filmed using a standardised actor to portray the student physiotherapist. Children who portrayed patients during filming were known to the project team and consented to participate. This included one child with a known neurological condition and two children who were typically developing. Clinical staff from Queensland Children's Hospital played the roles of clinical educator and parent. During the filming of each scenario, the authors were present to direct each scene and ensure adherence to the script. Each video was, on average, 18 minutes long.

Assessment of reliability
A purposive sample of physiotherapists currently providing paediatric clinical education in Australia was invited to participate in the study. The inclusion criteria were a minimum of three years of clinical experience in paediatrics and one year of experience in the clinical education of physiotherapy students. Participants who did not meet the inclusion criteria were excluded to ensure the sample population was familiar with the assessment of student performance using the APP. Paediatric physiotherapists identified as meeting the inclusion criteria were invited to participate via email. Participants were provided with a Participant Information Sheet detailing their involvement, including the nature of the study and the total time commitment. Consent to participate followed an 'opt in' approach, with a response to the invitation email indicating consent. Consent was confirmed with participants prior to their completing the study survey (Supplementary 1). Participants included in the study were allocated to a clinical scenario in their nominated area of expertise (infant, toddler or adolescent).
Following provision of consent and group allocation, participants were emailed detailed directions for reviewing the video vignettes and completing the evaluation via Survey Monkey®. Each participant was sent a total of three video vignettes, representing 'not adequate', 'adequate' and 'good to excellent' performance, over a 12-week period. The vignettes were sent in a randomly allocated order to minimise bias. On completion of the first vignette, participants were asked to provide demographic information. For all three vignettes, participants provided a global rating based on the APP, the key factors used to determine that rating, and feedback on video quality and clinical relevance.
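The random allocation of viewing order described above can be illustrated with a minimal sketch. The study does not report how the randomisation was performed, so the function name `allocate_order`, the participant identifiers and the use of a seeded shuffle are all assumptions made for illustration only.

```python
import random

# The three performance levels depicted within each clinical scenario.
LEVELS = ["not adequate", "adequate", "good to excellent"]


def allocate_order(participant_ids, seed=None):
    """Return a dict mapping each participant to a shuffled vignette order.

    Hypothetical helper: a seeded generator is used only so that the
    allocation can be reproduced and audited.
    """
    rng = random.Random(seed)
    orders = {}
    for pid in participant_ids:
        order = list(LEVELS)  # copy so the master list is never mutated
        rng.shuffle(order)
        orders[pid] = order
    return orders


# Example with made-up participant identifiers (not study data).
orders = allocate_order(["P01", "P02", "P03"], seed=2016)
for pid, order in orders.items():
    print(pid, "->", ", ".join(order))
```

Each participant still views all three levels; only the sequence differs, so the rater cannot infer the intended standard from the order in which vignettes arrive.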
Vignettes were securely stored on Google Drive and distributed by email in the form of a closed link.
Participants were required to watch, and complete a survey for, three video vignettes in the same clinical area. A washout period of four weeks between the sending of each video was selected to make it unlikely that participants would recall specific details of the previously watched video (10, 11).
The video and survey links were closed two weeks after the initial email. After a four-week washout period, the same process was repeated for the second video, with a new video and survey link. To maximise response, a reminder email was sent to all participants one week after the initial email. Furthermore, all participants who completed the study by watching all three videos were provided a financial incentive of AUD $50.

(Table 1). Ninety-seven percent of participants reported a confidence level of 'somewhat confident' or greater, with 73% being confident or very confident in using the APP to assess student performance.

Thematic analysis:
Participants were asked to provide three to five KPIs that they used to formulate their global rating of each video. Thematic analysis highlighted consistent themes across all nine videos; these are outlined in Table 4.

Evaluation of video vignettes:
All participants either agreed (56%) or strongly agreed (44%) that the clinical scenarios they viewed were realistic and believable. Most participants (98%) reported that the clinical scenario was professional and well presented.

Discussion
The aim of the project was to undertake a reliability study for a suite of video vignettes depicting paediatric physiotherapy student performance based on the APP GRS. The study demonstrated acceptable consensus among participants at the not adequate and good to excellent levels. However, at the adequate level there was insufficient exact agreement. When ratings of adequate and good to excellent were combined, consensus again reached an acceptable level for distinguishing 'not adequate' from 'adequate or better'.
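The effect of combining the adequate and good to excellent categories can be illustrated with a short sketch. The agreement statistic and threshold used in the study are not restated in this section, so simple percent agreement with the scripted level is assumed here; the ratings below are invented purely for illustration and are not study data, and the helper names `collapse` and `percent_agreement` are hypothetical.

```python
# Hypothetical ratings, NOT study data: each list holds the global ratings
# that ten raters gave to one vignette scripted at the named level.
ratings = {
    "not adequate": ["not adequate"] * 10,
    "adequate": ["adequate"] * 6 + ["good to excellent"] * 3 + ["not adequate"],
    "good to excellent": ["good to excellent"] * 9 + ["adequate"],
}


def collapse(rating):
    """Merge 'adequate' and 'good to excellent' into 'adequate or better'."""
    return "not adequate" if rating == "not adequate" else "adequate or better"


def percent_agreement(intended, observed, transform=lambda r: r):
    """Percentage of observed ratings matching the intended (scripted) level.

    `transform` is applied to both sides, so passing `collapse` compares
    ratings on the two-category scale instead of the three-category scale.
    """
    target = transform(intended)
    matches = sum(1 for r in observed if transform(r) == target)
    return 100.0 * matches / len(observed)


for level, observed in ratings.items():
    exact = percent_agreement(level, observed)
    collapsed = percent_agreement(level, observed, collapse)
    print(f"{level}: exact {exact:.0f}%, collapsed {collapsed:.0f}%")
```

In this invented example, exact agreement for the adequate vignette is 60%, but it rises to 90% once any rating of adequate or better counts as a match. This is the kind of shift the paragraph above describes. A chance-corrected statistic could be substituted for percent agreement without changing the structure of the comparison.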
In line with previous research investigating consensus in the adult clinical setting, the study showed strong agreement in rating the not adequate and the good-excellent student performance across all three clinical scenarios (9). This is relevant for clinical practice, particularly in relation to the level of agreement when rating the not adequate videos.
Consistency in scoring students correctly using the GRS is essential for ensuring all preregistration students meet an adequate level of performance prior to entering the physiotherapy profession (7). In this study, all not adequate students were correctly identified by participants, and only one adequate performance received a not adequate rating.
There was insufficient consensus for the adequate video scenarios, which is a similar outcome to the previous adult video vignettes research (9). Student performance at both the not adequate and excellent levels is more easily recognised and rated by assessors whereas the variability in students performing at an adequate standard makes assessment at this level more difficult.

Conclusions
Experienced educators demonstrated consistency in identifying a not adequate from adequate or better performance when assessing a one-off student performance using the APP.

Availability of data and materials:
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests:
The authors declare that they have no competing interests.

Acknowledgments:
The Research Team would like to acknowledge the contribution of several individuals and organisations to this project: