The present study was carried out in the Department of ENT at the Jawaharlal Institute of Postgraduate Medical Education and Research (JIPMER), Puducherry, India, a tertiary care teaching hospital, to develop script concordance testing to foster clinical reasoning among undergraduate medical students. The Institutional Ethics Committee of JIPMER approved the study (JIP/IEC/2014/9/460), and all methods were carried out in accordance with the guidelines and regulations of JIPMER. The methodology followed an evolving sequence: constructing the SCT, administering it to the panel members, and analysing the panel's response patterns and consensus indices, on the basis of which the final SCT items were chosen for administration to the students. Item-total correlation and Cronbach's alpha were calculated from the students' scores.
Construction of the Scripts
The study used the guidelines put forth by Fournier et al. (6) for the construction of script concordance tests. The case scenarios were written by two ENT specialists, each with more than five years' experience in ENT practice and undergraduate teaching. The authors designed an SCT comprising 26 clinical scripts, each with three to six items, for a total of 98 items. The developed 98-item SCT is attached as Supplementary File 1. The items were set at the standard of the undergraduate curriculum. Each script was designed to reflect the common ENT conditions undergraduate students encounter and learn during their clinical postings. The scripts covered rhinology, otology, and laryngology, and were developed to promote clinical reasoning in the domains of diagnosis, investigation, and management for an undergraduate student.
Construction of the Expert Panel
To achieve the highest possible reliability for our study, we set up a panel of 20 members (9). Each panel member was a certified ENT specialist with more than five years' work experience who had undergone undergraduate curriculum pedagogy training. The SCT was mailed to the panel members, who took it independently, and their responses were recorded. After all the panel members' responses were collected, the number of panel members marking each response was aggregated for every item, as shown in Table 1. In this way, the aggregated responses were recorded for all 98 items.
Construction of the Scoring Grid
From the panel's responses, a credit score was calculated for each item corresponding to the proportion of panel members who chose the same response. The credit scores ranged from 0 to 1. The maximal score of 1 was given to the response chosen by the largest number of panel members (the modal response). A partial credit score was calculated for every nonmodal response: the number of panel members choosing that response was divided by the number who gave the modal response. This method of aggregate scoring has better construct validity than consensus scoring (10), as well as better reliability and validity coefficients (11).
For example, on item number 46 in Table 1, 14 members chose "+1" on the Likert scale, making "+1" the modal response; it therefore receives a score of 1 (14/14) (Table 2). Only three members chose "-1", so that response gets a partial credit score of 0.21 (3/14). Similarly, the response "+2" gets a partial credit score of 0.14 (2/14), and so on.
In this way, the credit scores for all 98 items in the 26 clinical scripts were tabulated to create the scoring grid (Table 3).
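The aggregate scoring for a single item can be sketched as follows (a minimal Python sketch; the counts reproduce the worked example for item 46, with the one panel response not enumerated in the text omitted):

```python
from collections import Counter

def scoring_grid(panel_responses):
    """Build the aggregate-scoring credits for one item from the
    panel's Likert responses. Each option's credit is its count
    divided by the modal count, so the modal response scores 1."""
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return {option: round(n / modal_count, 2)
            for option, n in counts.items()}

# Item 46 from the text: 14 experts chose +1, 3 chose -1, 2 chose +2.
grid = scoring_grid([+1] * 14 + [-1] * 3 + [+2] * 2)
print(grid)  # {1: 1.0, -1: 0.21, 2: 0.14}
```

Repeating this over every item yields the full scoring grid of Table 3.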
Panel Response Patterns
For each item, the pattern of expert panel responses was analysed. Wan et al. (3) classified panel response patterns into four types: ideal, uniform, bimodal, and outlier responses. When we attempted to classify our panel responses in a similar manner, we found an additional pattern, which we labelled the partial ideal response, elaborated below (Fig. 1). For some questions, the panel members were split between the extremes of the available options; we called this a bimodal response (Fig. 1a). When the members were spread equally across all five options, the item was classified as a uniform divergence response (Fig. 1b). A discrete outlier response (Fig. 1c) was labelled when one or more responses lay beyond an option that no member had chosen (a nil response). The ideal response pattern meant close convergence, with variation limited to three or fewer adjacent options (Fig. 1d). We noticed a fifth pattern of relatively close convergence with variation spanning four options, which we labelled the partial ideal response pattern (Fig. 1e).
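These five patterns can be approximated programmatically from an item's per-option counts. The sketch below is our own heuristic reading of the definitions above, not a published algorithm; the specific rules (a split confined to the two extremes, a spread of three versus four options) are assumptions:

```python
def classify_pattern(counts):
    """Classify one item's panel response pattern from the per-option
    counts, ordered along the Likert scale (e.g. -2 .. +2). The rules
    are a heuristic reading of the five patterns, not a published
    algorithm."""
    used = [i for i, c in enumerate(counts) if c > 0]
    spread = used[-1] - used[0] + 1
    # Bimodal: members split between the two extremes only.
    if counts[0] > 0 and counts[-1] > 0 and sum(counts[1:-1]) == 0:
        return "bimodal"
    # Uniform divergence: an equal spread across all options.
    if len(used) == len(counts) and len(set(counts)) == 1:
        return "uniform"
    # Discrete outlier: a response beyond a nil (unchosen) option.
    if any(counts[i] == 0 for i in range(used[0], used[-1] + 1)):
        return "outlier"
    # Ideal: convergence within three or fewer adjacent options.
    if spread <= 3:
        return "ideal"
    # Partial ideal: convergence spanning four options.
    if spread == 4:
        return "partial ideal"
    return "unclassified"
```

For example, counts of [0, 0, 3, 14, 2] across the five options would be classified as ideal, while [0, 1, 3, 14, 2] would be partial ideal.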
We eliminated the items showing uniform and bimodal patterns (as elaborated in the Results section), leaving an 82-item SCT to be administered to the students. Analysing the panel response patterns to identify the uniform and bimodal patterns was time-consuming, so we looked for a simpler tool and tested the consensus index for this purpose.
Consensus Index
The consensus index reflects the agreement among the panel members for each item, and it is calculated using the following formula,

Cns(X) = 1 + Σ pi log2(1 - |Xi - µX| / dX)

where pi is the proportion of panel members choosing option Xi, µX is the mean of item X, and dX is the width of X, dX = Xmax - Xmin.
The consensus index takes a value ranging from 0 to 100, with 0 indicating complete disagreement and 100 indicating complete agreement. As an ordinal measure of the panel members' scoring, the consensus index is argued to be superior to the mean and standard deviation (8).
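Assuming reference (8) points to the Tastle-Wierman consensus measure, which matches the definitions of µX and dX above, the index can be computed as below. The attribution and the default scale bounds (-2 to +2) are our assumptions:

```python
import math
from collections import Counter

def consensus_index(responses, scale_min=-2, scale_max=2):
    """Consensus measure on a 0-100 scale for one item's panel
    responses. Assumes the Tastle-Wierman formula:
    Cns(X) = 1 + sum(p_i * log2(1 - |X_i - mean| / d)),
    where d is the width of the Likert scale."""
    n = len(responses)
    mean = sum(responses) / n
    d = scale_max - scale_min  # dX = Xmax - Xmin
    cns = 1.0
    for x, count in Counter(responses).items():
        p = count / n  # proportion choosing option x
        cns += p * math.log2(1 - abs(x - mean) / d)
    return 100 * cns

print(consensus_index([+1] * 20))           # full agreement -> 100.0
print(consensus_index([-2] * 10 + [2] * 10))  # extreme split -> 0.0
```

A unanimous panel scores 100, while an even split between the two extremes of the scale scores 0, so a single cut-off on this index can flag the divergent items directly.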
SCT Administration to Students
Around 30 undergraduate (UG) students and 10 postgraduate (PG) students of ENT volunteered for this study. Informed consent was obtained after their role in the study was explained. The SCT was provided to the students in printed format in a pre-designated hall, and their responses were marked on an answer sheet. No time limit was set for completing the SCT; however, all participants were required to answer all 82 items.
Scoring of students
Based on the scoring grid, credits were awarded to the students for all 82 items, and each student's total score and the mean score across students for each item were calculated. Each student's total credit score was converted to a 100-point scale to yield a percentage. Mean scores were calculated separately for undergraduate and postgraduate students.
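The conversion to a percentage can be sketched as follows, assuming each item's maximum credit is 1 so that a student's maximum possible total equals the number of items:

```python
def percentage_scores(credit_matrix):
    """Convert each student's total credit into a percentage.
    Rows are students, columns are items; the maximum credit per
    item is 1, so the denominator is simply the item count."""
    n_items = len(credit_matrix[0])
    return [round(100 * sum(row) / n_items, 1) for row in credit_matrix]

# Two hypothetical students on a four-item excerpt of the grid.
print(percentage_scores([[1, 0.21, 1, 0.5],
                         [1, 1, 0.14, 0]]))
```

The UG and PG cohorts' mean scores would then be taken over the corresponding rows.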
Statistical Analysis
With the confidence interval set at 95%, the students' mean score for each item was calculated along with its standard deviation (SD). The students' scores were compared with the experts' responses using a t-test, and a p-value of less than 0.05 was taken as significant. The reliability of the test was calculated with Cronbach's alpha, and Pearson's correlation was used to ascertain the item-total correlation. IBM SPSS software (Version 17; SPSS Inc., Chicago, Illinois, USA) was used for statistical analysis.
Item-total correlation is the correlation between the score on a particular question and the collective score on all the remaining questions. In short, questions with low item-total correlations do not produce responses consistent with the remainder of the test (12). We used the item-total correlation to identify questions with low values (r < 0.05); such items were discarded, and the Cronbach's alpha reliability coefficient was recalculated after their deletion.