Assessment requires validity evidence that affirms academic standards, gives meaning to test scores, and informs high-stakes decisions for advancing students through a curriculum.3,5,6,7,8 In our study, “consequential validity” was defined in terms of the binary outcome, Academic Success. This outcome and our method for investigating validity are based on the work of Dunleavy et al., who defined academic success in terms of “unimpeded progress” and investigated predictive validity with binary logistic regression and ROC-curve analyses.4 In this way of thinking about validity: (1) the probabilities for achieving Academic Success were predicated, in large part, on faculty-set passing standards; and (2) AUC indicated goodness of fit and thus the quality of consequential validity evidence. Per our study’s method, AUC findings indicated adequate goodness of fit in our predictive models. Moreover, rates of classification accuracy affirmed the consequential outcomes of students progressing through the curriculum unimpeded and passing Step 1 on their first attempt.
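For readers who wish to see the general shape of such an analysis, the minimal sketch below (Python with scikit-learn) illustrates binary logistic regression with AUC as a goodness-of-fit index and classification accuracy for a binary Academic Success outcome. The simulated scores and variable names are illustrative assumptions only, not the study’s data or code.

```python
# Illustrative sketch only: logistic regression predicting a binary Academic
# Success outcome, with AUC and classification accuracy at a 0.5 threshold.
# The simulated scores and variable names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)

# Hypothetical percent-correct exam scores and a 0/1 Academic Success flag.
exam_scores = rng.uniform(55, 95, size=(300, 1))
academic_success = (exam_scores[:, 0] + rng.normal(0, 8, 300) > 70).astype(int)

model = LogisticRegression().fit(exam_scores, academic_success)

# Predicted probabilities of success, AUC, and overall classification accuracy.
probs = model.predict_proba(exam_scores)[:, 1]
auc = roc_auc_score(academic_success, probs)
tn, fp, fn, tp = confusion_matrix(academic_success, (probs >= 0.5).astype(int)).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"AUC = {auc:.2f}, classification accuracy = {accuracy:.2f}")
```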
Significantly, validity ultimately concerns both the consequences of test use and the accuracy of inferences made about psychological constructs.4,8,14 Thus, in addition to validating standard-setting outcomes, our study’s results provided validity evidence for the foundational psychological construct that defines the Yes-No Angoff procedure, namely “borderline competence.”1,2 Accordingly, our results provided consequential validity evidence that most students had (or had not) developed the necessary competence in biomedical science. However, our models were not always helpful in predicting the likelihood that students who failed truly lacked the necessary competence in medical knowledge. Our school anticipated this limitation when it adopted the Yes-No Angoff procedure and therefore developed a protocol for adjusting faculty standards with an approach suggested by Camara et al.15 As such, we addressed the possible classification error of unwarranted impeded progress by adjusting cut-off scores downward according to our calculations of the standard error of measurement (SEM).15,16,17 Concurrently, we adjusted faculty standards upward per the SEM to address the other kind of classification error: that of passing students who lack the necessary competence. For this kind of classification error, upward adjustments also helped us identify students who needed academic support. Ultimately, we see our application of the SEM as a sound standard-setting protocol that mitigates undesirable consequences while supporting the appropriate use of academic standards.8
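For context, the SEM is conventionally computed as the score standard deviation multiplied by the square root of one minus the test’s reliability. The short sketch below illustrates how a faculty-set cut score could be adjusted downward and upward by one SEM; the cut score, standard deviation, and reliability values are hypothetical and are not drawn from the study.

```python
# Illustrative sketch with hypothetical values: adjusting a faculty-set cut
# score by one standard error of measurement (SEM) in each direction.
import math

cut_score = 70.0    # hypothetical faculty-set passing standard (percent correct)
score_sd = 8.0      # hypothetical standard deviation of exam scores
reliability = 0.85  # hypothetical internal-consistency reliability (e.g., KR-20)

# Classical test theory: SEM = SD * sqrt(1 - reliability)
sem = score_sd * math.sqrt(1 - reliability)

lower_cut = cut_score - sem  # guards against impeding a truly competent student
upper_cut = cut_score + sem  # flags students who may need academic support
print(f"SEM = {sem:.1f}; pass/fail cut = {lower_cut:.1f}; support threshold = {upper_cut:.1f}")
```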
It should be noted that the borderline student is not an honors student or even an average student, but neither is the borderline student a failing student. The borderline competent student may struggle to pass at a given point in the curriculum but, having passed, will have demonstrated the competence required for advancing through the pre-clinical curriculum. Faculty development aimed at honing expert raters’ shared understanding of borderline competence is foundational for supporting accurate inferences.18,19 Moreover, we hold that validation of passing standards should be continuous in order to reflect the dynamic nature of testing with locally developed assessments. Accordingly, we advocate for annual reviews of predictive models and classification statistics that affirm the competence of high achievers but also help schools of medicine identify students who need additional academic support. As regards our school’s dynamic application of the Yes-No Angoff method across multiple cohorts, we observed an important added value for teaching and learning: the systematic review of exams, item by item, was a necessary task fulfilled by multiple faculty colleagues. In reviewing items, faculty provided feedback on the accuracy of test items, their formatting and relevance to medical practice, and their alignment with curriculum goals and objectives. These reviews supported the quality improvement of MCQ items developed for new knowledge content in our pre-clinical curriculum.
Certainly, our cohorts included a number of students who were not coded for Academic Success according to our study’s stringent rules for defining the binary outcome. And yet, most of our students who failed to achieve so-called Academic Success did in fact achieve ultimate success after remediation in the pre-clinical curriculum or after passing Step 1 on a second attempt. We would note that all of our matriculated students have gone through a rigorous selection process and that our school’s curriculum is stepwise, such that faculty do not expect students to achieve full expertise all at once. It should also be noted that the typical medical school incorporates six competencies into its curriculum, not just medical knowledge. Thus, an MD graduate must also demonstrate competency in interpersonal and communication skills, professional behavior, patient care, systems-based practice, and practice-based learning and improvement.20 With this broader view of the knowledge, skills, and attitudes students need for a successful professional career, the importance of consequential validity evidence comes into sharper focus as schools of medicine assess more than a single competency. Ultimately, validity studies for all competencies are required for affirming students’ probable success in graduate medical education and their readiness for the entry-level responsibilities of first-year residents.