Virtual Versus In-Person Surgical Fellowship Interviews and Ranking Variability: the COVID-19 Experience

Objective: To investigate the effect of interview format changes (in-person to virtual, one-to-one to multiple-to-one) necessitated by the COVID-19 travel restrictions on candidate ranking variabilities. Method: In 2018/2019, the glaucoma fellowship interviews were conducted in-person and one-to-one, whereas in 2020, interviews were virtual and multiple (interviewers)-to-one (candidate). We compared ranking ranges of interviewers within the same virtual room (WSR) and not within the same virtual room (NWSR) to assess the effect of this change on ranking variabilities. We also compared ranking categories (“accept,” “alternate,” and “pass”) agreements between in-person and virtual interviews to assess the effect of this change on ranking variabilities. Results: NWSR and WSR mean rankings differed by 1.33 (95% confidence interval difference 0.61 to 2.04, p = 0.0003), with WSR interviewers having less variability than NWSR pairs. The variability of in-person interviews and later virtual interviews showed no differences (weighted Kappa statistic 0.086 for 2018, 0.158 for 2019, and 0.101 for 2020; p < 0.05 for all years). The overall least attractive candidate has the lowest variability; the most attractive candidate has the second lowest variability. Conclusion: Grouping interviewers decreased ranking variabilities, while a change from in-person to virtual interview format did not increase the ranking variabilities.


Introduction
The surgical fellowship interview is an important component of the selection process. For the applicants, the fellowship interview allows them to assess the training environment and culture, become acquainted with the program faculty members, and present themselves in the best light to future mentors. For the training program faculties, the interview allows assessment of the applicants' maturity and self-con dence, and their ability to articulate thoughts, listen well, and ask relevant questions. [1] The interview was found to be the single most important factor used in resident selection within the eld of ophthalmology and other specialties of medicine. [2,3] Similarly, prior surveys of program directors identi ed the interview and communication skills as criteria weighed most heavily in the selection of fellows in oculoplastics, retina, glaucoma, cornea, and oculoplastics. [4,5] The interview was often the most important factor in ranking fellowship candidates in other medical specialties. [6,7] In 2020, the coronavirus disease pandemic  and related travel restrictions made in-person interviews infeasible for most ophthalmology fellowship programs. As a result, most programs adopted virtual interviews via videoconferencing, as well as other changes in format such as switching from the one-to-one interview format to multiple faculty-on-one interviews to economize the interview time periods. The impact of these changes on candidate ranking is not well understood.
In this study, we compared faculty interviewers' preliminary candidate rankings to identify the impact of the new interview formats on ranking variabilities. We hypothesized that in 2020, there may be signi cantly better agreement among interviewers who were assigned to the same virtual interview rooms compared to those that were in different rooms. Furthermore, the year 2020 agreement may be signi cantly better than the years 2018 and 2019 agreement due to limited ability to assess body language as a result of virtual interview. To test these hypotheses, we performed a comparison of ranking variabilities between different interviewer-pairs for the 2020 interviews, and compare the candidate ranking categories between 2018, 2019 and 2020 interviews.

Results
For 2018 and 2019, rankings were from 1 to 20 (21 candidates including one internal candidate interviewed each of the two years), and for 2020, rankings were from 1 to 18 (19 candidates including one internal candidate interviewed). There were 8, 9 and 11 interviewers in 2018, 2019 and 2020, respectively, 7 of which interviewed in all 3 years.
Effect of interviewers being within the same virtual rooms versus different virtual rooms on ranking variability There were 11 interviewers in 2020, which were separated into 5 different virtual interview rooms (4 rooms with 2 interviewers, 1 room with 3) and results in the permutation of 7 within-the-same-room (WSR) pairs (the 3-interviewer room has a permutation of 3 pairs) and 48 not-within-the-same-room (NWSR) pairs. The ranking range spanned between 0 (no difference in ranking range) to 17 (widest possible range), with the most common ranking difference being "2" amongst all WSR (n = 28, 22.2%) and NWSR (n = 114, 13.2%) pairs. Overall, the mean difference in the ranks between NWSR and WSR was 1.33 (95% con dence interval [CI] difference 0.61 to 2.04, p = 0.0003), with the NWSR pairs having a greater variability (greater difference in the ranks of the 2 interviewers) than the WSR pairs. This implies that WSR interviewers, on average, ranked candidates 1.33 places closer to each other than NWSR interviewers. Furthermore, when comparing the rank range of each of the 7 WSR pairs to the 48 NWSR pairs, WSR pairs ranked candidates closer to each other than NSWR pairs, although this difference is only signi cant or marginally signi cant in 3 of the 7 comparisons (Table 1).  Effect of in-person versus virtual interviews on ranking variability A 3-way comparison of variability between 2018/2019 (in-person interviews) and 2020 (virtual interviews) ranking variabilities amongst interviewers who were present at all three years showed no signi cant differences between inperson and virtual interviews (Table 3). When comparing agreement of candidates being in "accept," "alternate" and "pass" categories, the weighted Kappa statistic was 0.086 for 2018, 0.158 for 2019, and 0.101 for 2020 (p < 0.05 for all years). Since the weighted Kappa statistic for 2020 is within the 95% CI of the weighted Kappa statistic for both 2018 and 2019, the 2020 virtual interviews did not result in a greater degree of disagreement in the categorization of candidates when compared to the in-person interviews in 2018 and/or 2019 (Table 4). Mean diff. (beta) = mean difference resulted from subtracting the rank range from one year to another. Candidates with the least and greatest ranking variabilities For 2020, we assessed the variability of rank ranges for the overall most and the least attractive candidates. The overall least attractive candidate has the lowest variability (indicating a high degree of agreement amongst the interviewers), while the most attractive candidate has the second lowest variability. The greatest discordances were with the candidate ranked 8th (GLM estimated average difference = 5.93) and the candidate ranked 5th (GLM estimated average difference = 5.31). However, a candidate's overall composite ranking (1 to 18) was not signi cantly associated with the variability in the individual pair's rankings (p = 0.8597).

Discussion
The COVID-19 pandemic has signi cantly altered many processes in academic medicine, such as the oral examination for board certi cation as well as the residency and fellowship interviews for most training programs. In our institution, a change from one-to-one interviews to multiple-on-one produced less variabilities in candidate rankings amongst interviewers who shared the same virtual room. This is likely due to the decreased variabilities of the candidates' presentation due to decreased variety of questions, although cognitive biases due to groupthink and herd effects may allow the more in uential member of the interview team to overshadow dissenting opinions.
[8] If a program's goal were to allow the candidates to represent themselves in the best light, then group interviews that provide fewer opportunities for such representation would be less desirable compared to individual, one-to-one interviews.
The virtual interviews in 2020 did not result in less ranking variabilities compared to in-person interviews from 2018 and 2019 among interviewers who were not in the same virtual interview room in 2020. While the limits of videoconferencing on nonverbal body language perception may present a barrier to adequate care in telemedicine, [9,10] it does not seem to affect candidate rankings when compared to in-person interviews. This is supported by two prior surveys in which candidates to surgical fellowships felt that they were able to present themselves satisfactorily to the training program through a videoconferencing format when compared to in-person interviews. [11,12] In our clinical training program, the candidates with the least and second least variabilities were the least and most desired candidates, suggesting that individual opinions converge at the most and least quali ed applicants. However, the candidates with the most variabilities (thus with the most disagreement) were ranked 5th and 8th, which are on the cusps of being from the "alternate" to either the "accepted" or "pass" categories, and perhaps the interview committee's greatest deliberation effort would be spent on adjudicating these two positions.
There are several limitations to our study. First, we do not know whether the nding of decreased ranking variabilities of WSR interviewers would have occurred if they were in the same physical rooms. Second, our results relate to fellowship selection in one subspecialty at a single academic institution, which may limit its generalizability to other disciplines. Third, we could not gather data on the candidates' ranking variabilities of glaucoma fellowship programs to assess whether the effects we have observed applies to the candidates. Despite the lack of differences in variabilities in virtual interviews compared to-in person interviews, prior studies showed that most candidates consider the in-person interview experience to be better than virtual interview experience, and in-person interview candidates felt better acquainted with the faculty and current trainees compared to virtual interview candidates. [13] Furthermore, in-person interview candidates were also more likely to agree that the interview experience was su cient to allow them to make a ranking decision, [14] while many virtual interview candidates felt that they did not get an adequate understanding of the program. [15] In summary, in our clinical glaucoma fellowship program, a change from one-to-one interview resulted in less candidate ranking variabilities between interviewers paired in the rooms, which may decrease the candidates' opportunities to represent themselves satisfactorily. A change from in-person to virtual interviews did not result in greater candidate ranking variability, while greatest agreement occurred with the least and most attractive candidates. However, prior studies showed that candidates in general prefer in-person interviews, a factor that should be weighed substantially when deciding whether to substitute in-person fellowship interviews with virtual interviews.

Methods
The study protocol has been reviewed and was determined to be exempt by the Institutional

Overall design
We analyzed the individual candidate rankings produced by the interviewers to elucidate the impact of a change from inperson, one-to-one interviews (years 2018 and 2019) to virtual, multiple-on-one interviews (year 2020). Speci cally, we assessed the variabilities in candidate rankings in two ways: First, for the year 2020, we compared candidate ranking variabilities between interviewers who were within the same virtual interview room ("within the same room," WSR) to those who were not within the same room (NWSR). Second, we compared candidate ranking variabilities between interviewers (who had interviewed all three years 2018-2020) to see if 2020 (virtual interviews) had less variabilities compared to prior years (in-person interviews), speci cally avoided comparing interviewers who were WSR in 2020 to avoid WSR as a confounding factor.

Glaucoma fellow selection process
Each year following the application deadline, applications for the Bascom Palmer Eye Institute clinical glaucoma fellowship were downloaded from the SF Match Residency and Fellowship Match Services (www.sfmatch.com) and screened by a committee of full-time glaucoma faculty members. From this application pool, a portion of quali ed candidates were invited for an interview. In 2018 and 2019, the interviews were conducted in-person and one-to-one, whereas in 2020, the interviews were conducted virtually using a Zoom platform (Zoom Video Communications Inc., San Jose, CA) and two-on-one (faculty to applicant) or three-on-one. For each year, all interviews were completed in the same day. As the interview day progressed, each faculty interviewer was asked to individually rank the candidates from 1 (the most attractive candidate) to the highest number (the least attractive candidate). It was required that all ranking start with 1, be consecutive, and no two candidates may share the same rank. Following the interviews, these individual interviewers' preliminary candidate rankings were pooled, and a composite rank list was generated, with the candidate with the rank closest to 1 being the most attractive, and the one farthest from 1 being the least attractive. The committee then adjusted the rankings in a structured discussion. More than one discussion meeting may have occurred before a nalized rank list is submitted to SF Match.

Interview formats
There were no overt instructions to the interviewers nor structures imposed on the individual interviews except to remain within the allowed time slot in order to stay on schedule. In 2018 and 2019, the candidates were invited on-site at the Miami campus of the Bascom Palmer Eye Institute (Miami FL), and the interviews were conducted in-person in a one-toone format. The interviews took place in individual faculty members' o ces. A facilitator was used to keep time and to direct candidates from one interview room to the next. In 2020, the interviews were conducted via the internet using a virtual online meeting format. Rather than one-to-one, there were ve virtual rooms each with two or three faculty members. In all years, each interview was allotted a 15-minute time slot. Two warnings were given when there were threeand one-minute remaining. In all years, all candidates had the opportunity to interview with all participating faculty interviewers. In the 2020 virtual interviews, no two interviewers occupied the same physical space. The interviewers' candidate rankings were created individually, although there were no speci c instructions to prohibit discussion of candidate quali cations.

Internal candidates
Since the goal of the study was to assess the effect of different interview settings and formats on candidate ranking variabilities, internal candidates (Bascom Palmer Eye Institute residents and/or research fellows applying for clinical glaucoma fellowships) were removed from the database and excluded from analysis, as the interviewers had frequent interactions with them outside the glaucoma fellowship application process. Once the internal candidates were removed from the data sets, each interviewer's rankings were adjusted by collapsing the removed rank. For example, if an interviewer had ranked the internal candidate as 3, then all subsequent rankings were reduced by 1 (so the person ranked faculty member was paired with all other included faculty members (except those with which he/she was WSR in 2020), and the ranges were determined for each year. However, since a different number of candidates were interviewed each year, this resulted in different rank ranges for each year, and analyses across different years based on the range of ranks could be confounded. To avoid this problem, when comparing across different years, we translated the rankings into 3 categories: accept, alternate, and pass. Each year, four glaucoma fellows are appointed, so each interviewer's top 4 candidates were categorized as accept, the next 4 candidates were categorized as alternate, and all other candidates were categorized as pass. This categorization approach translated different rank ranges into the same 3 categories for the 3 years and thus avoids confounding factors due to different rank ranges between the 3 interview years.

Statistical methods
Categorical and ranked variables are presented with frequencies and percentages. We assessed the variability of the ranges (of the ranks for pairs of interviewers) between 2018, 2019 (in-person interviews) and 2020 (virtual interviews) using a general linear model (GLM). Similarly, for 2020 interviews, we assessed the variability of the ranges of WSR interviewer pairs to NWSR pairs using a GLM. We used weighted Kappa statistics to compare the agreement of interviewer rankings placed into the 3 categories: accept, alternate, and pass for 2018, 2019, and 2020. Similarly, we used weighted Kappa statistics to compare the agreement of interviewer rankings placed into the 3 categories: accept, alternate, and pass between the WSR pairs and the NWSR pairs for the 2020 interviews. In all inter-year comparisons, interviewer-pairs that shared a virtual interview room in 2020 were omitted to avoid the effect of sharing a virtual interview room on individual variabilities, since there had been no shared rooms in 2018 and 2019. All analyses were done using SAS version 9.4 software (SAS Institute, Cary, NC, USA). A p-value ≤ 0.05 was considered statistically signi cant, and a p-value between 0.05 and 0.15 was considered marginally signi cant. SAS does not produce precise pvalues for weighted (ordinal) Kappa statistics, nor for comparisons between two weighted Kappa statistics.
Declarations Figure 1 Candidate category ("accept," "alternate," "pass") agreements between pairs of interviewers who were within the same virtual interview room (WSR) and not within the same virtual interview room (NWSR); "differ by 1 step" denotes a difference between "accept" and "alternate" or "alternate" and "pass," whereas "differ by 2 steps" denotes a difference between "accept" and "pass."