Differential Diagnoses That Final-Year Medical Students Need to Consider: A Modified Delphi Study

Background: Contrastive learning is known to be effective in teaching medical students how to generate diagnostic hypotheses in clinical reasoning. However, there is no consensus on comprehensive lists of differential diagnoses across medical disciplines for the common symptoms that should be learned as part of the undergraduate medical curriculum. In Japan, the national model core curriculum for undergraduate medical education was revised in 2016, and lists of differential diagnoses for 37 common symptoms were introduced into the curriculum. This study aimed to validate the list items through expert consensus for use as a reference worldwide. Methods: The authors used a modified Delphi method to develop consensus among a panel of 23 expert physician-teachers in clinical reasoning from across Japan. The panel evaluated the items on a 5-point Likert scale, based on whether a disease should be hypothesized by final-year medical students considering a given symptom. They also added other diseases that should be hypothesized. A positive consensus was defined as both a 75% rate of panel agreement and a mean of 4 or higher with a standard deviation of less than 1 on the 5-point scale. The study was conducted between September 2017 and March 2018. Results: This modified Delphi study identified 275 essential and 67 supplemental items corresponding to the differential diagnoses for the 37 common symptoms that Japanese medical students should master before graduation. Conclusions: The lists developed in this study can be useful for teaching and learning how to generate initial hypotheses by encouraging students' contrastive learning. Although the lists may be specific to the Japanese context, both the lists and the validation process are generalizable to other countries for building national consensus on the content of medical education curricula.


Background
Amid the growing complexity of healthcare settings and considering the social accountability of physicians in ensuring patient safety, medical schools are required to educate and evaluate medical students to sufficiently high standards so that they become physicians who make the fewest possible diagnostic errors. This is particularly evident in the worldwide trend of incorporating the assessment of clinical reasoning skills into high-stakes examinations such as national medical licensing exams, as represented by Step 2, "Clinical Skills," of the United States Medical Licensing Examination (USMLE) [1].
Thus, greater educational support is required for students to acquire the competence to anticipate a set of differential diagnoses from the earliest phase of the diagnostic process, gather confirming and refuting information according to an initial hypothesis, select and perform the relevant history taking and physical examination, and interpret the findings to confirm or deny the initial hypothesis.
In this context, the lack of development of diagnostic hypotheses remains an issue in the teaching of clinical reasoning to medical students. For example, medical students learn how to take a patient's history without anticipating differential diagnoses, even though the literature suggests that diagnostic errors can be reduced by querying an initial hypothesis [2]. Furthermore, attempting to diagnose without generating a hypothesis may reduce students' reasoning performance [3]. Nevertheless, many medical schools have taught physical examination maneuvers in isolation, usually following a systematic "head-to-toe" approach [4]. Recent studies in cognitive load theory have described that such fragmented reasoning may lead to diagnostic errors [5].
How can teachers effectively instruct medical students in hypothesis generation? Schmidt and Mamede [6] found that there is no evidence elucidating the most effective method for teaching clinical reasoning to medical students. In the highly specialized wards of teaching hospitals, which feature increasingly shorter patient stays, patient interaction with and exposure to role model doctors in clinical clerkships may be insufficient for students to gain the competence required in clinical reasoning. This is particularly the case regarding the need to consider the full range of differential diagnoses across all medical disciplines and specialties [7]. Students usually learn specialty-specific or disease-oriented reasoning skills from specialists, who tend to generate diagnostic hypotheses focused on a particular organ system [8]. Variation among the patient cases that students experience tends to be limited, and feedback from attending doctors is opportunistic [6]. Instead, various pre-clinical curricula have been introduced to teach students clinical reasoning, in which cases are usually also limited [6]. In this context, previous research has suggested that a "comparing and contrasting" approach is effective for medical students to foster illness scripts in their minds [6,9]. Medical students without sufficient clinical experience can effectively formulate an illness script of a disease by comparing and contrasting the discriminating clinical features of competing diagnoses for a particular symptom.
Nevertheless, in the context of undergraduate medical education, no learning resource supported medical students' contrastive learning across a wide variety of signs and symptoms. As of 2017, when this study was conducted, no previous study had developed a consensus on comprehensive lists of differential diagnoses across medical disciplines for the common symptoms to be learned during the undergraduate medical curriculum. Therefore, this research aimed to develop lists of a limited number of diagnostic considerations for all the signs and symptoms that medical students should learn before graduation. The resulting lists will be beneficial for developing both case scenarios that consider a plausible set of differential diagnoses suitable for medical students' reasoning competence and assessment tools for checking differentiating information in the process of history taking and physical examination.
Although these lists can be universally applicable in the context of undergraduate medical education, they may also be specific to the local social context in which they are developed and introduced. The reason for this specificity is that various epidemiological factors and societal needs can influence what diseases medical students learn to diagnose within their countries. In Japan, the fourth version of the national core curriculum for undergraduate medical education in 2016 newly introduced lists of differential diagnoses for 37 common symptoms (Table 1) that ought to be learned as part of the six-year undergraduate curriculum [10]. An original set of lists was developed through a review of the previous literature on clinical reasoning by committee members consisting of general internists specializing in teaching clinical reasoning, followed by a revision based on public feedback. As this process possibly reflected the personal perspectives of a limited number and range of specialists with authority, we attempted to validate the lists by building a consensus among experts in clinical reasoning education through a systematic, iterative process. Thus, the research question for this study asked: what are the differential diagnoses that final-year medical students need to consider for the symptoms listed in the national guideline for undergraduate medical education?

Table 1 The 37 common symptoms in the national model core curriculum for undergraduate medical education (revised in 2016)

Methods
In this study, we utilized a modified Delphi method [11,12]. In the original Delphi method [13], an initial list to be examined is established based on feedback from experts in the first round. However, in the modified Delphi method, the initial list is mostly produced by the researchers based on a literature review, interviews with relevant professionals, or other academic methods [14]. The Delphi and modified Delphi methods allow researchers to gather and achieve consensus on the opinions of experts through a series of structured questionnaires, which are conducted anonymously to avoid the influence of authority among the experts [13]. This is considered one of the strengths of these methods, particularly in East-Asian countries such as Japan, where hierarchical social relationships tend to have a strong influence on the stakeholders of decision-making processes [15].
Both methods have been used in a variety of studies to establish a consensus on core competencies or curricula regarding a specific topic or domain among medical specialties. For instance, Alahlafi and Burge [16] conducted a Delphi study to build a consensus on what medical students need to learn about psoriasis. Battistone, Barker, Beck, Tashjian, and Cannon [17] defined the core skills of shoulder and knee assessment using the Delphi method. Moore and Chalk [18] used the Delphi method to produce a consensus on neurological physical examination skills that should be learned by medical students.
Finally, Moercke and Eika [19] used the Delphi method to identify the required clinical skills and minimum competency levels therein during undergraduate training.

The expert panel
Although there is no consensus on the most appropriate number of experts to include in a panel for Delphi studies, previous studies have generally required at least 20 expert participants for sufficient reliability [20]. Considering the average response rate in past Delphi studies of approximately 80% [21], we aimed to recruit 25 participants. The previous literature also suggested that recruiting a variety of participants may produce higher-quality results that are more generally acceptable [15]. Thus, we used purposeful sampling to recruit participants from different areas across Japan and different types of institutions, ranging from community hospitals to university hospitals. The four authors (MI, JO, MK, HN), who were general internists specializing in teaching clinical reasoning, produced a list of 23 candidates for discussion. YM, who was a physician-researcher on clinical reasoning, contacted all the potential participants via email to confirm their interest and obtain their agreement to participate; all the candidates agreed to join our research. Informed consent was obtained from all participants. Thus, we assembled a panel of 23 physicians (clinical faculty members of medical schools or clinical teachers in teaching hospitals) with more than ten years of experience in the practice and teaching of clinical reasoning to medical students and with an understanding of the national model core curriculum.

The initial lists
The initial lists regarding the 37 common symptoms consisted of 277 items. To make each list sufficient but as short as possible for the study, the four authors decided to incorporate only the 170 diseases designated as the minimum requirement in the guidelines for the Japanese national licensing examination [22] into the lists. The questionnaire, edited by YM, was piloted by two members of the research team (HN and MK). As previous studies, including Hasson et al. [13], suggested two or three rounds for Delphi studies, we designed our study using a two-round modified Delphi method, with an optional additional third round for re-evaluating certain items, an option that we ended up using. In all rounds, questionnaires with the lists to be evaluated were provided to the participants via a web-based survey system (Google Forms: available at https://www.google.com/intl/ja/forms/about/).
In Round 1, for each list of differential diagnoses for the 37 symptoms, the participants were asked to evaluate on a 5-point Likert scale (1 = absolutely needs to be excluded, 5 = absolutely needs to be included) whether each item (i.e., disease) should be hypothesized by final-year medical students when diagnosing a patient with the given symptom. The participants were also asked to add any other diseases they considered relevant to include in the list. Their responses were anonymously collected and analyzed against a predefined standard for positive consensus. Based on our literature review, which included research from Dielissen et al. [21] and Heiko [23], our standard was 1) a mean score of 4 or higher, 2) a standard deviation of less than 1, and 3) 75% or more of the experts scoring 4 or 5 (i.e., 75% agreement). The participants were given two weeks to evaluate the lists in each round. To keep the response rates sufficiently high, the participants received reminder e-mails a few days before each deadline. Two researchers (YM and HN) analyzed the primary data and discussed the results with the other researchers. Among all the additional items suggested by the panel, only the diseases listed as minimum requirements for the national licensing exam were incorporated into the revised list for Round 2.
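The three-part consensus standard can be expressed as a simple computation over each item's ratings. The sketch below is only an illustration of the rule as stated, not the authors' actual analysis code; the function name and the use of the population standard deviation are our assumptions, since the paper does not specify how the SD was computed.

```python
def meets_consensus(ratings, mean_cutoff=4.0, sd_cutoff=1.0, agreement_cutoff=0.75):
    """Check an item's 5-point Likert ratings against the positive-consensus
    standard: mean >= 4, SD < 1, and >= 75% of experts scoring 4 or 5.

    Illustrative only; assumes the population SD (the paper does not specify).
    """
    n = len(ratings)
    mean = sum(ratings) / n
    sd = (sum((r - mean) ** 2 for r in ratings) / n) ** 0.5
    agreement = sum(1 for r in ratings if r >= 4) / n
    return mean >= mean_cutoff and sd < sd_cutoff and agreement >= agreement_cutoff


# A hypothetical panel of eight raters:
print(meets_consensus([5, 5, 4, 4, 5, 4, 4, 5]))  # mean 4.5, SD 0.5, 100% agreement -> True
print(meets_consensus([5, 5, 1, 1, 3, 3, 2, 4]))  # mean 3.0 fails the first criterion -> False
```

Because all three criteria must hold simultaneously, an item with a high mean can still be excluded if the panel is polarized (large SD) or if fewer than 75% of experts rate it 4 or 5.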
The second version of the list, to be evaluated in Round 2, consisted of 1) diseases from the original lists that met our positive consensus standard and 2) diseases added by the panelists that were part of the minimum requirements for the national licensing exam. Additionally, the panel members re-evaluated diseases from the initial lists that did not meet the standard to ensure the appropriateness of their exclusion.

Results
The participating experts, who had 11 years or more (average 21 years) of clinical expertise and more than 5 years (average 13 years) of teaching expertise, were all male and had a variety of backgrounds, including general medicine/general internal medicine (n = 20), family medicine (n = 3), and internal medicine (n = 2); eight of them had more than two specialties (Table 2). In the first round, 47 items were eliminated from the initial lists (of 277 items) according to the pre-set standard. Among the 428 items that the study participants additionally suggested, 185 items that were also part of the minimum requirements for the national licensing examination were included in the revised list for Round 2 (Fig. 1).
After analyzing the feedback our panel members gave in Round 2, we examined the face validity of all the lists on which consensus had been reached. This third version consisted of 13 items on average per symptom (and a median of 7 items) as essential differential diagnoses for final-year medical students. In total, the final lists comprised 275 items; 187 items, including 79 items from the initial lists, were eliminated in the two rounds of the study.
In their free comments, some of the experts questioned the validity of the minimum requirement for the national licensing exam being used as part of the standard. We decided to conduct one additional round for our panelists to evaluate the diseases they suggested in Round 1 that were not part of the minimum requirements for the national licensing exam but were included elsewhere in the guideline. Among the 257 items that were added by the experts in Round 1, 89 were eliminated because they were not identified in the guidelines. Among the 167 items included in the guideline, 67 items that met the consensus standard were incorporated into supplemental lists.
The final lists of essential differential diagnoses and the supplemental lists are available as Additional file 1. As an example, the list of essential differential diagnoses for the symptom of chest pain, as well as the supplemental list, is illustrated in Table 3. Of the 23 participants who took part in the study, 22 completed the first and second rounds (96%), and 20 completed the additional round (87%).

Table 3 An example list: "Chest pain"

Discussion
This modified Delphi study identified 275 essential and 67 supplemental items as the differential diagnoses for 37 common symptoms that Japanese medical students should master before graduation.
The lists developed in the study can be useful for teaching and learning how to generate initial hypotheses because, as Bowen and ten Cate [9] suggest, a preselection of differential diagnoses to consider is crucial for novice learners acquiring this competence. This method could also be transferable to other countries with a similarly growing emphasis on patient safety and shortening hospital stays [25]. Moreover, both the lists and the method can be useful in other countries when developing a national consensus on lists of symptom-specific differential diagnoses across all medical disciplines.
The knowledge and skills regarding essential diseases for final-year medical undergraduates should be equivalent to those of new graduates. In the move toward competency-based medical education, Japan has been reforming competency-based postgraduate programs per the competencies developed during undergraduate medical education [26]. Thus, this research is significant because clinical faculty members and clinical teachers with experience in teaching both medical students and residents participated in the study. The results of our study can contribute to further revision of the national guideline for undergraduate medical education as well as stronger continuity between undergraduate and postgraduate education.
One of the study's strengths is the high response rate across all three rounds of the Delphi process. According to Cantrill, Sibbald, and Buetow [27], response rates can influence the validity of the derived consensus. The high response rate may also indicate the perceived importance of this study to our panelists. Considering the limited reduction in respondent numbers after every round, we would argue that the results are valid. The validity of our study was also enhanced by ensuring a representative panel of experts recruited from a variety of institutions, from community hospitals to university hospitals, across Japan.
This study also had some limitations. First, we limited the differential diagnoses to those listed in the minimum requirements of the guideline for the national licensing examination and adopted a categorical representation of diseases according to the guideline. For example, when the panelists added "renal failure," this was converted into "acute kidney disease" and "chronic kidney failure" as indicated in the guideline, and both were evaluated independently in the following rounds. Although the authors were aware of the possibility that this might eliminate some diseases that should be actively considered by final-year medical students, we opted to make the lists sufficiently compact with a low variance in terminology to avoid "curriculum hypertrophy" [28]. To reduce the possible influence of this limitation, we designed an additional round to evaluate items outside of the minimum essentials. While this study attempted to produce a consensus on the minimum requirements for final-year medical students, it does not imply that the generated lists are thorough or complete.
Second, some of the symptoms and their classifications might have confused our panelists. For instance, among the 37 common symptoms considered in the model core curriculum, two different clinical conditions were combined into one condition several times, such as "Hematuria" and "Proteinuria," "Anxiety" and "Depression," and "Disturbance of consciousness" and "Syncope." Some of the experts claimed that these were clinically distinct due to different but partly overlapping working diagnoses. Moreover, the panelists pointed out that the relationship between some independent symptoms among the 37 was ambiguous, such as "Vertigo" and "Syncope." They also reported a few irregular cases in which the concept of "clinical reasoning" did not seem to apply, such as cases of "Trauma" and "Burns." These notes are crucial for future improvement of both the core medical curriculum and the guidelines for the national licensing examination. Finally, initial differential diagnoses for a symptom can be dependent on a patient's sex and age, neither of which was considered in the lists generated in this study. We would advise future users of the lists to consider this background information.

Conclusions
In conclusion, our study can contribute to promoting the integration of fragmented clinical reasoning teaching processes by encouraging students' contrastive learning as well as ensuring greater continuity between undergraduate and postgraduate medical education in the Japanese setting. This is also important for other countries where the ever-expanding competencies required of medical students exacerbate the fragmentation of medical education [5]. Furthermore, as the use of artificial intelligence (AI) in the teaching of clinical reasoning has been increasing [29], the results of this study may be useful as a guideline regarding which diagnoses should be considered in machine learning. Similarly, the findings can also be beneficial in the development of clinical decision support systems. The method and its potential impact on future AI-assisted medical education could apply to other countries. For our next step, we aim to develop a list of which medical history and physical examination elements final-year medical students are required to gather and master when considering the differential diagnoses on the lists resulting from this study.

Declarations
Ethics approval and consent to participate The study was performed in accordance with the Declaration of Helsinki; the protocol was reviewed and approved by the institutional review board of the Kyoto University Graduate School of Medicine (R0481, 6/9/2017).

Consent for publication Not applicable
Availability of data and materials The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.
The questionnaire, edited by YM, was piloted by HN and MK. YM and HN analyzed the primary data and discussed the results with the other researchers. All authors read and approved the final manuscript.

Figure 1 Numbers of items in the three rounds

Supplementary Files
This is a list of supplementary files associated with this preprint.