3.1. Baseline demographics of the participants
All seven developers (100%) participated in all surveys and meetings. All were faculty members with 3–16 years of clinical experience (mean: 10.8 years) and 2–11 years of teaching experience (mean: 6 years; Table 1).
3.2. Self-assessment of confidence in developing OSCE cases
The developers reported that, over the course of OSCE case development, their competence in all six items (“case selection,” “situation guide,” “post-encounter note,” “score,” “scenario,” and “case creation”) improved relative to before development (Figure 1). Across the four self-competence evaluations, the average score was 18.857 points in the 1st survey, 21.429 in the 2nd, 23 in the 3rd, and 23.429 in the 4th. These results indicate that the developers’ self-assessed competence grew as the development process was repeated.
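The trend in these averages can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the four reported means, with the first read as 18.857, and the seven developers from Section 3.1. The variable names are hypothetical, and the implied totals are a consistency check on the reported means rather than data from the study.

```python
# Reported mean self-competence scores for surveys 1 to 4.
means = [18.857, 21.429, 23.0, 23.429]
n_developers = 7  # from Section 3.1

# Gain between consecutive surveys and over the whole process.
gains = [round(later - earlier, 3) for earlier, later in zip(means, means[1:])]
total_gain = round(means[-1] - means[0], 3)

# Assumption: each mean is a sum over seven developers divided by seven,
# so mean * 7 should land very close to a whole number.
implied_totals = [round(m * n_developers) for m in means]
recomputed_means = [round(t / n_developers, 3) for t in implied_totals]

print("gains between surveys:", gains)
print("total gain:", total_gain)
print("implied totals:", implied_totals)
print("recomputed means:", recomputed_means)
```

Under these assumptions the gain shrinks from survey to survey, which fits a picture of competence rising quickly at first and then levelling off.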
3.3. Challenges and solutions
Through an analysis of the meeting minutes, we identified the difficulties and problems the developers experienced and subcategorized them into challenges (Table 2). Based on the results of the analysis, 18 subcategories were extracted under 7 main themes.
3.3.1. Case
Most of the developers expressed that deciding the criteria to use to select a sample case was difficult. The main contents were as follows.
One participant said the following at the 2nd meeting:
“I am not sure what kind of criteria I should use when I choose a patient case. Given this is an educational module, I’m curious whether an extreme case is preferable over a common patient case. I am not sure which is the more important between OSCE development itself and the procedure of development.” (D1)
As a result of the meetings and feedback from medical education experts, the methods of solving this kind of challenge were classified into three subcategories, which were that the cases should be “common, typical, or clinically important,” “diverse in age and gender,” and “related to CPG.”
The medical education specialist’s feedback (S1):
“By simply changing the age and gender of the patient in each module, students can feel like it is a new case. And, because the purpose of OSCE is for evaluation rather than education, it is preferable to choose a case of a patient we see frequently in clinical practice.” (S1)
3.3.2. Test situation
The developers thought that it was very unnatural to present the results of oriental medicine examinations on a card in a test situation. For example, after the student performed a tongue and pulse examination on an SP, the result (for example, the tongue is heavily coated, the pulse is weak) was presented by the SP on a card.
“I’m concerned about how many times I should use the diagnosis card for writing tongue or pulse diagnosis results. I don’t think it’s a good idea to use it too much because it seems to be apart from reality.” (S1)
“I have thought about the problem a lot for a long time. As a result, I believe that it is best to use a model. For example, in the case of tongue diagnosis, it can be evaluated by making an artificial model with a photo attached to a face. However, this is difficult to apply because there is a problem with how realistically the photo resolution can be printed.” (S2)
Furthermore, according to the developers’ experiences, some necessary information is frequently missing in test situations. They believed that this problem should be solved. The following is an example:
“It appears that the SP must provide sufficient information about the case to the students in the test situation. For example, in the case of a 55-year-old woman diagnosed with cancer, information about medical history, previous diagnoses with cancer, or kinds of prescription should be presented to the student in advance…” (S1)
3.3.3. Post-encounter note
The developers had difficulty presenting clear scoring criteria for the post-encounter note. Following a review by the medical education experts and SP trainer, the OSCE module was completed by the developers. It was determined that a common standard was required because each module was written based on a different standard. In addition, the expert and trainer suggested that the post-encounter note question and the score for each question should be adjusted based on the actual test situation. The following are some sample comments:
“In an OSCE situation, we should discuss the scoring criteria more, and in this project, there were no common scoring criteria. For example, some developers set the scoring standard as ‘2 points for including the first correct answer and 1 point for including the second correct answer,’ while others did not.”
They seemed to have difficulty suggesting scoring criteria in the post-encounter note. After the medical education experts and SP trainer reviewed their development, all researchers concluded that common score criteria for the post-encounter note are needed.
“It is better to show the number of right answers in the post-encounter note. And it should be numbered in order of the priority of the questions. For example, in the case of this kind of question, ‘Write down the examination plan,’ it is much better for the scoring criteria to be written as follows: 1) First priority right answer, 2) second priority right answer, and 3) third priority right answer.”
We should settle the scoring criteria when we perform a real OSCE. In this project, at the beginning, we did not standardize the scoring criteria. For example, it should be graded as 2 if the answer included the first priority answer, 3 for the second priority answer, and so on. Also, some differences were identified in the priority among the modules.
3.3.4. Checklists
The OSCE is scored by the instructor and the SP based on the checklist. The checklist includes items such as taking the patient’s history, physical examination, patient education, and the patient-physician interaction (PPI). The developers identified the following difficulties when they created the OSCE checklists, including the scoring criteria. The specific contents were classified into four categories, as shown in Table 2.
3.3.4.1. Too many items
When the students performed the OSCE, the developers awarded points after checking whether the students asked the SP adequate questions. However, there were some problems regarding the number of checklist items, as there were too many questions in the checklist. This was caused by developing the OSCE based on the KM schema. The medical education experts proposed two subcategories for problem solutions. A representative difficulty and its solution are as follows:
“This is the problem I struggled with the most while creating the OSCE module. I had to derive both Conventional and traditional Korean medical diagnoses, so I had to add additional questions related to diagnosis, so the content was lengthy. So, I am not sure if I should omit some of the questions related to dialectics.” (D2)
“In KM clinical practice, the main difference between KM and Conventional medicine is performing its own diagnosis, so it is critical to use questions related to KM diagnosis effectively. So, it is crucial to include all kinds of items related to KM diagnosis when you develop the checklist. I hope that the sections dealing with Conventional medicine diagnosis will be reduced.” (S1)
“It is appropriate to have 10 to 15 items, focusing on the core history taking, and it’s usually around 12 items. I’d like to put in a few more questions about KM diagnosis.”
3.3.4.2. Missing critical items
At the meetings, the researchers confirmed that the history taking or examinations required for differential diagnosis had been omitted from the checklist:
“When the patient says ‘My stomach hurts,’ according to the schema, it is necessary to distinguish whether it was induced by an ulcer, stress, drug, or heart disease, but the questions to distinguish it were missing.” (S1)
3.3.4.3. Ambiguous
After the developers observed the demonstration at the meeting, they discovered various problems with the checklist items. These kinds of challenges were categorized as “ambiguous.” First, it was difficult for the scorer to evaluate whether the student was performing an adequate examination, such as inspection, which is why scoring in a real OSCE situation may be impossible.
“It is difficult to check whether the students directly observe the patient’s lips, so it is preferable to remove the observation of lip color from the checklist.” (S1)
Second, there were numerous sections where the scoring criteria were presented in an ambiguous manner. For example, the item “Perfect physical examination” would be checked with one point, but the standards for performing it were not stated.
Third, although the checklist is scored by an SP with no medical education, the scoring criteria were written in difficult-to-understand terms using medical terminology or Chinese characters. The scorers also found the criteria hard to understand because oriental and Conventional medicine terms were mixed. The following is an example presented, from the SP's point of view, by an SP trainer who participated in the demonstration:
“If you want to check something like ‘Did the student palpate the stomach area?’ please specify the location so that the SP can understand it easily.” (S3)
As a solution, it was suggested that the scoring criteria should only include items that can be observed in the actual test situation, and as far as possible, scoring criteria should be written in Korean so that scorers can easily understand them.
“If the drug name is Afatinib, please write it in Korean.” (S3)
Furthermore, some duplicated items were discovered in the checklist. In particular, developers were asked to classify the items related to patient education as presumptive diagnosis, presumptive dialectics, diagnosis plan, treatment plan, and education plan and not duplicate them.
“In case of dizziness, if the type of dizziness is important for evaluation, it should be evaluated separately as a different item. For example, how the dizziness changes with posture, whether the patient is conscious or not, and so on. When I reviewed the developers’ module, it was difficult to score because all of the items were combined into one item.” (S1)
3.3.4.4. The order of items is different from that of performance
The developers reported that it was difficult to check all of the contents quickly because the order in which students actually performed differed from the order of the checklist. As a result, there was some discussion about how the checklist should be ordered:
“The grading was difficult because the students’ questions did not appear in the order specified on the checklist. I had to score while searching for each question, and some of them were unclear, so how could I judge them?”
“So, when structuring the order of the questions, it is preferable to create a question-centric framework so that students will be likely to ask the important questions more frequently. Also, it would be better to write the physical examination in the order of head-to-toe.” (S1)
3.3.5. Scenario for SP training
3.3.5.1. Insufficient information
The medical education experts and SP trainer reviewed the scenario and discovered that information about the patient’s emotional state, situation, symptoms, social history, and overall health status was not adequately provided. In other words, it was observed that developers found it difficult to include specific information in the scenario. The following are some of the points mentioned during the meetings by the medical education experts and SP trainer:
“In terms of social history, when the student asked the patient whether he or she drinks coffee, if the SP responds that he or she no longer drinks coffee, the student should follow up with a question to confirm how many cups the patient drank before quitting coffee. This isn’t present.” (S1)
“Regarding the patient’s physical examination, please add more questions such as… ‘What motivated you to receive a health checkup?’ and ‘Is it just a regular check-up at your workplace, or did you receive a check-up because of abnormal symptoms?’” (S3)
“In the case of dizziness, in the question ‘Does the dizziness get worse if you overwork?,’ does overwork mean a mental or physical thing? Please explain more about it.” (S3)
3.3.5.2. Difficult for SPs to understand
Second, there was an opinion that the KM terminology and Chinese characters should be written in Korean so that the SPs can understand the scenario.
3.3.5.3. Concerns about differences between SPs within the same case
Third, in cases where several SPs demonstrate the same case, the demonstration should be standardized as much as possible to ensure that there is no variation between SPs. In this regard, two solutions were suggested to the developers. First, because students may ask unexpected questions while performing the OSCE, the SP needs to be given standardized behavioral guidelines that cover how to answer or act in response to a student’s question. Second, the timing of any question that the SP must ask should be clarified. Some questions from SPs arose because such guidelines were lacking:
“Should I unwrap my watch to show my hand while the student is performing the pulse diagnosis? Or should I unwrap only if the student tells me to take off my watch?” (S3)
“Should I present the examination card if only one of my hands is pulsed? Please tell me when or in which situation I should present the physical examination card.” (S3)
“Should I withhold information about the medications I’m taking unless the student specifically asks about that?”
Here is a suggested solution from a medical education expert:
“If you have a question for the SP, you should set up adequate time to ask it. If standardization is difficult, it is preferable to eliminate the question entirely.” (S1)
3.3.5.4. SPs’ dialogue includes critical contents
Fourth, there were some problems where the SP gave some information to the student in advance, even though the student did not ask them anything. So, according to the researchers, there is a need to develop scenarios with certain guidelines, as above. In addition, a medical education expert suggested removing questions such as “Doctor, aren’t you going to do a pulse diagnosis for me?” All developers modified their scenarios based on the above solutions.
3.3.5.5. The range of patients’ education is limited
We discussed the purpose of this development, that is, what developers should keep in mind when making a scenario. In previous projects on OSCE module development, there was a format category called “patient education,” but that format was not used in this project. As a result, two developers were not sure whether they should write the patient education part of the scenario in detail.
“In other CPG projects, they told the developers that the patient education item was missed, so they asked us to develop [something] for that. I think that the ‘patient education’ item should be included because the purpose of this project is to spread the CPG.” (D4)
In this regard, a medical education expert suggested the following:
“The part of ‘patient education’ should be developed focusing on following the diagnostic progress, because the diagnosis wasn’t decided yet, so we couldn’t offer any kind of education to the patient.” (S1)
Another KM education expert suggested the following solution:
“In KM education the OSCE is the beginning stage, so we need to focus on not PPI but ‘OSCE performance’ like ‘history taking’ or ‘physical examination.’ So, when I reviewed your scenario, I removed the part on ‘patient education’ from my purpose.” (S2)
“Therefore, the patient education in the scenario mostly consisted of future diagnostic progress including life management, future plans, and so on.” (S2)
3.3.6. Format
There was some discussion on the need to use a standardized format because each developer used a different format to write the OSCE module. To resolve this, the researchers agreed to prepare a standard for matching the format in common through a meeting, as follows:
(1) The present illness must be expressed in short sentences in a sequence so that students can read and memorize it more easily. As an example, 3 months ago, 1 year ago.
(2) The format of the checklist should be expressed in the patient’s words. For example, not “Did the student ask the patient when their symptoms started?” but “The patient said that he had been tired for about a year.”
(3) It should be organized so that the contents are appropriate for each item. For example, in some cases, the contents of the physical examination were included in the history taking, so it should be revised. There were also issues with the contents not being organized consistently between the checklist and the scenario or the present illness and the case summary within the same scenario. For example, at the beginning of one OSCE module, it was shown that the patient was in his 60s, but at the end of the scenario, the patient was shown in his 50s. So, the developers were asked to correct these parts by the medical education experts.
3.3.7. Pattern identification
The issue most frequently raised by the developers was that they had to develop cases based on PI schemas. They became confused about whether the existing PI schema was well-crafted, so some of them even created their own PI schema for developing their OSCE modules. The following are some of the difficulties associated with PI.
3.3.7.1. The gap between schemas and real clinical situations
We categorized their difficulties with the PI schema into three items. First, there was a significant gap between the inference based on the previously developed schema and the actual clinical situation. The developers wondered whether KM doctors actually treat patients based on the existing schema. As doctors themselves, they thought that many actual clinical situations would not follow the schema. As a result, they were not sure whether it was acceptable to create an OSCE based on the existing schema.
“The PI is really complicated. In my clinical situation, I mainly diagnose patients based on the stage of the disease not based on the schema. For example, in the case of stroke patients, if they are in the acute phase, I mainly diagnose them as ‘fire heat (風熱證)’ or ‘excess (實證)’ and, if the patients are in the chronic phase, they are mainly diagnosed as ‘Qi deficiency (氣虛證)’ or ‘Yin deficiency (陰虛證).’ So, most clinicians generally diagnose based only on stage not schema, like me. In the case of a complex PI schema, can we consider the schema only by the stage of the disease?” (D1)
In regard to the above problem, a KM education expert replied that it does not matter if the contents of an OSCE differ from a clinical case. This is because the purpose of an OSCE is not to reflect the actual clinical field, but to train students to be familiar with schema-based deductive reasoning. Therefore, it needs to be developed based on schemas, and the developers tried to solve the issue based on the expert’s opinion.
“Clinicians have a tendency for pattern recognition, which requires advanced training in inductive reasoning based on schemas. In other words, they consider three to five items at once for diagnosis and it leads to them making comprehensive decisions. However, in the case of students, their knowledge is dispersed, making it difficult to do pattern recognition. And as the goal of OSCE is education and evaluation for students, that’s why you should focus on training students to become more like professors; that is, pattern recognition of experts, via schema-based reasoning training. So, even if the OSCE module is not similar to clinical practice, the goal should be for the student to imitate and practice the expert’s treatment form, so schema-based development should be conducted.” (S2)
3.3.7.2. Uncertainty of the existing PI schema
The PI list was presented in the CPG. As the CPG is developed mostly based on clinical trials and has some differences compared to textbooks, the developers could not be sure that the PI list could be used for a model answer as-is. Furthermore, the listing of simple items does not reflect the clinical reasoning process, so a schema-type configuration was required as in the disease diagnosis. Therefore, there was a problem in determining which PI items to include and how to create the schema. The following topics were discussed at the meeting:
“The Conventional medicine schema is a type of systematic exclusion diagnosis that classifies and organizes possible diseases under a single symptom. On the other hand, the KM schema presented in the current CPG is not a type of exclusion diagnosis, but rather a parallel form that summarizes the possible PI diagnoses. I think these kinds of PI schema could not be considered as reliable schemas. So, I’m not sure if I should develop the module using the existing schema.” (D5)
“There may be enough challenges. Realistically, asking about all PI categories is impossible, and there are only a few key questions that we have to include. That’s why we have no choice but to compromise the schema by considering various cases. I think that it would be better to create a new schema focusing on overlapping parts of the PI system in textbooks and the CPG.” (S2)
Accordingly, in our 2nd meeting, all researchers agreed that the PI schemas needed to be reconfigured, which was done by the developers during this project. After that, they continued to develop the following OSCE modules based on their new schemas. For this process, a KM education expert joined the study to provide appropriate solutions for schema reconfiguration. The following are the challenges encountered by the developers while reconstructing the PI schemas, as well as the solutions proposed by the KM education expert:
“Let’s reorganize the schemas by categorizing the related PI rather than dividing PI into ‘deficiency (虛證)’ and ‘excess (實證).’ Because there are various diagnosis systems in KM, it is difficult to create totally perfect PI schemas. Let’s decide how we make the schemas by reaching a consensus among the researchers.” (S2)
3.3.7.3. Difficulty in selecting items for PI
The developers had considerable difficulty creating questions related to PI. Because of the characteristics of PI diagnoses, a single diagnosis can involve many questions. To address this challenge, it was suggested that the developers replace the PI-related questions with questions about disease characteristics, such as OLD CoEx CAFÉ, an abbreviation of onset, location, duration, course, experience, character, associated symptoms, and factor. In addition, it was suggested that the checklist include questions for exclusion rather than questions for selection. All researchers agreed with the above solutions.
“For example, when I diagnose as spleen Qi deficiency syndrome in some cases, there are many PI-related questions in accord with it such as anorexia and stomachache. Should I include all these questions on PI-related symptoms? I’m not sure which of these questions I should ask in order to properly diagnose and assign a score.” (D4)
“Do not try to include all PI-related content as questions. In other words, kinds of questions about the characteristics of disease, such as OLD CoEx CAFÉ, could also be good questions for PI diagnoses. You might be relieved of the burden of making questions with this solution.” (S2)
Even though the developers solved the above issues based on the experts’ suggestions, they emphasized the need to develop a PI item list commonly applied to all diseases. In the PI diagnostic system, even if two patients have different diseases, the diagnoses can be the same. For this reason, the developers found some problems, as follows:
“Is there a significant difference between spleen Qi deficiency syndrome of indigestion and spleen Qi deficiency syndrome of chronic fatigue? In the case of gastrointestinal symptoms, for example anorexia, it can show both indigestion and chronic fatigue. I’m not sure how to tell them apart. It would be better to choose critical questions from a well-made PI item list.” (D3)
“The most difficult thing for me as a developer is that even though I am a clinical expert, I am concerned about my ability to create new PI schemas. I think that first, it should be set up with the goal of distinguishing ‘deficiency syndrome’ and ‘excess syndrome’ at a higher level for differential diagnosis…” (S2)
3.4. IPA Analysis
An importance-performance analysis (IPA) was conducted to identify which challenges should be prioritized when addressing the difficulties faced by developers. The IPA graph is divided into four quadrants, with importance on the x-axis and performance on the y-axis (Figure 2).
According to the location of each item, the four quadrants were named “Possible overkill” (Quadrant 1), “Keep up the good work” (Quadrant 2), “Low priority” (Quadrant 3), and “Concentrate here” (Quadrant 4) [7]. “Possible overkill” is the area where developers’ performance is high in comparison to importance, so it does not necessitate excessive effort. “Keep up the good work” is where the current level must be continuously maintained because both importance and performance are high. “Low priority” is an area where both importance and performance are low, while “Concentrate here” is an area that needs to be improved because its performance is low in comparison to its importance.
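The quadrant assignment can be illustrated with a short sketch. It follows the conventional reading of an importance-performance grid, in which high importance combined with low performance marks the area to concentrate on. Two points are assumptions and not taken from this study: the crosshairs are placed at the mean importance and mean performance of all items, and the item names and scores are hypothetical placeholders.

```python
from statistics import mean


def ipa_quadrant(importance, performance, imp_cut, perf_cut):
    """Return the IPA area for one item, given the two cut-off values."""
    high_importance = importance >= imp_cut
    high_performance = performance >= perf_cut
    if high_importance and high_performance:
        return "Keep up the good work"
    if high_importance:
        return "Concentrate here"   # important, but performance lags
    if high_performance:
        return "Possible overkill"  # performs well, but matters less
    return "Low priority"


# Hypothetical (importance, performance) scores, for illustration only.
items = {
    "item A": (4.5, 2.0),
    "item B": (4.6, 4.4),
    "item C": (2.0, 4.5),
    "item D": (2.1, 2.2),
}

# Assumed crosshairs: the grand means of the two score columns.
imp_cut = mean(score[0] for score in items.values())
perf_cut = mean(score[1] for score in items.values())

areas = {
    name: ipa_quadrant(imp, perf, imp_cut, perf_cut)
    for name, (imp, perf) in items.items()
}
for name, area in areas.items():
    print(f"{name}: {area}")
```

Other cut-offs, such as the scale midpoint or the median, would shift items that lie near the crosshairs, so the choice should be reported alongside the graph.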
In our results, the items included in the “Concentrate here” area were “PI schema reconfiguration” and “Making adequate PI-related items” under the PI theme and “Providing standardized guidelines” under the scenario theme. In the “Low priority” area, there were “Providing critical information,” “Adjusting the number of items,” and “Including critical items” under the test situation theme and “Prohibited telling critical items in advance by SP” and “Range of patients’ education” under the scenario theme. “Authentic case” and “Adjusting the number of items” under the checklist theme and “Providing sufficient information” under the scenario theme were found in the “Possible overkill” area. The “Keep up the good work” area included “Case selection,” “Setting scoring criteria and format,” “Marking items clearly,” “Using easy terms for SPs to understand,” “Observation format,” and “Consistent contents.”