The null hypothesis of this study, that ChatGPT's exam performance would exceed that of the students, was rejected. On the multiple-choice questions, the English version of ChatGPT-4 demonstrated the highest performance, equal to that of the class profile (students). On the open-ended questions, ChatGPT-4 in English, ChatGPT-4 in Turkish, and the class profile yielded equivalent results.
As of January 2023, ChatGPT had reached 100 million users, establishing itself as the most widely adopted artificial intelligence program (Eysenbach, 2023). For researchers and practitioners to use ChatGPT correctly and to prevent potential problems, it is essential to have a good understanding of its capabilities and limitations (Cascella, 2023).
However, only a limited number of studies have examined ChatGPT in the medical and dental fields so far (Fatani, 2023).
In a similar study performed by Huh (Huh, 2023), the answers of medical school students and ChatGPT were compared in an examination in the field of medical parasitology; the students' examination performance was found to exceed that of ChatGPT. In the present study, success was evaluated on two question types, open-ended and multiple-choice, and two language versions of ChatGPT, English and Turkish, each sat the examination twice. Unlike Huh's study, in this research ChatGPT-4 in English matched the highest academic achievement of the class profile, both in the total exam result and in the correct answers to the multiple-choice questions.
On the open-ended questions, the students and ChatGPT-4 in both Turkish and English achieved the same level of success. ChatGPT-3.5, however, produced different results in Turkish and in English. Language models are thought to give different answers in different languages because of the complexity of the models themselves, their training data, and the characteristics of each language; a model may therefore respond differently depending on the language of the prompt, and perfect consistency is not always achievable.
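The study posed its questions through the ChatGPT interface; purely as an illustration of how the same question can be put to the model in two languages and the answers compared, a minimal sketch using the OpenAI Python API follows. The model identifier, temperature, and both example questions are assumptions for illustration, not the study's actual settings or question set.

```python
# Illustrative sketch only: the study used the ChatGPT interface, not the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical exam question in both languages (not from the study's question set).
QUESTION_EN = "Which radiographic technique is preferred for proximal caries detection?"
QUESTION_TR = "Ara yüz çürüklerinin tespiti için hangi radyografik teknik tercih edilir?"

def ask(question: str) -> str:
    """Submit one exam question and return the model's answer text."""
    response = client.chat.completions.create(
        model="gpt-4",   # assumed model identifier
        temperature=0,   # a low temperature reduces run-to-run variation
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print("EN:", ask(QUESTION_EN))
print("TR:", ask(QUESTION_TR))
```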
Another finding of the study is that the first and second evaluations of the Oral and Maxillofacial Radiology exam administered to ChatGPT in English were more consistent with each other than those administered in Turkish; in version 4, this consistency reached 100%.
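The paper does not specify how consistency was quantified; a minimal sketch, assuming consistency means the fraction of questions answered identically across the two runs, is given below. The answer lists are hypothetical placeholders, not the study's data.

```python
def consistency(run1: list[str], run2: list[str]) -> float:
    """Fraction of questions answered identically in the two runs."""
    if len(run1) != len(run2):
        raise ValueError("Both runs must cover the same questions")
    return sum(a == b for a, b in zip(run1, run2)) / len(run1)

# Hypothetical answer sets from two separate sessions (not the study's data).
first_run  = ["A", "C", "B", "D", "A", "B", "C", "D"]
second_run = ["A", "C", "B", "D", "A", "B", "C", "D"]
print(f"Agreement: {consistency(first_run, second_run):.0%}")  # prints 100%
```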
Across the evaluated versions of ChatGPT, the highest success rate on the multiple-choice questions was 75%, whereas on the open-ended questions it decreased to 62.5%.
In this study, within the field of oral and maxillofacial radiology, ChatGPT performed relatively better on multiple-choice questions than on open-ended questions. The results from both question types nevertheless indicate areas in which ChatGPT's knowledge appears to be lacking.
The results of this study may not apply to all fields of dentistry, but specifically to oral and maxillofacial radiology. Within this field, however, it can be said that ChatGPT's knowledge is not sufficient for advanced levels.
In their study, Khurana and Vaddi identified potential applications of ChatGPT within the realm of Oral and Maxillofacial Radiology. They highlighted several areas where ChatGPT can be utilized, including generating oral radiology reports, responding to multiple-choice questions, aiding in scientific writing, contributing to dental education by assisting in presentation creation, providing feedback on student assignments, and aiding in the preparation of academic content outlines. However, they noted that ChatGPT exhibits limitations in addressing image-based questions (Khurana, 2023).
The performance of ChatGPT has also been evaluated for the United States Medical Licensing Examination (USMLE) Steps 1, 2, and 3. That examination comprised a mixture of open-ended questions with variable inputs and single-answer multiple-choice questions. As in the present study, ChatGPT demonstrated its lowest performance on the Step 1 assessment, which involved open-ended questions, followed by its performance on Steps 2 and 3. The outcomes of that study were consistent with those of human subjects (Alkaissi, 2023).
The results obtained in this study confirm that ChatGPT-4 has been enhanced and improved compared with version 3.5.
In another study, Ali and colleagues used ChatGPT to generate clinical letters for patients; the letters created with ChatGPT achieved higher correctness scores than those written by the physicians themselves (Ali, 2023). Writing is a crucial aspect of research, because a study can be introduced to the literature and the academic community only through well-crafted composition. It is imperative to recognize, however, that ChatGPT should not serve as the sole source; it is advisable to use its support for language enhancement and the correction of certain errors.
The most significant advantage of ChatGPT is its ability to quickly grasp the given information and reach evidence-based conclusions faster than humans (Salvagno, 2023). In essence, anyone can ask questions on any topic and quickly receive satisfactory answers through ChatGPT (Fatani, 2023).