Clinicopathological concordance of clinicians, Chat-GPT4 and ORAD for odontogenic keratocysts and tumours referred to a single New Zealand Centre- A 15-year retrospective study.

doi:10.21203/rs.3.rs-4115114/v1


Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 26 Jul, 2024

Read the published version in Oral and Maxillofacial Surgery →


Background: This research aimed to investigate the concordance between clinical impressions and histopathologic diagnoses made by clinicians and artificial intelligence tools for odontogenic keratocyst (OKC) and Odontogenic tumours (OT) in a New Zealand population from 2008-2023.

Methods: Histopathological records from the Oral Pathology Centre, University of Otago (2008-2023) were examined to identify OKCs and OTs. Specimen referral details, histopathologic reports, and clinician differential diagnoses, as well as those provided by ORAD and ChatGPT-4, were documented. Data were analyzed using SPSS, and concordance between provisional and histopathologic diagnoses was ascertained.

Results: Of the 34,225 biopsies, 302 and 321 samples were identified as OTs and OKCs, respectively. Concordance rates were 43.2% for clinicians, 45.6% for ORAD, and 41.4% for CHAT-GPT4. Surgeons achieved a higher concordance rate (47.7%) than non-surgeons (29.82%). The odds ratios of reaching a concordant diagnosis when CHAT-GPT and ORAD were used ranged from 1.4 to 2.8 (p<0.05). In differentiating ameloblastoma from OKC, CHAT-GPT4 had the highest sensitivity at 75.9%, with an accuracy of 82.5%. For clinicians and ORAD, the corresponding values were 66.7%/86.8% and 66.7%/84.9%, respectively.

Conclusion: Clinicians with surgical training achieved a higher concordance rate for OT and OKC. CHAT-GPT4 and the Bayesian approach (ORAD) have shown potential in enhancing diagnostic capabilities.

Odontogenic Tumour

odontogenic keratocyst

ameloblastoma

concordance

Chat-GPT

Bayesian

artificial intelligence

Odontogenic tumours (OT) are heterogeneous lesions derived from the tooth-forming apparatus (1–3). Although these tumours are relatively rare, comprising less than 5% of all oral and maxillofacial biopsy specimens (4–7), they encompass a diverse spectrum of pathologies, ranging from hamartomatous to locally invasive and neoplastic lesions (1–3).

The World Health Organization (WHO) published its classification of OT in 1971 (8), with multiple revisions up to the most recent 5th edition in 2022. In 1992, calcifying odontogenic cyst was introduced as an OT (9), and in 2005 odontogenic keratocyst (OKC) was reclassified as keratocystic odontogenic tumour (KCOT) (10) owing to its locally aggressive nature and links to the PTCH1 gene and SHH signalling pathway (11, 12). However, both lesions were reclassified as cysts in the 4th edition in 2017 (13) and remained so in 2022, citing a lack of compelling evidence for neoplastic behaviour (11, 12).

Ameloblastoma and odontogenic keratocyst (OKC) are common types of odontogenic lesions and are particularly noteworthy due to their potential for locally infiltrative behaviour and the risk of recurrence (14, 15). The recommended treatment for conventional ameloblastoma is resection of the jaw with a margin (16). While such treatments often necessitate intricate reconstructive surgeries for aesthetic and functional restoration, early and precise diagnosis can allow for more conservative treatments (17). Approaches such as enucleation and peripheral ostectomy become viable, which is crucial for paediatric patients to minimize disruptions to craniofacial development and the emerging dentition (18, 19). For OKC, on the other hand, more conservative enucleation is a common treatment, with adjunctive measures such as Carnoy’s solution, 5-fluorouracil (5-FU), or cryotherapy employed to reduce the chances of recurrence (20–22).

A practitioner's provisional diagnosis and impression of a lesion are pivotal, setting the course for the patient's initial management, be it monitoring, biopsy, surgery, or referral to a surgical specialist. For this approach to be effective, the practitioner's initial clinical assessment must be accurate to minimise the possibility of over- or undertreatment. Diagnostic delays can escalate disease progression, necessitate more invasive surgeries, and worsen treatment outcomes. Thus, gauging the precision of clinical diagnoses against the definitive histopathologic diagnosis, the gold standard, is crucial. The clinicopathological concordance (CPC) quantifies this alignment, measuring the concordance between a clinician's preliminary diagnosis and a pathologist's conclusive pathological diagnosis of the biopsied specimen (23).

General dental practitioners (GDPs) often serve as the primary point of contact for many patients, equipped with training to inspect, diagnose, and handle a diverse range of oral ailments. In contrast, specialists undergo additional postgraduate training in their specific domains. When it comes to the CPC rate the literature presents varied conclusions. While some studies suggest specialists exhibit higher CPC rates, others contend that the outcomes between GDPs and specialists are comparable (23, 24). Reported CPC rates among specialists typically range from 50–70% (23, 25, 26).

Even with the result of the initial biopsy, securing an accurate provisional diagnosis is still paramount, as there is a potential margin of error and hence discrepancies between incisional biopsy findings and the comprehensive specimen (27). For instance, Chen et al. highlighted a discrepancy rate of 11.1% between preliminary and definitive histopathological assessments. In such instances, clinicians might opt for an additional biopsy or seek a second opinion from another pathologist (28).

However, humans are prone to diagnostic errors stemming from cognitive biases and heuristics, which can be minimised with the assistance of AI tools. Mental shortcuts, such as confirmation bias and the availability heuristic, often lead clinicians to make judgments based on initial impressions or memorable experiences (29–31). AI, with its data-driven approach, can provide a more objective analysis, free from personal biases. By offering evidence-based suggestions, AI can challenge and expand a clinician's differential diagnosis, countering biases like anchoring and overconfidence. While human expertise remains paramount, integrating AI can serve as an additional check, enhancing the accuracy and consistency of clinical diagnoses.

Artificial intelligence (AI) is the machine-based simulation of human intelligence, which has started to integrate into daily life and into healthcare (32). AI has advanced clinical applications by enhancing patient results, making processes more efficient, and cutting costs. In the clinical realm, AI has made remarkable strides in tasks such as analysing data for image segmentation, aiding in clinical decisions like predicting disease outbreaks, and executing intricate procedures including surgeries and rehabilitation (32). This underscores its transformative potential for healthcare. In the field of oral and maxillofacial surgery, convolutional neural networks have demonstrated improved efficacy in identifying and categorising maxillofacial fractures and TMJ disorders, and in classifying and detecting malignant disorders (32–39).

The use of AI in the diagnosis of cysts and tumours of the jaw is an emerging field with promising potential. AI algorithms can potentially assist healthcare professionals in analysing medical images, identifying patterns, and providing diagnostic support. Previous studies have shown promising results with respect to identifying and classifying lesions in plain radiographs and cone-beam computed tomography (CBCT) scans (38–41). By leveraging machine learning techniques, notably deep convolutional neural networks (CNNs), AI systems can learn from large datasets of annotated images to recognise specific radiographic features associated with different types of cysts and neoplasms (38–45). Furthermore, by combining radiological, demographic and clinical data, AI models can generate differential diagnoses to provide clinicians with additional decision support (38–41) and to assist with surgical planning (46–48).

Much of the research on AI and CNNs has been restricted to solely image-based methods, which curtails the effective relay of information and does not fully harness the potential of AI in the clinical setting. In contrast, natural language processing (NLP) using large language models (LLMs) excels in correlating textual and visual data, aiding in the interpretation of radiographs.

NLP is a form of AI that plays a key role in clinical decision support (CDS) systems (49). The Generative Pretrained Transformer (GPT) is one such model that is available to the public. Such AI systems were utilised as triaging systems and virtual clinics during the COVID-19 pandemic (50, 51). Furthermore, NLP systems that extract disease symptoms from clinical texts have been used to identify confounding characteristics of patients in medical records and to predict disease outcomes (52, 53).

AI as CDS has been proposed to increase the efficiency and accuracy of clinical diagnosis and therefore safety for clinicians (49). Diagnostic accuracy has been tested in internal medicine based on common chief complaints, where CDS systems have shown high rates of accuracy in reaching a diagnosis (49). Currently, there is a lack of studies examining the ability of CHAT-GPT to assist clinicians with the diagnosis of jaw lesions, in particular OT and OKC.

The University of California Los Angeles developed the Oral Radiographic Differential Diagnosis (ORAD) system in the 1990s. ORAD employs the Bayesian approach and Bayesian belief networks (BBN), tools that have seen applications in both medical and dental fields to enhance diagnostic accuracy (45, 54–57). The essence of this approach is that it can estimate the probability of a disease from a particular set of observed variables. This estimation is possible when the relative prevalence of each disease is known, along with the likelihood of the occurrence of the findings. A BBN represents the numerical relationships between various nodes. Importantly, it avoids cyclical logical connections and clearly defines the directional relationships between these nodes. Utilising a directed acyclic graph, the BBN delineates the causality flow among the nodes. Furthermore, it quantifies the strength of connections between variables. An essential feature of a BBN is its capability to autonomously update probability estimates as new data are introduced.
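The probability estimate described above can be illustrated with a minimal sketch of Bayes' theorem. It assumes the findings are conditionally independent given the disease (a naive simplification, not ORAD's actual network), and every prior and likelihood below is a made-up number for illustration only; the function name `posterior` is likewise hypothetical.

```python
def posterior(priors, likelihoods, findings):
    """Return P(disease | findings), assuming findings are independent given the disease."""
    unnormalised = {}
    for disease, prior in priors.items():
        score = prior
        for finding in findings:
            # Multiply the prior by P(finding | disease) for each observed finding.
            score *= likelihoods[disease][finding]
        unnormalised[disease] = score
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("findings are impossible under every listed disease")
    # Normalise so the posteriors sum to 1.
    return {disease: score / total for disease, score in unnormalised.items()}


# Hypothetical relative prevalences (priors) and finding likelihoods.
priors = {"OKC": 0.5, "ameloblastoma": 0.3, "dentigerous cyst": 0.2}
likelihoods = {
    "OKC": {"multilocular": 0.3, "root resorption": 0.1},
    "ameloblastoma": {"multilocular": 0.7, "root resorption": 0.6},
    "dentigerous cyst": {"multilocular": 0.1, "root resorption": 0.2},
}

result = posterior(priors, likelihoods, ["multilocular", "root resorption"])
ranked = sorted(result, key=result.get, reverse=True)  # differential, most likely first
```

With these invented inputs the ranked list places ameloblastoma first, which shows how a lesion with a lower prior can overtake a more prevalent one once the observed findings are taken into account.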

The study aimed to investigate the diagnosis range, relative frequencies, and clinical presentations of odontogenic cysts and tumors in a New Zealand population over a 15-year period. The study also explored CHAT-GPT and ORAD as possible tools to assist with diagnosis relating to odontogenic jaw pathology. In this regard, the study sought to compare the CPC achieved by CHAT-GPT, by ORAD, and by clinicians.

Ethics approval

The Institutional Minimal Risk Human Ethics Approval and Māori Consultation were obtained in accordance with the Declaration of Helsinki and the University of Otago’s policy on research (HD23/058). The committee approved a waiver of consent to participate, as the data were collected anonymously.

Patient Cohort

Cases of OKC and OT with histologically confirmed diagnosis between 31/05/2008 and 31/05/2023 were accessed from the electronic database (Oral Path Pro and Sysmex) and paper archives of the Oral Pathology Centre, Faculty of Dentistry, University of Otago, using the search terms according to 2022 WHO classification or their alias based on previous classifications (2005 and 2017).

The cases meeting above terms were included in the study. Tumours of non-odontogenic origin, metastatic in origin, local spread from adjacent areas and cysts other than OKC were excluded from the study sample (see below).

Classifications

The study period spanned three WHO classifications of odontogenic lesions: the third (2005), fourth (2017) and fifth (2022) editions. The main differences in classification were between the 3rd and 4th editions. In the 3rd edition, the OKC and the calcifying odontogenic cyst (COC) were classified as tumors and were referred to as the keratocystic odontogenic tumor and the calcifying cystic odontogenic tumor. However, they were subsequently reclassified as cysts in the 4th edition in 2017. Furthermore, odontoameloblastoma was excluded, and ameloblastic fibrodentinoma/fibroodontoma were reclassified as developing odontoma. Specimens diagnosed before 2022 were therefore renamed according to the 5th edition (2022). OKC and COC were included in the study.
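The renaming step can be sketched as a simple lookup table. This is an illustrative assumption about how legacy labels might be harmonised, not the authors' actual procedure; the mapping contains only the renamings stated in the paragraph above, and the names `WHO_2022_NAME` and `harmonise` are hypothetical.

```python
# Legacy diagnostic labels mapped to the terms the text says were used instead.
WHO_2022_NAME = {
    "keratocystic odontogenic tumour": "odontogenic keratocyst",
    "calcifying cystic odontogenic tumour": "calcifying odontogenic cyst",
    "ameloblastic fibrodentinoma": "developing odontoma",
    "ameloblastic fibroodontoma": "developing odontoma",
}


def harmonise(label):
    """Return the current name for a legacy label, or the normalised label itself."""
    # Lower-case, unify the US/UK spelling, and collapse repeated whitespace.
    key = " ".join(label.lower().replace("tumor", "tumour").split())
    return WHO_2022_NAME.get(key, key)


legacy = ["Keratocystic Odontogenic Tumor", "Ameloblastoma"]
current = [harmonise(name) for name in legacy]
```

Labels that are not in the table pass through unchanged apart from normalisation, so diagnoses whose names did not change between editions are unaffected.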

Referring clinicians

Clinicians were categorised into surgical and non-surgical specialties. For the purpose of this study, the term specialty was defined by workforce; thus general dentists, periodontists, public health dentists, endodontists and orthodontists were grouped under the category of non-surgical dental specialties. Surgical specialties comprised oral and maxillofacial surgeons (OMFS), oral surgeons, and ear, nose, and throat surgeons (ENT/otolaryngologists).

Artificial intelligence

A diagnosis was requested with the input of demographical and clinical data into an open AI platform (ChatGPT) as well as ORAD II (http://www.orad.org/cgi-bin/orad/index.pl) (Fig. 1).

Definition of different concordance categories:

1) Concordant: the first provisional diagnosis matched the definitive diagnosis.

2) Partially concordant: multiple provisional diagnoses were given, and the correct diagnosis was within the first five on the list but was not listed first.

3) Discordant: any of the following:

a. an incorrect provisional diagnosis (e.g., “Ameloblastoma” given as provisional diagnosis for an OKC case).

b. a widely termed clinical provisional diagnosis (e.g., “Cyst, Tumor” given as provisional diagnosis for an OKC case).

c. differential diagnoses that did not contain the correct diagnosis.

d. no diagnosis provided on the referral, clinical letter, or operation notes.
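The three categories above can be expressed as a short decision rule. The sketch below is an illustration only: it assumes diagnoses have already been normalised to comparable strings and uses exact matching, whereas matching in the study would have required clinical judgement. The function name `classify_concordance` is hypothetical.

```python
def classify_concordance(provisional, definitive):
    """Classify an ordered list of provisional diagnoses against the definitive one."""
    cleaned = [d.strip().lower() for d in provisional if d.strip()]
    target = definitive.strip().lower()
    if not cleaned:
        return "discordant"  # 3d: no diagnosis provided
    if cleaned[0] == target:
        return "concordant"  # 1: first provisional diagnosis matches
    if target in cleaned[1:5]:
        return "partially concordant"  # 2: listed second to fifth
    # 3a-3c: incorrect, too broadly termed, or absent from the differential
    return "discordant"


examples = [
    classify_concordance(["OKC"], "OKC"),
    classify_concordance(["Ameloblastoma", "OKC"], "OKC"),
    classify_concordance(["Cyst", "Tumor"], "OKC"),
    classify_concordance([], "OKC"),
]
```

A broadly termed entry such as "Cyst" falls through to discordant simply because it does not equal the definitive diagnosis, which mirrors category 3b without needing a separate list of vague terms.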

Statistical Methods

Descriptive statistics for clinician type distributions were computed using SPSS. Concordance was compared between surgical and non-surgical specialties, ORAD and CHAT-GPT to assess the association. For the purpose of odds ratio (OR) calculation, the data were dichotomised into Positive and Negative for histologically concordant diagnosis in two ways: a) in OR analysis 1, partial concordance was redefined as Negative for clinicians, ORAD and CHAT-GPT (Fig. 2); b) in OR analysis 2, partial concordance was redefined as Positive for clinicians and Negative for ORAD and CHAT-GPT (Fig. 2). The OR was calculated using the Chi-Square test in SPSS to compare the likelihood of a concordant diagnosis when the clinical diagnosis was augmented with ORAD or CHAT-GPT. Only specimens with sufficient clinical and radiographical information were included in the OR analysis. p-values were obtained via Fisher's exact test to determine the statistical significance of the OR.

Diagnostic accuracy was also calculated for ameloblastoma using the subset of data comprising OKC and ameloblastoma. An ameloblastoma diagnosis was classified as positive and an OKC diagnosis as negative. Sensitivity, specificity, accuracy, F1 score, PPV and NPV were determined using the formulae below. Pathology was also grouped into Aggressive and Indolent based on potential clinical behavior. Aggressive lesions were classified as positive and Indolent lesions as negative to calculate diagnostic accuracy.
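The dichotomisation and the OR with its exact p-value can be sketched in plain Python. This is a stand-in for the SPSS procedure, not a reproduction of it: the 2x2 table layout, the function names and the counts are all assumptions made for illustration, and the counts are not the study's data.

```python
from math import comb


def dichotomise(category, partial_is_positive):
    """Map a concordance category to Positive (True) or Negative (False)."""
    if category == "concordant":
        return True
    if category == "partially concordant":
        return partial_is_positive  # False in OR analysis 1; True for clinicians in analysis 2
    return False


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]; undefined when b or c is 0."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x)
        if p <= observed * (1 + 1e-9):  # tables no more probable than the observed one
            p_value += p
    return min(p_value, 1.0)


# Made-up counts. Rows: tool agreed with clinician / did not agree;
# columns: Positive / Negative for histological concordance.
a, b, c, d = 3, 1, 1, 3
example_or = odds_ratio(a, b, c, d)
example_p = fisher_exact_two_sided(a, b, c, d)
```

An OR above 1 with a small p-value would indicate that agreement between the tool and the clinician is associated with a histologically concordant diagnosis; with the tiny invented table here the OR is large but the p-value is far from significant.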

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

Accuracy = (TP + TN) / (TP + TN + FP + FN)

F1 score = (2 × TP) / (2 × TP + FP + FN)

Positive Predictive Value (PPV) = TP / (TP + FP)

Negative Predictive Value (NPV) = TN / (TN + FN)
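As a worked illustration of the formulae above, the following sketch computes all six measures from the four cells of a confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the study's tables.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the six diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented counts: ameloblastoma is the positive class and OKC the negative class.
metrics = diagnostic_metrics(tp=20, fp=10, tn=60, fn=10)
```

Because the negative class is much larger in this invented example, accuracy (0.80) is higher than sensitivity (about 0.67), which is why the F1 score is reported alongside accuracy when classes are imbalanced.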

A total of 34,225 specimens were submitted to the oral pathology diagnostic service of the Oral Pathology Centre, Faculty of Dentistry, University of Otago, between 2008 and 2023. Of these, 623 specimens fulfilled the criteria and were included in the study (Fig. 3). Surgical specialties contributed 74% of the sample and non-surgical specialties 19%, while 7% had no details recorded. Oral and maxillofacial surgeons contributed the majority of referrals (58%). Second-opinion referrals from general pathologists accounted for 19% of the sample.

Concordance between the clinical and histopathological diagnoses was 43%, with partial concordance in 12% and discordance in 44%. A similar concordance rate was seen with ORAD (46%) and CHAT-GPT (41%) (Fig. 4). However, partial concordance was higher and discordance lower for ORAD (35% and 19%) and CHAT-GPT (50% and 8%). With respect to the type of referring clinician, concordance was higher for surgeons (48%) than for non-surgical specialists and GDPs (30%). The differences were significant both between surgeons and non-surgeons and among clinicians, ORAD and CHAT-GPT.

Direct concordance between the clinicians' diagnoses and those of the AI tools was higher for CHAT-GPT (60%) than for ORAD (44%) (Table 1). However, when ORAD and the clinician gave the same diagnosis, the rate of concordance with the histopathological diagnosis was higher (72%) than when CHAT-GPT and the clinician agreed (60%) (Table 1). Similarly, the odds ratio of concordance with the pathologist was higher when the clinician had the same diagnosis as ORAD (OR 3.4, p-value < 0.05) than when the clinician had the same diagnosis as CHAT-GPT (OR 1.9, p-value < 0.05) (Table 2).

The odds ratio analysis was performed on 422 specimens with sufficient clinical and radiographical information (Table 2). The ORs of concordance with histopathology when the clinician's diagnosis was aided by a computer tool were between 1.5 and 2.2. The ORs were higher for ORAD than for CHAT-GPT for all clinicians, surgeons, and non-surgeons alike. When the first diagnoses of all three diagnosticians (ORAD, CHAT-GPT and clinician) were considered, the OR of reaching the correct diagnosis increased to 2.8. All results were statistically significant.

Furthermore, the ORs remained significant when the clinician's entire differential diagnosis was considered alongside only the first diagnosis of the AI (Table 2; clinician partial = concordant, AI partial = discordant).

A subset of data comprising ameloblastoma and OKC was reviewed and analyzed for the ability to differentiate the former from the latter. The sensitivity, specificity, accuracy and F1 values were similar amongst the three groups (Table 3). However, CHAT-GPT had the highest sensitivity at 75.9% (Clinician 66.7% & ORAD 66.7%) and the highest F1 value at 60.74 (Clinician 58.5 & ORAD 56.5). Furthermore, when the data were divided into Aggressive and Indolent groups, CHAT-GPT also had the highest sensitivity, specificity, accuracy and F1 value in differentiating indolent from aggressive lesions (Table 3).

The concordance between the clinician's diagnosis and the histopathological diagnosis was 43%. This is comparable to the concordance rates for ORAD (46%) and CHAT-GPT (41%). Another study assessing the alignment between clinical and pathological diagnoses of oral lesions observed a similar concordance rate for OKC at 40%. This was lower than the rate for the more frequently seen dentigerous cyst (58.6%), and concordance rate of all lesions (66.6%). This study's sample encompassed prevalent pathologies such as periapical granuloma, radicular cyst, mucocele, and dentigerous cyst, all of which exhibited concordance rates exceeding 85% (26). The authors posited that the reduced concordance rate for OKC might be attributed to its variable radiographic presentation, be it multilocular or unilocular (26). Additionally, OT’s lower incidence rate could be a contributing factor.

In this study, surgeons exhibited a higher rate of concordance at 48% when compared to the combined average of non-surgical specialists and GDPs, which stood at 30%. The increased concordance among surgeons is likely attributed to their advanced postgraduate training and expertise in oral and maxillofacial pathology. Another study reported an even higher concordance rate for surgeons at 68.5%, in contrast to non-surgical specialists (26). However, it's worth noting that this study included all types of lesions, including common pathologies, and did not provide separate concordance data for rarer pathologies based on the clinician type. In a different New Zealand-based study, the concordance rate between GDPs (49.4%) and specialists (51%) was closely matched when examining a spectrum of soft tissue lesions, ranging from benign to malignant (23). However, this study did not differentiate between specialists based on their training. Seonane et al. reported comparable concordance levels between GDPs and OMS specialists for benign lesions. Nevertheless, OMS specialists demonstrated superior diagnostic accuracy for malignant lesions, with scores of 0.78 for GDPs versus a perfect 1.00 for the specialists (24).

Differentiating between ameloblastoma and OKC is crucial due to the distinct treatment approaches necessitated by the locally invasive nature of ameloblastoma. CHAT-GPT demonstrated the highest sensitivity at 75.9% (Clinicians and ORAD both at 66.7%) and the highest F1 value at 60.74 (Clinicians at 58.5 and ORAD at 56.5, Table 3). This was also true when categorizing the data into Aggressive and Indolent groups. While there is a lack of studies evaluating the diagnostic performance of large language models like CHAT-GPT for odontogenic neoplasms, CNNs have been explored for their diagnostic capabilities using radiographs. Chai et al. assessed a CNN based on the Inception v3 deep learning algorithm against OMS experts in differentiating OKC from ameloblastoma using CBCT images. The CNN achieved superior results in sensitivity, specificity, accuracy, and F1 score compared to both senior and, notably, junior surgeons. This indicates the potential of CNNs as a diagnostic aid, especially for less experienced practitioners (58). Yang et al. evaluated the You Only Look Once v2 model on panoramic radiographs for distinguishing among dentigerous cyst, OKC, ameloblastoma, and no lesions. The CNN's performance metrics, although variable across the different lesions, generally surpassed those of OMS experts and were considerably better than GDP results. The study emphasized the potential of AI, particularly for clinicians less familiar with OMS pathology (59). Several other studies have also underscored the high diagnostic accuracy of CNNs, especially when enhanced with transfer learning and more complex network architectures. These networks have been applied across a spectrum of dental concerns, from cystic and traumatic lesions to premalignant/malignant growths, sinus pathologies, TMJ disorders, and histopathological specimen evaluations (32, 38, 39, 41–44, 60–65).

In terms of diagnostic concordance, the alignment between clinicians and CHAT-GPT was more pronounced at 60%, compared to the clinician-ORAD concordance rate of 44% (Table 1). Interestingly, when both the AI system and the clinician concurred in their provisional diagnosis, there was even greater concordance with the pathologist's diagnosis. This was seen with ORAD at 72% and CHAT-GPT at 60%. The odds ratio (OR) further quantified this observation: when the clinician's and ORAD's diagnoses matched, the OR was 3.4 (p-value < 0.05); for clinician and CHAT-GPT concordance, the OR was 1.9 (p-value < 0.05). Essentially, when there is diagnostic concordance between the clinician and either CHAT-GPT or ORAD, the odds of arriving at a correct diagnosis increase 1.9- to 3.4-fold.

Ezhov et al. utilized CNN and deep learning to aid clinicians in identifying periapical lesions on CBCT (64). Their study compared sensitivity and specificity outcomes between aided and unaided clinician groups. The findings revealed sensitivity scores of 0.85 for the aided group compared to 0.77 for the unaided group, while specificity stood at 0.97 and 0.96, respectively (p = 0.032) (64). Drawing parallels to our study, we observed an OR ranging from 1.5 to 2.2 for accurate diagnosis when the clinician's first provisional diagnosis was considered in conjunction with either ORAD's or CHAT-GPT's first diagnosis. This OR increased further to 2.8 when the clinician's first diagnosis and those of both ORAD and CHAT-GPT were accounted for (P < 0.05). This trend persisted even when the first diagnoses by ORAD or CHAT-GPT were considered in conjunction with the entire differential diagnosis set by the clinician. These observations underscore the potential for improved diagnostic accuracy when AI tools are leveraged alongside traditional clinical judgment.

CHAT-GPT, while being a model with expansive training data, might not be intrinsically tailored for niche areas like oral pathology. Its inability to directly decipher radiographic images is a significant limitation, especially in a domain where visual data is paramount. Moreover, its propensity to overlook nuanced clinical contexts, combined with its "black box" nature, wherein the reasoning behind its outputs remains opaque, casts doubt on its standalone reliability in critical diagnostic scenarios. Model bias is another significant concern with LLMs, with skewed clinical analysis results standing out as a primary issue. LLMs, being solely driven by data, base their insights on patterns found in the training dataset, so the reliability of an LLM hinges heavily on the quality and comprehensiveness of those data. While LLMs continue to advance, even with innovations like GPT-4, complete reliance on AI for clinical analysis remains questionable. The need for human oversight in validating AI outcomes persists. Therefore, integration of neural-symbolic models has been suggested by some authors (66). These models merge neural networks' ability to discern patterns in extensive datasets with the explicit, rule-based reasoning of symbolic approaches.

Furthermore, CHAT-GPT-4 has a limited ability to accept and interpret uploaded images and remains more reliable with textual input. In future research, however, an LLM's insights could be coupled with specific visualization tools to pinpoint areas affected by pathology. CHAT-GPT-4 may be coupled with tools such as the ALBEF (Align before Fuse) model, which is tailored for image-text tasks (66).

Conversely, ORAD, rooted in the Bayesian approach, offers a more structured probabilistic framework for diagnosis. However, its efficacy is deeply tethered to the accuracy of the prior probabilities fed into the system. Inaccurate or misjudged prior beliefs can lead to diagnostic discrepancies. Furthermore, the Bayesian method comes with its set of challenges. Its need for specific assumptions about data distributions, its computational demands, especially with complex models, and its vulnerability to overfitting can curtail its reliability. A critical determinant of ORAD's success is the robustness of its lesion database. An outdated or inadequately detailed database can significantly impede its diagnostic accuracy.

Therefore, the integration of ChatGPT (LLM), CNN, and Bayes' theorem for diagnostic aid offers a multi-faceted approach to medical diagnosis. ChatGPT excels in processing textual clinical data, CNN specializes in interpreting medical images, and Bayes' theorem provides a probabilistic framework to enhance decision-making by combining prior knowledge with new evidence.

The concordance rate for surgeons was higher than for non-surgical specialists and general dentists. Despite their limitations, both the computer-aided system built on a lesion database and Bayes' theorem (ORAD) and the untrained LLM showed promising results in aiding clinicians with the differential diagnosis of odontogenic tumours and OKC. These findings support the potential integration of these tools into the diagnostic process, with room for improvement. A multimodal system that amalgamates LLMs and CNNs, possibly incorporating the tenets of Bayes' theorem, offers a promising avenue.

Conflict of interest

The authors have no conflicts of interest to declare. All co-authors have seen and agree with the contents of the manuscript and there is no financial interest to report.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author Contribution

Authors Paul Kim, Benedict Seo and Harsha De Silva made substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data; or the creation of new software used in the work; drafted the work or revised it critically for important intellectual content; approved the version to be published; and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Data availability

Data is provided within the manuscript or supplementary information files.

International Agency for Research on Cancer; 2022. WHO Classification of Tumours Editorial Board. Head and neck tumours. 5th ed. Vol. 9. WHO ; 2022.
Ahire MS, Tupkari J V., Chettiankandy TJ, Thakur A, Agrawal RR. Odontogenic tumors: A 35-year retrospective study of 250 cases in an Indian (Maharashtra) teaching institute. Indian J Cancer. 2018 Jul 1;55(3):265–72.
EL-Gehani R, Orafi M, Elarbi M, Subhashraj K. Benign tumours of orofacial region at Benghazi, Libya: A study of 405 cases. Journal of Cranio-Maxillofacial Surgery. 2009 Oct;37(7):370–5.
Becconsall-Ryan K, Love RM. Range and demographics of radiolucent jaw lesions in a New Zealand population. J Med Imaging Radiat Oncol. 2011 Feb;55(1):43–51.
Kelloway E, Ha WN, Dost F, Farah CS. A retrospective analysis of oral and maxillofacial pathology in an Australian adult population. Aust Dent J. 2014;59(2):215–20.
Ha WN, Kelloway E, Dost F, Farah CS. A retrospective analysis of oral and maxillofacial pathology in an Australian paediatric population. Aust Dent J. 2014;59(2):221–5.
Silveira FM, Soares Macedo CC, Vieira Borges CM, Mauramo M, Uchoa Vasconcelos AC, Soares AB, et al. Odontogenic tumors: An 11‐year international multicenter study. 2020;
PINDBORG JJ. Histological typing of odontogenic tumours, jaw cysts and allied lesions. International histological classification of tumours, No 5 [Internet]. 1971 [cited 2023 Sep 18];31–4. Available from: https://cir.nii.ac.jp/crid/1573950399901219200.bib?lang=en
KRAMER I. R. International histological classification of tumors : histological typing of odontogenic tumors. World Health Organization [Internet]. 1992 [cited 2023 Sep 18]; Available from: https://cir.nii.ac.jp/crid/1572543025372464256.bib?lang=en
Barnes L, Eveson J, Reichart P, Sidransky D. World Health Organization classification of tumours: pathology and genetics of head and neck tumours. 2005;
Soluk-Tekkesin M, Wright JM. The World Health Organization Classification of Odontogenic Lesions: A Summary of the Changes of the 2022 (5th) Edition. Vol. 38, Turk Patoloji Dergisi. Federation of Turkish Pathology Societies; 2022. p. 168–84.
Vered M, Wright JM. Update from the 5th Edition of the World Health Organization Classification of Head and Neck Tumors: Odontogenic and Maxillofacial Bone Tumours. Head Neck Pathol. 2022 Mar 1;16(1):63–75.
El-Naggar AK. WHO classification of head and neck tumours. International Agency; 2017.
Small IA, Waldron CA. Ameloblastomas of the jaws. Oral Surgery, Oral Medicine, Oral Pathology [Internet]. 1955;8(3):281–97. Available from: https://www.sciencedirect.com/science/article/pii/0030422055903509
Jattan R, De Silva HL, De Silva RK, Rich AM, Love RM. A case series of odontogenic keratocysts from a New Zealand population over a 20-year period. New Zealand Dental Journal. 2011;107(4).
Nakamura N, Higuchi Y, Mitsuyasu T, Sandra F, Ohishi M. Comparison of long-term results between different approaches to ameloblastoma. Oral Surg Oral Med Oral Pathol Oral Radiol Endod. 2002;93(1):13–20.
Reichart PA, Philipsen HP, Sonner S. Ameloblastoma: biological profile of 3677 cases. Eur J Cancer B Oral Oncol. 1995;31(2):86–99.
Takahashi S, Idaira Y, Sato T, Asada Y, Nakagawa Y. Unicystic ameloblastoma in a child treated with a combination of conservative surgery and orthodontic treatment: A case report. Journal of Clinical Pediatric Dentistry. 2019;43(2):121–5.
Sano K, Yoshimura H, Tobita T, Kimura S, Imamura Y. Spontaneous eruption of involved second molar in unicystic ameloblastoma of the mandible after marsupialization followed by enucleation: A case report. Journal of Oral and Maxillofacial Surgery. 2013 Jan;71(1):66–71.
Singh AK, Khanal N, Chaulagain R, Bhujel N, Singh RP. How effective is 5-Fluorouracil as an adjuvant in the management of odontogenic keratocyst? A systematic review and meta-analysis. Vol. 60, British Journal of Oral and Maxillofacial Surgery. Churchill Livingstone; 2022. p. 746–54.
Winters R, Garip M, Meeus J, Coropciuc R, Politis C. Safety and efficacy of adjunctive therapy in the treatment of OKC (odontogenic keratocyst): a systematic review. British Journal of Oral and Maxillofacial Surgery [Internet]. 2023 Apr; Available from: https://linkinghub.elsevier.com/retrieve/pii/S0266435623001146
Johnson NR, Batstone MD, Savage NW. Management and recurrence of keratocystic odontogenic tumor: A systematic review. Vol. 116, Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology. 2013.
Patel KJ, De Silva HL, Tong DC, Love RM. Concordance between clinical and histopathologic diagnoses of oral mucosal lesions. Journal of Oral and Maxillofacial Surgery. 2011 Jan;69(1):125–33.
Seoane J, Varela-Centelles PI, Ramírez JR, Cameselle-Teijeiro J, Romero MA. Artefacts in oral incisional biopsies in general dental practice: A pathology audit. Oral Dis. 2004 Mar;10(2):113–7.
Soyele OO, Aborisade A, Adesina OM, Olatunji A, Adedigba M, Ladeji AM, et al. Concordance between clinical and histopathologic diagnosis and an audit of oral histopathology service at a Nigerian tertiary hospital. Pan African Medical Journal. 2019;34.
Seifi S, Hoseini SR, Bijani A. Evaluation of clinical versus pathological difference in 232 cases with oral lesion.
Chen S, Forman M, Sadow PM, August M. The Diagnostic Accuracy of Incisional Biopsy in the Oral Cavity. Journal of Oral and Maxillofacial Surgery. 2016 May 1;74(5):959–64.
Seo B, Hussaini HM, Rich AM. Second opinion oral pathology referrals in New Zealand. Pathology [Internet]. 2017 Apr 1;49(3):277–84. Available from: https://doi.org/10.1016/j.pathol.2016.11.007
Hammond MEH, Stehlik J, Drakos SG, Kfoury AG. Bias in Medicine: Lessons Learned and Mitigation Strategies. Vol. 6, JACC: Basic to Translational Science. Elsevier Inc.; 2021. p. 78–85.
Croskerry P. 50 Cognitive and Affective Biases in Medicine (alphabetically).
O’Sullivan ED, Schofield SJ. Cognitive bias in clinical medicine. Vol. 48, Journal of the Royal College of Physicians of Edinburgh. Royal College of Physicians of Edinburgh; 2018. p. 225–32.
Ding H, Wu J, Zhao W, Matinlinna JP, Burrow MF, Tsoi JKH. Artificial intelligence in dentistry—A review. Vol. 4, Frontiers in Dental Medicine. Frontiers Media S.A.; 2023.
Warin K, Limprasert W, Suebnukarn S, Inglam S, Jantana P, Vicharueang S. Assessment of deep convolutional neural network models for mandibular fracture detection in panoramic radiographs. Int J Oral Maxillofac Surg. 2022 Nov 1;51(11):1488–94.
Warin K, Limprasert W, Suebnukarn S, Jinaporntham S, Jantana P. Performance of deep convolutional neural network for classification and detection of oral potentially malignant disorders in photographic images. Int J Oral Maxillofac Surg. 2022 May 1;51(5):699–704.
Warin K, Limprasert W, Suebnukarn S, Jinaporntham S, Jantana P, Vicharueang S. AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer. PLoS One. 2022 Aug 1;17(8).
Warin K, Limprasert W, Suebnukarn S, Paipongna T, Jantana P, Vicharueang S. Maxillofacial fracture detection and classification in computed tomography images using convolutional neural network-based models. Sci Rep [Internet]. 2023 Dec 1 [cited 2023 Sep 14];13(1). Available from: https://pubmed.ncbi.nlm.nih.gov/36859660/
Alabi RO, Youssef O, Pirinen M, Elmusrati M, Mäkitie AA, Leivo I, et al. Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future—A systematic review. Vol. 115, Artificial Intelligence in Medicine. Elsevier B.V.; 2021.
Hung K, Montalvao C, Tanaka R, Kawai T, Bornstein MM. The use and performance of artificial intelligence applications in dental and maxillofacial radiology: A systematic review. Dentomaxillofacial Radiology. 2019;49(1).
Hung KF, Yeung AWK, Bornstein MM, Schwendicke F. Personalized dental medicine, artificial intelligence, and their relevance for dentomaxillofacial imaging. Vol. 52, Dentomaxillofacial Radiology. NLM (Medline); 2023. p. 20220335.
Heo MS, Kim JE, Hwang JJ, Han SS, Kim JS, Yi WJ, et al. DMFR 50th anniversary: Review article. Artificial intelligence in oral and maxillofacial radiology: What is currently possible? Vol. 50, Dentomaxillofacial Radiology. British Institute of Radiology; 2020.
Calazans MAA, Ferreira FABS, Alcoforado M de LMG, Santos A dos, Pontual A dos A, Madeiro F. Automatic Classification System for Periapical Lesions in Cone-Beam Computed Tomography. Sensors. 2022 Sep 1;22(17).
Kwon O, Yong TH, Kang SR, Kim JE, Huh KH, Heo MS, et al. Automatic diagnosis for cysts and tumors of both jaws on panoramic radiographs using a deep convolution neural network. Dentomaxillofacial Radiology. 2020 Jun 11;49(8).
Poedjiastoeti W, Suebnukarn S. Application of convolutional neural network in the diagnosis of Jaw tumors. Healthc Inform Res. 2018 Jul 1;24(3):236–41.
Endres MG, Hillen F, Salloumis M, Sedaghat AR, Niehues SM, Quatela O, et al. Development of a deep learning algorithm for periapical disease detection in dental radiographs. Diagnostics. 2020 Jun 1;10(6).
Wellwood JM, Spiegelhalter DJ. Computers and the diagnosis of acute abdominal pain. Br J Hosp Med. 1989;41(6):564–7.
Rana M, Modrow D, Keuchel J, Chui C, Rana M, Wagner M, et al. Development and evaluation of an automatic tumor segmentation tool: A comparison between automatic, semi-automatic and manual segmentation of mandibular odontogenic cysts and tumors. Journal of Cranio-Maxillofacial Surgery. 2015 Apr 1;43(3):355–9.
Abdolali F, Zoroofi RA, Otake Y, Sato Y. Automated classification of maxillofacial cysts in cone beam CT images using contourlet transformation and Spherical Harmonics. Comput Methods Programs Biomed. 2017 Feb 1;139:197–207.
Mikulka J, Gescheidtová E, Kabrda M, Peřina V. Classification of Jaw Bone Cysts and Necrosis via the Processing of Orthopantomograms.
Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study. Int J Environ Res Public Health. 2023 Feb 1;20(4).
Sezgin E, Sirrianni J, Linwood SL. Operationalizing and Implementing Pretrained, Large Artificial Intelligence Linguistic Models in the US Health Care System: Outlook of Generative Pretrained Transformer 3 (GPT-3) as a Service Model. JMIR Med Inform. 2022 Feb 1;10(2).
Wu X, Chen J, Yun D, Yuan M, Liu Z, Yan P, et al. Effectiveness of an Ophthalmic Hospital-Based Virtual Service during the COVID-19 Pandemic. Ophthalmology. 2021 Jun 1;128(6):942–5.
Luo J, Lan L, Yang D, Huang S, Li M, Yin J, et al. Early Prediction of Organ Failures in Patients with Acute Pancreatitis Using Text Mining. Sci Program. 2021;2021.
Zeng J, Gensheimer MF, Rubin DL, Athey S, Shachter RD. Uncovering interpretable potential confounders in electronic medical records. Nat Commun. 2022 Dec 1;13(1).
Hatton GE, Pedroza C, Kao LS. Bayesian Statistics for Surgical Decision Making. Vol. 22, Surgical Infections. Mary Ann Liebert Inc.; 2021. p. 620–5.
Wiener F, Laufer D, Ribak A. Computer-aided diagnosis of odontogenic lesions. International Journal of Oral and Maxillofacial Surgery [Internet]. 1986 [cited 2023 Sep 15]; Available from: https://www.sciencedirect.com/science/article/pii/S0300978586800655
White SC. Computer-aided differential diagnosis of oral radiographic lesions. Dentomaxillofac Radiol [Internet]. 1989 [cited 2023 Sep 14];18(2):53–9. Available from: https://pubmed.ncbi.nlm.nih.gov/2699592/
Iwasaki H. Bayesian belief network analysis applied to determine the progression of temporomandibular disorders using MRI. Dentomaxillofacial Radiology. 2015 Apr 1;44(4).
Chai ZK, Mao L, Chen H, Sun TG, Shen XM, Liu J, et al. Improved Diagnostic Accuracy of Ameloblastoma and Odontogenic Keratocyst on Cone-Beam CT by Artificial Intelligence. Front Oncol. 2022 Jan 27;11.
Yang H, Jo E, Kim HJ, Cha IH, Jung YS, Nam W, et al. Deep learning for automated detection of cyst and tumors of the jaw in panoramic radiographs. J Clin Med. 2020 Jun 1;9(6):1–14.
Liu Z, Liu J, Zhou Z, Zhang Q, Wu H, Zhai G, et al. Differential diagnosis of ameloblastoma and odontogenic keratocyst by machine learning of panoramic radiographs. Int J Comput Assist Radiol Surg. 2021 Mar 1;16(3):415–22.
Bispo MS, de Queiroz Pierre MLG, Apolinário AL, dos Santos JN, Junior BC, Neves FS, et al. Computer tomographic differential diagnosis of ameloblastoma and odontogenic keratocyst: Classification using a convolutional neural network. Dentomaxillofacial Radiology. 2021 Oct 1;50(7).
Ariji Y, Yanashita Y, Kutsuna S, Muramatsu C, Fukuda M, Kise Y, et al. Automatic detection and classification of radiolucent lesions in the mandible on panoramic radiographs using a deep learning object detection technique. Oral Surg Oral Med Oral Pathol Oral Radiol. 2019 Oct 1;128(4):424–30.
Lee A, Kim MS, Han SS, Park PG, Lee C, Yun JP. Deep learning neural networks to differentiate Stafne’s bone cavity from pathological radiolucent lesions of the mandible in heterogeneous panoramic radiography. PLoS One. 2021 Jul 1;16(7).
Ezhov M, Gusarev M, Golitsyna M, Yates JM, Kushnerev E, Tamimi D, et al. Clinically applicable artificial intelligence system for dental diagnosis with CBCT. Sci Rep. 2021 Dec 1;11(1).
Bittencourt MAV, Mafra PH de S, Julia RS, Travençolo BAN, Silva PUJ, Blumenberg C, et al. Accuracy of computer-aided image analysis in the diagnosis of odontogenic cysts: A systematic review. Med Oral Patol Oral Cir Bucal. 2021;26(3):e368–78.
Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S, et al. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int J Oral Sci [Internet]. 2023 Jul 28;15(1):29. Available from: https://www.nature.com/articles/s41368-023-00239-y

Table 1. Concordance between clinicians and ORAD, and between clinicians and CHAT-GPT; and concordance between clinical and histopathologic diagnosis when ORAD/CHAT-GPT have a concordant diagnosis with clinicians.

	Concordant, n (%)	Partial, n (%)	Discordant, n (%)
Concordance between clinicians and ORAD, and between clinicians and CHAT-GPT
ORAD and Clinician	172 (44.44)	141 (36.43)	74 (19.12)
CHAT-GPT and Clinician	233 (60.20)	111 (28.68)	33 (8.53)
Concordance between clinical and histopathologic diagnosis when ORAD/CHAT-GPT have a concordant diagnosis with clinicians
ORAD and Clinician concordant	150 (72.46)	13 (6.28)	44 (21.26)
CHAT-GPT and Clinician concordant	199 (59.58)	43 (12.87)	92 (27.55)

Table 2. Odds ratio of Histologically Concordant diagnosis

	Odds ratio	P value
Clinician and AI Partial Concordant = Discordant *
Clinician + ORAD	2.2	2.218 × 10^-8
Clinician + CHAT	1.6	0.001
Surgeon + ORAD	2.2	9.310 × 10^-7
Surgeon + CHAT	1.5	0.025
Non-Surgeon + ORAD	2.1	0.028
Non-Surgeon + CHAT	1.8	0.083
Clinician CHAT + ORAD	2.8	8.994 × 10^-13
Clinician Partial = Concordant, and AI Partial = Discordant *
Clinician + ORAD	1.9	2.11 × 10^-5
Clinician + CHAT	1.4	0.014
Clinician CHAT + ORAD	2.2	7.67 × 10^-8
Histological concordance when Clinician and ORAD/CHAT are concordant (Partial = Discordant) **
ORAD and Clinician concordant	3.4	4.66 × 10^-13
CHAT and Clinician concordant	1.9	2.678 × 10^-6

*422 specimens with sufficient clinical and radiological data were included in the analysis.

** Numbers included as per Table 1:Concordance between clinical and histopathologic diagnosis when ORAD/CHAT-GPT have concordant diagnosis with Clinicians.
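
For readers who want to see how an odds ratio of the kind reported in Table 2 can be derived from a 2×2 table, a minimal Python sketch follows. It is an illustration only: the study states that its analysis was run in SPSS and the exact model is not restated here, so the unadjusted cross-product ratio, the Woolf (log-scale) confidence interval, the function name `odds_ratio`, and the cell counts are all assumptions, not the authors' method or data.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: AI agrees with clinician, histologically concordant
    b: AI agrees with clinician, not histologically concordant
    c: AI disagrees with clinician, histologically concordant
    d: AI disagrees with clinician, not histologically concordant
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, NOT taken from the study
estimate, lower, upper = odds_ratio(20, 10, 10, 20)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An odds ratio above 1 whose interval excludes 1 would indicate that agreement between the clinician and the tool is associated with a histologically concordant diagnosis.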

Table 3. Diagnostic accuracy for Ameloblastoma and Locally Aggressive/Malignant

	Clinician (%)	ORAD (%)	CHAT-GPT (%)
Ameloblastoma*
Sensitivity	66.67	66.67	75.93
Specificity	86.75	84.94	83.94
Accuracy	83.17	81.68	82.51
F1	58.54	56.47	60.74
PPV	48.98	50.62	50.62
NPV	92.16	94.14	94.14
Aggressive/malignant**
Sensitivity	54.09	65.84	74.73
Specificity	94.33	93.26	94.09
Accuracy	67.54	75.00	81.20
F1	68.93	77.81	84.11
PPV	95.12	96.18	96.18
NPV	57.80	65.14	65.14

* Data for Ameloblastoma and OKC, Partial concordance included as positive diagnosis.

** Locally Aggressive or Malignant is Positive and Benign/Indolent or normal anatomy is negative.
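
The measures in Table 3 are standard functions of a 2×2 confusion matrix. The short Python sketch below shows the usual definitions. The function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the study data, and the sketch assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard 2x2 diagnostic metrics as percentages.

    tp/fp/fn/tn are true-positive, false-positive, false-negative
    and true-negative counts against the histopathologic diagnosis.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    metrics = {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Accuracy": accuracy,
        "F1": f1,
        "PPV": ppv,
        "NPV": npv,
    }
    return {name: round(100 * value, 2) for name, value in metrics.items()}


# Hypothetical counts, NOT taken from the study
for name, value in diagnostic_metrics(tp=30, fp=20, fn=10, tn=140).items():
    print(f"{name}\t{value}")
```

Because F1 depends only on sensitivity and PPV, it can be recomputed from those two rows of a table as a consistency check.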

No competing interests reported.

DataforJOMS.xlsx

Editorial decision: Revision requested
02 Jul, 2024
Reviews received at journal
24 Jun, 2024
Reviewers agreed at journal
23 Jun, 2024
Reviewers agreed at journal
02 May, 2024
Reviewers agreed at journal
03 Apr, 2024
Reviewers invited by journal
29 Mar, 2024
Submission checks completed at journal
20 Mar, 2024
Editor assigned by journal
20 Mar, 2024
First submitted to journal
16 Mar, 2024
