Development and Validation of a CBCT-Based Artificial Intelligence System for Accurate Diagnoses of Dental Diseases

Cone-beam computed tomography (CBCT) is becoming increasingly popular in dental practice. However, correct tooth identification, positioning, and diagnosis based on CBCT can be tedious and challenging for the untrained eye, since analysis and diagnosis require additional training, specific knowledge, and time when compared to conventional dental imaging methods. In this study, we introduce a novel artificial intelligence (AI) system that facilitates analysis and diagnosis. The system is based on deep learning approaches that can localize teeth and define pathologies within three-dimensional CBCT scans. The study showed that the diagnostic performance of the AI system's image interpretation reaches, and sometimes exceeds, clinicians' expertise. In a randomized cross-over trial, we demonstrated a significant improvement in aided diagnostic accuracy for various dental diseases in comparison to a group of radiologists who made unaided decisions. AI can be used both for stand-alone CBCT interpretation and as a decision support system to improve the quality of diagnostics and time efficiency.
In the first study, the ground-truth set-up results showed sensitivity values for human examiners between 0.9318 and 0.9438, while the value for this AI system was 0.9239; there was no statistically significant difference between Diagnocat and the radiologists (P > 0.05). Specificity values were likewise similar between the AI system and the examiners. The second study showed that the AI-aided group had superior performance compared to the unaided group: the group with AI support had an average per-condition sensitivity of 0.85 and specificity of 0.97, as opposed to 0.77 and 0.96 for the unaided group, and the average reading time was 17.55 minutes for the aided group versus 18.74 minutes for the unaided group, a significant difference between the two groups (p < 0.05).


Introduction
Radiological examination is an essential part of patient management in dentistry and is frequently used to supplement and aid clinical diagnosis of pathology related to teeth and adjacent structures [1][2][3][4] . CBCT was proposed for maxillofacial imaging 5,6 during the last decade and is now becoming increasingly popular for such diagnoses. It offers distinct advantages, including lower radiation doses compared to medical CT, and the potential to import and export individualized, overlap-free reconstructed DICOM data to and from other applications [4][5][6][7] . CBCT can supply high-resolution three-dimensional (3D) images without the distortion and superimposition of bone and other dental structures seen in conventional radiography [8][9] .
Several studies have compared the diagnostic accuracy of CBCT with conventional and digital panoramic and periapical radiography. CBCT has been shown to significantly increase the detection rate of tooth root canal spaces and periapical areas for the evaluation of dental infection and pathology compared to conventional imaging. This suggests that CBCT enhances recognition of periapical bone lesions and offers improved diagnostic accuracy, treatment planning, and thus prognostic outcomes. These and other possibilities, along with increasing access to CBCT imaging for dentists, are enabling the transition from 2D to 3D imaging in everyday dental practice [10][11] .
However, using CBCT for diagnostics requires the dentist to be highly educated in radiographic diagnosis, which is not always possible. Because of the average dentist's lack of time or experience, many pathologies can remain unidentified; thus, computer-aided systems have been developed to assist in medical and dental imaging diagnosis [12][13][14][15] .
Artificial intelligence (AI) provides a promising solution for such medical image interpretation. For object detection and segmentation, Convolutional Neural Networks (CNNs) are most commonly used. Several studies have applied deep-learning methods, including CNNs, to assist clinicians in dentistry.
To optimize their use, AI systems must be applicable to real-world situations and must be designed for clinical evaluation and deployment. Furthermore, an important part of the development and integration of these AI systems is that their functionality (ease of use, speed, and accuracy) reaches or exceeds the clinicians' expertise in such situations.
In this study we evaluate a novel AI system called Diagnocat, which is based on deep learning methods, to determine its real-time performance in CBCT imaging diagnosis of anatomical landmarks and pathologies, and its clinical effectiveness and safety when used by dentists in a clinical setting.

Results
The output of this process was a set of condition detections, where for every tooth and condition the model outputs a probability distribution along with the predicted tooth number. This evaluation was calculated only once during this study and was not used for system training purposes. Moreover, during this process the engineers did not have access to the examiners' vote data. To score the results, an independent expert was provided with both the ground truth and the Diagnocat inference data. This outside expert was responsible for running the data analysis and producing performance measurements for the primary and secondary endpoints.
The results of AI evaluation are shown in Tables 1 and 2. Table 1 shows the overall sensitivity and specificity for the system and the dentomaxillofacial radiology examiners. Outcome counting for Table 3 was summarized over case, tooth, and condition, grouped by participant. Overall sensitivity values for human examiners ranged between 0.9318 and 0.9438, while the value for this AI system was 0.9239. Overall specificity values for ground-truth examiners were between 0.9899 and 0.9946, while the value for this AI system was 0.9899. Both sensitivity and specificity were recorded as higher for human examiners. However, since the examiners' evaluations were taken as the ground-truth votes, this comparison is biased in favor of the human examiners. Moreover, the difference in overall performance is rather small, which we interpret as evidence of Diagnocat's standalone capability. Table 2 shows sensitivity and specificity values for the system per condition. Specificity values were high, the lowest being 0.94 for determining a missing tooth. Sensitivity values were condition-dependent, with the lowest values around 0.7 for some difficult or subjective conditions such as evaluating the quality of endodontic treatment (missed canal, short filling, voids in root filling) and signs of dental caries (complex to diagnose using CBCT). Notably, Diagnocat struggled to detect very rare anatomical configurations of the tooth, e.g., 5 canals or 4 roots. Finally, a rare subtype of periapical lesion, periapical radiopacity, did not register in the dataset; this subtype is currently not claimed as a diagnostic capability of the Diagnocat system.

Condition definitions included:

- Overfilling: filling material is visualized beyond the radiographic apex. Should be specified only if a tooth was endodontically treated.
- Periapical lesion: presence of an inflammatory periapical lesion adjacent to one or more roots of a tooth.
- Pontic: there is a pontic restoration in place of a tooth (either base or middle part).
- Post and core: a tooth was restored with a post-and-core restoration.
- Number of roots: number of roots in the tooth (1-4).
- Short filling: root canal filling is short (ends 2 mm or more from the radiographic apex). Should be specified only if a tooth was endodontically treated.
- Voids present in the root filling: a root canal contains voids (spaces that were not filled during previous endodontic treatment). Should be specified only if a tooth was endodontically treated.
In Table 4, the results of the study are presented. Sensitivity and specificity values are shown for aided versus unaided reads for each condition. The lowest sensitivity values for the aided group were 0.1818 and 0.3535, for detecting roots (n = 4) and periodontal bone loss, respectively. The lowest specificity value was 0.8111, for periodontal bone loss. For the unaided group, the lowest sensitivity value was 0.2045, for periapical lesion with poorly defined radiolucency, and the lowest specificity value was 0.7973, for caries. The highest sensitivity and specificity for both groups was 1, for implant detection. Overall sensitivity and specificity for the aided and unaided groups were calculated as an aggregate over all conditions: sensitivity was 0.8537 for the aided group and 0.7672 for the unaided group, while specificity was 0.9672 and 0.9616, respectively. There was a statistically significant difference between the groups (p < 0.05).
Statistical tests revealed that the Diagnocat-aided group had superior performance when scored against the ground truth.

Discussion
The integration of AI into the medical field has dramatically accelerated in the past decade. The use of deep learning advanced almost synchronously in both the medical and dental fields 9 . Previous studies in dentistry focused on image-processing algorithms to achieve high-accuracy classification and segmentation in dental radiographs. They used mathematical morphology, active contour models, level-set methods, Fourier descriptors, textures, Bayesian techniques, linear models, or binary support vector machines 15,26 . However, image components are usually obtained manually with these image-enhancement algorithms. The deep learning method used in this AI system (Diagnocat) yielded fairer outcomes by obtaining image features automatically. Objects detected in an image are classified by a pretrained network without preliminary diagnostics, as a result of processes such as filtering and subdivision. With its direct problem-solving ability, deep learning is used extensively in the medical field. Deep learning methods using CNNs are a cornerstone of medical image analysis 27 . Such methods have been preferred in AI studies in dental radiology as well. Tooth detection, identification, and numeration are the first diagnostic steps in dental radiography. Image-processing algorithms have been developed for classification and segmentation in dental radiographs using mathematical morphology, active contour, or level-set methods. Mahoor et al. 13 presented an automated dental identification system to classify and identify teeth in bitewing radiographs using Bayesian classification. Lin et al. 28 recommended a tooth classification and numbering system to efficiently segment, classify, and number teeth using an image-enhancement technique in bitewing radiographs. Tooth detection and numbering have been researched intensively during the last few decades, mainly using threshold- and region-based techniques.
CNNs, as a popular deep learning method, have also been used to detect and number teeth.
Eun et al. 29 emphasized in their study that localization of teeth is important for dental image applications. They suggested an original tooth localization technique for periapical radiographs by means of oriented tooth detection using a CNN. The results of their study showed that the proposed method localizes teeth effectively.
Similar studies have been done and reported in the literature for CBCT as well. Miki et al. 22 considered an automatic system for classifying teeth into 7 types from axial slices of CBCT images using a CNN. They concluded that a 7-tooth-type classification system can be used efficiently for the automatic charting of tooth lists. Another study, performed by Oktay 23 , described a CNN model modified from the AlexNet architecture for tooth detection in panoramic radiographs. This study defined mouth-gap detection that showed the possible placement of teeth as a pre-processing step. It was concluded that this model could be used efficiently for the detection of teeth. Jader et al. 30 presented a study that used segmentation of teeth with a mask region-based CNN method and transfer learning strategies. In a similar study, Lee et al. 31 used a fully deep learning mask region-based convolutional neural network (R-CNN) method, implemented through a fine-tuning process, for automated tooth segmentation. This technique showed high performance for automatic tooth segmentation on panoramic radiographs. A recently published paper by Chen et al. 15 proposed a deep learning CNN model with a VGG-16 network structure for tooth detection and classification in periapical radiographs, reporting the ability to distinguish teeth in different positions in their experimental findings. Another study, conducted by Tuzoff et al. 32 , used the state-of-the-art Faster R-CNN model for tooth detection and numbering, with VGG-16 Net, a 16-layer CNN architecture, as the base CNN. They concluded that the AI model performed close to the experts' level. In 2018 we published our AI algorithm (later called Diagnocat) 33 that presented coarse-to-fine volumetric segmentation of teeth in CBCT images, which was efficient for handling large volumetric images for tooth segmentation. Diagnocat's approach to diagnosing lesions is based on a deep convolutional neural network using a U-Net-like architecture 34 .
The problem formulation of the study in terms of machine learning tasks was semantic segmentation, including segmenting background and periapical pathology. Specificity and sensitivity metrics were used to evaluate diagnostic performance; to measure the localization capabilities of our model, the binary voxel-wise intersection over union (IoU) of the ground-truth mask and the prediction was used. In the current study, we tested the diagnostic performance of this AI system compared to experienced dentomaxillofacial examiners. Secondly, we compared the diagnostic performance and required diagnostic time of aided and unaided groups in a real-time clinical environment. To the best of our knowledge, no real-time clinical performance trial of CBCT imaging diagnosis has been performed to demonstrate the clinical safety and effectiveness of its use by dentists in clinical settings. In the first study, the ground-truth set-up results showed sensitivity values for human examiners between 0.9318 and 0.9438, while the value for this AI system was 0.9239. There was no statistically significant difference between Diagnocat and the experienced dentomaxillofacial radiologists (P > 0.05). The specificity values were also similar between this AI system and the examiners. The second study showed that the AI-aided group had superior performance compared to the unaided group. The group with AI support had an average per-condition sensitivity of 0.85 and specificity of 0.97, as opposed to 0.77 and 0.96 for the unaided group. The average time was 17.55 minutes for the aided group, while it took 18.74 minutes for the unaided group. There was a significant difference between the two groups (p < 0.05): statistical tests revealed that the AI-aided group had a shorter evaluation time than the unaided group.
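The binary voxel-wise IoU mentioned above can be computed directly on the masks; a minimal numpy sketch (array shapes and names are illustrative, not from the paper):

```python
import numpy as np

def voxel_iou(ground_truth: np.ndarray, prediction: np.ndarray) -> float:
    """Binary voxel-wise intersection over union of two 3D masks."""
    gt = ground_truth.astype(bool)
    pred = prediction.astype(bool)
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    # Convention when both masks are empty: treat as a perfect match.
    return float(intersection / union) if union > 0 else 1.0

# Example: two overlapping 3x3x3 cubes inside a 4x4x4 volume
a = np.zeros((4, 4, 4), dtype=bool); a[:3, :3, :3] = True   # 27 voxels
b = np.zeros((4, 4, 4), dtype=bool); b[1:, 1:, 1:] = True   # 27 voxels
# intersection = 2x2x2 = 8 voxels, union = 27 + 27 - 8 = 46
print(round(voxel_iou(a, b), 4))  # 0.1739
```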
The results showed that the evaluation process was improved with AI aid: the required diagnosis time was reduced, with better specificity and sensitivity. In conclusion, we presented a novel framework that analyzes clinical CBCT scans and makes referral suggestions to a standard comparable to dental professional expertise. This study showed that the proposed AI system (Diagnocat) significantly improved the diagnostic capabilities of radiologists. Moreover, AI can save time by enabling automatic preparation of dental charts and electronic dental records, together with automated pathology detection.

Methods
Ethics and information governance. Written informed consent was obtained from all patients. Only de-identified, anonymized retrospective data were used for research, without the active involvement of patients.
Testing the system. The primary goal of this study is to evaluate the ability of this AI system (Diagnocat) to enhance the diagnostic capabilities of the dentist and radiologist. In order to test this, a few steps had to be taken to prepare the dataset for viewing and analysis. These steps are necessary due to the inherent variability of CBCT datasets coming from CBCT machines, as well as the variability in clinical experience on the part of the examiners. Thus, this study has two distinct parts: the first was preparing the dataset for evaluation, and the second was evaluating the usefulness of the system for enhancing diagnostic capabilities.
Part (A): Preparing the dataset for evaluation: 1. Image processing. Due to the high variety of CBCT scanning devices and different calibration settings, CBCT images need to be normalized for both manual and automatic diagnostics. This is usually done with the help of the window level and width DICOM properties extracted from the scan metadata. Unfortunately, the radiodensity of bone and tissue in scans from the same scanning device manufacturer differs when the extracted window is applied, and the difference is significantly higher when corresponding windows are applied to images from different devices. We apply a normalization process based on voxel radiodensity measured in Hounsfield Units (HU): 2. HU values below −1000 (air radiodensity) are clipped.
3. HU values below the 5th and above the 95th percentiles of an image are clipped. A dataset with a segmented enamel area is used to obtain the first point, while the dataset with segmented alveolar bone is used to obtain the two latter points.
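The HU normalization steps above can be sketched as follows; the final rescaling to [0, 1] is an assumption not stated in the text:

```python
import numpy as np

def normalize_hu(volume: np.ndarray) -> np.ndarray:
    """Normalize a CBCT volume of Hounsfield Units: clip values below
    -1000 (air radiodensity), then clip to the 5th-95th percentile range,
    and rescale to [0, 1] (the rescaling is an assumed final step)."""
    v = np.clip(volume, -1000, None)      # step 2: clip air radiodensity
    lo, hi = np.percentile(v, [5, 95])    # step 3: percentile clipping
    v = np.clip(v, lo, hi)
    if hi <= lo:                          # degenerate (constant) volume
        return np.zeros_like(v, dtype=float)
    return (v - lo) / (hi - lo)

# Toy volume with HU-like values
scan = np.random.default_rng(0).normal(300, 800, size=(8, 8, 8))
out = normalize_hu(scan)
print(out.min(), out.max())  # 0.0 1.0
```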
-Caries localization module. The dataset consists of 4398 tooth volumes with a context area. The class labels are: background (no pathology), caries sign, metallic artefact, and non-contrast filling. One instance can have multiple conditions. The dataset was additionally validated by a lead radiologist.
-Periapical lesion localization module. The dataset consists of 2800 tooth volumes with a context area. The class labels are: background (no pathology), periodontal ligament (PDL) widening, poorly circumscribed radiolucency, well circumscribed radiolucency, and radiopacity. One instance can have multiple conditions. The dataset was additionally validated by a lead radiologist.
1. Classification (Descriptor) datasets. Descriptor, the main diagnostic module, is a complex model that, besides accurate data collection, requires several iterations of dataset formation and annotation regulations. We provide a detailed description of the annotation process and insights into managing class imbalance and high model uncertainty.
-Annotation protocol. Every radiologist was provided with an instruction describing annotation, including a list of required pathologies, access to the internal web-based application that provided a data collection form, and an option to download the study DICOM for standalone viewing. Additionally, every radiologist reviewed and described 3 sample CBCTs containing all target pathologies, which were then reviewed by the study supervisor, a highly experienced oral and maxillofacial radiologist, who provided feedback to the radiologist. Each radiologist independently studied a CBCT image in clinical viewer software and noted the presence or absence of each condition for each tooth in the target list.
Radiologists were required to answer either "applicable", or "not applicable" for every condition in table 4.
-Initial annotation. During the first stage of the annotation process, a group of experienced radiologists annotated a large set of images following the annotation protocol. Images were randomly sampled, filtered by the study coordinator according to the inclusion and exclusion criteria, and then passed to the radiologists. Before the main annotation process, annotators were trained and evaluated by the study coordinator:

- The participant studied the annotation instruction and protocol.
- The participant annotated a small set of exemplary images; the study coordinator evaluated the results and provided feedback to the participant.

During this stage each sample (a distinct patient-tooth) received 1 diagnostic vote for every condition in consideration.
-Test set separation. Following the completion of the first stage of annotation, a test set was separated from the annotated data pool and excluded from all following development activities. Test images were sampled so as to have at least N positives and N negatives for every condition. The choice of N = 300 was motivated by the available number of positive samples for rare conditions. The sampling procedure was as follows.
1. Randomly sample a condition.
2. If the test set contained less than N positives of the condition, sample a random positive example from the data pool and allocate it to the test set. Additionally, allocate all other samples from the same image.
3. Repeat until the test set contains at least N positives and N negatives for each condition.
Each sampled example contains annotations for all target conditions, so the resulting test set contains more than N positives/negatives for the majority of conditions. Additionally, the test set contains a different number of positives and negatives for each condition, with negatives typically outnumbering the positives (class imbalance). This influenced our decision to choose the AUPRC metric for evaluation, as it is robust to significant class imbalance.
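The sampling procedure above can be sketched as follows. This simplified variant also tops up negatives explicitly, which is an assumption; the data layout (`image` ids, `labels` dicts) is illustrative:

```python
import random

def build_test_set(pool, conditions, n_required, seed=0):
    """Sketch of the test-set sampling: repeatedly pick a condition/value
    still lacking examples in the test set, move a random matching sample
    from the pool, and allocate all samples from the same image with it.
    `pool` items are dicts: {"image": image_id, "labels": {condition: bool}}."""
    rng = random.Random(seed)
    test_set = []

    def count(cond, value):
        return sum(s["labels"][cond] == value for s in test_set)

    progress = True
    while progress:
        progress = False
        for cond in conditions:
            for value in (True, False):
                if count(cond, value) >= n_required:
                    continue
                candidates = [s for s in pool if s["labels"][cond] == value]
                if not candidates:
                    continue  # pool exhausted for this condition/value
                chosen = rng.choice(candidates)
                # step 2: allocate all other samples from the same image
                test_set += [s for s in pool if s["image"] == chosen["image"]]
                pool = [s for s in pool if s["image"] != chosen["image"]]
                progress = True
    return test_set, pool

# Toy pool: 6 images, each with one caries-positive and one negative tooth
pool = [{"image": i // 2, "labels": {"caries": i % 2 == 0}} for i in range(12)]
test_set, remaining = build_test_set(pool, ["caries"], n_required=2)
print(sum(s["labels"]["caries"] for s in test_set))  # 2
```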
-Test set additional annotation. An additional vote from a second radiologist was obtained for each tooth-condition (sample). Then, for samples where the first two radiologists disagreed, another vote from a third radiologist was obtained. Ground truth was decided by majority vote (2-vote agreement).
-Model development dataset. A set for model development purposes, formed from the remaining annotated data pool (i.e. not included in the test set), was split into training and validation subsets as fit for the task. As the majority of examples in the train set had only 1 vote, it was expected that some labels would be incorrect. However, deep learning is known to have some level of robustness against noisy training labels, and we hypothesized that the models would be able to learn the correct labels and achieve satisfactory scores. Additionally, a partially trained model could be used to find and correct erroneous votes by measuring disagreement between votes and model predictions. In the course of this project, this hypothesis was confirmed. While samples with 1 vote were widely used in the train set, model validation was performed using the standard 2-vote agreement protocol.
-Rare case mining. Following the separation of the train set, a series of models was trained. Then, the best model was used to enrich the train set by mining rare cases and finding potentially erroneous votes in the train set. Initially, rare conditions did not contain enough positive examples in the train set. To rectify this, the following mining procedure was implemented: 1. Define a list of rare conditions for which additional data is required.
2. Perform inference of the best model available at a time on studies from the non-annotated data pool.
3. Calculate information entropy for every condition in the rare condition list. 4. Sample teeth with high information entropy.

5. Run images containing the sampled teeth through the annotation process.
Information entropy is defined as H = -Σi Pi log(Pi), where Pi is the probability of the i-th outcome of a set of all possible outcomes. For a binary task, such as our formulation, i iterates over "present" and "not present". Information entropy is highest when the probabilities of "present" and "not present" are both equal to 0.5. Intuitively, information entropy is a measure of uncertainty in a probability distribution. High uncertainty on an example excluded from the training and validation sets means that the training process is likely to improve if the example is annotated and added to the training set.
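The entropy-based mining can be sketched as follows (the probabilities and tooth identifiers are illustrative):

```python
import math

def binary_entropy(p_present: float) -> float:
    """Information entropy H = -sum_i P_i * log2(P_i) for a binary
    prediction, where the outcomes are "present" and "not present"."""
    outcomes = (p_present, 1.0 - p_present)
    return -sum(p * math.log2(p) for p in outcomes if p > 0)

# Entropy is maximal (1 bit) when the model is maximally uncertain
print(binary_entropy(0.5))   # 1.0
print(binary_entropy(0.95))  # low: a confident prediction

# Rare-case mining: rank teeth by prediction uncertainty, most uncertain first
predictions = {"tooth_14": 0.51, "tooth_25": 0.97, "tooth_36": 0.08}
uncertain = sorted(predictions, key=lambda t: -binary_entropy(predictions[t]))
print(uncertain[0])  # tooth_14
```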
-Incorrect vote mining and rectification. Since we collected only 1 vote for a large number of images allocated to the train set, some of these votes were submitted incorrectly. To rectify this, we implemented the following procedure: 1. Perform K-fold inference on all images in the train set using the best model available at the time. 2. Sample from cases where radiologist-model disagreement was high.
3. Collect additional votes for sampled cases using the annotation process.

The cropped image is further passed to the tooth localization and numeration module (fig. 1), which plays a crucial role in the diagnostic pipeline. Tooth localization allows further analysis of different conditions inside and around a tooth, while tooth numeration helps with determining number-specific attributes and inter-tooth relations. The localization and numeration module is implemented as a volumetric U-Net performing semantic segmentation on 54 classes (the background, 52 possible teeth, and an additional class for supernumerary teeth). It operates at 1 mm3 per voxel resolution.
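The 54-class segmentation output described above can be split into per-tooth binary masks for the downstream per-tooth modules; a minimal numpy sketch (the class-to-tooth-number mapping shown is assumed, not taken from the paper):

```python
import numpy as np

def extract_tooth_masks(segmentation: np.ndarray):
    """Split a labeled volume from the 54-class localizer into per-tooth
    binary masks. Class 0 is background; non-zero classes are teeth
    (plus the supernumerary class). Returns {class_id: boolean mask}."""
    masks = {}
    for cls in np.unique(segmentation):
        if cls == 0:
            continue  # skip background
        masks[int(cls)] = segmentation == cls
    return masks

# Toy labeled volume: two "teeth" (illustrative classes 11 and 12) in an 8^3 grid
seg = np.zeros((8, 8, 8), dtype=np.int64)
seg[0:3, 0:3, 0:3] = 11
seg[5:8, 5:8, 5:8] = 12
masks = extract_tooth_masks(seg)
print(sorted(masks))    # [11, 12]
print(masks[11].sum())  # 27
```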
At the next step each localized tooth area is extended with some context and passed to Descriptor (fig. 1), which defines the probabilities of a tooth being affected by a set of conditions (table 1). Descriptor is the principal classification module and is implemented as an ensemble of ResNeXt 37 networks with integrated squeeze-and-excitation blocks 38 . Each tooth volume is further examined by three modules for auxiliary classification purposes. (1) The periodontitis module detects and evaluates alveolar bone loss in close vicinity to a tooth. It allows classification of 3 bone-loss types of different severity by calculating distances between pairs of periodontium landmarks segmented by a separate landmark localizer. (2) The caries localization module defines signs of caries probability using segmentation of carious lesions found inside a tooth area. (3) The periapical lesion localization module detects periapical lesion presence and allows classification of 4 lesion types found around a tooth. The embedded localizers of the three classification modules are implemented as volumetric U-Nets performing semantic segmentation.
Part (B) Evaluating the ability of the AI system (Diagnocat) to enhance the diagnostic capabilities of the dentist and radiologist: 1. Evaluating the diagnostic capabilities of the Diagnocat AI system. The primary endpoint was to test the end-to-end performance of this AI system, measuring the tooth localization, numeration, and diagnostic sub-modules as a single system. This allowed estimation of the overall safety and performance of the proposed system.
The Diagnocat AI software was used to obtain binary condition predictions on 3D CBCT scans using its predefined operating point (checkpoints of the trained models), which were then compared to the ground truth to calculate sensitivity (proportion of correctly defined conditions) and specificity (proportion of correctly defined teeth not having conditions).
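The sensitivity and specificity definitions above can be computed from paired binary labels; a minimal sketch (the example data is illustrative):

```python
def sensitivity_specificity(ground_truth, predictions):
    """Compute sensitivity (true positive rate) and specificity (true
    negative rate) from parallel sequences of binary labels, as used to
    score per-tooth condition detections against the ground truth."""
    tp = sum(g and p for g, p in zip(ground_truth, predictions))
    tn = sum(not g and not p for g, p in zip(ground_truth, predictions))
    fn = sum(g and not p for g, p in zip(ground_truth, predictions))
    fp = sum(not g and p for g, p in zip(ground_truth, predictions))
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

# 4 ground-truth positives (3 detected) and 4 negatives (3 correctly rejected)
gt   = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(sensitivity_specificity(gt, pred))  # (0.75, 0.75)
```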
The secondary endpoint was to evaluate the examiners' performance and compare it to the AI results. Although the examiners were tested on data that had beforehand been annotated by each of them, the results showed comparable diagnostic quality between Diagnocat and the examiners. For the performance evaluation, a set of 300 CBCT maxillofacial images in DICOM format was sourced consecutively from three clinics (100 images from each site) and anonymized by replacing "PatientName" with an empty string and truncating "PatientBirthDate" to the first day of the nearest year. Subsequently, images were screened against the inclusion and exclusion criteria.
The inclusion criteria were: -a patient with the ability to consent to participate in the project -a patient of 21 years or older -anonymized CBCT image of maxillofacial region, and -both model and manufacturer of imaging device are not present in the training dataset of the system (allows testing generalizability to new imaging devices).
The exclusion criteria were: -images containing significant motion artifacts (as judged by the radiologist coordinating the study) -images containing severe artifacts such as streak artifacts and beam hardening (a low- and medium-artifact remover was applied using device-specific software, when available, for standardization of the images) and; -images of patients with cleft lip and palate, trauma, bone lesions, and severe bone erosions.
The final set of images was then reviewed by a scientific coordinator (an internationally recognized dentomaxillofacial radiologist with at least 18 years of experience), and 10 images were rejected due to significant motion artifacts. To establish the ground truth, examiners were recruited from experienced dentomaxillofacial radiologists. In total, the data was evaluated by four of them, with a mean of 10 years of professional experience.
Each examiner was responsible for the annotation of CBCT anatomy on their own. Moreover, the examiners were unaware of the patients' conditions. Each examiner was then trained by the study coordinator to annotate 3D CBCT scans and fill in the provided form correctly. After the study coordinator evaluated the examiners and approved them as sufficiently trained, the study proceeded to actual data collection. Each radiologist received a random, non-overlapping portion of the dataset via electronic means (a shared folder). They evaluated the cases in their clinical environment, filled in the spreadsheet, and then saved it to a separate shared folder. The examiners could not access each other's forms. After they had cumulatively evaluated and annotated the full CBCT dataset, a second round of annotation started, in which the examiners were assigned a different random subset of the dataset. After the second round was finished, the third commenced. At the end of the third round, the scientific coordinator had collected examinations from 3 radiologists for every sample. Evaluations took place between December 2019 and April 2020.
Data was extracted at the individual and group comparison levels. To establish the true values of conditions, a consensus process was performed, in which the ground truth was taken as the majority (at least 2 of 3 votes) for each case, tooth, and condition. The whole process was then reviewed again by the study coordinator for final adjustments, establishing the final ground-truth evaluation of each patient and tooth, as well as of each condition. Inference of the Diagnocat system was performed once for the full dataset: an engineer performed inference using the production version of the system.
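The majority-vote consensus (at least 2 of 3 votes per case, tooth, and condition) can be sketched as follows (case and tooth identifiers are hypothetical):

```python
from collections import Counter

def majority_vote(votes):
    """Ground-truth consensus: a condition is taken as present when at
    least 2 of the 3 examiner votes mark it as present."""
    counts = Counter(votes)
    return counts[True] >= 2

# Per (case, tooth, condition) triple, three examiner votes:
annotations = {
    ("case1", "tooth_11", "caries"): [True, True, False],
    ("case1", "tooth_11", "implant"): [False, False, False],
}
ground_truth = {k: majority_vote(v) for k, v in annotations.items()}
print(ground_truth[("case1", "tooth_11", "caries")])  # True
```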
1. Evaluating the clinical performance. After evaluation of the diagnostic capability of the Diagnocat AI system, the next step was to evaluate the clinical performance of the system, achieved by comparing diagnostic accuracy and the time required for reading between aided and unaided cases. Evaluation duration was compared between aided and unaided reads to determine whether the addition of Diagnocat suggestions changes the time required to review a case. It was estimated that approximately eight weeks were required to conduct this study, allocated across stages as follows: recruitment and consent - 1 week; training and randomization - 1 week; investigation - 1 week; washout - at least 1 month; investigation - 1 week; analysis - 2 weeks. The washout period of at least 1 month was chosen to minimize memory bias and confounding factors.
The crossover design reduced confounding factors as well. To identify the number of required examiners, a power analysis was performed [40]. At least 20 examiners in total were required to conduct the study.
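A power analysis of this kind is commonly a sample-size calculation for comparing two proportions. The exact assumptions used in reference 40 are not stated here, so the sketch below is purely illustrative: it uses the standard normal-approximation formula, and the plugged-in sensitivities (0.77 unaided vs. 0.85 aided) are taken from the study's reported results rather than from its original planning inputs.

```python
from math import sqrt, erf, ceil

def z_quantile(p, lo=-10.0, hi=10.0):
    # Inverse of the standard normal CDF via bisection (stdlib only).
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two independent proportions."""
    za = z_quantile(1.0 - alpha / 2.0)   # ~1.96 for alpha = 0.05
    zb = z_quantile(power)               # ~0.84 for 80% power
    var = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((za + zb) ** 2 * var / (p1 - p2) ** 2)

n = n_per_group(0.77, 0.85)  # units (tooth-level decisions), not examiners
```

Note that this counts tooth-level diagnostic units per arm, not examiners; converting between the two depends on how many units each examiner evaluates and on the within-examiner correlation, which is study-specific.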
Thus, 24 dentists were enrolled in the study as examiners and divided into two groups at a 1:1 ratio: (1) Group 1 examined the CBCTs with AI system aid; (2) Group 2 examined the CBCTs unaided. A confidence level of 0.80 was used, with a 5% error rate. Enrolled examiners were qualified general dental practitioners of various experience levels with no defined specialty interest. The following inclusion and exclusion criteria were applied: The scientific advisor for CBCT scan reading conducted one-hour training sessions for the examiners, including the use of the Diagnocat AI system. Ten training CBCT scans were used for training and practice purposes; these encompassed the full spectrum of required diagnostics. A list of all possible diagnoses was also given to ensure that the scope of diagnosis was calibrated and that participants were aware of it. Remote support was also available to guide examiners through the training process. The overall dataset for this study contained 40 CBCT images: 30 study images and 10 images for examiner training.
These scans were sampled randomly from the dataset of the standalone performance test. The 30 study images were sequentially numbered after randomization, so each participant received a different sequence of clinical cases. Each CBCT scan required all 32 teeth to be diagnosed with none, one, or more pathologies: 32 units per CBCT, multiplied by the number of pathologies identified in each unit, across 30 CBCT scans per group. In this way, 960 (30 x 32) diagnostic activities were carried out in each investigation by each participant, and the crossover design ensured that this was performed twice by each participant. Table 3 shows the conditions that the examiners were asked to diagnose. Once the investigations were completed, the raw data from the forms filled in by the unaided group and by Diagnocat was transformed into a common format using automated scripts written before the study and then sent to an independent blinded assessor. This assessor analyzed the data and compared it with the ground truth (the same as in the standalone Diagnocat performance test). Raw data was compared to the ground truth electronically; scoring was performed via electronic means and the data stored securely. Once this was completed, the groups were decoded and the results compared.
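The blinded assessor's electronic comparison of examiner answers against the consensus ground truth amounts to tallying a confusion matrix per condition. The following is a minimal sketch under assumed data shapes (sets of positive `(case, tooth, condition)` triples over a fixed list of evaluated units); the actual scripts used in the study are not described in that detail.

```python
def sens_spec(pred, truth, units):
    """Sensitivity and specificity of one examiner's answers against the
    consensus ground truth.

    `pred` and `truth` are sets of (case, tooth, condition) triples marked
    positive; `units` is the full list of evaluated triples (e.g. 30 scans
    x 32 teeth x each condition).
    """
    tp = fn = fp = tn = 0
    for u in units:
        if u in truth:
            if u in pred: tp += 1   # true positive
            else:         fn += 1   # missed finding
        else:
            if u in pred: fp += 1   # false alarm
            else:         tn += 1   # correct negative
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

# Hypothetical four-unit example: one hit, one miss, one false alarm.
units = [("c1", t, "caries") for t in (1, 2, 3, 4)]
truth = {("c1", 1, "caries"), ("c1", 2, "caries")}
pred  = {("c1", 1, "caries"), ("c1", 3, "caries")}
sens, spec = sens_spec(pred, truth, units)
```

Running the same tally once per condition and averaging across conditions yields the "averaged condition sensitivity" figures reported for the aided and unaided groups.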

Declarations

Acknowledgement
Since the present study was conducted using retrospective radiologic images, it is not subject to the "registration" and "clinical trial number" procedures required for clinical trials.

Table 3. Conditions the examiners were asked to diagnose.

Caries signs: Signs of dental caries (cases where caries is certain, and there is a low chance of confusion with a metallic artifact or non-contrast filling).
Crown has defect over 50%: A crown is largely destroyed; at least 50% of the crown is missing.
Endodontically treated tooth: A tooth displays signs of previous endodontic treatment.
Filling: A crown was restored with a filling.
Impaction: A tooth is impacted (unerupted).
Implant: There is an implant in place of a tooth.
Missed canal: A root canal was missed (not filled) during endodontic treatment. Should be specified only if a tooth was endodontically treated.
Missing: Absence of a tooth, implant, or pontic under the specified number, including both extracted teeth and teeth not visible in the FoV of the image.
Overfilling: Filling material is visualized beyond a radiographic apex. Should be specified only if a tooth was endodontically treated.
Periapical lesion: Presence of an inflammatory periapical lesion adjacent to one or more roots of a tooth.
Pontic: There is a pontic restoration in place of a tooth (either base or middle part).
Post and core: A tooth was restored with a post and core restoration.
Number of roots: Number of roots in the tooth (1-4).
Short filling: Root canal filling is short (ends 2 mm or more from a radiographic apex). Should be specified only if a tooth was endodontically treated.
Voids present in the root filling: A root canal contains voids (spaces that were not filled during previous endodontic treatment). Should be specified only if a tooth was endodontically treated.