The Automatic Detection of Caries in Third Molars on Panoramic Radiographs Using Deep Learning: A Pilot Study

The objective of this study was to assess the diagnostic accuracy of deep-learning algorithms for the detection of dental caries on panoramic radiographs. A convolutional neural network (CNN) based on MobileNet V2 was trained on a reference data set consisting of 400 cropped panoramic images to detect carious lesions in mandibular and maxillary third molars. For this pilot study, the trained MobileNet V2 was applied to a test set consisting of 100 cropped OPG(s). The detection accuracy and the area under the curve (AUC) were calculated. The proposed method achieved an accuracy of 0.87, a sensitivity of 0.87, a specificity of 0.86 and an AUC of 0.90 for the detection of carious lesions of third molars on OPG(s). A high diagnostic accuracy was achieved in caries detection in third molars with the MobileNet V2 algorithm as presented. This is beneficial for the further development of a deep-learning-based automated third molar removal assessment in the future.


Introduction
The removal of third molars is one of the most commonly performed surgical procedures in oral surgery. Recent guidelines recommend the removal of pathologically erupting third molars in order to prevent future complications [1,2]. The second molar frequently disrupts the eruption path of the third molar, causing it to erupt only partially or not at all, which can adversely affect the periodontal health of the second molar. Impacted or partially erupted third molars are often the cause of various pathologies such as pericoronitis, cysts, periodontal disease, damage to the adjacent tooth and carious lesions [3]. The prevalence of carious lesions in third molars is reported to range between 2.5% and 86% [4].
Orthopantomograms (OPG(s)) are generally used in the decision-making process of whether or not to remove a third molar. An OPG is an excellent diagnostic tool for planning third molar removals; however, it is not suitable for the detection of dental caries. Previous studies have reported a lower caries diagnosis accuracy on OPG(s) compared with conventional bitewings and periapical radiographs [5]. To reduce radiation exposure and avoid taking additional radiographs, enhanced caries diagnostics on OPG(s) would be useful.
In recent years, deep learning models such as convolutional neural networks (CNNs) have been used to analyse medical images and to support diagnostic procedures [6].
In the field of dentistry, CNNs have been applied for the detection of carious lesions on different image modalities such as periapical radiographs [7], bitewings [8], near-infrared light transillumination images [9,10] and clinical photos [11,12]. However, none of these studies have explored automated caries detection on OPG(s). The aim of this study is to train a CNN-based deep learning model for the detection of caries in third molars on OPG(s) and to assess its diagnostic accuracy.

Results
Table 1 summarizes the classification performance of MobileNet V2 on the test set. The diagnostic accuracy was 87%. The model achieved an AUC of 0.90 (Fig. 1). The confusion matrix is also presented.

Table 1: Accuracy, precision, F1-score, recall and specificity for the detection of dental caries in third molars on OPG(s)

Discussion
A daily dilemma in dentistry and oral surgery is to determine whether or not a third molar should be removed. In cases of a diseased third molar, where pain or pathology is obvious, there is a general consensus that surgical removal is indicated [3]. Improved diagnostics, e.g. on OPG(s), might improve the selection process of whether or not to remove. A more stringent indication pathway may reduce millions of unnecessary third molar removals every year, thereby reducing comorbidity and health costs [13].
This pilot study assesses the capability of a deep learning model (MobileNet V2) to detect carious third molars on OPG(s) and is therefore one piece in the larger picture of automating M3 removal diagnostics. Caries detection on third molars using OPG(s) is flawed by the limited and varying accuracy of individual examiners, leading to inconsistent decisions and consequently suboptimal care [8]. The use of deep neural networks might bring a more reliable, faster and more reproducible way of diagnosing pathology, and could therefore reduce the number of unnecessary third molar removals [14]. As previously stated, the assessment of third molars on OPG(s) is additionally interesting, as radiation exposure can be reduced when additional radiographs are avoided. Comparable deep-learning studies on clinical photographs have reported, among other results [12], an F1-score of 0.889 [11], and a U-net with EfficientNet-B5 as an encoder has been used to segment caries on bitewings. It is important to note that the performance of deep learning models is highly dependent on the dataset, the hyperparameters, the image modality and the architecture itself [6,15]. As these parameters differed between the studies, a direct comparison would be misleading.
In this study, an accuracy of 0.87 and an AUC of 0.90 were achieved for caries detection on third molars on OPG(s). In comparison, previous studies have reported an AUC of 0.768 for caries detection on OPG(s) by clinicians [16]. Several factors are associated with the model performance. Firstly, the use of depthwise separable convolutions and the inverted residual with linear bottleneck reduced the number of parameters and the memory requirements while retaining a high accuracy [17]. These characteristics make MobileNet V2 less prone to overfitting. Overfitting is a modelling error that occurs when a good fit is achieved on the training data while the generalization of the model to unseen data is unreliable.
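The parameter saving from depthwise separable convolutions can be made concrete with a short calculation (a generic sketch; the layer sizes below are illustrative and not taken from the study):

```python
def standard_conv_params(k, c_in, c_out):
    # a k x k kernel spans all input channels for every output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise step: one k x k filter per input channel
    # pointwise step: a 1 x 1 convolution that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 128)        # 73728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 8768 parameters
print(std, sep, round(std / sep, 1))          # roughly an 8x reduction
```

For a 3x3 layer with 64 input and 128 output channels, the separable variant needs fewer than one eighth of the parameters, which is the main reason MobileNet-style architectures fit smaller data sets with less risk of overfitting.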
Secondly, histogram equalization was applied to the OPG(s) as a pre-processing step. Histogram equalization is a method for adjusting image intensities to enhance contrast, which can increase the prediction accuracy [18]. Lastly, transfer learning was used to prevent overfitting. Transfer learning is a technique that pre-trains very deep networks on large datasets in order to learn the generic, low-level features in the early layers of the network. By reusing these learned weights on other tasks, the need to relearn these low-level features on new data sets is eliminated, which greatly reduces the amount of data and time required for such a deep neural network to converge [19].
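Histogram equalization of the kind described here can be sketched in a few lines of NumPy (a generic 8-bit implementation for illustration, not the authors' exact pipeline):

```python
import numpy as np

def equalize_histogram(img):
    """Remap 8-bit grey levels so the cumulative histogram becomes roughly linear."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first occupied bin
    # classic equalization formula, rescaled to the full 0..255 range
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

# low-contrast example: grey values clustered in a narrow band around 120
img = np.clip(np.random.default_rng(0).normal(120, 10, (64, 64)), 0, 255).astype(np.uint8)
out = equalize_histogram(img)
```

After equalization the output spans the full dynamic range, which is the contrast enhancement the pre-processing step aims for.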
A limitation of the present study is that only cropped images of third molars were included. Training and testing the model with cropped premolars, incisors and canines might further increase the robustness and generalizability to assess all caries on OPG(s). Secondly, the clinical and radiological assessment by surgeons is not the gold standard for caries detection. Histological confirmation of caries and further extension of the labeled data are required to overcome the limitations of the present model.
To the best of our knowledge, this is the first publication to rely on deep learning using solely OPG(s) for caries detection on third molars. Furthermore, class activation maps were generated to increase the interpretability of the model predictions. Considering the encouraging results, future work should focus on the detection of other pathologies associated with third molars, such as pericoronitis, periapical lesions, root resorption and cysts. In addition, the potential bias in these algorithms, with possible risks of limited robustness, generalizability and reproducibility, has to be assessed in future studies using external datasets; this is a necessary step towards successfully implementing deep learning in daily clinical practice.
In conclusion, a convolutional neural network (CNN) was developed that achieved an F1-score of 0.87 for caries detection on third molars using panoramic radiographs. This forms a promising foundation for the further development of automated third molar removal assessment.

Data selection
Random preoperative OPG(s) of patients who underwent third molar removal were retrospectively selected from the Department of Oral and Maxillofacial Surgery of Radboud University Nijmegen Medical Centre, Netherlands. The accumulated OPG(s) were acquired with a Cranex Novus e device (Soredex, Helsinki, Finland), operated at 90 kV and 10 mA, using a CCD sensor. The inclusion criteria were a minimum age of 16 years and the presence of at least one third molar (M3). Blurred and incomplete OPG(s) were excluded from further analysis. This study was conducted in accordance with the code of ethics of the World Medical Association (Declaration of Helsinki). Approval of this study was granted by the Institutional Review Board (Commissie Mensgebonden Onderzoek regio Arnhem-Nijmegen), and informed consent was not required as all image data were anonymized and de-identified prior to analysis (decision no. 2019-5232).

Data annotation
The third molars (M3(s)) present on the OPG(s) were classified and labeled as carious M3 or non-carious M3 based on the electronic medical records (EMR). Subsequently, a crop of 256 by 256 pixels around each M3 was created. The cropped data, consisting of carious M3(s) and non-carious M3(s), were de-identified and anonymized prior to further analysis. All anonymized data were revalidated by two clinicians (SV, MH). In cases of disagreement, the cropped OPG(s) were excluded. The final dataset consisted of 250 carious M3(s) and 250 non-carious M3(s).

The model
MobileNet V2 was used in this study. This model is characterized by depthwise separable convolutions and inverted residuals with linear bottlenecks. The low-dimensional compressed representation is expanded to a higher dimension and filtered with a lightweight depthwise convolution. Subsequently, features are projected back to a low-dimensional representation with a linear convolution [17]. The applied model structure is shown in Fig. 5.
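A transfer-learning classifier of this kind could be assembled roughly as follows with the Keras API (a sketch only: the classification head shown here is an assumption, not the authors' reported architecture, and `weights=None` is used purely to keep the sketch runnable offline, whereas the study used ImageNet-pretrained weights):

```python
import tensorflow as tf

# MobileNet V2 backbone without its ImageNet classification head.
# The study initialized it with ImageNet-pretrained weights
# (weights="imagenet"); weights=None here avoids a network download.
base = tf.keras.applications.MobileNetV2(
    input_shape=(256, 256, 3), include_top=False, weights=None)

# Hypothetical binary head: pooled features -> single sigmoid unit.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # carious vs. non-carious
])
```

The single sigmoid output fits the two-class carious/non-carious labeling; the 256x256 input matches the crop size described under data annotation.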

Model training
The total dataset was randomly divided into three sets: 320 images for training, 80 for validation and 100 for testing. All sets had an equal class distribution of carious and non-carious third molars. Histogram equalization and data augmentation techniques were applied to the training dataset in order to improve model generalization. The MobileNet V2 was pretrained on the 2012 ILSVRC ImageNet dataset [20]. During the training process, hyperparameters and optimization operations were determined empirically, such that maximum model performance was achieved on the validation set. Subsequently, the best model was used to perform predictions on the test set.
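The random 320/80/100 split with equal class balance can be sketched as follows (the file names and function are hypothetical, not the authors' code):

```python
import random

def stratified_split(carious, non_carious, sizes=(160, 40, 50), seed=42):
    """Split each class with the given per-class sizes so every subset
    keeps a 50/50 distribution of carious and non-carious crops."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    n_train, n_val, n_test = sizes
    for items in (list(carious), list(non_carious)):
        rng.shuffle(items)
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:n_train + n_val + n_test]
    return splits

carious = [f"carious_{i}.png" for i in range(250)]
non_carious = [f"sound_{i}.png" for i in range(250)]
splits = stratified_split(carious, non_carious)
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # 320 80 100
```

Splitting per class before concatenating guarantees the equal class distribution in every subset that the study describes.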
The optimization algorithm employed was the Adam optimizer, with a learning rate of 0.0001, a batch size of 32 and batch normalization. The training and optimization process was carried out using the Keras library in the Colaboratory Jupyter Notebook environment [21].

Statistical analysis
The diagnostic accuracy for caries detection was assessed based on the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Classification metrics are reported as follows for the test set: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), dice = 2TP / (2TP + FP + FN) (also known as the F1-score), recall = TP / (TP + FN) (also known as sensitivity), and specificity = TN / (TN + FP). Furthermore, the area under the receiver operating characteristic curve (AUC) and the confusion matrix are presented. Gradient-weighted Class Activation Mapping (Grad-CAM), a class-discriminative localization technique, was applied in order to generate visual explanations highlighting the regions of the cropped image that are important for detecting carious lesions [22].
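These metrics reduce to simple ratios of the confusion-matrix counts; a minimal sketch (the counts below are illustrative for a 100-image test set, not the study's actual confusion matrix):

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)               # sensitivity
    specificity = tn / (tn + fp)
    f1          = 2 * tp / (2 * tp + fp + fn)  # Dice / F1-score
    return accuracy, precision, recall, specificity, f1

# illustrative counts: 50 carious and 50 non-carious test crops
acc, prec, rec, spec, f1 = classification_metrics(tp=44, tn=43, fp=7, fn=6)
print(round(acc, 2), round(rec, 2), round(spec, 2))  # 0.87 0.88 0.86
```

Note that the Dice coefficient and the F1-score are algebraically identical for binary classification, which is why the paper treats them as synonyms.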

Data availability
The data used in this study can be made available if needed, within the regulatory boundaries for data protection.

Figure legends
Fig. 1: Area under the receiver operating characteristic curve (AUC).
Class activation maps for non-carious third molars. The left column shows the cropped non-carious M3(s), the middle column represents the class activation map, and the right column illustrates the overlay.