In this study, our dental germ detector, based on Scaled-YOLOv4 P6 with an input size of 1280 × 1280, achieved a very high AP50 of over 98% in cross-validation, as presented in Table 1. The training data, which were much larger than those of previous studies [12, 13], were sufficient for our model to learn the image features. In addition, since the method of obtaining panoramic radiographs of optimal quality has been established [39], our models could achieve high performance by learning dental germ features together with their context, including the background, overlap with other objects, and position relative to other dental germs. Therefore, Scaled-YOLOv4 or older members of the YOLO family [22, 24] are likely sufficient for detecting dental germs, and the newest but computationally more expensive models, such as YOLOv7 [40], were not necessary.
In contrast, using state-of-the-art image models is important for dental germ stage classification. Despite EfficientNet V2’s exceptional performance, the Top-1 accuracy of our germ classifier was approximately 70%, as presented in Table 1. This might be because overlap with other objects or the background, which is thought to benefit the germ detection model, negatively affects the germ classification model. For this reason, we used one of the state-of-the-art but computationally expensive models for germ classification. Nevertheless, our germ classification models appear to behave similarly to human experts and are clinically applicable with reasonable accuracy. Like human experts, our models focus on the crown shape or root formation of the dental germ to classify developmental stages, as illustrated in Fig. 6. In addition, the confusion matrix in Fig. 5 shows that our model tends to confuse adjacent stages. This tendency would also be expected among practicing dentists because adjacent stages of the dental germ are similar [6]. It also explains the exceptional Top-3 accuracy of 98%.
This is also the reason why we adopted the Top-3 weighted average to calculate dental age, which reduced the mean absolute error between the automatic calculation and the manual calculation by experts, as presented in Table 2. Single selection showed a similar mean absolute error, but its standard deviation was larger than that of the weighted average, indicating that the calculated values may spread over a wider range and may fall farther from the actual dental age. The expected value showed the worst result, which suggests that using all predicted stages may be noisier than using only the Top-3 predictions.
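The three aggregation strategies compared above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the function names `tooth_age` and `dental_age`, the renormalization of the Top-k probabilities, and the plain mean across teeth are assumptions made for the example. The per-stage reference ages are hypothetical inputs that would come from the chosen assessment method.

```python
from statistics import mean


def tooth_age(probs, ref_ages, k=3):
    """Probability-weighted reference age for one tooth (illustrative sketch).

    probs    -- classifier probabilities, one per developmental stage
    ref_ages -- hypothetical reference age (years) for each stage
    k        -- 1 gives single selection, 3 gives the Top-3 weighted
                average, None gives the expected value over all stages
    """
    if len(probs) != len(ref_ages):
        raise ValueError("probs and ref_ages must have the same length")
    # Rank stage indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order if k is None else order[:k]
    # Assumption: weights are the kept probabilities, renormalized to sum to 1.
    total = sum(probs[i] for i in top)
    if total <= 0:
        raise ValueError("selected probabilities must have a positive sum")
    return sum(probs[i] * ref_ages[i] for i in top) / total


def dental_age(per_tooth_probs, per_tooth_ref_ages, k=3):
    """Assumed aggregation: plain mean of per-tooth ages over detected teeth."""
    return mean(
        tooth_age(p, r, k) for p, r in zip(per_tooth_probs, per_tooth_ref_ages)
    )
```

Calling `tooth_age(probs, ref_ages, k=1)` returns the reference age of the single most probable stage, whereas `k=3` blends the three most probable stages. When misclassifications fall mostly on adjacent stages, this blending pulls the estimate toward the neighboring stage instead of committing fully to one possibly wrong label.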
Our germ detection model achieved a high AP50 of 98%; nonetheless, a few dental germs were occasionally missed, as illustrated in Fig. 4. However, this may not be critical for dental age calculation because, even with several detection failures, more than twenty dental germs remain available and are averaged in the calculation. Thus, our automatic calculation pipeline is robust against detection failures.
Our automatic dental age calculation achieved a mean absolute error of 0.261 years (about 3 months) compared with human experts, which raises the question of whether this difference is clinically acceptable. Most previous studies have focused on chronological age estimation [17–20, 41], whereas our research aimed to evaluate dental age calculation. Therefore, our results cannot be directly compared with those of previous studies on chronological age estimation. One potential yardstick for our result is the difference in years between the developmental stages of the teeth. For each tooth, the minimum difference between a developmental stage and its adjacent stage is 0.4 years [6]. Our model's error of 0.261 years is smaller than this, indicating that our automatic dental age calculation is, on average, accurate to within one developmental stage and is thus acceptable for supporting dentists. Moreover, the automatic calculation can be performed in a few minutes, which is significantly faster than manual calculation [10], and is useful not only for pediatric and orthodontic dentists but also for general dentists and even students. We believe that our results will serve as a new benchmark for further research in dental age calculation.
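As a worked check of this comparison, the two values quoted above can be converted to months and compared directly. The symbols MAE and Δ_min are notation introduced here for illustration only. They denote the mean absolute error of the automatic calculation and the minimum gap between adjacent developmental stages [6], respectively.

```latex
\begin{align*}
\mathrm{MAE} &= 0.261\ \text{years} \times 12\ \tfrac{\text{months}}{\text{year}} \approx 3.1\ \text{months},\\
\Delta_{\min} &= 0.4\ \text{years} \times 12\ \tfrac{\text{months}}{\text{year}} = 4.8\ \text{months},\\
\frac{\mathrm{MAE}}{\Delta_{\min}} &= \frac{0.261}{0.4} \approx 0.65 < 1 .
\end{align*}
```

Because the mean absolute error is an average, this ratio bounds the typical discrepancy and not every individual case.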
Our method can easily be applied to other developmental stage assessment methods by changing the reference values [7, 8]. In addition, our proposed pipeline can be useful not only for dental age calculation using various methods but also for other clinically supportive applications. When teeth were congenitally missing on the panoramic radiograph, the germ detector did not respond at the location of the missing teeth, as shown in Fig. 3. This behavior can alert dentists to missing teeth, which is a crucial factor in treatment planning. Moreover, our germ classifier can help human experts improve their diagnostic skills in developmental stage classification by providing feedback on its decision-making process, as illustrated in Fig. 6. In the future, collaboration between humans and AI in dentistry is expected in academic education and clinical practice [11, 42, 43].
We are aware that our study has some limitations. Although the training dataset in this study is much larger than those in previous studies in the field of dentistry, it is still small compared with those in other fields. For example, ImageNet consists of 14 million natural images [44], MS-Celeb-1M has 10 million face images [45], and RadImageNet provides 1.35 million medical images [46]. There may be room for further improvement of automatic calculation performance by training with a larger dataset.
Another limitation is that our datasets lack metadata such as chronological age or sex for ethical reasons. In particular, since sex and race are important factors for dental age estimation, these metadata are needed to evaluate differences between our results and those of future studies in which metadata are available. Additionally, if age metadata were available, our model could be modified to calculate not only dental age but also chronological age, which is useful in forensic science [41]. For these reasons, a large-scale dental image dataset that includes metadata and is annotated by experts is expected to help in developing successful AI models in dentistry.