A deep-learning artificial intelligence system for estimating chronological age using panoramic radiography in the Korean population

doi:10.21203/rs.3.rs-3219635/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 09 Dec, 2023

Read the published version in Scientific Reports →

The purpose of this study was to propose a hybrid age estimation model for panoramic radiographs, based on ResNet50 and ViT, that learns by considering both local features and global information, both of which are important in estimating age.

Transverse and longitudinal panoramic images of 9663 patients were selected and used (4774 males and 4889 females with a mean age of 39 years and 3 months). To compare ResNet50, ViT, and the hybrid model, the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and coefficient of determination (R²) were used as metrics.

The results confirmed that the age estimation model designed using the hybrid method performed better than those using only ResNet50 or ViT. In addition, when examining the basis for age determination in the hybrid model through attention rollout, it was evident that the proposed model used logical and important factors rather than relying on unclear elements as the basis for age determination.

Health sciences/Medical research

Health sciences/Health care/Medical imaging

Health sciences/Health care/Dentistry

Health sciences/Health care/Dentistry/Dental radiology

Health sciences/Health care/Dentistry/Forensic dentistry

Deep learning

Automatic intelligence

Forensic medicine

Panoramic radiography

Age estimation is a crucial step in biological identification in the forensic field. It is required to identify the deceased and is essential for living people, particularly children and adolescents, to answer numerous legal questions and resolve civil and judicial issues^1,2.

Numerous techniques are available for estimating age using various body components. Several studies have focused on the connection between epiphyseal closure and age^3,4. Many factors are related to epiphyseal fusion, including sex, genetics, and geography^3,5. However, bone age assessment is usually limited to immature individuals, because it relies on skeletal development that is still incomplete⁶.

Evaluation of dental age using radiographic tooth development and tooth eruption sequences is more accurate than other methods^7,8. Because teeth and dental tissues are largely genetically determined and are less susceptible to environmental and dietary influences, they undergo less deformation from external chemical and physical damage^2,3,7.

Many attempts have been made to create standards for age estimation using human interpretations of dental radiological images. The most common method, the Demirjian technique, divides teeth into eight categories, A–H, based on their maturity and degree of calcification⁹. Willems et al. modified the Demirjian method and provided a new scoring method that allowed direct conversion from classification to age¹⁰. Cameriere established a European formula by gauging the open apices of seven permanent teeth in the left mandible on panoramic radiographs¹¹. However, these methods have a certain degree of subjectivity, leading to a relatively high level of personal error, and their application requires adequate experience to minimize errors¹². There are also fundamental limitations to their applicability in young subjects.

Machine learning, the cornerstone of artificial intelligence, enables more precise and effective dental age prediction^12–14. Tao et al. and Galibourg et al. applied machine learning to the Demirjian and Willems methods for dental age estimation^13,14, and Shen et al. used the Cameriere method¹². Most studies related to age estimation use CNN-based models^15–18. CNN-based models learn local features well because of the convolution filter operation but do not learn global information well. This problem can be solved by learning local features and global information using a vision transformer (ViT)¹⁹. In addition, the hybrid method, which uses the feature map extracted from the CNN-based model as input to the ViT model, showed better image classification performance than using each model alone¹⁹. Therefore, this study used a hybrid method to design an age estimation model because learning by considering both local features and global information is important for estimating age.

This study aimed to construct an age estimation model using a hybrid method of the ResNet50 and ViT models. Subsequently, we aimed to confirm whether the model performs better than either model alone, so that it can be used effectively in the clinical field.

1. Data set and image pre-processing

Transverse and longitudinal panoramic images of patients who visited the Daejeon Wonkwang University Dental Hospital between January 2020 and June 2021 were collected. A total of 9663 patients were selected (4774 males and 4889 females; mean age of 39 years 3 months). Panoramic images were obtained using three different panoramic machines: Promax® (Planmeca OY, Helsinki, Finland), PCH-2500® (Vatech, Hwaseong, Korea), and CS 8100 3D® (Carestream, Rochester, New York, USA). Images were extracted in DICOM format.

The age of the acquired data ranged from 3 years 4 months to 79 years 1 month (Table 1). Because the amount of data differed across age groups, which could adversely affect the results under a purely random split, the data in each age group were divided in a 6:2:2 ratio to balance the training, validation, and test sets. Thus, 5861 training, 1916 validation, and 1886 test samples were used.

Table 1. Number of patients by age
Age (years)	Number of patients
3 ~ 9	892
10 ~ 19	999
20 ~ 29	2029
30 ~ 39	1067
40 ~ 49	1134
50 ~ 59	1609
60 ~ 69	1285
70 ~ 79	648
Total	9663
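
The per-age-group 6:2:2 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the use of decade bins as strata, the rounding rule, and the synthetic records are all assumptions. The reported split sizes (5861/1916/1886) are not an exact 6:2:2 division of 9663, so the actual grouping or rounding used in the study may differ.

```python
import random
from collections import defaultdict

def stratified_split(records, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split (patient_id, age_years) records 6:2:2 within each age group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for patient_id, age in records:
        # Assumed strata: the decade bins of Table 1 (3-9, 10-19, ..., 70-79).
        groups[int(age) // 10].append((patient_id, age))

    train, val, test = [], [], []
    for decade in sorted(groups):
        members = groups[decade]
        rng.shuffle(members)
        n_train = round(len(members) * ratios[0])
        n_val = round(len(members) * ratios[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Synthetic records only (not the study data): 100 patients per age group.
records = [(f"p{d}_{i}", d * 10 + 5) for d in range(8) for i in range(100)]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 480 160 160
```

Splitting inside each age group keeps the age distribution of the three sets close to that of the full dataset, which a single random split over all patients does not guarantee.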

The edge of the image was cropped to focus on the meaningful region and filled with zero padding around the image. In addition, because the image sizes obtained from the two devices were different (2868×1504 pixels and 2828×1376 pixels), the images were resized to the same size (384×384 pixels) for batch learning and to improve learning speed.

In order to learn more effectively with the acquired data, augmentation techniques using normalization, horizontal flip with a probability of 0.5, and color jitter were applied to the training set.
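
A few of the steps above (zero padding, horizontal flip with probability 0.5, and normalization) can be sketched on a plain list-of-rows image. This is an illustrative sketch only: the helper names and the `mean`/`std` values are hypothetical, and cropping, resizing to 384×384, and color jitter are omitted because the text does not give their parameters.

```python
import random

def zero_pad(image, pad):
    """Surround a 2D image (list of rows) with `pad` pixels of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    for row in image:
        padded.append([0.0] * pad + list(row) + [0.0] * pad)
    padded += [[0.0] * width for _ in range(pad)]
    return padded

def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror the image left-to-right with probability p (training set only)."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [list(row) for row in image]

def normalize(image, mean, std):
    """Shift and scale pixel values; mean and std are assumed constants."""
    return [[(px - mean) / std for px in row] for row in image]

# Tiny toy image, for illustration only.
image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
augmented = normalize(random_horizontal_flip(zero_pad(image, 1)), mean=3.5, std=2.0)
print(len(augmented), len(augmented[0]))  # 4 5
```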

2. Architecture of deep learning model

In this study, two types of age estimation models were used. The first is ResNet, which has been used as a feature extractor in many studies related to age prediction^20,21. ResNet, a well-known convolutional neural network (CNN)-based model, can build deep layers by solving the vanishing gradient problem through residual learning with skip connections²². However, because a CNN-based model has a locality inductive bias, it learns relatively less global information than local features. The other is the Vision Transformer (ViT)¹⁹, which uses a transformer²³ encoder and has a weaker inductive bias than CNN-based models. By pretraining on large datasets such as ImageNet21k, however, it overcomes this structural limitation. It has a wide range of attention distances, which allows it to learn both the global context and local features, and it has shown better classification performance than CNN-based models. Using the strengths of these two models, we propose an age prediction model based on ResNet50-ViT¹⁹, a hybrid method that can effectively learn the global context and focus on local information.

Figure 1 shows the overall architecture of the proposed model. The feature map $\mathbf{x}\in \mathbb{R}^{H\times W\times C}$, extracted by passing the panoramic image through ResNet50, was used as the input patch sequence for the transformer, where $H$ and $W$ are the height and width of the feature map and $C$ is the number of channels. We define $N = HW$ as the total number of patches because each pixel in the feature map is considered a separate patch.

To retain the positional information of the extracted feature map, we add a trainable positional encoding $\mathbf{x}_{pos}\in \mathbb{R}^{(N+1)\times C}$ to the sequence of feature patches: $\mathbf{z}=[x_{reg}; x_{1}; x_{2}; \dots; x_{N}]+\mathbf{x}_{pos}$, where $x_{reg}\in \mathbb{R}^{C}$ is a trainable regression token and $x_{i}$ is the $i$th patch of the feature map.
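
To make the sequence length concrete, the sketch below counts the patches $N = HW$ and the $N+1$ tokens (patches plus the regression token) for a 384×384 input. The downsampling factor of the backbone is not stated in the text, so `stride` is an assumption: a standard ResNet50 reduces the input resolution by a factor of 32 overall, while a backbone truncated one stage earlier would give a factor of 16. The function name `sequence_length` is hypothetical.

```python
def sequence_length(image_size, stride):
    """Return (H, W, N, N + 1) for a square input and a given total stride."""
    h = w = image_size // stride
    n = h * w          # each feature-map pixel is one patch
    return h, w, n, n + 1  # +1 for the regression token

print(sequence_length(384, 32))  # (12, 12, 144, 145)
print(sequence_length(384, 16))  # (24, 24, 576, 577)
```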

Then, $\mathbf{z}$ is fed into the transformer encoder blocks, each composed of layer normalization (LN)²⁴, multi-head self-attention (MSA)^19,23, and a multilayer perceptron (MLP) that contains two linear layers with a Gaussian Error Linear Unit (GELU) activation. The transformer encoder process is as follows:

$$\bar{z}^{l}=\mathrm{MSA}\left(\mathrm{LN}\left(z^{l}\right)\right)+z^{l}, \quad l=1,2,\dots,L$$

$$z^{l+1}=\mathrm{MLP}\left(\mathrm{LN}\left(\bar{z}^{l}\right)\right)+\bar{z}^{l}, \quad l=1,2,\dots,L$$

where $l$ denotes the $l$th transformer encoder block. Finally, we estimated the age from the regression head using ${z}_{reg}^{L+1}$.
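
The residual, pre-normalization structure of the two encoder equations can be sketched in plain Python. This is a structural illustration under stated assumptions, not the authors' implementation: the MSA and MLP sublayers are passed in as callables, the toy stand-ins `uniform_attention` and `gelu_mlp` have no learned weights, and `layer_norm` omits the learnable scale and shift.

```python
import math

def layer_norm(vec, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add(a, b):
    """Element-wise sum of two token sequences (lists of vectors)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def encoder_block(z, msa, mlp):
    """z_bar = MSA(LN(z)) + z, then z_next = MLP(LN(z_bar)) + z_bar."""
    z_bar = add(msa([layer_norm(t) for t in z]), z)
    return add(mlp([layer_norm(t) for t in z_bar]), z_bar)

def encode(z, blocks):
    """Apply L blocks and return the regression-token output (index 0)."""
    for msa, mlp in blocks:
        z = encoder_block(z, msa, mlp)
    return z[0]

def uniform_attention(seq):
    """Toy stand-in for MSA: every token attends equally to all tokens."""
    dim = len(seq[0])
    avg = [sum(t[d] for t in seq) / len(seq) for d in range(dim)]
    return [avg[:] for _ in seq]

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_mlp(seq):
    """Toy stand-in for the MLP: GELU only, no linear layers."""
    return [[gelu(v) for v in t] for t in seq]

# Three tokens (regression token first) of dimension 2, made-up values.
tokens = [[0.0, 0.0], [1.0, 3.0], [2.0, -1.0]]
z_reg = encode(tokens, [(uniform_attention, gelu_mlp)] * 2)
print(len(z_reg))  # 2
```

In the model described above, the vector returned by `encode` would be passed to the regression head to produce the age estimate.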

3. Learning details

To train the model efficiently, we employed transfer learning, which aids in overcoming weak inductive bias and improving accuracy. That is, the parameters of the models were initialized with weights pre-trained on ImageNet21k and then fine-tuned using our panoramic-image dataset. The models used in this experiment were trained for 100 epochs with an SGD optimizer with a momentum of 0.9, a learning rate of 0.01, and a batch size of 16; the objective function was the mean absolute error (MAE). After training on the training set at every epoch, an evaluation was performed using the validation set. When the training was completed, the weight parameters of the model with the best MAE on the validation set were stored.
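
The model selection rule described above (keep the weights from the epoch with the lowest validation MAE) can be sketched as follows. The function names and the validation curve are made up for illustration; in a real training loop, the weights would be saved whenever a new best value is found.

```python
def mean_absolute_error(predicted, actual):
    """MAE objective: average absolute difference in years."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def select_best_epoch(val_mae_per_epoch):
    """Return (epoch, MAE) for the lowest validation MAE; epochs start at 1."""
    best_epoch, best_mae = None, float("inf")
    for epoch, mae in enumerate(val_mae_per_epoch, start=1):
        if mae < best_mae:
            best_epoch, best_mae = epoch, mae  # checkpoint would be saved here
    return best_epoch, best_mae

# Made-up validation curve, for illustration only.
history = [5.2, 4.1, 3.6, 3.3, 3.4, 3.1, 3.2]
print(select_best_epoch(history))  # (6, 3.1)
```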

Figure 2 plots the losses for the training and validation sets at each epoch. The model with the best MAE was selected from the validation set for testing. As a result, the MAE of the hybrid age estimation model for the 1886 test data was 2 years and 11 months (2.95 years). A scatter plot of the estimated and actual ages is shown in Figure 3.
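
Ages in this study are reported both in decimal years and in years and months. The small helper below, whose name `years_to_years_months` is hypothetical, shows the conversion used to read 2.95 years as 2 years 11 months, rounding to the nearest whole month.

```python
def years_to_years_months(years):
    """Convert decimal years to (years, months), rounded to the nearest month."""
    return divmod(round(years * 12), 12)

print(years_to_years_months(2.95))   # (2, 11)
print(years_to_years_months(39.25))  # (39, 3)
```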

In addition, Table 2 confirms that the age estimation model designed using the hybrid method performs better than the models designed using only ResNet50 or ViT. For comparison, the mean square error (MSE), root mean square error (RMSE), and coefficient of determination (R²) were used as additional metrics.

Table 2. Comparing the performance of models designed with each method

	ResNet50	ViT	Hybrid
MAE	3.20	4.09	2.95
MSE	18.59	32.57	16.76
RMSE	4.31	5.70	4.09
R²	0.95	0.92	0.95
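
For reference, the four metrics in Table 2 can be computed as in the sketch below, using their standard definitions. The function name `regression_metrics` and the toy ages are illustrative assumptions, not study data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R2 for actual and estimated ages (years)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": 1.0 - ss_res / ss_tot}

# Toy ages in years (not study data).
actual = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 19.0, 33.0, 40.0]
metrics = regression_metrics(actual, estimated)
print({name: round(value, 3) for name, value in metrics.items()})
# {'MAE': 1.5, 'MSE': 3.5, 'RMSE': 1.871, 'R2': 0.972}
```

Because RMSE is the square root of MSE, the two rows of Table 2 can be cross-checked: for the hybrid model, the square root of 16.76 is approximately 4.09, which matches the reported RMSE.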

As shown in Fig. 4, the estimation was highly accurate for young people, who are at an age with distinct growth characteristics. However, the error tended to increase with age.

Finally, we used attention rollout²⁵, which is a suitable method for visualization in a transformer-based structure, to analyze the model’s results.
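
A minimal sketch of attention rollout²⁵ is given below, assuming the per-layer attention matrices have already been averaged over heads. Each layer's attention is mixed with the identity matrix to account for the residual connection, and the resulting matrices are multiplied across layers. The function names and the toy matrices are illustrative; the authors' exact settings (for example, how heads were fused) are not given in the text.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def attention_rollout(layer_attentions):
    """Combine head-averaged (T x T) attention matrices, first layer first."""
    size = len(layer_attentions[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for attn in layer_attentions:
        # Residual connection: mix attention with the identity, 0.5 * A + 0.5 * I.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(size)]
                 for i in range(size)]
        # Re-normalize each row so that it sums to 1.
        mixed = [[v / sum(row) for v in row] for row in mixed]
        rollout = matmul(mixed, rollout)
    return rollout

# Two toy layers over three tokens (regression token first); values are made up.
layer1 = [[0.2, 0.5, 0.3], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
layer2 = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
rollout = attention_rollout([layer1, layer2])
reg_to_patches = rollout[0][1:]  # how strongly the regression token draws on each patch
print(len(reg_to_patches))  # 2
```

Reshaping `reg_to_patches` to the $H\times W$ grid of the feature map and upsampling it to the image size would give a heat map over the radiograph.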

The results for young children were better than those for older age groups and comparable to those of existing research^12–14. This is because the radiological features and changes are quite obvious in young children.

The structure of the oral and maxillofacial region can be observed on a single two-dimensional image using panoramic radiography. Because panoramic radiography combines tomography and scanning, only structures located within the image layer are depicted clearly enough to be interpreted and to have diagnostic value²⁶. Consequently, even panoramic radiographs of the same patient can differ considerably depending on the positioning of the patient, the type of equipment, and the skill of the radiographer. Deep learning models that are meaningful and usable in clinical practice should therefore be trained with panoramic radiographs obtained by multiple radiographers using multiple pieces of equipment. We used images obtained with three pieces of equipment by more than 15 radiographers. In addition, by using only Korean data (approximately 10,000 cases), it was possible to learn age-related differences effectively while minimizing differences due to racial factors.

Because most studies related to age prediction use CNN-based models, local features are learned well, but global information is not. This study proposes an age prediction model that learns global information and local features through a hybrid model using a CNN-based ResNet50 and a transformer-based ViT. The results confirmed that the proposed model predicted age effectively, performing better than ResNet50 or ViT alone (Table 2).

We examined the basis for the age determination of our hybrid model through attention rollout, focusing on the specific areas that the model considers. In young children, dentition development was an important factor in age determination (Fig. 5a). One noteworthy aspect was that the focus was placed more on the mandible than the maxilla. This is thought to be because the mandible is freer from overlap with adjacent structures. For individuals in their late teens to early twenties, the focus of age determination was on the second and third molars of both the maxilla and mandible (Fig. 5b). This is believed to be due to distinct changes in the development of these teeth during this period. In older patients, age estimation was primarily based on the overall alveolar bone structure, and age-related or periodontal-induced alveolar bone loss appeared to be a significant factor in determining age (Fig. 5c, d). Thus, it was evident that the proposed model uses logical and important factors rather than relying on unclear elements as the basis for age determination.

The proposed age estimation model, designed using the hybrid method of the ResNet50 and ViT models, predicted age more accurately than models using ResNet50 or ViT alone. We expect this model to be used effectively in the clinical field.

Ethical approval and informed consent

This study was conducted in accordance with the guidelines of the World Medical Association Helsinki Declaration for Biomedical Research Involving Human Subjects. It was approved by the Institutional Review Board of Daejeon Dental Hospital, Wonkwang University (W2304/003-001). The IRB waived the need for individual informed consent, either written or verbal, from the participants, owing to the non-interventional retrospective design of this study and because all data were analyzed anonymously.

Author contribution statements

HGY, THL, and JPY designed the study and prepared the manuscript. HGY, BDL, and WL selected the appropriate cases, obtained and reviewed the imaging data, and analyzed the results. THL and JPY constructed the AI model and analyzed the results.

Data availability

The data used in this study can be made available, if required, within the regulation boundaries for data protection.

1. Schmeling, A., Geserick, G., Reisinger, W. & Olze, A. Age estimation. Forensic Sci. Int. 165, 178–181 (2007).
2. Lee, Y. H., Won, J. H., Auh, Q. S. & Noh, Y. K. Age group prediction with panoramic radiomorphometric parameters using machine learning algorithms. Sci. Rep. 12, 11703 (2022).
3. Wang, X. et al. DENSEN: a convolutional neural network for estimating chronological ages from panoramic radiographs. BMC Bioinform. 23, 1–15 (2022).
4. Gurses, M. S. & Altinsoy, H. B. Evaluation of distal femoral epiphysis and proximal tibial epiphysis ossification using the Vieth method in living individuals: applicability in the estimation of forensic age. Aust. J. Forensic Sci. 53, 431–447 (2021).
5. Ekizoglu, O. et al. Forensic age diagnostics by magnetic resonance imaging of the proximal humeral epiphysis. Int. J. Legal Med. 133, 249–256 (2019).
6. Iglovikov, V. I., Rakhlin, A., Kalinin, A. A. & Shvets, A. A. Paediatric bone age assessment using deep convolutional neural networks. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, MICCAI 2018. Proceedings 4, 300–308 (2018).
7. Lacruz, R. S., Habelitz, S., Wright, J. T. & Paine, M. L. Dental enamel formation and implications for oral health and disease. Physiol. Rev. 97(3), 939–993 (2017).
8. Dudar, J. C., Pfeiffer, S. & Saunders, S. R. Evaluation of morphological and histological adult skeletal age-at-death estimation techniques using ribs. J. Forensic Sci. 38(3), 677–685 (1993).
9. Jelliffe, E. P. & Jelliffe, D. B. Deciduous dental eruption, nutrition and age assessment. J. Trop. Pediatr. 19(supp2A), 193–248 (1973).
10. Willems, G., Van Olmen, A., Carels, C. & Spiessens, B. Dental age estimation in Belgian children: Demirjian's technique revisited. J. Forensic Sci. 46(4), 893–895 (2001).
11. Cameriere, R., De Angelis, D., Ferrante, L., Scarpino, F. & Cingolani, M. Age estimation in children by measurement of open apices in teeth: a European formula. Int. J. Legal Med. 121(6), 449–453 (2007).
12. Shen, S., Liu, Z., Wang, J., Fan, L., Ji, F. & Tao, J. Machine learning assisted Cameriere method for dental age estimation. BMC Oral Health 21(1), 1–10 (2021).
13. Galibourg, A. et al. Comparison of different machine learning approaches to predict dental age using Demirjian’s staging approach. Int. J. Legal Med. 135, 665–675 (2021).
14. Tao, J. et al. Dental age estimation: a machine learning perspective. The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019). 722–733 (2020).
15. Vila-Blanco, N., Carreira, M. J., Varas-Quintana, P., Balsa-Castro, C. & Tomas, I. Deep neural networks for chronological age estimation from OPG images. IEEE Trans. Med. Imaging 39(7), 2374–2384 (2020).
16. Milošević, D., Vodanović, M., Galić, I. & Subašić, M. Automated estimation of chronological age from panoramic dental X-ray images using deep learning. Expert Syst. Appl. 189, 116038 (2022).
17. Mualla, N., Houssein, E. H. & Hassan, M. R. Dental Age Estimation Based on X-ray Images. Comput. Mater. Contin. 62(2), 591–605 (2020).
18. Kim, J., Bae, W., Jung, K. H. & Song, I. S. Development and validation of deep learning-based algorithms for the estimation of chronological age using panoramic dental x-ray images. Proc. Mach. Learn. Res. (2019).
19. Dosovitskiy, A. et al. An image is worth 16 x 16 words: Transformers for image recognition at scale. Preprint at http://arXiv.org/2010.11929 (2020).
20. Aljameel, S. S. et al. Predictive Artificial Intelligence Model for Detecting Dental Age Using Panoramic Radiograph Images. Big Data Cogn. Comput. 7(1), 8 (2023).
21. Wallraff, S., Vesal, S., Syben, C., Lutz, R. & Maier, A. Age estimation on panoramic dental X-ray images using deep learning. In Bildverarbeitung für die Medizin 2021: German Workshop on Medical Image Computing. Proceedings 186–191 (2021).
22. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. IEEE conference on computer vision and pattern recognition. Proceedings 770–778 (2016).
23. Vaswani, A. et al. Attention is all you need. 31st Conference on Neural Information Processing Systems, Advances in neural information processing systems, 5998–6008 (2017).
24. Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at http://arXiv:1607.06450 (2016).
25. Abnar, S. & Zuidema, W. Quantifying attention flow in transformers. Preprint at http://arXiv:2005.00928 (2020).
26. Yeom, H. G. et al. Development of a new ball-type phantom for evaluation of the image layer of panoramic radiography. Imaging Sci. Dent. 48(4), 255–259 (2018).

No competing interests reported.

Editorial decision: Major revision
11 Oct, 2023
Reviews received at journal
08 Oct, 2023
Reviews received at journal
03 Oct, 2023
Reviewers agreed at journal
27 Sep, 2023
Reviewers invited by journal
27 Sep, 2023
Editor assigned by journal
27 Sep, 2023
Editor invited by journal
02 Aug, 2023
Submission checks completed at journal
02 Aug, 2023
First submitted to journal
31 Jul, 2023
