The sample selection did have some bias in each group, but the broad baseline was not problematic and would not affect the study results. And the total sample size and the sample size of each group were completely adequate.
Demirjian method shows a trend of overestimating chronological age in both male and female groups of Southern China population, and the error was large, so it is considered not suitable for the age estimation of southern China population. This result is consistent with many studies in China6, 13, 14 or Asian region15. Demirjian method was developed for age estimation of French-Canadians children, and the overestimation of the results in and around China suggests that the results of the age estimation are probably matched by the geographical proximity of the population. The poor linear relationship between the two may be the reason why this method cannot give a good estimation result, which is fundamentally the difference between the development patterns. The Willems method had a smaller margin of error than the Demirjian method. The Willems parameter modification was also due to the Demirjian method's unsatisfactory estimation effect in Belgian children, so the overall estimation results were revised down, and thus showed a corresponding better estimation performance in the population of southern China. Similarly, the results of the Willems method are similar in China6, 16 and its neighboring regions17.
The new model proposed new weights, has eliminated deviations between chronological age and dental age in the total sample and has shown good results in improving the accuracy of estimation. This suggests that the reconstruction of a linear model can effectively improve the accuracy of estimation when the original linear correlation is good, to propose a more accurate model for the population in each region. At the same time, this study also proves that morphology-based dental age estimation is an effective method to estimate the chronological age of young children and adolescents.
Optimization of the model by machine learning resulted in a significant reduction in MAE for each age group, with the GBDT algorithm performing best in the optimization. Possible reasons for this optimization include (1) Machine learning eliminates the use of transformation tables, reducing the error in the model process. (2) The Boosting algorithm is first trained on the full sample set, and the training produces a series of weak learners. In the next round of training, the training set is kept constant but the weights of the correct samples are reduced and the weights of the incorrect samples from the previous round are increased. A function is found to fit the residuals from the previous round, generating a strong learner until the residuals are sufficiently small or a set maximum number of iterations is reached to stop the iterations. As tooth development is not completely linear and teeth tend to develop faster and then slower, such a general, non-linear estimation method may be more suitable for age assessment based on the developmental pattern of tooth mineralization.
The GBDT scheme is flexible enough to handle various types of data, including continuous and discrete values, and also takes full account of the weights of each classifier. GBDT has been shown to excel in a number of areas12, 18, 19, but our study is the first to apply GBDT to dental age estimation.Our study also demonstrates that GBDT can break the accuracy limits of traditional dental age estimation and lead to better results for forensic age estimation. The GBDT algorithm in the Boosting framework can estimate the age of the population more accurately, suggesting that machine learning can effectively improve the accuracy of dental age estimation models for all age groups, and with a sufficient number of prior samples, a model with higher accuracy can be proposed for each region. Compared to traditional dental age estimation methods, the age estimation models obtained by machine learning algorithms are more easily applied in other regions and are simpler to implement However, it is noteworthy that seemingly large enough datasets and enough learning algorithms have existed for decades, yet despite thousands of papers applying machine learning algorithms to medical data, few papers have made meaningful contributions to solving practical problems. This study proposes a practical age estimation scheme to facilitate the application of dental estimation in population forensics in South China, and the conclusion that GBDT is effective in improving the efficacy of age estimation, as demonstrated in this study, also provides a reference for the development of forensic dental age assessment schemes in other regions.
Unlike end-to-end estimation schemes with no human intervention and automatic extraction of dental features for age estimation using CNN algorithms20, this study manually determines the staging and then uses machine learning algorithms to continuously iterate to obtain the most appropriate age estimation model, and this estimation process can be accomplished using a lightweight computer terminal, which eliminates the tediousness of using a large server for model training. In comparison, the MAE obtained by the GBDT algorithm (< 0.5) is also lower than that of the end-to-end solution (0.83). Machine learning algorithms bring more possibilities for the wide application of age estimation in forensic science.