In clinical practice, it is crucial to determine the growth stage accurately. Orthodontists, pediatric dentists, and even pediatricians often use bone age to determine the growth stage to guide treatment planning and drug selection, especially in the treatment of growth modification and intervention before the growth spurt. Using growth and development potential to treat disorders can lead to better efficacy.
The CVM method described by Baccetti et al.[23] is widely used in clinical practice, but the experience and seniority of clinicians have an enormous impact on judgment[12]. Gabriel et al.[11] tested the reproducibility of the CVM method with 10 orthodontists and found an interobserver agreement of less than 50%. Nestman et al.[12] also reported significant variability when clinicians applied the CVM method to assess growth and development, which makes its guidance for clinical application unreliable.
Applying artificial intelligence to CVM evaluation may be a solution to this problem. Kök et al.[27] defined 19 reference points on C2, C3 and C4 and performed 20 different linear measurements to create a dataset for training neural network models. They compared the accuracy of several models; the most accurate, ANN-6, achieved an accuracy of 0.8687. The AI model and human observers showed an average agreement of 58.3%. Hakan Amasya et al.[18] marked 26 reference points on the vertebrae in lateral cephalograms for measurement, built a dataset, and compared the model's output with the results of human observers to validate the effectiveness of the CVS AI model. Hakan Amasya et al.[19] then developed five different machine learning classifier models and compared their performance in CVM assessment, with the ANN model showing the highest agreement with experts (κ = 0.926).
Most previous studies used measurements of vertebral body morphology to train AI models, such as the ratio of the posterior height to the anterior height of the vertebral bodies[18-22]. This approach is labor-intensive and time-consuming, which limits the number of lateral cephalograms available for training, while the training sample size dramatically affects the final performance of an AI model. By contrast, CVM is commonly evaluated in clinical settings by direct visual inspection, which gives easy access to many samples for training. Deep learning algorithms are the dominant technology in the field of artificial intelligence. They perform feature extraction automatically, which allows researchers to extract discriminative features with minimal domain knowledge and human effort. Therefore, combining directly assessed samples with deep learning makes it possible to train high-performing AI models. In this study, experts created a dataset of 10,200 samples and carefully assessed and collated it, resulting in a large, high-quality dataset.
In addition, an unbalanced sample distribution (a more than tenfold difference in sample size between classes) leaves the small classes with too few features, which makes feature extraction from them difficult. Even if a classification model is obtained, it is prone to over-fitting because it relies too heavily on limited data samples, and its robustness and accuracy will be poor when it is applied to new data. In this study, the dataset contained 1700 samples for each CVS class, effectively avoiding the problems associated with unbalanced samples.
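The balance criterion above can be checked with simple counting. The following is a minimal sketch, not the code used in this study: the helper name `imbalance_ratio` and the toy label lists are hypothetical, and only the figures of 1700 samples per class and 10,200 samples in total come from the text.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Balanced case matching the text: six CVS classes with 1700 samples each.
balanced = [stage for stage in range(1, 7) for _ in range(1700)]

# Hypothetical skewed case for contrast (counts invented for illustration).
skewed = [1] * 3000 + [2] * 200 + [3] * 2500

print(len(balanced))              # total number of samples
print(imbalance_ratio(balanced))  # 1.0 means perfectly balanced
print(imbalance_ratio(skewed))    # above 10 matches the "more than ten times" case
```

A ratio of 1.0 corresponds to the balanced design described here, whereas a ratio above 10 corresponds to the unbalanced situation the paragraph warns about.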
In this study, we developed a new deep-learning-based system for CVM assessment, called the psc-CVM assessment system. Unlike the methods reported in the literature, this system assesses CVM directly from the extracted shapes of the cervical vertebrae. The proposed system consists of three parts with different roles that operate in sequence: 1) a Position Network for locating the second (C2), third (C3), and fourth (C4) cervical vertebrae; 2) a Shape Recognition Network for recognizing and extracting the shapes of C2, C3, and C4; and 3) a CVM Assessment Network for assessing CVM according to the shapes of C2, C3, and C4.
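The sequential structure described above can be sketched as a composition of three callables. This is an illustrative skeleton only: the class `ThreeStagePipeline`, the type aliases, and the stub functions are hypothetical stand-ins for the trained networks, whose actual interfaces are not specified in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Contour = List[Tuple[float, float]]       # ordered key points on a vertebral outline

@dataclass
class ThreeStagePipeline:
    locate: Callable[[object], Dict[str, Box]]        # stage 1: Position Network
    extract_shape: Callable[[object, Box], Contour]   # stage 2: Shape Recognition Network
    assess: Callable[[Dict[str, Contour]], int]       # stage 3: CVM Assessment Network

    def run(self, image) -> int:
        boxes = self.locate(image)
        contours = {name: self.extract_shape(image, box)
                    for name, box in boxes.items()}
        return self.assess(contours)

# Stub stages (assumptions for demonstration, not trained models).
def stub_locate(image) -> Dict[str, Box]:
    return {"C2": (10, 10, 50, 40), "C3": (12, 45, 52, 75), "C4": (14, 80, 54, 110)}

def stub_extract_shape(image, box: Box) -> Contour:
    x0, y0, x1, y1 = box
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]  # box corners as a fake contour

def stub_assess(contours: Dict[str, Contour]) -> int:
    # Returns a fixed stage when all three vertebrae are present, else 0.
    return 3 if set(contours) == {"C2", "C3", "C4"} else 0

pipeline = ThreeStagePipeline(stub_locate, stub_extract_shape, stub_assess)
print(pipeline.run(image=None))
```

Because each stage consumes the output of the previous one, an error in localization or contour extraction is passed on unchanged to the assessment stage.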
In the Position Network, YOLOv3[24] was selected as the core of this phase. Unlike R-CNN[28], Fast-R-CNN[29], and Faster-R-CNN[30], YOLOv3 is a one-stage target detection network. Compared with other mainstream target detection methods, YOLOv3 can achieve state-of-the-art accuracy and has a significant advantage in speed.
In the Shape Recognition Network, this study proposed an EfficientNet-B0-based dense key point extraction network as the core of this phase. It uses the link between contour key points and morphology to extract vertebral contours by accurately predicting dense key points. It must be emphasized that spikes, or islands of bone, observed along the inferior border of the cervical bodies (C2, C3, and C4) in the anterior and posterior regions will interfere with CVM assessment[31]. These small osseous structures are not part of the vertebral body but are free-floating. Young clinicians often mistake spikes for part of the cervical bodies when staging the cervical vertebrae. Using vertebral body contours marked by senior experts as training samples helps the system avoid such mistakes.
Due to the continuity of the skeletal growth process, there are transitional phases between adjacent growth stages, such as CVS3, which contains the growth spurt, and vertebrae often do not show typical morphology because of the active growth pattern. CVS uncertainty is unavoidable for samples close to the boundary between two stages[32]. This uncertainty leads to inconsistent labeling across clinicians and further affects the accuracy of the system. Setting the commonly used hard label as the classification target would not help the network learn the features of the data. Therefore, in the CVM Assessment Network, we use a soft label as the prediction target of the network[26]. Intuitively, this approach keeps the system from wholly trusting the label, so when the data contain ambiguous samples near stage boundaries, the model is less affected by subjective classification. This is also why soft labels can improve the generalization of the system. At the same time, this method helps improve the system's accuracy in CVM assessment.
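To illustrate the idea, a soft label can be built by spreading part of the probability mass from the annotated stage onto its neighbors. The sketch below is one possible construction, not necessarily the one used in this study: the Gaussian shape, the value of `sigma`, and the function names are assumptions made for illustration.

```python
import math

def soft_label(stage, n_stages=6, sigma=0.5):
    """Gaussian-shaped soft label centred on the annotated stage (1-based).

    The Gaussian form and sigma are illustrative assumptions.
    """
    weights = [math.exp(-((k - stage) ** 2) / (2 * sigma ** 2))
               for k in range(1, n_stages + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def hard_label(stage, n_stages=6):
    """Conventional one-hot target for comparison."""
    return [1.0 if k == stage else 0.0 for k in range(1, n_stages + 1)]

def cross_entropy(predicted, target, eps=1e-12):
    """Cross-entropy between a predicted distribution and a (soft or hard) target."""
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))

target_soft = soft_label(3)
target_hard = hard_label(3)

# A prediction that hesitates between CVS3 and CVS4 (values invented).
prediction = [0.02, 0.08, 0.50, 0.35, 0.04, 0.01]

print([round(t, 3) for t in target_soft])
print(cross_entropy(prediction, target_hard))
print(cross_entropy(prediction, target_soft))
```

With the soft target, a prediction that places some probability on an adjacent stage contributes to the loss through the neighboring entries as well, rather than being scored against the annotated stage alone.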
Although this study tried several methods to improve the system's accuracy in CVM assessment, performance on some CVS stages was still not ideal, resulting in an overall accuracy of 70.42%. CVS1 and CVS6 last relatively long, and the morphology of the three vertebrae is stable during these stages. This explains why the system performs best on CVS1 (F1 score: 80.00%) and CVS6 (F1 score: 81.51%) and worse on CVS2 (F1 score: 60.03%) and CVS3 (F1 score: 63.40%). In addition, the whole system is not end-to-end but divided into three steps, so errors generated in one step are carried into the next. Nevertheless, the overall ICC between the psc-CVM assessment system and the expert panel was 0.946, indicating that the system was highly consistent with the expert panel in CVM assessment.
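For reference, overall accuracy and per-stage F1 scores of the kind reported above can be derived from a confusion matrix as follows. This is a generic sketch: the 3-class matrix is invented toy data and does not reproduce the results of this study, and the function names are hypothetical.

```python
def per_class_f1(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other classes predicted as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

# Toy 3-class confusion matrix (invented values, for illustration only).
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]

print(accuracy(toy))
print(per_class_f1(toy))
```

In the toy matrix, the middle class is confused with both neighbors and therefore has the lowest F1 score, which mirrors the pattern described for the transitional stages.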
In this study, the system focused only on the vertebrae. However, other regions of the lateral cephalograms may contain valid information that could help CVM assessment, and these were not included in training. In subsequent studies, it may be possible to unify the entire process into an end-to-end, all-in-one system, coordinate and optimize the various steps, add valid information related to CVM, and improve the system's accuracy.
The above results indicate that the psc-CVM assessment system in this study is stable and highly consistent with the expert assessment results. Nevertheless, in a clinical setting, where diagnosis and treatment planning require the integration of various factors, the system is not yet able to make systematic decisions like an expert because of the limitations of the technology. Therefore, the psc-CVM assessment system should only be used as an auxiliary guidance tool in the clinical setting, providing valuable reference information for clinicians who lack clinical experience. The system can help make the treatment process more precise and effective and is now available for integration into the software of medical companies for free use by clinicians. In addition, the system will be regularly monitored and upgraded in future studies to ensure its stability in real-world applications.