This article provides an overview of the efficacy and accuracy of different segmentation methods in dentomaxillofacial radiology.
Jaw-Related Structures
Making virtual 3D models of pertinent anatomic regions of interest, such as the mandible, maxilla, and teeth, from CBCT scans is a crucial first step in the identification of dentofacial anomalies and deformities. 3D representations of a patient at various times can be superimposed to visually and quantitatively analyze orthodontic changes. (Cha et al. 2021) Segmenting craniomaxillofacial and jaw-related structures, including the mandible, maxilla, maxillary sinus, mandibular canal, condyle and related structures, and alveolar sockets, is a labor-intensive, time-consuming, and highly operator-dependent task. (Wang et al. 2021) Thus, there is growing interest in applying semi-automatic and fully automatic segmentation models to different structures in CBCT, panoramic, and MDCT imaging. Although panoramic images provide a broad field of view at low cost, this radiographic technique is limited in its representation of 3D structures. (Ghaeminia et al. 2009) CBCT images, on the other hand, provide a full reconstruction of 3D structures at a lower dose and cost than MDCT. Therefore, CBCT is the major imaging modality used for segmenting the jaws and associated anatomies. (Janssen et al. 2017)
3D models of craniomaxillofacial structures are crucial for diagnosis, treatment planning, and patient communication, particularly in the field of computer-assisted surgery. (Verhelst et al. 2021) According to our findings, AI-based automatic segmentation of the mandible has shown great promise.
• Mandible
One article proposed a segmentation model for the mandible in panoramic radiographs (Abdi et al. 2015) with results comparable to manual segmentation. Another study (Rueda et al. 2006) performed segmentation with an active appearance model on CT images. Two studies (Verhelst et al. 2021; Antila et al. 2008) evaluated automatic mandible segmentation on CBCT scans and reached Dice similarity coefficients higher than 0.91. One limitation of the U-Net technique is its inability to differentiate between cortical and medullary bone. Another study (Wang et al. 2021) proposed a mixed-scale dense (MS-D) CNN that can segment the jaws and teeth simultaneously on CBCT images. Compared with binary segmentation (which can segment either the jaw bone or the teeth), this method showed advantages such as not generating conflicting labels. The MS-D network performed well in segmenting the jaw and teeth, achieving Dice similarity coefficients (DSC) of 0.93 for the jaw and 0.95 for the teeth. Most of the model's segmentation errors occurred at the edges of the bony structures.
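Because the DSC is the overlap metric quoted throughout this review, a minimal sketch of how it can be computed from binary masks may be helpful; the NumPy implementation and the toy volumes below are illustrative and are not taken from any of the cited studies.

```python
# Minimal sketch: Dice similarity coefficient (DSC) between a predicted binary
# segmentation mask and a manual (reference) mask. DSC = 2|A ∩ B| / (|A| + |B|).
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denominator = pred.sum() + truth.sum()
    return 1.0 if denominator == 0 else 2.0 * intersection / denominator

# Toy 3D label volumes standing in for CBCT segmentations
pred = np.zeros((4, 4, 4), dtype=bool); pred[1:3, 1:3, 1:3] = True
truth = np.zeros((4, 4, 4), dtype=bool); truth[1:3, 1:3, :] = True
print(round(dice_coefficient(pred, truth), 2))  # 0.67
```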
• Maxilla and maxillary sinus
One of the AI models (Cha et al. 2021) was based on panoptic segmentation, which fuses semantic segmentation and instance segmentation. This combination makes it possible to segment both countable objects and uncountable regions. It was used to segment various structures in dental panoramic radiographs, including the maxillary sinus, maxilla, mandible, mandibular canal, normal teeth, treated teeth, and dental implants. Based on the results, mandibular canal segmentation exhibited the lowest panoptic quality (PQ) and segmentation quality (SQ) scores, in line with the Intersection over Union (IoU) results. Overall, the results indicated that this segmentation model was not accurate enough for tooth segmentation.
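For readers less familiar with panoptic-segmentation metrics, the hedged sketch below shows how PQ, SQ, and the recognition quality (RQ) that links them can be derived from per-instance IoU values; the greedy matching, the 0.5 IoU threshold, and the toy masks are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch: panoptic quality (PQ) from per-instance binary masks.
# A prediction counts as a true positive when its IoU with a ground-truth
# instance exceeds 0.5; PQ = SQ * RQ, where SQ is the mean IoU over matches.
import numpy as np

def iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def panoptic_quality(pred_instances, gt_instances, match_thr=0.5):
    matched_ious, used_pred = [], set()
    for g in gt_instances:
        for pi, p in enumerate(pred_instances):
            if pi not in used_pred and iou(p, g) > match_thr:
                matched_ious.append(iou(p, g))
                used_pred.add(pi)
                break
    tp = len(matched_ious)
    fp = len(pred_instances) - len(used_pred)
    fn = len(gt_instances) - tp
    sq = sum(matched_ious) / tp if tp else 0.0
    rq = tp / (tp + 0.5 * fp + 0.5 * fn) if (tp + fp + fn) else 0.0
    return sq * rq, sq, rq  # (PQ, SQ, RQ)

# Toy usage: one ground-truth instance and one overlapping prediction
gt = [np.pad(np.ones((4, 4), dtype=bool), 2)]
pr = [np.pad(np.ones((4, 4), dtype=bool), ((2, 2), (1, 3)))]
print(panoptic_quality(pr, gt))  # roughly PQ = SQ = 0.6, RQ = 1.0
```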
• Condyle
Condyles are frequently difficult structures to segment because of their complex and varied shape, their low bone density, and a glenoid fossa that is heavily shaded and affected by superimposition. One article (Xi et al. 2014) proposed a semi-automatic segmentation model for condylar regions.
• Bone
One of the included studies (Minnema et al. 2019) compared AI models, including MS-D Net, U-Net, and Res-Net, with the interactively applied snake evolution algorithm (a model-driven segmentation method) for bone segmentation in CBCT scans affected by metal artifacts. The DSCs of the models were 0.87, 0.87, 0.86, and 0.78, respectively. The CNNs employed in this study outperformed the current clinical benchmark (the snake evolution algorithm) in segmenting bony structures and in correctly classifying metal artifacts as background in CBCT scans, which is attributed to the CNNs' capacity to learn features that distinguish bone from metal artifacts. The results suggested that the MS-D network architecture can be applied across a diverse range of applications.
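As a rough illustration of the mixed-scale dense idea referred to above, the PyTorch sketch below wires densely connected convolutional layers whose dilation rates cycle across scales; it is a conceptual toy assuming a single-channel 2D input, not the published MS-D implementation.

```python
# Conceptual sketch of a mixed-scale dense (MS-D) style network: each layer applies
# a 3x3 convolution with its own dilation rate and is densely connected to all
# previously computed feature maps, producing one new feature map per layer.
import torch
import torch.nn as nn

class MSDSketch2D(nn.Module):
    def __init__(self, in_channels=1, n_layers=8, n_classes=2, max_dilation=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for i in range(n_layers):
            dilation = i % max_dilation + 1              # cycle dilation rates 1..max_dilation
            self.layers.append(nn.Conv2d(channels, 1, kernel_size=3,
                                         padding=dilation, dilation=dilation))
            channels += 1                                # dense connectivity: one new map per layer
        self.classifier = nn.Conv2d(channels, n_classes, kernel_size=1)

    def forward(self, x):
        features = x
        for layer in self.layers:
            features = torch.cat([features, torch.relu(layer(features))], dim=1)
        return self.classifier(features)                 # per-pixel class logits

logits = MSDSketch2D()(torch.randn(1, 1, 64, 64))        # e.g. one CBCT slice, batch of 1
```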
• Mandibular canal
Precise knowledge of the mandibular canal's location is crucial for planning oral and maxillofacial surgeries, including implant placement and third molar extraction. (Jung and Cho 2014) Anatomical variations in the course, size, and shape of the mandibular canal make it difficult to detect accurately, particularly in CBCT, owing to its low contrast resolution. (Valenzuela-Fuenzalida et al. 2021) The segmentation model designed by Jeoun et al. (Jeoun et al. 2022), Canal-Net, is a multi-task learning framework based on a 3D U-Net. The model learned local anatomical variations of the canal by integrating spatiotemporal features while also capturing the canal's global structural continuity, which resulted in a DSC of 0.87. The study also compared 2D U-Net, SegNet, 3D U-Net, MPL 3D U-Net, and ConvLSTM 3D U-Net. The 2D networks (2D U-Net and SegNet) achieved the lower mandibular canal segmentation accuracies, whereas the MPL 3D U-Net and ConvLSTM 3D U-Net achieved the higher ones. The MPL 3D U-Net had difficulty delineating boundaries, particularly around the mental foramen. In contrast, the ConvLSTM, by learning anatomical context through spatiotemporal features, achieved smoother boundaries and consistent accuracy throughout the canal volume. Consequently, Canal-Net demonstrated the most accurate segmentation of the entire mandibular canal volume, learning global structural continuity through the MPL and anatomical context through the ConvLSTM. Another study (Kwak et al. 2020) also compared 2D and 3D networks for automatic segmentation of the mandibular canal in CBCT images. The models were based on 2D SegNet and 2D and 3D U-Nets. The 2D U-Net had a global accuracy of 0.82, the 2D SegNet achieved a global accuracy of 0.96, and the 3D U-Net outperformed both with the highest global accuracy of 0.99. However, the 3D network was limited in situations where the cortical layer around the canal was unclear.
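The global accuracy values quoted for the Kwak et al. comparison are simply the proportion of correctly labeled voxels counted over the whole volume, background included; the short NumPy sketch below illustrates that definition and is not the study's own evaluation script.

```python
# Hedged sketch: "global accuracy" as the fraction of voxels whose predicted label
# matches the reference label across the entire scan volume.
import numpy as np

def global_accuracy(pred_labels: np.ndarray, true_labels: np.ndarray) -> float:
    return float((pred_labels == true_labels).mean())

# Toy label volume: 0 = background, 1 = mandibular canal
truth = np.zeros((50, 50, 50), dtype=np.uint8); truth[20:30, 20:30, :] = 1
pred = truth.copy(); pred[20:22, 20:30, :] = 0   # simulate a small under-segmentation
print(round(global_accuracy(pred, truth), 4))    # close to 1.0 despite the missed voxels
```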
Lesions
Segmentation and detection of lesions are challenging tasks that can be performed using periapical, panoramic, and CBCT imaging. (Kruse et al. 2015)
• Periapical lesions
Periapical diseases are mostly inflammatory lesions; compared with conventional radiographs, CBCT depicts them with high 3D resolution and better-visualized canal spaces, without distortion or superimposition of neighboring structures. (Davies et al. 2015) Diagnosis of these lesions is challenging because different diseases can present with the same symptoms and may lack radiographic signs. (Patel et al. 2009) In the study by Orhan et al. (Orhan et al. 2020), a U-Net-based deep learning algorithm measured volumetric features very similarly to the manual method, although it did not outperform it. The calculated volumes were 191.41 with the manual method and 143.84 with the automatic method. Neighboring anatomical structures such as the maxillary incisive canal, inferior alveolar canal, mental foramen, maxillary sinus, and nasal fossa may affect the AI analysis. In another study (Zheng et al. 2021), the constrained U-Net algorithm outperformed the data-driven method.
• Maxillary sinus lesions
Precise three-dimensional segmentation of the maxillary sinus is vital for various diagnostic and treatment purposes, including evaluating sinus changes and lesions, monitoring remodeling over time, conducting volumetric analysis, and generating 3D virtual models. (Janner et al. 2020; Al Abduwani et al. 2016)
Jung et al. (Jung et al. 2021) compared the performance of a 3D nnU-Net algorithm for segmenting the maxillary sinus into maxillary bone, air, and lesion with that of expert manual segmentation. Besides being faster than manual segmentation, the model achieved DSCs of 0.93 for air and 0.76 for lesions. Low segmentation performance when the sinus was filled with inflammatory material and in cases of severe maxillary sinusitis were limitations of the model. Another study (Hung et al. 2022) proposed an algorithm based on V-Net and support vector regression to segment mucosal thickening and mucosal retention cysts in both low-dose and full-dose CBCT images. The resulting DSCs were 0.66–0.73 for mucosal thickening and 0.68–0.79 for mucosal retention cysts. One limitation of this study is that the model's performance in segmenting unilateral or partial-coverage lesions was not evaluated.
Craniofacial structure segmentation
• Airway
Airway analysis is performed on 3D volumetric images such as MDCT and CBCT, and segmentation of the airway is one part of quantitative airway assessment. (Weissheimer et al. 2012) This is useful for the diagnosis and treatment planning of pulmonary diseases, assessment of obstructive sleep apnea patients, prediction of airway changes after orthognathic surgery, and orthodontic and growth modification treatments. (AlQahtani et al. 2021) However, manual and semi-automatic segmentation techniques are time-consuming. (Alsufyani et al. 2012) Five articles included in this review developed deep learning-based models for automatic segmentation of the airway; three used CBCT and two used MDCT.
One study (Park et al. 2021) designed a regression neural network-based deep learning model to measure airway volume on CBCT scans. Using reference planes, the model splits the airway into nasopharynx, oropharynx, and hypopharynx segments fully automatically. The measured volume differences were 48.620 mm³, 37.987 mm³, and 50.010 mm³ in the nasopharynx, oropharynx, and hypopharynx, respectively. Another study (Sin et al. 2021) compared a U-Net-based model with manual segmentation performed by a human observer using ITK-SNAP software. The average pharyngeal airway volume was 18.08 cm³ for the human observer and 17.32 cm³ for the artificial intelligence. This study concluded that the developed model performed as well as the expert in a reduced time.
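Once a binary airway mask has been obtained, the volumes quoted above follow directly from the voxel count and the scan's voxel spacing; the sketch below illustrates that calculation, with the 0.3 mm isotropic voxel size chosen only for illustration and not taken from the cited studies.

```python
# Hedged sketch: airway volume from a binary segmentation mask.
import numpy as np

def segmented_volume_mm3(mask: np.ndarray, spacing_mm=(0.3, 0.3, 0.3)) -> float:
    """Volume in mm^3 = number of segmented voxels x volume of one voxel."""
    voxel_volume = float(np.prod(spacing_mm))
    return mask.astype(bool).sum() * voxel_volume

airway_mask = np.zeros((100, 100, 100), dtype=bool)
airway_mask[20:80, 30:70, 30:70] = True                 # toy "airway" region
volume = segmented_volume_mm3(airway_mask)
print(f"{volume:.1f} mm^3 ({volume / 1000:.2f} cm^3)")  # 2592.0 mm^3 (2.59 cm^3)
```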
• Vocal tracts
Ruthven et al. (Ruthven et al. 2021) developed a U-Net algorithm for the segmentation of the vocal tract and multiple groups of articulators on 2D MR images. The automated model segmented six classes (head, soft palate, jaw, tongue, vocal tract, and tooth space) and reached a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. Segmentation was most accurate for the head class and least accurate for the soft palate and tooth-space classes, which contain the largest and smallest numbers of pixels, respectively. A limitation mentioned in this study is that the method occasionally struggled to preserve small gaps between the soft palate and the pharyngeal wall.
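The general Hausdorff distance reported above measures the worst-case disagreement between two segmentations; a hedged sketch using SciPy's directed_hausdorff on toy 2D masks is given below, treating each mask's segmented pixels as a point set, and is illustrative rather than the evaluation code of the cited study.

```python
# Illustrative sketch: symmetric (general) Hausdorff distance between two binary masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    pts_a = np.argwhere(mask_a)   # (row, col) coordinates of segmented pixels
    pts_b = np.argwhere(mask_b)
    return max(directed_hausdorff(pts_a, pts_b)[0],
               directed_hausdorff(pts_b, pts_a)[0])

a = np.zeros((64, 64), dtype=bool); a[10:30, 10:30] = True
b = np.zeros((64, 64), dtype=bool); b[12:32, 12:32] = True
print(hausdorff_distance(a, b))   # distance in pixels; multiply by pixel spacing for mm
```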
• Masseter
The masseter is one of the muscles of mastication most affected by bruxism. (Jiménez-Silva et al. 2017) Ultrasonography (US) enables accurate measurement of muscle thickness and width and can identify changes in the muscle. (Blicharz et al. 2021) Two deep learning-based algorithms developed for this purpose are included in this review.
One study (Orhan et al. 2021) created a deep CNN model utilizing the U-Net, Pyramid Scene Parsing Network (PSPNet), and feature pyramid network (FPN) architectures on ultrasonography images of individuals with bruxism. The FPN model achieved an accuracy of 0.985, the PSPNet 0.947, and the U-Net 0.969.
• Others
Other articles developed models, based on MDCT images, for segmenting structures including bone, vertebrae, and vessels. Two others presented algorithms for segmenting the articular disc of the temporomandibular joint and the parotid gland on MRI images.
One study (Steybe et al. 2022) developed a multiscale stack of 3D convolutional neural networks based on U-Net architectures for segmentation of head CT structures, including bones (viscerocranium/skull base, nasal septum, and mandible), paranasal sinuses (frontal, sphenoid, and maxillary sinuses), canals (nasolacrimal duct, carotid canal, and jugular foramen), foramina, and soft tissue (ocular globe, extraocular muscles, and optic nerve). The mean DSC achieved by the model was 0.81 (lowest: 0.61 for the mental foramen; highest: 0.98 for the mandible), and the mean surface DSC was 0.94 (lowest: 0.87 for the mental foramen; highest: 0.99 for the mandible). A limitation mentioned in this study is that altered anatomy, such as that caused by tumors, other pathologies, or traumatic injuries, would compromise the model's segmentation accuracy.
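Because the surface DSC quoted above is less familiar than the volumetric DSC, a from-scratch sketch is included below: it measures the fraction of the two segmentation surfaces lying within a distance tolerance of each other, with the voxel spacing and the 1 mm tolerance chosen purely for illustration (this is not the code used in the cited study).

```python
# Hedged sketch: surface DSC at a tolerance, computed from binary 3D masks.
import numpy as np
from scipy import ndimage

def surface_voxels(mask):
    # Surface = mask voxels that disappear under one erosion step
    return mask & ~ndimage.binary_erosion(mask)

def surface_dice(pred, truth, spacing_mm=(0.5, 0.5, 0.5), tolerance_mm=1.0):
    surf_p = surface_voxels(pred.astype(bool))
    surf_t = surface_voxels(truth.astype(bool))
    # Distance from every voxel to the nearest surface voxel of the other mask
    dist_to_t = ndimage.distance_transform_edt(~surf_t, sampling=spacing_mm)
    dist_to_p = ndimage.distance_transform_edt(~surf_p, sampling=spacing_mm)
    close_p = (dist_to_t[surf_p] <= tolerance_mm).sum()   # pred surface near truth surface
    close_t = (dist_to_p[surf_t] <= tolerance_mm).sum()   # truth surface near pred surface
    total = surf_p.sum() + surf_t.sum()
    return (close_p + close_t) / total if total else 1.0

pred = np.zeros((40, 40, 40), dtype=bool); pred[10:30, 10:30, 10:30] = True
truth = np.zeros((40, 40, 40), dtype=bool); truth[11:31, 10:30, 10:30] = True
print(round(surface_dice(pred, truth), 3))
```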
Treatment for head and neck cancer often involves radiation therapy. One of the challenges in planning radiation therapy is delineating the precise target volume and the nearby organs at risk. Manual identification of these organs is labor-intensive, time-consuming, and dependent on the radiation oncologist's anatomical expertise. (Lim and Leech 2016) The automatic model developed in one of the studies (Zhong et al. 2021) is a U-Net-based fully convolutional network for segmenting organs at risk in head and neck cancer radiotherapy on CT images. All DSCs obtained in this study, except those for the optic nerve and chiasm, were greater than 0.7. Delineation of the target volume and the surrounding organs at risk has direct clinical consequences in radiotherapy, so the dosimetric impact of the automated segmentation results should be taken into account when evaluating them; this is one of the limitations of this study.
One of the studies (Ito et al. 2022) compared the TMJ articular disc segmentation performance of three AI methods: 3DiscNet, U-Net, and SegNet-basic. The DSCs were 0.70, 0.46, and 0.74, respectively. The limitations mentioned in the study were the use of images from a single institution and the training of the algorithm on 'ground truth' manual segmentations created by a limited number of experts.
Tooth and pulp cavity segmentation
• Pulp cavity segmentation
Recently, several models have been developed for pulp segmentation on CBCT and micro-CT images. Zheng et al. (Zheng et al. 2021) and Penaloza et al. (Marroquin Penaloza et al. 2016) developed algorithms for pulp segmentation on CBCT images. Because the deposition of secondary dentin over time reduces the size of the pulp chamber, pulp chamber volume can be used for age estimation. Zheng et al. concluded that their model can successfully perform age estimation; although the algorithm was developed on first molars, it is applicable to both single- and multi-radicular teeth. Penaloza et al. showed that applying the same setting parameters to all teeth for automatic and manual segmentation is impossible, and concluded that manual segmentation is time-consuming. Four other articles developed models using CBCT and micro-CT of extracted teeth. Penaloza et al. used micro-CT scans of the extracted teeth from which the CBCT scans had been obtained as the ground truth. This article showed promising results and offers hope for future use in endodontic diagnosis and therapy.
• Tooth segmentation
Tooth identification is important for accurate diagnosis, treatment planning, and better clinical decision-making. Manual annotation is a time-consuming process because of the low contrast between cementum, dentin, and bone. Thirteen articles in this review worked on tooth segmentation: four used panoramic radiographs, six used CBCT, and three used MDCT for algorithm development. In recent years, deep learning-based segmentation algorithms have been proposed. (Minaee et al. 2022) Almost all models reported improved accuracy and reduced segmentation time.
In the quality assessment of the included studies, the main focus was on the accuracy of the segmentation methods. The reference standard was judged to have a low risk of bias in 81% of the studies. The AI technology applied for the final output was highly standardized, with no impact on flow or timing, and was therefore considered lower risk. The index test was judged to have a low risk of bias in 88% of the studies in the current systematic review.
This study has a small number of limitations. Although we used a detailed methodology, it is possible that some eligible articles were missed, given the wide range of segmentation methods used in dental imaging. In addition, because this systematic review is an all-inclusive analysis of different segmentation methods, a complete, in-depth assessment of each individual method was not performed. Therefore, further studies that evaluate specific subjects and different clinical applications and their effectiveness are required.