Tumor segmentation plays a crucial role in medical image analysis[20]. However, to the best of our knowledge, no studies have focused on the application of AI to ameloblastoma segmentation, although similar work has been conducted on other tumor types. For instance, Abdolali et al. proposed an asymmetry-analysis-based method for automatic segmentation of jaw cysts that performed favorably in their experiments[21]; for keratocystic odontogenic tumor, it achieved an average Dice coefficient of 0.80. Paderno et al. reported a Dice coefficient of 0.65 for an AI model segmenting oral squamous cell carcinoma[22]. These results provide evidence of the effectiveness of AI in segmenting maxillofacial tumors, and the clinical value of segmentation grows as the process becomes more efficient. In this study, we propose a Mask R-CNN model designed for automatic segmentation of ameloblastoma on CT images. For accuracy assessment, a prediction is generally considered reliable when its intersection over union (IoU) with the ground truth exceeds 0.7[23]. Our experiments show that at an IoU threshold of 0.75 the model achieves an AP of 0.826, indicating that Mask R-CNN segments ameloblastoma accurately. This success may be attributed to Mask R-CNN itself, a versatile and compact framework for object instance segmentation that not only detects targets within an image but also provides pixel-level segmentation for each individual target[24]. The PR curves show that performance begins to degrade markedly once the IoU threshold is set above 0.9, whereas the model remains consistently reliable at IoU thresholds between 0.5 and 0.75.
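To make the metrics above concrete, the IoU and Dice coefficient can both be computed directly from a pair of binary masks; the toy masks below are purely illustrative and are not drawn from our data:

```python
def iou(pred, truth):
    """Intersection over union of two same-sized binary masks (nested lists of 0/1)."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return inter / union if union else 1.0

def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|); equivalently 2*IoU / (1 + IoU)."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(p for pr in pred for p in pr) + sum(t for tr in truth for t in tr)
    return 2 * inter / total if total else 1.0

pred  = [[0, 1, 1],   # toy predicted mask
         [0, 1, 1],
         [0, 0, 0]]
truth = [[0, 1, 1],   # toy ground-truth mask
         [0, 1, 0],
         [0, 1, 0]]
print(iou(pred, truth))   # 3 overlapping / 5 union pixels = 0.6
print(dice(pred, truth))  # 2*3 / (4 + 4) = 0.75
```

A detection whose IoU with the ground truth exceeds a chosen threshold (e.g. 0.75) counts as a true positive when AP is computed, which is why AP falls as the threshold is raised toward 0.9.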
These results suggest that the AI model we developed segments ameloblastoma with relatively high, though not absolute, accuracy. Furthermore, we assessed the model's generalization on an external validation dataset, and the results were also satisfactory, indirectly indicating good robustness.
Previous studies commonly employed traditional semantic segmentation models in similar contexts, but these methods often face challenges related to class imbalance[25]. In contrast, our study uses Mask R-CNN, an instance segmentation framework known for its strong feature extraction capability. Mask R-CNN provides an accurate bounding box and pixel mask for each target, enabling pixel-level segmentation[26], and it handles targets of varying size, shape, and number, making it highly adaptable across application scenarios, including medical image segmentation[27]. During our investigation, we identified three scenarios that can degrade segmentation accuracy. First, when the tumor is located in the maxilla, the thinner maxillary bone cortex and the proximity of multiple sinus cavities pose challenges: maxillary lesions often destroy the nasal cavity, ethmoid sinus, and sphenoid sinus, allowing the tumor to extend locally into a sinus cavity, so the model struggles to delineate the boundary between sinus and tumor. Second, when the tumor extensively destroys the jaw cortex, the damaged bone loses its structural continuity, making the tumor boundary difficult to determine for both human experts and AI algorithms. Third, when part of the ameloblastoma boundary overlaps the crown or root of a tooth, precise distinction is challenging because both structures appear highly dense on imaging.
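As a small illustration of how the per-target box and mask outputs relate, a tight bounding box can always be recovered from a binary instance mask; this is a simplified sketch of that relationship, not the model's internal logic:

```python
def mask_to_bbox(mask):
    """Return the tight (x_min, y_min, x_max, y_max) box around the nonzero
    pixels of a binary mask given as nested lists of 0/1, or None if empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None  # empty mask: no detected instance
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))

mask = [[0, 0, 0, 0],   # toy instance mask for one target
        [0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]
print(mask_to_bbox(mask))  # (1, 1, 3, 2)
```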
In our study, CT imaging was selected over cone-beam computed tomography (CBCT) for the diagnosis of jaw tumors because of its distinct advantages. First, CT offers higher spatial resolution and a wider scanning range[28], enabling more accurate segmentation of jaw tumors and supporting precise treatment planning for ameloblastoma. Second, CT pixel values bear a highly linear relationship to tissue attenuation, which improves the stability and reliability of computer-based analysis and reconstruction[29]. Moreover, unlike previous experiments that used only axial CT images, our study employed three-dimensional CT images, providing a more comprehensive representation of the anatomical structures and pathology; this allowed the model to capture additional spatial information and further improved its performance.
Deep learning models are typically trained on large-scale datasets to ensure robust results[30]. However, rare conditions such as the ameloblastoma studied here often yield sample sizes too small to meet this requirement. Consequently, deep learning with limited samples has become a growing research focus, with notable advances in recent years[31, 32]. Nonetheless, small-sample deep learning faces its own challenges, overfitting and data imbalance being among the most common[33]. To address these concerns when constructing our Mask R-CNN, we fine-tuned the classifier's weights and applied data augmentation to mitigate the impact of data imbalance. Furthermore, we obtained images from other medical centers to assess the model's extrapolation capability, which ultimately yielded satisfactory results.
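The idea behind geometric data augmentation is to enlarge a small training set with label-preserving transforms of each slice; the sketch below shows generic flips and rotations for illustration only, and the specific transforms are not necessarily the exact ones used in our pipeline:

```python
import numpy as np

def augment(image):
    """Yield simple label-preserving geometric variants of a 2-D image slice:
    the original, horizontal/vertical flips, and 90-degree rotations."""
    yield image
    yield np.fliplr(image)       # mirror left-right
    yield np.flipud(image)       # mirror top-bottom
    for k in (1, 2, 3):
        yield np.rot90(image, k) # rotate by 90, 180, 270 degrees

slice_2d = np.arange(16).reshape(4, 4)  # stand-in for one CT slice
variants = list(augment(slice_2d))
print(len(variants))  # 6 variants generated from a single slice
```

In practice the same transform must be applied to the image and its segmentation mask together so that the pixel-level labels stay aligned.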
Several studies have explored artificial intelligence and radiomics in the diagnosis and treatment of ameloblastoma, mostly focusing on differential diagnosis. Gomes et al. investigated MRI texture analysis (contrast, entropy, and homogeneity) as a tool for distinguishing ameloblastoma from odontogenic keratocyst, achieving a diagnostic efficiency of 83.3%[34]. In another study, researchers used a VGG-16 model to classify jaw tumors on panoramic images, but the AI model's diagnostic efficiency and time showed no significant advantage over manual diagnosis[15]. Liu et al. subsequently improved on this work by comparing four deep learning models and found that a convolutional neural network based on a transfer learning algorithm could accurately differentiate ameloblastoma from odontogenic keratocyst with an accuracy of 90.4%[16]. Similar studies using CT images have also reported relatively high accuracy[18, 19]. However, accurate segmentation is a crucial prerequisite for AI models to classify medical images reliably, and in other fields the segmentation and classification of tumors on medical images have been studied intensively[35, 36]. The significance of our study therefore lies in addressing this gap by developing a segmentation approach for ameloblastoma. By accurately separating the tumor from the surrounding image, our model can help clinicians determine its location, size, and shape precisely. This information is important for diagnosis and differential diagnosis, provides valuable input for radiotherapy and surgical planning, and can contribute to prognosis assessment and estimation of recurrence risk.
Manual interpretation in radiology is prone to perceptual errors, which account for a substantial portion of misdiagnoses[37]; AI has the potential to mitigate this problem effectively[38]. Our model segments each image in only 0.1 seconds, a significant reduction in operating time compared with manual segmentation that makes our approach highly time-effective.
However, this study has certain limitations that should be acknowledged. The samples were obtained from a single center, which may introduce bias and limit the model's extrapolation to diverse populations or different imaging protocols. Further research involving multiple centers and a more diverse patient population would help establish the model's generalizability and assess its performance in different clinical settings.