Automated Segmentation of Articular Disc of the Temporomandibular Joint in Magnetic Resonance Images Using Deep Learning: A Proof-of-Concept Study

Shota Ito (Hiroshima University), Yuichi Mine (Hiroshima University; mine@hiroshima-u.ac.jp), Yuki Yoshimi (Hiroshima University), Saori Takeda (Hiroshima University), Akari Tanaka (Hiroshima University), Azusa Onishi (Hiroshima University), Tzu-Yu Peng (China Medical University), Takashi Nakamoto (Hiroshima University), Toshikazu Nagasaki (Hiroshima University), Naoya Kakimoto (Hiroshima University), Takeshi Murayama (Hiroshima University), Kotaro Tanimoto (Hiroshima University)


Introduction
Temporomandibular disorder (TMD) is a collective term covering a number of clinical manifestations that involve pain and dysfunction of the masticatory muscles and temporomandibular joint (TMJ) 1 . The most common signs and symptoms of TMD are regional pain in the face and preauricular area, malocclusion, limited range of jaw movement, and TMJ noises and locking 2 .
According to a prospective cohort study of US adults, the estimated annual incidence rate of first-onset TMD is 3.9%, and it is typically accompanied by mild to moderate levels of pain and disability 3 . In developed countries, it is considered a widespread disorder affecting 5-12% of the population 4 .
Magnetic resonance (MR) imaging is recognized as the best imaging modality for assessment of TMJ because it allows visualization of the anatomical and pathological features of all joint components 5 .
Notably, MR imaging permits evaluation of the morphology and position of the articular disc, the presence or absence of reduction during mouth or jaw opening, the morphology and surface characteristics of the mandibular condyle, abnormal bone marrow signal in the mandible and temporal bone, and the presence or absence of joint effusion. The most important subgroup of articular abnormalities in patients with TMD includes those with displacement and deformation of the articular disc 6 ; this is an intracapsular disorder involving the disc-condylar complex, with a prevalence of 30-60% in patients with TMD 7 . Importantly, an MR imaging examination is expected for confirmation of the displacement and deformation of the articular disc, to ensure accurate diagnosis and prediction of treatment response.
Artificial intelligence (AI) is gaining attention in various clinical disciplines and the dental field is no exception, with AI-based applications having been studied to streamline dental and oral care and improve the health of more people at a low cost 8-12 . Such AI-based applications should free dental professionals from time-consuming routine tasks and ultimately promote personalized, predictive, preventive, and participatory dental care 13 . Among the variety of AI algorithms available, the convolutional neural network (CNN)-based deep learning approach has become popular because of its excellent ability for object recognition when applied to medical images. Moreover, the increase in computational power and the pervasiveness of open-source frameworks have dramatically facilitated the development of CNNs 14 . In these circumstances, deep learning has been widely implemented for detection and segmentation purposes, and has shown encouraging performance. The fully convolutional network, a derivative form of the CNN, is the most widely used deep neural network in medical image segmentation, and several variants have been reported, including the U-Net 15 and SegNet 16 architectures.
Our ultimate goal is to devise a robust algorithm to achieve a comprehensive diagnostic system for the oromaxillofacial region. In this study, as a first proof of concept, we investigated and validated deep learning-based semantic segmentation algorithms for automatic detection and segmentation of the articular disc of the TMJ on MR images. Our results show good matches between manually segmented and algorithm-segmented discs. As the position of the articular discs of the TMJ could be partially estimated from each test MR image, we expect that the algorithm could form a diagnostic assistance tool for clinicians.

Dataset
This nonrandomized retrospective study was approved by the Ethical Committee for Epidemiology of Hiroshima University (Approval Number: E-2119). All methods in this study were performed in accordance with the Ethical Guidelines for Medical and Human Research Involving Human Subjects, Japan. Because of the retrospective design of this study, the requirement for informed consent was waived by the Ethical Committee for Epidemiology of Hiroshima University, and an opt-out method was used instead. The study included MR images of 10 patients with anterior disc displacement aged between 19 and 39 years (mean age, 26.4 years; 8 women, 2 men), and 10 healthy control subjects aged between 18 and 41 years (mean age, 27 years; 8 women, 2 men), all with available medical records. Each subject underwent MR imaging on an Ingenia 3.0-T CX Quasar Dual scanner (Philips Healthcare, Best, the Netherlands). Only proton density-weighted sagittal images were used in this study. In total, 217 such images were used, covering the left and right TMJ regions in closed- and open-mouth positions: 106 images from the 10 patients and 111 images from the 10 control subjects.
Two expert orthodontists (12 and 6 years of experience) and one expert oral and maxillofacial radiologist (25 years of experience) independently identified and manually segmented all articular discs of the TMJ on the MR images using ImageJ software (version 1.53, National Institutes of Health, Bethesda, MD; Fig. 1), for use in each of the following experiments. To derive a dataset showing the normal position of articular discs, the 111 images from the 10 control subjects were randomly split into 88 training images and 23 test images. For a dataset showing displaced articular disc positions, the 106 images from the 10 patients were randomly split into 84 training images and 22 test images. For a dataset showing a mix of normal position and displaced articular discs, the 217 images were randomly split into 173 training images and 44 test images.
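The random splits described above can be illustrated with a short sketch. It assumes a per-image split (the text does not say whether images from one subject were kept together), and the function name `split_indices` and the fixed seed are hypothetical; only the image counts come from the text.

```python
import random

def split_indices(n_images, n_test, seed=0):
    """Randomly partition image indices 0..n_images-1 into train and test lists."""
    rng = random.Random(seed)  # assumption: the paper does not report a seed
    indices = list(range(n_images))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test

# (total images, test images) as reported in the text
datasets = {"normal": (111, 23), "displaced": (106, 22), "mixed": (217, 44)}

splits = {name: split_indices(total, n_test)
          for name, (total, n_test) in datasets.items()}

for name, (train, test) in splits.items():
    print(name, len(train), len(test))
# normal 88 23
# displaced 84 22
# mixed 173 44
```

With a per-image split, images from the same subject can land in both the training and test sets; a per-subject split would avoid this, at the cost of less even set sizes.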

Deep learning algorithms
All procedures were performed using an Intel Core i7-9750H 2.60-GHz CPU (Intel, Santa Clara, CA), 16.0 GB RAM, and an NVIDIA GeForce RTX 2070 MAX-Q 8.0-GB graphics processing unit (NVIDIA, Santa Clara, CA). Deep learning algorithms were constructed using Python 17 and were implemented using the Keras framework for deep learning with TensorFlow as the backend.
We adapted three convolutional semantic segmentation approaches: an encoder-decoder CNN, U-Net 15 , and SegNet 16 , which are all well suited to segmentation tasks. The overall architectures are shown in Fig. 2. In this study, we propose an encoder-decoder CNN model that we named 3DiscNet (Detection for Displaced articular DISC using convolutional neural NETwork), which has an asymmetric encoder-decoder architecture for the extraction of features at different spatial fields of view (Fig. 2A). To reduce overfitting of the network, dropout layers are placed after the convolutional layers and max-pooling layers 18 . All dropout rates were set to 0.3 in this study. The final layer consists of a sigmoid activation function that classifies each pixel as articular disc or background. The U-Net is a fully convolutional network that consists of convolution and max-pooling layers in the encoder part, and convolution and transposed convolution layers in the decoder part. Encoder outputs were concatenated to the decoding layers to share spatial cues and to propagate the loss efficiently. The SegNet used a classical architecture for semantic pixel-wise segmentation, in which the max-pooling indices from the encoder layers are used to upsample the feature maps, which are then convolved within a trainable decoder network. The original architectures of U-Net and SegNet are illustrated in Fig. 2B and C. The type of SegNet architecture used is currently termed SegNet-Basic 19 . The final layers are similar to those of the 3DiscNet, employing a sigmoid classifier instead of the original soft-max classifier in U-Net and SegNet-Basic. The U-Net and SegNet have shown promise for semantic segmentation of organs and pathology on MR images 20-22 .
First, regions of interest (ROIs) around the articular disc were extracted from the datasets. The original image resolution was 512 × 512 pixels, and the ROIs, which were defined using a 161 × 184 pixel bounding box, were automatically cropped from the images using Python algorithms. The ROI images were then resized to 224 × 256 pixels for input into the three types of convolutional encoder-decoder network (Fig. 3). The 3DiscNet was trained using the Adam optimizer with a learning rate of 1.0 × 10⁻³, and the three algorithms were trained for a total of 2000 epochs.
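The index-based upsampling that distinguishes SegNet from U-Net can be illustrated on a single feature map. The following is a minimal NumPy sketch of the idea only, not the Keras implementation used in this study; the function names `max_pool_with_indices` and `max_unpool` are hypothetical, and non-overlapping 2 × 2 windows are assumed.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Max-pool a 2D map over non-overlapping size x size windows.

    Returns the pooled map and, for each window, the flat position
    (0 .. size*size-1) at which the maximum was found.
    """
    h, w = x.shape
    # Rearrange pixels so that the last axis enumerates one window.
    windows = x.reshape(h // size, size, w // size, size).transpose(0, 2, 1, 3)
    windows = windows.reshape(h // size, w // size, size * size)
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool(pooled, idx, size=2):
    """Put each pooled value back at its remembered position; zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, size * size), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    windows[rows, cols, idx] = pooled
    out = windows.reshape(ph, pw, size, size).transpose(0, 2, 1, 3)
    return out.reshape(ph * size, pw * size)

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 0]], dtype=float)

pooled, idx = max_pool_with_indices(x)   # encoder side
restored = max_unpool(pooled, idx)       # decoder side
print(pooled)    # [[4. 5.] [6. 7.]]
print(restored)  # maxima at their original positions, zeros elsewhere
```

The unpooled map is sparse; in SegNet it is then passed through trainable convolutions in the decoder to produce dense feature maps. U-Net instead learns its upsampling and concatenates the encoder feature maps themselves.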
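The cropping and resizing step can be sketched as follows. Several details are assumptions, since the text does not state them: the 161 × 184 box is read here as width × height (and 224 × 256 likewise), the box position (`top`, `left`) is hypothetical, and nearest-neighbour interpolation stands in for whatever resizing method was actually used. Note that 161/184 = 224/256 = 0.875, so the resize preserves the aspect ratio.

```python
import numpy as np

def crop_roi(image, top, left, height=184, width=161):
    """Cut a fixed-size bounding box out of a 2D image."""
    return image[top:top + height, left:left + width]

def resize_nearest(image, out_height=256, out_width=224):
    """Resize by nearest-neighbour sampling (interpolation method is an assumption)."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_height) * in_h // out_height  # source row for each output row
    cols = np.arange(out_width) * in_w // out_width    # source column for each output column
    return image[rows[:, None], cols[None, :]]

# Stand-in for one 512 x 512 MR slice; real data would be loaded from file.
image = np.random.default_rng(0).random((512, 512))

roi = crop_roi(image, top=150, left=170)  # hypothetical box position
net_input = resize_nearest(roi)
print(roi.shape, net_input.shape)  # (184, 161) (256, 224)
```

The ground truth masks would need to go through the same crop and resize so that they stay aligned with the network input.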

Performance metrics
The test data were used to validate the accuracy and computational efficacy of the models. The convolutional encoder-decoder network performance was assessed using the Dice similarity coefficient, sensitivity, and positive predictive value (PPV) on the test dataset. The Dice similarity coefficient, which is a popular similarity metric, was calculated using the following formula:

$$\mathrm{Dice} = \frac{2\,|P \cap T|}{|P| + |T|}$$

where P is the pixel area of the articular disc segmented with the convolutional encoder-decoder network, and T is the pixel area of the manually segmented ground truth ROI. The sensitivity is the proportion of the actual articular disc area correctly predicted as the articular disc area, defined as:

$$\mathrm{Sensitivity} = \frac{|P \cap T|}{|T|}$$

The PPV is the proportion of the predicted articular disc area that is actual articular disc area, as follows:

$$\mathrm{PPV} = \frac{|P \cap T|}{|P|}$$
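As an illustration, the three metrics can be computed from a pair of binary masks, where P is the set of pixels predicted as articular disc and T is the set of pixels in the manual ground truth. This is a minimal sketch using the standard definitions (Dice = 2|P∩T|/(|P|+|T|), sensitivity = |P∩T|/|T|, PPV = |P∩T|/|P|); the function name `segmentation_metrics` and the choice to return NaN for an empty denominator are assumptions, not details from the study.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Return (Dice, sensitivity, PPV) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    overlap = int(np.logical_and(pred, truth).sum())  # |P ∩ T|
    p = int(pred.sum())                               # |P|
    t = int(truth.sum())                              # |T|

    def ratio(num, den):
        # Assumption: undefined ratios are reported as NaN.
        return num / den if den else float("nan")

    return ratio(2 * overlap, p + t), ratio(overlap, t), ratio(overlap, p)

# Toy example: the prediction covers the whole disc plus two extra pixels.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True   # 4 ground truth pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True    # 6 predicted pixels, 4 of them correct

dice, sensitivity, ppv = segmentation_metrics(pred, truth)
print(round(dice, 3), round(sensitivity, 3), round(ppv, 3))  # 0.8 1.0 0.667
```

The toy case shows why the metrics are reported together: over-segmentation leaves sensitivity at 1.0 while lowering PPV and Dice.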

Results
As shown in Fig. 4, the training loss of each of the models decreased and converged, which indicates that these models did not show overfitting. The training loss dropped faster with the 3DiscNet model than with the U-Net and SegNet-Basic models, indicating faster convergence. Figure 5 shows representative examples of visual segmentation. The first column shows test data for validating algorithm performance, the second column shows ground truth segmentations manually performed by the experts, and the third and fourth columns show the articular discs segmented by each algorithm. Red represents correctly segmented articular disc areas, green misdetected areas, and blue undetected areas. Each row denotes a particular algorithm: 3DiscNet, U-Net, and SegNet-Basic, from the top downwards. Results obtained on the dataset containing only normal articular disc placement are shown in Fig. 5A; 3DiscNet and SegNet-Basic made predictions that were in good agreement with the ground truth data. Figure 5B shows the results for the dataset containing only patients with articular disc displacement. The results are similar to those shown in Fig. 5A, with 3DiscNet and SegNet-Basic making predictions that were in good agreement with the ground truth data. Figure 5C shows results for both normal and displaced articular discs; 3DiscNet and SegNet-Basic again made segmentations that were in good agreement with the ground truth data. However, the U-Net results showed a large number of false positives and false negatives with all of the datasets (Fig. 5).
We used the test data to evaluate the performance of the algorithms according to the three quantitative metrics of Dice coefficient, sensitivity, and PPV, computing the values for both normal and displaced articular disc segmentations (Table 1). To reveal their distribution, these metrics are shown as box plots in Fig. 6 for 3DiscNet, U-Net, and SegNet-Basic, and for both normal and displacement test images. The Dice coefficient for segmentation performance was highest for SegNet-Basic, with the highest median accuracy and a small standard deviation in the dataset including both normal position and displaced articular discs. U-Net showed low values and large standard deviations for all metrics, and the results were unstable in both of the conditions.

Discussion
TMJ disc disorders caused by articular disc displacement, deformation, perforation, and fibrosis are the most common pathological conditions in TMD. Although MR imaging can provide a definitive diagnosis of TMJ disc disorder, concerns have been raised about the reliability of MRI interpretations 23 . Segmentation of the articular disc of the TMJ sounds simple but is actually a very challenging task for most clinicians, including dentists. Previous studies have revealed that uncalibrated observers, even experienced experts, are unable to make accurate MR imaging assessments of disc disorders; interpretation of MR imaging of the TMJ typically shows poor reproducibility 24-26 . These studies concluded that more effort is needed to understand the changes detectable on MR imaging. In this study using MR images, we demonstrated that a deep learning-based semantic segmentation approach can be applied to the detection and segmentation of the TMJ articular disc. Our overall results showed that two deep learning algorithms (3DiscNet and SegNet-Basic) performed good detection and segmentation, whereas the U-Net algorithm did not obtain satisfactory results.
The mean Dice coefficients for the dataset with both normal placement and displaced articular discs were 0.70 and 0.74 for 3DiscNet and SegNet-Basic, respectively. These are important results in that they show that the models can not only detect the existence of articular discs, but can also successfully find the locations of articular discs, regardless of normal positioning or displacement. However, the performance of U-Net was relatively poor, and its mean Dice coefficient was only 0.46. Indeed, the segmentation by U-Net revealed over-segmentation with the inclusion of irrelevant regions in addition to the articular disc.
The articular disc lacks a clear border on MR images and its position is often displaced in patients with TMD, and therefore a great deal of variation in the shape and position of the disc is found among patients. Similarly, the prostate has also been reported as an organ with fuzzy boundaries on MR images 27 . These conditions make it difficult to detect and segment the articular disc accurately. Although U-Net was originally proposed for the segmentation of biological images with a limited quantity of training data 15 , studies have reported that it has a tendency for less accurate segmentation of objects with fuzzy boundaries 27-29 . Specifically, on very challenging images, U-Net tends to over-segment, under-segment, make false predictions, and even completely miss the target objects 29 . A previous study aiming to achieve mandibular canal segmentation on cone beam computed tomography (CBCT) images reported that U-Net mis-segmented and over-segmented the mandibular canal region 30 . These reports are in accord with our results showing that the performance of U-Net on articular disc detection and segmentation was poor, even though 3DiscNet and SegNet-Basic showed comparably good metrics (i.e., Dice coefficient, sensitivity, and PPV) for all datasets.
SegNet was originally proposed for outdoor and indoor scene segmentation at a pixel level 16 . Some studies compared SegNet and U-Net for tissue segmentation, including Liu et al., who reported that SegNet showed more favorable performance than U-Net for cartilage and bone segmentation on musculoskeletal MR images 31 . Kwak et al. used SegNet and U-Net to segment the mandibular canal on the CBCT images of 102 patients, and also obtained good performance with SegNet 30 . In contrast, Zhang et al. found that U-Net achieved more favorable segmentation than SegNet when applied to breast MR images, which play a crucial role in the diagnosis and screening of those at high risk of breast cancer 20 . Therefore, the suitability of these two models depends on the specific segmentation task and dataset, and appropriate comparisons will continue to be required.
Research on the application of AI to TMD has recently been reported, although the studies are limited to the diagnosis of TMJ osteoarthritis (OA) using CBCT images. A group from Brazil developed a system using deep learning that allows the staging of bony changes in TMJ OA 32,33 . Lee et al. tried to develop a system to detect TMJ OA on sagittal CBCT images using a deep learning method for object detection 34 . Two studies reported by US groups successfully integrated high-resolution CBCT and biological markers from patients with TMJ OA, with one study performing staging of condylar morphology 35 and the other diagnosing the status of the disease 36 . While all these studies used CBCT, we have shown that AI can also be applied to MR imaging for TMD diagnosis. Given the results to date, including those from our proposed algorithms, it can be expected that AI systems for the diagnostic imaging of TMJ will be further developed, and will contribute to establishing a comprehensive diagnostic system for the maxillofacial region.
Our study had several limitations. All MR imaging scans were acquired at a single institution, and our models do not account for variations in hardware implementation and scanning techniques across institutions, which may bias the results. To increase the model robustness, evaluation of our concepts with a multicenter dataset is desirable. Another limitation is that our study only made comparisons between the three models. Although AI has much potential, no algorithm can perform well for all possible problems. Therefore, the successful use of AI requires a great deal of effort by human experts 37 . Further studies are needed to optimize the structure of the CNNs, including comparisons with other models. For future work, we will modify SegNet and 3DiscNet to include segmentation of other TMJ components (e.g., effusion, osteophytes) within the framework, and will train and test them in a multicenter study.

Conclusion
We are the first to propose algorithms using deep learning-based semantic segmentation approaches for detecting and segmenting articular discs on MR images: we have performed a proof of concept of this methodology and obtained promising initial results.

Figure 1. Representative images of manually segmented anterior discs. The first row shows raw MR images. The second row shows images with segmentations manually drawn by experts (white regions).