Prediction of Osteoporosis Through Deep Learning Algorithms on Panoramic Radiographs

Objective: The aim of the present study was to predict osteoporosis on panoramic radiographs of women over 50 years of age through deep learning algorithms. Method: Panoramic radiographs of 744 female patients over 50 years of age were labeled as C1, C2, and C3 according to the mandibular cortical index (MCI). According to this index, C1 denotes a smooth and sharp mandibular cortex (normal); C2 denotes resorption cavities at the endosteal margin and 1 to 3-layer stratification (osteopenia); and C3 denotes a completely porotic cortex (osteoporosis). The data were reviewed in different categories, namely C1-C2-C3, C1-C2, C1-C3 and C1-(C2+C3), as two-class and three-class prediction problems. A random 20% of the data was set aside as test data, and the remaining data were used for training and validation with 5-fold cross-validation. The AlexNET, GoogleNET, ResNET-50, SqueezeNET, and ShuffleNET deep learning models were trained through the transfer learning method. The results were evaluated by performance criteria including accuracy, sensitivity, specificity, F1-score, AUC and training duration. Findings: The C1-C2-C3 dataset reached an accuracy rate of 81.14% with AlexNET; the C1-C2 dataset reached 88.94% with GoogleNET; the C1-C3 dataset reached 98.56% with AlexNET; and the C1-(C2+C3) dataset reached 92.79% with GoogleNET. Conclusion: The highest accuracy was obtained in the differentiation of C3 and C1, where the characteristics of the osseous structure change significantly. Since the C2 score represents the intermediate stage (osteopenia), the structural characteristics of the bone resemble those of both the C1 and C3 scores. Therefore, the datasets including the C2 score provided relatively lower accuracy. The regions where the textural features change in the radiological examination of the oral radiologist and the regions where the deep learning models extract features are consistent. The deep learning-based CAD system implemented in this study may accelerate the processes that help the radiologist reach a diagnosis, such as detection of osteoporosis through MCI and the evaluation of the textural changes between the scores. Losses in early intervention and diagnosis that may occur because dentists deal with general dental problems and because of the limited number of oral radiologists may be resolved quickly by the AI-based systems developed.


Introduction
Osteoporosis is a bone disease that develops when bone mineral density (BMD) and bone mass decrease or the quality or structure of the bone changes. Osteoporosis results in an increased risk of fractures in the vertebrae, hip and forearm bones along with decreased strength and increased fragility of the bone. Since hip and spine fractures can develop without symptoms, patients do not seek treatment and their mortality is higher. 1 Therefore, it is vital to diagnose this group of patients who may be asymptomatic. Early diagnosis of this disease which causes pathological fractures 2 especially in post-menopausal women over 50 years of age is very important.
Osteoporosis may be diagnosed through imaging techniques, biochemical markers and bone biopsy. The most practical diagnosis is established by imaging methods. Dual x-ray absorptiometry (DXA) is the gold standard for BMD measurement; single photon absorptiometry, dual photon absorptiometry (DPA), quantitative computed tomography (QCT) and high resolution magnetic resonance imaging (MRI) may also be used. However, the World Health Organization bases the diagnosis of osteoporosis on BMD values measured by DXA. Accordingly, a T score above -1 is defined as normal, a T score between -1 and -2.5 is defined as osteopenia, and a T score below -2.5 is defined as osteoporosis. 3 Although DXA is a widely accepted diagnostic method, it is not a triage method in the diagnosis of osteoporosis. The frequent use of panoramic and intraoral radiographs, together with their cost-efficiency and usefulness, directed researchers to focus on detecting lower BMD with these methods. 4 The mandibular cortex has been particularly studied for this purpose. A number of mandibular cortical indexes, including the mandibular cortical index (MCI), mandibular cortical width (MCW), and panoramic mandibular index (PMI), have been developed to assess and measure the quality of mandibular bone mass and to observe signs of resorption. 5 The MCI is one of the most commonly used osteoporosis risk indexes, with a sensitivity ranging from 48.7% to 100% and a specificity up to 88.89% in detecting low BMD, and with a sensitivity between 35.9% and 90.9% and a specificity up to 93.9% for the diagnosis of osteoporosis. 6 Computer-Assisted Diagnosis (CAD) systems, which may help differential diagnosis by using medical images, appear increasingly in the literature along with the recent development of artificial intelligence (AI) methods and computer hardware. 7,8 With the development since 2012 of deep learning algorithms, a sub-branch of AI, image classification, segmentation, detection and recognition have found applications in every clinic of dentistry. Some of these include detection of the tooth number, 9 detection of caries, 10 detection of apical pathologies, 11 and detection of periodontal bone loss. 12 There are studies in the literature on the prediction and classification of osteoporosis with machine learning and fuzzy systems using features obtained from panoramic images. [13][14][15][16][17][18] However, deep learning methods perform better than traditional machine learning methods. To classify an image, machine learning methods take as input the features obtained by the gray-level co-occurrence matrix, wavelets, local binary patterns, and other texture analyses, and classify them with feature selection methods. Since there are numerous feature extraction and selection methods, the search for methods suitable for the machine learning method to be used is a time-consuming and costly process. 19 In deep learning methods, the image is taken as a direct input and features are automatically obtained and selected along the created layers. 20 The Gradient-weighted Class Activation Mapping (Grad-CAM) method makes it possible to visualize the image regions from which the weights obtained through deep learning gather their features. 21
The aim of the present study was to predict the radiological changes compatible with osteoporosis by using different deep learning algorithms on panoramic radiographs of women over 50 years of age.
Panoramic radiographs of female patients over 50 years of age with no history of systemic disease or medical treatment affecting bone metabolism and no history of maxillofacial trauma were reviewed.
Panoramic radiographs with artifacts that made them insufficient for diagnosis, especially those in which the hyoid bone was superimposed on the mandibular cortical region, were excluded from the study. Seven hundred and forty-four panoramic radiographs were evaluated in accordance with the specified criteria.

Image Acquisition and Radiographic Evaluation
All panoramic radiographs reviewed were acquired with a single 2D Veraviewepocs (J. Morita Mfg. Corp., Kyoto, Japan) digital panoramic x-ray device at 70 kVp, 5 mA and 15 s irradiation time, in accordance with the exposure protocols determined by the manufacturer. All evaluations were conducted by the same single observer using the i-Dixel (J. Morita Mfg. Corp., Kyoto, Japan) software. The intraobserver agreement coefficient (Kappa value) was 1.

Dataset and Image Pre-Processing
Each panoramic image was cut into right and left halves, and each half was labeled as C1, C2 or C3 by an oral radiologist. The data underwent a three-stage image preprocessing procedure before prediction by the deep learning algorithms. In the first stage, the right and left regions of interest (ROIs), covering the region extending from distal of the mental foramen to the antegonial region, were cut and saved in *.tiff image format. 23 ImageJ v1.52 for Windows, a version of the National Institutes of Health (NIH) Image software, was used to determine the ROIs. A dataset (744 individuals) was thereby created with a total of 1488 images, including 597 images labeled C1, 581 images labeled C2, and 310 images labeled C3.
Since the anatomical structures and image sizes of the individuals differ, the cropped images have different widths and heights in pixels. Because deep learning algorithms require inputs of a standard size, the image sizes had to be standardized. In the second stage, the images were therefore resized to 224x224 using the bilinear interpolation method. The deep learning algorithms used operate on three-channel image input. Therefore, in the third stage, the images were converted into three channels. This procedure does not convert the image to color format; it only stacks the same image three times, allowing the algorithms to process it in three channels. The flow chart of the right and left cutting of the images, the standardization process, and the conversion into three channels is presented in Figure 2.
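The second and third preprocessing stages can be illustrated with a minimal sketch. It is written in plain Python for readability rather than in the MATLAB environment used in the study; the function names `resize_bilinear` and `to_three_channels` and the tiny 2x3 input are hypothetical, and the interpolation shown is a basic pixel-centre bilinear scheme without anti-aliasing, so it will not reproduce the output of any particular image library exactly.

```python
def resize_bilinear(img, out_h=224, out_w=224):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output pixel centre back into input coordinates and clamp.
        y = (i + 0.5) * in_h / out_h - 0.5
        y = min(max(y, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = (j + 0.5) * in_w / out_w - 0.5
            x = min(max(x, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def to_three_channels(gray):
    """Repeat the same grayscale value three times per pixel (no colour is added)."""
    return [[[v, v, v] for v in row] for row in gray]


# Hypothetical 2x3 crop standing in for a real ROI of arbitrary size.
roi = [[0, 50, 100],
       [100, 150, 200]]
resized = resize_bilinear(roi)
image = to_three_channels(resized)
```

Whatever the size of the cropped ROI, the result is a 224x224 image whose three channels are identical copies of the resized grayscale plane.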

Deep Learning
In the present study, the GoogleNET, 24 AlexNET, 25 ResNET-50, 26 ShuffleNET 27 and SqueezeNET 28 models were used for osteoporosis classification through the transfer learning method. The algorithms were run on a computer with an Intel Core i7-7700HQ 2.80 GHz processor, 16 GB RAM and an NVIDIA GTX 1050 graphics card. The hyper-parameters were set identically for all models so that the algorithms ran under equal conditions. Neither data augmentation nor freezing of the deep learning layers was applied. The deep learning algorithms were developed in MATLAB 2021a and run on the GPU. The hyper-parameters used in the deep learning models are provided in Table 1.

Model Training
Four different datasets were designed in order to measure the classification performance of the deep learning algorithms in the present study.
The original dataset, C1-C2-C3, was used to measure the three-class output performance of the systems. The original dataset was also divided into C1-C2, C1-C3 and C1-(C2+C3) to measure the two-class output performances. The aim of this procedure was to compare the two-class output performances of the systems against the three-class output performances. The discrimination of C1-labeled normal images from C2 and from C3 was analyzed in the C1-C2 and C1-C3 datasets, and the presence of osteoporotic-osteopenic findings was analyzed in the C1-(C2+C3) dataset.
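How the four datasets follow from the single set of C1/C2/C3 labels can be sketched as below. This is an illustrative Python sketch, not the authors' MATLAB code; the function name `build_datasets` and the integer class targets are assumptions, while the label counts (597, 581, 310) are those reported for the dataset.

```python
def build_datasets(labels):
    """Return (image index, class target) pairs for the four label groupings."""
    # Each scheme maps an MCI label to a class target; labels absent from a
    # scheme are left out of that dataset.
    schemes = {
        "C1-C2-C3": {"C1": 0, "C2": 1, "C3": 2},
        "C1-C2": {"C1": 0, "C2": 1},
        "C1-C3": {"C1": 0, "C3": 1},
        "C1-(C2+C3)": {"C1": 0, "C2": 1, "C3": 1},  # C2 and C3 merged
    }
    datasets = {}
    for name, mapping in schemes.items():
        datasets[name] = [
            (i, mapping[lab]) for i, lab in enumerate(labels) if lab in mapping
        ]
    return datasets


# Label list with the class counts reported for the 1488 ROI images.
labels = ["C1"] * 597 + ["C2"] * 581 + ["C3"] * 310
datasets = build_datasets(labels)
sizes = {name: len(items) for name, items in datasets.items()}
```

With these counts, the C1-C2 dataset holds 1178 images, the C1-C3 dataset holds 907, and both C1-C2-C3 and C1-(C2+C3) hold all 1488.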
For the test set, 20% of the data in each label were randomly selected. The remaining data were split at each iteration of 5-fold cross-validation into 80% training and 20% validation data. The test data, which the system had not seen, were given to the trained network and the classification performance of the system was measured. The datasets used in the study are presented in Table 2. Figure 3 shows the flow chart of the training, validation and testing model used in the study with 5-fold cross-validation. Statistical values were obtained after each fold, and the average of the performance criteria was taken after the cross-validation was completed. The four different datasets given in Table 2 were tested in 20 different scenarios through 5 different deep learning algorithms and the methodology given in
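The splitting scheme described above can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function name `split_holdout_and_folds`, the fixed random seed, and the interleaved fold assignment are hypothetical choices, and the per-label sampling of the test set is omitted for brevity, so the sketch draws the 20% hold-out from the pooled data.

```python
import random


def split_holdout_and_folds(n_items, test_fraction=0.2, n_folds=5, seed=0):
    """Hold out a random test set, then build k-fold train/validation splits."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)

    # The test set is separated once and never used for training or validation.
    n_test = round(n_items * test_fraction)
    test, remaining = indices[:n_test], indices[n_test:]

    folds = []
    for k in range(n_folds):
        # Every n_folds-th remaining item goes to the validation part of fold k.
        val = remaining[k::n_folds]
        val_set = set(val)
        train = [i for i in remaining if i not in val_set]
        folds.append((train, val))
    return test, folds


test_idx, folds = split_holdout_and_folds(1488)
```

Each of the five folds uses roughly 80% of the remaining data for training and 20% for validation, and every remaining image serves as validation data exactly once.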

Performance Criteria
The performance evaluation of the models was conducted with the test dataset obtained by randomly taking 20% of the entire dataset. The models were thereby evaluated on previously unseen data. The confusion matrix of the system was obtained together with the TP, TN, FP and FN values. The confusion matrix is given in Figure 4a, and the ROC curves and AUC values of each class are given in Figure 4b. This procedure was performed at each iteration.
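For a two-class problem, the performance criteria named in this study follow from the four confusion-matrix counts by their standard definitions. The sketch below is in Python for illustration only; the function name `binary_metrics` and the example counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard two-class criteria computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=45, tn=40, fp=10, fn=5)
```

With these illustrative counts the accuracy is 0.85, the sensitivity 0.90, the specificity 0.80, and the F1-score 90/105, which is about 0.857.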

Estimation Performances of Models
The average performance criteria obtained with the four different datasets and the total training times of the models are given in Table 3. The AUC values of the classes of each dataset and the average AUC values of the models are given in Table 4. The highest accuracy rate was 81.14% in the C1-C2-C3 dataset with AlexNET, 88.94% in the C1-C2 dataset with GoogleNET, 98.56% in the C1-C3 dataset with AlexNET in the shortest duration, and 92.79% in the C1-(C2+C3) dataset with GoogleNET. Since the labels in the datasets are not evenly distributed, the data constitute an unbalanced dataset. Therefore, although the accuracy rate reflects the classification performance, another parameter to be considered in unbalanced datasets is the AUC value (Table 4). The highest estimation performances of the models trained in this study are given in Table 5.
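Why the AUC complements accuracy on unbalanced data can be seen in a small sketch. The Python code below computes the AUC from its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case; the function name `roc_auc` and all scores and labels are hypothetical and unrelated to the study's data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical unbalanced example: 2 positive and 6 negative cases.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05]
auc = roc_auc(scores, labels)

# A model that gives every case the same score labels everything as the
# majority class: 6 of 8 correct (accuracy 0.75), yet no ranking ability.
constant_auc = roc_auc([0.1] * 8, labels)
majority_accuracy = sum(1 for y in labels if y == 0) / len(labels)
```

In this toy case the constant model reaches 75% accuracy purely from the class imbalance while its AUC stays at 0.5, which is the reason both criteria are reported together.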

Visualization of Model Estimations
Feature maps were determined by the Grad-CAM method on the images labeled C1, C2 and C3 given in Figure 1, using the model weights with the highest accuracy rates. Heat maps were obtained by overlaying the original image with the Grad-CAM map computed from the weights obtained after transfer learning on each dataset. The tissue characteristics identified by the oral radiologist may thereby be compared with the maps acquired by deep learning. The GoogleNET Grad-CAM results for dataset C1-C2 are provided in Figure 5; the AlexNET Grad-CAM results for dataset C1-C3 are provided in Figure 6; the AlexNET Grad-CAM results for dataset C1-C2-C3 are provided in Figure 7; and the GoogleNET Grad-CAM results for dataset C1-(C2+C3) are provided in Figure 8.
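The core Grad-CAM computation is compact enough to sketch. In the Python illustration below, each feature-map channel of a convolutional layer is weighted by the spatial average of the class-score gradient for that channel, the weighted channels are summed, and negative values are clipped to zero. The function name `grad_cam` and the 2-channel, 2x2 activations and gradients are hypothetical toy values; in practice both arrays come from a trained network, and the resulting map is upsampled to the image size before being overlaid as a heat map.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from per-channel activations and class-score gradients.

    Both arguments are nested lists indexed as [channel][row][column].
    """
    n_ch = len(activations)
    h, w = len(activations[0]), len(activations[0][0])

    # Channel weight: spatial average of the gradient for that channel.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]

    # Weighted sum over channels, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_ch)))
            for j in range(w)
        ]
        for i in range(h)
    ]

    # Scale to [0, 1] so the map can be rendered as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Hypothetical toy inputs: channel 0 supports the class, channel 1 opposes it.
activations = [[[1, 0], [0, 2]],
               [[0, 3], [1, 0]]]
gradients = [[[1, 1], [1, 1]],
             [[-1, -1], [-1, -1]]]
heat = grad_cam(activations, gradients)
```

Locations where the supporting channel is active receive high values, while locations dominated by the opposing channel are clipped to zero, which is what produces the localized hot regions seen in Grad-CAM overlays.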

Discussion
Deep learning, which began with the proposal of LeNET in 1988, 29 came into wide use as popular algorithms such as ImageNET, AlexNET, 25 ResNET, 26 VGG, 30 GoogleNET 24 and Inception 24 entered the literature. Researchers continue to propose new models in order to improve accuracy. Hardware with high processing power is required to train deep learning models. However, since not every researcher has access to such hardware, the weights of networks trained on standard datasets are shared as open source. These previously learned weights may be reused in practice by applying them to new datasets with transfer learning methods. 31 Osteoporosis is a silent disease, and individuals are not aware of this condition until they experience bone fractures. Therefore, with an effective CAD system for the detection of osteoporosis on panoramic radiographs, dentists will be able to detect patients at an early stage and refer them for early treatment. 32 Accordingly, the AlexNET, GoogleNET, ResNET-50, SqueezeNET and ShuffleNET deep learning models were trained through the transfer learning method in this study in order to predict osteoporosis by using MCI on panoramic radiographs of female patients over 50 years old. The results were evaluated by performance criteria including accuracy, sensitivity, specificity, F1-score, AUC, and training duration.
There are several studies in which the presence of osteoporosis was investigated on panoramic radiographs with classical image processing methods. 16 Unlike the study in which only healthy and osteoporotic individuals were differentiated, 32 the osteopenic findings scored as C2 were also estimated in this research. Among the model performances given in Table 5, where the highest prediction performances are shown, the most striking result was in the C1-C3 dataset, where the radiological change related to osteoporosis is most evident. This dataset was classified by the AlexNET model with a total training duration of 18 minutes and 38 seconds, an accuracy rate of 98.56%, and an AUC value of 0.9987. The feature extraction map of the model can be seen in the Grad-CAM images given in Figure 6 for the differentiation of C1-C3, where the cortical bone change was most evident. As the score progresses from C1 to C3, the cortical change extends along the cortex to include the entire region of interest; the areas shown in red are the regions on which the model bases its distinction between C1 and C3. The differentiation of the C1-C2 scores with GoogleNET took a total training duration of 43 minutes and 32 seconds and reached an accuracy rate of 88.94% and an AUC value of 0.9560. The C2 score shows radiological features close to the C1 score, which may explain the lower AUC values in the differentiation of C1 and C2. Since the labels in the datasets are not evenly distributed (the data are imbalanced), the classification performance of the model may be evaluated through the AUC. The three-class C1-C2-C3 dataset was classified by AlexNET with a total training duration of 35 minutes and 43 seconds, an accuracy rate of 81.14% and an AUC value of 0.9363.
The review of the Grad-CAM images given in Figure 7 revealed that the strong features for the C1 and C3 scores were acquired along the length of the mandibular cortex, whereas those for the C2 score were acquired from certain regions where the mandibular cortex shows porosity in a patchy pattern. In the C1-(C2+C3) dataset, the C2 and C3 images were labeled as osteoporotic-osteopenic changes and the C1 images were labeled as no osteoporosis, so only the presence or absence of osteoporosis and osteopenia was estimated; an accuracy rate of 92.79% and an AUC value of 0.9787 were acquired with a total training duration of 58 minutes and 35 seconds. AUC values above 0.9 were obtained for all four datasets. The heat-map regions from which the models obtain their features in the Grad-CAM images and the regions where the oral radiologist interprets the textural differences between the disease scores are consistent (Figures 5-8). These results show that the models may be used as a CAD system to assist the oral radiologist in estimating osteoporosis and in mapping the sites of tissue change through MCI.
Ki-Sun Lee et al. 36 classified panoramic images of 680 patients as osteoporosis and no-osteoporosis according to BMD values; 20% of the data was used as a random test set, and the remaining data were used for training and validation and given to the deep learning inputs with 5-fold cross-validation. The system performances of four deep learning models were discussed: a simple CNN with three convolution layers, VGG-16, VGG-16_TF, and VGG-16_TF_FT. The highest values were obtained by the VGG-16_TF_FT model, with an accuracy rate of 84.00% and an AUC of 0.858. Grad-CAM images were created from the model weights acquired with VGG-16_TF_FT. According to these images, the weak lower border of the mandibular cortical bone and the surrounding region presenting with less dense, spongy bone tissue were the regions where strong features were obtained for the accurate estimation of osteoporosis. In the image labeled normal, the region presenting with the strong lower border of the mandibular cortical bone and the surrounding dense tissue was determined as the region where strong features were obtained for estimation.
In this study, the ROIs were cut from the panoramic radiographs with the ImageJ program. The subsampling was then performed according to the row and column sizes required for the input of the deep learning models. A homogeneous dataset was thereby created in which osteoporosis may be examined in the most adequate anatomical region of each panoramic radiograph. Subsampling the panoramic radiographs first and then cutting a designated region would create a dataset that is heterogeneous in the area of interest. Deep learning models would then be trained with bias, searching for features at different points of the mandible and in different image spaces. Using the entire mandible instead of only the ROI would cause the model to search for features in unnecessary regions and would increase the training duration. Although BMD measurement is defined as the gold standard by the World Health Organization, the MCI suggested by Klemetti 22 has been used in many studies. A random test dataset of 20% was set aside in the study; the remaining data were given to the model inputs with 5-fold cross-validation as training and validation data. Failing to hold out a random test set and failing to feed the remaining data to the models through 5-fold cross-validation would make it impossible to interpret both the response of the models to previously unseen data and the average output of the model obtained through cross-validation. Without cross-validation, the models would produce different results in each run.
The limitation of the present study was the lack of a gold standard for the diagnosis of osteoporosis in each of the cases. However, it is known that the mandibular cortical shape is significantly compatible with the skeletal BMD obtained with DXA. 32 As a preprocessing step for the classification of osteoporosis, the images were extracted manually so as to include the inferior margin of the mandible at the center of the mandibular body. A network that screens for osteoporosis on dental panoramic radiographs through automated detection of the ROI from untrimmed radiographs still needs to be constructed.

Conclusion
The regions where the textural features change in the radiological examination of the oral radiologist and the regions where the deep learning models extract features are consistent. The deep learning-based CAD system implemented in this study may accelerate the processes that help the radiologist reach a diagnosis, such as detection of osteoporosis through MCI and the evaluation of the textural changes between the scores. Losses in early intervention and diagnosis that may occur because dentists deal with general dental problems and because of the limited number of oral radiologists may be resolved quickly by the AI-based systems developed.

Figure 2. Cutting of the images as right and left, the standardization process, and conversion into three channels.
Figure 3. Flow chart of the data splitting, 5-fold cross-validation and performance evaluation approaches used in the study.
Figure 8. GoogleNET Grad-CAM results for dataset C1-(C2+C3): (a) C1, (b) C2, (c) C3.