A Two-Stage Deep Learning Architecture for Radiographic Assessment of Periodontal Bone Loss

Objective: To establish a comprehensive and accurate model for assessing periodontal alveolar bone loss on panoramic images. Methods: A total of 640 panoramic images were included, and 3 experienced periodontists marked the key points needed to calculate the degree of periodontal alveolar bone loss, as well as the specific location and shape of the bone loss. A deep learning architecture based on UNet and YOLO-v4 was proposed to localize the teeth and key points, from which the percentage and stage of periodontal alveolar bone loss were calculated. The ability of the model to recognize these features was evaluated and compared with that of general dental practitioners. Results: The overall classification accuracy of the model was 0.77, and performance varied across tooth positions and categories; the model's classification was generally more accurate than that of the general practitioners. Conclusion: It is feasible to establish a deep learning model for assessing and staging radiographic periodontal alveolar bone loss using a two-stage architecture based on UNet and YOLO-v4.


Introduction
Periodontal disease is among the most prevalent diseases of humankind; it affects billions of individuals worldwide and imposes heavy health and economic burdens. Periodontitis is the main cause of tooth loss in adults [1,2], and as the disease progresses most teeth in the mouth may be affected. Furthermore, as a chronic infectious disease, periodontitis is the sixth most common type of inflammatory disease [3] and is a risk factor or indicator for various systemic diseases, such as cardiovascular disease [4], diabetes mellitus [5], respiratory system infection [6], and digestive disease [7].
In the early stage, the symptoms of periodontal disease are not obvious and are sometimes ignored or missed; left untreated, the disease continues to progress irreversibly, resulting in tooth mobility, tooth loss, or even systemic disease. Timely and appropriate treatment based on early diagnosis and correct staging is therefore critical for controlling periodontal disease.
The diagnosis and staging of periodontitis are mainly based on the state of periodontal alveolar bone resorption, including its level, shape, and location [8], which can be assessed clinically with a periodontal probe. Because alveolar bone loss is often hidden beneath the periodontal tissue and is inaccessible to direct inspection, X-ray radiography remains an irreplaceable aid for detecting and assessing bone loss [9].
Bitewing and periapical radiographs show the details of a limited area, such as one or several teeth, whereas panoramic radiographs cover the whole dentition, the jaws and the bone structure. Moreover, panoramic radiographs are acquired outside the mouth, and they are better accepted by patients because acquisition is faster, radiation exposure is lower, and the infection risk is lower. Because of its relative cost-effectiveness and diagnostic value, panoramic imaging is considered the most common and important radiological tool for clinical dental diagnosis and treatment evaluation, and it has great potential for whole-mouth dental disease screening. Intraoral and panoramic radiographic assessments of periodontal bone loss (PBL) have been shown to largely agree with each other [10].
However, for various reasons (filming angle, structural overlap, examiner ability, personal subjectivity, etc.), PBL detection on radiographs suffers from the limited accuracy of individual examiners and low reliability between examiners [11], especially among general dentists, as demonstrated by a wide range of studies using various reference tests [12]. An automatic diagnostic system for evaluating dental image data is therefore needed to allow a more reliable and accurate assessment of PBL on dental radiographs. Given the large amount of human and economic resources that a systematic, comprehensive, consistent and reliable manual assessment requires, automatic assisted diagnosis could play an important role.
In the past decade, with the advancement of artificial intelligence (AI) and its integration with medicine, AI-assisted diagnostic models based on deep learning networks have shown potential for widespread application.
Recent deep learning models based on convolutional neural networks (CNNs) have shown potential for the automated identification and quantification of radiologic and pathologic features, improving diagnostic consistency and the standardization of care. CNNs can also provide quantifiable outcomes, for example, in detecting pulmonary nodules on CT imaging [13], hepatocellular carcinoma on multiphasic contrast-enhanced MRI [14], skin lesions in clinical skin screenings [15], or signs of coronavirus disease 2019 (COVID-19) in computed tomography images [16].
In dentistry, CNNs have been employed to detect caries on periapical and panoramic radiographs, as well as apical lesions and PBL on periapical radiographs, all with acceptable to high accuracy [17,18]. To date, there have been limited attempts to assess PBL automatically on dental radiographs with deep learning, and previous studies were largely limited to the detection or three-category (trisection) classification of alveolar bone height loss [19][20][21][22][23]. Because these schemes are inconsistent with the new staging framework that is widely accepted and used in clinical practice, the value of such models for clinical diagnosis and decision-making is limited.
In addition, the shape (vertical type) and position (furcation lesions) of alveolar bone resorption have not been taken into consideration in previous studies, although both are essential for the correct staging of periodontitis and appropriate clinical treatment. Vertical resorption and furcation lesions indicate possible local contributing factors, such as abnormal anatomy or occlusal interference, which require careful examination and corresponding interventions to address the risk factors [24,25]. We therefore conducted this research to develop an automatic, comprehensive and accurate diagnostic system. In summary, the current study applied UNet to automatically identify and segment each tooth on the panoramic film, reducing interference from overlapping adjacent structures during recognition; used YOLO-v4 to automatically identify the key points of the bone level (the cementoenamel junction (CEJ), the root apex, and the alveolar crest) in order to calculate the degree of alveolar bone height reduction accurately; used YOLO-v4 to automatically detect the shape of alveolar bone resorption (vertical type) and bone resorption at the furcation (furcation lesions); and thereby aimed to assess PBL comprehensively and accurately. The main contributions of this work are threefold. (1) We were the first to explore an automatic diagnostic system for PBL with special shapes and positions (vertical and furcation lesions). (2) We adopted the widely accepted staging standard advocated by the American Academy of Periodontology and the European Federation of Periodontology in 2017, which has greater significance for guiding clinical practice. (3) We calculated the percentage of PBL after detecting the key points of each tooth on panoramic films, so that the condition could be accurately staged. Table 1 shows the distribution of periodontal lesions and their classifications in the reference dataset.
Table 2 shows the performance of the YOLO-v4 model on all lesions and stratified by tooth position. Table 3 compares the performance of YOLO-v4 and general dentists in classifying lesions into stages in the test set. Table 4 summarizes the metrics of YOLO-v4 for vertical resorption and furcation lesion detection.

Results
First, the stage classification results of YOLO-v4 for different tooth positions are presented. As shown in Table 2, the performance of YOLO-v4 was acceptable, with an overall accuracy of 0.77, and differed among tooth positions. For maxillary anterior, premolar and mandibular posterior teeth, the accuracy of the model was relatively high (0.78-0.81), whereas for maxillary molars and mandibular anterior teeth it was lower (0.71). The same pattern was found for the precision, sensitivity, specificity, and F1-score metrics.
Second, we compared the staging performance of YOLO-v4 and the dentists (Table 3). Overall, there was little difference in specificity, but the model obtained better accuracy, precision, sensitivity and F1 scores than the dentists, especially for stage I and II lesions, where the differences were statistically significant (p < 0.05). For stage I lesions, the sensitivity of the model was 0.76, compared with 0.57 for the dentists. For stage II lesions, the sensitivity of the model was 0.75, compared with 0.46 for the dentists. For stage III lesions, the sensitivity of the model was 0.81, slightly lower than the 0.82 of the dentists. Thus, the model appeared more sensitive in detecting stage I and II lesions, with no advantage for stage III lesions. The same pattern was found for the accuracy metric.
Finally, we evaluated the metrics of vertical resorption and furcation lesion detection (Table 4). For furcation PBL, precision was 0.94 and sensitivity was 0.75, which we considered satisfactory. For the vertical type, the precision and specificity of the YOLO-v4 model were 0.88 and 0.51, respectively.

Discussion
In 2017, the American Academy of Periodontology and the European Federation of Periodontology proposed a new definition and classification framework for periodontitis based on a multidimensional staging and grading system [8]. Staging relates to the severity and extent of periodontitis. According to this widely accepted consensus, an individual case of periodontitis should be further characterized using a matrix that describes the stage and grade of the disease. Stage largely depends on the loss of periodontal tissue; it reflects the severity of disease at presentation as well as the anticipated complexity of case management, and it further includes a description of the extent and distribution of the disease in the dentition. Staging is the basis for an evidence-based treatment plan built on the different therapeutic interventions, including specifically designed supportive periodontal care at intervals ranging from 3 to a maximum of 12 months [26]. In the early stage, basic treatment works well; as periodontal disease progresses, the treatment plan becomes more complicated and the prognosis worsens. Thus, comprehensive screening for early diagnosis and precise staging for appropriate treatment are both important for controlling periodontal disease.
Therefore, we trained an AI model that automatically identifies the key points needed to determine the percentage of periodontal bone resorption and then calculates this percentage according to a formula in order to stage periodontitis accurately. Moreover, the AI model produces consistent readings based on the same standard when faced with a large number of films, which gives it considerable potential for periodontitis screening.
In this study, UNet and YOLO-v4 were used to train a deep learning model for comprehensively diagnosing and accurately staging periodontal alveolar bone loss on panoramic films, and its results were compared with the diagnoses made by dentists.
UNet is widely used for biomedical images and performs well in medical image segmentation. YOLO-v4 combines several features, including weighted residual connections (WRC), cross-stage partial connections (CSP), cross mini-batch normalization (CmBN), self-adversarial training (SAT), Mish activation, mosaic data augmentation, DropBlock regularization, and CIoU loss, to improve CNN accuracy, and it therefore provides an efficient and powerful object detection model [27].
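To make one of these components concrete, the CIoU loss listed above can be sketched in a few lines. This is a minimal illustration of the published CIoU definition (IoU minus a normalized centre-distance penalty minus an aspect-ratio penalty), assuming boxes given as (x1, y1, x2, y2) pixel coordinates; it is not taken from the authors' training code, and the function names are hypothetical.

```python
import math


def iou_and_ciou(box_a, box_b):
    """Return (IoU, CIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    iou = inter / (wa * ha + wb * hb - inter)

    # Squared distance between the two box centres (rho^2).
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes (c^2).
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return iou, iou - rho2 / c2 - alpha * v


def ciou_loss(pred_box, target_box):
    """CIoU loss = 1 - CIoU; unlike 1 - IoU it still gives a signal for non-overlapping boxes."""
    return 1.0 - iou_and_ciou(pred_box, target_box)[1]
```

For two equally shaped boxes shifted sideways, such as (0, 0, 2, 2) and (1, 0, 3, 2), the IoU is 1/3 and the centre-distance term subtracts a further 1/13, so the loss keeps pulling the predicted box toward the target centre even after the boxes overlap.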
As shown in the Results, the staging model generally performed well, with satisfactory values for all metrics. Specificity was particularly high; that is, teeth that did not belong to a given stage were very likely to be classified as negative for that stage by the model, so false-positive calls were rare, which supports its screening potential. The model performed differently at different tooth positions, which may be related to image stretching and deformation and to the overlap of other local or adjacent structures in the maxillary molar and mandibular anterior regions. On average, the staging model performed better than the dentists and was stable, although the dentists' diagnoses were also relatively consistent. A possible reason is that YOLO-v4 calculated the percentage precisely from the identified key points, whereas the dentists estimated it by visual inspection (simulating clinical practice). The two approaches differed in accuracy, especially near the staging thresholds. Across categories, accuracy also varied with the height of alveolar bone loss. Hence, models trained on expert diagnostic criteria may outperform general dentists.
The furcation lesion detection model achieved acceptable results, whereas the vertical resorption model had relatively low specificity but high precision, which may be related to the subtle image characteristics of vertical resorption. A positive vertical-resorption prediction could therefore serve as a strong reminder or warning of a possibly missed diagnosis: if the model predicted vertical resorption, there was a high probability that vertical resorption was present. If dentists or radiologists re-examined the film carefully when prompted by a positive result from the model, vertical resorption that had previously been missed would likely be confirmed.
Compared with published research, this study has the following advantages. We used new structures and models that combine automatic tooth recognition and segmentation with key point detection, which reduced the overlap of irrelevant structures. We also calculated the percentage of PBL more accurately and thus achieved accurate staging. The classification standard was based on the widely recognized and widely used consensus periodontitis staging framework, so the results are more consistent with and relevant to clinical practice. In addition, the assessment was more comprehensive, including the detection of PBL at specific sites and with specific shapes relevant to diagnosis and decision-making.
However, this study had many limitations. First, the research was not conducted in a real clinical environment: the dataset was acquired retrospectively from radiographic films, and these single-modal data did not include other clinical information. Second, the diagnostic criteria were based on the readings of experienced periodontal specialists; the absence of a gold standard may introduce diagnostic bias. In addition, the staging did not distinguish between no periodontal alveolar bone resorption and stage I resorption, because a distance of 2 mm cannot be measured accurately on a panoramic film. Furthermore, the model was not externally validated, so it may be overfitted to the training dataset, potentially resulting in an overestimation of its diagnostic performance [28,29]. Finally, the resulting model has not yet been used in the clinic.
Future work will focus on enlarging the dataset, using three-dimensional images to improve prediction accuracy, and incorporating clinical text information for treatment decision-making and prognosis prediction.

Conclusion
In summary, the trained deep learning architecture based on UNet and YOLO-v4 performed well in detecting and classifying radiographic alveolar bone loss and could assist dentists in the comprehensive and accurate assessment of periodontal bone loss.

Data set
Panoramic radiographs were acquired in 2018 with a dental panoramic X-ray machine (Orthopantomograph OP 100D, Instrumentarium Corporation, Tuusula, Finland) at the Affiliated Stomatology Hospital, Zhejiang University School of Medicine. We prepared a total of 640 panoramic radiographs, excluding images of patients with primary or mixed dentition. The radiographs were collected retrospectively after identifiable patient information was removed. The study was approved by the Medical Ethics Committee of the Affiliated Hospital of Stomatology, School of Medicine, Zhejiang University (ChiCTR2100044897) and was conducted in compliance with the ICH-GCP principles and the Declaration of Helsinki (2013). Data collection and all experiments were performed in accordance with the relevant guidelines and regulations.
The images were randomly split into a training set (80%) and a test set (20%) before data augmentation. The training set was used to train the CNN detection models, and the test set was used to evaluate the final trained model.
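Splitting before augmentation matters because augmented copies of one radiograph are nearly identical; if they landed on both sides of the split, test performance would be inflated. A minimal sketch of an image-level 80/20 split, assuming each radiograph has a unique ID (the function name and seed are hypothetical, not from the study):

```python
import random


def split_by_image(image_ids, train_fraction=0.8, seed=42):
    """Randomly split whole radiographs into training and test sets.

    The split is done on image IDs before any augmentation, so flipped or
    rotated copies of a test radiograph can never end up in the training set.
    """
    ids = sorted(image_ids)            # deterministic starting order
    random.Random(seed).shuffle(ids)   # seeded, reproducible shuffle
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]
```

With the 640 radiographs of this study, `split_by_image(range(640))` yields 512 training and 128 test IDs; augmentation would then be applied only to the training IDs.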

Periodontist reading and labelling
Our reference measure of alveolar bone height reduction was the maximal radiographically detectable PBL, expressed as a percentage of root length.
For each tooth, three periodontal specialists, each with more than 3 years of clinical experience, manually marked 6 points on each radiograph to estimate PBL in %. The points were the mesial and distal CEJ, the apex of the mesial and distal roots (for single-rooted teeth, the mesial and distal apex points coincided), and the most apical extension of the alveolar crest on the mesial and distal sides. If the CEJ was covered by a restoration or a carious cavity, the most apical point of the restoration or caries was used instead. From these points, PBL in % could be calculated, so the three independent measurements yielded three PBL %-values for each tooth.
The final label was determined by consensus among the periodontists: disagreements on point positions were resolved by the periodontists repeating their evaluation, and all labels were then reviewed and revised (addition, deletion, and confirmation) by a fourth periodontist. Before labelling and annotating, the examiners were instructed in person and calibrated using a handbook describing how to use the annotation tool, how to annotate caries lesions, and how to discriminate these lesions from other entities.
Additionally, examiners outlined and labelled teeth with vertical alveolar bone resorption and furcation lesions (Fig. 1). The staging of alveolar bone resorption is based on six key points: m1, m2, m3, d1, d2, and d3 (the mesial and distal CEJ, alveolar crest and root apex; Table 5). The six points were divided into 2 groups, m1-m2-m3 and d1-d2-d3, to calculate the mesial and distal PBL% as the distance between the CEJ and the alveolar crest divided by the distance between the CEJ and the apex (Table 2). For every tooth, two PBL% values (one mesial and one distal) were determined, and the larger was recorded. According to the Consensus on the Classification of Periodontal and Peri-Implant Diseases and Conditions [9], the PBL% determined the stage (Table 6). Using PBL% rather than absolute measurements (in mm, etc.) helped to overcome the known problems of patient positioning and magnification.
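The calculation described above can be sketched directly. This is a minimal illustration, assuming the key points are (x, y) pixel coordinates and that suffix 1, 2 and 3 denote the CEJ, alveolar crest and apex respectively (an assumed reading of Table 5). Because Table 6 is not reproduced here, the stage cut-offs are also an assumption, taken from the radiographic bone loss criteria of the 2017 framework (below 15% for stage I, 15-33% for stage II, beyond the coronal third for stage III), with stages III and IV merged since only stages I-III are reported.

```python
import math

# Assumed naming (see Table 5): suffix 1 = CEJ, 2 = alveolar crest, 3 = root apex.


def side_pbl(cej, crest, apex):
    """PBL% on one side: CEJ-to-crest distance divided by CEJ-to-apex distance."""
    root_length = math.dist(cej, apex)
    if root_length == 0:
        raise ValueError("CEJ and apex coincide; PBL% is undefined")
    return 100.0 * math.dist(cej, crest) / root_length


def tooth_pbl(points):
    """Return the larger of the mesial (m1-m2-m3) and distal (d1-d2-d3) PBL%."""
    mesial = side_pbl(points["m1"], points["m2"], points["m3"])
    distal = side_pbl(points["d1"], points["d2"], points["d3"])
    return max(mesial, distal)


def stage_from_pbl(pbl):
    """Assumed cut-offs: <15% -> I, 15-33% -> II, >33% -> III (stages III/IV merged)."""
    if pbl < 15:
        return "I"
    if pbl <= 33:
        return "II"
    return "III"
```

For example, a tooth whose mesial crest lies 20 px below the CEJ on a 200 px root (10%) and whose distal crest lies 60 px below the CEJ on a 200 px root (30%) is recorded as 30% and falls in stage II. Because both distances come from the same image, any uniform magnification cancels in the ratio, which is the reason PBL% is preferred over millimetre measurements.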

Standard of staging periodontitis
When the distance between the alveolar crest and the CEJ is within 2 mm, the patient is not clinically diagnosed with alveolar bone resorption. However, because panoramic films differ in projection angle and magnification, an absolute value of 2 mm is difficult to define accurately on a panoramic film. Therefore, this study did not distinguish between no resorption and stage I resorption. First, we performed data augmentation to improve model accuracy by modifying the images: the images were flipped horizontally and vertically and rotated, which increased the amount of data available for deep learning to 4 times the original amount.
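The fourfold augmentation can be sketched with NumPy. This is an illustration under stated assumptions: the rotation angle is not given in the text, so a 180-degree rotation is assumed here, and key points are assumed to be (x, y) pixel indices that must be transformed together with the image so the labels stay aligned. The function name is hypothetical.

```python
import numpy as np


def augment_fourfold(image, points):
    """Return four (image, points) pairs: original, H-flip, V-flip, 180-degree rotation.

    `points` maps key-point names to (x, y) pixel indices; each transform is
    applied to the coordinates too, so the labels stay aligned with the pixels.
    The 180-degree angle is an assumption; the text only says "rotated".
    """
    h, w = image.shape[:2]
    transforms = [
        (image, lambda x, y: (x, y)),                                  # original
        (np.fliplr(image), lambda x, y: (w - 1 - x, y)),               # horizontal flip
        (np.flipud(image), lambda x, y: (x, h - 1 - y)),               # vertical flip
        (np.rot90(image, k=2), lambda x, y: (w - 1 - x, h - 1 - y)),   # 180-degree rotation
    ]
    return [
        (img, {name: f(*xy) for name, xy in points.items()})
        for img, f in transforms
    ]
```

Each output image keeps the original pixel under every transformed key point, which is an easy invariant to check when wiring augmentation into a key-point detector.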

Tooth segmentation
The second stage is the automatic detection and segmentation of teeth using the UNet network, which combines deep and shallow information. Deep features provide contextual semantic information about the segmentation object in the entire image and reflect the relationship between the object and its surroundings. In addition, medical image datasets are relatively small, so low-level features are more important; shallow information provides finer features for segmentation, such as gradients. After the tooth contours are identified, each tooth is isolated by cropping a region that extends 20 pixels beyond the outermost points of its contour in every direction.
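The cropping step after segmentation can be sketched as follows, assuming UNet outputs one 2-D binary mask per tooth. The function name is hypothetical; returning the crop offset is a design assumption so that key points later detected inside the crop can be mapped back onto the panoramic image.

```python
import numpy as np


def crop_tooth(image, tooth_mask, margin=20):
    """Crop one tooth from a panoramic image using its 2-D binary segmentation mask.

    The crop extends `margin` pixels beyond the outermost mask pixels in every
    direction, clipped to the image border. The (x, y) offset of the crop is
    returned so that key points found in the crop can be mapped back.
    """
    ys, xs = np.nonzero(tooth_mask)
    if ys.size == 0:
        raise ValueError("mask contains no tooth pixels")
    h, w = tooth_mask.shape
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin + 1, h)  # +1 because slice ends are exclusive
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin + 1, w)
    return image[y0:y1, x0:x1], (x0, y0)
```

Clipping at the border matters for third molars and other teeth near the edge of the film, where a full 20-pixel margin may not exist.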

Object detection
The third stage is object detection.
The first part uses CSPDarkNet as the backbone, which extracts rich feature information from the input image. Internally, the network improves the information flow through dense blocks and transition layers, which enhances the learning capacity of the network, optimizes backpropagation, and improves processing speed and memory usage.
The second part uses spatial pyramid pooling and a path aggregation network (SPP + PAN) to fuse feature information at different scales. SPP enhances the model's ability to detect objects of different scales, so that objects of different sizes can be identified. PAN adds a bottom-up path to the top-down feature pyramid, so that features are integrated in both directions.
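The SPP idea can be shown with plain NumPy: the same feature map is max-pooled with several kernel sizes at stride 1 (so the spatial size is unchanged), and the results are stacked along the channel axis, giving each location context from several receptive-field sizes. This is a minimal sketch on a (C, H, W) array; the kernel sizes 5, 9 and 13 are the values commonly used in YOLO-v4 and are assumed here, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_same(features, k):
    """Stride-1 k x k max pooling with 'same' padding on a (C, H, W) array (k odd)."""
    pad = k // 2
    padded = np.pad(features, ((0, 0), (pad, pad), (pad, pad)),
                    mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return windows.max(axis=(-2, -1))


def spp_block(features, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with max-pooled copies along the channel axis."""
    features = np.asarray(features, dtype=np.float64)
    pooled = [max_pool_same(features, k) for k in kernel_sizes]
    return np.concatenate([features] + pooled, axis=0)
```

A (C, H, W) input therefore becomes (4C, H, W); padding with negative infinity ensures border pixels are never replaced by an artificial zero.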
The third part is the YOLO head, which performs the final detection. It generates the final output vectors containing class probabilities, objectness scores and bounding boxes.
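The head's raw outputs still need post-processing before they become final detections. A common scheme, sketched below under assumptions (the score and IoU thresholds are illustrative, not values reported in this study, and the function names are hypothetical), multiplies objectness by the best class probability, discards low scores, and removes duplicate boxes of the same class with greedy non-maximum suppression (NMS).

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(raw_predictions, score_threshold=0.25, iou_threshold=0.45):
    """Turn raw head outputs into final detections with per-class greedy NMS.

    raw_predictions: iterable of (box, objectness, class_probs).
    Returns a list of (box, class_index, score), highest score first.
    """
    candidates = []
    for box, objectness, class_probs in raw_predictions:
        cls = max(range(len(class_probs)), key=class_probs.__getitem__)
        score = objectness * class_probs[cls]  # confidence = objectness x class probability
        if score >= score_threshold:
            candidates.append((box, cls, score))
    candidates.sort(key=lambda det: det[2], reverse=True)

    kept = []
    for box, cls, score in candidates:
        # Keep a box unless it overlaps a higher-scoring kept box of the same class.
        if all(k_cls != cls or box_iou(k_box, box) < iou_threshold
               for k_box, k_cls, _ in kept):
            kept.append((box, cls, score))
    return kept
```

Running NMS per class means that, for example, a CEJ box and an alveolar-crest box lying close together are not allowed to suppress each other.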

Calculation and staging
The fourth stage is the calculation of the percentage of periodontal alveolar bone loss and the staging of periodontitis. Based on the 6 key points detected for each tooth, PBL% was calculated using the formula described above and assigned to the corresponding category (Fig. 2).
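Before the formula given in the labelling section can be applied, the detector's boxes must be turned into six points per tooth. The text does not say how this is done, so the sketch below is one plausible approach under stated assumptions: for each label (m1 to d3) keep the highest-scoring box, use its centre as the key point, and shift it by the crop offset back into panoramic coordinates. All names are hypothetical.

```python
KEYPOINT_LABELS = ("m1", "m2", "m3", "d1", "d2", "d3")


def keypoints_from_detections(detections, crop_offset=(0, 0)):
    """Pick one point per key-point label from detector output for a single tooth.

    detections: iterable of (label, (x1, y1, x2, y2), score) in crop coordinates.
    For each label the highest-scoring box is kept and its centre is used as
    the key point, shifted by crop_offset back into panoramic coordinates.
    Returns (points, missing_labels); points is None if any label is missing.
    """
    best = {}
    for label, box, score in detections:
        if label in KEYPOINT_LABELS and (label not in best or score > best[label][1]):
            best[label] = (box, score)

    missing = [label for label in KEYPOINT_LABELS if label not in best]
    if missing:
        return None, missing  # this tooth cannot be staged automatically

    ox, oy = crop_offset
    points = {}
    for label, ((x1, y1, x2, y2), _) in best.items():
        points[label] = ((x1 + x2) / 2 + ox, (y1 + y2) / 2 + oy)
    return points, []
```

Reporting missing labels explicitly, rather than guessing a point, lets such teeth be flagged for manual reading instead of being silently staged.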

Comparison with dentists
A group of three general dentists, each having worked at the Affiliated Stomatology Hospital, Zhejiang University School of Medicine for at least 3 years, served as the comparator group so that the performance of the neural network could be compared with that of individual dentists. Each participant independently classified PBL stages.

Metrics and statistical analysis
The diagnostic performance of YOLO-v4 and the dentists was compared against the periodontists' findings using confusion matrices. We calculated accuracy, precision, sensitivity, specificity, F1 score, and AP, and compared the diagnostic metrics of YOLO-v4 and the three dentists using the chi-square test. Additionally, we evaluated the consistency of the three dentists' diagnoses using intraclass correlation coefficients (ICCs). Statistical analyses were performed with SPSS 24.0, and the significance level was set at p < 0.05.
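The analyses were run in SPSS, but the metric definitions are easy to state in code. The sketch below computes one-vs-rest metrics for a single stage from reference and predicted labels, plus a Pearson chi-square test on a 2x2 table. The source does not say which chi-square variant or table layout was used, so the layout (for example, rows = model vs dentists, columns = correct vs incorrect calls) and the absence of a continuity correction are assumptions; the function names are hypothetical.

```python
import math


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Confusion-matrix metrics for one stage, treating all other stages as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)    # share of true positives that were found
    specificity = ratio(tn, tn + fp)    # share of true negatives called negative
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    With 1 degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Since the model and the dentists read the same teeth, a paired test such as McNemar's would be an alternative design choice; the unpaired Pearson form is shown only because the text names the chi-square test.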

Declarations
Author