Automated Analysis of Three-dimensional CBCT Images Taken in Natural Head Position That Combines Facial Profile Processing and Multiple Deep-learning Models

Analysing cephalometric X-rays, which is mostly performed by orthodontists or dentists, is an indispensable procedure for diagnosis and treatment planning for orthodontic patients. Artificial intelligence, especially deep-learning techniques for analysing image data, shows great potential for medical and dental image analysis and diagnosis. To explore the feasibility of automating the measurement of 13 geometric parameters from three-dimensional cone beam computed tomography (CBCT) images taken in a natural head position, we here describe a smart system that combines a facial profile analysis algorithm with deep-learning models. Using multiple views extracted from the CBCT data as the dataset, our proposed method partitions and detects regions of interest by extracting the facial profile and applying Mask-RCNN, a tooth numbering method, and a trained decentralized convolutional neural network (CNN) that positions dental landmarks. The time savings compared with the traditional approach were remarkable, reducing the processing time from about 30 minutes to about 30 seconds.


Introduction
Since Broadbent [1] introduced X-ray techniques to orthodontics in 1931, they have become a common instrument for analysing the lateral or postero-anterior cephalograms on which orthodontic diagnosis and treatment planning are based. However, conventional two-dimensional X-ray images are prone to enlargement or distortion [2,3] because they generally superimpose the three-dimensional (3D) structures of the human skull onto one another [4], which provokes confusion and misinterpretation. Therefore, the cone beam computed tomography (CBCT) technique [5] was introduced to the dental profession between 2001 and 2004 with the NewTom 9000 (QR, Verona, Italy), the CB MercuRay (Hitachi Medical Systems America, Twinsburg, Ohio), and the I-CAT (Imaging Sciences, Hatfield, Pa) after Ambrose and Hounsfield [6] invented medical computed tomography (CT). With those advanced instruments, clinicians can clearly depict and digitize landmarks on 3D CBCT images from any aspect with few visual errors.
Those breakthrough visual data have inspired a lot of research to measure parameters useful for diagnosis in orthodontia. Park et al. [7] introduced a method to use the nasion true vertical plane (NTVP) and true horizontal plane (THP), based on natural head position (NHP), to analyse 3D CBCT images and show the anterior-posterior sagittal skeletal relationships and the protrusion of upper and lower anterior teeth. Also, Latif et al. [8] assessed anterior-posterior jaw relationships by analysing visualizations of the facial sagittal profile with Software Do-It Ceph (Dental Studio NX version 4.1, USA). However, the measurement tasks in that research still required a dentist to manually digitize the landmarks.
As in other fields, artificial intelligence (AI) techniques have been applied to dentistry and orthodontia. Takahashi et al. [9] proposed a method using You-Only-Look-Once version 3 (YOLOv3) to recognize dental prostheses and tooth restorations. However, that method had limited accuracy in detecting tooth-coloured prostheses.
Deep learning was used to detect periapical disease in dental radiographs in the research of Endres et al. [10]. Also, Kunz et al. [11] created an automated cephalometric x-ray analysis method using a specialized AI algorithm and compared its accuracy with the current gold standard. Kharbanda et al. [12] proposed a knowledge-based algorithm to automatically detect landmarks on 3D CBCT images.
Despite that previous research, few studies have used AI to analyse cephalometric measurements of three-dimensional CBCT images. In everyday practice, orthodontists still spend much time and effort analysing three-dimensional CBCT images of orthodontic patients. Therefore, we here present a complete vision-based measurement system that can evaluate 13 representative parameters in 3D CBCT images to provide an integrated view of facial morphology, including sagittal and coronal analyses, occlusal cant, facial asymmetry, and anterior teeth inclination. We validated the automated measurements of those 13 parameters by comparing them with manual measurements made by two orthodontists and one advanced general dentist.

Landmark identification and measurements of the 13 parameters
The 13 parameters in this study were selected to evaluate the severity of facial asymmetry, diagnose skeletal relations, and measure anterior teeth inclination. All the parameters are geometrical values of length and angle, defined using the specific representative landmarks shown in Figure 1. In the sagittal view, the nasion point (N-point) is determined visually as the intersection between the nasion true vertical line (NTVL) and a true horizontal line.

Data preparation
The protocol of this study was reviewed and approved by the Institutional Review Board. To establish the natural head position, subjects were asked to stare into their own eyes in a mirror after tilting their heads up and down. A 40 × 50-cm mirror was hung on the wall 1.5 m from the subjects. Subjects' heads were positioned without a chin support or any other type of head holder.
Of the 200 cases considered, 170 cases were used to construct the dataset, train the deep-learning models, and provide samples for the developer, and the 30 remaining cases were used as the validation set to compare the accuracy of the proposed method with manual measurements.
In the proposed method, the procedure for determining the parameters is not a straightforward process of calculating output from the loaded input. Instead, it is a flexible sequence of computing, selecting data for extraction, and computing again. As shown in Figure 2 (a), the input of the next stage is selected based on the results calculated in the previous stage to minimize the human factors required for operation. First, the facial profile processing algorithm, which takes the sagittal and vertical bone views extracted from the CBCT data as inputs, evaluates cephalometric landmarks as key points for parameters 1-4 and 13. Then, the axial layer of the CBCT data, which corresponds to the mid-point between points A and B in the cephalometric landmarks, is used to define the profile of the jawbone and thereby generate a panoramic view of the tooth structure. Second, the Mask-RCNN and the proposed tooth numbering method are used to detect and classify the teeth in the input panoramic view. In this step, representative key points for the dental parameters (parameters 5-10) are also positioned. Third, the positions of the incisors detected in the panoramic view are used to extract tooth sagittal views, to which a decentralized CNN is applied to evaluate the inclination of incisor alignment (parameters 11 and 12). An example of the input images used in the proposed method is shown in Figure 2 (b).
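As a concrete illustration of this stage coupling, the axial slice used to build the panoramic view is picked from the output of the first stage. A minimal sketch, assuming points A and B are available as axial slice (z) indices; the function name is illustrative, not the authors' API:

```python
def select_axial_layer(point_a_z: int, point_b_z: int) -> int:
    """Pick the axial CBCT slice at the mid-point between cephalometric
    points A and B, as the pipeline does before generating the panoramic
    view. Inputs are slice indices produced by the first (facial-profile)
    stage; integer division keeps the result a valid slice index."""
    return (point_a_z + point_b_z) // 2
```

The point is simply that the second stage's input (one axial layer) is computed from the first stage's output rather than chosen by an operator.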

Cephalometric analysis algorithm
Extracting the skull and mandibular profiles in the sagittal view is the first essential step before positioning the cephalometric landmarks. Considering the skull profile, which is shown as a red line in Figure 3, the extraction process uses the similarity between a random position, in terms of its depth value and characteristics at the boundary of the skull, and its neighbouring points. Starting from the point in the first pixel row of the sagittal depth image that differs from zero and has the largest x-value, one pixel p(x, y) in pixel row y is compared with a chosen pixel in the previous row (y − 1), and it is considered to be on the facial profile if it satisfies the conditions in Eq. (1), where n is the y-size of the sagittal depth image.
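The row-by-row tracing can be sketched as follows. Eq. (1) is not reproduced in the text, so the similarity test below (a depth-difference threshold against the previous row's profile pixel, searched in a small column neighbourhood) is an assumed stand-in, not the authors' exact conditions:

```python
import numpy as np

def trace_profile(depth, start_x, threshold=10):
    """Trace a facial profile down a sagittal depth image, row by row.

    `depth` is a 2-D array (rows = y, columns = x). From the starting pixel
    (0, start_x), each subsequent row's profile pixel is the nonzero pixel,
    near the previous column, whose depth is similar to the previous pick.
    The real Eq. (1) conditions are replaced by a simple threshold here.
    """
    n = depth.shape[0]  # y-size of the sagittal depth image
    profile = [(0, start_x)]
    for y in range(1, n):
        _, px = profile[-1]
        prev_val = int(depth[profile[-1][0], px])
        best = None
        # search a small neighbourhood of columns around the previous pick
        for x in range(max(0, px - 2), min(depth.shape[1], px + 3)):
            v = int(depth[y, x])
            if v != 0 and abs(v - prev_val) <= threshold:
                if best is None or abs(x - px) < abs(best - px):
                    best = x
        if best is None:
            break  # similarity lost: stop tracing
        profile.append((y, best))
    return profile
```

On a toy depth image with a diagonal boundary, the traced profile follows the boundary pixel by pixel, which mirrors how the red-line skull profile in Figure 3 is grown from the starting point.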
Based on the obtained skull profile, the Me point can easily be defined as the final point of the profile, which is then considered as the starting point for extracting the mandibular profile. The procedure for this extraction is similar to the process for the facial profile, but the direction along the y-axis is reversed. However, unlike the Me point, which is defined as the end point of the profile, the Ar point, which is the final point of the mandibular profile, is detected based on the change in the mean depth value of a neighbourhood region of interest (ROI).
Specifically, a 55 pixel ROI cropped from the sagittal depth image on the lower left side of the considered point and corresponding with a point in the jaw profile is collected, as shown in In the coronal depth image, the N-point in the sagittal and coronal views shares , with the parameter under consideration, as shown in Figure 3(c). Therefore, using the same method, the left side profiles and ′ of the nasal bone in the coronal view from − 2 to + 2 ( = 40 is used in this research) are extracted, and is calculated by Eq. 2 as the mean value.

Deep-learning system for detecting tooth landmarks
The first use of the deep-learning technique is to generate the dental arch curve from one CT image of the jaw, as shown in Fig. 4 (a), and extract a panoramic view of the teeth. For this process, we use a landmark detection CNN whose architecture is based on VGG-net [13], as shown in Table 1. The output of the model, which is also the label of the training dataset, is six landmarks that represent the global tooth profile, as shown in Fig. 4 (b), where points 1 and 6 represent the head of the mandible, 2 and 5 are located around the third molar, and 3 and 4 represent the canines. Three tooth-containing images from the CBCT-image set for each patient were collected at a resolution of 768 × 768 pixels and manually labelled. Because high-precision landmark positioning was considered unnecessary for this task, this labelling process was conducted by developers, not orthodontists. The plot of five sequential segments that connect the six detected points was smoothed using a 1D Gaussian filter, as shown in Fig. 4 (c), before an area formed by the margins on both sides of the obtained curve was defined.

The numbering process for the upper and lower teeth starts from the middle position of all the teeth. Currently, the method can deal with patients whose teeth are in the conventional arrangement. The numbering priority is based on the horizontal distance from each centre point to the middle position, without regard to each tooth's appearance. This numbering approach can recognize missing teeth by the abnormal distance between two teeth and the middle position, as shown in Fig. 6 (e). Then, the ROIs of the teeth of interest (incisors, canines, and first molars as numbers 1, 3, and 6, respectively, in Fig. 6 (e)) are resized to 300 × 150-pixel resolution and loaded into a landmark detection CNN with the architecture shown in Table 2.
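The distance-based numbering and missing-tooth check can be sketched as follows; the nominal spacing model and the gap test are simplified assumptions for illustration, not the authors' exact rule:

```python
def number_teeth(centres, midline_x, expected_gap):
    """Number detected teeth on one side of the arch by horizontal distance
    from the midline. `centres` are x-coordinates of detected tooth centres,
    `expected_gap` is an assumed nominal centre-to-centre spacing. A gap much
    larger than expected is interpreted as one or more missing teeth, so the
    corresponding tooth numbers are skipped (cf. Fig. 6 (e))."""
    ordered = sorted(centres, key=lambda x: abs(x - midline_x))
    numbers, prev_x, prev_no = [], None, 0
    for x in ordered:
        if prev_x is None:
            no = 1  # tooth closest to the midline
        else:
            gap = abs(x - prev_x)
            # estimate how many tooth positions the gap spans
            skipped = max(0, round(gap / expected_gap) - 1)
            no = prev_no + 1 + skipped
        numbers.append(no)
        prev_x, prev_no = x, no
    return numbers
```

With centres at 10, 20, and 40 (nominal spacing 10), the doubled gap between the second and third teeth makes the method label them 1, 2, and 4, i.e. tooth 3 is reported missing.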
That CNN is trained to locate three chosen points on each tooth: the middle point and left and right corners of the tooth crown on the enamel region, as shown in Fig. 7.

Decentralized CNN to evaluate anterior incisor inclination
Anterior incisor inclination is evaluated by defining the tooth alignment axis, which is the line from the tip of the incisor root to the tip of the incisor crown. Therefore, landmarks that represent the tip of the root and tip of the crown in the upper and lower incisors are essential.
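Once the two tips are located, the inclination itself is elementary geometry. A minimal sketch, assuming 2-D image coordinates and the vertical image axis as the reference (the paper's actual baselines are its own cephalometric planes):

```python
import math

def incisor_inclination(root_tip, crown_tip):
    """Angle in degrees between the tooth alignment axis (root tip ->
    crown tip) and the vertical image axis. Points are (x, y) pairs in
    image coordinates; the vertical reference is an illustrative choice."""
    dx = crown_tip[0] - root_tip[0]
    dy = crown_tip[1] - root_tip[1]
    return math.degrees(math.atan2(dx, dy))
```

A perfectly upright tooth gives 0 degrees; a 45-degree tilt of the crown relative to the root gives 45.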
Unlike the landmark detection already described, the accuracy of this function must be high.
Therefore, we used a decentralized CNN model [17][18][19] previously proposed as a high-accuracy landmark detection method for medical images. The advantage of this method is that it narrows the ROI after each order, which not only reduces the number of non-related features that can affect the results but also increases the diversity of the training dataset. Considering the complexity of this detection, we propose a decentralized CNN model with two orders, as shown in Fig. 8: the 1st order roughly detects the region of the tooth tips, and the 2nd order precisely locates the landmarks within the 1st order's results. The networks are trained with the loss function in Eq. (3), which is useful for handling features corrupted with outliers, where n is the number of labels in the dataset.
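Eq. (3) itself is not reproduced in the text; a standard regression loss with the stated outlier-robust property is the Huber loss, sketched here as an assumed stand-in:

```python
import numpy as np

def huber_loss(pred, label, delta=1.0):
    """Outlier-robust regression loss (Huber): quadratic for residuals of
    magnitude <= delta, linear beyond, so single corrupted labels cannot
    dominate the gradient. This is an illustrative stand-in for Eq. (3),
    averaged over the n labels as in the text."""
    r = np.asarray(pred, dtype=float) - np.asarray(label, dtype=float)
    small = np.abs(r) <= delta
    per_term = np.where(small,
                        0.5 * r ** 2,                      # quadratic region
                        delta * (np.abs(r) - 0.5 * delta)) # linear region
    return float(per_term.mean())
```

A residual of 0.5 (inside the quadratic region) contributes 0.125; a residual of 3.0 contributes only 2.5 rather than the 4.5 a pure squared loss would give.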
After each training step, the weight factor W is updated for the (t + 1)th iteration based on the rule shown in Eq. (4), where M is the batch size (M = 64), η is the learning rate (initially set to 0.001 and updated using SGDM (momentum = 0.95) [20]), and ∇ is the gradient operator.
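The exact form of Eq. (4) is not shown in the text; the standard SGD-with-momentum update, using the stated hyper-parameters (η = 0.001, momentum = 0.95), can be sketched as an assumed equivalent:

```python
def sgdm_step(w, grad, velocity, lr=0.001, momentum=0.95):
    """One SGD-with-momentum weight update (scalar case for clarity).

    `grad` is the mini-batch-averaged gradient of the loss with respect
    to `w`. The velocity accumulates a decaying sum of past gradients,
    which smooths the update direction across iterations."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```

Two consecutive steps with the same gradient move the weight further on the second step, because the momentum term reinforces the persistent descent direction.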

System implementation
For user convenience in implementing the software, we developed the graphical user interface (GUI) shown in Fig. 9. From the CBCT images uploaded to the system, raw CT images in the sagittal, coronal, and axial views can be monitored in the upper left corner of the GUI. The program exports sagittal and coronal bone modes, panoramic views, and tooth modes, which are all the data the proposed system processes.
To extract dental features as described above, the GUI allows users to manipulate the data either manually or by the proposed automatic method. For manual manipulation, the dentist digitizes specific landmarks directly on each frame of the GUI for parameter calculation. This option is provided for patients with highly complicated, significantly abnormal facial profiles and tooth structures, which would cause difficulties for the developed program. The option using the proposed method, named AI extraction, automatically and sequentially locates landmarks, calculates parameters, and visualizes the results on the frames without requiring any manual manipulation. The total calculation time for the AI-based option is around 12 seconds on our test configuration of a Core i7-7700 with 32 GB of random access memory and a GeForce RTX 2070 (NVIDIA) graphics processing unit.
In the comment box in the upper right showing the calculated parameters, the categories of sagittal cephalometric measurements, which are related to the NTVP, and vertical cephalometric measurements, which are related to the THP, are also integrated into the program to free dentists and orthodontists from having to memorize and look up standard tables.

Figure 9. GUI designed for the developed system.

Comparison with the accuracy of manual measurements
The gold standard selected to validate the developed program is manual measurement conducted by three experienced clinicians (two orthodontists and one advanced general dentist), each with more than 10 years of professional experience in this field. They made their measurements using In Vivo Dental software (version 5.3; Anatomage Co., San Jose, CA, USA), which was developed especially for medical radiology diagnosis and opens dental CBCT, medical CT, MRI, and other medical scans in the standard DICOM format. We chose this approach as the standard reference because it is the most popular method for measuring 3D CBCT images for orthodontic diagnosis. Furthermore, orthodontic experts deem existing commercial software with automatic measuring functions to lack reliability and to demonstrate poor effectiveness in practice.
The statistical analyses of this study were performed with Microsoft Excel (Microsoft Corp., Redmond, WA, USA). The deviations between the proposed method and the manual measurements of the 13 parameters are given in Table 3, along with the mean absolute error (MAE) and the correlation coefficients among the manual results. To clearly illustrate the correlation between the developed program and the measurements made by the orthodontists, scatterplots of those correlations for six groups of parameters are shown in Fig. 10. Specifically, the validation data for parameters 1-3, which all consider the NTVP as the baseline for measurement along the x-axis, are visualized in Fig. 10 (a), and Fig. 10 (b) illustrates the validation data for parameters 4-6, which are measured along the y-axis. The correlations between the proposed method and the experts for parameters 7-10, which are measured from the THP along the z-axis, are presented in Fig. 10 (c). Because the three angle parameters all differ in the measuring baselines they use, the validation data for parameters 11-13 are shown separately in Fig. 10. The statistical results are also summarized in Table 4 and compared with standard Z-scores.

In short, compared with manual measurement, this automatic measuring program can save 30 minutes of diagnosis time per patient. In addition to the time savings, the cost of the commercial software used to assist with manual measurement is high, and it still requires the user to perform the measurements directly.
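The MAE and correlation coefficients reported in Table 3 correspond to standard formulas; a minimal sketch for two measurement series (the numbers in the usage note are illustrative, not the study's data):

```python
import math

def mae(auto, manual):
    """Mean absolute error between automatic and manual measurements."""
    return sum(abs(a - m) for a, m in zip(auto, manual)) / len(auto)

def pearson_r(auto, manual):
    """Pearson correlation coefficient between two measurement series."""
    n = len(auto)
    ma, mm = sum(auto) / n, sum(manual) / n
    cov = sum((a - ma) * (m - mm) for a, m in zip(auto, manual))
    var_a = sum((a - ma) ** 2 for a in auto)
    var_m = sum((m - mm) ** 2 for m in manual)
    return cov / math.sqrt(var_a * var_m)
```

For example, series that differ only by a constant scale factor give a correlation of exactly 1.0 even when their MAE is nonzero, which is why both statistics are reported together.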

Discussion
This study has introduced a hybrid system that combines a facial profile analysis with deep-learning methods (Mask-RCNN and a decentralized CNN) to detect representative cephalometric and facial asymmetry landmarks. The proposed system was used to calculate 13 purposefully chosen parameters (distances and angles defined between landmarks), and we validated its results by comparing them with manual measurements made by two experienced orthodontists and one advanced general dentist as gold-standard references. The small deviations obtained in the comparison verify the practical accuracy and consistency of the proposed method. Our automatic measurement and diagnosis system has the following key innovations and functions.

1) Advanced graphical visualization:
The interactive platform provides advanced graphic visualization by applying rendering processing algorithms to raw CBCT images so that users can see the structure of patient skulls and the tooth arrangements in both the sagittal and coronal views, making it convenient for users to digitize landmarks and extract parameters. In this respect, the newly developed software is equivalent to common software already available commercially.
2) Intelligent analysis: The proposed smart dental-parameter measuring system uses a vision analysing approach that combines facial profile processing with AI-based techniques.
These methods were developed using the experience of experts in this field and have demonstrated excellent consistency with manual standard references in our validation study. Therefore, this system can free laboratory staff or experts from this repetitive task.
3) Enhancement of productivity: The developed program offers a significant advantage in the time required for measurements, compared with the typical manual measurement approach.
As the aesthetic surgery market grows, classical measuring procedures will soon fail to meet productivity requirements. Therefore, a computerized approach to data handling, which offers a processing speed almost 60 times faster than the manual method, is a valuable alternative.
The implemented CBCT data allow users to observe the scans in a 3D environment instead of the usual 2D of other typical radiographs, but the quality of the images still depends on the hardware performance, which can be affected by the scanning condition and influence the operation of our proposed method. In addition, the tooth numbering method used in this study had difficulties when premolars had been extracted. That abnormal situation caused inaccurate detection of landmarks because the tooth order was different from normal cases.
Therefore, our future work will give priority to improving the tooth numbering method, which is based on the classification ability of a CNN applied to an extended database. Moreover, our further work will develop an orthognathic surgical plan design system that can offer options by using novel machine-learning techniques with the already extracted parameters, acting as a virtual assistant for oral and maxillofacial surgeons.

Conclusion
This study proposed a vision-based dental system that combines an analysis of facial profiles with deep-learning algorithms to measure 13 parameters in 3D CBCT images taken in NHP, automating repetitive measuring procedures and increasing productivity. The facial analysis and deep-learning algorithms were applied to multiple views extracted from CBCT images to localize representative landmarks. Furthermore, application software was designed to run the proposed automatic extraction method and to allow users to visualize and digitize landmarks manually via an appropriate GUI. The geometric parameters calculated by the proposed system were compared with measurements by human experts and found to be highly consistent, with low deviations and significant correlations across the study population. Consequently, this vision-based dental system achieved an obvious time advantage by reducing the measuring time from about 30 minutes to about 30 seconds.
Author Contribution: T. P. N. designed the overall system, including conceptualization and the software, under the supervision of J. Y. and J. A. In addition, they wrote and revised the paper. Y. J. K. and T. K. contributed to dataset curation and annotation and helped to design the comparative analysis and experiment for validation.