Written informed consent was obtained from all patients before their radiographic examinations. The research protocol followed the principles of the Declaration of Helsinki and was approved by the non-interventional Institutional Review Board (IRB) of our university's Health Sciences Ethics Committee. Deidentification was performed in compliance with the Information Commissioner's code of practice, Anonymisation: Managing Data Protection Risk (https://ico.org.uk/media/1061/anonymisation-code.pdf), and was validated by the aforementioned institution. The study did not involve the patients actively; the study data were derived exclusively from deidentified, anonymized records.
Anonymized DICOM files of CBCT images acquired with three different CBCT units were used in this study: the Pax-i3D Smart PHT-30LFO0 (Vatech, South Korea), the CS 8100 3D (Carestream Health, USA), and the Orthophos XG 3D (Sirona, Germany). Images were obtained with fields of view of up to 120 mm × 90 mm. All three units produce isotropic voxels, with voxel sizes ranging from 0.1 mm to 0.2 mm.
The primary aim of this study was to develop an AI algorithm for segmentation of craniomaxillofacial anatomy; the secondary aim was to test this algorithm for automatic detection of the pharyngeal airway in both OSA and control patients. The study therefore comprises two notable parts: preparing the dataset for the evaluation, and testing the practicability of the system in order to enhance diagnostic capabilities.
CBCT anatomy localization with the AI model
Approach
To handle large volume sizes at a reasonably fine scale, we approach this task with a coarse-to-fine framework, in which the whole volume is first processed by semantic segmentation at coarse resolution, followed by patch-based semantic segmentation at fine resolution guided by a coarse hint. Our approach to training the system consists of the following steps: preprocessing the incoming volumetric image; coarse model training; coarse hint generation; and patch-based training at fine resolution with a hint from the coarse model.
Data
We use a simple min-max normalization within a fixed window. We clip the intensities to the [-1000, 2000] interval, then subtract the window minimum and divide by the window range. Alternative preprocessing methods were also examined; according to our experiments, the training procedure is not sensitive to the choice of preprocessing, and all methods lead to approximately the same results. The data is split into training, development and test sets: 90% of the data for training, 5% for the development set and 5% for the test set.
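As an illustration, a minimal sketch of this normalization step (the window bounds follow the values above; the function name is ours):

```python
import numpy as np

def normalize(volume: np.ndarray, lo: float = -1000.0, hi: float = 2000.0) -> np.ndarray:
    """Clip intensities to [lo, hi], then min-max scale to [0, 1]."""
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)
```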
For the Coarse step, we rescale the image to a 1.0 mm isotropic voxel resolution using linear interpolation. To provide the Coarse model with more information, we obtain soft coarse segmentation ground truth labels by the following procedure. First, we encode the original semantic segmentation mask of shape D×H×W with a one-hot encoding scheme, which results in a tensor of shape C×D×H×W, where C is the number of classes and D, H, W are the spatial dimensions of the original volume. Next, we use linear interpolation to rescale this tensor to 1.0 mm resolution. The resulting tensor consists of probability distributions over classes for each spatial position and is referred to as soft targets.
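A hedged sketch of this soft-target construction, assuming NumPy/SciPy; `scale` stands for the (hypothetical) ratio of the source voxel spacing to the 1.0 mm target spacing along each axis:

```python
import numpy as np
from scipy.ndimage import zoom

def soft_targets(mask: np.ndarray, num_classes: int, scale: tuple) -> np.ndarray:
    """mask: (D, H, W) integer labels -> (C, D', H', W') class probabilities."""
    one_hot = np.eye(num_classes, dtype=np.float32)[mask]       # (D, H, W, C)
    one_hot = np.moveaxis(one_hot, -1, 0)                       # (C, D, H, W)
    soft = np.stack([zoom(c, scale, order=1) for c in one_hot]) # linear interp.
    return soft / soft.sum(axis=0, keepdims=True)               # renormalize
```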
For the Fine step, the target voxel spacing of the model is 0.25 × 0.25 × 0.25 mm, which is likewise achieved with linear interpolation of the image. For this step, we obtain the ground truth labels via simple nearest-neighbour downsampling of the original semantic segmentation masks. During training, we randomly sample patches of 144 × 144 × 144 voxels.
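By contrast, the fine-step labels can be produced with nearest-neighbour resampling (order 0), which keeps the masks integer-valued; a minimal sketch under the same assumptions:

```python
import numpy as np
from scipy.ndimage import zoom

def fine_labels(mask: np.ndarray, src_spacing: tuple,
                dst_spacing: tuple = (0.25, 0.25, 0.25)) -> np.ndarray:
    """Resample an integer label volume to the fine voxel spacing."""
    scale = tuple(s / d for s, d in zip(src_spacing, dst_spacing))
    return zoom(mask, scale, order=0)  # nearest neighbour: no label mixing
```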
Model
We formulate both the Coarse and the Fine step as a semantic segmentation task, where the background and each anatomical element are interpreted as separate classes. For both steps, we use the same pre-trained semantic segmentation network based on an internally modified fully convolutional 3D U-Net architecture [16].
Since the Fine model is trained using a patch-based approach, it is crucial to provide the model with global information. We achieve this by utilizing a coarse hint: the Coarse model output is interpolated to the fine scale and passed to the Fine model as additional input channels. To prevent possible data leakage, we train the Coarse model and prepare the coarse hints via three-fold cross-validation. Consequently, the only difference between the Coarse and Fine model architectures is the number of input channels: 1 for the Coarse step, and the number of classes plus 1 for the Fine step.
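A minimal sketch (assuming PyTorch; names are ours) of attaching a coarse hint: the coarse probability map is upsampled to the fine scale and concatenated with the intensity channel:

```python
import torch
import torch.nn.functional as F

def with_coarse_hint(fine_image: torch.Tensor, coarse_probs: torch.Tensor) -> torch.Tensor:
    """fine_image: (B, 1, D, H, W); coarse_probs: (B, C, d, h, w)."""
    hint = F.interpolate(coarse_probs, size=fine_image.shape[2:],
                         mode="trilinear", align_corners=False)
    return torch.cat([fine_image, hint], dim=1)  # (B, C + 1, D, H, W)
```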
Class imbalance is known to be a challenging problem in medical semantic segmentation tasks. We address this issue by using the sum of a standard cross-entropy loss and a soft multiclass Jaccard loss. To prevent overfitting and enhance the performance of the model, we also utilize a large variety of data augmentations. For the Coarse step, the following augmentations are used during training: random blur, random noise, random rotations, random scaling, random crops, random elastic deformation, and random anisotropy [16]. For the Fine step, we used the same set of augmentations except for random elastic deformation and random anisotropy, since these transformations are computationally expensive when applied to reasonably large images.
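For illustration, a sketch of the combined objective under the assumption of a standard formulation (cross-entropy plus one minus the soft Jaccard index averaged over classes):

```python
import torch
import torch.nn.functional as F

def combined_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """logits: (B, C, D, H, W); target: (B, D, H, W) integer labels."""
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, probs.shape[1]).permute(0, 4, 1, 2, 3).float()
    dims = (0, 2, 3, 4)                       # reduce over batch and space
    inter = (probs * one_hot).sum(dims)
    union = probs.sum(dims) + one_hot.sum(dims) - inter
    soft_jaccard = 1.0 - ((inter + eps) / (union + eps)).mean()
    return ce + soft_jaccard
```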
Training
In summary, our training procedure consists of the following steps. First, we train the Coarse model on the coarse training dataset with soft targets; this checkpoint is used during testing. We also perform three-fold cross-validation and later use the obtained checkpoints to generate coarse hints for the Fine step. For both cross-validation and full-data training we followed the same procedure: we train for a total of 100 epochs using the Adam optimizer with a one-cycle scheduling policy (maximum learning rate 1e-3, minimum learning rate 1e-6, warmup fraction 0.05) and a batch size of 1.
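These hyperparameters map naturally onto PyTorch's one-cycle scheduler; a hypothetical sketch (`model` and `steps_per_epoch` are placeholders, and `final_div_factor` is chosen so that the learning rate decays to the reported 1e-6 minimum):

```python
import torch

model = torch.nn.Conv3d(1, 2, kernel_size=3)   # placeholder network
steps_per_epoch = 100                          # placeholder

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-3,
    epochs=100,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.05,          # 0.05 warmup fraction
    final_div_factor=40.0,   # initial LR (max_lr / 25) decays to 1e-6
)
```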
Next, we prepare coarse hints for the Fine model. We utilize the checkpoints obtained via cross-validation to make out-of-fold predictions, then linearly interpolate the output probability maps to the Fine model's voxel spacing and concatenate them with the original intensity channel. Finally, we train the Fine model for a total of 40 epochs, using the Adam optimizer and the same learning rate scheduling policy as in the Coarse step.
To train the Fine model we use a patch-based approach. At the beginning of each training epoch, we iterate over the images, randomly sample 20 patches per volume and store them in a queue of size 180. Once the queue reaches its specified maximum length, we start retrieving random patches from it and passing them to the network, while simultaneously preparing new patches and adding them to the queue. For evaluation, we use the checkpoint with the lowest recorded validation loss for both the Coarse and the Fine model.
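This sampling scheme matches the patch-queue pattern of the TorchIO library; a hypothetical sketch assuming TorchIO as the backend (the text does not name the library):

```python
import torch
import torchio as tio
from torch.utils.data import DataLoader

# Placeholder dataset with a single random volume.
subject = tio.Subject(image=tio.ScalarImage(tensor=torch.rand(1, 160, 160, 160)))
subjects_dataset = tio.SubjectsDataset([subject])

queue = tio.Queue(
    subjects_dataset=subjects_dataset,
    max_length=180,                        # queue size from the text
    samples_per_volume=20,                 # patches sampled per image
    sampler=tio.UniformSampler(patch_size=144),
    shuffle_subjects=True,
    shuffle_patches=True,
)
loader = DataLoader(queue, batch_size=1)   # patches are already sampled
```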
Implementation
Our algorithm was based on a Python implementation of the 3D U-Net. All training and experiments were run on an NVIDIA® A100 GPU. The Adam optimizer was used for network training.
Inference
At test time, the patch-based approach is known to produce lower-quality predictions near the borders of the output patch. To alleviate this issue, we perform inference on overlapping patches and aggregate the predictions with weights that make the centre voxel of an output patch contribute more to the final result than its borders. We set the patch overlap to 16 voxels (Figure 1).
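A minimal sketch of centre-weighted aggregation (a Gaussian-like weight map is one common choice; the exact weighting used is not specified in the text):

```python
import numpy as np

def centre_weight(patch_size: int = 144, sigma_frac: float = 0.125) -> np.ndarray:
    """3D weight map that peaks at the patch centre and decays toward borders."""
    coords = np.arange(patch_size) - (patch_size - 1) / 2.0
    g = np.exp(-0.5 * (coords / (sigma_frac * patch_size)) ** 2)
    return g[:, None, None] * g[None, :, None] * g[None, None, :]

def aggregate(patch_probs, origins, volume_shape, num_classes, patch_size=144):
    """Accumulate weighted patch predictions, then normalize per voxel."""
    w = centre_weight(patch_size)
    out = np.zeros((num_classes, *volume_shape), dtype=np.float32)
    norm = np.zeros(volume_shape, dtype=np.float32)
    for probs, (z, y, x) in zip(patch_probs, origins):
        sl = (slice(z, z + patch_size), slice(y, y + patch_size), slice(x, x + patch_size))
        out[(slice(None), *sl)] += probs * w
        norm[sl] += w
    return out / np.maximum(norm, 1e-8)
```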
Patient test dataset
To estimate the generalizability of our model, a retrospective patient CBCT dataset from the Dentomaxillofacial Radiology Department of Near East University was used. A power analysis was conducted with a statistical power of 90%, a significance level of α = 0.05, and a type II error probability of β = 0.2. According to the power analysis, a minimum of 82 CBCT images was required for each of the control and OSA groups.
Hence, our study was conducted with 100 OSA and 100 control artefact-free CBCT images randomly selected from our faculty's database. All patients had provided informed consent before irradiation, and the consent forms were reviewed and approved by the institutional review board of the faculty. Exclusion criteria were evident skeletal asymmetries, cleft palate, cleft lip, ongoing orthodontic treatment, and any teeth overlying the apical region of the incisors.
The dataset of a previous study of ours was used in this study [17]. CBCT records of 200 patients (100 images of OSA patients and 100 images of the control group) were retrospectively collected and analyzed, along with the polysomnography records and body mass index (BMI) of the OSA patients, at the Department of Allergy, Sleep and Respiratory Diseases. The Apnea-Hypopnea Index (AHI) is the number of apnea and hypopnea events per hour of sleep. Sleep apnea severity was evaluated in four subtypes: minimal, mild, moderate and severe. Patients with an AHI value lower than 5 were classified as the minimal group, while patients with AHI values of 5-15, 15-30 and more than 30 were classified as mild, moderate and severe, respectively. The 100 OSA patients had symptoms of the disease, and their evaluation was performed with a standardized program at the Department of Allergy, Sleep and Respiratory Diseases, which included anthropometric measurements, dental examination, CBCT and polysomnography. Polysomnography uses various methods such as electroencephalography, electromyography, electro-oculography, respiratory effort measurement, airflow measurement and snoring detection [12]. Control (non-OSA) patients had none of the clinical findings of the OSA patients, such as snoring, dyspnea, witnessed apnea, coughing or daytime sleepiness, so their images were used as the control group. The mean age was 53.2 years for OSA patients and 46.4 years for non-OSA patients. The study protocol followed the principles characterized in the Declaration of Helsinki, along with its amendments and revisions.
CBCT images of the test group were obtained with a NewTom 3G unit (Quantitative Radiology s.r.l., Verona, Italy). The CBCT records of the non-OSA patients had been taken for implant planning, evaluation of impacted teeth, or prosthodontic and orthodontic purposes. Patients with osteoporosis, skeletal asymmetries or medication-related bony alterations were excluded from the study.
Ground truth segmentation process
All CBCT data were exported as DICOM files and then anonymized. The axial, coronal and sagittal slices were oriented to ensure a proper evaluation. The axial slices were aligned so that the palatal line and the ground were perpendicular to each other. The coronal slices were oriented by aligning both orbits and the midline of the head parallel to the ground, and the sagittal slices were aligned by linear orientation of the ANS and PNS.
All CBCT images had been segmented prior to our study for diagnosis, pharyngeal airway evaluations and surgical planning using InVivo 5.1.2® (Anatomage Inc., San Jose, CA, USA). DICOM files of the axial CBCT images were exported with a 512 × 512 matrix and imported into InVivo 5.1.2. In this software, the pharyngeal airway can be evaluated both with automatic thresholding and with manual tracing combined with semiautomatic thresholding.
The pharyngeal airway comprises the nasopharynx and the oropharynx. To delineate the oropharyngeal airway volume, the ANS-PNS plane extended to the pharyngeal wall was taken as the superior border, and the most inferior-anterior point of the 2nd cervical vertebra, on a plane parallel to the superior border, was taken as the lower border. Since the superior border of the oropharyngeal airway is also the lower border of the nasopharyngeal airway, a line drawn from the PNS perpendicular to the palatal plane forms the anterior border of the nasopharyngeal airway. The sum of the nasopharyngeal and oropharyngeal airway volumes was calculated both with manual tracing combined with semi-automatic thresholding and with automatic thresholding in the InVivo 5.1.2 viewer. S.A. and A.K. evaluated the CBCT images twice, one week apart, to avoid intra-observer disagreement in the ground truth measurements.
For automatic thresholding, the software itself detects the pharyngeal airway volume and area, along with the narrow-point area, and measures the narrow point automatically (Figure 2).
Manual tracing with semiautomatic thresholding was performed by cropping the airway using the “edit masks” feature; the connection with the outer air was cropped in each slice with the segmentation tools. The “region growing” tool was used to split the segmentation produced by thresholding into several objects and to remove floating pixels, and the pharyngeal airway volume and area were then calculated using the software's “calculate 3D” tool (Figure 3, Figure 4A-B).
3D U-Net architecture framework (AI model)
Our approach performs automatic segmentation focused on the regions of interest: the external surface of the bones, the teeth and the airways. This process results in five segmentation masks: the upper skull, the mandible, the maxillary teeth, the mandibular teeth and the airways. We performed a series of trials to choose the best training configuration. Subsequently, the generated STL files were exported and imported into third-party software for volumetric pharyngeal airway measurements (3-Matic Version 15, Materialise) (Figure 5).
Statistical Analysis
Statistical analysis was performed using SPSS 22.0 software (SPSS Inc., Chicago, IL, USA). Because the data were not normally distributed, the Mann-Whitney U test was used for comparisons between two groups and the Kruskal-Wallis H test for comparisons among three or more groups. The significance level was set at 0.05; p < 0.05 was considered a significant difference and p > 0.05 no significant difference.
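For illustration only, the reported tests map directly onto SciPy; a hedged sketch with hypothetical measurement arrays:

```python
import numpy as np
from scipy.stats import mannwhitneyu, kruskal

rng = np.random.default_rng(0)                     # placeholder data
group_a, group_b, group_c = (rng.normal(size=30) for _ in range(3))

u_stat, p_two = mannwhitneyu(group_a, group_b)     # two-group comparison
h_stat, p_kw = kruskal(group_a, group_b, group_c)  # three or more groups
print(p_two < 0.05, p_kw < 0.05)                   # significance at 0.05
```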