Data collection and processing
During an 8-month data collection period, 86 children aged 5 to 8 years undergoing dental treatment at the Department of Pediatric Dentistry, Peking University School and Hospital of Stomatology in Beijing, China, participated in this study. The inclusion criterion for the tooth images used to train and test the CNN framework was primary teeth without metal crowns or amalgam restorations. Ultimately, we collected 886 groups of tooth photos. This study (PKUSSIRB-201837095) was approved by the local institutional review board ethics committee, and informed consent was obtained from the children’s legal guardians.
An intraoral camera (1280×960 pixels, TPC Ligang, China) was used to acquire photos of the labial surfaces of 886 primary teeth. A disclosing agent (Cimedical, Japan) was then applied, and photos of the disclosed teeth were captured at the same angle with the same device. The photos were cropped so that only one complete tooth appeared in each image. A researcher marked the tooth areas in both the original and disclosed photos using LabelMe (MIT, USA), an open annotation tool for computer vision research. The disclosed-tooth photos were then resized so that the tooth contour profiles of the two groups overlapped. The plaque areas on the disclosed photos were also marked using LabelMe, and the marked areas were transferred by software onto the corresponding pre-disclosure photos. The AI model then learned the dental plaque features from these photos. The process is illustrated in Fig. 1.
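The mask-transfer step can be sketched as follows. This is a minimal illustration with hypothetical helper names and a nearest-neighbour resize; the study used LabelMe polygon annotations and its own software pipeline rather than this exact code.

```python
# Sketch of transferring a plaque mask from the disclosed photo onto the
# original (pre-disclosure) photo. Masks are lists of lists of 0/1 pixels.
# Function names and the toy data below are illustrative, not the study's code.

def resize_mask(mask, new_h, new_w):
    """Nearest-neighbour resize of a binary mask so the disclosed-tooth
    contour overlaps the original photo's tooth contour."""
    old_h, old_w = len(mask), len(mask[0])
    return [
        [mask[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

def transfer_plaque_mask(disclosed_mask, original_shape):
    """Resize the plaque mask drawn on the disclosed photo to the
    dimensions of the pre-disclosure photo."""
    h, w = original_shape
    return resize_mask(disclosed_mask, h, w)

# Toy example: a 2x2 plaque mask transferred onto a 4x4 original photo.
mask = [[1, 0],
        [0, 1]]
transferred = transfer_plaque_mask(mask, (4, 4))
```

The transferred mask then serves as the ground-truth label for the pre-disclosure photo during training.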
Convolutional neural network training
The dental plaque detection model was built on a convolutional neural network (CNN) framework that was pretrained on natural images and then fine-tuned using transfer learning techniques. The procedure comprised two steps. First, we pretrained the basic DeepLab network on the Visual Object Classes dataset to obtain the initial weights. Second, we fine-tuned a DeepLabV3+ model on our primary-tooth photo dataset [10, 11], which contains photos of 886 primary teeth taken before and after the application of a dental plaque-disclosing agent. During training, the plaque regions predicted by the AI model were compared with the real dental plaque areas so that the model could learn from its errors. The comparison process is illustrated in Fig. 2 and Fig. 3. The final dataset contained 886 photos with ground-truth masks identifying the real dental plaque areas. Of the complete dataset, 80 percent was randomly selected for training, and the remaining 20 percent was used for testing.
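The random 80/20 split can be sketched as follows. This is a stdlib-only illustration; the seed and file names are hypothetical, and the actual pipeline trained a DeepLabV3+ model on the resulting sets.

```python
import random

def split_dataset(items, train_frac=0.8, seed=42):
    """Randomly split the annotated photos into training and test sets
    (80% / 20% as described above). The seed is illustrative only."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# 886 annotated photos, as in the final dataset (file names are invented).
photos = [f"tooth_{i:03d}.jpg" for i in range(886)]
train, test = split_dataset(photos)
```

With 886 photos, this yields 709 training and 177 test images; each photo lands in exactly one set.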
Comparison between the AI model and a dentist
Based on data from a preliminary experiment (α=0.025, β=0.2), at least 87 photos were required to validate clinical feasibility. An additional 98 primary teeth (not included in the training dataset) were photographed with an intraoral camera (1280×960 pixels, TPC Ligang, Dongguan, China). The inclusion criteria for the validation group were the same as those used for the training and testing groups. The photos were assessed by the AI model, which detected and marked the dental plaque in yellow. These teeth were also photographed with a digital camera (3216×2136 pixels, Canon EOS 60D, Japan). A pediatric dentist with 20 years of experience assessed the digital camera photos and marked the regions with dental plaque (Fig. 4). A researcher then applied a plaque-disclosing agent to clearly identify the dental plaque areas; the dentist was blinded to these results. To evaluate the consistency of the manual diagnosis, the dentist was asked to mark the dental plaque areas on the same 98 digital camera photos a second time one week later.
In another round of comparison, 102 photos of primary teeth taken by the intraoral camera (1280×960 pixels, TPC Ligang, Dongguan, China) were marked to denote the dental plaque areas assessed by both the AI model and the pediatric dentist to evaluate the diagnostic accuracy of each approach based on photos with lower resolutions (fewer pixels) than the images acquired by the digital camera.
Statistical analysis
We compared the detection accuracy of the AI model with that of the dentist using the mean intersection-over-union (MIoU) metric, which is widely used to assess the accuracy of semantic segmentation techniques [12]. The MIoU computes the ratio between the intersection and the union of two sets: in our case, the ground truth (the real dental plaque area) and the predicted segmentation result (the dental plaque area identified by the AI model or the dentist). Equivalently, the IoU is the number of true positives (the intersection) divided by the sum of true positives, false negatives, and false positives (the union). The IoU is computed on a per-class basis and then averaged over classes to obtain the MIoU.
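As a concrete illustration, the MIoU described above can be computed as follows. This sketch assumes binary per-pixel labels (plaque versus non-plaque) on flattened images; it is not the study's evaluation code, and the toy labels are invented.

```python
def miou(ground_truth, prediction, num_classes=2):
    """Mean intersection-over-union: per-class IoU = TP / (TP + FN + FP),
    averaged over classes. Inputs are flat lists of per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for g, p in zip(ground_truth, prediction) if g == c and p == c)
        fn = sum(1 for g, p in zip(ground_truth, prediction) if g == c and p != c)
        fp = sum(1 for g, p in zip(ground_truth, prediction) if g != c and p == c)
        union = tp + fn + fp
        if union:  # skip classes absent from both masks
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Toy 2x4 image, flattened: 1 = plaque, 0 = sound tooth surface.
gt   = [0, 0, 1, 1, 0, 1, 1, 0]
pred = [0, 0, 1, 0, 0, 1, 1, 1]
score = miou(gt, pred)  # both classes have IoU 3/5, so MIoU = 0.6
```

A perfect segmentation gives an MIoU of 1; disjoint masks give 0 for the plaque class.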
The parametric data were analyzed using paired t-tests to evaluate differences between the two groups. A value of P < .05 was considered statistically significant. SPSS software, version 19.0 (Chicago, IL, USA), was used for the statistical analysis.
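For illustration, the paired t statistic underlying this comparison can be computed as follows. This is a stdlib sketch with invented toy scores; the study's actual analysis was performed in SPSS.

```python
import math

def paired_t(x, y):
    """Paired t statistic for two matched samples (e.g. per-photo MIoU of the
    AI model vs. the dentist). Returns (t, degrees of freedom)."""
    d = [a - b for a, b in zip(x, y)]      # per-pair differences
    n = len(d)
    mean = sum(d) / n                      # mean difference
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)          # t = mean / standard error
    return t, n - 1

# Invented per-photo accuracy scores for the two raters:
ai      = [0.80, 0.75, 0.85, 0.78, 0.82]
dentist = [0.78, 0.74, 0.80, 0.77, 0.81]
t_stat, df = paired_t(ai, dentist)
```

The resulting t statistic is then compared against the t distribution with n − 1 degrees of freedom to obtain the P value.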