Soft-tissue cephalometric analysis can be used to objectify clinical observations on 3D photographs, but manual annotation, the current gold standard, is time-consuming and tedious. Therefore, this study developed a deep learning-based approach for automated landmark extraction from randomly oriented 3D photographs. The performance was assessed for ten cephalometric landmarks. The results showed that the deep learning-based landmarking method was precise and consistent, with a precision approximating the inter-observer variability of the manual annotation method. A precision < 2 mm, which may be considered a cut-off value for clinical relevance, was achieved for 69% of the predicted landmarks 16,17.
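For illustration, the evaluation metric underlying these figures can be sketched as the per-landmark Euclidean distance between predicted and reference coordinates, together with the fraction of predictions below the 2 mm cut-off. The array names and shapes below are assumptions for illustration, not the study's actual code.

```python
import numpy as np

# Minimal sketch of the precision metric: Euclidean distance (mm) between
# predicted and manually annotated landmark coordinates, plus the share of
# predictions under the 2 mm clinical-relevance cut-off.
predicted = np.random.rand(100, 10, 3) * 2.0   # (photos, landmarks, xyz) in mm
reference = np.random.rand(100, 10, 3) * 2.0   # manual ground-truth annotations

errors = np.linalg.norm(predicted - reference, axis=-1)   # (photos, landmarks)
print(f"precision: {errors.mean():.2f} +/- {errors.std():.2f} mm")
print(f"landmarks < 2 mm: {100 * (errors < 2.0).mean():.0f}%")
```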
In the field of craniofacial surgery, several studies have applied deep learning models for automated cephalometric landmarking, mainly focusing on 2D and 3D radiographs. Dot et al. used a SpatialConfiguration-Net for the automated annotation of 33 different 3D hard-tissue landmarks from CT images and achieved a precision of 1.0 ± 1.3 mm 18. An automated landmarking method, based on multi-stage deep reinforcement learning and volume-rendered imaging, was proposed by Kang et al. and yielded a precision of 1.96 ± 0.78 mm 19. A systematic review by Serafin et al. found a mean precision of 2.44 mm for the prediction of 3D hard-tissue landmarks from CT and CBCT images 4.
Some studies have described automated algorithms for 3D soft-tissue landmarking on 3D photographs, but these algorithms did not include deep learning models. Baksi et al. described an automated method, involving morphing of a template mesh, for the landmarking of 22 soft-tissue landmarks on 3D photographs that achieved a precision of 3.2 ± 1.6 mm 10. An automated principal component analysis-based method, described by Guo et al., achieved an average root mean square error of 1.7 mm for the landmarking of 17 soft-tissue landmarks on 3D photographs 11. Even though a direct comparison is infeasible due to differences in landmarks, datasets, and imaging modalities, the precision of the proposed workflow is within the same range as that of these studies.
The effect of landmark choice on the established precision is underlined by the MeshMonk results found in this study. In the original publication by White et al., an average error of 1.26 mm was reported for 19 soft-tissue landmarks 9. The same methodology was used to establish the precision for the ten landmarks used in this study, and an overall precision of 1.97 ± 1.34 mm was found. This finding highlights the difficulty of comparing landmarking precision across the literature. Compared to the semi-automatic method, the fully automated workflow yielded significantly improved precision for six landmarks, emphasizing the feasibility of fully automatic annotation of soft-tissue landmarks on 3D photographs using deep learning.
The proposed workflow uses two successive networks and additional algorithms for alignment and facial segmentation. Advantages of this approach include that DiffusionNet is robust against varying sampling densities, and that the HKS features inherently provide invariance to the rotational, positional, and scale differences that may arise between different 3D photography systems. A limitation of the current study is that the workflow was only applied to 3D photographs captured with a single 3D photography system. Despite the robust nature of DiffusionNet and HKS, the performance of the workflow might be affected when applied to 3D photographs captured with different hardware. Furthermore, the DiffusionNet models were trained on spatial features only, whereas the manual annotation process also used texture information. Although this makes the DiffusionNet models insensitive to variations in skin tone or color, landmarks such as the exocanthions, endocanthions, and cheilions could presumably be located more precisely using manual annotation. This would not apply to landmarks lacking color transitions, such as the nasion and nose tip. Based on these presumptions, the DiffusionNet-based approach might achieve a better precision if the texture data of the 3D photographs were available to the networks.
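To illustrate the invariance argument, HKS-style features can be derived from the eigendecomposition of the mesh Laplace-Beltrami operator; because they depend only on intrinsic geometry, they are unaffected by rotation or translation of the mesh. The following is a minimal numpy sketch under that assumption; the function name, placeholder eigenpairs, and time sampling are illustrative and not taken from the study's code.

```python
import numpy as np

def heat_kernel_signature(evals, evecs, t_values):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2.

    evals:    (k,)   eigenvalues of the mesh Laplace-Beltrami operator
    evecs:    (n, k) corresponding eigenfunctions at the n vertices
    t_values: (m,)   diffusion times; log-spaced values are common
    Returns an (n, m) per-vertex feature matrix.
    """
    # exp(-lambda_i * t) for every (eigenvalue, time) pair -> (k, m)
    decay = np.exp(-np.outer(evals, t_values))
    return (evecs ** 2) @ decay

# Illustrative usage with placeholder eigenpairs (in practice these come
# from a cotangent-Laplacian eigendecomposition of the face mesh).
k, n = 128, 5000
evals = np.sort(np.abs(np.random.rand(k)))
evecs = np.random.randn(n, k)
hks = heat_kernel_signature(evals, evecs, np.geomspace(0.01, 1.0, 16))
print(hks.shape)  # (5000, 16): one 16-dimensional descriptor per vertex
```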
Another limitation of the proposed workflow arises from the use of HKS features in the initial DiffusionNet, leading to occasional random left-right flipping in the predictions of symmetrical landmarks (e.g., the exocanthions). To overcome this challenge, paired symmetrical landmarks were detected within a single channel; the two landmarks were then separated using a clustering algorithm and classified as left or right based on the midsagittal plane. Although this solution achieved a success rate of 98.6%, the workflow failed when the initial DiffusionNet was unable to predict one of the landmarks in the midsagittal plane (nasion, nose tip, or cheilion midpoint). Since these failures were mainly due to suboptimal quality of the 3D photograph, they might be prevented by optimizing image acquisition. For optimal performance of the workflow, it is important to minimize gaps and to restrict the depicted area of the 3D photograph to the face.
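As an illustration of this disambiguation step, the sketch below clusters the vertices of one symmetric-landmark channel into two groups and labels them by the sign of their offset from a midsagittal plane fitted through the midline landmarks. All names and the exact plane construction are assumptions for illustration, not the study's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_left_right(candidate_pts, nasion, nose_tip, midpoint):
    """Separate a paired symmetric landmark predicted in a single channel.

    candidate_pts: (n, 3) vertex positions of the channel's prediction
    nasion, nose_tip, midpoint: (3,) midline landmarks spanning the
    midsagittal plane. Returns the two cluster centroids, ordered by side.
    """
    # Separate the two symmetric predictions with k-means (k = 2).
    km = KMeans(n_clusters=2, n_init=10).fit(candidate_pts)
    c0, c1 = km.cluster_centers_

    # The midsagittal plane passes through the three midline landmarks;
    # its normal is perpendicular to both in-plane directions.
    normal = np.cross(nose_tip - nasion, midpoint - nasion)
    normal /= np.linalg.norm(normal)

    # The sign of the distance to the plane decides the label; which sign
    # corresponds to the anatomical left depends on the coordinate convention.
    side0 = np.dot(c0 - nasion, normal)
    return (c0, c1) if side0 > 0 else (c1, c0)
```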
Due to its high precision and consistency, the developed automated landmarking method has the potential to be applied in various fields. Possible applications include objective follow-up and analysis of soft-tissue facial deformities, growth evaluation, facial asymmetry assessment, and integration into virtual planning software for 3D backward planning 20,21. Since the proposed DiffusionNet-based approach uses only spatial features, it could be applied to 3D meshes of facial soft tissue derived from imaging modalities lacking texture, such as CT, CBCT, or MRI. Nevertheless, further research is necessary to ascertain the applicability of the workflow to these imaging modalities. The fully automated nature of the workflow also enables cephalometric analysis of large-scale datasets, which is of significant value for research purposes. The position independence of the workflow might furthermore make it suitable for automated landmarking in 4D stereophotogrammetry and give rise to real-time cephalometric movement analysis for diagnostic purposes 22,23.