Craniofacial syndrome identification using Convolutional Mesh Autoencoders

Background Clinical diagnosis of craniofacial anomalies requires expert knowledge. Recent studies have shown that artificial intelligence (AI) based facial analysis can match the diagnostic capabilities of expert clinicians in syndrome identification. In general, these systems use 2D images and analyse texture and colour. While these are powerful tools for photographic analysis, they are not suitable for use with medical imaging modalities such as ultrasound, MRI or CT, and are unable to take shape information into consideration when making a diagnostic prediction. 3D morphable models (3DMMs), and their recently proposed successors, mesh autoencoders, analyse surface topography rather than texture, enabling analysis from photography and all common medical imaging modalities, and present an alternative to image-based analysis. Methods We present a craniofacial analysis framework for syndrome identification using Convolutional Mesh Autoencoders (CMAs). The models were trained using 3D photographs of the general population (LSFM and LYHM) and computed tomography (CT) scans from healthy infants and patients with three genetically distinct craniofacial syndromes (Muenke, Crouzon, and Apert).


Funding
This work has been funded by Great Ormond Street Hospital for Children Charity (Grant No. 12SG15), the Engineering and Physical Sciences Research Council (EP/N02124X/1) and the European Research Council (ERC-2017-StG-757923).
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=3795325

Background
Early diagnosis of many genetic disorders improves outcomes, but the rarity and variety of possible syndromes make clinical diagnosis challenging. Syndromic craniosynostosis (SC) comprises a group of conditions characterised by premature fusion of the cranial sutures, compromising normal development of the skull and brain 1,2. Delayed diagnosis is common in phenotypically mild SC, risking irreversible functional impairment, including visual failure, neurocognitive deficits and airway problems, which can be avoided by timely diagnosis and treatment. The inadequacy of current screening paradigms makes syndromic craniosynostosis a prime candidate for computer-assisted diagnosis and referral. Muenke, Apert, and Crouzon syndrome are SC variants caused by mutations of fibroblast growth factor receptor (FGFR) genes and occur in between 1 in 30,000 and 1 in 65,000 live births 3.
Phenotypic presentation is variable with some crossover in phenotype between the syndromes.
The application of computer vision and deep learning approaches has already proven effective for the identification and diagnosis of SC from 2-dimensional (2D) images [4][5][6][7]. Among the more prominent examples is DeepGestalt, a deep convolutional neural network (DCNN) trained on tens of thousands of images to identify facial phenotypes for genetic disorders 8. While such systems demonstrate impressive results, they are unable to take advantage of the rich geometric information in the face and cranium that may give critical insight into the phenotypic variation associated with different syndromes.
Advances in 3-dimensional (3D) modelling and geometric deep learning have resulted in the introduction of a more shape-based approach to craniofacial analysis [9][10][11][12]. Previous work has successfully applied statistical shape models in adult populations for diagnosis and surgical simulation in orthognathic patients 13. The recent introduction of convolutional mesh autoencoder models (CMAs), a deep neural network approach to 3D model construction, offers further potential for the construction of shape-based models 12,14. These models learn to extract meaningful shape features from the input data and can consequently be used for classification tasks. Such approaches have not yet been applied to the automated diagnosis of craniofacial syndromes using an age-matched population.
We report on a geometric deep learning approach to the characterisation and identification of SC. The framework leverages convolutional mesh autoencoders and is trained using 3D data from healthy and syndromic individuals, focused on the identification of three distinct types of SC, namely Apert, Crouzon, and Muenke syndrome. Rather than relying on image data, the proposed model leverages the rich geometric information of the 3-dimensional scans. We demonstrate the power of the model for syndrome characterisation and classification, and illustrate its diagnostic sensitivity with an unusual Crouzon case.

Institutional review board statement
Patient data for this study were retrospectively retrieved from electronic medical records after receiving approval from the Institutional Review Board: Great Ormond Street Hospital (R&D no. 14DS25).

Data sources
A full summary of the demographics for the databases used is given in Table 1.
All SC patients diagnosed with Apert, Crouzon, or Muenke syndrome at the Craniofacial Unit of Great Ormond Street Hospital for Children, London, UK, were reviewed retrospectively for preoperative 3D imaging. Computed Tomography (CT) head scans were selected as the most suitable imaging modality for the assessment of craniofacial anatomy. CT-scans of insufficient quality for 3D reconstruction, due to a low number of slices or (movement) artefacts, were excluded. Baseline characteristics were collected from the corresponding medical charts. In total, CT data from 122 SC patients were included (Apert, Crouzon, and Muenke; mean age 5.0 ± 5.1 years; 58% male). This database is henceforth referred to as "the syndromic dataset".
At Necker Children's Hospital (Necker-Enfants Malades Hospital), Paris, France, CT-scans of patients between the ages of 0 and 4 years without a history of craniofacial anomalies were assessed. Patients indicated for a CT-scan between 2011 and 2018 due to headaches, trauma or epilepsy were reviewed for inclusion. The scans were evaluated by two independent reviewers, 1) a paediatric radiologist and 2) a clinical research fellow in craniofacial surgery, to exclude any scans with abnormalities such as fractures, brain tumours, brain damage, or craniofacial anomalies. Henceforth, this database of normal craniofacial images is referred to as "the paediatric dataset". For the paediatric and syndromic datasets, several samples used for the face models were omitted when constructing the head models: scans with incomplete cranium information or with previous calvarial surgery, such as posterior vault remodelling, were excluded. In total, the scans of 142 healthy infants (mean age 1.9 ± 1.2 years; 56% male) were included.
The original data consisting of 10,000 faces of the Large Scale Face Model (LSFM) was used for age-matched diagnostics for cases between the ages of 4 and 17 years. Of the subjects who met the desired age criteria, 196 samples were randomly selected to provide this age-matched reference (7 male and 7 female for each age).
The Liverpool-York head model (LYHM) was constructed using the craniofacial scans of approximately 1,200 individuals aged between 2 and 90 from the Headspace database 11. The scans of those aged between 4 and 17 were used to provide an age-matched reference for full head scans from the syndromic dataset. The database included 139 scans that met the required age criteria. The mean age was 10.9 ± 4.4 years, and 55% were male.
Image pre-processing

CT-scan pre-processing
The DICOM files of the collected datasets, i.e. the syndromic and paediatric datasets, were converted to 3D soft tissue meshes by applying a standardised skin setting in Horos, an open-source medical viewer, and exported as stereolithography (STL) files. The STL files were imported into MeshMixer, an open-source software package, to undergo a cleaning process in which redundant objects, such as draping and gel pads, the back of the CT scanner, pacifiers, and lines and tubes, were removed. The meshes were saved as Object (OBJ) files. A sparse set of 68 facial landmarks was manually added to the meshes. To encourage good correspondence around the ears, an additional 55 landmarks per ear were also added, following the landmark template outlined in 17. Using the facial landmarks, the raw meshes were subjected to Procrustes analysis, allowing them to be rigidly aligned with the template meshes prior to obtaining dense correspondence. For the analyses we used a face-only template, a head-and-face template, and a head-only template (Figure 1). The construction of a 3D mesh autoencoder model requires that all meshes are re-parametrised to have a consistent topological structure, where each mesh has the same number of vertices connected in a consistent triangulation, with corresponding vertices having the same semantic meaning. Meshes meeting this criterion are said to be in dense correspondence. In this study, non-rigid iterative closest point registration (non-rigid ICP) was used to achieve dense correspondence 18. This approach allows landmark points to guide the correspondence process, and data weights to restrict how certain vertices can move during the registration.
As this method requires an initial template mesh with the desired final topology, the cropped facial template from 15 was used for construction of the facial models, while the template from 19 was used for the craniofacial and cranial models.
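The landmark-based rigid alignment step can be sketched as follows. This is a minimal NumPy illustration of Procrustes alignment (rotation via SVD, with uniform scaling), not the authors' implementation; landmark sets are assumed to be stored as n × 3 arrays.

```python
import numpy as np

def procrustes_align(source, target):
    """Rigidly align `source` landmarks (n x 3) to `target` via
    translation, uniform scale, and an optimal rotation."""
    mu_s, mu_t = source.mean(0), target.mean(0)
    S, T = source - mu_s, target - mu_t
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(S.T @ T)
    R = U @ Vt
    if np.linalg.det(R) < 0:       # guard against reflections
        U[:, -1] *= -1
        R = U @ Vt
    scale = np.trace((S @ R).T @ T) / np.trace(S.T @ S)
    return scale * S @ R + mu_t

# Example: recover a rotated, translated copy of a landmark set.
rng = np.random.default_rng(0)
lms = rng.standard_normal((68, 3))             # 68 facial landmarks
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
moved = lms @ Rz.T + np.array([5.0, -2.0, 1.0])
recovered = procrustes_align(moved, lms)
print(np.abs(recovered - lms).max())           # near zero
```

In practice the alignment would be computed on the sparse landmarks and the resulting transform applied to the full raw mesh before non-rigid registration.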

Dense correspondence for the Syndromic Craniosynostosis dataset
The faces and heads of those in the paediatric and syndromic datasets differ greatly in shape both from each other, and from the samples used for the construction of previous (cranio)facial 3DMMs. For this reason, the use of landmark-guided non-rigid ICP was not always sufficient to achieve good quality correspondences. As such, Gaussian Processes were applied to the face template mesh to increase deformation flexibility and improve the quality of the correspondences obtained for the syndromic cases.

3D Mesh Autoencoder Construction
Once dense correspondence had been achieved for all meshes, the 3D models were created using mesh autoencoders. The autoencoder architecture applied here is similar to that described in 12. Encoder convolutional filter sizes of [16, 16, 32, 32, 32] were used for the face-only model, while encoder filter sizes of [16, 16, 32, 32] were used for the head-only and combined models. In both cases, the decoder filter sizes mirrored those of the encoder. Each convolutional layer in the encoder was followed by a layer that downsampled the mesh by a factor of 4; in the decoder, this was replaced by a layer that upsampled the mesh by a factor of 4. An additional convolutional layer with an output dimension of 3 was added to the decoder to allow reconstruction of the 3D shape coordinates. An ELU activation function was applied after each convolutional layer. Spiral convolutions with a fixed spiral length of nine were applied in all layers 20, and a latent vector size of 128 was used. Model weights were initialised using Xavier initialisation. Adam optimisation with an initial learning rate of 1 × 10^-3 and a learning rate decay of 0.99 was used. Mesh vertices were used as the autoencoder input and an L1 reconstruction loss was applied to the output. All models were trained for 300 epochs with a batch size of 16.
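A single spiral convolution layer of the kind used in these models can be sketched as follows. This is a simplified NumPy illustration, not the authors' code: the spiral index sequences here are random placeholders, whereas in practice each vertex's length-nine spiral is derived from the mesh topology 20.

```python
import numpy as np

def spiral_conv(features, spirals, weight, bias):
    """One spiral convolution layer: for every vertex, gather the
    features of the vertices along its precomputed spiral sequence,
    concatenate them, and apply a shared linear map, then ELU.

    features : (n_vertices, in_dim)
    spirals  : (n_vertices, spiral_len) integer vertex indices
    weight   : (spiral_len * in_dim, out_dim)
    """
    n, k = spirals.shape
    gathered = features[spirals].reshape(n, -1)   # (n, k * in_dim)
    out = gathered @ weight + bias
    # ELU activation: x for x > 0, exp(x) - 1 otherwise
    return np.where(out > 0, out, np.exp(np.minimum(out, 0)) - 1)

rng = np.random.default_rng(0)
n_vertices, in_dim, out_dim, spiral_len = 100, 3, 16, 9
feats = rng.standard_normal((n_vertices, in_dim))   # xyz coordinates
spirals = rng.integers(0, n_vertices, (n_vertices, spiral_len))
W = rng.standard_normal((spiral_len * in_dim, out_dim)) * 0.1
b = np.zeros(out_dim)
out = spiral_conv(feats, spirals, W, b)
print(out.shape)   # (100, 16)
```

Stacking such layers with mesh down- and upsampling by a factor of 4 between them yields the encoder-decoder structure described above.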
Models were constructed for the face-only, head-only, and combined head-and-face regions to gain insight into how characteristic each region is of the various craniofacial syndromes. All paediatric healthy cases and syndromic patients with incomplete or poor-quality head CT-scans were omitted from the head-only and combined head-and-face models; consequently, these models were constructed using fewer samples. Due to the wide age range of the individuals in the dataset (1 day to 20 years), two additional models were created for each region of interest: the first included all patients and volunteers up to and including three years of age, and the second consisted of those aged four and above. Nine models were created in total. All results pertaining to the age-based models can be found in the supplementary materials.

Error Quantification
To assess reconstruction accuracy, the Euclidean distance between each sample in the real dataset and the corresponding model reconstruction was calculated on a per-vertex basis. For two meshes A and B in dense correspondence, with vertices a_i and b_i, i = 1, ..., n, the mean error over all vertices is defined as:

d(A, B) = (1/n) Σ_{i=1}^{n} ||a_i − b_i||_2
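As a minimal illustration of this metric (assuming meshes in dense correspondence stored as (n, 3) arrays; the data here are random stand-ins):

```python
import numpy as np

def mean_vertex_error(mesh_a, mesh_b):
    """Mean Euclidean distance between corresponding vertices of two
    meshes in dense correspondence (both of shape (n, 3))."""
    return np.linalg.norm(mesh_a - mesh_b, axis=1).mean()

rng = np.random.default_rng(0)
original = rng.standard_normal((5000, 3))
# Simulate a reconstruction with small per-vertex error
reconstruction = original + 0.001 * rng.standard_normal((5000, 3))
err = mean_vertex_error(original, reconstruction)
print(err)   # small: on the order of the added noise
```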

Model Specificity
Specificity is a metric commonly used to evaluate the validity of novel instances created by generative models. To assess this, 1000 samples were randomly synthesised from each of the models and the Euclidean distance to all ground truth samples was calculated. The specificity error was reported as the mean Euclidean distance over all vertices between a synthesised face and its closest ground truth neighbour.
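A sketch of this specificity computation, with random arrays standing in for the synthesised and ground truth meshes (shapes and counts here are illustrative, not those of the study):

```python
import numpy as np

def specificity_error(synthesised, real):
    """For each synthesised mesh, find the closest real mesh by mean
    per-vertex Euclidean distance; return the average over samples.

    synthesised : (n_synth, n_vertices, 3)
    real        : (n_real, n_vertices, 3)
    """
    errors = []
    for sample in synthesised:
        # mean vertex distance from this sample to every real mesh
        d = np.linalg.norm(real - sample, axis=2).mean(axis=1)
        errors.append(d.min())
    return float(np.mean(errors))

rng = np.random.default_rng(0)
real = rng.standard_normal((50, 200, 3))
# Synthesised samples close to real ones -> low specificity error
synth = real[:10] + 0.01 * rng.standard_normal((10, 200, 3))
spec = specificity_error(synth, real)
print(spec)   # small, since each sample sits near a real mesh
```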

Manifold Visualisation

t-Distributed Stochastic Neighbour Embedding (t-SNE) is a dimensionality reduction technique that allows high-dimensional shape vectors to be embedded in a lower-dimensional space and can reveal hidden structure in the data 21. To assess the diagnostic capacity of the models, t-SNE was applied to the high-dimensional latent vector encodings of the patients and healthy volunteers, allowing the global manifold of these vectors to be embedded in a 2-dimensional space for visualisation. Samples were then labelled according to their syndromic class (Normal, Apert, Crouzon, or Muenke) with the aim of uncovering distinct groupings, or clusters. All t-SNE embeddings were created using a perplexity of 30 and run for 1,000 iterations.
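An illustrative sketch of this embedding step using scikit-learn, with randomly generated vectors standing in for the 128-dimensional latent encodings and four synthetic clusters mimicking the class structure:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for the latent encodings: four classes with distinct
# means, mimicking Normal / Apert / Crouzon / Muenke.
rng = np.random.default_rng(0)
latents, labels = [], []
for cls in range(4):
    centre = rng.standard_normal(128) * 5
    latents.append(centre + rng.standard_normal((40, 128)))
    labels += [cls] * 40
latents = np.vstack(latents)

# Embed the 128-D vectors into 2-D for visualisation.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(latents)
print(embedding.shape)   # (160, 2)
```

The resulting 2-D points, coloured by class label, would then be plotted to inspect cluster separation.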

Classification
Autoencoders are often utilised for their ability to compress data into a much more compact format, which manifests as the latent vectors of the model. These latent vectors provide a natural means by which to classify the data and assess the model's applicability as a diagnostic tool.
Classification was performed using a Support Vector Machine (SVM) with a linear kernel and balanced class weighting. A stratified data split with an 80%:20% train:test proportion was used. The scikit-learn SVM implementation with default gamma and regularisation parameters (C=1.0) was employed. The mean accuracy, specificity, and sensitivity were calculated following a Monte-Carlo cross-validation scheme in which the training and test sets were randomly selected 10,000 times.
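A minimal sketch of this classification pipeline using scikit-learn, with synthetic stand-ins for the latent vectors and far fewer Monte-Carlo repeats than the study used:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Toy stand-ins for latent vectors (x) and four syndrome classes (y).
rng = np.random.default_rng(0)
x = np.vstack([rng.standard_normal((60, 128)) + c * 3 for c in range(4)])
y = np.repeat(np.arange(4), 60)

accs = []
for seed in range(20):   # the paper repeats this 10,000 times
    # Stratified 80:20 split, resampled each iteration (Monte-Carlo CV)
    x_tr, x_te, y_tr, y_te = train_test_split(
        x, y, test_size=0.2, stratify=y, random_state=seed)
    clf = SVC(kernel="linear", class_weight="balanced", C=1.0)
    clf.fit(x_tr, y_tr)
    accs.append(clf.score(x_te, y_te))
print(np.mean(accs))   # high on this easily separable toy data
```

Per-class sensitivity and specificity would additionally be derived from the confusion matrix of each repeat.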

Intrinsic Model Evaluation
Using the available databases, three distinct classes of model were constructed to assess the role of facial and cranial shape in the diagnosis of SC: a face-only model, a head-only model, and a combined head-and-face model (see Methods).
For the face, head and combined models, the reconstruction errors were 1.4 ± 1.2 mm, 3.8 ± 3.1 mm, and 2.9 ± 2.5 mm, respectively. Reconstruction error was higher for models that included the head shape, likely due to the greater degree of variation between subjects in this region. Model specificity was evaluated by randomly synthesising 1000 samples and comparing them to their nearest real neighbour 10. Values of 2.7 mm, 4.3 mm, and 3.9 mm for the face-only, head-only, and combined models, respectively, indicate that the generated samples are realistic.

Manifold Visualisation
To assess the diagnostic capacity of the model, t-distributed stochastic neighbour embedding (t-SNE) was applied to the latent vector encodings of all samples (see Methods).

Syndrome Classification
Classification was performed with all syndromic and non-syndromic scans. A split of 80%-20% for training and testing data was assessed over 1,000 iterations. The mean sensitivity, specificity, and accuracy over all iterations for each of the assessed regions in the binary classification experiment, and the average confusion matrices for both binary and multi-class classification, are shown in Table 2 and

Discussion
AI-assisted diagnosis is set to play an increasing role in healthcare, particularly in relation to rare or difficult-to-diagnose conditions. In this work, we leverage state-of-the-art geometric deep learning approaches to present a framework for the detection and classification of a subset of syndromic craniosynostoses. Requiring only 3D input means potentially deceptive texture information can be omitted, allowing the architecture to focus on extracting characteristic shape-based features to return an accurate diagnosis.
Our autoencoder model demonstrates high sensitivity and specificity. The model can be applied for both binary classification (syndromic vs. healthy) and multi-class classification (Apert, Crouzon, Muenke, and healthy). The high degree of separation of the different classes in the shape space of the model, as demonstrated by the t-SNE embeddings, intuitively supports these results. The model's high sensitivity and specificity highlight its suitability as a diagnostic aid in primary and secondary care settings, ensuring reliable diagnosis with few false positive results.
Utilising 3D topography rather than the surface texture analysis central to most other facial analysis techniques lends itself to integration with many forms of diagnostic imaging technologies, such as MRI, ultrasound scans, and CT. Our technique facilitates auto-segmentation of surfaces as well as syndrome identification and could therefore be used as a diagnostic machine learning tool to aid diagnosis from radiological or 3D photographic imaging. Integration with ultrasound imaging is of particular interest, as this presents an opportunity for the foetal detection of genetic disorders 16. The rising availability of 3D scanning applications and cameras on mobile devices presents further possibilities to introduce such a framework in primary and secondary care. In a field where timely diagnosis is necessary for appropriate management, the use of such technologies to detect SC and other conditions will streamline assessment at an earlier stage and could be pivotal to improving long-term health outcomes.
In conclusion, while several machine learning approaches for the identification of craniofacial syndromes have been presented, to the best of our knowledge, this is the first time the problem of syndrome classification has been approached from a shape-based perspective.