Participants
Study materials were collected through a digital clock drawing consortium between the University of Florida (UF) and the New Jersey Institute for Successful Aging (NJISA), Memory Assessment Program, School of Osteopathic Medicine, Rowan University. The Institutional Review Boards of both institutions (University of Florida IRB and Rowan University, New Jersey Institute for Successful Aging IRB) approved this investigation, and all participants provided written informed consent. All study procedures were carried out in accordance with the Declaration of Helsinki and the respective institutional guidelines30. Two data cohorts were used for the current investigation:
Training dataset – included 13,580 clock drawings from participants aged ≥ 65 years who were primary English speakers and who completed clock drawing to command and copy conditions as part of routine medical care assessment in a preoperative setting.31 Data were collected from January 2018 to December 2019. Exclusion criteria were: non-fluency in English; education < 4 years; and visual, hearing, or motor extremity limitations that could prevent production of a valid clock drawing.
Classification dataset – consisted of “fine-tuning” and “validation” datasets drawn from a set of individuals meeting criteria for dementia and a separate set of non-dementia peers. Dementia clocks came from individuals evaluated as part of a community memory assessment program within Rowan University between February 2016 and March 2019. Individuals were seen by a neuropsychologist, a psychiatrist, and a social worker. Inclusion criterion: age ≥ 55 years. Exclusion criteria: head trauma, heart disease, or other major medical illness that can induce encephalopathy; major psychiatric disorders; documented learning disability; seizure disorder or other major neurological disorder; less than a 6th-grade education; and history of substance abuse. All dementia participants were assessed with the Mini-Mental State Exam (MMSE), serum studies, and an MRI scan of the brain. These individuals have been described in prior research studies.32 As described in those papers, they were diagnosed with either AD or VaD using standard diagnostic criteria.33,34
Non-dementia peers had completed a research protocol of neuropsychological measures and neuroimaging, with all data reviewed by two neuropsychologists. Inclusion criteria were age ≥ 60 years, English as the primary language, and intact activities of daily living (ADLs) per Lawton & Brody’s Activities of Daily Living Scale, completed by both the participant and their caregiver.35 Exclusion criteria: clinical evidence of major neurocognitive disorder at baseline, as per the Diagnostic and Statistical Manual of Mental Disorders – Fifth Edition;36 presence of a significant chronic medical condition; major psychiatric disorder; history of head trauma/neurodegenerative disease; documented learning disorder; epilepsy or other significant neurological illness; less than a 6th-grade education; substance abuse in the past year; major cardiac disease; and chronic medical illness-induced encephalopathy. Participants were screened for dementia over the telephone using the Telephone Interview for Cognitive Status (TICS)37 and during an in-person interview with a neuropsychologist and a trained research coordinator, who also evaluated comorbidity rating38, anxiety, depression, ADLs, neuropsychological functioning, and digital clock drawing39. Data from these participants were collected from September 2012 to November 2019 and have been described elsewhere.2,18
Procedure
Cohort participants completed two clock drawings – one to a command condition and another to a copy condition1. The command condition required individuals to “Draw the face of a clock, put in all the numbers, and set the hands to ten after eleven.” The copy condition required individuals to draw the clock beneath a presented model of a clock. Drawings were completed using a digital pen from Anoto, Inc. and the associated smart paper16,17. The digital pen technology captures and measures pen positioning on the smart paper 75 times per second. The 8.5 × 11-inch smart paper was folded in half, giving participants a drawing area of 8.5 × 5.5 inches. For the current project, only the final drawing was extracted and used for analyses.
Data from a training cohort of 13,580 unlabeled clock drawings to the command and copy conditions were used to train the VAE in an unsupervised manner. Thereafter, the trained VAE encoder network, which compresses clock drawings into a low-dimensional latent space, was fine-tuned to distinguish dementia from control clock drawings. Command and copy clocks were not separated at this stage because we wanted the model to extract features from clock drawings in a way that is agnostic to any cognitive outcome; this keeps the extracted features general and useful for any downstream classification task. The latent features extracted by the VAE were passed to a classification network for distinguishing dementia from control. This network was fine-tuned with 53 dementia and 60 control clock drawings and tested on 18 dementia and 20 control clock drawings. The fine-tuning and validation datasets were created by randomly shuffling the classification dataset and splitting it in a 3:1 ratio; consequently, no systematic demographic differences were expected between the fine-tuning and validation datasets.
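The 3:1 shuffle-and-split of the classification dataset can be sketched as follows. This is a minimal NumPy illustration; the function name and random seed are our assumptions, not taken from the study.

```python
import numpy as np

def shuffle_split(n_samples, ratio=0.75, seed=0):
    """Randomly shuffle sample indices and split them ~3:1 into
    fine-tuning and validation index sets (seed is illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(round(n_samples * ratio))
    return idx[:cut], idx[cut:]

# 71 dementia + 80 control clocks = 151 labeled drawings in total,
# yielding 113 fine-tuning and 38 validation drawings
fine_tune_idx, val_idx = shuffle_split(151)
```

Because assignment to the two subsets is purely random, any demographic variable is split at random as well, which is why no systematic demographic difference between the subsets is expected.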
Individual clock sketches were extracted from the source files using contour detection. The extracted images were then flattened into one-dimensional vectors, which were filtered to retain only those smaller than 40,000 pixels (200 × 200). This filtering step removes clock images containing excessive white space, which cause information-sparsity problems: as clock drawings become sparser, the VAE tries to encode the interior white space of the clock instead of its drawn features, i.e., digits, hands, or circumference. This leads to poor latent space distributions that resemble white noise (Supplementary Fig. S1), high reconstruction errors, and ineffective VAE encoder weights. We therefore restricted clock sizes to a maximum of 200 × 200 pixels. The filtered clocks were then resized to a fixed size of 100 × 100 pixels and converted into one-dimensional vectors (10,000 × 1). These one-dimensional representations were used as inputs to the VAE. The preprocessing steps are illustrated in Supplementary Fig. S2.
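The filtering, resizing, and flattening steps can be sketched as below. This is an illustrative NumPy-only version: the contour extraction itself would typically use a library routine (e.g., OpenCV's findContours) and is omitted, and the nearest-neighbour resizer is a stand-in for whatever resampling the study used.

```python
import numpy as np

MAX_PIXELS = 200 * 200   # clocks at or above this size are discarded
TARGET = 100             # final side length (100 × 100)

def nn_resize(img, size):
    """Nearest-neighbour resize (stand-in for a library resizer)."""
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[np.ix_(rows, cols)]

def preprocess(clock):
    """Drop oversized (sparse) clocks, resize to 100 × 100,
    and flatten to a vector of length 10,000."""
    if clock.size >= MAX_PIXELS:
        return None                  # excessive white space: filter out
    small = nn_resize(clock, TARGET)
    return small.reshape(-1)         # one-dimensional VAE input
```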
Models and Experimental setup
A variational autoencoder (VAE) is an unsupervised generative model with an encoding phase that projects input data onto a lower-dimensional latent space and a decoding phase that reconstructs the input data from random samples drawn from this latent space (Fig. 4). In the VAE model, the latent space distribution is constructed under the restriction that it follows a Gaussian distribution \(N({Z}_{m}, {Z}_{s})\). This makes the VAE a generative model, as it can randomly sample this latent space distribution to create images that resemble the input data but are not necessarily present in the input dataset. The use of a normal distribution as a prior does not entail a loss of generality because the non-linear decoder network can mimic arbitrary input data distributions. No information about the generative features of clock drawing (e.g., total stroke length, clockface symmetry, coordinates of hands, number of digits, etc.) was supplied to the VAE network, and we had no a priori expectation that the latent dimensions would represent these generative features. We used one-dimensional representations of clock drawings to train the VAE with an input dimension of 10,000, an intermediate dimension of 512, and an embedding dimension of 2. The embedding dimension was intentionally kept extremely low to examine whether such a low-dimensional manifold can extract meaningful clock features useful for classification. The model was trained for 50 epochs with a batch size of 16, using binary cross-entropy as the reconstruction loss. The trained latent representation of the VAE was used as input to a feed-forward fully connected neural network with a single hidden layer of 512 neurons for the classification studies. Figure 4 shows the architecture of our networks and a conceptual workflow of our method; the top portion of the figure shows the training of the VAE.
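A single forward pass through a VAE with the stated dimensions (10,000 → 512 → 2 and back) can be sketched as follows. This is an illustrative NumPy sketch with randomly initialized weights rather than the trained network; the activation functions and weight scales are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, D_LAT = 10_000, 512, 2   # dimensions stated in the text

# Randomly initialized weights (illustrative; real weights come from training)
W_enc  = rng.normal(0, 0.01, (D_IN, D_HID))
W_mu   = rng.normal(0, 0.01, (D_HID, D_LAT))
W_logv = rng.normal(0, 0.01, (D_HID, D_LAT))
W_dec1 = rng.normal(0, 0.01, (D_LAT, D_HID))
W_dec2 = rng.normal(0, 0.01, (D_HID, D_IN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vae_forward(x):
    """Encode, sample via the reparameterization trick, decode,
    and score with binary cross-entropy plus the KL term."""
    h = np.tanh(x @ W_enc)
    mu, log_var = h @ W_mu, h @ W_logv
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)  # sample N(mu, sigma)
    x_hat = sigmoid(np.tanh(z @ W_dec1) @ W_dec2)               # reconstruction
    bce = -np.mean(x * np.log(x_hat + 1e-9) + (1 - x) * np.log(1 - x_hat + 1e-9))
    kl = -0.5 * np.mean(1 + log_var - mu**2 - np.exp(log_var))  # Gaussian prior
    return z, x_hat, bce + kl

x = rng.integers(0, 2, D_IN).astype(float)  # a binarized 100 × 100 clock, flattened
z, x_hat, loss = vae_forward(x)
```

The 2-dimensional vector `z` is the latent representation that would be passed on to the downstream classifier.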
The bottom portion of the figure shows how the compressed latent space of the VAE, in the form of trained encoder weights, is used to create a task-specific classifier. The classifier network has a fully connected feed-forward architecture. We fine-tuned the weights of this classifier on the small, annotated fine-tuning dataset to improve its performance. The number of neurons in each layer of the classifier was finalized using randomized grid search within a 3-fold cross-validation setting; the reported numbers of neurons gave the best average performance on the fine-tuning data over the 3 folds. Finally, the performance of the trained classifier was validated on the validation data, and several performance metrics, namely AUROC, accuracy, sensitivity, specificity, precision, and negative predictive value (NPV), were reported. The validation data were bootstrapped 100 times using random sampling with replacement to create confidence intervals for these metrics. The reported values constitute the median, 2.5th percentile, and 97.5th percentile of each metric over the bootstrapped validation datasets.
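The bootstrap procedure used for the confidence intervals can be sketched as follows, with accuracy as an example metric. The seed and the illustrative labels/predictions are our assumptions; the resample count and percentile cut-offs follow the text.

```python
import numpy as np

def bootstrap_metric(y_true, y_pred, metric, n_boot=100, seed=0):
    """Resample the validation set with replacement n_boot times and
    return the 2.5th percentile, median, and 97.5th percentile of the
    metric over the resamples."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # sampling with replacement
        scores.append(metric(y_true[idx], y_pred[idx]))
    return np.percentile(scores, [2.5, 50, 97.5])

accuracy = lambda t, p: np.mean(t == p)
y_true = np.array([1] * 18 + [0] * 20)       # 18 dementia, 20 control
y_pred = y_true.copy()
y_pred[:4] ^= 1                              # flip a few predictions for illustration
lo, med, hi = bootstrap_metric(y_true, y_pred, accuracy)
```

The same routine would be applied to each reported metric (AUROC, sensitivity, specificity, etc.) by swapping in the appropriate scoring function.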