Validating virtual administration of neuropsychological testing in Parkinson disease: a pilot study

Background COVID-19 has highlighted the need for remote cognitive testing. Virtual testing may lessen burden and can reach a larger patient population. The reliability and validity of virtual cognitive testing in Parkinson disease (PD) is unknown. Objectives To validate neuropsychological tests for virtual administration in PD. Methods Participants enrolled in an observational, cognition-focused study completed a rater-administered cognitive battery in-person and via video conference 3–7 days apart. Order of administration was counterbalanced. Analyses to compare performance by type of administration (virtual versus in-person) included paired t-test, intraclass correlation (ICC) and linear mixed-effects models. Results Data for 35 (62.9% male) PD participants (62.5% normal cognition, 37.5% cognitive impairment) were analyzed. Only the semantic verbal fluency test demonstrated a difference in score by administration type, with a significantly better score when administered virtually (paired t-test p = 0.011 and linear mixed-effects model p = 0.012). Only the Dementia Rating Scale-2, Trails A test and phonemic verbal fluency demonstrated good reliability (ICC value 0.75–0.90) for virtual versus in-person administration, and values for visit 1 versus visit 2 were similarly low overall. Trail making tests were successfully administered virtually to only 18 (51.4%) participants due to technical issues. Conclusions Virtual cognitive testing overall is feasible in PD, and virtual and in-person cognitive testing generate similar scores at the group level, but reliability is poor or moderate for most tests. Given that mode of test administration, learning effects and technical difficulties explained relatively little of the low test-retest reliability observed, there may be significant short-term variability in cognitive performance in PD in general, which has important implications for clinical care and research.


Introduction
There is increasing interest in virtual administration of motor and non-motor assessments in Parkinson disease (PD), partially to allow more frequent and informative testing, as well as to minimize patient or participant burden. This trend has been greatly accelerated by the ongoing COVID-19 pandemic.
Cognitive assessments are a key component of clinical care and many clinical research projects, including randomized controlled trials (RCTs), yet there are few data on the validity of virtual non-motor assessments in PD. If virtual administration of cognitive assessments can be shown to be reliable and valid, the results could have a significant impact on how PD clinical care is delivered and clinical research is conducted in the future.
A systematic review indicated good reliability of virtual assessments compared with in-person assessments to diagnose dementia in general (1). There has been one study demonstrating good retest reliability for in-person versus virtual administration of the Montreal Cognitive Assessment (MoCA) in a cohort of non-PD elderly individuals with and without cognitive impairment (2). A meta-analysis of cognitive testing via virtual conferencing in geriatric populations also indicated a high potential for virtual administration as a substitute for in-person testing (3).
Importantly, a study measuring computer literacy and its effect on both online and in-person cognitive testing suggested that older populations demonstrate worse computer literacy and perform worse on both online and in-person cognitive testing. That study indicated a need to correct for computer literacy when examining online cognitive test scores, specifically for tests that require motor coordination and processing speed (4).
As an alternative to traditional, in-person, paper-and-pencil cognitive testing, remote computerized cognitive testing in various iterations is becoming increasingly common (5). This includes unsupervised, self-completed cognitive testing, with some batteries already piloted in PD (Cogstate Brief Battery) (6), others with previous supervised versions extensively used in PD (CANTAB Connect) (7), and other new batteries not used yet in populations with demonstrated cognitive impairment (Amsterdam Cognition Scan) (8). However, for the time being, supervised cognitive testing, whether traditional paper-and-pencil tests or computerized testing, remains most commonly used in clinical care and clinical research.
The objective of this study was to determine the reliability of virtual versus in-person administration of commonly used cognitive assessments in PD. We hypothesized that virtual administration of cognitive assessments would have high agreement with in-person administration, which would support virtual administration of standard cognitive assessments in the context of both clinical care and clinical research.

Participants
Thirty-five Parkinson's disease patients with a range of cognitive abilities (65.7% normal cognition, 28.6% MCI and 5.7% mild dementia based on consensus diagnosis as previously outlined (9)) were recruited from the NIA U19 Clinical Core at the University of Pennsylvania (U19 AG062418). Subjects were required to have a MoCA score ≥ 20 as well as a reliable internet connection to participate. Subjects were asked to complete the virtual portion of the assessment on a laptop, desktop, or tablet, although two participants completed it on a smartphone due to technical issues.

Neuropsychological testing
Three trained raters administered a comprehensive neuropsychological battery assessing global cognition and the five major cognitive domains. Tests administered were the MoCA (version 7.1) (10), Mattis Dementia Rating Scale 2 (DRS-2) (11), and verbal fluency phonemic (FAS) and semantic (animals) tests (12) (19). For the purposes of retest reliability analyses, follow-up testing was performed within 3–7 days (mean 5.37 ± 1.7 days), and order of administration type (virtual or in-person) was randomized and counterbalanced. To address possible practice effects, Forms 1 and 4 of the HVLT-R were utilized and administered in a randomized fashion as well. A randomization schedule for administration order and HVLT-R version was created and adhered to as closely as patient scheduling allowed. The same assessor completed both the virtual and in-person visits for each participant to minimize the impact of inter-rater variability.
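The counterbalancing described above can be sketched in code. This is an illustrative schedule generator, not the study's actual randomization procedure; the function name, seed, and group sizes are our own assumptions.

```python
import random

def make_schedule(n_participants, seed=0):
    """Counterbalanced assignment of administration order and HVLT-R form.

    The four combinations of order (virtual-first vs. in-person-first) and
    HVLT-R form (1 vs. 4) are repeated so each appears equally often, then
    shuffled so assignment to participants is random.
    """
    rng = random.Random(seed)
    combos = [(order, form)
              for order in ("virtual-first", "in-person-first")
              for form in (1, 4)]
    # Repeat the 4 combinations enough times to cover all participants,
    # truncate to the exact count, then shuffle the sequence.
    schedule = (combos * ((n_participants + 3) // 4))[:n_participants]
    rng.shuffle(schedule)
    return schedule

sched = make_schedule(36)  # e.g., 36 participants -> 18 per order, 18 per form
```

In practice, as the authors note, such a schedule can only be adhered to as closely as patient scheduling allows.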

Virtual testing
For virtual testing, participants were asked to meet via the BlueJeans or Zoom video conferencing applications. Prior to virtual testing, participants were mailed a "virtual test packet" that included blank paper for drawing, as well as testing templates for some written tests (i.e., Trails A and B, SDMT, Clock Draw and MoCA). A PowerPoint displayed relevant images and instructions that would otherwise be shown using a stimulus booklet or template, to be used in conjunction with the virtual test packet (supplementary material). Some images on the PowerPoint were used for instructional purposes (e.g., SDMT, and Trails A and B), some were presented for participants to draw in their own packets (e.g., DRS-2 and MoCA), and other tests required the participant to describe what they saw on-screen (e.g., JLO, BNT and MoCA). Participants were asked to use either a laptop (N = 19), desktop (N = 6) or tablet (N = 8) to adequately view images presented to them on screen; however, two patients completed the testing via the BlueJeans or Zoom mobile app on their smartphone due to technical difficulties and reported no issues. Raters used either a desktop or laptop to administer and supervise tests.
Participants seen in-person first were provided with a stamped and addressed envelope and asked to mail back the completed test packet once their virtual test was complete. Those seen virtually first returned their mailed packet at their follow-up in-person visit. All virtual test packets were returned.

Other assessments
Other clinical assessments included the Unified Parkinson's Disease Rating Scale (UPDRS) Part III motor score (20), Geriatric Depression Scale-15 Item (GDS-15) (21), Hoehn and Yahr stage (20) and total levodopa equivalent daily dose (LEDD) (22). These data were collected at the first visit regardless of administration type, with the exception of the UPDRS Part III motor score, which was obtained at the in-person visit.

Functional assessments
Functional assessments were administered to assess daily functioning and to assist in the consensus cognitive diagnosis process. These tests included the Penn Parkinson's Daily Activity Questionnaire 15 (PDAQ-15) (23) (completed by both patient and knowledgeable informant, if available), the Activities of Daily Living (ADLi) Questionnaire (24) (completed by either a knowledgeable informant (preferred) or the patient), UPDRS Part II score (20), and Schwab and England score (20). The ADLi was completed by 26 knowledgeable informants and 7 patients, and the PDAQ by 24 knowledgeable informants and 33 patients. We utilized the knowledgeable informant ADLi and patient PDAQ for analyses. These informants did not assist in completing any of the cognitive testing. These assessments were administered by the rater at the first visit, with the exception of questionnaires completed by knowledgeable informants, which were self-completed and returned via mail.

Consensus cognitive diagnosis process
Each participant received a consensus cognitive diagnosis (normal cognition, mild cognitive impairment or dementia) by a trained panel of raters, as previously described (9).

Statistical Analyses
Recruitment began in response to, and soon after, the onset of the COVID-19 pandemic, and continued until routine in-person visits resumed (November 2020 – August 2022). Descriptive statistics (percentages, means and standard deviations) were utilized for key demographics, cognitive tests, functional assessments, and other non-motor assessments. Paired t-tests were used to determine the difference in average performance between in-person and virtual tests, as well as between visit one and visit two. Raw scores were used for all analyses.
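A paired t-test on the two administrations of each test can be sketched as follows, assuming SciPy; the scores below are entirely made-up illustrative values, not study data.

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores for six participants: the same test
# administered in-person (column 1) and virtually (column 2).
in_person = np.array([25, 28, 22, 30, 27, 24], dtype=float)
virtual   = np.array([26, 27, 23, 30, 28, 25], dtype=float)

# Two-sided paired t-test on the within-participant differences.
t, p = stats.ttest_rel(in_person, virtual)
```

The pairing is essential here: each participant serves as their own control, so the test is on the distribution of within-participant score differences rather than on two independent groups.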
Intraclass correlation coefficients (ICC) were calculated to assess the reliability of tests at each visit type. These were two-way mixed absolute agreement correlations, with cutoffs ≥ 0.90 indicating excellent, 0.75–0.90 good, 0.5–0.75 moderate, and < 0.5 poor reliability. Retest reliability based on visit number (i.e., visit 1 versus visit 2) was also examined. Finally, linear mixed-effects models (LMM) were performed to assess the effect of both administration type and visit order number on cognitive test scores. Fixed effects included administration type, visit order number, age, PD duration, education, and sex. A random intercept term was included in the mixed-effects model to account for the correlations of the cognitive scores.
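The two-way absolute-agreement ICC can be computed from a standard two-way ANOVA decomposition. Below is a minimal NumPy sketch of the single-measurement form (ICC(A,1) in McGraw and Wong's terminology), offered only to make the reliability metric concrete; the function name and data are our own, and statistical packages (e.g., SPSS, as used by the authors) implement the same quantity.

```python
import numpy as np

def icc_a1(scores):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    `scores` is an (n_subjects, k_conditions) array, e.g. one column of
    in-person scores and one column of virtual scores per participant.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Mean squares for rows (subjects), columns (conditions), and error.
    msr = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
    msc = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)
    sse = np.sum((x - grand) ** 2) - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))
    # Absolute-agreement formula: systematic differences between
    # conditions (msc) count against agreement.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With identical columns the function returns 1.0 (perfect agreement); disagreement between the two administrations, whether random or systematic, pulls the value toward 0.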
Given that this is a pilot study, an uncorrected p value < 0.05 was considered significant. All statistical tests were two-sided. Statistical analyses were performed using SPSS (version 28 (14)).

Participant characteristics
Descriptive information for the cohort is in Table 1. Of the 35 participants, 62.9% were male, and all were white. The mean (SD) age was 69.11 (7.79) years, education 16.66 (2.09) years, and disease duration 10.46 (5.26) years. Regarding consensus cognitive diagnosis, 23 (65.7%) had normal cognition, 10 (28.6%) mild cognitive impairment (MCI), and 2 (5.7%) dementia.

Examining the impact of visit 1 versus visit 2 on cognitive test performance in linear mixed-effects models while accounting for administration type, Clock Draw (p = 0.02), BNT (p = 0.001), and JLO (p = 0.01) performance was significantly better at the second visit compared with the first.

Discussion
As telemedicine becomes more widely used, both clinically and in clinical research, the need to administer cognitive testing virtually has grown. Our results show that in PD, overall cognitive test performance is similar when administered virtually versus in-person, but that there is significant variability in test performance over the short term regardless of the mode of administration.
Average cognitive test scores for virtual testing were similar to those for in-person testing, and in the linear mixed-effects models, mode of administration did not predict test performance, except for better performance on several tests when administered virtually. However, retest reliability for virtual versus in-person testing was poor to moderate for most tests, which prompted us to examine overall retest reliability (i.e., visit 1 versus visit 2) as well. […] of our participants self-reported experiencing cognitive fluctuations that could explain, in part, the variability in test scores we found in such a short period of time. The reasons for low retest reliability could differ between participants with and without a diagnosis of cognitive impairment, but our small sample size precluded such secondary analyses. Future testing with larger cohorts and a wide range of cognitive abilities will help determine which PD patients are most appropriate for virtual cognitive testing. Alternatively, some of the low retest reliability that we found may be inherent to the tests themselves. None of the variables included in the linear mixed-effects models (i.e., administration type, visit number, sex, age at test, PD duration, and education) had a significant effect, except visit number on the BNT, JLO, and Clock Draw, which may have been due to practice (i.e., learning) effects, as the same test version was administered at each visit. To our knowledge, parallel versions of these tests are not available.
Alternate versions of the MoCA were considered but not utilized, as at the time of study initiation they had not been confirmed to be interchangeable with the original version. Virtual administration of cognitive tests is limited to those who have reliable internet access and technology that can support video conferencing. Also, we did not attempt virtual testing with patients with moderate-severe dementia, as we did not think it would be feasible to assess these participants effectively. Additionally, a certain degree of computer literacy is required, which is especially a problem in older, cognitively-impaired cohorts. However, our cohort was highly educated, and likely had a level of computer literacy, and access to high-quality devices and internet connectivity, that is not generalizable to PD patients in general. Despite this, a virtual option makes cognitive testing much more accessible for non-local participants, and especially so for PD patients with advanced motor disabilities. Raters were limited in their ability to accurately or immediately score some tests, such as the Clock Draw Test and Trail Making Test, until the test packet was returned via mail by participants, rather than relying on screenshots taken during testing. Thus, obtaining accurate data was dependent on both the patient and the mail to return the packets, though this did not prove to be an issue for our cohort. Finally, scheduling constraints prevented the two visits from being conducted at the same time of day for each participant, although all participants were evaluated in an "on" state by self-report.
This study provides preliminary evidence that virtual administration of cognitive testing is feasible in PD and produces results similar to traditional in-person testing for numerous global and detailed cognitive tests at the group level. In a somewhat unexpected additional finding, there was significant short-term variability in cognitive test performance overall, regardless of the mode of administration, which has implications for interpreting cognitive test results from a single session administered as part of clinical care or clinical research. Future studies with larger sample sizes are needed to further evaluate virtual testing as a possible substitute for traditional in-person testing, and to explain variability in performance.
Notwithstanding these limitations, in a typically older population for which in-person clinical or clinical research visits can be a challenge, virtual cognitive testing in PD is feasible and convenient.

Declarations
Author Roles: