Feasibility of serial neurocognitive assessment using Cogstate during and after therapy for childhood leukemia

Neurocognitive impairment is frequently observed among survivors of childhood acute lymphoblastic leukemia (ALL) within the domains of attention, working memory, processing speed, executive functioning, and learning and memory. However, few studies have characterized the trajectory of treatment-induced changes in neurocognitive function beginning in the first months of treatment, or tested whether early changes predict impairment among survivors. If early changes are predictive, we hypothesize that the children who are most susceptible to early impairment would be ideal subjects for clinical trials testing interventions designed to protect against treatment-related neurocognitive decline. In this pilot study, we prospectively assessed neurocognitive functioning (attention, working memory, executive function, visual learning, and processing speed) using the Cogstate computerized battery at six time points: five during the 2 years of chemotherapy and one 1 year after completion of treatment (Dana-Farber Cancer Institute ALL Consortium protocol 11-001; NCT01574274). Forty-three patients with ALL consented to serial neurocognitive testing. Of the 31 participants who remained on study through the final time point, 1 year after completion of chemotherapy, 28 (90%) completed at least five of the six planned Cogstate time points. Performance and completion checks indicated high tolerability (≥ 88%) for all subtests. One year after completion of treatment, 10 of 29 patients (34%) exhibited neurocognitive function at least 2 standard deviations below age-matched norms on one or more Cogstate subtests. Serial collection of neurocognitive data (within a month of diagnosis with ALL, during therapy, and 1 year post-treatment) is feasible and can be informative for evaluating treatment-related neurocognitive impairment.

The pathophysiology of chemotherapy-induced neurocognitive impairment (CICI) is multifactorial, and appears to relate to some or all of the following processes: induction of oxidative stress and neuroinflammation, perturbations of folate physiology, epigenetic changes, and direct injury to neuronal, glial, and/or endothelial structures. Elimination of the causative chemotherapeutic agents is not a viable means of reducing CICI, as an increase in leukemia relapse would be the inevitable result. Fortunately, emerging data suggest that pharmacologic intervention might ameliorate or prevent CICI by inhibiting one or more of the causative pathways, particularly if the intervention can be initiated early in the 2- to 3-year course of leukemia therapy, before CICI becomes clinically significant.
To maximize the risk-benefit ratio of a clinical trial testing a preventive strategy, we sought to develop methods to identify the subjects with ALL who are most susceptible to CICI and to determine the earliest time point during therapy at which CICI can be detected. Few longitudinal studies have examined neurocognitive functioning in children with ALL [36,37], either during treatment or in the survivorship period. One such study reported on patients diagnosed between 1970 and 1999 [3], and another completed neuropsychological assessments at only two time points (induction and end of maintenance) [38]. Studies of survivors demonstrate a decline over time on neuropsychological tests [39]. More prospective longitudinal studies are needed to determine how treatment affects neurocognitive functioning over time [40] and the earliest point during treatment at which treatment-related changes can be detected. Because neuropsychologists are not readily available at all institutions to conduct repeated assessments, a simpler computer-based assessment suitable for repeated administration offers advantages.
This pilot study was undertaken to demonstrate the feasibility of longitudinal measurement of neurocognitive functioning using Cogstate. We previously reported that Cogstate, a validated, reliable, non-invasive measure of neurocognitive functioning, can be administered consistently at multiple institutions participating in a cooperative group trial, and that neurocognitive function measured during the first 3 weeks of leukemia therapy provides a stable baseline from which to identify subsequent treatment-related changes. Cogstate tests have also been used in other medical populations and correlate strongly with standard neuropsychological measures of the same constructs [41,42]. In this follow-up report of the pilot cohort, we describe the feasibility of longitudinal assessment of neurocognitive functioning with Cogstate at 5 time points during the 25 months of therapy for childhood ALL, plus a sixth evaluation 1 year after completion of all planned therapy.

Subjects
Children between the ages of 5 and 21 years with ALL, enrolled on, or treated according to, Dana-Farber Cancer Institute (DFCI) ALL Consortium protocol 11-001, "Randomized Trial of IV SC-PEG asparaginase and IV Oncaspar in Children with Acute Lymphoblastic Leukemia or Lymphoblastic Lymphoma" (ClinicalTrials.gov identifier NCT01574274) were eligible for a companion study "Serial Neurocognitive Screening of Children and Adolescents During Treatment for Acute Lymphoblastic Leukemia (ALL) on the DFCI ALL Consortium Study 11-001." Patients known to have any of the following conditions were excluded: active meningitis, poorly controlled seizures, neurodevelopmental disorder (e.g., autism) that would prevent completion of Cogstate testing, congenital condition associated with intellectual disability (e.g., trisomy 21), or serious concomitant systemic disorders (including active infections) that would compromise the patient's ability to complete the study. Patients were enrolled at four sites in the consortium: Dana-Farber Cancer Institute, Boston, MA; Montefiore Medical Center, Bronx, NY; Columbia University Medical Center, New York, NY; and Hasbro Children's Hospital, Providence, RI. The institutional review boards of the treating institutions approved both clinical protocols. Informed consent was provided by the subjects' guardians or by those patients over the age of 18 years. Written assent was provided if age-appropriate, following institutional guidelines.

Outcome measure: The Cogstate battery
Cogstate is a battery of computer-based tests selected specifically to assess neurocognitive domains previously found to be impaired among childhood leukemia survivors. Each 20- to 25-min computerized evaluation was supervised by a research team member. Participants' neurocognitive performance was assessed with the Cogstate battery at baseline (within the first 3 weeks after diagnosis) and at 5 additional time points, four during therapy and one after completion of therapy (Table 1). The battery yields five cognitive domain scores: the continuous paired associate learning task (CPAL; visual memory, total number of errors), the detection task (DET; timed psychomotor function), the Groton maze learning task (GML; novel problem solving/error monitoring, total number of errors), the identification task (IDN; timed attention/vigilance), and the one-back task (ONB; timed working memory). Timed tasks were measured in log milliseconds; shorter raw response times represent better performance. Cogstate tests have demonstrated construct validity and show minimal practice effects compared with paper-and-pencil tests [43,44].
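To make the "log milliseconds" scale concrete, the sketch below computes a speed score for a timed task as the mean of log10-transformed reaction times on correct trials. The base-10 log and the restriction to correct trials are assumptions based on common Cogstate scoring conventions, not details stated in this report, and `log10_speed` is a hypothetical helper name.

```python
import math
from typing import Sequence


def log10_speed(reaction_times_ms: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean of log10-transformed reaction times (ms) over correct trials.

    Assumption: base-10 logs of correct-trial reaction times, as is common for
    Cogstate timed tasks (DET, IDN, ONB). Lower values mean faster responses.
    """
    if len(reaction_times_ms) != len(correct):
        raise ValueError("reaction_times_ms and correct must have the same length")
    logs = [math.log10(rt) for rt, ok in zip(reaction_times_ms, correct) if ok]
    if not logs:
        raise ValueError("no correct trials to score")
    return sum(logs) / len(logs)


# Hypothetical trials: the incorrect 1000 ms response is excluded.
print(round(log10_speed([300, 400, 1000], [True, True, False]), 3))  # prints 2.54
```

The log transform compresses occasional very slow responses and so reduces the right skew typical of raw reaction times; lower values still indicate better performance.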

Performance and completion analysis
Successful performance was defined as completing each test in a manner that complied with test requirements. Performance check criteria were specified a priori to identify scores indicating that the participant either did not understand the test instructions or was not cooperative. These criteria are derived statistically such that, when trained and supervised appropriately, members of the relevant study population meet the criterion for each task 90% of the time when exerting appropriate effort. To pass, participants had to obtain a raw accuracy of 60% or greater on the detection and identification tasks and 50% or greater on the one-back task. On the GML test (GMLT), participants had to make 120 or fewer errors. The CPAL test has no performance check.
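The a priori performance check thresholds above can be expressed as a small lookup. This is a minimal sketch: accuracy is assumed to be a raw proportion of correct responses (0 to 1), `passes_performance_check` and `pass_rate` are hypothetical names, and CPAL returns `None` because no check is defined for it.

```python
from typing import Iterable, Optional

# Thresholds as stated in the text.
MIN_ACCURACY = {"DET": 0.60, "IDN": 0.60, "ONB": 0.50}  # raw proportion correct
MAX_GML_ERRORS = 120  # Groton maze learning: total errors


def passes_performance_check(task: str, accuracy: Optional[float] = None,
                             total_errors: Optional[int] = None) -> Optional[bool]:
    """True/False for tasks with a performance check; None for CPAL (no check)."""
    if task in MIN_ACCURACY:
        if accuracy is None:
            raise ValueError(f"{task} requires an accuracy value")
        return accuracy >= MIN_ACCURACY[task]
    if task == "GML":
        if total_errors is None:
            raise ValueError("GML requires a total error count")
        return total_errors <= MAX_GML_ERRORS
    if task == "CPAL":
        return None
    raise ValueError(f"unknown task: {task!r}")


def pass_rate(results: Iterable[Optional[bool]]) -> float:
    """Proportion of administrations passing, ignoring tasks without a check."""
    checked = [r for r in results if r is not None]
    if not checked:
        raise ValueError("no administrations with a performance check")
    return sum(checked) / len(checked)
```

Per-task pass rates of this kind, aggregated over all administrations, are what the performance-check percentages in the Results summarize.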

Cognitive assessment scoring
Raw scores for all outcome measures (CPAL, DET, GML, IDN, and ONB) were standardized into z-scores based on age-normative data and rescaled when appropriate, so that a higher standardized score always represented a better outcome. The z-score is calculated by subtracting the value predicted from the normative regression curve for the patient's age from the observed value, then dividing by the predicted standard deviation (SD) for that age. The mean of such z-scores in the age-matched control population is zero. Individual patient z-scores thus estimate deviation from age norms, and mean z-scores of patient groups reflect the extent to which those patients, as a group, demonstrate pathology independent of age. Data are presented as both raw scores and z-scores. The z-scores were also categorized as below normal limits or impaired according to whether the z-score was 1 or more SD, or 2 or more SD, below the age norm, respectively [45]. At the time of this study, no Cogstate norms existed for children younger than 5 years. In our follow-up study, however, we are including patients aged 3 years and older at enrollment, and we are capturing neuropsychological data at the end of treatment for patients who were younger than 3 years at initial diagnosis of ALL.
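The standardization and classification steps can be sketched as follows. The predicted mean and SD for a given age would come from Cogstate's normative regression, which is not reproduced here; the values in the example are made up for illustration, and the function names are hypothetical.

```python
def age_normed_z(observed: float, predicted_mean: float, predicted_sd: float,
                 higher_is_better: bool) -> float:
    """z = (observed - predicted mean for age) / predicted SD for age.

    The sign is flipped for measures where lower raw scores are better
    (error counts, log reaction times) so that higher z is always better.
    """
    if predicted_sd <= 0:
        raise ValueError("predicted_sd must be positive")
    z = (observed - predicted_mean) / predicted_sd
    return z if higher_is_better else -z


def classify(z: float) -> str:
    """Impaired: >= 2 SD below the age norm; below normal limits: >= 1 SD below."""
    if z <= -2.0:
        return "impaired"
    if z <= -1.0:
        return "below normal limits"
    return "within normal limits"


# Hypothetical DET result: log10 RT of 2.70 vs an age-predicted 2.55 (SD 0.07).
z = age_normed_z(2.70, 2.55, 0.07, higher_is_better=False)
print(round(z, 2), classify(z))  # prints -2.14 impaired
```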

Statistical analysis
Descriptive statistics were used to report patients' baseline characteristics, cognitive assessments, and distribution of impairment. A power analysis indicated that a sample size of 100 would allow detection of a minimum change in scores of 0.28 SD with 80% power at a significance level of 0.05. However, only 43 participants had enrolled on this companion study when the treatment protocol closed to accrual; with 43 participants, the minimum detectable change from baseline at the same power and significance level was 0.44 SD.
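As a rough check on these numbers, the minimum detectable standardized change for a two-sided test of mean change from baseline can be approximated as (z_{1-α/2} + z_{1-β}) / √n. This sketch assumes a one-sample (paired change) design and a normal approximation, since the original power calculation method is not reported. It reproduces 0.28 SD for n = 100 and gives about 0.43 SD for n = 43; a calculation based on the t distribution yields slightly larger values, in line with the 0.44 SD reported.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized mean change detectable by a two-sided one-sample test.

    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


for n in (100, 43):
    print(n, round(min_detectable_effect(n), 2))
```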
Two-sample t-tests were used to compare changes in cognitive scores between groups from baseline to CNS therapy (approximately 3 months after baseline) and from baseline to post-treatment (approximately 1 year after completion of therapy). Wilcoxon rank-sum tests were used instead for skewed outcomes. Specific hypotheses were evaluated using two-sample t-tests, for example, differences in change scores between boys and girls, between age groups, by CNS prophylaxis (with or without cranial irradiation), and by median household income of the patient's zip code. Because analyses were based on a priori hypotheses, we did not adjust for multiple comparisons.
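For the group comparisons, the pooled-variance two-sample t statistic and Cohen's d (reported in Table 4) can be computed from each group's change scores as below. This is an illustrative sketch with made-up change scores, and `pooled_t_and_cohens_d` is a hypothetical name. P-values require the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind` computes them, and `scipy.stats.mannwhitneyu`, equivalent to the Wilcoxon rank-sum test, covers skewed outcomes.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_and_cohens_d(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, int]:
    """Student's two-sample t (equal variances), Cohen's d, and degrees of freedom.

    a, b: change scores (e.g., CNS therapy minus baseline) for two groups.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    diff = mean(a) - mean(b)
    t = diff / sqrt(pooled_var * (1 / na + 1 / nb))
    d = diff / sqrt(pooled_var)
    return t, d, df


# Hypothetical change scores for two groups.
t, d, df = pooled_t_and_cohens_d([1, 2, 3, 4], [3, 4, 5, 6])
print(round(t, 3), round(d, 3), df)  # prints -2.191 -1.549 6
```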
Exploratory analyses of the profiles of cognitive performance over time used latent class growth mixture modeling (LCGMM), part of the broader family of growth mixture models. LCGMM models each child's growth curve over time and groups children whose trajectories are sufficiently similar into latent subgroups (classes), each summarized by a common trajectory shared by its members. We explored whether children experienced an initial drop in neurocognitive performance and examined individual rates of recovery or decline over time. The main purpose of LCGMM was to identify a subgroup of children whose growth pattern diverges early from that of most children in the sample. LCGMM uses all available observations per person, assuming that data are missing at random; given the limited sample size, no imputation of missing data was performed. Although we were also interested in linking the patterns of growth trajectories to known risk factors, our sample size was not large enough for these sub-analyses.
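To illustrate latent trajectory classes fitted to incomplete longitudinal data, the sketch below fits a two-class latent class growth analysis (LCGA) by expectation-maximization. LCGA is a simplified special case of growth mixture modeling: children within a class share one linear trajectory (intercept and slope over time) with no child-specific random effects. This is not the authors' LCGMM specification, whose software, trajectory shape, and covariate structure are not described here; it assumes NumPy, linear trajectories, and class-specific residual variances, and `fit_lcga` is a hypothetical name. Missed visits are simply omitted, so each child contributes all available observations, mirroring the missing-at-random assumption described above.

```python
import numpy as np


def fit_lcga(times, scores, n_classes=2, n_iter=500, tol=1e-8):
    """Latent class growth analysis with linear trajectories, fitted by EM.

    times, scores: lists of 1-D arrays, one pair per child. Missed visits are
    left out, so each child contributes all available observations (valid
    under missing at random). Each child needs at least two observed visits.
    """
    times = [np.asarray(t, dtype=float) for t in times]
    scores = [np.asarray(y, dtype=float) for y in scores]
    n, K = len(times), n_classes
    X = [np.column_stack([np.ones_like(t), t]) for t in times]  # columns: intercept, time
    X_all, y_all = np.vstack(X), np.concatenate(scores)
    # Index of the child that owns each stacked observation.
    owner = np.concatenate([np.full(len(y), i) for i, y in enumerate(scores)])

    # Initialise: rank children by their own OLS slope and cut into K groups.
    slopes = np.array([np.polyfit(t, y, 1)[0] for t, y in zip(times, scores)])
    resp = np.zeros((n, K))
    for k, idx in enumerate(np.array_split(np.argsort(slopes), K)):
        resp[idx, k] = 1.0

    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: class proportions, weighted least-squares trajectories, variances.
        pi = resp.mean(axis=0)
        beta = np.zeros((K, 2))
        sigma2 = np.zeros(K)
        for k in range(K):
            w = resp[owner, k]  # each observation weighted by its child's responsibility
            beta[k] = np.linalg.solve(X_all.T @ (w[:, None] * X_all), X_all.T @ (w * y_all))
            sigma2[k] = np.sum(w * (y_all - X_all @ beta[k]) ** 2) / np.sum(w)
        # E-step: each child's log-likelihood under each class trajectory.
        log_lik = np.empty((n, K))
        for i in range(n):
            for k in range(K):
                r = scores[i] - X[i] @ beta[k]
                log_lik[i, k] = np.log(pi[k]) - 0.5 * np.sum(
                    np.log(2 * np.pi * sigma2[k]) + r ** 2 / sigma2[k])
        # Normalise with log-sum-exp to get posterior class probabilities.
        m = log_lik.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_lik - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_lik - log_norm)
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return {"pi": pi, "beta": beta, "sigma2": sigma2, "resp": resp, "loglik": ll}
```

Each child can then be assigned to the class with the highest posterior probability (`resp`); the returned log-likelihood is what information criteria such as AIC and BIC penalize when comparing candidate models.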

Results
Forty-three participants consented to participate in this neurocognitive sub-study (Fig. 1). Twelve came off study prior to time point 6 for the following reasons: induction failure, n = 3; CNS and/or marrow relapse, n = 3; death, n = 2; intercurrent illness preventing completion of Cogstate testing, n = 2; and loss to follow-up or transfer to another institution, n = 2. These 12 patients completed a median of 2.5 Cogstate time points (range 1-5). Of the 31 patients who remained on study until 1 year after completion of chemotherapy, 22 (71%) completed all time points 1-6, 6 (19%) completed 5 of 6 time points, and 3 (10%) completed fewer than 5 time points.
Review of completion rates indicated that 100% of subjects completed the detection, identification, and one-back tests in full. Completion rates were slightly lower for the CPAL (95%) and GMLT (88%). Performance checks exceeded 90% for all tests in the battery except detection (88%). Table 2 presents descriptive statistics for the demographic and clinical characteristics of the total sample of N = 43 children. Patients' ages ranged from 5 to 19 years (median, 9 years); 28 (65.12%) were male; 29 (67.44%) were categorized as high risk according to NCI criteria and biological features; 33 (76.74%) received intrathecal chemotherapy alone as prophylaxis against CNS relapse, while 10 (23.26%) also received 1200-1800 cGy cranial irradiation for CNS prophylaxis because of protocol criteria (T-lineage disease, CNS positivity at diagnosis, or very high risk).
Table 3 summarizes the raw score means, and Fig. 2 displays the distribution of patients' Cogstate brief battery scores over time: at baseline (start of induction), consolidation I (5 weeks), CNS therapy (7 weeks), consolidation II (18 weeks), continuation (62 weeks), and 1 year post-treatment (161 weeks). On average, patients consistently scored near the norm, with modest fluctuation in distribution across treatment phases. Across assessments, CPAL, DET, and GML showed more extreme outliers than IDN and ONB. Table 4 shows the mean difference in raw scores from induction to (1) CNS therapy and (2) 1 year post-treatment. Table 5 displays the proportion of patients who performed within the impaired range (2 or more SD below the age-specific population mean) or below normal limits (1 or more SD below the age-specific population mean) over time. One year after completion of chemotherapy (time point 6), 15 of 29 participants (52%) had z-scores 1 or more SD below age norms on one or more Cogstate subtests.
Ten of 29 (34%) exhibited abnormal neurocognitive function, with z-scores at least 2 SD below age norms on at least one subtest. Descriptively, a higher proportion of subjects performed within the impaired range (2 or more SD below the population mean) at 1 year post-treatment than at prior time points. Each of the proportions shown for time point 6 (T6) in Table 5 exceeds the 2.3% expected under the normative distribution to fall 2 or more SD below the mean. The proportions below normal limits (1 or more SD below the population mean) at T6 were descriptively similar to those at earlier time points, except for GML and ONB. Figure 3 shows the results of the LCGMM analyses, whose goal was to identify classes of children based on longitudinal patterns of Cogstate scores. Given the relatively small sample, growth curve models were restricted to a two-class specification, balancing parsimony and fit. Based on initial model testing, we included age, sex, risk group, CNS prophylaxis, irradiation, and median income in the patient's zip code as covariates.
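The 2.3% normative expectation follows from the standard normal distribution: the probability that a z-score falls 2 or more SD below the mean is Φ(−2) ≈ 0.023, and for 1 or more SD it is Φ(−1) ≈ 0.159. The short sketch below computes these and the corresponding expected count per subtest among the 29 children assessed at time point 6, assuming age-normed z-scores are standard normal in the reference population; `expected_fraction_below` is a hypothetical name.

```python
from statistics import NormalDist


def expected_fraction_below(k_sd: float) -> float:
    """Share of a standard normal reference population at least k_sd SD below the mean."""
    return NormalDist().cdf(-k_sd)


for k in (1, 2):
    p = expected_fraction_below(k)
    # Expected number of children per subtest, out of the 29 assessed at time point 6.
    print(f"{k} SD: {100 * p:.1f}% expected; {29 * p:.2f} of 29 children")
```

Under this expectation, fewer than one child per subtest would be expected to fall in the impaired range at time point 6.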

Subject-specific trajectories of Cogstate scores
To determine the appropriate model, we examined the Bayesian (BIC) and Akaike (AIC) information criteria, preferring models with lower values. The final model included sex and CNS prophylaxis group as covariates.
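Both criteria penalize the maximized log-likelihood for model complexity: AIC = −2 log L + 2p and BIC = −2 log L + p ln n, where p is the number of estimated parameters and n the sample size (for mixture models, commonly the number of children rather than the number of observations). The sketch below compares candidate models using hypothetical log-likelihoods and parameter counts, invented for illustration and not results from this study; `select_model` is a hypothetical name, and it selects by BIC while printing AIC alongside.

```python
from math import log
from typing import Dict, Tuple


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion; lower is better."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * loglik + n_params * log(n_obs)


def select_model(candidates: Dict[str, Tuple[float, int]], n_obs: int) -> str:
    """Return the candidate with the lowest BIC, printing AIC and BIC for each."""
    scores = {}
    for name, (loglik, n_params) in candidates.items():
        scores[name] = bic(loglik, n_params, n_obs)
        print(f"{name}: AIC={aic(loglik, n_params):.1f} BIC={scores[name]:.1f}")
    return min(scores, key=scores.get)


# Hypothetical (log-likelihood, number of parameters) for two candidate models.
best = select_model({"1 class": (-120.0, 4), "2 classes": (-105.0, 7)}, n_obs=43)
```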

Discussion
We previously demonstrated the feasibility of using Cogstate to assess baseline neurocognitive functioning during the first month of induction therapy for childhood ALL at multiple institutions within the DFCI ALL Consortium. This follow-up report extends those findings, demonstrating the feasibility of using the same computer-based battery for longitudinal assessment of neurocognitive functioning at repeated time points during treatment and 1 year after completion of planned therapy. The suitability of this approach for a large multi-institutional clinical trial is supported by a high rate of consent/assent among eligible patients and families, as well as high completion rates (88-100%) and performance check pass rates (88-100%) for all subtests. These findings indicate that the battery was very well tolerated and that the tests provided a valid measure of neurocognition. Consistent with prior reports [12,13], more than a third of participants exhibited abnormal neurocognitive functioning 1 year after completion of planned chemotherapy, with Cogstate scores at least 2 standard deviations below age-matched means on one or more subtests. By this time point, all participants had recovered clinically from any acute or subacute toxicity of chemotherapy (e.g., nausea or fatigue). Any observed neurocognitive deficits are therefore considered a persistent adverse effect of medical treatment for leukemia, consistent with CICI.
Because the purpose of this pilot study was to demonstrate the feasibility of longitudinal monitoring of neurocognitive functioning using Cogstate, the study was not powered to identify predictors of persistent neurocognitive deficits. Nevertheless, some interesting patterns emerged from our analyses of change over time, including latent class growth mixture modeling. As the graphs of standardized z-scores for the 5 Cogstate subtests over the 6 assessment time points illustrate, the cohort overall performed generally within the average range. The Groton maze learning (GML) task appeared to be the most sensitive to change over time. Specifically, on the GML task, patients aged 5 to 8 years performed worse over time than those aged 9 to 19 years, both from induction to the CNS phase (p = 0.004) and from induction to 1 year post-treatment (p = 0.014). Additionally, females performed worse over time than males on the GML task from induction to the CNS phase (p = 0.024).
Table 4. Mean raw change score comparisons across baseline characteristics. Means (SD) and Cohen's d are reported for two-sample t-tests; median [IQR] and r are reported for the remaining comparisons, which used Wilcoxon rank-sum tests depending on normality at each group level.
Of note, there was also a trend for females to perform worse over time than males on the CPAL task from induction to the CNS phase (p = 0.054). Lastly, high-risk patients who received 1200 cGy cranial irradiation in addition to intrathecal methotrexate performed better over time between induction and the CNS phase on the identification task; however, this difference did not reach statistical significance (p = 0.08), was in the opposite direction from that expected, and may reflect chance. These data suggest that, in a larger cohort, latent class growth mixture modeling may identify patients with neurocognitive difficulties early in therapy, when a proactive intervention might prevent further treatment-related neurocognitive impairment.
Two primary limitations detract from the clinical significance of this work. First, while Cogstate has independent construct validity and correlates with traditional measures of cognitive functioning, it has not yet been demonstrated that changes from baseline during leukemia treatment predict persistent deficits in the survivorship period. Second, the absence of Cogstate norms for young children required that we exclude children younger than age 5 years, thus diminishing our ability to demonstrate feasibility of this approach specifically among the age group where childhood leukemia is most frequently diagnosed.
Both of these limitations will be addressed in an ongoing, prospective study (ClinicalTrials.gov Identifier: NCT03020030), which includes longitudinal assessments of cognitive function using Cogstate during treatment for childhood leukemia and traditional neurocognitive testing 1 year after completion of chemotherapy. With anticipated accrual of 560 children being treated for leukemia at 8 institutions within the DFCI ALL Consortium, this study is powered to describe the relationship between early changes in neurocognitive function detected by LCGMM analysis of Cogstate data and neurocognitive impairment 1 year off therapy on traditional cognitive assessments. In addition, we will be able to identify predictors of treatment-induced neurocognitive impairment by studying biomarkers in cerebrospinal fluid indicative of neurotoxicity, genetic variants associated with increased susceptibility, and social determinants of health such as household material hardship. Finally, a larger proportion of children with ALL will be included in this study: with newly available Cogstate norms, we are able to expand eligibility for Cogstate testing down to age 3 years at the time of study entry, and all children will be eligible for neurocognitive testing after completion of chemotherapy, regardless of age at initial diagnosis.

Conclusion
With this pilot study, we demonstrate that Cogstate, a computerized battery of cognitive assessments, can be reliably used to characterize changes from baseline in neurocognitive functioning during and after treatment for childhood ALL. Applying this approach in a larger, prospective cooperative group trial, we anticipate being able to identify which children with ALL are most susceptible to treatment-induced neurocognitive impairment early during the 2 years of treatment, when a proactive intervention might prevent this toxic sequela of curative therapy.

Funding
This work was funded in part by the National Institutes of Health/National Cancer Institute: NIH/NCI R21-CA187226.

Data availability
In compliance with the journal's policies on sharing research data, and in concordance with disciplinary norms and expectations, all raw data collected using Cogstate and all de-identified clinical parameters referenced in this manuscript will be made available on request to the corresponding author.

Competing interests
The authors declare no competing interests.