Outcome Measures and Statistical Methods
To address objective 1, we measured a) the proportion of consenting participants to eligible participants over the 6-week recruitment period, b) the proportion of participants attending 10 intervention sessions to the number of participants allocated to the intervention, and c) the proportion of participants with incomplete outcome data to the number of participants allocated to the intervention.
To address objective 2, we measured a) the proportion of participants who were able to tolerate at least 80% of the planned sessions, b) the mean length of participants’ VR experience in minutes per session, c) the number of times each type of negative behaviour was observed and the proportion of participants who exhibited each negative behaviour, d) the number of times each type of positive behaviour was observed and the proportion of participants who exhibited each positive behaviour, e) the number of Adverse Events (AE) during the intervention period, and f) health care resources used (i.e. the number of participants requiring transfers to emergency, the number of participants requiring one-to-one staff use, psychotropic drug prescription, and change in EuroQol 5-Dimension [EQ-5D]) to indirectly indicate their tolerance of VR therapy.
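As an illustration of measure a), the following minimal Python sketch computes the proportion of participants tolerating at least 80% of the planned sessions. The function name, the default of 10 planned sessions, and the example counts are assumptions made for illustration only and are not study data.

```python
def proportion_tolerating(sessions_completed, planned_sessions=10, threshold=0.8):
    """Share of participants who completed at least `threshold` of the planned sessions.

    `sessions_completed` holds one count per participant (hypothetical input).
    """
    if not sessions_completed:
        raise ValueError("at least one participant is required")
    tolerated = sum(
        1 for completed in sessions_completed
        if completed / planned_sessions >= threshold
    )
    return tolerated / len(sessions_completed)


# Illustrative counts only: three of four participants reach 8 of 10 sessions.
example = proportion_tolerating([10, 8, 7, 9])
```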
According to the Need-Driven Behaviour Model [19], an individual with dementia expresses his/her physical/emotional needs or exhibits his/her dementia symptoms through physical/verbal expressions/gestures. As per this model, we reported the following behaviours as “negative”: agitation, wandering, hitting (including self), kicking, grabbing onto people, pushing, throwing objects, biting, scratching, spitting, hurting self or others, tearing objects or destroying property, making physical/verbal sexual advances, inappropriate dressing or disrobing, intentional falling, eating/drinking inappropriate substances, handling objects inappropriately, hiding objects, hoarding objects, performing repetitive mannerisms, screaming, cursing or verbal aggression, repetitive sentences or questions, strange noises (weird laughter or crying), complaining, constant unwarranted requests for attention or help, pulling away/walking away, perseveration of words/repetitive talking, raising tone of voice, resisting, and not eating. Based on staff members’ experience with the residents at Henley Place LTC, we labelled the following behaviours as “positive”: being seated still, being focused, sleeping, being calm, smiling, and communicating verbally/non-verbally.
We defined an AE as any event associated with being in a VR therapy session that led to emergency transfer, hospitalization, death, persistent or significant incapacity, or substantial disruption of the participant’s ability to conduct activities of daily living, regardless of a causal relationship with VR therapy.
Transfers to emergency and one-to-one staff usage indicated acute decline or worsening of a resident’s health. For instance, Henley Place may provide a resident with one-to-one staffing if he/she causes a physical or sexual incident involving another resident; becomes physically aggressive in a way that cannot be managed with drug and other therapy; has a recent history of similar behaviour that resulted in one-to-one staff usage; is at risk of potential self-harm; exhibits agitated aggressive behaviour or wandering behaviour; or is at risk of being victimized by other residents.
Psychotropic drugs refer to any drug affecting mental processes and behaviour [20]. Psychotropic drugs include, but are not limited to, antipsychotics, antidepressants, antianxiety drugs, mood stabilizers, anticonvulsants, and hypnotics [20].
The EQ-5D scale is a generic (disease non-specific) quality of life tool, insightful in identifying which dimensions of health are most affected by a given condition or treatment [21]. The tool has five domains: mobility, self-care, usual activities (e.g. work, study, housework, family or leisure activities), pain/discomfort, and anxiety/depression with five possible answers for each domain (1 = no problem, 2 = slight problem, 3 = moderate problem, 4 = severe problem, 5 = unable to do/extreme problem) [21]. EQ-5D is feasible for individuals with dementia [22].
Either the PI or the RA recorded and labelled participants’ behaviours during the VR sessions based on direct observation, completed the VR therapy records and AE forms, and extracted the number of emergency transfers, one-to-one staff usage, and psychotropic drug prescriptions from the participant’s medical chart. The Henley Place staff members completed the EQ-5D scale.
To address objective 3, we reported a) the number of times each library item was selected, b) whether the SDM was present during the VR sessions, c) factors observed to be enabling, and d) factors observed to be disabling during the VR experience, drawing on the VR therapy records completed by either the PI or the RA.
To address objective 4, we collected the pre-to-post change in BPSD (from baseline, before any VR therapy took place, to the end of the second week of therapy) using the Cornell Scale for Depression in Dementia (CSDD), the Cohen-Mansfield Agitation Inventory (CMAI), the proportion of night-time sleep, and the Dementia Observation System (DOS). We reported the scales’ sensitivity to change using Effect Size (ES).
BPSD are usually measured with subjective psychometric tools, originally developed to rate feelings, opinions, or attitudes [23]. A systematic review located 83 BPSD tools focusing on depression (n = 46), irritability (n = 37), non-aggressive agitation (n = 26), anxiety (n = 22), hallucination (n = 21), delusion (n = 20), wandering (n = 22), apathy (n = 17), or sleep problems (n = 14) [23]. According to Linde and colleagues [23], the frequently used BPSD tools for older adults in clinical settings are the Cambridge Mental Disorders of the Elderly Examination (CAMDEX) [24], the Geriatric Mental State Schedule (GMS)/Automated Geriatric Examination for Computer Assisted Taxonomy (AGECAT) [25], the Apathy Evaluation Scale (AES) [26], the Geriatric Depression Screening scale (GDS) [27], the Neuropsychiatric Inventory (NPI) [28], the Cornell Scale for Depression in Dementia (CSDD) [29], and the Cohen-Mansfield Agitation Inventory (CMAI) [30].
We did not select CAMDEX and GMS/AGECAT for our pilot as they are predominantly diagnostic tools and therefore not relevant to our objective. We also did not select AES and GDS as they are self-reported, which is not suitable for our participants. Even though the NPI is a validated clinical tool designed explicitly to provide a comprehensive evaluation of BPSD [31], raters’ tight work schedules can lead to deviations from the original NPI protocol (e.g. an arbitrary evaluation of symptoms based on the general domains instead of the sub-questions) [32] or to recall bias (e.g. ratings based on retrospective information covering one month) [33, 34]. Therefore, considering the tight work schedules of LTC staff members and the possibility of recall bias, we did not select the NPI.
We selected the CSDD as it is feasible for those with advanced dementia [29]. The 19-item CSDD detects depression in dementia across five domains (mood, behaviour, physical signs, cyclic function, and ideation) using information from interviews with a caregiver [29]. Each item is rated for severity based on symptoms occurring during the week before the interview on a scale of 0–2, where 0 indicates no symptoms and 2 indicates severe symptoms [29]. The interrater reliability (κ = 0.67) and internal consistency (α = 0.84) of the instrument are high [29]. The association between the CSDD and the Research Diagnostic Criteria Depression (a scale measuring a similar construct) is also strong (r = 0.83, p < 0.001) [29].
We selected the 29-item CMAI as it is applicable to LTC residents [23]. The CMAI assesses agitated behaviours within four components: 1) Physical Aggressive (PA), 2) Physical Non-Aggressive (PNA), 3) Verbal Aggressive (VA), and 4) Verbal Non-Aggressive (VNA) [30]. Each behaviour is rated on a 7-point scale of frequency, ranging from the resident never manifesting the behaviour (1) to manifesting the behaviour several times an hour (7) [35]. The scale is reliable in individuals with dementia (test-retest reliability coefficient = 0.830; p < 0.001) [36] and has demonstrated construct validity (i.e. a strong association with the Agitated Behaviour in Dementia scale, r = 0.62; p < 0.001) [37].
Sleep disturbance (e.g. difficulty falling asleep, repetitive sleep awakenings, and waking up early) is a risk factor for developing depressive symptoms [38], apathy [39], and aggressiveness [40] in dementia. Sleep disturbance may also indicate comorbidities [41]. The association between sleep disturbance, measured with the Pittsburgh Sleep Quality Index [42], and the NPI-Apathy domain is moderate (r = 0.38; p < 0.01) [43]. Thus, the proportion of night-time sleep can serve as an indirect indicator of BPSD [43]. Since Henley Place routinely records residents’ night-time sleep as a proportion of the total expected asleep time (8 hours, from 10:00 pm to 6:00 am each night), we opted to measure sleep using methods already in place.
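The proportion of night-time sleep described above can be sketched in Python as follows. The 8-hour expected window comes from the text; the function name and the example value are assumptions for illustration and do not reproduce Henley Place's recording procedure.

```python
EXPECTED_SLEEP_HOURS = 8.0  # 10:00 pm to 6:00 am, as stated in the text


def night_sleep_proportion(hours_asleep, expected_hours=EXPECTED_SLEEP_HOURS):
    """Night-time sleep as a proportion of the expected asleep time."""
    if not 0 <= hours_asleep <= expected_hours:
        raise ValueError("hours asleep must lie within the expected sleep window")
    return hours_asleep / expected_hours


# Illustrative value only: 6 hours asleep out of the expected 8 hours.
example = night_sleep_proportion(6)
```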
We selected the DOS [44] as it is designed to be completed by LTC staff members. The scale collects objective and accurate data about an individual’s behaviour throughout each 24-hour cycle over a period of several consecutive days to identify patterns, trends, contributing factors, and modifiable variables associated with BPSD [44]. The rater records the duration of observed behaviours (sleeping, awake/calm, positively engaged, repetitive vocal and motor expressions, and sexual/verbal/physical expression of risk) using the number of half-hour blocks and colour codes [44].
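To make the half-hour block recording concrete, the sketch below tallies one 24-hour cycle (48 blocks) into hours per behaviour category. The five category labels follow the grouping given in the text; the data layout, the function name, and the example day are assumptions for illustration and do not reproduce the DOS form itself.

```python
from collections import Counter

BLOCKS_PER_DAY = 48    # 24 hours recorded in half-hour blocks
HOURS_PER_BLOCK = 0.5

# Labels follow the grouping in the text; the DOS form's own coding may differ.
DOS_CATEGORIES = (
    "sleeping",
    "awake/calm",
    "positively engaged",
    "repetitive vocal and motor expressions",
    "expression of risk",
)


def summarize_dos_day(blocks):
    """Return hours spent in each category for one 24-hour cycle."""
    if len(blocks) != BLOCKS_PER_DAY:
        raise ValueError("expected 48 half-hour blocks for one day")
    unknown = set(blocks) - set(DOS_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    counts = Counter(blocks)
    return {category: counts[category] * HOURS_PER_BLOCK for category in DOS_CATEGORIES}


# Hypothetical day, for illustration only.
example_day = ["sleeping"] * 16 + ["awake/calm"] * 20 + ["positively engaged"] * 12
example_summary = summarize_dos_day(example_day)
```

Summaries of this kind could then be compared across consecutive days to look for the patterns and trends the scale is intended to reveal.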
We invited the SDM to complete the CSDD, CMAI, and DOS. However, the SDM felt that the regular staff (PSW and Registered Practical Nurses [RPN]) could complete these tools more accurately since they provide care for the residents 24/7. Caregivers (PSW and RPN) were therefore trained to complete these tools. Scales were completed at baseline, at the end of the first week, and at the end of the second week of therapy. The PI extracted the participants’ proportion of night-time sleep from their medical charts at baseline, at the end of the first week, and at the end of the second week of therapy.
Sensitivity to change of a tool is its ability to detect change (signal over noise) regardless of whether the change is meaningful to the clinician or decision maker [45]. The most appropriate statistic for sensitivity to change remains a matter of debate [46]. Usually, for a single group index, sensitivity to change is reported using the Standardized Response Mean (SRM) or ES [47]. The SRM or ES analysis is based on the assumption that the participants are homogeneous at baseline and change by approximately the same amount over the study period [48]. The SRM is the ratio of the mean change score (𝛿x = x2 − x1) to the Standard Deviation (SD) of the change scores (SD 𝛿x) [49]. The ES is the ratio of the mean change score (𝛿x = x2 − x1) to the SD of the baseline scores [49]. For both statistics, if change has occurred, a value greater than 1.0 indicates that the instrument is sensitive to change. We selected ES to report sensitivity to change because ES can be used to determine the sample size for future research [49] and facilitates comparison between studies in future meta-analyses [50].
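The two statistics defined above can be sketched in Python as follows. This is a minimal illustration that assumes paired baseline and follow-up scores and the sample SD (n − 1 denominator); the function names and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def _change_scores(baseline, follow_up):
    """Paired change scores, x2 - x1, one per participant."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two participants")
    return [post - pre for pre, post in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores."""
    return mean(_change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    changes = _change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical scores on a scale where lower values mean fewer symptoms.
baseline_scores = [10, 12, 14, 16]
week2_scores = [8, 9, 13, 12]
es = effect_size(baseline_scores, week2_scores)
srm = standardized_response_mean(baseline_scores, week2_scores)
```

Because the change is computed as x2 − x1, both statistics are negative when scores fall over time, so the magnitude rather than the sign would be compared with the 1.0 threshold.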
To address objective 5, we explored the association between change scores on the CSDD and CMAI (from pre-intervention to week 1, and again from week 1 to week 2) and the Global Rating of Change (GRC) using Pearson’s correlation coefficient (r) in SPSS© version 26. The GRC scale captures an individual’s perspective (in this case, that of participants’ caregivers) regarding change in a health condition (in this case, depression and agitation) [51]. The scale quantifies the change (from a small, unimportant change to a very great deal of change) using scores from −7 to +7 (0 = no change, +1 to +7 = a perceived improvement in condition, and −1 to −7 = a perceived deterioration in condition) [51]. We classified GRC as a lot worse (GRC = −7), moderately worse (GRC = −4, −5), minimally worse (GRC = −1, −2, −3), stable (GRC = 0), minimally better (GRC = 1, 2, 3), moderately better (GRC = 4, 5), and a lot better (GRC = 6, 7). We had two outliers for GRC-depression (GRC = −4 [n = 1] at the end of the first week and GRC = 7 [n = 1] at the end of the second week). Similarly, we had one outlier for GRC-agitation (GRC = −4 [n = 1] at the end of the first week and the second week).
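The GRC classification above can be expressed as a simple lookup, sketched here in Python. The text lists only −7 under "a lot worse"; placing −6 in that category, by symmetry with "a lot better" (6, 7), is an assumption of this sketch, as is the function name.

```python
def classify_grc(score):
    """Map an integer GRC score (-7..+7) to the categories used in the text."""
    if not isinstance(score, int) or not -7 <= score <= 7:
        raise ValueError("GRC score must be an integer from -7 to +7")
    if score <= -6:
        # Assumption: the text lists only -7 here; -6 is added by symmetry.
        return "a lot worse"
    if score <= -4:
        return "moderately worse"
    if score <= -1:
        return "minimally worse"
    if score == 0:
        return "stable"
    if score <= 3:
        return "minimally better"
    if score <= 5:
        return "moderately better"
    return "a lot better"
```

For example, the outlying ratings mentioned above (−4 and 7) would fall in the "moderately worse" and "a lot better" categories, respectively.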
To demonstrate longitudinal construct validity, we expected that a GRC rating of 0 would be associated with little to no change in the CSDD/CMAI (i.e. a change score of 0). Considering the short intervention period (two weeks), we did not expect many participants to experience very large changes, which would narrow the range of the scale in use and reduce the magnitude of the association. Thus, we hypothesized a priori that the correlation between CSDD/CMAI change scores and GRC would be weak to moderate. We categorized the strength of correlation using coefficient r as strong when r ≥ 0.6, moderate when r = 0.3 to 0.6, and weak when r ≤ 0.3 [49].
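For readers who wish to reproduce this step outside SPSS, the sketch below computes Pearson's r from paired scores and applies the strength categories above. It is an illustration rather than the procedure used in this study. Two choices are assumptions of the sketch: the categories are applied to the absolute value of r, and the boundary value 0.3, which the stated cut-offs assign to both "weak" and "moderate", is treated as moderate.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired values for at least two participants")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def correlation_strength(r):
    """Categorize |r| using the cut-offs stated in the text."""
    magnitude = abs(r)  # assumption: direction is ignored when judging strength
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.3:  # assumption: exactly 0.3 is treated as moderate
        return "moderate"
    return "weak"
```

Using the absolute value keeps the category independent of how the change scores and the GRC happen to be signed.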