Background & rationale {6a}
Approximately 418,000 people live in care homes in the UK, yet accessible, reliable data on UK care homes, their residents, and staff are lacking. The dearth of accessible, high-quality data has been highlighted previously, but was starkly exposed by the recent and continuing COVID-19 pandemic (2). Information about care home capacity, staffing, health and social care needs, and resident demographics is required to inform resource allocation and to meet residents' care needs. Administrative data (e.g. the UK Office for National Statistics census) provide information about age, sex and demographic change in the care home population over time, but cannot readily be linked to the long-term health, function or quality of life of individual residents. Length of stay, life expectancy and mortality of the care home population are not reliably known. Large cohort studies of older adults provide much richer health data, but care home residents make up a low proportion of their participants (3, 4). For example, the Cognitive Function and Ageing Studies (CFAS) report on 543 residents and the English Longitudinal Study of Ageing (ELSA) on 303 residents (5, 6). Internationally, large care home datasets are available, for example through insurance schemes in private healthcare systems. However, routinely collected data always raise concerns about data quality, and many of these registers collect data for a specific purpose only and may not contain the most relevant clinical information. In addition to the problems of sourcing data about residents, it is also difficult to find consistent information about the fragmented care home market, including staffing (ratios and retention), case mix, funding mix, and ownership. The lack of publicly available national data on the care home sector is detrimental to those who live and work there.
Failing to quantify the needs of those requiring care, and their journeys before entering care homes, impairs local and national planning for the care needs of the ageing population living with dementia, multimorbidity and frailty (7). For example, it is estimated that care home capacity will need to expand so that people with complex needs can receive care at the end of their lives (8, 9). However, current staffing, funding sources, resident pathways into care, and capacity to provide care are unknown.
Large randomised controlled trials (RCTs) conducted solely in care homes are a growing resource (10), collecting detailed information about every care home and resident they recruit. Although these RCTs focus on a variety of health and care topics (e.g. falls risk, medication management, nutrition, or infection), the study team's experience of working on care home trials is that there is much overlap in outcome measurement and in the information collected on both residents and care home structure. Trials in care homes monitor participants regularly, often for up to one year, so outcome measures, health resource use, clinical events, and care home characteristics can be tracked over this period, allowing longitudinal analysis. Secondary analysis of individual participant data (IPD) allows more complex and flexible analyses than are possible with summary-level results alone. Single care home trial datasets are valuable, but if IPD from existing trials were pooled, they would collectively provide a much larger, richer dataset on the residents and staff of care homes. Repurposing care home trial data would permit rapid synthesis of a large IPD resource from which to generate evidence based on high-quality data. This principle aligns with current moves towards improving efficiency and reducing research waste (11), a theme of increasing importance to funders and peer reviewers. Pooled IPD would permit exploratory analysis to better understand the care home population, reduce duplication of effort, and refine and pilot future research questions. The International Committee of Medical Journal Editors (ICMJE) has reiterated its commitment to improving trial transparency by sharing IPD from RCTs and registries (12), and strives to normalise the sharing of de-identified trial data (13).
UK Clinical Trials Units have also signalled their support (14), and all trials that began enrolling participants on or after 1 January 2019 must include an IPD sharing plan in their trial registration (13).
Data repository models
Generic data repositories such as www.figshare.com and www.datadryad.org can be used to access IPD from single trials. To allow data from multiple trials to be pooled into a single source within a secure data infrastructure, we will replicate the model developed by the Virtual Trials Archives (VTA) (15). The VTA was established in 2001, bringing together multiple large international datasets from completed clinical trials in stroke research (16, 17). It has since expanded to include two additional repositories, in cardiovascular and cognitive research (VICCTA) and in renal transplantation (VIRTTA) (18). The VTA is a not-for-profit collaboration, with datasets hosted by the Robertson Centre for Biostatistics (RCB) at the University of Glasgow, UK. It facilitates a wide range of empirical and methodological research, including recent projects on test accuracy (19), psychometrics (20), prognosis (21), and trial design (22). Unlike a traditional IPD meta-analysis (23, 24), a key tenet of the VTA is that data should be used for novel research, not to re-test the original hypotheses of contributed RCTs. Investigators can access data by submitting a research proposal on the VTA website. Following approval by the relevant repository Steering Committee (a virtual collaboration of the original trialists), data extraction is tailored to the specific research question, and the requesting investigator is granted access to analyse the bespoke data extract on a secure analysis platform. On completion, the anonymised data extract is archived centrally. The VTA is funded by administrative charges per data request, which support data curation, storage, continued development and day-to-day administration of the resource. The VTA has a well-established governance infrastructure, the ability to host data securely on a working data-sharing platform, and the expertise to manage future trial inclusion and data access requests.
To enable the care home trial repository to operate on a long-term basis, we have been working closely with the VTA from the outset. Once operational, the repository will formally migrate to the VTA, where it will be named the Virtual International Care Homes Trials Archive (VICHTA).
This protocol describes the creation of a care home trial repository as part of a funded project (the Developing research resources And minimum data set for Care Homes' Adoption and use (DACHA) study; hereafter the ‘development stage’), and outlines plans for operating the VICHTA repository so that it remains accessible beyond the DACHA study (hereafter the ‘operational stage’). Our aims are to create a repository of IPD from RCTs conducted in UK care homes, and to use the repository data to conduct analyses that inform a care home minimum dataset relevant to the UK context (25).
Study objectives {7}
Development stage (DACHA study)
- Create a repository of individual participant data (IPD) from trials conducted in UK care homes since 2010
- Set up a Trialist Steering Committee, whose members will oversee data sharing and remain gatekeepers of their own trial data
- Compare the pooled IPD with point estimates from administrative sources to assess the generalisability of RCT data
- Identify key resident characteristics and outcomes from within the trial repository, which could inform a national minimum dataset for care homes
Operational stage (Virtual International Care Home Trials Archive, VICHTA)
- Enable new trials to be added to the repository beyond the DACHA project duration, including those from non-UK settings
- Make pooled IPD available to external researchers to allow future secondary analysis
Study design {8}
There will be four phases:
- Phase 1: Identifying trials and establishing the Trialist Steering Committee (TSC)
- Phase 2: Creating the repository, preparing data and pooling individual trial datasets
- Phase 3: Analysis of pooled data to inform DACHA study objectives
- Phase 4: Preparing for migration to Virtual Trials Archive and operation as VICHTA
Phase 1: Identifying trials and establishing the Trialist Steering Committee
To be included in the proposed repository, trials must meet the following eligibility criteria {10}:
- Examination of any intervention conducted exclusively in an adult care home setting
- Minimum of 100 participants
- Completed since 2010
- Trial conducted in the UK {9}
- Documented entry criteria
- Documented participant consent or assent following a Health Research Authority approved procedure
- Monitoring procedures exist to validate data
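The eligibility criteria above lend themselves to a simple screening checklist for each candidate trial. The sketch below is a hypothetical illustration, not part of the DACHA tracking database: the `CandidateTrial` fields and the `failed_criteria` function are invented names, and "completed since 2010" is assumed to mean a completion year of 2010 or later.

```python
from dataclasses import dataclass


# Hypothetical record of what is logged for each candidate trial;
# field names are illustrative only.
@dataclass
class CandidateTrial:
    name: str
    care_home_only: bool             # intervention delivered exclusively in adult care homes
    n_participants: int
    completion_year: int
    country: str
    entry_criteria_documented: bool
    consent_documented: bool         # consent/assent via an HRA-approved procedure
    monitoring_in_place: bool        # procedures exist to validate data


MIN_PARTICIPANTS = 100
EARLIEST_COMPLETION_YEAR = 2010      # assumption: "since 2010" includes 2010


def failed_criteria(trial: CandidateTrial) -> list[str]:
    """Return the eligibility criteria a trial fails (an empty list means eligible)."""
    checks = {
        "care home setting only": trial.care_home_only,
        "at least 100 participants": trial.n_participants >= MIN_PARTICIPANTS,
        "completed since 2010": trial.completion_year >= EARLIEST_COMPLETION_YEAR,
        "conducted in the UK": trial.country == "UK",
        "documented entry criteria": trial.entry_criteria_documented,
        "documented consent or assent": trial.consent_documented,
        "data monitoring procedures": trial.monitoring_in_place,
    }
    return [label for label, passed in checks.items() if not passed]
```

Returning the failed criteria, rather than a single yes/no, would let the reason for exclusion be recorded alongside each trial.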
Internationally, there is significant heterogeneity in the terminology used in practice and research to describe the settings in which long-term care is delivered (26, 27). We use the term ‘care home’ to describe care facilities that provide 24-hour care to their residents, including those with and without on-site registered nursing staff.
Identifying trials
A scoping review identified potential care home trials for inclusion. As part of preparatory work, we contacted a small number of trialists who had completed RCTs in UK care homes. Based on provisional agreement from five of these trialists, we anticipate the repository will initially combine trial data for over 4200 residents from 250 care homes across the UK. Through the ongoing scoping review, we have identified a further thirteen potential trials, representing an additional 6000 residents from approximately 500 care homes, and we anticipate this number will increase as the project develops. Additional trials will be identified through an ongoing Google Scholar alert; systematically through concurrent reviews (PROSPERO: CRD42020155923); by contacting all trialists listed in the NIHR “Advancing Care” Themed Review (10) (44 studies featured) and the CLAHRC National Work stream Report (28) (32 studies featured); and by snowballing through the DACHA project management team, study steering committees, and their professional networks.
Approaching/inviting trialists to share their data
We have created a database to track potentially eligible trials, in which we will record how IPD are requested, collected, and managed, and log all contact with trialists. We will write to the original trialists explaining the purpose of the repository and how it will operate. A reminder email will be sent two weeks after the initial contact if the trialist has not responded. If the trialist declines or does not respond, we will log the dataset as unavailable. Following a positive response, we will set up a meeting (by phone, Zoom, or face to face, depending on trialist preference) to outline the project in more detail. If a trialist agrees to participate, they will be asked to sign a data transfer agreement covering the transfer, use and storage of their trial data (see Terms of Reference, Supplementary Index 1).
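The contact rules above can be expressed as a small decision function for the tracking database. This is a minimal sketch under stated assumptions: `next_action` and its response values are hypothetical, and the protocol does not say how long to wait after the reminder before logging a non-response, so a further two weeks is assumed here.

```python
from datetime import date, timedelta
from typing import Optional

REMINDER_AFTER = timedelta(weeks=2)      # stated in the protocol
# Assumption: the wait after the reminder before logging non-response
# is not specified; two weeks is used for illustration.
NON_RESPONSE_AFTER = timedelta(weeks=2)


def next_action(contacted: date, today: date,
                reminder_sent: Optional[date] = None,
                response: Optional[str] = None) -> str:
    """Suggest the next step for one trialist.

    `response` is None (no reply yet), "declined" or "interested".
    """
    if response == "declined":
        return "log dataset as unavailable"
    if response == "interested":
        return "arrange meeting (phone, Zoom or face to face)"
    if reminder_sent is None:
        # No reply and no reminder yet: remind once two weeks have passed.
        return "send reminder email" if today - contacted >= REMINDER_AFTER else "wait"
    # Reminder already sent: treat continued silence as non-response.
    if today - reminder_sent >= NON_RESPONSE_AFTER:
        return "log dataset as unavailable"
    return "wait"
```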
Establishing Trialist Steering Committee (TSC)
Contributing trialists will make up the TSC, which will oversee the sharing, combining and repurposing of the pooled trial data. While day-to-day co-ordination will be led by the DACHA co-ordinator at the University of Hertfordshire (LI) and later by the Virtual Trials Archive (MA), the TSC will agree the Terms of Reference for the collaboration, including the approval process for data requests, and will have ultimate responsibility for all decisions regarding strategy, confidentiality, scientific matters and publication policy. This system mirrors the VTA, to which the care home repository will ultimately migrate.
The main role of the TSC during the DACHA-funded phase will be to advise on trial-specific details to aid pooling of the datasets and understanding of the original data. Key information will be drawn from the original trial protocol, funder's report, and standard study documentation such as case report form templates and statistical analysis plans; where these sources do not resolve an issue, we will seek clarification from the original trial team.
Phase 2: Creating repository, preparing data and pooling individual trial datasets
Contributing trial data to repository
Once an agreement has been made to contribute data, trial data managers (e.g. within Clinical Trials Units (CTUs)) will be engaged to prepare datasets. As is standard practice in individual participant data sharing models (29), only completely anonymised data will be held in the repository, to minimise the risk of re-identification. We will request that all data be fully de-personalised before transfer (for example, by converting ‘date of birth’ to ‘age at randomisation’). Full instructions on de-identification and secure transfer will be provided if necessary.
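As an illustration of the de-personalisation step, the sketch below derives age in completed years at randomisation and drops direct identifiers. The field names (`date_of_birth`, `randomisation_date`, `name`, `nhs_number`, `postcode`) and the identifier list are hypothetical; the real variables would come from each trial's data dictionary, and whether other dates also need removing would be agreed with the contributing trial team.

```python
from datetime import date

# Hypothetical direct identifiers to remove before transfer.
DIRECT_IDENTIFIERS = ("name", "nhs_number", "postcode", "date_of_birth")


def age_at_randomisation(date_of_birth: date, randomisation_date: date) -> int:
    """Age in completed years on the date of randomisation."""
    birthday_reached = (randomisation_date.month, randomisation_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return randomisation_date.year - date_of_birth.year - (0 if birthday_reached else 1)


def deidentify(record: dict) -> dict:
    """Return a copy of one participant record with age at randomisation
    added and the listed direct identifiers (including date of birth) removed."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age_at_randomisation"] = age_at_randomisation(
        record["date_of_birth"], record["randomisation_date"]
    )
    return cleaned
```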
Additional documents to support datasets will be requested, including the trial protocol and data dictionary. Optional supporting documents include blank annotated case report forms, statistical analysis plans, and relevant published outputs or grey literature about the trial. We will also request evidence of ethical approval and of the consent procedure (e.g. blank consent and/or assent forms).
Repository data storage
The Virtual Trials Archive team has developed a DACHA data contribution form (15) in which trialists can record information about the trial and complete a memorandum of understanding. The trial dataset and all accompanying files will then be transferred in a zipped, password-protected folder to the University of Glasgow (UG) Robertson Centre for Biostatistics (RCB), using the University of Glasgow’s File Transfer Protocol, where they will be held securely for the duration of the DACHA study and beyond. As it does for other VTA repositories, the RCB will act as an independent data host, providing a common format and access mechanisms. All data will remain on the RCB server and will be analysed through its secure analysis platform. During the development stage, access to the data will be restricted to the core team (LI, JB and MA), who have undergone the necessary data protection and confidentiality training. At the end of the DACHA project, the VTA will act as custodian of the data under the terms of the data transfer agreement.
Data preparation and quality checks
When trial data are submitted to the repository, the DACHA co-ordinator (LI) at the University of Hertfordshire (UH) will access the server remotely via a secure virtual private network. A data checking analysis plan will be developed, outlining procedures and decision rules for data pooling according to established principles (29). We will check for invalid, out-of-range, or inconsistent items and query any anomalies with the trialist (or their nominated study contact) to ensure that the data are represented accurately. Trials may use the same outcome measure but administer it differently: if a measure could be completed face-to-face with a member of the research team, as self-report, or as a proxy response from care staff, we will ensure these data are coded in a standardised way. Decisions on standardisation will be made by consensus with the wider TSC or delegated groups (e.g. trial statisticians). Where possible, we will request individual domain-level data for outcome measures rather than single composite scores. All trial datasets will be cross-checked against their respective protocols and statistical analysis plans to confirm how each composite outcome was derived. If the scoring was modified, we will seek advice from the respective trialists in the TSC on whether the composite outcome data should be removed or amended to enable pooling with other trial datasets. We will record the number and timing of measurement points and ensure all timepoints are labelled consistently.
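Two of the checks described above, range checking and standardised coding of how a measure was administered, are sketched below. Only the MMSE range (0 to 30) is a known property of the instrument; the `MODE_MAP` codes and the function names are hypothetical, and in practice ranges and mappings would come from each trial's data dictionary and the TSC's standardisation decisions.

```python
from typing import Optional

# Valid ranges per variable; the MMSE total is scored 0-30. Further entries
# would be taken from each trial's data dictionary.
VALID_RANGES = {"mmse_total": (0, 30)}

# Hypothetical mapping from trial-specific administration codes to one
# standard coding agreed by the TSC.
MODE_MAP = {
    "self": "self-report",
    "self-report": "self-report",
    "proxy": "proxy (care staff)",
    "staff proxy": "proxy (care staff)",
    "f2f": "researcher-administered",
    "face-to-face": "researcher-administered",
}


def check_value(variable: str, value: Optional[float]) -> Optional[str]:
    """Return a query message for an out-of-range value, else None.

    Missing values (None) are not queried here; they stay missing.
    """
    if value is None or variable not in VALID_RANGES:
        return None
    low, high = VALID_RANGES[variable]
    if not low <= value <= high:
        return f"{variable}={value} outside valid range {low}-{high}"
    return None


def standardise_mode(raw_code: str) -> str:
    """Map a trial-specific administration code to the standard code.

    Unrecognised codes raise ValueError so they can be queried with the trialist.
    """
    key = raw_code.strip().lower()
    if key not in MODE_MAP:
        raise ValueError(f"unrecognised administration mode: {raw_code!r}")
    return MODE_MAP[key]
```

Raising an error on unknown codes, rather than guessing a mapping, keeps every ambiguous case visible for a query to the trial team.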
We anticipate a strong opportunity for methodological research examining groups of measures (e.g. cognitive assessments) to attempt mapping or harmonising similar variables (30, 31). We would encourage external researchers to pursue this in the operational stage; however, in the development stage we will not attempt to harmonise non-matched data.
We anticipate that most RCTs with an economic evaluation component will use a variant of the Client Service Receipt Inventory (CSRI) (32) to record information on resource use and costs alongside the trial. We will request all health service use questionnaires used in the trials and look for differences that could affect findings. Because of differences in price years and in the interpretation of unit costs, we will focus on resource use (e.g. number of GP contacts) rather than costs (e.g. total cost of GP contacts over the follow-up period). We will request that datasets retain missing values where possible, rather than imputed values. In developing the repository, we will not perform any missing data imputation.
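A minimal sketch of summarising resource use without imputation, assuming (hypothetically) that each participant's GP contacts are recorded per follow-up period and that a missing period is stored as `None`. Under this illustrative rule any missing period leaves the total missing rather than being filled in, and no unit costs are applied.

```python
from typing import Optional, Sequence


def total_gp_contacts(contacts_per_period: Sequence[Optional[int]]) -> Optional[int]:
    """Total GP contacts over follow-up for one participant (hypothetical helper).

    Returns None if any period is missing: values are left missing rather
    than imputed, and resource use is kept as counts, not converted to costs.
    """
    if any(count is None for count in contacts_per_period):
        return None
    return sum(contacts_per_period)
```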
Database of trial summaries
We have collated the aggregate data available for each trial (drawn from protocol papers and funders' reports) and will build on this database as new trials are published. A summary of available data will be published on the VTA website, allowing viewers to identify which outcome measures have been collected in multiple trials, how care home characteristics have been recorded, and contextual aspects of each trial (e.g. sample size and follow-up points).
The repository will host trials with a range of clinical foci, so some measures are likely to be unique to single trials. However, a core set of outcome measures (e.g. the Barthel Index, MMSE, EQ-5D and DEMQOL (33-36)) is used in almost all RCTs conducted in care homes. Clinical indicators such as hospitalisations, falls, and death rates are also routinely reported (see Appendix 2: Examples of data available from each trial).
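The published summary of available data could be generated by counting, for each outcome measure, how many contributed trials collected it. The sketch below assumes a hypothetical mapping from trial name to the set of measures collected; `measures_in_multiple_trials` is an invented name and the trials in the example are placeholders, not repository trials.

```python
from collections import Counter


def measures_in_multiple_trials(
    trial_measures: dict[str, set[str]], min_trials: int = 2
) -> dict[str, int]:
    """Count how many trials collected each measure and keep those
    collected in at least `min_trials` trials."""
    counts = Counter(m for measures in trial_measures.values() for m in measures)
    return {m: n for m, n in sorted(counts.items()) if n >= min_trials}


# Placeholder trials for illustration only.
example = {
    "Trial A": {"Barthel Index", "EQ-5D", "MMSE"},
    "Trial B": {"EQ-5D", "DEMQOL"},
    "Trial C": {"EQ-5D", "MMSE", "falls"},
}
print(measures_in_multiple_trials(example))  # {'EQ-5D': 3, 'MMSE': 2}
```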
Phase 3: Analysis of pooled data to inform DACHA study objectives
When the initial set of trials has been added and variables prepared for pooling, we will temporarily lock the repository to allow two pre-specified analyses:
- Identification of key resident characteristics and outcomes from within the trial repository, which could be used to inform the development of a minimum dataset (MDS) for care homes
- Comparison of the pooled individual participant data with point estimates from administrative sources to assess the generalisability of RCT data
We will prepare a detailed research plan for each analysis, outlining the purpose of the request, the objective or research question, the plan for statistical analysis, and the repository variables requested. This research plan will then be circulated to the Trialist Steering Committee for approval, as will be required for future data requests from external analysts.
Informing development of a prototype minimum dataset (MDS) for care homes
Briefly, we will examine which clinical, demographic, and outcome data from trials may be appropriate to include in a care home MDS framework. We will categorise outcome measures into broad areas (e.g. cognition, anxiety and depression, pain, mobility, activities of daily living (ADLs), and specific clinical measures) and will focus on pre-specified outcome measures, identified in part through existing evidence reviews (PROSPERO: CRD42020155923 and CRD42020171323). This identification and critique of relevant outcome measures within existing trials will help inform the development of a prototype MDS (25). We will develop quality assessment criteria to assess proposed outcome measures in terms of:
- what has been measured: baseline, processes of care, outcomes
- how data were collected (resident notes, researcher observation/assessment, use of routine data sources)
- completeness of the data and, where data are incomplete, the nature of the missingness (i.e. death, unavailable, withdrawn consent, unable to complete, unclear)
- where outcomes are measured across multiple studies, the range of values
- where outcomes are measured over time, their sensitivity to detect change
- what information may be derived from collected data, e.g. comorbidity scoring based on medication usage
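For the completeness criterion, a per-measure summary of observed values and reasons for missingness could look like the sketch below. The function name and input layout are hypothetical: values and reasons are assumed to be parallel lists for one measure at one timepoint, and any missing value without a recognised reason is counted as "unclear".

```python
from collections import Counter
from typing import Optional

# Reasons for missingness taken from the criteria above.
MISSING_REASONS = ("death", "unavailable", "withdrawn consent", "unable to complete", "unclear")


def completeness_summary(values: list[Optional[float]],
                         reasons: list[Optional[str]]) -> dict:
    """Proportion of observed values and counts of missing values by reason."""
    if len(values) != len(reasons):
        raise ValueError("values and reasons must be the same length")
    n = len(values)
    observed = sum(v is not None for v in values)
    # Missing values with no recorded or recognised reason are labelled "unclear".
    reason_counts = Counter(
        r if r in MISSING_REASONS else "unclear"
        for v, r in zip(values, reasons)
        if v is None
    )
    return {
        "n": n,
        "completeness": observed / n if n else float("nan"),
        "missing_by_reason": dict(reason_counts),
    }
```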
Generalisability of trial data
Briefly, we will conduct an evidence synthesis of key care home demographic information, collating data from administrative sources (e.g. UK Census, Care Quality Commission). We will report baseline characteristics of care homes and residents derived from the pooled trial data, tabulated for each individual trial and for the pooled dataset. We will then compare point estimates from administrative sources with point estimates from the pooled IPD to evaluate how generalisable the repository data are compared with alternative data sources.
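The protocol does not specify how point estimates will be compared. As one possible approach, shown here only as an assumption, a pooled proportion from the IPD could be reported with a 95% Wilson score interval and set against the corresponding administrative estimate. This simple interval ignores clustering of residents within care homes and trials, which a real analysis would need to account for; `wilson_interval` and `compare_with_admin` are hypothetical names.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no clustering adjustment)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def compare_with_admin(successes: int, n: int, admin_estimate: float) -> dict:
    """Set a pooled IPD proportion against an administrative point estimate."""
    pooled = successes / n
    low, high = wilson_interval(successes, n)
    return {
        "pooled": pooled,
        "ci_low": low,
        "ci_high": high,
        "admin": admin_estimate,
        "difference": pooled - admin_estimate,
        "admin_within_ci": low <= admin_estimate <= high,
    }
```

The Wilson interval is used here because it behaves better than the simple normal approximation when proportions are near 0 or 1; any method actually used would be set out in the pre-specified research plan.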
Phase 4: Preparing for migration to the Virtual Trials Archive
The VICHTA repository will be a legacy output of the DACHA project: a valuable source of high-quality, anonymised individual participant data (IPD) to inform the development of future research, the testing of hypotheses, and the optimisation of study design. We took an early decision to store all trial data solely on the University of Glasgow secure server, where the VTA is also stored, so the repository will already have a permanent ‘home’ when the DACHA study ends. Management of the repository will be transferred from the DACHA team at the University of Hertfordshire (LI, CG) to the VTA team at the University of Glasgow (principally the VTA co-ordinator, MA). The VTA will maintain and update the VICHTA repository, and manage requests to access its data, in conjunction with the existing TSC.
Following formal migration to the VTA, external researchers may apply for data extracts by submitting a project proposal (for review and approval by the TSC) and agreeing to predefined VTA data sharing terms and conditions (see Appendix 1). At the proposal stage, TSC members may declare an interest in joining the analysis team of a proposed project and taking an active role, thereby meeting ICMJE criteria for authorship. All completed analyses will be forwarded to the TSC for review before submission for presentation or publication (see Data Processing Flowchart). The TSC will be acknowledged on all publications using an “on behalf of the VICHTA collaborators” by-line. Active involvement from each TSC member is encouraged but not essential, as data request decisions will be made by a quorum (see Appendix 3: Summary of Development and Operational Phases).