Cognitive Impairment and Dementia Data Modelling

Recently, a huge amount of data has become available for clinical research on cognitive diseases. Many challenges arise when data from different repositories must be integrated. Since data entities are stored under different names and at different levels of granularity, a common data model is needed that provides a unified description of the factors and indicators of cognitive diseases. This paper proposes a common hierarchical data model of patients with cognitive disorders, which keeps the semantics of the data in a human-readable format and accelerates the interoperability of clinical datasets. It defines the data entities, attributes and relationships related to diagnosis and treatment. The data model covers four main aspects of the patient's profile: (1) personal profile; (2) anamnestic profile, including social status, everyday habits and head trauma history; (3) clinical profile, describing medical investigations and assessments, comorbidities and the most likely diagnosis; and (4) treatment profile, with prescribed medications. It provides a native vocabulary that improves data availability, saves effort, accelerates clinical data interoperability and standardizes data to minimize the risk of rework and misunderstandings. The data model enables the application of machine learning algorithms by helping scientists understand the semantics of the information through a holistic view of the patient.


Introduction
Cognitive diseases are disorders of brain function that primarily affect cognitive abilities. While they can result from many medical conditions, such as single or repeated head injuries, infections, toxicity, substance abuse, benign and malignant brain tumours and many genetic diseases, the majority of cognitive disorders are caused by neurodegenerative diseases, vascular damage to the brain and various combinations of the two. Cognitive disorders, and especially dementia (a condition severe enough to compromise social and/or occupational functioning [1]), have tremendous social significance for all parties involved in their diagnosis and treatment. Patients undergo ever-progressing cognitive decline and gradually lose independence, which places a heavy burden on their caregivers: family members, hired professionals or the personnel of specialized institutions. Cognitive disorders also impose a huge public and financial burden, as the number of cases increases with every decade. Last but not least, they raise a number of still unanswered moral and ethical questions.
The trend of increasing life expectancy has led to increased dementia morbidity, since each decade of life exponentially increases the chance of developing cognitive decline. This holds for all types of dementia, but especially for the degenerative types. About 5.5% of people over 65 are considered to have dementia, and the number of cases worldwide is expected to double every 20 years, reaching an overwhelming 115 million by 2040 [2]. In the European Union, more than 160 million people are over 60, about 6.2% of Europeans (almost 10 million people) have some form of dementia, and the number of European cases is expected to rise to 14 million by 2030 and almost 19 million by 2050 [3]. These data highlight the need for global and national plans to combat rising dementia morbidity, clinically through higher-quality dementia diagnosis and treatment, and scientifically through increased dementia research. At the clinical level, diagnosis should be as accurate and as early as possible, pharmacological treatment should be started as early as possible, the infrastructure required for cognitive rehabilitation should be in place, vascular risk factors should be strictly controlled to prevent vascular dementia, cases should be regularly followed up, and clinical guidelines should be updated as needed. This would result in early and adequate diagnosis of mild cognitive impairment with a high risk of conversion to dementia, slowing down and, where possible, preventing late-stage dementia, a significant decrease in morbidity and the associated medical and social burdens, reduced medication and hospitalization needs, and improved quality of life for patients and caregivers [4].
Digitalization of medical data, particularly for cognitive diseases, can not only ease the clinical management of these diseases by creating patient registries, enabling strict follow-up and establishing a robust schedule of cognitive rehabilitation, but also greatly amplify the ability to conduct large-scale research by applying Big Data and Artificial Intelligence (AI) technologies. To this end, many digital repositories have been created around the world to enhance research on cognitive diseases [5], storing a variety of factors and indicators of the patient's history and current state. Although transforming information into a particular data model is straightforward, many challenges arise when data from different sources have to be integrated. Since each data source stores entities under different names, with relationships at different levels of granularity, part of the information can be lost or improperly represented.
To avoid incorrect data transformations and enable the integration of data from a variety of sources, this paper proposes a common hierarchical data model that keeps the semantics of the data in a human-readable format and accelerates the interoperability of clinical datasets. Researchers can use the data model both as a stand-alone model for clinical data and as middleware for mapping between different data models. It enables the application of Machine Learning (ML) and AI algorithms by helping data scientists understand the semantics of the information through a holistic view of the patient.
The rest of the paper is organized as follows. Section 2 presents background and related work. Section 3 describes the approach followed to develop the data model. Section 4 presents the resulting data model. Section 5 gives conclusions and directions for future work.

Methods
This section describes the process followed to develop the patient data model. It comprises the following iterative steps, shown in Figure 1:
• Step 1: Definition of the requirements for the data model.
• Step 2: Building the data model.
• Step 3: Formalization of the data model using Unified Modelling Language (UML) [6] and YAML Ain't Markup Language (YAML) [7] notations and the corresponding software tools.

Data model requirements
This section presents the requirements identified as having the highest priority for creating the data model (Table 1).

Table 1. Data model requirements
ID | Requirement | Type
DR1 | The model should present the functional specification of the patient record, reflecting the described structure in the horizontal and vertical planes as well as the established correlations and interactions between the included properties. | Domain
DR2 | The model should be oriented towards presenting data at the patient level. | Domain
DR3 | The model should allow all kinds of cognitive disorders to be modelled in a unified way. | Domain
DR4 | The model should integrate information through biomedical abstractions, using proper medical terminology. | Domain
DR5 | The model should integrate diverse medical data at different levels of granularity. | Domain
DR6 | The model should have a temporal/historical dimension (analysing patients' data over time). | Domain
DR7 | The model should set correlations that reveal relationships between the patient's status, historical data and disease progression. | Domain
DR8 | The model should be aligned with external knowledge sources, such as the ADNI database. | Domain
TR1 | The chosen presentation format of the model must be well known and understandable to both medical experts and technical staff, so that the model can be successfully validated. | Technical
TR2 | The model must provide a complete and non-contradictory presentation/structure that can be used to select the proper database and create it. | Technical
The full list of requirements also includes others related to presenting the model in English and to including medical terms reviewed and approved by the project's medical expert.
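Requirement DR6 asks for a temporal/historical dimension. One minimal way to meet it, sketched here under our own assumptions, is to store every measurement as a time-stamped observation and derive per-indicator histories on demand. The `Observation` class, its fields, the helper functions and the indicator name `screening_score` are hypothetical and are not part of the published model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One time-stamped value of an indicator for one patient (hypothetical structure)."""
    patient_id: str
    indicator: str      # illustrative indicator name, not taken from the published model
    value: float
    observed_on: date


def history(observations: Iterable[Observation], patient_id: str,
            indicator: str) -> List[Tuple[date, float]]:
    """Return the (date, value) series for one patient and indicator, oldest first."""
    rows = [o for o in observations
            if o.patient_id == patient_id and o.indicator == indicator]
    rows.sort(key=lambda o: o.observed_on)
    return [(o.observed_on, o.value) for o in rows]


def change_since_first(observations: Iterable[Observation], patient_id: str,
                       indicator: str) -> Optional[float]:
    """Latest value minus earliest value, or None when fewer than two values exist."""
    series = history(observations, patient_id, indicator)
    if len(series) < 2:
        return None
    return series[-1][1] - series[0][1]


# Observations may arrive in any order; history() sorts them by date.
obs = [
    Observation("P001", "screening_score", 24.0, date(2021, 6, 1)),
    Observation("P001", "screening_score", 27.0, date(2019, 5, 10)),
    Observation("P001", "screening_score", 25.0, date(2020, 5, 20)),
]
print(change_since_first(obs, "P001", "screening_score"))  # -3.0
```

Keeping raw observations and computing trends from them, rather than storing precomputed trends, leaves the history intact for later re-analysis.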

Data model building
The basic factors and indicators associated with cognitive diseases occurring in senior age were systematically analysed based on an in-depth literature review. The data model is designed to provide basic data for follow-up work on precise diagnosis and prognosis of disease development in patients, as well as for generalized conclusions about dependencies and relations in the course of the disease.
The approach for building the data model includes the following steps: a) Determining groups of factors and indicators relevant to risk assessment, diagnosis of the disease and its development.
During the building of the data model, the following difficulties were encountered:
• Lack of previous centralized digital records of patients in Bulgaria diagnosed with cognitive degenerative diseases in senior age.
• Limited access to existing relevant public digital databases.
• Ambiguous terminology, which complicated the gap analysis.
The presented data model claims to be innovative in the completeness and complexity of its description (at the existing levels, correlations and relations) of the captured data.

Data model formalization
The main entities of the data model are specified using Unified Modelling Language (UML), while the whole model is implemented in YAML Ain't Markup Language (YAML). UML is a standardized modelling language that mainly uses graphical notations to create and exchange meaningful models. It is independent of the development process and supports extensibility of the models through extension of its core concepts. YAML is a data serialization language designed to be friendly and useful to people working with data. It is portable between programming languages, expressive and extensible, and supports serialization of native data structures. Since YAML is human readable, data models written in it can easily be validated by non-technical experts, as in the case of the current data model.

Data model validation
A clinical expert in the field of cognitive diseases evaluated and curated the data model. He was selected based on his clinical experience in leading neurological departments treating neurodegenerative conditions in Sofia, Bulgaria, and on his research on cognitive dysfunction in various neurological diseases.

Results
The data model defines the data entities, attributes and relationships needed to create a patient's profile. It is developed to unify and structure the information relevant to the four main domains of the profile: personal data, medical history data, objective clinical investigations and prescribed treatment. The model captures early-life, midlife and late-life risk and protective factors for dementia. Figure 2 presents a UML diagram of the main entities of the data model.
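The four-domain structure can be pictured as nested records. The sketch below mirrors that hierarchy with Python dataclasses; the class and field names are our own illustrative choices (the authoritative entity definitions are the UML diagram in Figure 2 and the YAML files), and plain dicts and lists stand in for the detailed entities.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PersonalProfile:
    patient: Dict[str, Any] = field(default_factory=dict)             # Patient entity


@dataclass
class AnamnesticProfile:
    social_status: Dict[str, Any] = field(default_factory=dict)
    habits: Dict[str, Any] = field(default_factory=dict)
    head_traumas: List[Dict[str, Any]] = field(default_factory=list)  # zero or more events


@dataclass
class ClinicalProfile:
    imaging: List[Dict[str, Any]] = field(default_factory=list)
    neuro_assessments: List[Dict[str, Any]] = field(default_factory=list)
    csf_biomarkers: List[Dict[str, Any]] = field(default_factory=list)
    blood_tests: List[Dict[str, Any]] = field(default_factory=list)
    genetic_data: Dict[str, Any] = field(default_factory=dict)
    comorbidities: List[str] = field(default_factory=list)
    most_likely_diagnosis: Optional[str] = None


@dataclass
class TreatmentProfile:
    prescriptions: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class PatientRecord:
    """Top level of the hierarchy: one record per patient, four profiles."""
    personal: PersonalProfile = field(default_factory=PersonalProfile)
    anamnestic: AnamnesticProfile = field(default_factory=AnamnesticProfile)
    clinical: ClinicalProfile = field(default_factory=ClinicalProfile)
    treatment: TreatmentProfile = field(default_factory=TreatmentProfile)


record = PatientRecord()
record.clinical.most_likely_diagnosis = "mixed dementia"
print(list(asdict(record)))  # ['personal', 'anamnestic', 'clinical', 'treatment']
```

Giving each profile its own type makes the hierarchy explicit and lets a mapping layer target one profile at a time.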

Personal profile
The personal profile includes the patient's personal data, modelled as a Patient entity with attributes such as date of birth, gender, race, ethnicity and native language. Figure 3 shows the personal profile defined in YAML format. The remaining factors and indicators in the data model are defined in a similar manner.
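As a rough illustration of how an entity definition can be kept human-readable, the sketch below declares the Patient attributes named in the text and emits them as a small YAML mapping of attribute names to types. The layout (an `attributes` key, Python type names as values), the attribute types and the `schema_to_yaml` helper are assumptions and need not match the YAML shown in Figure 3.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Patient:
    """Attributes of the Patient entity as listed in the text; the types are assumed."""
    date_of_birth: date
    gender: str
    race: str
    ethnicity: str
    native_language: str


def schema_to_yaml(entity) -> str:
    """Render a flat dataclass's fields as a YAML block (hypothetical helper)."""
    lines = [f"{entity.__name__}:", "  attributes:"]
    for f in fields(entity):
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f"    {f.name}: {type_name}")
    return "\n".join(lines)


print(schema_to_yaml(Patient))
```

In practice a YAML library would handle serialization; the point here is only that the entity definition stays readable for a clinical reviewer.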
Age-related cognitive decline is influenced by midlife and old-age risk factors [8]. For example, older women are more likely to develop cognitive impairment than men of the same age. Differences in the development of dementia have decreased between Black and White people and increased between Hispanic and White people [9].

Anamnestic profile
The medical history data relate to the patient's social status, everyday habits and any head traumas, which are modelled as separate entities. The Social status entity is defined with the attributes years of education, marital state, coexistence (living alone, with a caregiver or in a specialized institution), employment, financial state and computer literacy. Dementia risk is reduced by higher childhood education levels and lifelong learning [8,10]. Recent studies show that cognitive stimulation is more important in early life [54]. One reason could be that people with higher cognitive function seek out cognitively stimulating activities and education [11]. Similarly, there is a relation between cognitive function and employment: people in cognitively demanding jobs tend to show less cognitive decline before, and sometimes after, retirement [12]. Social contact might be considered a factor that reduces the risk of dementia, since it enhances cognitive function. Marital state contributes to social engagement, since married people usually have more interpersonal contact. People living alone and widowed people are at a higher risk of dementia [13].
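The Social status entity maps naturally onto coded categories. In the sketch below the three coexistence options come from the text, while the marital-state categories, the field types and the `has_social_risk_marker` helper (which flags the two higher-risk situations mentioned above, living alone or being widowed) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Coexistence(Enum):
    """Living arrangement; the three options are the ones listed in the text."""
    ALONE = "living alone"
    WITH_CAREGIVER = "with a caregiver"
    INSTITUTION = "in a specialized institution"


class MaritalState(Enum):
    """Assumed category set; the text does not enumerate marital states."""
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"
    WIDOWED = "widowed"


@dataclass(frozen=True)
class SocialStatus:
    years_of_education: int
    marital_state: MaritalState
    coexistence: Coexistence
    employed: bool
    financial_state: str        # free text here; a coded scale would be an alternative
    computer_literate: bool

    def __post_init__(self) -> None:
        if self.years_of_education < 0:
            raise ValueError("years_of_education must be non-negative")

    def has_social_risk_marker(self) -> bool:
        """True if the patient lives alone or is widowed (hypothetical helper)."""
        return (self.coexistence is Coexistence.ALONE
                or self.marital_state is MaritalState.WIDOWED)


status = SocialStatus(12, MaritalState.WIDOWED, Coexistence.WITH_CAREGIVER,
                      employed=False, financial_state="average", computer_literate=False)
print(status.has_social_risk_marker())  # True
```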
The Habits entity has attributes such as smoking, alcohol consumption, physical activity and sport, diet, drug abuse, and duration and quality of sleep. There is strong evidence that smoking increases the risk of developing dementia and that quitting reduces this risk, even in old age [14]. Similarly, excessive alcohol consumption leads to brain changes and increased dementia risk [15]. Drinking more than 21 units per week is associated with a higher risk than drinking less than 14 units [16], and drinking more than 14 units per week might be related to right-sided hippocampal atrophy on MRI [17]. Regarding diet, recent studies focus on nutrition as a whole rather than on particular dietary ingredients. According to WHO guidelines, a Mediterranean diet reduces the risk of cognitive decline or dementia [18]. Sleep disturbance has been linked with Aβ deposition, low-grade inflammation, reduced activation of glymphatic clearance pathways, increased tau, cardiovascular disease and hypoxia [8]. It could be part of the natural history of the dementia syndrome and could also be considered a risk factor. There is evidence that higher physical activity reduces the risk of dementia, but the interpretation depends on other factors such as age, gender, social class, comorbidity and cultural differences. Inactivity might be a prerequisite or a consequence of dementia. According to WHO, physical activity has a small effect on normal cognition but a more significant one on mild cognitive impairment [18].
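The alcohol thresholds above lend themselves to a derived attribute. The sketch below stores weekly intake in units inside a trimmed-down Habits record and buckets it into the bands used by the cited studies (<14, 14-21 and >21 units per week). Placing exactly 14 and 21 units in the middle band is our choice, since the text only speaks of "more than" and "less than"; the field names and types are hypothetical, and the bands are a data-modelling convenience, not a clinical risk score.

```python
from dataclasses import dataclass
from typing import Optional


def alcohol_band(units_per_week: float) -> str:
    """Bucket weekly intake using the 14- and 21-unit thresholds cited in the text."""
    if units_per_week < 0:
        raise ValueError("units_per_week must be non-negative")
    if units_per_week < 14:
        return "<14"
    if units_per_week <= 21:
        return "14-21"
    return ">21"


@dataclass
class Habits:
    """Subset of the Habits attributes; names and types are assumptions."""
    smoker: bool
    alcohol_units_per_week: float
    physically_active: bool
    mediterranean_diet: bool
    sleep_hours: Optional[float] = None

    @property
    def alcohol_category(self) -> str:
        # Delegates to the module-level alcohol_band() function.
        return alcohol_band(self.alcohol_units_per_week)


habits = Habits(smoker=False, alcohol_units_per_week=18, physically_active=True,
                mediterranean_diet=False)
print(habits.alcohol_category)  # 14-21
```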
The Head Trauma entity is described by the date of the event, its severity and the availability of head imaging at the time of the event: computed tomography (CT) or magnetic resonance imaging (MRI). Severe head trauma, caused by incidents, military exposure, recreational sports, firearms and other causes, is associated with widespread hyperphosphorylated tau pathology and increased dementia risk [19,20].
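Because a patient may have several head traumas, the entity is naturally a list of dated events. In the sketch below the three attributes follow the text (date, severity, imaging at the time of the event), but the three-level severity scale and the `severe_events` helper are assumptions; the helper selects severe events, the subgroup the text links to increased dementia risk.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Iterable, List


class Severity(Enum):
    """Assumed three-level scale; the text does not name the severity categories."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3


class HeadImaging(Enum):
    """Imaging available at the time of the event."""
    NONE = "none"
    CT = "CT"
    MRI = "MRI"


@dataclass(frozen=True)
class HeadTrauma:
    event_date: date
    severity: Severity
    imaging: HeadImaging = HeadImaging.NONE


def severe_events(events: Iterable[HeadTrauma]) -> List[HeadTrauma]:
    """Severe traumas only, in chronological order (hypothetical helper)."""
    return sorted((e for e in events if e.severity is Severity.SEVERE),
                  key=lambda e: e.event_date)


events = [
    HeadTrauma(date(2015, 8, 3), Severity.SEVERE, HeadImaging.CT),
    HeadTrauma(date(1998, 1, 20), Severity.MILD),
]
print(len(severe_events(events)))  # 1
```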

Clinical profile
The Clinical profile is described by data about medical investigations and assessments, comorbidities and their severity and, ultimately, the most likely diagnosis. It covers six aspects of diagnostics: Imaging, Neuropsychological and neuropsychiatric assessment, Cerebrospinal fluid biomarkers, Blood tests, Genetic data and Comorbidities.
Imaging, including CT and MRI scans, is used to register the presence and severity of both neurodegenerative and vascular disease. MRI data might be assessed with several scales and scores: the global cortical atrophy scale (GCA), the medial temporal lobe atrophy score (MTA) and the posterior atrophy score of parietal atrophy (PCA, Koedam score) for neurodegeneration, and the Fazekas scale of white matter lesions for vascular damage.
Neuropsychological and neuropsychiatric assessment is based on different tests for evaluating cognitive domains (Table 2, rows 1-5) and neuropsychiatric state (Table 2, rows 6-10).

Table 2. Neuropsychological and neuropsychiatric assessment (rows 8-10)
No. | Assessed aspect | Instrument
8 | Presence and severity of fatigue | Fatigue Severity Scale (FSS)
9 | Presence and severity of neuropsychiatric and behavioural changes | Neuropsychiatric Inventory (NPI) questionnaire
10 | Possibility for independent daily functioning | Lawton Instrumental Activities of Daily Living (IADL) scale

Cerebrospinal fluid biomarkers are used to measure total tau protein (T-tau), phosphorylated tau protein (P-tau), beta-amyloid peptide 1-42 (Aβ42), the P-tau/T-tau ratio and the Aβ42/Aβ40 ratio. Their interpretation is closely related to the patient's Personal profile, for example age and gender. Amyloid and tau indicate an increased risk of developing cognitive impairment in older adults [8]. People with cognitive impairment and negative amyloid results are unlikely to be diagnosed with Alzheimer's disease in the next few years.
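Of the cerebrospinal fluid measures, the two ratios are derived values and can be computed rather than stored. The sketch assumes that Aβ40 is captured alongside the other markers (it is needed for the Aβ42/Aβ40 ratio but is not listed as a separate attribute) and that all concentrations share one unit; the class and field names are hypothetical. Clinical cut-offs are deliberately left out, because they depend on the assay and the laboratory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsfBiomarkers:
    """CSF measurements for one sample; all concentrations in one unit (assumed pg/mL)."""
    t_tau: float
    p_tau: float
    abeta42: float
    abeta40: float      # assumed extra field, required for the Aβ42/Aβ40 ratio

    def __post_init__(self) -> None:
        for name in ("t_tau", "p_tau", "abeta42", "abeta40"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

    @property
    def p_tau_t_tau_ratio(self) -> float:
        return self.p_tau / self.t_tau

    @property
    def abeta42_abeta40_ratio(self) -> float:
        return self.abeta42 / self.abeta40


sample = CsfBiomarkers(t_tau=400.0, p_tau=60.0, abeta42=600.0, abeta40=8000.0)
print(sample.p_tau_t_tau_ratio, sample.abeta42_abeta40_ratio)  # 0.15 0.075
```

Deriving the ratios from the stored measurements keeps them consistent if a measurement is later corrected.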
Genetic data describe the presence of a family history of dementia, any diagnosed dementia-related mutations and the APOE genotype. Age is one of the most significant risk factors for dementia, and people in their late 70s and 80s are at the greatest risk of developing it. However, if somebody develops dementia at an earlier age, there is a greater chance that it is a form of the disease that can be passed on. APOE ε4 is associated with Alzheimer's disease in people with a family history of dementia, and this association is strongest for individuals who carry two APOE ε4 alleles (ε4/ε4 genotype).
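Since the association described above depends on how many ε4 alleles a person carries, a data model can store the genotype as text and derive the ε4 count (0, 1 or 2). The sketch assumes genotypes are written as two alleles separated by a slash, with either "ε" or a plain "e"; the `count_e4_alleles` helper is hypothetical.

```python
def count_e4_alleles(genotype: str) -> int:
    """Number of APOE ε4 alleles in a genotype such as 'ε3/ε4' or 'e4/e4'."""
    alleles = [a.strip().replace("ε", "e") for a in genotype.split("/")]
    if len(alleles) != 2 or any(a not in {"e2", "e3", "e4"} for a in alleles):
        raise ValueError(f"unrecognised APOE genotype: {genotype!r}")
    return alleles.count("e4")


for g in ("ε3/ε3", "ε3/ε4", "ε4/ε4"):
    print(g, count_e4_alleles(g))
```

Rejecting malformed values at load time, rather than silently counting zero, keeps missing or mistyped genotypes distinguishable from true non-carriers.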
Blood tests include levels of vitamin B12, vitamin B9 (folate), thyroid-stimulating hormone, cholesterol, haemoglobin, glucose and creatinine, the erythrocyte sedimentation rate, etc. Observational studies show that folate and other B vitamins, vitamins C, D and E, and selenium are potential protective factors against cognitive decline [21]. Apolipoprotein E is a protein that transports fats and cholesterol in the blood; it is encoded by the APOE gene.
Comorbidities record the presence of arterial hypertension, dyslipidaemia, diabetes, obesity, atrial fibrillation, carotid stenosis, autoimmune diseases and autoimmune vasculitis, psychiatric diseases or other comorbid diseases. They may accelerate the progression of dementia [22]. For example, cognitive decline may be exacerbated in older people with type 2 diabetes [23], and persistent midlife hypertension correlates with an increased risk of late-life dementia [8]. In addition, the presence of a cognitive disorder could affect and complicate the clinical care of other comorbid conditions [24].
The patient's most likely dementia diagnosis is recorded as one of the following diseases: Alzheimer's disease, vascular dementia, mixed dementia, Lewy body dementia, frontotemporal dementia, Parkinson's dementia or Parkinson's plus syndromes.
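Both the comorbidities and the most likely diagnosis are closed lists, which makes them natural enumerations: an unlisted label then fails at load time instead of entering the dataset as free text. The enum values below are the labels from the text; the `DiagnosticSummary` class is a hypothetical container, and per-comorbidity severity is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Diagnosis(Enum):
    """Most likely diagnosis; values are the seven options listed in the text."""
    ALZHEIMERS = "Alzheimer's disease"
    VASCULAR = "vascular dementia"
    MIXED = "mixed dementia"
    LEWY_BODY = "Lewy body dementia"
    FRONTOTEMPORAL = "frontotemporal dementia"
    PARKINSONS = "Parkinson's dementia"
    PARKINSON_PLUS = "Parkinson's plus syndromes"


class Comorbidity(Enum):
    """Comorbidities named in the text."""
    ARTERIAL_HYPERTENSION = "arterial hypertension"
    DYSLIPIDAEMIA = "dyslipidaemia"
    DIABETES = "diabetes"
    OBESITY = "obesity"
    ATRIAL_FIBRILLATION = "atrial fibrillation"
    CAROTID_STENOSIS = "carotid stenosis"
    AUTOIMMUNE_DISEASE = "autoimmune disease"
    AUTOIMMUNE_VASCULITIS = "autoimmune vasculitis"
    PSYCHIATRIC_DISEASE = "psychiatric disease"
    OTHER = "other"


@dataclass
class DiagnosticSummary:
    """Hypothetical container; severity per comorbidity is left out of this sketch."""
    comorbidities: FrozenSet[Comorbidity] = frozenset()
    most_likely_diagnosis: Optional[Diagnosis] = None


summary = DiagnosticSummary(
    comorbidities=frozenset({Comorbidity.DIABETES, Comorbidity.ARTERIAL_HYPERTENSION}),
    most_likely_diagnosis=Diagnosis("mixed dementia"),  # lookup by value; unknown labels raise ValueError
)
```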

Treatment profile
The Treatment profile relates to the medications prescribed to the patient. The medications are divided into the following groups: medications for degenerative cognitive disorders, medications for cerebrovascular disease, antiplatelet and anticoagulant drugs, neuroleptic drugs, antidepressant drugs and sleep medications. Each medication is described by its date of prescription and its daily dosage in milligrams.
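With each medication stored as a dated prescription with a daily dose in milligrams, the regimen in effect on a given day can be reconstructed from the prescription history. The sketch assumes that a newer prescription for the same drug replaces the older one; the group labels follow the text, while the drug identifiers and the `regimen_as_of` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, Iterable


class MedicationGroup(Enum):
    """The six medication groups listed in the text."""
    DEGENERATIVE_COGNITIVE = "degenerative cognitive disorders"
    CEREBROVASCULAR = "cerebrovascular disease"
    ANTIPLATELET_ANTICOAGULANT = "antiplatelet and anticoagulant"
    NEUROLEPTIC = "neuroleptic"
    ANTIDEPRESSANT = "antidepressant"
    SLEEP = "sleep"


@dataclass(frozen=True)
class Prescription:
    drug: str                 # hypothetical drug identifiers in the example below
    group: MedicationGroup
    prescribed_on: date
    daily_dose_mg: float

    def __post_init__(self) -> None:
        if self.daily_dose_mg <= 0:
            raise ValueError("daily_dose_mg must be positive")


def regimen_as_of(prescriptions: Iterable[Prescription], as_of: date) -> Dict[str, float]:
    """Daily dose per drug on a given date, assuming a newer prescription replaces an older one."""
    latest: Dict[str, Prescription] = {}
    for p in prescriptions:
        if p.prescribed_on > as_of:
            continue  # not yet prescribed on that date
        current = latest.get(p.drug)
        if current is None or p.prescribed_on > current.prescribed_on:
            latest[p.drug] = p
    return {drug: p.daily_dose_mg for drug, p in latest.items()}


rx = [
    Prescription("drug_A", MedicationGroup.DEGENERATIVE_COGNITIVE, date(2022, 1, 10), 5.0),
    Prescription("drug_A", MedicationGroup.DEGENERATIVE_COGNITIVE, date(2022, 3, 1), 10.0),
    Prescription("drug_B", MedicationGroup.ANTIDEPRESSANT, date(2022, 2, 1), 50.0),
]
print(regimen_as_of(rx, date(2022, 2, 15)))  # {'drug_A': 5.0, 'drug_B': 50.0}
```

Because the entity records only a prescription date and no end date, this reconstruction cannot tell when a drug was discontinued.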
Neuroleptic drugs are sometimes used to treat the behavioural complications of dementia; at the same time, they may worsen already poor cognitive function [25]. Regardless of sleep duration, people taking hypnotics are at greater risk of dementia than those who do not; for example, benzodiazepines are associated with falls and possibly dementia [26]. Antidepressants are widely used to treat anxiety and depression. Recent studies show that antidepressant therapy significantly increases the risk of dementia [27]. Along with antidepressants, urological and antiparkinsonian drugs with definite anticholinergic activity are associated with future development of dementia, with associations persisting up to 20 years after exposure [28]. In contrast, antiplatelet drugs, which are well tolerated in stroke prevention, may decrease the risk of vascular dementia [29].

Conclusions and future work
Cognitive disorders can result from many medical conditions, such as single or repeated head injuries, infections, toxicity, substance abuse, and benign and malignant brain tumours, but the majority are caused by neurodegenerative diseases, vascular damage to the brain and various combinations of the two. The medical investigations and assessments related to the diagnosis and treatment of cognitive disorders are highly varied, including medical imaging, neuropsychological and neuropsychiatric assessment, cerebrospinal fluid biomarkers, blood tests, etc. This diversity of factors and indicators calls for a comprehensive data model describing the history and status of patients with cognitive decline, such as the one proposed in this study. The main contributions and benefits can be summarized as follows:
• A common, actionable and shareable data model is elaborated, allowing researchers to conduct research efficiently using available datasets of patients with cognitive disorders.
• The data model delivers a native vocabulary, which improves data availability, saves effort and standardizes data to minimize the risk of rework and misunderstandings.
• The data model enables the application of Machine Learning (ML) and Artificial Intelligence (AI) algorithms by helping data scientists understand the semantics of the information through a holistic view of the patient.
• The data model accelerates the interoperability of clinical datasets and supports content mapping by classifying and semantically annotating the data.
Future work includes developing an ontology of cognitive diseases on top of the proposed data model. The ontology will be the basis for implementing a graph database that integrates data from different sources for advanced data analytics. A dedicated Extract, Transform, Load (ETL) process is planned to import data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.
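The planned ETL process, like the intended use of the model as middleware between data models, comes down to mapping source fields onto model attributes. The sketch below shows only the Transform part under stated assumptions: the source column names (`sex`, `educ_years`, `apoe_genotype`, `site`) are hypothetical stand-ins, not actual ADNI field names, and the target attribute names are illustrative. Unmapped columns are reported rather than silently dropped so that the mapping can be reviewed.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical source columns -> (profile, attribute) in the common model.
FIELD_MAP: Dict[str, Tuple[str, str]] = {
    "sex": ("personal", "gender"),
    "educ_years": ("anamnestic", "years_of_education"),
    "apoe_genotype": ("clinical", "apoe_genotype"),
}

PROFILES = ("personal", "anamnestic", "clinical", "treatment")


def transform(row: Dict[str, Any]) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Map one source record onto the four profiles; also return unmapped column names."""
    record: Dict[str, Dict[str, Any]] = {p: {} for p in PROFILES}
    unmapped: List[str] = []
    for column, value in row.items():
        target = FIELD_MAP.get(column)
        if target is None:
            unmapped.append(column)
            continue
        profile, attribute = target
        record[profile][attribute] = value
    return record, sorted(unmapped)


record, unmapped = transform({"sex": "F", "educ_years": 12, "apoe_genotype": "ε3/ε4", "site": 7})
print(record["anamnestic"], unmapped)  # {'years_of_education': 12} ['site']
```

Keeping the mapping in a plain table such as `FIELD_MAP` lets each new source be added by extending the table, with the common model left unchanged.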