Innovative use of data sources: a cross-sectional study of data linkage practices across European countries

Background: The availability of data generated from different sources is increasing, as is the possibility of linking these data sources together. However, linked administrative data can be complex to use and may require advanced expertise and skills in statistical analysis. The main objectives of this study were to describe the current use of individual-level data linkage and artificial intelligence (AI) in routine public health activities, and to identify the related health outcome and intervention indicators and determinants of health for non-communicable diseases.

Method: We performed a survey across European countries to explore the current practices applied by national institutes of public health and of health information and statistics for the innovative use of data sources (i.e., the use of data linkage and/or AI).

Results: The use of data linkage and AI at national institutes of public health and of health information and statistics in Europe varies. The majority of European countries use data linkage routinely, applying either a deterministic method or a combination of deterministic and probabilistic linkage, for public health surveillance and research purposes. The use of AI to estimate health indicators is infrequent at these institutes. Using linked data, 46 health outcome indicators related to seven health conditions, 34 indicators related to determinants of health and 23 health intervention indicators were estimated routinely. European countries reported complex data regulation laws, a lack of human resources and skills, and problems with data governance as obstacles to routinely linking different data sources for public health surveillance and research.

Conclusions: Our results highlight that the majority of European countries have integrated data linkage into routine public health activities, but few use AI. A sustainable national health information system and a robust data governance framework that allows different data sources to be linked are essential to support an evidence-informed health policy development process. Building analytical capacity and awareness of the added value of data linkage in national institutes is necessary to improve the use of linked data and, in turn, the monitoring of public health activities.

Background
The availability of administrative data generated from different sources is increasing, and the possibility of linking these data sources with other databases offers unique opportunities to answer research questions that require a large sample size or detailed data on hard-to-reach populations [1]. Data linkage can generate evidence with a high level of external validity and applicability for policymaking [1]. Collected over long periods, these population data (i.e., linked administrative data) can provide high statistical power, reduce methodological issues relating to attrition, recall bias and loss to follow-up [2], allow more detailed stratified analyses of subgroups (e.g., by age or geographical region), and provide rapid access to data collected in a standardized format [3][4][5].
The value of any surveillance system ultimately depends on timely and reliable information [6]. Several data sources are used for public health surveillance, for example, health interview and examination surveys, disease-specific registries, epidemiological cohort studies, hospital discharge data, health insurance claims and mortality databases. Traditional data sources (e.g., health interview and examination surveys, disease-specific registries) and administrative data sources (e.g., hospital discharge data, health insurance claims, cause-of-death data) complement each other and can increase the completeness and comprehensiveness of health information by capturing various dimensions of health and the risk factors that influence health status directly and indirectly.
Linking various data sources improves the completeness and comprehensiveness of information to guide the health policy process, effective patient care and health services management [7]. Data linkage is an important technique that connects detailed individual-level information from different data sources. This methodology enhances the capacity to study disease burden and progression, risk factors, care pathways and long-term outcomes for public health research and health surveillance [1]. However, linked administrative data can be complex to use and may require advanced expertise and skills in statistical analysis [8]. Efficiently generating comparable and timely health information across the European Union (EU), the European Economic Area (EEA) and other European countries requires performing data linkage and applying AI to estimate health indicators. Many countries have already invested in data linkage to improve their health information systems [9], but there are wide differences in capacity across European countries to perform data linkage routinely.
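To make individual-level linkage concrete, the following is a minimal Python sketch of deterministic linkage, i.e., joining records that agree exactly on a shared identifier. The `Record` class, the `normalise_id` and `deterministic_link` helpers, the identifier format and the example records are all hypothetical and chosen only for illustration; real linkage projects add identifier validation, de-duplication and privacy safeguards that are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    person_id: str  # hypothetical unique personal identifier
    source: str     # hypothetical source label, e.g. "hospital_discharge"
    payload: dict = field(default_factory=dict)


def normalise_id(raw: str) -> str:
    """Remove spaces and hyphens and upper-case the ID so formatting differences do not block a match."""
    return raw.replace(" ", "").replace("-", "").upper()


def deterministic_link(left, right):
    """Pair each record in `left` with every record in `right` sharing the exact normalised ID.

    Returns (linked_pairs, unmatched_left).
    """
    index = {}
    for rec in right:
        index.setdefault(normalise_id(rec.person_id), []).append(rec)

    linked, unmatched = [], []
    for rec in left:
        matches = index.get(normalise_id(rec.person_id), [])
        if matches:
            linked.extend((rec, m) for m in matches)
        else:
            unmatched.append(rec)
    return linked, unmatched


# Hypothetical example: hospital discharges linked with a mortality database.
hospital = [
    Record("123-45", "hospital_discharge", {"diagnosis": "I21"}),
    Record("999-99", "hospital_discharge", {"diagnosis": "E11"}),
]
deaths = [Record("12345", "mortality", {"cause": "I21"})]

pairs, unmatched = deterministic_link(hospital, deaths)
print(len(pairs), len(unmatched))  # 1 1
```

Records left unmatched by an exact-key pass like this are the ones a combined strategy would typically hand on to a probabilistic step.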
We explored the differential use of data linkage in routine health monitoring across European countries, based on the latest developments in new methods and analysis. This study was carried out within InfAct (Information for Action) [10], a joint action of Member States that aims to develop a more sustainable EU health information system by improving the availability of comparable, robust and policy-relevant health status data and health system performance information. InfAct gathers 40 national health authorities from 28 Member States (MSs). This study is part of a work package (WP9) focused on innovation in health information systems (i.e., using data linkage and/or AI) to improve public health surveillance and health system performance for the health policy development process. The main objectives of this study were (1) to describe the current use of individual-level data linkage and the AI techniques applied in routine public health activities, and (2) to identify the relevant health outcome and intervention indicators estimated and the determinants of health for non-communicable diseases.

Methodology
We performed the following steps to achieve the objectives of this study.

Literature search
We reviewed the existing literature published on the use of data linkage and AI (e.g., machine learning techniques) for health status monitoring, searching PubMed on Dec. 1, 2018. We included peer-reviewed articles, systematic reviews and reports published in English. The search strategies used are reported in Additional file 1. Based on the review, we identified the different data sources used for data linkage, the uses of AI, health outcome and intervention indicators, and determinants of health (Additional file 2). This was not an exhaustive search; it was performed only to identify any existing questionnaires or other relevant information that could be used to develop a questionnaire on current practices in the innovative use of data sources across European countries.

Definition of innovative use of data sources
We developed a definition of "innovative use of data sources" in the context of public health and health information systems: the linkage of different data sources (health surveys and/or disease-specific or population-based registries and/or national cohorts and/or clinical research datasets and/or administrative data and/or electronic health records and/or X-data sources, i.e., information on determinants of health, which can include data on various exposures [Additional file 2]) with each other using linkage technology, and/or the use of AI on either linked data or an individual dataset, allowing a better understanding of what determines population health, or promoting the efficiency of the health system and guiding decision making at different geographical levels or at the level of other categorization parameters.
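The definition leaves the linkage technology open. Where records lack a reliable shared identifier, probabilistic linkage is commonly scored with the Fellegi-Sunter approach, sketched minimally below in Python. All field names, m/u probabilities and decision thresholds are hypothetical assumptions; in practice m and u are estimated from the data and the thresholds are tuned against clerical review. This is not presented as the method used by any surveyed country.

```python
import math

# Hypothetical m/u probabilities per comparison field (assumed values, not study estimates):
# m = P(field agrees | same person), u = P(field agrees | different persons).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.90, 0.05),
}


def match_weight(rec_a: dict, rec_b: dict, probs=FIELD_PROBS) -> float:
    """Fellegi-Sunter total weight: sum of log2 likelihood ratios over compared fields.

    Fields missing from either record are skipped (they contribute 0).
    """
    total = 0.0
    for name, (m, u) in probs.items():
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Three-way decision with hypothetical thresholds: link, clerical review, or non-link."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"


a = {"surname": "SMITH", "birth_date": "1950-03-02", "sex": "F", "postcode": "1000"}
b = {"surname": "SMITH", "birth_date": "1950-03-02", "sex": "F", "postcode": "1050"}
c = {"surname": "JONES", "birth_date": "1962-11-20", "sex": "F", "postcode": "1000"}

print(classify(match_weight(a, b)))  # link     (weight about 12.7)
print(classify(match_weight(a, c)))  # non-link (weight about -4.8)
```

In a combined deterministic and probabilistic strategy, a scoring step of this kind would usually run only on records that an exact-identifier pass could not link.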

Development of web-based survey
We developed a questionnaire and requested information on data sources used for

Study outcomes
The main outcomes of this study were the current practices in data linkage and AI, and the related health indicators estimated in routine public health activities across European countries. A descriptive analysis of the web-based questionnaire results was performed using Microsoft Excel.
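The descriptive analysis was done in Microsoft Excel; for a scripted workflow, the same kind of frequency-and-percentage tabulation for a single-choice question can be sketched in Python as below. The country labels, answer categories and the `describe` helper are hypothetical placeholders, not the survey's actual data or coding.

```python
from collections import Counter

# Hypothetical answers to one single-choice survey question (illustrative only, not the study's data).
responses = {
    "Country A": "deterministic",
    "Country B": "deterministic & probabilistic",
    "Country C": "deterministic",
    "Country D": "no routine linkage",
}


def describe(answers: dict) -> dict:
    """Return {category: (count, percentage of respondents)}, most frequent first."""
    counts = Counter(answers.values())
    n = len(answers)
    return {cat: (k, round(100 * k / n, 1)) for cat, k in counts.most_common()}


for category, (count, pct) in describe(responses).items():
    print(f"{category}: {count} ({pct}%)")
```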

Literature search
We reviewed 137 citations from PubMed and four reports from the following organizations:
Ongoing projects in the following countries aim to integrate AI into routine public health activities within the next five years: Croatia, Czech Republic, France, Germany, Norway, Portugal and Spain. These initiatives target epidemiological research and the surveillance of non-communicable and communicable diseases, estimating the prevalence and predicting the incidence of certain health conditions at various geographical levels.
Two countries reported that AI techniques are not applied in routine public health activities because their public health institutes lack human resources (Lithuania) or capacities/skills (Republic of Serbia).
Some European countries also reported using classical statistical techniques without AI (Table 2).

Health indicators estimated using linked data
Using linked data, the majority of European countries estimate the following health indicators:

Health outcome indicators
Participants were asked to select at least three health conditions and to report the related health outcome indicators which are most important for public health in their country.
Using linked data, 46 health outcome indicators related to the following seven health conditions were reported from 22 countries: cardiovascular disease (14), neurodegenerative disease (6), maternal and perinatal health (6), diabetes (6), suicide/trauma/injury (7), cancer (6) and hepatic failure (1) (Table 3.1). These indicators were estimated mainly for public health monitoring and research purposes, at national and sub-national levels.
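As a quick arithmetic check, the per-condition counts listed above can be tallied in Python; the figures are taken directly from the text, while the dictionary layout and the share calculation are only an illustrative sketch.

```python
# Outcome-indicator counts per health condition, as given in the text (Table 3.1).
outcome_indicators = {
    "cardiovascular": 14,
    "neurodegenerative disease": 6,
    "maternal and perinatal health": 6,
    "diabetes": 6,
    "suicide/trauma/injury": 7,
    "cancer": 6,
    "hepatic failure": 1,
}

total = sum(outcome_indicators.values())
print(len(outcome_indicators), total)  # 7 46

# Share of all reported outcome indicators contributed by each condition, largest first.
for condition, n in sorted(outcome_indicators.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{condition}: {n} ({100 * n / total:.1f}%)")
```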

Health determinants
For the health determinants, participants were asked to report the determinants corresponding to the identified health conditions. In total, 34 health determinants related to various health conditions were reported by 15 Member States (Table 3.2). These determinants relate to the physical environment (12), the socioeconomic environment (10), health behavior and lifestyle (6) and biological and metabolic parameters (3). They were used to measure potential associations between these risk factors and health conditions for public health monitoring and research purposes, and can be stratified by age, sex, socioeconomic status and area of residence.

Health intervention indicators
Participants were asked to report at least three health intervention indicators under three categories (i.e., prevention, promotion, others) corresponding to the given health conditions that are most important for public health in their country. Using linked data, 23 health intervention indicators related to the following seven health conditions were reported from 17 Member States: maternal and perinatal health (7), cancer (6), diabetes (4), cardiovascular disease (2), neurodegenerative disease (2), suicide/trauma/injury (1) and lower/upper respiratory infections (1) (Table 3.3). These indicators were estimated mainly to guide the health policy process, for public health monitoring and for research purposes. They are estimated mainly at national and sub-national levels and are currently in use.
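The per-condition counts of intervention indicators listed above can be tallied the same way; the numbers come from the text, and the share held by the two largest condition groups is an illustrative addition.

```python
# Intervention-indicator counts per health condition, as given in the text (Table 3.3).
intervention_indicators = {
    "maternal and perinatal health": 7,
    "cancer": 6,
    "diabetes": 4,
    "cardiovascular": 2,
    "neurodegenerative disease": 2,
    "suicide/trauma/injury": 1,
    "lower/upper respiratory infections": 1,
}

n_conditions = len(intervention_indicators)
n_indicators = sum(intervention_indicators.values())
print(n_conditions, n_indicators)  # 7 23

# Share of all intervention indicators held by the two largest condition groups.
top_two = sorted(intervention_indicators.values(), reverse=True)[:2]
print(f"{100 * sum(top_two) / n_indicators:.1f}%")  # 56.5%
```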

Discussion
The results of this study showed variability in the use of data linkage and AI at national institutes of public health and of health information and statistics across European countries. A systematic review has described some practices applied for data linkage in the field of perinatal health across Europe for health surveillance and research purposes [9]. Several other studies have explored various aspects of population health, such as social care, psychotic disorders, multimorbidity, diabetes, obesity, mental health, cardiovascular disease, antibiotic use and Alzheimer's disease, using data linkage with different types of administrative data sources (both health-related and non-health) [7,17-29]. For cancer surveillance, data linkage not only provides the opportunity to improve population-based screening [30] but also helps detect different types of cancer recurrence [31] and evaluate the socio-economic status of patients with cancer (e.g., return to work) [32]. Linked data also allow interventions to be evaluated at various levels of the population [33]. The diversity and volume of health information have been increasing rapidly, pushing the search for new parameters to improve population health with innovative approaches. In this context, some initiatives have been launched at the national level to create health data hubs/platforms to be used for research and to guide the policy development process [34,35].
Some studies have discussed the advantages of using AI in early
To address these gaps, we propose the following recommendations: A. Legal aspects: 1.

Consent for publication
All authors gave consent for publication.

Availability of data and materials
Not applicable

Competing interests
All other authors declare that they have no competing interests related to the work.

Funding
This research has been carried out in the context of the project '801553 / InfAct', which has received funding from the European Union's Health Programme (2014-2020).

Authors' contributions

Data linkages reported by country (table excerpt)
10 (continued): House of handicapped persons' health and social assistance linked with national health database
11 Germany: National health examination survey in adults linked with mortality database; national health examination survey in adults linked with health insurance claims; cancer registry operated by the public health institute and included in health reporting; national health surveys use national and sub-national data for weighting; national health examination surveys use inter-metropolitan socioeconomic data to improve field work (in progress); use of socioeconomic data at the metropolitan level for small area estimation (in progress); use of real-time emergency room data for surveillance of infectious diseases (in progress in a local project); linkage of data from national health surveys, health insurance data, cancer registry and other data sources for national burden-of-disease calculation (in progress)
12 Greece: No
13 Ireland: (in progress) cancer registry linked with hospital admission and mortality databases; census data linked with mortality database (one-off); prescribed medication data; medical eligibility and claims data linked with income level (one-off)
14 Italy: Hospital discharge linked with mortality database and national health examination survey
15 Latvia: (in progress) hospital discharge, primary health care and emergency care records linked with birth and mortality databases; patient register with specific diseases linked with mortality database
16 Lithuania: Compulsory health insurance information system (inpatient, outpatient specialized, primary care, emergency care) linked with cause-specific mortality database
17 Luxembourg: No
18 Malta: Health insurance claims, prescribed drugs, surgical operations, laboratory information system, radiology information system, patient administration system, outpatient attendance and patient discharge summaries linked with birth and mortality databases; congenital anomalies, injuries, cancer, dementia and organ transplant registries linked with mortality database
19 The Netherlands: Health examination and interview surveys linked with mortality database; health insurance claims with perinatal data; cancer registry data with mortality database
20 Norway: Linkage between almost all sources by means of unique personal identification, both within health and care services and across other governmental areas; big data solution in use for accessibility modulation using national health registries linked with land and housing, road and transport, and GIS databases
21 Poland: Cancer and tuberculosis registry databases linked with mortality, demographic and GIS databases; (in progress) national health surveys linked with electronic health records
22 Portugal: Hospital discharge, primary care and medical records linked with hospital registry of domestic and leisure accidents and e-death certification; cancer, tuberculosis, HIV and congenital anomalies registries linked with e-death certification and hospital discharge data
23 Romania: No
24 Slovakia: National registry of EHRs (hospital discharge, general practitioner records, referrals, prescribed medications, laboratory results, diagnostic procedures, medical consultations) linked with national disease-specific registries; national registry of EHRs linked with national registry of health care workers and health care providers
25 Slovenia: Hospital discharge, drug prescription and perinatal health linked with mortality database; hospital discharge, drug prescription and perinatal health linked with census data on education and socioeconomic variables (inequality analysis); hospital discharge, drug prescription and perinatal health linked with European