Literature search
We reviewed 137 citations from PubMed and four reports, from the OECD (Organization for Economic Co-operation and Development) [14], Euro-REACH [15], HBM4EU (Human Biomonitoring for Europe) [16] and EUROCISS (European Cardiovascular Indicators Surveillance Set) [17], to develop this questionnaire (Fig. 1).
Fig. 1: Flow diagram of studies using linked data and artificial intelligence for health status monitoring to develop a survey on identifying various data linkage practices across European countries in 2019
*For AI techniques specifically, we searched for studies using machine learning (i.e., one type of AI technique), which is more often used for health status monitoring.
Survey results
The survey results cover the countries' response, the use of data linkage in routine public health activities, the use of AI in routine public health activities, the health indicators estimated using linked data and the main obstacles to linking different data sources. All survey respondents have validated these results.
Countries' response
Twenty-nine countries (i.e., EU MSs 27 + EEA 1 [Norway] + Others 1 [Serbia]) participated in the survey with a response rate of 94% (29/31). Hungary, Iceland and Northern Ireland did not participate.
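The reported response rate can be reproduced from the two counts given above. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Counts taken from the text: 29 responding countries out of 31 invited.
responded = 29
invited = 31

# Response rate as a proportion, then rounded to a whole percentage.
response_rate = responded / invited
rounded_percent = round(response_rate * 100)

print(f"Response rate: {response_rate:.1%}")   # prints "Response rate: 93.5%"
print(f"Rounded: {rounded_percent}%")          # prints "Rounded: 94%"
```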
Use of data linkage in routine public health activities
Our survey results show that 24 European countries perform data linkage in their routine public health activities. These countries link administrative data such as EHRs (Electronic Health Records), mortality data and disease-specific registries, and six of them (Cyprus, Italy, Poland, Portugal, Spain and Slovakia) are also developing this technique further to link other types of data sources (i.e., demographic data, domestic/leisure accident data, congenital anomalies registry). Ireland and Latvia have ongoing data linkage initiatives (Table 1).
Table 1: Use of data linkage for routine public health activities in European countries in 2019
| Use of Data Linkage | Advanced (N = 24) | In progress* (N = 8) | Not yet (N = 3) |
| --- | --- | --- | --- |
| European Countries | AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, HR, IT, LT, MT, NL, NO, PL, PT, SI, SK, SRB, SW, UK (ENG, SC, WL) | CY, ES, IE, IT, PL, PT, SK, LV | GR, LU, RO |
* Six countries (CY, ES, IT, PL, PT and SK) use data linkage routinely (i.e., advanced) but are also developing this technology further to link other data sources (i.e., in progress).
Three countries (Greece, Luxembourg and Romania) have not yet planned to integrate data linkage into their routine public health activities. Some countries gave the following reasons for not having institutionalized data linkage: the lack of a public health institution to collect and govern health-related data; data linkage not being part of the health agenda; lack of commitment from the ministry of health; lack of resources to establish a national health information system; the institutional complexity of the ministry of health; and strict laws and regulations that hinder linkage between different data sources.
Objectives of data linkage: Data linkage can be performed routinely for different objectives, such as health status monitoring, health system performance, health policy development or scientific research (i.e., public health, epidemiological or clinical). Our results showed that data linkage was performed for health status monitoring in 20 countries (BE, CY, CZ, DE, DK, EE, ES, FI, FR, HR, IT, LT, MT, NL, PT, SI, SK, SRB, SW, UK (SC, WL)), for health policy development in 13 (AT, BE, BG, DK, EE, FR, MT, NL, NO, PL, SK, SW, UK (SC, WL)) and for scientific research (public health, epidemiological and clinical) in 13 (BE, CZ, DE, DK, EE, ES, FI, FR, NL, PT, SI, SW, UK (ENG, SC, WL)). Finland, Spain, Sweden and Scotland also perform data linkage to identify population risk factors. In Sweden, data linkage is also used to monitor compliance with national treatment guidelines in order to improve health care quality.
Data sources used for linkage: Our results showed that the 24 European countries that perform data linkage routinely most frequently use the following five types of data sources: health-related administrative data sources, non-health-related administrative data sources, disease-specific registries, national health surveys, and population-based epidemiological cohorts and clinical trials (Additional file 4). These data sources are linked with each other in different combinations; some examples of the combinations used across member countries are reported in Additional file 5. These countries perform data linkage based on the following identifiers: social security number, unique patient identification number, unique pseudonymous person identifier, encrypted personal identification number, and citizen or national identification number. In some countries, for instance Ireland, the lack of a unique patient identifier limits the potential to link different data sources.
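To illustrate how linkage on a pseudonymous identifier can work in principle, the following is a minimal sketch and not a description of any surveyed country's system. It assumes a hypothetical secret key held by the linking body, and the toy records, field names and the functions `pseudonymise` and `deterministic_link` are invented for illustration. The identifier is replaced by a keyed hash (HMAC-SHA256), and records from two sources are linked only when the pseudonyms match exactly.

```python
import hashlib
import hmac

# Hypothetical secret key held by the linking body (illustrative only).
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymise(national_id: str) -> str:
    """Return a keyed hash (HMAC-SHA256) of an identifier, used as a pseudonym."""
    return hmac.new(SECRET_KEY, national_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Invented toy records standing in for two data sources.
hospital_records = [
    {"national_id": "A001", "diagnosis": "diabetes"},
    {"national_id": "A002", "diagnosis": "stroke"},
]
mortality_records = [
    {"national_id": "A002", "year_of_death": 2018},
    {"national_id": "A003", "year_of_death": 2019},
]


def deterministic_link(left, right, key="national_id"):
    """Link two record lists by exact match on the pseudonymised key."""
    # Index the right-hand source by pseudonym for constant-time lookup.
    index = {pseudonymise(rec[key]): rec for rec in right}
    linked = []
    for rec in left:
        pseudo_id = pseudonymise(rec[key])
        match = index.get(pseudo_id)
        if match is not None:
            # Keep the pseudonym and drop the original identifier from the output.
            merged = {"pseudo_id": pseudo_id}
            merged.update({k: v for k, v in rec.items() if k != key})
            merged.update({k: v for k, v in match.items() if k != key})
            linked.append(merged)
    return linked


linked = deterministic_link(hospital_records, mortality_records)
print(len(linked), "linked record(s)")
```

In this toy example only the person present in both sources is linked, which also shows why the absence of a shared unique identifier limits this kind of linkage.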
General characteristics of linked datasets: Our results showed that, among the 24 European countries that perform data linkage routinely, 17 perform linkage at national level (Table 2). France, Portugal and Scotland perform data linkage at both national and sub-national levels. Denmark, Germany, Norway and Sweden perform data linkage at all levels. Twenty-three countries perform either deterministic linkage (12 countries) or a combination of deterministic and probabilistic linkage (11 countries). In 16 of the 24 countries, linked data are available and used routinely. In 12 of the 24 countries, the register owner (i.e., whoever governs the data register) approves access to linked data. In 15 of the 24 countries, access to linked data is routine or permanent, whereas in 13 countries access can be ad hoc or intermittent depending on the project. In 15 of the 24 countries, linked data do not operate in real time (i.e., integrating updated information with minimal delay). In 19 of the 24 countries, linked data are flexible enough to integrate new variables.
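The difference between deterministic and probabilistic linkage can be sketched as follows. Deterministic linkage requires exact agreement on an identifier, whereas probabilistic linkage scores partial agreement across several fields. The example below uses Fellegi-Sunter-style agreement weights, which is one common approach; the survey does not state which method any country uses. The m- and u-probabilities, the threshold, the field names and the toy records are all assumed values chosen for illustration.

```python
import math

# Assumed per-field probabilities (illustrative, not from the survey):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "postcode": (0.90, 0.05),
}

# Hypothetical decision threshold on the total weight.
THRESHOLD = 5.0


def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Invented toy records: b differs from a only in postcode; c shares only the postcode.
a = {"surname": "jensen", "birth_year": 1950, "postcode": "2100"}
b = {"surname": "jensen", "birth_year": 1950, "postcode": "2200"}
c = {"surname": "nielsen", "birth_year": 1972, "postcode": "2100"}

for label, other in (("a-b", b), ("a-c", c)):
    weight = match_weight(a, other)
    decision = "link" if weight >= THRESHOLD else "no link"
    print(f"{label}: weight={weight:.2f} -> {decision}")
```

Under these assumed parameters, the pair that disagrees only on postcode scores above the threshold, while the pair that agrees only on postcode scores below it. An exact-match rule on all three fields would have rejected both pairs.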
The following European countries have ongoing projects to integrate data linkage into their routine public health activities within the next five years: Austria, Cyprus, Czech Republic, Ireland, Italy, Latvia, Norway, Poland, Portugal, and Spain.
Table 2: General characteristics of linked datasets in European countries in 2019
Use of artificial intelligence (AI) in routine public health activities
The use of AI is not frequent across European countries (Table 3). Only five countries have reported applying the following techniques in routine public health activities: machine learning (Denmark, Finland, Sweden, and UK-Wales), natural language processing (Finland, Sweden, and UK-Wales), Markov decision process (Finland), support vector machine (Finland, UK-Wales), data mining (Finland) and TSP [Travelling Salesman Problem] modelling (Norway). Denmark can apply these techniques not only at national level but also at metropolitan level.
There are ongoing projects to integrate the use of AI into routine public health activities within the next five years in the following countries: Croatia, Czech Republic, France, Germany, Norway, Portugal, and Spain. The objectives of these initiatives are epidemiological research and surveillance of non-communicable and communicable diseases, including estimating the prevalence and predicting the incidence of certain health conditions at various geographical levels.
Two countries mentioned that due to lack of human resources (Lithuania) and capacities/skills (Republic of Serbia) within their public health institutes, AI techniques are not applied in routine public health activities.
Some European countries also mentioned the use of classical statistical techniques without the use of AI (Table 3).
Table 3: Use of artificial intelligence in routine public health activities in European countries in 2019
| Use of Artificial Intelligence (AI) | Advanced (N = 5) | In progress (N = 9) | Not yet (N = 16) |
| --- | --- | --- | --- |
| European countries | DK, FI, NO, SW, UK-WL | AT, CZ, DE, ES, FR, HR, PL, PT, SK | BE, BG, CY, EE, GR, IE, IT, LT, LU, LV, MT, NL, RO, SL, SRB, UK (ENG, SC) |

| Level of application of AI | Countries |
| --- | --- |
| National level | DK, FI, NO, SW, UK-WL |
| Sub-national level | |
| Metropolitan level | DK, SW |

| Use of classical statistics without the use of AI | Advanced (N = 19) | In progress (N = 5*) | Not yet (N = 8) |
| --- | --- | --- | --- |
| European countries | BE, BG, CZ, DE, EE, ES, DK, FR, FI, IT, MT, NL, NO, PL, PT, SI, SK, SW, UK (ENG, SC, WL) | AT, CZ, ES, HR, SK | CY, GR, IE, LT, LU, LV, RO, SRB |

| Level of use of classical statistics without AI | Countries |
| --- | --- |
| National level | BE, BG, CZ, DK, EE, FR, FI, IT, NL, NO, PL, PT, SK, SW, UK-WL |
| Sub-national level | DE, ES, IT, PL, NO, SI, UK (ENG, SC) |
| Metropolitan level | DK, MT, NO |

*Two countries (CZ and SK) use classical statistics routinely (i.e., advanced) but are also developing this technology further (i.e., in progress).
Health indicators estimated using linked data
Using linked data, the majority of European countries estimate the following health indicators:
Health outcome indicators
Participants were asked to select at least three health conditions and to report the related health outcome indicators that are most important for public health in their country. Using linked data, 22 countries reported 46 health outcome indicators related to the following seven health conditions: cardiovascular (14), neurodegenerative disease (6), maternal and perinatal health (6), diabetes (6), suicide/trauma/injury (7), cancer (6) and hepatic failure (1) (Additional file 6). The main purposes of estimating these indicators were public health monitoring and research. Estimation was mainly at national and sub-national levels. These 46 health outcome indicators were classified into the following categories: 1. health characteristics, 2. mortality, 3. human function and quality of life and 4. life expectancy and well-being. For example, in the first category, the Czech Republic, France, Lithuania, Sweden and Wales use linked data in routine public health surveillance to estimate the incidence and prevalence of diabetes (Additional file 6).
Health determinants
Participants were also asked to report the corresponding determinants of the identified health conditions. Fifteen member states reported 34 health determinants related to various health conditions (Table 3.2). These determinants are related to the physical environment (12), socioeconomic status and the environment (10), health behavior and lifestyle (6) and biological and metabolic parameters (3) (Additional file 7). They were used to measure potential associations between these risk factors and health conditions for public health monitoring and research purposes, and can be stratified by age, sex, socioeconomic status and area of residence. For example, in England and Wales, in relation to the physical environment, the proximity of fast food outlets to areas of residence is used to measure its potential association with chronic health conditions such as adiposity and obesity. This variable can be stratified by area of residence (Additional file 7).
Health intervention indicators
Participants were asked to report at least three health intervention indicators, under three categories (i.e., prevention, promotion, others), corresponding to the health conditions that are most important for public health in their country. Using linked data, 17 member states reported 23 health intervention indicators related to the following seven health conditions: maternal and perinatal health (7), cancer (6), diabetes (4), cardiovascular (2), neurodegenerative disease (2), suicide/trauma/injury (1) and lower/upper respiratory infections (1) (Additional file 8). The main purposes of estimating these indicators were to guide the health policy process, public health monitoring and research. These intervention indicators are estimated mainly at national and sub-national levels and are currently in use. For example, in Sweden, one of the estimated intervention indicators relates to preventive therapy: the number of diabetic patients counselled by a nurse to avoid complications (Additional file 8).
Main obstacles to linking different data sources
The majority of European countries we surveyed identified the following main obstacles to the implementation and use of data linkage and advanced statistics: 1. complex laws and data protection regulations, which block linkage between different data sources with a deterministic approach (legal); 2. lack of human resources and capacities/skills within national institutes of public health and health information statistics (technical); 3. lack of governance of health information (data governance); and 4. limited resources to support the health information infrastructure (organizational and structural).