Quasi-experimental study design
Our study resembles an interrupted time series (ITS) design42, with centre-months as the units of observation, obtained from a retrospective (open) cohort of health surveillance data from the PriCarenet surveillance network. The impact of COVID-19 on incident diagnosis patterns among refugees was evaluated using a segmented regression approach.
Setting and data sources
The analysis was conducted within the framework of PriCarenet, a health surveillance network21. PriCarenet is overseen by the University Hospital Heidelberg and comprises healthcare providers that have operated on-site healthcare facilities since October 2018. As of December 2023, these facilities are distributed across 24 state-level registration and reception centres and one district-level accommodation centre for refugees in Germany. These 25 centres are situated in the German states of Baden-Wuerttemberg, Bavaria, and Hamburg, which together host approximately 30% of the asylum-seeking population in Germany, as determined by administrative quotas 43. Within PriCarenet, healthcare providers are equipped with a customized Electronic Health Record (EHR) system known as Refugee Care Manager (RefCare). RefCare includes standard medical record-keeping features as well as a built-in health surveillance module21,44 (Table 2). The surveillance module performs an automated analysis of locally stored routine medical data using predefined indicators. The indicators are constructed from diagnosis categories based on the International Classification of Diseases (ICD-10-GM Version 2021) and from drug prescriptions based on the Anatomical Therapeutic Chemical Classification (ATC 2023), as defined in Table 3, and are operationalized through a standardised analysis script 21,44. To protect data anonymity, any observation with a count of less than 3 is set to 0. More detailed information about the surveillance infrastructure in PriCarenet and the local analysis of indicators can be found in previous reports 21,44.
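The small-count rule described above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and are not taken from the RefCare analysis script; only the threshold of 3 comes from the text.

```python
def suppress_small_counts(counts, threshold=3):
    """Set every count below `threshold` to 0 (assumed reading of the anonymisation rule)."""
    return [0 if c < threshold else c for c in counts]


# Hypothetical monthly indicator counts for one centre.
monthly_counts = [0, 1, 2, 3, 17]
masked = suppress_small_counts(monthly_counts)
print(masked)
```

Under this rule, a reported zero can stand for a true count of 0, 1 or 2.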
Table 2
Software features of the electronic health record system “Refugee Care Manager” (RefCare)
Management Features | Patient Medical Records |
Patient management; task and daily lists; external document storage; user management; external doctor management; facility and clinic data; local export of patient lists for follow-up | Record patient contact (patient history, clinical findings, diagnosis, therapy, etc.); display and filter contact history; printable medication plan and immunisation status; generate doctors’ letters; COVID-19 documentation; print function (e.g. for prescriptions); patient interface for multilingual communication |
Health Surveillance | Medical Records Transfer |
On-site data analysis “at the click of a button”; review local results for planning purposes; export anonymised results for meta-analysis and reporting | Encrypted transfer of patient records between participating institutions; transfer of patient records to/from other facilities on request or in anticipation of patient transfer |
Table 3
Indicator definitions based on diagnoses (ICD-10 Codes) and prescriptions (ATC-Codes) recorded in the electronic health record
Indicator labels | Indicator definition | Operationalisation (ICD-10 or ATC-Codes) |
Indicators based on recorded diagnoses | ICD-10-Codes |
Disability | Disabilities | H54, R47, H90-H91, H80-H82, Q71-Q73, M20-M21, Z89, G82, F06-F07, I68, P91, F7, F1 |
Skin | Diseases of the skin and subcutaneous tissue | L00-L99 |
Cons.ext.causes | Injury, poisoning and certain other consequences of external causes | S00-T98 |
Digestive syst. | Diseases of the digestive system | K00-K99 |
Blood | Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism | D50-D90 |
Inf.diseases | Certain infectious and parasitic diseases | A00-B99 |
Inf.notify | Notifiable infectious diseases | B30.0, B30.1, A05.1, A23.0, A23.1, A23.3, A23.8, A23.9, A04.5, A92.0, A00, A81.0, A97, A36, A98.4, A04.4, B67, A04.3, A75.0, A84.1, A95, A07.1, A41.3, A49.2, G00.0, J09, J14, J20.1, P23.6, A98.5, B15, B16, B17.1, B18.2, B19, B16.0, B16.1, B17.0, B17.2, B17.8, B20-B24, D59.3, M31.1, J09, J10, J11, A37, A07.2, A96.2, A68.0, A48.1, A48.2, A30, A27, A32, P37.2, B50-B54, A98.3, B05, A39, A41.0, A49.0, G00.3, P36.2, A22, B26.8, B26.9, A08.1, A70, A01.1, A01.2, A01.3, A01.4, A20, A80, A78, A08.0, P35.0, B06.8, B06.9, A0, A03, A50, A53, A82, Z20.3, P37.1, B75, A15 - A19, P37.0, O98.0, A21, A01.0, A92.0, A92.4, A96, A98.0, A98.1, A99, B02, P35.8, A04.6 |
Circulatory syst. | Diseases of the circulatory system | I00-I99 |
Hypertension | Hypertension | I10-I15 |
Metabolic | Endocrine, nutritional and metabolic diseases | E00-E90 |
Diabetes | Diabetes mellitus | E10-E14 |
Musculoskelet. syst. | Diseases of the musculoskeletal system and connective tissue | M00-M99 |
Neoplasm | Neoplasms | C00-D48 |
Nervous syst. | Diseases of the nervous system | G00-G99 |
Ear.mastoid | Diseases of the ear and mastoid process | H60-H99 |
Eye.adnexa | Diseases of the eye and adnexa | H00-H59 |
Pregn.condition | Pregnancy, childbirth and the puerperium | O00-O99 |
Psych.condition | Mental and behavioral disorders | F00-F99 |
Genitourinary syst. | Diseases of the genitourinary system | N00-N99 |
Respiratory syst. | Diseases of the respiratory system | J00-J99 |
Indicators based on recorded prescriptions | ATC-Codes |
Psych. prescrip. | Psychoactive drug prescriptions | N05, N06A, N06B, N06C, N07BB |
Legend: ICD-10: International Classification of Diseases, 10th Revision. ATC: Anatomical Therapeutic Chemical Classification.
The data used in this paper cover the period from October 2018 to April 2023. The facilities included in this study joined the surveillance network at different dates (Supplementary Chap. 3.). Some centres have since left the network due to closures or changes in healthcare providers, but still contributed their anonymous health surveillance data for the purpose of this study. The amount of data provided consequently varies by centre (Supplementary Chap. 3.).
RefCare is used by health professionals, who are the data holders of the individual-level patient data in on-site health care facilities. The respective authorities in the three federal states are responsible for immigration data, and are data holders of the occupancy data, i.e. the sociodemographic information of the refugee centres’ inhabitants.
The flowchart in Fig. 8 provides an overview of the data selection process, the data sources used and the four derived data subsets (Subset 1–4).
Electronic Health Records (RefCare) data set (subset 1)
Using the 25 refugee centres and months as units of analysis, subset 1 contains 833 observations (i.e. 833 “centre-months”) of recorded medical data, with an average of \(\text{mean}(n_{pat})=259\) (standard deviation \(\text{sd}(n_{pat})=287\)) patient-months per centre-month. The sample comprised 215,864 patient-months (\(=\sum_{i=1}^{833} n_{pat}^{i}\), where \(n_{pat}^{i}\) is the number of refugee patients in “centre-month” \(i\)) from a total of 109,175 refugee patients between October 2018 and April 2023 (Fig. 8). For these 833 centre-months, we used reported monitoring data on the number of male, female, adult (≥18 years of age) and underage (<18 years of age) patients, as well as data on the incident coding of diagnoses for 21 indicators (based on ICD-10 Codes) by centre and month. Furthermore, in sensitivity analysis 1 we used EHR data on the patients’ country of origin to run models that account for compositional differences in the refugee population within and between refugee centres over time (Supplementary Chap. 2.1.). Estimates of the COVID-19 impact from this sensitivity analysis are reported in Fig. 2.
Occupancy data and aggregate-level socio-demographics
Furthermore, we gathered information on the occupancy of each refugee centre within the PriCarenet surveillance network through a monthly online survey of the responsible authorities of these centres. This prospective census survey was initiated in October 2018 and collects the number of residents on the 15th day of each month, categorized by age (adults: ≥18 years; children: <18 years) and biological sex (male/female) (Fig. 8). To determine the total occupancy of each centre for every month, we first summed the reported counts of male and female residents separately for adults and for children, and then summed these two totals. Furthermore, the overall (unstratified) occupancy count was collected.
Participation of authorities in this survey is voluntary. We collected occupancy data from 22 centres, resulting in a dataset covering 417 centre-months from October 2018 through June 2023 (Fig. 8). The average occupancy is \(\text{mean}(n_{occ})=411\) individuals per centre per month, with a standard deviation of \(\text{sd}(n_{occ})=435\).
Description of derived datasets and variables
We matched the EHR data with the monthly occupancy data for each centre wherever possible (Fig. 8). In 64 cases, the occupancy count was lower than the number of patients (i.e., \(n_{occ}<n_{pat}\)). This is plausible where refugee centres experience a rapid turnover of individuals, such as a high influx of new arrivals and frequent transfers. In such instances, individuals may seek on-site healthcare services but stay within the centres for only a brief period, leading to a temporary misalignment between occupancy figures and the number of patients receiving healthcare services. These observations were excluded from the main analysis, which resulted in a total of 314 centre-months from 21 centres between October 2018 and April 2023 (with \(\text{mean}(n_{pat})=243\), \(\text{sd}(n_{pat})=240\), \(\text{mean}(n_{occ})=459\) and \(\text{sd}(n_{occ})=462\); subset 2).
In 75 cases, the sum of the reported strata counts (female/male x adult/children) did not equal the reported total occupancy. Therefore, we repeated the main analysis on subset 3 (sensitivity analysis 2), in which the occupancy total equals the sum of the age- and sex-strata counts and \(n_{occ}\ge n_{pat}\) (Supplementary Chap. 2.2.). Furthermore, in sensitivity analysis 3, we repeated the main analysis (originally performed on subset 2) on subset 4, which applies no restrictions and therefore contains all observations of the linked dataset (Supplementary Chap. 2.3.).
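The subset definitions can be sketched as simple filters on the linked centre-month records. This is only an illustration: the field names (`n_occ`, `n_pat`, `adult_male`, and so on) and the example rows are hypothetical and do not reflect the actual data layout.

```python
def strata_total(row):
    """Sum of the four reported age-by-sex occupancy strata."""
    return (row["adult_male"] + row["adult_female"]
            + row["child_male"] + row["child_female"])


def derive_subsets(linked_rows):
    """Return (subset2, subset3, subset4) from the linked centre-month records."""
    subset4 = list(linked_rows)  # no restrictions
    # Main analysis: occupancy at least as large as the number of patients.
    subset2 = [r for r in subset4 if r["n_occ"] >= r["n_pat"]]
    # Sensitivity analysis 2: additionally, strata must add up to the reported total.
    subset3 = [r for r in subset2 if strata_total(r) == r["n_occ"]]
    return subset2, subset3, subset4


# Three made-up centre-months.
rows = [
    {"n_occ": 100, "n_pat": 40, "adult_male": 50, "adult_female": 30,
     "child_male": 10, "child_female": 10},   # consistent
    {"n_occ": 100, "n_pat": 40, "adult_male": 50, "adult_female": 30,
     "child_male": 10, "child_female": 5},    # strata do not sum to total
    {"n_occ": 30, "n_pat": 45, "adult_male": 15, "adult_female": 10,
     "child_male": 3, "child_female": 2},     # fewer residents than patients
]
subset2, subset3, subset4 = derive_subsets(rows)
print(len(subset2), len(subset3), len(subset4))
```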
Furthermore, we calculated the following variables (for each subset respectively):
- time: discrete variable indicating time from the start up to the end of the observation period October 2018 to April 2023 with time ID = {1, …, 56}
- covid: coded 0 for pre-covid time points and 1 for post-covid time points (0: < March 2020, 1: \(\ge\) March 2020). This variable captures the impact of the COVID-19 pandemic in peri-pandemic time periods, with pre-pandemic time periods used as reference.
- postslope: coded 0 up to the last point before COVID-19 and coded sequentially from 1 thereafter (0: < March 2020, 1: March 2020, 2: April 2020, …, 37: April 2023). This variable captures the peri-pandemic time trend.
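The coding rules for the three variables can be sketched as follows. The function names are hypothetical, and the sketch assumes that October 2018 receives time ID 1 and that March 2020 is the first pandemic month, as stated in the list; it is not the authors' script.

```python
def month_index(year, month):
    """Number of months on an absolute scale, used only to take differences."""
    return year * 12 + (month - 1)


def code_its_variables(year, month, start=(2018, 10), covid_start=(2020, 3)):
    """Return the segmented-regression variables for one calendar month."""
    time = month_index(year, month) - month_index(*start) + 1
    months_since_covid = month_index(year, month) - month_index(*covid_start)
    covid = 1 if months_since_covid >= 0 else 0
    # 0 before the pandemic, then 1, 2, 3, ... from the first pandemic month onwards.
    postslope = months_since_covid + 1 if covid else 0
    return {"time": time, "covid": covid, "postslope": postslope}


for ym in [(2018, 10), (2020, 2), (2020, 3), (2020, 4)]:
    print(ym, code_its_variables(*ym))
```

Coding postslope as 0 before the pandemic keeps the pre-pandemic trend in the time coefficient alone, so the postslope coefficient measures only the change in trend.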
It should be noted that the data have two levels: months and centres. As a result, each centre contributes multiple observations per year. Therefore, in order to report the mean incidence (Table 1), we computed a weighted mean over the months for each facility and year, so that each facility contributes only one observation per year. The weighting was based on \(n_{occ}\), i.e. months with higher occupancy received a higher weight when calculating the weighted mean incidence for each facility (“weighted mean facility observation”). Table 1 shows the mean with standard deviation (mean ± sd), the median with first and third quartiles (Q1, Q3), the minimum and maximum (min - max) and the 95% confidence interval (CI) of the weighted mean facility observations. Furthermore, the annual weighted mean and weighted standard deviation are given, with weights according to the mean occupancy of a facility within one year. That is, the higher the mean occupancy of a facility in a given year, the more its observation weighs (“weighted annual”: mean and standard deviation of the “mean facility observation” values within one year, weighted by the mean occupancy of the respective facility; compare row W, 2018–2023). Additionally, the mean and the standard deviation of the weighted annual mean values were calculated (row W, last column).
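As a small illustration of the occupancy weighting, the following sketch computes a weighted mean incidence for one facility and year. The numbers are invented, and the helper name is hypothetical; the exact formula used for the weighted standard deviation is not given in this section and is therefore not sketched.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented monthly incidences (in percent) and occupancies for one facility-year.
monthly_incidence = [2.0, 4.0]
monthly_occupancy = [100, 300]
facility_year_mean = weighted_mean(monthly_incidence, monthly_occupancy)
print(facility_year_mean)
```

Here the month with 300 residents counts three times as much as the month with 100 residents, so the result lies closer to 4.0 than the unweighted mean of 3.0 would.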
Description of the regression model
In order to assess the impact of the COVID-19 pandemic on the incident health indicators, we fitted a zero-inflated negative binomial model to the matched data for each indicator. The model allows the conditional mean to depend on the percentages of adult and male residents, the total occupancy (\(n_{\text{occ}}\)), and a centre-level random effect. In this model, β0 captures the baseline level of the outcome at time 0 (beginning of the observation period); βtime estimates the structural trend or growth rate, independently of COVID-19; βcovid estimates the immediate impact of COVID-19, i.e. the change in the level of the outcome after the onset of COVID-19; and βpostslope reflects the change in the trend or growth rate of the outcome after the onset of COVID-19. Furthermore, the model assumes structural zeros (Supplementary Chap. 1.). The model can be represented by the following set of equations:
$$\mu=\text{E}\left(\text{count}\mid u,\text{NSZ}\right)=\exp\left(\beta_{0}+\beta_{\text{adult}}\,\text{adult}+\beta_{\text{male}}\,\text{male}+\beta_{n_{\text{occ}}}\,n_{\text{occ}}+\beta_{\text{time}}\,\text{time}+\beta_{\text{covid}}\,\text{covid}+\beta_{\text{postslope}}\,\text{postslope}+u\right),$$
$$u\sim\mathcal{N}\left(0,\sigma_{u}^{2}\right),$$
$$\sigma^{2}=\text{Var}\left(\text{count}\mid u,\text{NSZ}\right)=\mu\left(1+\frac{\mu}{\theta}\right),$$
$$\text{logit}\left(p\right)=\beta_{0}^{(\text{zi})},$$
where \(u\) is a centre-specific random effect, \(\text{NSZ}\) is the event “non-structural zero”, \(p=1-\Pr\left(\text{NSZ}\right)\) is the zero-inflation probability, and the \(\beta\)s are the regression coefficients, with the subscript denoting the covariate and 0 denoting the intercept 45. The chosen parameterization of the negative binomial distribution uses a logarithmic link and a variance that increases quadratically with the mean, \(\sigma^{2}=\mu(1+\mu/\theta)\) with \(\theta>0\) 46 (Supplementary Chap. 1). The analysis was performed in R using the glmmTMB package 45.
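To make the distributional assumptions concrete, the following sketch evaluates the probability mass function of a zero-inflated negative binomial distribution in the mean/dispersion parameterization given above and checks the stated variance numerically. It is written in Python for illustration only and is not the glmmTMB implementation; the function names and parameter values are assumptions.

```python
import math


def nb2_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and variance mu * (1 + mu / theta)."""
    log_prob = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
                + theta * math.log(theta / (theta + mu))
                + y * math.log(mu / (theta + mu)))
    return math.exp(log_prob)


def zinb_pmf(y, mu, theta, p):
    """Zero-inflated version: a structural zero occurs with probability p."""
    non_structural = (1.0 - p) * nb2_pmf(y, mu, theta)
    return p + non_structural if y == 0 else non_structural


# Illustrative parameter values (not estimates from the study).
mu, theta, p = 4.0, 2.0, 0.25
support = range(2000)  # truncation point; the remaining tail mass is negligible here

nb_mean = sum(y * nb2_pmf(y, mu, theta) for y in support)
nb_var = sum((y - nb_mean) ** 2 * nb2_pmf(y, mu, theta) for y in support)
zinb_total = sum(zinb_pmf(y, mu, theta, p) for y in support)
zinb_mean = sum(y * zinb_pmf(y, mu, theta, p) for y in support)
print(nb_mean, nb_var, zinb_total, zinb_mean)
```

Conditional on a non-structural zero, the mean and variance match \(\mu\) and \(\mu(1+\mu/\theta)\); with zero inflation the overall mean shrinks to \((1-p)\mu\).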
Counterfactual analysis
We performed a counterfactual analysis by predicting the expected values of the 21 health indicators under the scenario that the pandemic had not happened (variable covid set to “0”), while accounting for the socio-demographic characteristics of the underlying refugee population in the respective centres and time periods. We plotted the estimated counterfactual, observed, and estimated outcome values, expressed as incidence rates in percent (i.e. the number of cases divided by occupancy and multiplied by 100), for selected indicators together in box plots over the observation period (compare Figs. 3–7).
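The counterfactual prediction can be sketched as follows. All coefficient and covariate values are invented, the function names are hypothetical, and the linear predictor assumes that each covariate enters linearly on the log scale. Only the covid variable is changed, as stated above; how postslope is handled in the counterfactual is not specified in this section, so it is left untouched here.

```python
import math


def predicted_count(coef, covariates, random_effect=0.0):
    """Conditional mean exp(linear predictor) for one centre-month."""
    eta = coef["intercept"] + random_effect
    eta += sum(coef[name] * value for name, value in covariates.items())
    return math.exp(eta)


def counterfactual(covariates):
    """Copy of the covariates with the pandemic indicator switched off."""
    scenario = dict(covariates)
    scenario["covid"] = 0
    return scenario


def incidence_percent(cases, occupancy):
    """Cases per 100 residents."""
    return 100.0 * cases / occupancy


# Invented coefficients and one invented peri-pandemic centre-month.
coef = {"intercept": 1.0, "adult": 0.5, "male": 0.3, "n_occ": 0.001,
        "time": 0.01, "covid": -0.4, "postslope": -0.02}
observed_covariates = {"adult": 0.7, "male": 0.6, "n_occ": 480,
                       "time": 20, "covid": 1, "postslope": 3}

mu_fitted = predicted_count(coef, observed_covariates)
mu_counterfactual = predicted_count(coef, counterfactual(observed_covariates))
print(incidence_percent(mu_fitted, 480), incidence_percent(mu_counterfactual, 480))
```

In this simplified setting, the ratio of the fitted to the counterfactual mean equals \(\exp(\beta_{\text{covid}})\).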