As with many large health surveys across Europe, the PIENTER studies face decreasing survey response rates through time (3–5, 22). With an all-time low response to PIENTER3, concerns surrounding the influence of non-response biases on future estimates are at the forefront. However, low response rates do not necessarily imply high levels of non-response bias, and the overall influence of non-response bias on survey-derived estimates varies by research question (13, 23). Therefore, documenting the differences between participants and non-participants is crucial.
Generalisability
We found that the age and gender structure of the P3 sample did not closely mirror that of the Dutch population, but this was to be expected given the study design. Whilst post-hoc weighting for variables such as age and gender can easily be applied to estimates, adjusting for other factors such as geographical location, urbanisation level, educational level, and health status could prove more difficult. There may well be non-response biases within the weighting classes that are under-represented to begin with, due to the influence of topic saliency, among other factors (11, 14, 23). Consequently, even those in our sample classified as having lower education and poorer health, for example, may not represent these subgroups well at a population level.
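The post-hoc weighting referred to above can be illustrated with a minimal post-stratification sketch. The cell counts below are invented for illustration (they are not the P3 or Dutch population figures): each respondent in an age-gender cell receives the ratio of that cell's population share to its sample share, so under-represented cells are up-weighted.

```python
# Minimal post-stratification sketch with hypothetical counts
# (not the actual P3 sample or Dutch population figures).
population = {("18-34", "M"): 1.8e6, ("18-34", "F"): 1.7e6,
              ("35-64", "M"): 3.5e6, ("35-64", "F"): 3.6e6}
sample = {("18-34", "M"): 120, ("18-34", "F"): 210,
          ("35-64", "M"): 300, ("35-64", "F"): 370}

pop_total = sum(population.values())
samp_total = sum(sample.values())

# Weight = (population share of the cell) / (sample share of the cell).
weights = {cell: (population[cell] / pop_total) / (sample[cell] / samp_total)
           for cell in sample}

for cell, w in sorted(weights.items()):
    print(cell, round(w, 2))
```

Note that a weight above 1 simply up-weights every respondent in the cell equally; it cannot correct for non-response bias operating within the cell, which is the limitation discussed above.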
Survey Response
The response rates seen in P3 are in line with the age- and gender-stratified response behaviours seen in previous PIENTER studies, and in other large national health surveys (3–5, 22, 24). Gender imbalances in health survey response are common, and are posited to be mediated by gender-related values interacting with decision making (25, 26). Despite some research indicating that men may be more likely to respond to a survey when offered higher incentives, the larger remuneration offered in P3 has not obviously influenced the gender distribution of the sample (5, 26, 27). Overall, efforts to increase the numbers of men in the working age ranges in P3 seem to have been largely unsuccessful (5).
Large response differences between the genders in the working age range are commonly seen in other health surveys. In P3 this could largely reflect a perceived burden on time, as this was the most cited reason for non-participation. Further, these differences could be amplified in the Netherlands. In 2017, 75% of women aged 20–64 were reported to be working part-time (< 28 hours a week), compared to 22% of men (28). This is more than double the EU28 average for women (31.4%) in this age category (28).
In the non-western migrant (NWM) oversample, the overall response was much lower but varied similarly by age and gender. However, the comparatively high response of Dutch-speaking migrants from Suriname, the Antilles and Aruba (SAN) could indicate a language penetration issue for the survey. The initial invitation and information leaflet were sent in Dutch. There was a single sentence on the second page of the invitation in English, Turkish and Arabic indicating that the letter and information were available in other languages online or on request. These additional steps to access the survey may well reduce individual likelihood of engagement (29). However, as response was similar between SAN and Other NWMs (non-Dutch-speaking), the lower response in those with Turkish or Moroccan backgrounds may reflect additional barriers beyond language. This could relate to variable cultural values surrounding health, research and community engagement, or awareness of and/or trust in the RIVM specifically (30, 31).
Comparing Response Types
Although random forests were unable to accurately distinguish ANRs from FPs, this does not necessarily indicate a lack of non-response bias. A large meta-analysis of 539 studies demonstrated that prevalence estimates from participants and non-participants often showed large differences that were not strikingly evident when comparing the groups' demographic characteristics (11).
When predicting NRQs from FPs, self-reported health was the strongest predictor, even when coded missingness was excluded in a form of complete-case analysis. FPs most frequently reported very good health, whilst the majority of NRQs, after excluding missing values, reported good health, but not the higher level seen in FPs. Combined with the large difference in distribution between these health categories in the available data, the high proportion of missing values seen in the NRQs could indicate an unwillingness to divulge poor health status, and thus the presence of healthy responder bias, a well-documented phenomenon in voluntary-participation health studies (32).
Using the “Continuum of Resistance” theory, which stipulates that non-responders are furthest from full responders on a continuum ranging from “will never respond” to “will always respond”, we may assume that NRQs act as a reasonable proxy for ANRs. Extending this assumption, we may expect that ANRs, on average, report poorer health status. Considerable differences in health status between responders and non-responders to health surveys have been documented previously (24, 33, 34). As non-response adjustments for demographic factors alone may not sufficiently reduce estimate biases, this could have considerable impacts on health-related and prevalence estimates from the P3 sample, depending on the topic (33).
In differentiating FPs from QOs, the most important predictors related to geographical location, urbanisation and age. This probably reflects perceived available time and survey mode preference, as QOs were younger and contained both a larger proportion of men and of those living in areas of very high urbanisation, established predictors of non-response (35). A study of survey mode preference in the Netherlands found that those in younger age classes preferred app-based approaches, and that men were more responsive to face-to-face and registration-linkage survey methods (36).
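The variable-importance approach used here can be sketched as follows. This is an illustrative toy, not the P3 analysis: the synthetic data encode an assumed pattern (younger, urban men being more likely QOs) purely to show how a random forest's impurity-based importances surface such predictors.

```python
# Illustrative sketch of random-forest variable importance on synthetic
# data; feature names and effect sizes are invented, not the P3 data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
age = rng.integers(18, 80, n)
urbanisation = rng.integers(1, 6, n)   # 1 = very high ... 5 = rural
gender = rng.integers(0, 2, n)         # 0 = woman, 1 = man

# Assumed signal: younger, more urban men are more likely to be QOs.
logit = -0.04 * (age - 45) - 0.5 * (urbanisation - 3) + 0.4 * gender
p_qo = 1 / (1 + np.exp(-logit))
y = rng.random(n) < p_qo               # True = QO, False = FP

X = np.column_stack([age, urbanisation, gender])
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across the features.
for name, imp in zip(["age", "urbanisation", "gender"], rf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

One caveat worth noting when reading such plots: impurity-based importances tend to favour continuous variables (such as age) over binary ones (such as gender), which is a property of the method rather than of the data.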
Overall Impact
The primary aim of all three PIENTER studies is to assess the population's seroprevalence of infectious diseases, and levels of protection against vaccine-preventable diseases (VPDs). For NIP vaccines in the Netherlands, uptake is almost universally high (37). It is therefore unlikely that any differences between participants and non-participants would have a large impact on estimates of the seroprevalence of VPDs. However, this may not be the case when vaccine uptake or disease exposure is less universal, as estimates may become biased where coverage or exposure varies by under-represented subgroup. The uptake of the HPV vaccine, for example, varies substantially by migration background and socioeconomic status (38). The vaccine was rolled out in 2009 and subsequently included in the NIP during 2010, with uptake reaching a maximum of 63% in 2021 (37, 38).
Participation Trends in PIENTER Through the Decades
PIENTER participant characteristics were not highlighted by random forests as important in differentiating the studies from each other. In fact, the strongest predictors of study origin were age and NIP participation: a simple proxy for the cohort effect, as more of the population becomes eligible for NIP participation, at a younger age, through the years. Based on this we could posit that PIENTER participants are largely similar “types” of people, and thus that estimates from the three studies can be compared across time. Combined with the decreasing response rates seen in the PIENTER studies, this may indicate that it is the interaction between participant characteristics, the social and physical environment, and the survey itself that produces a survey participant, and that this interaction has weakened through time (35).
Although RF was not able to distinguish participants by PIENTER year, we did capture evidence of falling confidence in the NIP among full participants. However, the variable importance plots showed that “opinions on vaccination have changed” was of lower but similar importance to “educational level”. It is possible that this apparent falling confidence in the NIP is partly a product of the over-representation of the highly educated in the P3 sample, as high educational levels have been correlated with vaccine hesitancy in Dutch populations in both earlier and recent studies (39, 40).
Limitations
As with all survey research, we faced limitations regarding missing data and data quality. The non-response questionnaire, conducted as a telephone follow-up, yielded a large proportion of item-missing data. This should be considered alongside our interpretations.
Secondly, our dataset was unbalanced with regard to the response-type outcome, particularly so in the QO class, as indicated by our skewed confusion matrices with low pmcs. To check that our conclusions were not distorted by this, we ran analyses on random subsets of the data containing more balanced proportions of the two possible outcomes, and found that the rankings of variable importance remained stable.
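The balanced-subset check can be sketched as follows: draw equal numbers of records per outcome class without replacement, refit on the subset, and compare variable-importance rankings with the full-data fit. The class labels and counts below are illustrative, not the real P3 figures.

```python
# Sketch of the balance check: draw a subset with equal class counts,
# then refit on it. Labels and counts here are hypothetical.
import random

random.seed(1)
records = [{"type": "FP"}] * 900 + [{"type": "QO"}] * 100  # unbalanced

def balanced_subset(records, k):
    """Draw k records per class, without replacement."""
    by_class = {}
    for r in records:
        by_class.setdefault(r["type"], []).append(r)
    subset = []
    for rows in by_class.values():
        subset.extend(random.sample(rows, k))
    return subset

sub = balanced_subset(records, 100)
counts = {c: sum(r["type"] == c for r in sub) for c in ("FP", "QO")}
print(counts)  # each class equally represented in the subset
```

Repeating the draw several times and checking that the importance rankings agree across repetitions guards against conclusions driven by any single subsample.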
Future considerations
Adjustments for non-response can only go so far, and it has been shown that a balanced survey response yields less biased estimates than post-hoc adjustments alone (23). After all, post-hoc weights are frequently based on limited available data, cannot improve overall precision, and do not address non-response biases within weighting classes.
As survey response is likely to continue to decline, future PIENTER studies may consider alternative methods, such as targeted mixed-method survey designs, to improve overall response (41–43). Further, making questionnaires available in different formats and in multiple languages would reduce barriers to participation and address survey mode preferences across different subgroups (9, 30).