The early days of the COVID-19 pandemic affected all aspects of life, including the way surveys were organized (49). Ongoing and planned studies using face-to-face data collection needed to adjust their fieldwork (50–52). New surveys aiming to rapidly evaluate the impact of the pandemic faced budget and time constraints and depended on the existing survey infrastructure of their country. The design choices made for new surveys were often driven by urgency and pragmatism. This manuscript described the methodology of the COVID-19 health surveys, a series of surveys in Belgium aiming to monitor the general population on health-related topics after the onset of the COVID-19 pandemic. Setting up the first survey within a couple of days using a probability sample was not feasible. Consequently, the COVID-19 health surveys became repeated non-probability web surveys launched through multiple channels to reach the target population.
This approach allowed a large number of participants to be reached. Participation was highest at the beginning of the pandemic: the first survey, organized within three weeks after the first restrictions were put in place, had almost 50,000 participants. Even though the number of participants decreased over time, it remained high: the last survey still attracted 13,882 participants. The participation trend does not follow the severity of the epidemiological situation, as some surveys organized in other critical phases (e.g. the surveys organized in the winters of 2021 and 2022) nevertheless attracted far fewer participants than the first surveys. The declining participation may have several causes. At the beginning of the pandemic, the news and people’s own thoughts and lives were dominated by COVID-19, as nothing like it had been experienced before; this made the survey topic highly salient. Moreover, the first surveys were organized during strict lockdown periods, which gave people time to complete the survey. These two factors resulted in a wide dissemination of the COVID-19 health surveys by the press at the beginning of the pandemic, while media attention decreased for later surveys. A declining participation trend over time is also seen in other repeated COVID-19 surveys (13, 39). In all COVID-19 health surveys, the majority of participants were reached on the first and second day after the launch. This indicates that the surveys were mainly shared within the first days after the launch and that people completed the survey almost immediately after seeing the link on a website, a social media page or an invitation e-mail.
The samples of the COVID-19 health surveys were prone to biased estimates, as they relied on self-selection and excluded people without internet access or skills. This is the main criticism levelled at non-probability web surveys (22, 23). To reduce the self-selection bias, several recommendations to improve the composition of the COVID-19 health survey samples were taken into account. Firstly, informal partnerships were set up with trustworthy organizations such as local community organizations, health insurance funds, and organizations for young adults and the elderly (6). This served to build trust among different population groups and to increase participation. Secondly, the recruitment strategy was diverse and included multiple platforms to reach different subsets of the population (6, 22, 27). Thirdly, the results of the recruitment efforts were assessed after every survey, and extra efforts were made in the next survey when certain population groups turned out to be insufficiently represented. For example, substantial efforts were made from the seventh survey onwards to attract more young people.
Despite these efforts, the unweighted sample distribution remained suboptimal. Males participated less than females in all ten COVID-19 health surveys. The youngest (18–24 years) and oldest (75+ years) age groups were underrepresented in all COVID-19 health surveys. In addition, over time, a decline in the number of young participants (18–44 years) and an increase in the number of older participants (55–74 years) can be observed. There was also a strong educational difference, with, as expected, people with a low education level participating less. People from the Walloon Region were less likely to participate in the surveys. The share of vaccinated individuals in the sample of the eighth COVID-19 health survey (October 2021) was also compared to their share in the population. This comparison indicates a modest overrepresentation of vaccinated people: 78% of the participants reported being fully vaccinated, compared to 73% in the population. The recruitment approach of the COVID-19 health surveys did not make it possible to obtain (demographically) balanced samples. Other non-probability sampling approaches, such as using paid and targeted ads on social media or retaining participants via commercial opt-in panels, were more successful in obtaining demographically balanced samples (13, 39).
Post-stratification weighting on socio-demographic factors was applied to at least partly account for the unequal representation of some population groups in the COVID-19 health surveys. For indicators related to attitudes towards vaccination, the weighting strategy also took vaccination status into account. However, weighting for these factors is not sufficient to eliminate bias in the estimates. There are also unobservable characteristics, which cannot be accounted for by weighting, that affect both the chance of participating and the outcomes of the survey (23, 30). For example, personality characteristics such as “tending to be lazy” and “being generally trusting” relate both to participating in a survey and to complying with COVID-19-related preventive measures, and this generates bias (23). In addition, referring to terms such as ‘mental health’ in the survey recruitment materials can be expected to attract more interest from people suffering from mental health problems (22). It must be acknowledged, however, that people with severe mental illness will not be reached using online surveys (24). Lorant et al. attempted to estimate the selection bias in their non-probability web survey on mental health. Their results indicate an overestimation of psychological distress but a slight underestimation of the magnitude of risk factors (30).
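To illustrate the principle of post-stratification, the sketch below weights a sample on a single socio-demographic factor (sex) so that the weighted sample matches a known population distribution. The population shares, sample composition and outcome values are hypothetical toy numbers chosen for illustration; they are not the Belgian figures, and the actual weighting procedure of the COVID-19 health surveys combined several factors.

```python
# Minimal sketch of post-stratification weighting on one factor (sex).
# Each stratum s receives the weight w_s = population_share_s / sample_share_s,
# so the weighted sample reproduces the population distribution.
from collections import Counter

def poststratification_weights(sample, population_shares):
    """Return one weight per stratum: population share divided by sample share."""
    counts = Counter(sample)
    n = len(sample)
    return {s: population_shares[s] / (counts[s] / n) for s in population_shares}

# Hypothetical sample in which women are overrepresented: 60% in the sample
# versus an assumed 51% in the population.
sample = ["F"] * 60 + ["M"] * 40
weights = poststratification_weights(sample, {"F": 0.51, "M": 0.49})
# Overrepresented women are weighted down (w < 1), underrepresented men up (w > 1).

# Toy binary outcome that differs by sex, to show the effect of weighting
# on a prevalence estimate.
outcome = {"F": 1, "M": 0}
weighted_prev = (sum(weights[s] * outcome[s] for s in sample)
                 / sum(weights[s] for s in sample))
```

In this toy example the unweighted prevalence would be 0.60 (the sample share of women), while the weighted prevalence equals the population share of 0.51, showing how the weights correct a known compositional imbalance. Unobservable characteristics that drive both participation and the outcome, as discussed above, are by construction untouched by such weights.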
Consequently, caution is needed when generalizing results from this type of non-probability web survey to the general population. It is not recommended to calculate descriptive estimates such as prevalence rates from these surveys (28, 53, 54). However, at the beginning of the pandemic there was an urgent need for figures on the impact on the Belgian population. As there was no alternative in the form of a probability survey including people without internet access, the prevalence rates of the COVID-19 health surveys were considered informative. It was, however, important to communicate openly about the limitations of these data when presenting the results. Inferences regarding associations between variables are generally less sensitive to sampling quality (53). The associations found in the COVID-19 health surveys between covariates and outcomes such as depression and anxiety were therefore most likely less prone to bias. Apart from the bias associated with the sampling, bias in the estimates can also result from self-reporting. For example, the COVID-19 health surveys may have overestimated compliance with preventive measures such as staying at home or wearing face masks, as this is socially desirable behavior (55). Likewise, participants may have underreported feelings of anxiety or depression because they consider these feelings socially undesirable. In contrast, some participants may have exaggerated these feelings in order to express their frustration with the crisis.
The strengths of the COVID-19 health surveys should also be stressed. The first asset relates to the questionnaire development and content. All surveys included validated and frequently used instruments and scales wherever possible, preferably those already used in the national health interview surveys. In addition, the surveys covered multiple health outcomes and highly relevant policy topics, and contained a large set of covariates ranging from questions on socio-demographics to questions on financial insecurity and personality characteristics. The second major asset is the organization of a longitudinal study by re-inviting participants for subsequent editions. A large share of participants completed the COVID-19 health surveys at least five times over two years (cohort n = 12,599). The benefit of following the same individuals over time is that the evolution found for certain outcomes throughout the pandemic cannot be due to differences in sample composition across time points. This makes it possible to have a clear view on, for example, the effect of the different phases of the crisis on mental health and its contributing factors (7). However, we must recognize that certain population groups are underrepresented in the cohort (males, people from the youngest and oldest age groups, people with a low education level, and people living in the Walloon Region) and that the cohort was established through non-probability sampling. The third major asset was the flexibility and timeliness with which new highly relevant topics could be included in the surveys at the request of policy makers. Examples of such topics were access to care and attitudes towards vaccination. The last asset is that the participants of the COVID-19 health surveys served as a recruitment pool for other COVID-19 projects, including a qualitative study on attitudes towards vaccination.
The pandemic and the associated demand for data on the well-being of citizens taught us lessons for the future of survey methodology. To evaluate the impact of unexpected crises, we must ensure that we can survey randomly selected individuals instead of relying on convenience samples. Non-commercial online panels with a probability-based sample established prior to the crisis are an optimal choice for this (6, 21, 23, 35, 37), especially when panelists without internet access are given the means to participate anyway or are offered paper response options. Such studies limit self-selection and under-coverage bias and have valid comparison points in pre-crisis data. These types of panels did not exist in Belgium when the pandemic started, but it is important to build them into our survey infrastructure. Fortunately, two initiatives are currently underway to address this gap. The first is the preparation of a large-scale Belgian probability panel, a multi-purpose panel that will be co-owned by all Belgian universities (56). The second is the “Belgian Health and Well-being cohort”, a cohort study initiated by Sciensano with a focus on mental health. This is the successor of the COVID-19 health surveys, and its respondent pool will consist of both previous participants of the COVID-19 health surveys and individuals selected from the national register. The eleventh COVID-19 health survey, organized in June 2022, was the first step towards this cohort. In this edition, only former respondents could participate, and they were asked to become members of the cohort. In addition to setting up large-scale panel studies, it is also worthwhile to always ask participants of large probability studies, such as the national health interview survey, whether they may be contacted by e-mail or postal mail for future follow-up research (6, 22).
The outcomes of the COVID-19 health surveys in terms of participation and sample composition indicated that certain subgroups of the population are easy to attract for survey research and remain interested in follow-up surveys, whereas for other subgroups the opposite holds. In probability surveys not organized in a COVID-19 context, too, participation rates differ by socio-demographic characteristics (53, 54). The large participation differences found in the COVID-19 health surveys prompted us to consider different recruitment approaches for different subgroups, especially for young people. After consulting internal communication experts, we started using different recruitment channels and different recruitment materials to reach more young people. Although the results were modest, experimenting with tailoring the data collection to different subgroups by using different recruitment materials, incentives or reminders, instead of a “one-method-fits-all” design, could be valuable. Such studies use so-called adaptive or responsive survey designs (57, 58).