Population health surveillance using mobile telephone surveys in low- and middle-income countries: methodology and sample representativeness of a behavioural risk factor survey of live poultry exposure in Bangladesh

In low- and middle-income countries (LMICs), population-based health surveys are typically conducted using face-to-face household interviews. However, telephone-based surveys are cheaper, faster, and can provide greater access to hard-to-reach or remote populations. The rapid growth in mobile telephone ownership in LMICs provides a unique opportunity to implement novel data collection methods for population health surveys. This study describes the methodology, development, and population representativeness of a mobile telephone survey measuring live poultry exposure in urban Bangladesh. A population-based cross-sectional mobile telephone survey was conducted between September and November 2019 in North and South Dhaka City Corporations (DCC), Bangladesh to measure live poultry exposure using a stratified probability sampling design. Data were collected using a computer-assisted telephone interview (CATI) platform. Call operational data were summarized, and participant data were weighted by age, sex, and education to the 2011 census. Demographic distributions of the weighted sample were compared with external sources to assess population representativeness.

household surveys in LMICs (8,9). While sampling frames for probability-based telephone surveys have traditionally been limited to landlines in HICs, the growth of mobile telephone ownership and the increasing number of mobile telephone-only households have led to the development of dual-frame sampling designs (10,11). However, in LMICs growth in mobile telephone subscriptions has been exponential, rising from 22.9 subscriptions per 100 people in 2005 to 99.3 per 100 in 2020 (12). This rapid increase has led to cellular networks leapfrogging landline infrastructure, and mobile telephones becoming the primary mode of communication (13).
High levels of telephone ownership in LMICs provide a unique opportunity to implement novel data collection methods for population health surveys using mobile telephones as a primary sampling unit (13). However, there continue to be important methodological concerns regarding the use of mobile telephone surveys in producing population representative samples due to selection bias, coverage error, and low response rates (14). For example, the sociodemographic profiles of mobile telephone respondents have been shown to differ from those of face-to-face household survey respondents (15,16). Recent systematic reviews identified only a few published studies using probability-based mobile telephone survey methods in LMICs and reported a lack of consensus on the best implementation approaches and analytic methods to overcome methodological challenges in these populations (17,18).
In Bangladesh, where the mobile telephone penetration rate is over 87% (19), there is increasing use of telephone-based surveys for behavioural risk factor surveillance (20). In urban areas in particular, where the mobile penetration rate is even higher (21), these surveys have the potential to be especially useful for measuring population health outcomes. However, the population representativeness of these surveys has not been systematically evaluated and analytic methods such as poststratification adjustments have not been applied (20). Therefore, the potential impacts of selection bias and coverage error on study findings and population estimates remain unknown.
This study aims to address these critical methodological gaps to support the use of probability-based mobile telephone survey methods for routine population health surveillance in LMICs. Here we describe the methodology and development of a mobile telephone survey measuring live poultry exposure in urban Bangladesh. Human-animal contact is a significant risk factor for the emergence of novel infectious diseases (22), and is therefore a key measure to capture in behavioural risk factor surveillance. Specifically, we provide an in-depth discussion of the methodology covering sample design, questionnaire development, data collection, and poststratification analytic methods, as well as call outcome results including response rates and population representativeness.

Study Design & Sampling
A population-based cross-sectional telephone survey was conducted between September and November 2019 to recruit a representative sample of adult males and females in North and South Dhaka City Corporations (collectively known as DCC), in Dhaka, the capital of Bangladesh. The sampling frame was a list of mobile telephone numbers from each of Bangladesh's four mobile telephone operators (i.e., Grameenphone, Robi Axiata, Banglalink, and Teletalk). Telephone numbers were restricted to those active in DCC, or if they could not be restricted to DCC, to those active in Dhaka District. Over 75% of the population of Dhaka District resides in DCC (23). Telephone numbers were provided by each mobile telephone operator with the permission of the Bangladesh Telecommunication Regulatory Commission (BTRC).
A single-stage stratified probability sampling design was used to select participants. Before selection, the telephone numbers were stratified by mobile operator and sampled in accordance with each operator's proportionate market share in order to maximize precision of the sample and ensure a representative distribution (Supplementary Table 1) (24).
Within each operator list, simple random sampling was used to select telephone numbers. At the time of contact, each selected mobile telephone respondent was screened for eligibility and an equal number of male and female respondents were recruited to allow for robust sex-specific analyses. Individuals were eligible for inclusion if they were at least 18 years of age, current DCC residents, and had been residing in DCC for the past one year.
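The allocation step described above can be illustrated with a minimal Python sketch. It assumes proportional allocation with largest-remainder rounding, followed by simple random sampling within each operator list. The market shares and the function names (proportional_allocation, stratified_sample) are hypothetical placeholders for illustration; the shares actually used are reported in Supplementary Table 1 and are not reproduced here.

```python
import random

def proportional_allocation(total, shares):
    """Split `total` across strata in proportion to `shares`,
    using largest-remainder rounding so the parts sum to `total`."""
    share_sum = sum(shares.values())
    exact = {k: total * v / share_sum for k, v in shares.items()}
    alloc = {k: int(x) for k, x in exact.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

def stratified_sample(frames, shares, total, seed=0):
    """Simple random sample (without replacement) within each operator list."""
    rng = random.Random(seed)
    alloc = proportional_allocation(total, shares)
    return {op: rng.sample(frames[op], alloc[op]) for op in frames}

# Hypothetical market shares (percent); NOT the study's actual figures.
shares = {"Grameenphone": 45, "Robi Axiata": 30, "Banglalink": 20, "Teletalk": 5}
# Hypothetical sampling frames of placeholder telephone identifiers.
frames = {op: [f"{op}-{i:05d}" for i in range(2000)] for op in shares}

selected = stratified_sample(frames, shares, total=1000)
print({op: len(nums) for op, nums in selected.items()})
```

Screening for eligibility and the equal male/female quota happen at the time of contact, so they are outside the scope of this sketch.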

Questionnaire Development
The questionnaire was based on previous poultry exposure surveys conducted in urban China (25-28), but modified to the Bangladeshi context through discussions with an advisory panel (n = 12) consisting of local experts in survey design, mobile telephone surveys, and infectious diseases. Using a structured approach, the panel reviewed each survey question to assess the face and content validity of the items, as well as to identify areas for potential adaptation or modification and item reduction or addition. Two rounds of review were conducted and any items that did not achieve group consensus (defined as 60% agreement) were modified and re-examined until consensus was reached. Key revisions from this step centered on prioritizing and selecting items that were deemed feasible and reliable to ask participants during a telephone interview. The questionnaire was translated into Bangla, and independently reviewed by two native speakers familiar with the subject matter to ensure comprehension and clarity.
The final survey instrument comprised five sections and captured information on exposure to live poultry through purchasing at live bird markets (LBMs) and food preparation, prevention practices, influenza-like illness (ILI), and socio-demographics. LBMs were defined as a collection of stalls or vendors where the general public could purchase live chickens, ducks, geese or any by-products of these in an unprocessed form (29). Specifically, questions covered the following topics: frequency of LBM visits and associated behaviours at markets, poultry processing practices during food preparation, uptake of and adherence to hygiene practices and personal protective equipment (i.e., gloves, facemask, apron) during and after poultry exposure, self-reported ILI using a standard case definition (30), as well as household and individual-level socio-demographics. To minimize respondent burden while obtaining detailed information where appropriate, the survey made extensive use of branching logic. The questionnaire underwent thorough review, and modifications were made as needed based on feedback from a pre-testing phase (n = 7) and a small-scale pilot (n = 41). The final, updated survey took approximately 10-15 minutes for respondents to complete.

Data Collection & Calling Procedure
Both English and Bangla versions of the questionnaire were programmed into a customized computer assisted telephone interview (CATI) platform developed by the Institute of Epidemiology, Disease Control and Research (IEDCR) in Dhaka, Bangladesh. This platform managed both the sampling and data collection processes, including: complex form structure, automated repeat call attempts and interview rescheduling, automated strata monitoring on key variables (i.e., mobile telephone operator, sex of respondent) across interviewers, and pairing with a mobile telephone application to facilitate automated dialling of each selected telephone number. A team of four female data collectors were recruited to conduct telephone interviews, and data were entered into the CATI platform in real-time. Data collectors received four days of training on the survey methods and questionnaire topics before the start of piloting and data collection.
The survey was conducted between September and November 2019. In advance, a Bangla-language newspaper advertisement was placed in DCC's two most commonly circulated newspapers to inform the public that they may receive a call from IEDCR regarding a health survey, that telephone numbers were randomly selected with the permission of BTRC, and that participation was important for improving population health. Telephone calls were made every day (7 days a week) between 8am and 8pm (local time), except on Friday afternoons to account for local religious observances, to limit potential selection bias that could occur by only recruiting during weekdays and work hours.
All telephone numbers were attempted up to four times to establish contact and conduct an interview with the respondent. Each unanswered call was automatically re-scheduled for a different time of day on a different day of the week over the following seven-day period. If the respondent was not reached after the maximum number of four call attempts, with at least one daytime and one evening call attempt, the telephone number was classified as 'no contact' and discontinued. At first successful contact, the purpose of the study and the survey length were explained to respondents, who were also told that participation was voluntary and that all information they provided would be kept confidential. Eligibility was confirmed and consent for survey participation was obtained at the time of interview. When respondents were unable to complete the interview at the time of recruitment, the telephone interview was re-scheduled for a convenient time within the next seven days. Once an interview was completed, or if a respondent declined, refused or was ineligible, the telephone number was also discontinued from the call bank. An overview of the recruitment process is displayed in Supplementary Fig. 1.
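The re-scheduling rule for unanswered calls can be sketched as follows. This is an illustrative reconstruction only, not the logic of the IEDCR CATI platform, which is not described in detail. The slot boundaries (morning, afternoon, evening), the function names, and the reading of "Friday afternoons" as the afternoon slot alone are all assumptions.

```python
import random
from datetime import datetime, timedelta

MAX_ATTEMPTS = 4
# Hypothetical slot start hours within the 8am-8pm calling window.
SLOTS = {"morning": 8, "afternoon": 12, "evening": 17}
FRIDAY = 4  # datetime.weekday(): Monday is 0

def slot_of(ts):
    """Classify a timestamp into one of the assumed calling slots."""
    if ts.hour < 12:
        return "morning"
    if ts.hour < 17:
        return "afternoon"
    return "evening"

def next_attempt(previous, rng):
    """Return a retry time on a different weekday and in a different slot
    within the following week, or None once the attempt limit is reached."""
    if len(previous) >= MAX_ATTEMPTS:
        return None  # number would be classified as 'no contact'
    last = previous[-1]
    candidates = []
    for days in range(1, 7):  # 1-6 days ahead is always a different weekday
        day = last + timedelta(days=days)
        for slot, hour in SLOTS.items():
            if slot == slot_of(last):
                continue  # require a different time of day
            if day.weekday() == FRIDAY and slot == "afternoon":
                continue  # no calls on Friday afternoons
            candidates.append(day.replace(hour=hour, minute=0, second=0, microsecond=0))
    return rng.choice(candidates)

first_call = datetime(2019, 9, 2, 9, 0)  # a Monday morning
retry = next_attempt([first_call], random.Random(1))
print(retry, slot_of(retry))
```

The sketch does not enforce the additional requirement of at least one daytime and one evening attempt across the four calls; a fuller version would track the slots already used for each number.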

Sample Size
A total of 1040 complete interviews (520 males and 520 females) were required in order to detect an 8-9% difference (65% vs. 56% (26)) in live poultry exposure between strata, with 95% confidence and 80% power. The reason for explicitly stratifying by sex was to have sufficient statistical power to permit detailed exploration and identify notable differences in high-risk behaviours between males and females. This is important for ensuring appropriate and targeted risk-based implementation strategies.
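For orientation, the standard normal-approximation formula for comparing two independent proportions can be evaluated with the inputs stated above (65% vs. 56%, two-sided alpha of 0.05, 80% power). This is a sketch under the assumption of an unpooled-variance formula with no continuity correction or inflation for design effects; the source does not state which formula or allowances produced the target of 520 per stratum, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group to detect p1 vs p2 with a two-sided z-test
    (unpooled variance, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

n = n_per_group(0.65, 0.56)
print(n)  # 460 per group under these assumptions
```

The study's target of 520 per sex is larger than this unadjusted figure, which would be consistent with a more conservative formula or an added allowance, although the source does not say so.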

Data Analysis: Response, Weighting, and Representativeness
Operational data for each telephone number dialled and the corresponding details for call outcome status were summarized. The overall and mobile operator-specific response rates were calculated according to the American Association for Public Opinion Research (AAPOR) Response Rate-3 definition, which includes those who were eligible and those estimated to be eligible in the denominator (31). The number estimated to be eligible was derived by assuming the proportion of eligible individuals amongst those contacted was the same as for those who were unable to be contacted or declined.
Demographic data for completed interviews were summarized, and the sample distributions were compared to the Dhaka City Corporation demographic profile of the 2011 census (23). To adjust for non-response and disproportionate stratified sampling by sex (i.e., oversampling of females as compared to the reference population), post-stratification weights were calculated by age, sex and education to align with the 2011 census. Participants with an invalid response to weighting variables (i.e., age, sex, education) were unable to be assigned a weight and therefore are not included in weighted analyses (n = 16). The demographic distribution of the weighted data was summarized and compared with external data sources to assess the representativeness of the sample population for other key demographic variables, including marital status and region. All analyses were conducted in Stata 16.1 (StataCorp, College Station, TX, USA).
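The idea behind post-stratification weighting can be shown with a small sketch: each respondent receives a weight equal to the population share of their cell divided by the sample share of that cell. The study's analysis was run in Stata; the Python below is an illustration only, and the cell labels, population shares, and function name are hypothetical toy values rather than census or study data.

```python
from collections import Counter

def poststratification_weights(sample_cells, population_share):
    """Weight = population share of the respondent's cell / sample share of that cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    unknown = set(counts) - set(population_share)
    if unknown:
        raise ValueError(f"no population share for cells: {sorted(unknown)}")
    return [population_share[c] / (counts[c] / n) for c in sample_cells]

# Toy sample: cells stand in for age x sex x education categories.
sample_cells = ["F-young"] * 5 + ["F-old"] * 1 + ["M-young"] * 3 + ["M-old"] * 1
# Hypothetical reference distribution (sums to 1).
population_share = {"F-young": 0.3, "F-old": 0.2, "M-young": 0.3, "M-old": 0.2}

weights = poststratification_weights(sample_cells, population_share)
weighted_share = {
    cell: sum(w for w, c in zip(weights, sample_cells) if c == cell) / sum(weights)
    for cell in population_share
}
print(weighted_share)
```

In this toy example the over-sampled "F-young" cell is weighted down and the under-sampled older cells are weighted up, so the weighted shares match the reference distribution. Respondents with missing weighting variables cannot be placed in a cell, which is why such cases drop out of weighted analyses.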

Call Outcomes & Response Rates
Between September and November 2019, a total of 5486 unique telephone numbers were dialled. An overview of participant recruitment and outcome classification is presented in Fig. 1. Of all telephone numbers dialled, 2051 (37.4%) were screened and determined to be ineligible. This included 288 telephone numbers not in service, 49 respondents not in the eligible age range, 234 not living in DCC, and 40 living in DCC for less than one year. In addition, 1440 respondents were excluded based on sex given the stratified sampling design. No information was obtained from 2259 (41.2%) telephone numbers, including 1713 with no contact and 546 where the respondent declined to participate.
Assuming the proportion of eligible individuals (36.4%) amongst those who were contacted and screened for eligibility was the same as those unable to be contacted or who declined, 823 of these 2259 telephone numbers were estimated to be eligible. Interviews were completed with 1047 respondents out of 1999 eligible (known and estimated) telephone numbers, giving an overall response rate of 52.4%. Based on all known eligible respondents contacted, the overall cooperation rate was 89.0%. Table 1 presents these call outcomes and response rates overall and by mobile telephone operator.
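The response-rate arithmetic reported in this section can be retraced step by step. The sketch below uses only the counts given in the text; the number of known eligible respondents is not stated directly and is derived here by subtraction, and the variable names are illustrative.

```python
dialled = 5486
ineligible = 2051            # screened and found ineligible
unknown = 1713 + 546         # no contact + declined: eligibility not established
completed = 1047

# Derived by subtraction (not stated directly in the text).
known_eligible = dialled - ineligible - unknown

# Proportion eligible among numbers that were contacted and screened.
e = known_eligible / (known_eligible + ineligible)
estimated_eligible = round(e * unknown)

response_rate = completed / (known_eligible + estimated_eligible)
cooperation_rate = completed / known_eligible

print(f"known eligible:      {known_eligible}")          # 1176
print(f"eligibility rate e:  {e:.1%}")                   # 36.4%
print(f"estimated eligible:  {estimated_eligible}")      # 823
print(f"response rate:       {response_rate:.1%}")       # 52.4%
print(f"cooperation rate:    {cooperation_rate:.1%}")    # 89.0%
```

The denominator of the response rate, 1176 + 823 = 1999, matches the total of known and estimated eligible telephone numbers reported above.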

Sample Characteristics & Representativeness
As compared to the DCC demographic profile from the 2011 census, the unweighted mobile telephone survey sample overrepresented males aged 25-34 years as well as males and females with higher secondary education, while it underrepresented males and females aged 55-74 years and those with primary or less than primary education. Given the stratified sampling design aiming for equal representation of males and females, the overall unweighted sample overrepresented females and underrepresented males. After post-stratification weighting on these key variables, the sample matched the population closely on age, sex and education (Table 2).
The weighted survey sample is representative of other demographic factors that were not used in construction of the weights (Table 3). The overall sample shows a close match to the 2011 census figures for DCC by region, with slight discrepancies within the sex-specific strata. In terms of marital status, the survey slightly underrepresents single males and married females.

Discussion
This study empirically examines call outcomes and sample representativeness of a probability-based mobile telephone survey sampling strategy for measuring live poultry exposure in urban Bangladesh. Our survey had an overall response rate of 52.4%, and initial results comparing the socio-demographic profile of the survey sample to the census population showed that mobile telephone sampling slightly underrepresented older individuals and overrepresented those with higher secondary education. Poststratification weighting for age, sex, and education corrected for these differences, and the weighted sample was a good match to the census on other key demographic factors. Therefore, these findings support the use of mobile telephone-based survey sampling and data collection methods for producing population representative samples with minimal adjustment, which has important implications for improving population surveillance in LMICs.
A response rate of approximately 50% is lower than previous telephone-based surveys conducted in Bangladesh (20,32), but is in line with response rates achieved through similar methods conducted in other LMICs and is in fact higher than those typically achieved in HICs (16,33). Several factors could contribute to this lower rate compared to previous work, including changes in the methods of calculating response rates over time to provide more conservative estimates and general declines in response rates of population health studies over the past 30 years (31,34,35). The response rates were generally similar across mobile telephone operators, with the exception of Banglalink, which was considerably lower. This could be due to differences in the geographic sampling frames between each operator, with Banglalink not restricted to DCC and instead sampling from telephone numbers listed in Dhaka District. Overall, this supports the use of sampling designs stratified by mobile telephone operator to appropriately capture sub-population heterogeneity when conducting population-based surveys (36).
The unweighted demographic profile of our sample differed most from the census population by educational attainment. Overrepresentation of respondents with higher education is consistent across survey research methods, including those conducted in LMICs and HICs (14,16,37). The impact of these differences on population-level estimates will be greatest in surveys where education is strongly associated with the outcome of interest (38). However, the magnitude of this impact becomes negligible once weighted to the distribution of the reference population (15,38). Previous research in LMICs has found that minimal adjustment of demographic factors is sufficient to reduce non-response and coverage error when conducting robust probability-based sampling (15). Additionally, comparisons with the census population show that our weighted survey achieved a good representation on other characteristics including region and marital status. Remaining differences between the census and the survey population distributions could be due to the fact that our survey sample includes only participants aged 18-74 years, whereas the published census figures include all ages for region and only ages 20-74 years for marital status.
Although we demonstrate that population-based probability sampling using mobile telephones can produce representative samples with minimal adjustment, there are some limitations to this methodology as evidenced in this study. First, while comparisons to the census population can be used to evaluate representativeness of the sample population, they do not preclude the potential for selection bias due to coverage error. For instance, those who do not have mobile telephones likely differ from those who do on the basis of factors such as socioeconomic status (39,40). However, in DCC the mobile penetration rate is very high at over 87% (19), which suggests that impacts on population estimates due to coverage error are likely minimal. Further work could examine this question in populations with lower mobile telephone coverage or non-urban settings. The opposite is also of concern: potential bias introduced due to some individuals in the population having multiple mobile telephone numbers. Recent work quantifying this effect on probability of selection has found that although the theoretical probability of inclusion for those with multiple telephone numbers is greater than for those with only one number, the likelihood of contacting any individual is extremely small in practice (13). Although out of the scope of our analysis, this could be examined in future work by capturing information on mobile telephone ownership and applying selection weights. Finally, poststratification survey weighting should be conducted using recent reference population estimates. However, in DCC the most recent census available is from 2011 (23); therefore, any significant changes in the underlying population distribution since this time would not be reflected in the weights and could impact weighted population estimates.

Conclusions
In many LMICs, such as Bangladesh, the coverage of mobile telephones is very high and includes a range of population subgroups. Mobile telephone-based surveys can therefore offer an efficient, economic, and robust way to conduct surveillance for population health outcomes. We conducted a mobile telephone survey in Dhaka City Corporation, Bangladesh to measure live poultry exposure using a stratified probability sampling design. We evaluated the representativeness of our weighted sample population against the census and found that mobile telephone surveys
All participants provided oral informed consent via telephone.

Consent for publication
Not applicable.

Availability of data and materials
The datasets analyzed during the current study are not publicly available due to pre-existing data sharing agreements.
The data may be made available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Fig. 1. Profile of participant recruitment and call outcome classification for the live poultry exposure mobile telephone survey, Dhaka City Corporation, Bangladesh