Methods for Estimating Population Size for Key Populations: A Global Scoping Review for HIV Research

Background: Estimating the population sizes of key populations is critical for understanding the overall HIV burden. This scoping review aims to synthesize existing methods for population size estimation among key populations (people who inject drugs (PWID), men who have sex with men (MSM), transgender persons, sex workers, and incarcerated individuals) and to provide recommendations for future application of these methods. Main text: A scoping review was conducted; 39 of 688 studies met the inclusion criteria and were assessed. Estimation methods included five digital methods, one in-person method, and four hybrid methods. We organized the methods for population size estimation into the following five categories: methods based on independent samples (including the capture-recapture method and the multiplier method), methods based on population counting (including the Delphi method and the mapping method), methods based on the official report (including the workbook method), methods based on social networks (including the respondent-driven sampling method and the network scale-up method), and methods based on data-driven technologies (the Bayesian estimation method, the stochastic simulation method, and the LMS estimation method). Thirty-six (92%) articles were published after 2010 and 23 (59%) used multiple methods. Eleven studies were conducted in high-income countries and 28 in low- and middle-income countries. A total of 10 studies estimated the size of sex worker populations, 14 focused on MSM, and 10 focused on PWID. Conclusion: There was no gold standard for population size estimation. Among the 120 studies related to population size estimation of key populations, the most commonly used method was the multiplier method (26/120 studies). Every method has its strengths. For example, some traditional methods are simple and easy for researchers to use, and some novel methods save time and resources. However, each method also has its limitations and biases. 
For example, for the respondent-driven sampling method, stigma and discrimination may lead to the "hiddenness" of the key population; for the multiplier method, the quality of authentic data may also influence the accuracy of the estimation.


Background
The global HIV epidemic disproportionately affects key populations, including people who inject drugs, MSM, transgender persons, sex workers, and people who are incarcerated (2). Understanding the HIV burden among key populations is essential for estimating the overall burden of HIV both globally and regionally. Population size estimation is an important step towards understanding the HIV burden, and accurate size estimation of key populations can inform resource allocation and the distribution of HIV prevention services. However, due to the hidden nature of some of these populations, estimating their size is challenging. First, the methods for population size estimation have intrinsic biases. For example, the data inputs used by some methods may not reflect actual conditions if the quality of the data cannot be guaranteed (3,4). Second, key populations may be hard to reach for various reasons, such as social stigma and discrimination (5,6).
Existing literature on the size estimation of key populations has demonstrated the strengths and shortcomings of the currently available methods (7). However, very few studies have systematically summarized the categories of previously used methods or pointed out their problems, which leaves little guidance for using these methods in future studies. The traditionally used methods have various intrinsic biases. In addition, the availability of reliable and authentic data has been a major challenge (8).
For example, it can be challenging for public health facilities or governments to acknowledge the existence of key populations (9). Beyond the stigma issues seen in other parts of the world, estimating the size of key populations is particularly challenging in the Eastern Mediterranean, Middle East, and North Africa regions, because conservative social and religious values may lead to harsh judgment and may even bring life-threatening punishment (10).
The current knowledge gap is the lack of comparisons between different methods and of guidance on how to find the best strategy for the local context. To fill this knowledge gap, this scoping review examined population size estimation methods in different settings among key populations. This study aimed to summarize the application of the existing population size estimation methods and to discuss their respective strengths and weaknesses.

Main Text
Search strategy
Relevant studies published from January 2000 to 4 August 2020 and related to population size estimation were retrieved from PubMed (11). Search terms were chosen based on their relevance to the topic of this study. Search terms included "people who inject drugs"; "men who have sex with men"; "transgender persons"; "sex workers"; "people who are incarcerated" in combination with "size estimate" and "size estimation". We used the PRISMA checklist for scoping reviews. This review was completed on 20 January 2021.

Selection criteria
After de-duplication, the remaining publications retrieved from PubMed were reviewed independently by two researchers to identify the final studies to be included. Only publications that related to the sampling methods of population size estimation among key populations and that offered guidance for the application of these methods were included in the final review. We excluded studies that were not related to the topic of this review or that offered no guidance for the future design of population size estimation methods. The titles, abstracts, and full texts of all publications were screened by two independent reviewers (F. J. and C. X.). If it was not clear whether a study should be included in the final review, three authors (F. J., C. X., and W.T.) reviewed the full text together to discuss whether the article met the inclusion criteria.

Data extraction
A standardized extraction form in Microsoft Excel was used to extract the first author, date of publication, and size estimation sampling method for key populations. The publications were grouped into five categories: methods based on independent samples, methods based on population counting, methods based on the official report, methods based on social networks, and methods based on data-driven technologies.

Text mining
In order to illustrate research trends in papers on HIV key population size estimation, we adopted a semantic analysis tool, CiteSpace, which can build relation graphs of important research terms from full-text studies (before the full-text screening, one reference from PubMed could not be retrieved from Web of Science, on which the text-mining tool runs). This tool can also show the relations among keywords of existing research. To build relation graphs among keywords and to show research trends on the topic of HIV key population size estimation, we applied text mining to all eligible full-text studies.

Findings
Overall, 688 citations were retrieved from the initial search. After reviewing the titles and abstracts, 568 abstracts were excluded, leaving 120 full-text manuscripts. After reading the full texts, 81 studies were further excluded. Therefore, 39 studies were included in this scoping review (Fig. 1).
Among the included studies, seven used the capture-recapture method, six used the multiplier method, two used the Delphi method, three used mapping methods, three used the workbook method, six used the network scale-up method, six used the respondent-driven sampling (RDS) method, three used the Bayesian estimation method, two used the LMS estimation method (12), and one used the stochastic simulation method. Among the articles reviewed, 36 (92%) were published after 2010 and 3 (8%) were published before 2010. Sixteen (41%) studies examined one method and 23 (59%) studies used multiple methods. These included 11 studies in high-income countries (HICs) and 28 in low- and middle-income countries (LMICs). A total of 10 studies estimated the size of sex worker populations, 14 focused on MSM, and 10 focused on people who inject drugs (PWID). These population estimation methods included five digital methods, one in-person method, and four hybrid methods. Appendix 1 summarizes the publications included in this review.
We applied full-text mining to the 120 full-text articles that could be retrieved from Web of Science. Figure 2 shows the relationships among several key research items, drawing on citation links and semantic analysis. The capture-recapture method appeared three times in this graph, with several edges.
Social network-based methods such as RDS and network scale-up (labeled 'personal network' in the full-text semantic extraction) also appeared in relatively large type in this knowledge graph, where word size represents the frequency of mentions. It should be noted that the key item 'log-linear model' is relevant to Bayesian estimation and LMS estimation. Other methods, such as the Delphi and workbook methods, appear to be more independent, as they do not appear in this knowledge graph at all. Figure 3 shows the research trend for this topic over the preceding 20 years. We observe that the methods used gradually changed from traditional methods (e.g., capture-recapture) to social network-based approaches (e.g., RDS).

Sampling method: Multiplier (14)
Description: Two independent sources of data are used to make the estimate: an authentic count or list of the population whose size is being estimated, and a survey of that population.
Assumption or setting: Accurate demographic and geographic information on the key population is available.
Strength: Simple and easy to use.
Limitation: The quality of the data can cause bias; the resulting survey samples may not be fully representative of the key population.

Sampling method: Delphi (15)
Description: The size of key populations is estimated from the individual judgment of several experts.
Assumption or setting: The estimate from an expert team can accurately reflect reality.
Strength: Low cost with high efficiency.
Limitation: The estimate may be subjective and unreliable, depending on the quality of the expert team; strategies for dealing with disagreement between experts are lacking.

Sampling method: Mapping (16)
Description: The locations of the key population are systematically identified and mapped to estimate the size of the key population.
Assumption or setting: The quality of the data can be guaranteed by the full involvement of the key populations.
Strength: The estimate is made with transparency.
Limitation: Missing some geographical locations may lead to underestimation of the size of key populations; overestimation may occur if members of the key population frequently attend multiple locations.

Sampling method: Workbook (17)
Description: The key population is identified first, and the estimates are then combined with the total population to calculate the proportion of the key population in a specific region.
Assumption or setting: Typically used in countries or regions where the epidemic is low and concentrated.
Strength: The estimate is made with transparency; errors can be prevented by automatic consistency and audit checks.
Limitation: In some countries, data may be limited because of stigma and discrimination among the key populations and legal issues, which may make the data unreliable or of poor quality.
Sampling method: Network scale-up (18)
Description: Respondents are asked about the behaviors of acquaintances in their social network in order to estimate the number of key population members in each respondent's network.

Capture-recapture
Although some novel methods for population size estimation have emerged in recent years, a large number of surveys have been conducted using the capture-recapture method. This method can provide accurate estimates at a low cost (23,24). In general, the analysis is premised on the overlap between several samples of the key population (25). The capture-recapture method involves two separate captures (26). Members of the key population are marked and counted in the two captures independently. Some participants captured in the second capture may already have been marked in the first capture. To avoid collecting personal identification information, unique objects such as coupons are commonly used to identify recaptures. However, calculating the number of recaptures is challenging because the databases used may not record the same unique objects from individuals (13). In some cases, there is no way to determine whether the person holding the unique object in the second capture is the same person who received it in the first capture (27). Bias may arise because members of key populations sometimes crowd around the researcher who is distributing the objects in the hope of receiving one. The choice of an appropriate unique object and of distributors is of vital importance for successful capture-recapture sampling (28,29). This approach is highly adaptable for key populations such as drug users and sex workers. It is recommended for use when a census or good-quality data are not available.
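
The two-capture logic described above can be illustrated with a short calculation. The sketch below uses the classical Lincoln-Petersen estimator and Chapman's small-sample correction, which are standard two-sample estimators; the review does not state which estimator the included studies applied, so this choice and the function names are assumptions for illustration only.

```python
def lincoln_petersen(n_first, n_second, n_recaptured):
    """Two-sample estimate: N = (n_first * n_second) / n_recaptured.

    n_first      -- people marked (e.g. given a coupon) in the first capture
    n_second     -- people counted in the second capture
    n_recaptured -- people in the second capture who show the unique object
    """
    if n_recaptured <= 0:
        raise ValueError("at least one recapture is needed for this estimator")
    return n_first * n_second / n_recaptured


def chapman(n_first, n_second, n_recaptured):
    """Chapman's variant, which stays finite when there are no recaptures."""
    return (n_first + 1) * (n_second + 1) / (n_recaptured + 1) - 1


# Hypothetical counts, not taken from any study in the review.
estimate = lincoln_petersen(n_first=200, n_second=150, n_recaptured=30)
corrected = chapman(n_first=200, n_second=150, n_recaptured=30)
print(estimate, round(corrected, 1))
```

Because the number of recaptures sits in the denominator, anything that lowers it (lost coupons, mobility, unmatched records) pushes the estimate upward.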

Multiplier
The multiplier method is always integrated with other methods, such as the respondent-driven sampling method, to estimate the size of key populations. There were three types of multipliers among the publications reviewed: the service multiplier, the unique object multiplier, and the web/mobile app multiplier (30). The service multiplier method uses programmatic data collected from key populations by health centers (31). The unique object multiplier method involves randomly distributing a unique object to members of key populations (24). The web/mobile app multiplier method assesses the use of a certain website or mobile phone application among key populations (32). The accuracy of the multiplier is highly dependent on the quality of the data source (33). In addition, different data sources can produce different estimates (34). To improve the reliability and validity of the multiplier, the representativeness of the data source and the completeness of the benchmark should be considered in advance when conducting the survey.
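
As a rough illustration of how a benchmark count and a survey are combined, the sketch below divides the benchmark count by the proportion of survey respondents who report being part of that benchmark. This is the commonly used form of the multiplier calculation; the function and parameter names are hypothetical and the numbers are invented.

```python
def multiplier_estimate(benchmark_count, survey_reporting, survey_total):
    """Estimate population size as benchmark_count / P.

    benchmark_count  -- unique key population members in the benchmark
                        (e.g. clinic clients, objects handed out, app users)
    survey_reporting -- survey respondents who report being in the benchmark
    survey_total     -- all survey respondents
    """
    if survey_reporting <= 0 or survey_total <= 0:
        raise ValueError("survey counts must be positive")
    proportion = survey_reporting / survey_total
    return benchmark_count / proportion


# Hypothetical example: 500 service users, 50 of 400 respondents used the service.
print(multiplier_estimate(500, 50, 400))
```

The same function covers the service, unique object, and web/mobile app multipliers; only the meaning of the benchmark changes. An incomplete benchmark or an unrepresentative survey shifts the estimate directly, which is why the quality of the data source matters so much.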

Delphi
The Delphi method refers to convening a group of experts to synthesize and interpret information in order to estimate population size (15). Typically, this method serves as a way to reach agreement about the estimates from other methods. The team of experts usually consists of people from local government, research institutions, and community organizations who are familiar with the local geography and culture. Generally, the median and the upper and lower limits for the estimate are identified based on local and international data and the expert opinion of the Delphi panel (31). Experts' opinions are gathered and discussed to reach a consensus that represents the "best" estimate. This method is vulnerable to subjectivity. Bias may arise when the expert team has a limited understanding of the demographic or geographic features of the populations whose size is being estimated.
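
A minimal sketch of summarizing one round of expert estimates is shown below. Taking the smallest and largest expert values as the lower and upper limits is an assumption made here for simplicity; panels may instead agree on limits through discussion, and the helper name is hypothetical.

```python
from statistics import median


def summarize_delphi_round(expert_estimates):
    """Return the lower limit, median, and upper limit of expert estimates."""
    if not expert_estimates:
        raise ValueError("at least one expert estimate is required")
    return {
        "lower": min(expert_estimates),
        "median": median(expert_estimates),
        "upper": max(expert_estimates),
    }


# Hypothetical estimates from five experts for one city.
round_one = [800, 1000, 1200, 1500, 3000]
print(summarize_delphi_round(round_one))
```

A wide gap between the lower and upper limits, as in this invented example, signals disagreement that would normally be fed back to the panel for another round.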

Mapping
Mapping is similar to a cross-sectional study in epidemiological research. This method identifies the sites where key populations gather, such as public spaces, mobile apps, and websites. Estimating the number of people at each site begins with identifying the locations frequented by key populations (16). Only the sites most frequented by key populations are identified and reported.
Mapping relies on numeric estimates from key informants instead of a count of key population members at each identified site, so estimates may differ between the respondents interviewed at various sites (35). The variability of the estimates from different respondents could influence the accuracy of the overall estimate (36). Because mapping is not based on individuals, the number of key population members may be overestimated or underestimated. The participation of key populations depends on the extent of their visibility, so some individuals may be omitted, which leads to underestimation. This method could also overestimate the number of key population members if they frequent multiple locations.
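
The sketch below shows one simple way the site-level informant estimates could be combined, and how attendance at multiple sites inflates the total. Averaging informants within a site and dividing the summed total by the mean number of sites each person frequents are assumptions made for illustration, not a procedure reported by the reviewed studies; all names and numbers are hypothetical.

```python
from statistics import mean


def mapping_estimate(informant_estimates_by_site, mean_sites_per_person=1.0):
    """Sum per-site estimates and adjust for people counted at several sites.

    informant_estimates_by_site -- dict: site name -> list of informant estimates
    mean_sites_per_person       -- average number of mapped sites one person
                                   frequents (1.0 means no double counting)
    """
    if mean_sites_per_person < 1.0:
        raise ValueError("mean_sites_per_person must be at least 1.0")
    site_totals = [mean(values) for values in informant_estimates_by_site.values()]
    return sum(site_totals) / mean_sites_per_person


sites = {"park": [40, 60], "bar": [100], "app": [150, 250]}
unadjusted = mapping_estimate(sites)
adjusted = mapping_estimate(sites, mean_sites_per_person=1.4)
print(unadjusted, round(adjusted))
```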

Methods based on the official report
Workbook method
The workbook method uses data retrieved from health officials at the national or regional level (17). It relies on existing official records (37). This method emphasizes a range of estimates instead of a single point estimate. The workbook method uses regional spreadsheets to make estimates for various areas. The data come from surveillance systems and large-scale screening and are used to gain an understanding of the distribution of the key populations (38). Inevitably, some regions may not have data available to make an estimate. Missing data are estimated using data from the area with the most similar socioeconomic and geographic features. In addition, the estimates for missing data are usually adjusted by health officials and experts who are familiar with the area.
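
To make the idea of a range rather than a point estimate concrete, the sketch below multiplies each region's total population by a low and a high proportion and sums the results, much as a regional spreadsheet would. The dictionary keys and all figures are hypothetical; the actual workbook templates are not described in the source.

```python
def workbook_range(regions):
    """Return (low, high) national estimates from regional rows.

    Each row is a dict with hypothetical keys:
      population -- total population of the region
      prop_low   -- low estimate of the proportion in the key population
      prop_high  -- high estimate of the proportion in the key population
    """
    low = 0.0
    high = 0.0
    for row in regions:
        if row["prop_low"] > row["prop_high"]:
            raise ValueError("prop_low must not exceed prop_high")
        low += row["population"] * row["prop_low"]
        high += row["population"] * row["prop_high"]
    return low, high


regions = [
    {"population": 100_000, "prop_low": 0.01, "prop_high": 0.02},
    {"population": 50_000, "prop_low": 0.02, "prop_high": 0.04},
]
low, high = workbook_range(regions)
print(round(low), round(high))
```

A region without data would be given the proportions of the most similar region before being added to the list, which is where expert adjustment enters.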

Network scale-up
The network scale-up method is a promising approach to population size estimation. This method starts with estimating the size of personal networks in a small sample. For each individual, the number of key population members they know is estimated instead of asking questions about their own behaviors directly (39). This is followed by estimating the number of key population members in the total population. The major assumption of this method is that the social networks of the individuals involved in the survey can represent the total population (18). The average personal network size in a certain area can be calculated by averaging the individual values reported over a large number of respondents (40). Each individual's report on their network contributes to the estimate. The main challenge of this method is determining the sample size required, since no individual has complete knowledge about all their acquaintances (41). The strength of the network scale-up method is that it does not require access to key populations, only to people from the initial random sample. The main source of bias is that estimating the size of a personal network can be cognitively demanding (42). Different people may also have different definitions of key populations and acquaintances (43).
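
The scale-up step can be sketched with the basic ratio estimator often used for this method: the share of key population members among all reported acquaintances is applied to the total population. The source does not give a formula, so this estimator, the function name, and the numbers are assumptions for illustration.

```python
def network_scale_up(known_members, network_sizes, total_population):
    """Basic scale-up estimate: total_population * sum(m_i) / sum(c_i).

    known_members    -- per respondent, number of key population members known
    network_sizes    -- per respondent, estimated personal network size
    total_population -- size of the general population in the area
    """
    if len(known_members) != len(network_sizes):
        raise ValueError("one network size is needed per respondent")
    total_network = sum(network_sizes)
    if total_network <= 0:
        raise ValueError("network sizes must sum to a positive number")
    return total_population * sum(known_members) / total_network


# Hypothetical answers from five respondents in an area of 500,000 people.
known = [0, 1, 2, 0, 1]
sizes = [200, 300, 250, 150, 100]
print(network_scale_up(known, sizes, 500_000))
```

If respondents are unaware that some acquaintances belong to the key population, the numerator shrinks and the estimate falls below the true size.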

Respondent-driven sampling
The RDS method has become increasingly prevalent for population size estimation of key populations in recent years (44). Many publications have demonstrated the success of peer-driven recruitment in collecting data on key populations. It is a network-based sampling method that starts by recruiting a selected sample from the key population, after which respondents recruit their peers from their own networks (45). The members of the purposively selected sample are called "seeds", and they recruit other members (19). The number of peers each respondent may recruit is always limited, usually to 3-5 people (46). Coupons, quotas, and incentives are used to assist recruitment. The coupons are given out by the "seeds" and then passed on to other members of the key population (47). Financial compensation for participation could facilitate the development of the recruitment chain. Each recruitee could potentially become a recruiter, which makes recruitment continue in waves (48). The connections between recruiters and recruitees can then be traced using the unique identification of the coupons. The longer the chains of recruitment, the more representative the surveyed sample (49). Even though longer recruitment chains could reduce potential selection bias, bias can still occur. For example, some populations whose activity is stigmatized may decline to participate. In addition, the quality of RDS depends heavily on the number of seeds used at the beginning of the study (50).
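
The tracing of recruitment chains through coupon identifiers can be sketched as follows. The input format (a mapping from each participant to the person whose coupon they redeemed) is a hypothetical simplification of coupon records, and the sketch assumes the records contain no loops.

```python
from collections import Counter


def recruitment_waves(recruiter_of):
    """Assign a wave number to every participant.

    recruiter_of -- dict: participant id -> recruiter id, or None for a seed.
    Seeds are wave 0; each recruit is one wave later than their recruiter.
    """
    waves = {}

    def wave_of(participant):
        if participant not in waves:
            recruiter = recruiter_of[participant]
            waves[participant] = 0 if recruiter is None else wave_of(recruiter) + 1
        return waves[participant]

    for participant in recruiter_of:
        wave_of(participant)
    return waves


# Hypothetical coupon records: one seed and four recruits.
records = {"seed1": None, "a": "seed1", "b": "seed1", "c": "a", "d": "c"}
waves = recruitment_waves(records)
print(waves)
print(Counter(waves.values()))  # number of participants in each wave
```

The longest chain length, here the largest wave number, is the quantity the text links to the representativeness of the sample.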

Bayesian estimation
The Bayesian estimation method starts from a prior probability distribution and uses Bayes' theorem to obtain an updated (posterior) probability. Bayesian estimation assumes that prior probabilities can be used to enhance estimation (20). The method is well suited to countries or cities that have no direct data on the population size but for which a prior probability exists (51). However, different investigators may understand prior knowledge differently according to their own subjective judgment. As a result, they might specify different prior distributions and then obtain different posterior distributions, which makes this method subjective.
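
The effect of the prior can be shown with a small conjugate example. The sketch below assumes a Beta prior on the proportion of the general population that belongs to the key population and a binomial survey outcome; this beta-binomial model is chosen here only because it has a closed-form posterior, and it is not claimed to be the model used in the reviewed studies. All numbers are invented.

```python
def posterior_size(prior_a, prior_b, survey_members, survey_total, total_population):
    """Posterior mean population size under a Beta(prior_a, prior_b) prior.

    With survey_members key population members among survey_total respondents,
    the posterior is Beta(prior_a + survey_members,
                          prior_b + survey_total - survey_members).
    """
    post_a = prior_a + survey_members
    post_b = prior_b + survey_total - survey_members
    posterior_mean = post_a / (post_a + post_b)
    return posterior_mean * total_population


# Same survey (6 of 200 respondents), two investigators with different priors.
informative = posterior_size(2, 98, 6, 200, 100_000)  # prior centred near 2%
flat = posterior_size(1, 1, 6, 200, 100_000)          # uniform prior
print(round(informative), round(flat))
```

The two calls use identical data and still return different sizes, which is the subjectivity described above.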

Stochastic simulation
The stochastic simulation model estimates population size based on epidemiologic data. Stochastic simulation (Monte Carlo) first generates a simulated system and then analyzes it through probability models based on limited observed data (21). Information from observational cohort studies and clinical trials can help to set the simulation parameters. When rich epidemiologic data are available, stochastic simulation models can be used to estimate population size. The strength of this method lies in its ability to produce plausibility ranges for estimates, which describe the uncertainty surrounding the estimates, based on the data to which the model was calibrated (52). As for shortcomings, first, some large-scale complex simulation processes can be time-consuming. Second, the validity of model estimates is highly dependent on the quality of the available data used to calibrate the model.
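
A minimal Monte Carlo sketch of a plausibility range is given below. It assumes only two uncertain inputs, the total population and the proportion belonging to the key population, each drawn from a uniform range; real simulation models are calibrated to much richer epidemiologic data, and the function name, input ranges, and 95% interval are illustrative assumptions.

```python
import random


def simulate_population_size(pop_low, pop_high, prop_low, prop_high,
                             n_draws=10_000, seed=0):
    """Return (lower, median, upper) of simulated key population sizes.

    Each draw samples a total population and a proportion from uniform
    ranges and multiplies them; lower and upper are the approximate 2.5th
    and 97.5th percentiles of the simulated sizes.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = []
    for _ in range(n_draws):
        total_population = rng.uniform(pop_low, pop_high)
        proportion = rng.uniform(prop_low, prop_high)
        draws.append(total_population * proportion)
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    median = draws[n_draws // 2]
    upper = draws[int(0.975 * n_draws)]
    return lower, median, upper


lower, median, upper = simulate_population_size(90_000, 110_000, 0.01, 0.03)
print(round(lower), round(median), round(upper))
```

Increasing `n_draws` or the number of simulated inputs improves the stability of the range at the cost of run time, which mirrors the time cost noted for large simulations.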

LMS estimation
Laska, Meisner, and Siegel developed an unbiased estimator for the size of a population in a single venue based on a single sample (12). LMS estimation of MSM population size is based on a single sample (22). In other words, this method assumes that only a one-time sample is available. The method has two advantages over other population size estimation methods. First, compared with the capture-recapture method, it needs only a single "capture" and therefore saves time and resources. Second, compared with the multiplier method, it is more scientific in that it rests on statistical principles. However, within the field of statistics this method is fairly traditional, and it is hard to extend with major contributions or novel revisions (53). In addition, because the method requires only a single sample, its estimation accuracy might be lower than that of other population size estimation methods.

Issues of existing population size estimation
Data accuracy, the skills of investigators, the duration of size estimation studies, the involvement of the community, geographical areas, and the costs and resources required for population size estimation are all essential factors that influence the accuracy of the size estimation result (8).
The current size estimation methods have several limitations. First, further evaluation is needed of the potential biases and how they may affect the size estimation of key populations. Second, it is still hard to take the hardest-to-reach individuals into consideration. Traditional methods such as capture-recapture and the multiplier method draw independent samples from the population, which is challenging to achieve when the populations are hidden. Social stigma also makes accurate estimation of the size of key populations challenging. In addition, asking people who engage in illegal behaviors to disclose their behaviors or social networks to interviewers may cause serious bias. Because selling sex is legal in some countries but not in many others, this issue is closely related to the local context.
We grouped the factors that researchers need to consider when choosing methods for population size estimation into six categories (Fig. 5). Results for the same population may vary when different methods are used. For example, when estimating the population size of MSM, the capture-recapture method may overestimate the actual size of the population because the mobility of the population being estimated reduces the number of recaptures. The multiplier method may not yield the actual size of the population because it depends heavily on the quality of the data source. In addition, the result may be an underestimate because the population being estimated is hard to reach. The Delphi method is vulnerable to the subjectivity of the expert team, especially when experts have a limited understanding of the demographic or geographic features of the populations whose size is being estimated. The network scale-up method may underestimate the size of the population because respondents may not have complete knowledge about all their acquaintances and because the estimation can be cognitively demanding.

Recommendations
This scoping review has several implications. Improved methods for measuring the size of key populations are in demand. We need a novel, comprehensive method for population size estimation that avoids the aforementioned challenges.
First, when choosing the method for population size estimation, we should consider the potential bias associated with each approach. For example, traditional social network-based methods repeatedly collect data from the MSM population, which might introduce convenience sampling bias.
Second, the selection of methods should be tailored to the features of the key population, the local context, and costs. Evidence from a meta-analysis of multiple sources and Delphi panels could be applied where several studies have been performed on the population whose size is being estimated (54). Behavioral surveys among the key populations should be conducted before the survey. Planning and preparation will improve the validity of the estimates. If possible, working with community members from the key population whose size is being estimated may help in selecting the most appropriate methods. A pilot study among a subsample of the population whose size is being estimated is a valid approach.
Third, using advances in technology and data science to assist the estimation might be the future trend. As mentioned before, Fig. 3 shows that over the past 20 years the methods used for this topic have gradually shifted from traditional approaches such as capture-recapture to social network-based approaches such as respondent-driven sampling. This may mean that social network data have great potential for developing accurate estimation models. With the rapid development of data-driven technologies, novel machine learning methods such as graph convolutional networks (55) and generative adversarial networks (56) have become popular in the field of artificial intelligence (AI). Applying these new data-driven methods to size estimation tasks in public health research might be worth trying in the future.
Furthermore, when used correctly, data-driven technologies could be friendly to key populations, because such approaches depend on existing, accessible, non-sensitive data, whereas other model-driven estimation methods may require private data that are hard to reach.

Conclusions
Population size estimation methods continue to have limitations. Different methods are likely to give very different results. The estimates depend on subjective judgments and on the quality of authentic data, and the underlying assumptions are always hard to meet. Using different methods to offset the limitations of individual estimation methods and to balance their strengths and weaknesses is critical to deriving the final estimate (7).

Ethics approval and consent to participate
This study is a scoping review and does not directly involve any study participants; ethics approval is therefore not needed.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Relation graph of important words from full-text mining. Legend: This graph shows the relations among different keywords from full-text mining. The items in red font are important items relevant to size estimation methods, which are the research objective of our paper (items such as "Africa" and "risk behavior" are not studied in our paper and are therefore labeled in black font). The number of appearances and the word size of each item indicate its importance and relational centrality in this research topic (i.e., size estimation for HIV key populations).

Type of methods for population size estimation.

Figure 5
Factors that researchers need to consider when choosing methods for population size estimation.

Supplementary Files
Supplementary file associated with this preprint: Appendix.docx