To evaluate sampling strategies across all possible distributions of schistosomiasis in SSA, we first fit country- and species-specific geostatistical models to characterise the spatial heterogeneity of schistosomiasis. These models were then used to parameterise simulations in a representative, hypothetical country in SSA, capturing the full range of possible transmission scenarios. Alternative sampling strategies were assessed based on overall accuracy, the numbers of schools correctly and incorrectly treated, and estimated cost implications.
Country-level geostatistical analysis
To estimate the spatial distribution of schistosomiasis, we obtained data on schistosomiasis surveys from the WHO Expanded Special Project for Elimination of Neglected Tropical Diseases (ESPEN) portal [17]. This represents the largest and most geographically comprehensive database of schistosomiasis surveys. We assembled data for all geo-referenced school-based surveys of school-age children (SAC) in SSA. To exclude survey points with inaccurate spatial data, we removed all surveys with duplicate locations reported within the same year and surveys with coordinates reported outside the district or administrative unit of the named school/site. We additionally excluded survey points reporting only a prevalence rather than the numbers of children sampled and detected positive. We then excluded schools reporting over 100 children sampled, as these were not representative of typical survey methodology and potentially reflected aggregated data. In line with current WHO guidelines, we defined S. haematobium infections as those diagnosed using urine filtration and S. mansoni infections as those diagnosed using Kato-Katz techniques. For countries with repeated survey data reported for multiple years, we included only the most recent survey. Countries with fewer than 50 survey points were excluded from further analysis.
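The exclusion rules above can be sketched as a single filtering step. This is a minimal illustration rather than the pipeline actually used: the record keys (`lon`, `lat`, `year`, `n_sampled`, `n_positive`, `in_named_unit`) are hypothetical, every copy of a duplicated location is assumed to be dropped, and "most recent survey" is assumed to mean the most recent survey year within a country.

```python
from collections import Counter


def filter_surveys(records, min_points=50, max_sampled=100):
    """Apply the exclusion rules to one country's survey records.

    Each record is a dict with hypothetical keys: 'lon', 'lat', 'year',
    'n_sampled', 'n_positive' (None when only a prevalence was reported)
    and 'in_named_unit' (False when the coordinates fall outside the
    district or administrative unit of the named school/site).
    """
    # Count how often each location is reported within the same year.
    counts = Counter((r["lon"], r["lat"], r["year"]) for r in records)
    kept = [
        r for r in records
        if counts[(r["lon"], r["lat"], r["year"])] == 1  # assumption: drop all copies
        and r["in_named_unit"]
        and r["n_sampled"] is not None
        and r["n_positive"] is not None
        and r["n_sampled"] <= max_sampled
    ]
    if not kept:
        return []
    # Assumption: keep only records from the most recent survey year.
    latest = max(r["year"] for r in kept)
    kept = [r for r in kept if r["year"] == latest]
    # Countries with fewer than min_points survey points are excluded.
    return kept if len(kept) >= min_points else []
```

Returning an empty list for excluded countries keeps the downstream model-fitting loop simple, since a country either contributes a usable dataset or nothing.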
From this dataset, we fit binomial geostatistical models to schistosomiasis data. Models were fit separately for each country and species, with no additional covariates included. The number of individuals, Yi, who tested positive for schistosomiasis out of the total number of individuals examined, ni, at location xi was considered as the realisation of a binomial random variable Yi ~ Binomial(ni, p(xi)), with p(xi) modelled as:
log{p(xi) / (1 - p(xi))} = μ + S(xi) + Zi    (Eq. 1)
where μ is the intercept and S(xi) is a zero-mean Gaussian process with variance σ² and an exponential correlation function given by ρ(k; φ) = exp{-k/φ}, where φ > 0 is a scale parameter that controls the extent of spatial correlation and k is the distance between two sampling locations. The Zi are independent zero-mean Gaussian variables with variance τ². Models were fit using Monte Carlo maximum likelihood estimation implemented in R [18].
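To make the roles of the parameters concrete, the sketch below evaluates the exponential correlation function and maps a linear predictor to a prevalence. It assumes the logit link that is conventional for binomial geostatistical models; the function names are illustrative and are not taken from the R implementation.

```python
import math


def exp_correlation(k, phi):
    """Exponential correlation rho(k; phi) = exp(-k / phi) for distance k >= 0."""
    return math.exp(-k / phi)


def prevalence(mu, s, z):
    """Inverse logit of the linear predictor mu + S(x) + Z (logit link assumed)."""
    return 1.0 / (1.0 + math.exp(-(mu + s + z)))


# Correlation decays to exp(-1) (about 0.37) at k = phi and to
# exp(-3) (about 0.05) at k = 3 * phi.
corr_at_phi = exp_correlation(30.0, 30.0)
corr_at_3phi = exp_correlation(90.0, 30.0)
```

Because the correlation falls to roughly 0.05 at a distance of 3φ, that distance is often read as the practical range of spatial dependence.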
Schistosomiasis prevalence simulations
As we wanted to evaluate survey strategies for all possible schistosomiasis distributions, we chose to simulate prevalence within a hypothetical representative country rather than using data for single countries. We assigned this simulated country an area equal to the mean area of all countries in SSA (km²). We used the median ratio of the number of district-level IUs to country area in SSA to determine the number of districts. Estimating five subdistricts per IU on average, we randomly assigned districts and subdistricts, with every district comprising five subdistricts. The final country had district and subdistrict sizes comparable to the average geographical sizes observed across SSA countries. As our simulated country had a similar area to Uganda, we distributed schools based on the density of schools within Uganda, assuming 500 SAC per school and at least 5 schools per district, leading to a total estimate of 15,000 schools countrywide (Supplementary information).
To capture heterogeneity in the spatial variability and spatial extent of schistosomiasis across SSA, we combined parameters from all geostatistical models to generate gold-standard prevalence distributions. For each species, we defined schistosomiasis distributions using all possible combinations of the median, 25th and 75th percentiles of the country-level parameters estimated from the geostatistical models, across deciles of mean prevalence. This allowed simulation of prevalence surfaces capturing all possible scenarios across the full range of prevalence levels, spatial variances and scales. For each combination of model parameters, we conducted 100 unconditional simulations of the number of SAC positive in all schools within the hypothetical country.
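The scenario grid and a single unconditional simulation can be sketched as follows. All numeric values are placeholders rather than fitted estimates, the decile levels of 0.1 to 0.9 are an assumption, and the dense Cholesky factorisation is only practical for a handful of schools; a simulation over all 15,000 schools would need a more scalable method. A logit link is assumed, and school coordinates must be distinct for the factorisation to succeed.

```python
import itertools
import math
import random

# Hypothetical (25th percentile, median, 75th percentile) summaries.
SIGMA2 = (0.5, 1.0, 2.0)    # spatial variance (placeholder values)
PHI = (10.0, 30.0, 60.0)    # scale parameter in km (placeholder values)
TAU2 = (0.1, 0.3, 0.6)      # nugget variance (placeholder values)
MEAN_PREV = [d / 10 for d in range(1, 10)]  # assumed decile levels


def scenarios():
    """Yield every combination of parameter summaries and prevalence level."""
    for sigma2, phi, tau2, prev in itertools.product(SIGMA2, PHI, TAU2, MEAN_PREV):
        yield {"sigma2": sigma2, "phi": phi, "tau2": tau2, "mean_prev": prev}


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_positives(coords, mu, sigma2, phi, tau2, n_children, rng):
    """One unconditional draw of the number of positive children per school."""
    n = len(coords)
    cov = [[sigma2 * math.exp(-math.dist(coords[i], coords[j]) / phi)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    counts = []
    for i in range(n):
        s_i = sum(low[i][k] * white[k] for k in range(i + 1))  # spatial effect
        z_i = rng.gauss(0.0, math.sqrt(tau2))                  # nugget effect
        p_i = 1.0 / (1.0 + math.exp(-(mu + s_i + z_i)))        # logit link assumed
        counts.append(sum(rng.random() < p_i for _ in range(n_children)))
    return counts
```

The sketch leaves the choice of μ for a target mean prevalence open, because the mean of the simulated prevalence surface also depends on σ² and τ².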
Evaluation of alternate survey designs
Using these simulations, we assessed different survey designs. These included: 1) sampling 5 randomly selected schools per district, with the IU defined as a district (existing sampling strategy); 2) sampling 5 randomly selected schools per subdistrict, with the IU defined as a subdistrict; and 3) sampling 1 randomly selected school per subdistrict, with the IU defined as a subdistrict. For all sampling strategies, we sampled 50 randomly selected SAC per selected school. As per current guidelines, we used the mean prevalence per IU to determine whether the IU was above or below a treatment threshold. To evaluate survey designs, we compared the survey classification of the IU to the gold-standard classification calculated from the mean prevalence of all schools within the IU. Survey designs were assessed based on the overall accuracy of treatment classifications (at the level of the IU) as well as the proportions of schools over- or undertreated (given school-level prevalence) under the resulting IU treatment classification. We additionally compared survey accuracy against the parameters defining the spatial distribution of schistosomiasis to determine how survey performance varied across transmission settings.
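The evaluation of one survey design against the gold standard can be sketched as below. This is an illustration under stated assumptions: a single hypothetical treatment threshold is used, an IU at or above the threshold is treated, and children are drawn by independent Bernoulli trials rather than by sampling 50 of the 500 SAC in a school without replacement.

```python
import random
from statistics import mean


def evaluate_design(iu_schools, schools_per_iu, children_per_school, threshold, rng):
    """Compare survey-based IU classifications with the gold standard.

    iu_schools maps an IU identifier to the list of true school-level
    prevalences (proportions) of all schools in that IU.
    """
    correct_ius = over = under = total_schools = 0
    for true_prevs in iu_schools.values():
        # Randomly select schools, then survey children within each school.
        sampled = rng.sample(true_prevs, min(schools_per_iu, len(true_prevs)))
        observed = [
            sum(rng.random() < p for _ in range(children_per_school)) / children_per_school
            for p in sampled
        ]
        survey_treat = mean(observed) >= threshold   # survey classification
        gold_treat = mean(true_prevs) >= threshold   # gold-standard classification
        correct_ius += survey_treat == gold_treat
        # School-level consequences of applying the IU decision to every school.
        for p in true_prevs:
            needs_treatment = p >= threshold
            over += survey_treat and not needs_treatment
            under += needs_treatment and not survey_treat
        total_schools += len(true_prevs)
    return {
        "accuracy": correct_ius / len(iu_schools),
        "overtreated": over / total_schools,
        "undertreated": under / total_schools,
    }
```

Design 1 would correspond to calling this with districts as IUs and `schools_per_iu=5`, and designs 2 and 3 to calling it with subdistricts as IUs and `schools_per_iu` set to 5 and 1 respectively.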
Cost analysis
As more intensive sampling may give more accurate results but be prohibitively expensive, we additionally assessed the cost of all survey designs. Survey design costs were estimated using the ingredients method, with capital resources data obtained from school-based mapping surveys [16]. We considered only financial costs and excluded expenditures related to general programme operating costs or costs borne by the beneficiaries. Related expenditures were extracted for five annual programmatic surveys, including surveys conducted in 2016-2017 in Malawi and Uganda and surveys conducted in 2017-2018 in Tanzania, Malawi and Uganda (SCI, financial expenditure records). We calculated the mean cost per school surveyed separately for each of the five available surveys and used the median of these values to evaluate cost effectiveness. Consumable item costs were calculated based on the quantity used for each diagnostic method and the number of children surveyed, assuming 10% wastage. We defined capital items as items with a typical life expectancy of over one year; the costs of these items were annuitized based on the useful life expectancy in years.
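The consumable and capital cost calculations can be sketched as follows. The text does not state a discount rate or how wastage was applied, so this sketch assumes that wastage inflates the purchased quantity by 10% and that, with a discount rate of zero, annuitization reduces to dividing the purchase cost by the useful life. All function and parameter names are illustrative.

```python
from statistics import median


def consumable_cost(unit_cost, quantity_per_child, n_children, wastage=0.10):
    """Cost of one consumable item, inflating the quantity for wastage (assumed)."""
    return unit_cost * quantity_per_child * n_children * (1 + wastage)


def annualised_capital_cost(purchase_cost, life_years, discount_rate=0.0):
    """Annual equivalent cost of a capital item over its useful life.

    With discount_rate == 0 this is straight-line (cost / life); otherwise
    the standard annuity factor (1 - (1 + r) ** -n) / r is used.
    """
    if discount_rate == 0:
        return purchase_cost / life_years
    annuity_factor = (1 - (1 + discount_rate) ** -life_years) / discount_rate
    return purchase_cost / annuity_factor


def summary_cost_per_school(mean_costs_by_survey):
    """Median of the per-survey mean costs per school."""
    return median(mean_costs_by_survey)
```

Taking the median of the per-survey means limits the influence of any single unusually cheap or expensive survey among the five.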
We assumed that the survey team would visit an average of one school per day, comprising a half day to register children and collect samples and a half day of sample processing. Based on reported survey activities, we assumed that a team of one driver, one team leader, one district officer, one central officer and three technicians would be required to sample 50 children. Per diems for survey teams were estimated based on reported country-specific expenditures. We calculated mean fuel costs per school based on reported district-level fuel costs and numbers of schools covered, assuming vehicle maintenance was conducted once during the survey period. No capital costs for vehicle purchase were included. All costs were converted into US dollars (USD) using the Consumer Price Index and current exchange rates. Cost effectiveness was evaluated as the cost per school assigned to the correct treatment category, using the median survey cost per school. This definition of cost effectiveness prioritises classification accuracy, weighting schools requiring and not requiring treatment equally. Alternatively, control programmes may prioritise ensuring that all schools above the prevalence threshold receive treatment. To address this priority, we additionally evaluated the cost per school requiring treatment that was adequately treated.
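Both cost-effectiveness measures reduce to the total survey cost divided by a count of schools, as in the sketch below. The function name, argument names and the numbers in the usage example are hypothetical and are not results from the analysis.

```python
import math


def cost_effectiveness(n_surveyed, cost_per_school, n_correct, n_needing_treated):
    """Return both cost-effectiveness measures for one survey design.

    n_surveyed: number of schools visited by the survey
    cost_per_school: median survey cost per school (USD)
    n_correct: schools assigned to the correct treatment category
    n_needing_treated: schools requiring treatment that were treated
    """
    total_cost = n_surveyed * cost_per_school

    def per_school(count):
        # An undefined ratio (no schools in the denominator) is reported as infinity.
        return total_cost / count if count > 0 else math.inf

    return {
        "cost_per_correct_school": per_school(n_correct),
        "cost_per_treated_school_in_need": per_school(n_needing_treated),
    }


# Hypothetical example: 500 schools surveyed at USD 300 each.
example = cost_effectiveness(500, 300.0, 12000, 4000)
```

The first measure rewards any correct classification, while the second only counts schools that needed treatment and received it, so the two can rank survey designs differently.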