The study was conducted in the gHAT endemic area of north-west Uganda in the districts of Adjumani, Amuru, Arua, Koboko, Maracha, Moyo and Yumbe (Figure 1).
Geographical data processing
In Uganda, Tiny Targets are deployed along the larger permanent rivers and streams. However, to define an intervention area it is necessary to delimit the area of land around each controlled river, and we therefore identified the watersheds within which control had taken place. To identify watersheds we used NASA Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) imagery, produced at a spatial resolution of 1 arc second (~30 m × 30 m), to extract a HydroSHEDS river network in ESRI ArcGIS (v10.5). The HydroSHEDS method identifies rivers and streams and orders them according to their tributaries (Figure 1): an order 1 stream has no tributaries, the confluence of two order 1 streams produces an order 2 stream, and so on.
A watershed is defined as the area of land that drains into a single waterbody. To define watersheds, we extracted all rivers of order 4 or greater (i.e. rivers with two or more third-order tributaries), and from these identified pour points at the intersections of rivers. A pour point defines the outlet of a watershed. The pour points were used to delineate watersheds following the methodology described by ESRI ArcGIS (support article 000012346). The resulting watersheds for the area are shown in Figure 2 (mean area 9.8 km2, range 1.02-57.4 km2). The number of watersheds defined exceeds the number of rivers controlled using Tiny Targets, because the defined watersheds include some non-permanent streams and drainage features that carry no flowing water. Consequently, the density of watersheds is typically greater than the density of rivers defined by Tiny Target deployments.
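The stream-ordering rule can be illustrated with a short sketch (Python; the function name and toy network are ours for illustration and are not part of the GIS workflow):

```python
# Sketch of Strahler stream ordering on a toy river network. The network is
# a dict mapping each segment to the list of its upstream tributaries.
def strahler(segment, tributaries):
    """Return the Strahler order of `segment`."""
    ups = tributaries.get(segment, [])
    if not ups:                      # headwater: no tributaries -> order 1
        return 1
    orders = sorted((strahler(u, tributaries) for u in ups), reverse=True)
    # the order increases only when two tributaries share the highest order
    if len(orders) >= 2 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]

# Toy network: the outlet is fed by two order 2 branches, so it is order 3
net = {
    "outlet": ["b1", "b2"],
    "b1": ["h1", "h2"],   # two headwaters meet -> order 2
    "b2": ["h3", "h4"],
}
print(strahler("outlet", net))  # 3
```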
Tiny Target deployment
We used Tiny Target deployment data for Uganda between December 2011 and 2019; deployments were conducted at approximately six-month intervals, giving a total of 15 deployments. For each deployment, the locations of targets were recorded using the global positioning system (GPS). For the first two deployments (2011-2012), Tiny Targets were deployed in five separate 7 × 7 km blocks; thereafter (2013-2019), deployments were along continuous stretches of river. The coverage of each deployment was overlain onto the watershed polygons to identify polygons where targets were present. For the first two deployments (2011-2012), a watershed polygon was regarded as treated if at least 500 m of river within it had targets deployed. Thereafter, a polygon was considered controlled if targets were deployed on at least two occasions during 2011 to 2019, with control deemed to have started at the earliest deployment; for nearly all (98%) deployed areas, once an area was deployed it was deployed in all subsequent years. If the first deployment was between January and June, control was deemed to have started that year; if it first occurred in the second half of the year, control was deemed to have started at the beginning of the next year. The rationale is that the reduction in tsetse numbers following deployment of Tiny Targets is gradual, taking several months to reach reductions of up to 90%. Thus for each year between 2012 (the 2011 deployment was in the second half of the year, so is defined as 2012) and 2019 we can identify whether a watershed had a deployment in that year (Figure 3). Because our definition of watersheds includes polygons smaller than the deployed rivers, we also identified watersheds that did not contain a deployment but were surrounded entirely by polygons containing deployments.
Such polygons were marked as controlled for the purposes of analysis, as it could be assumed either that they contained no permanent drainage feature or that they would be controlled by local dispersal of tsetse.
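The rule assigning the year in which control is deemed to start can be written as a minimal sketch (the function name and date encoding are ours):

```python
# Sketch (not the study code) of the control start-year rule: first
# deployments in January-June count as that year; July-December count from
# the next year, because tsetse reductions take several months to accrue.
def control_start_year(year, month):
    return year if month <= 6 else year + 1

print(control_start_year(2011, 12))  # 2012: the December 2011 deployment counts from 2012
```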
To allow for the mobility of tsetse and humans, we classified each watershed polygon in each year on a 0 to 3 scale according to the presence or absence of targets in the polygon and its neighbouring polygons (Table 1, Figure S1).
Table 1. Deployment zone score classification based on the deployment status of the watershed polygon and its neighbours.

Score  Watershed polygon  Neighbouring polygons
0      Not deployed       No neighbours are deployed
1      Not deployed       One or more neighbours is deployed
2      Deployed           One or more neighbours is not deployed
3      Deployed           All neighbours are deployed
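The Table 1 classification can be expressed as a small function (a sketch; the function name and the boolean encoding of deployment status are ours):

```python
# Sketch of the deployment zone score: 0 = polygon and all neighbours
# undeployed, 3 = polygon and all neighbours deployed.
def deployment_zone_score(deployed, neighbour_status):
    """deployed: bool for this polygon; neighbour_status: list of bools."""
    if not deployed:
        # score 1 if any neighbour has targets, else 0
        return 1 if any(neighbour_status) else 0
    # score 3 if every neighbour also has targets, else 2
    return 3 if all(neighbour_status) else 2
```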
HAT case data
HAT case data were obtained from the WHO HAT Atlas for the period 2000-2011 and from the Ugandan Ministry of Health, through the Trypa-No Programme, for the years 2012-2020. The WHO HAT Atlas provided point geolocations for 4,165 cases between 2000 and 2011, of which 30 were excluded because they could not be georeferenced. The Ugandan Ministry of Health supplied data on a further 56 cases from 2012-2020. Cases from 2012-2020 were geolocated by mapping the villages using, and cross-checking against, publicly available datasets, principally Open Street Maps, Google Maps and Geonames. Where these could not provide a location for a village, the case was mapped to the school of the patient's parish, the school being an identifiable place reflective of the centre of population. Four of the cases reported since 2016 were identified as refugees from South Sudan; it is estimated that >650,000 refugees from South Sudan have settled in northern Uganda since 2016 (https://data2.unhcr.org/en/situations/southsudan/location/1925). As South Sudan is itself a gHAT focus, the provenance of infection for these cases could not be determined with certainty, so they were excluded from further analysis. A further case, a Ugandan who was resident in South Sudan, was also excluded. This gives a total of 4,186 georeferenced cases between 2000 and 2020.
If Tiny Targets reduce the incidence of gHAT, then the incidence of cases will be lower in areas where Tiny Targets were continuously deployed. Accordingly, we proposed the following hypotheses:
Null hypothesis (H0): The number of cases in a polygon was not significantly affected by the deployment of targets in the polygon and/or its neighbouring polygons.
Alternative hypothesis (H1): Following the deployment of Tiny Targets in a polygon and/or its neighbours, the relative number of cases within that area declined.
For each year with vector control we define two populations of cases by overlaying the case locations onto the deployment polygons:
- Cases are gHAT cases in the NW Uganda focus that were reported following the start of the Tiny Target intervention (taken as 2012, as the intervention commenced in December 2011), broken down by the year in which we assume they were infected. The date of infection is inferred from the reporting date: for stage 1 cases, we conservatively assume that the infection date is the same as the reporting date; for stage 2 cases, we assume that the case was infected 263 days before the reporting date, this being the lower limit of the duration of stage 1 in passive screening from Checchi et al.
- Case-controls are all gHAT cases that were infected between 2000 and 2020.
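The infection-date rule for the cases can be sketched as follows (Python; the function name is ours, and 263 days is the stage 1 lower bound cited above):

```python
from datetime import date, timedelta

def inferred_infection_date(report_date, stage):
    # stage 1: conservatively assume infection on the reporting date;
    # stage 2: assume infection 263 days before reporting (Checchi et al.)
    return report_date if stage == 1 else report_date - timedelta(days=263)

print(inferred_infection_date(date(2015, 9, 20), 2))  # 2014-12-31
```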
Using these categories, we conduct two analyses: the first assesses the impact of the deployments, and the second estimates the number of cases prevented.
Deployment impact analysis
For each year from 2012-2020, the cases and matched case-controls are overlain onto the vector control deployment zones for that year, and we determine the deployment zone score (Table 1) for the location of each case and each case-control. Thus, starting with 2012, we take the cases with a putative infection date in that year and extract their deployment scores. As the comparison controls, we extract the deployment scores for all cases between 2000 and 2020 and randomly sample a number of controls equal to the number of cases for that year.
The above procedure was implemented in R version 3.6.0 using the following algorithm:
1. Starting with 2012 as the analysis year:
   a. Take the cases that were estimated to have been infected in that year.
   b. From the case-controls, randomly sample a number equal to the number of cases for that year. So if there are 4 cases in a year, we randomly sample 4 from the 4,186 case-controls for 2000-2020 and extract their deployment scores based on where they fall relative to the deployments in 2012 (and then for subsequent years).
   c. Sum the deployment zone scores for both the cases and the sampled case-controls.
2. Repeat a-c for each year (i.e. 2013, 2014, ..., 2019).
3. Across the years, sum the scores for the cases and for the sampled case-controls. If the total deployment score for the case-controls is lower than for the cases, this suggests that the cases are more likely to be in the deployment area than the overall distribution of cases would predict; if the scores of the case-controls are greater than for the cases, then the cases are less likely to be in the deployment area than would be expected.
4. Repeat steps 1-3 for 1,000,000 replicates.
5. The data are analysed as the proportion of replicates for which the cases had a higher score than the sampled case-controls. A proportion below 5% represents statistical significance at the 95% level.
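The algorithm above can be sketched in code (the study used R v3.6.0; this Python version, with illustrative data, is ours):

```python
import random

# Sketch of the deployment-impact permutation test. cases_by_year maps each
# analysis year to the deployment scores of that year's cases;
# control_scores_by_year maps each year to the scores of all 2000-2020
# cases scored against that year's deployments.
def deployment_impact_p(cases_by_year, control_scores_by_year,
                        reps=10000, seed=1):
    rng = random.Random(seed)
    higher = 0
    for _ in range(reps):
        case_total = control_total = 0
        for year, case_scores in cases_by_year.items():
            case_total += sum(case_scores)
            # sample as many controls as there are cases in this year
            control_total += sum(rng.choices(control_scores_by_year[year],
                                             k=len(case_scores)))
        if case_total > control_total:
            higher += 1
    # proportion of replicates in which the cases scored higher; below 0.05
    # this is significant at the 95% level
    return higher / reps

# Illustrative data (ours): cases all outside the deployment zones (score 0)
# while the case-control pool sits fully inside (score 3)
print(deployment_impact_p({2012: [0, 0]}, {2012: [3, 3, 3]}))  # 0.0
```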
Cases prevented analysis
We also estimate the number of cases averted by the intervention. This is based on the premise that the controlled and uncontrolled areas each have their own case incidence rate. For example, suppose that prior to control the incidence rate was 4/10,000 inside the controlled area and 2/10,000 outside, so that the controlled area had twice the incidence rate of the uncontrolled area. We can continue to measure the incidence rate of the uncontrolled area after the start of the intervention; if the intervention has reduced incidence, there should be a drop in the incidence rate in the controlled area relative to the uncontrolled area. So if, following the start of the intervention, the incidence rate outside the controlled area fell from 2/10,000 to 1/10,000 and the incidence rate inside fell from 4/10,000 to 2/10,000, this would indicate that the control had no impact on case numbers; a fall in the controlled area to 1/10,000, however, would indicate a 50% drop in the incidence rate as a result of the intervention. We can therefore use this relative difference to estimate the number of cases prevented in the controlled area by the vector control measures.
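The worked example can be written out explicitly (a sketch; the function name is ours, and the per-10,000 figures are those of the example above):

```python
# The counterfactual incidence inside the controlled area is the observed
# outside incidence scaled by the pre-control inside:outside ratio.
def expected_inside_incidence(pre_inside, pre_outside, post_outside):
    return post_outside * (pre_inside / pre_outside)

expected = expected_inside_incidence(pre_inside=4, pre_outside=2, post_outside=1)
print(expected)                 # 2.0 per 10,000: the no-effect expectation
observed = 1.0
print(1 - observed / expected)  # 0.5: a 50% drop attributable to the intervention
```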
To carry out this analysis we must invert the deployment zone scores so that a score of 3 indicates that the case is completely outside the deployment area and 0 indicates that the case is completely inside the deployment area. This methodology is then implemented using the following algorithm:
1. Starting in 2012, we generate a random sample of 1,000,000 putative controls from the population of 4,186 case-controls by sampling with replacement, and extract the 2012 deployment score for the location of each sampled control. Subsequently:
2. We take the cases that were infected during that year and sum their inverted deployment scores; the resulting sum is our benchmark score.
3. For the 1,000,000 samples we take the cumulative sum of the scores of the sampled case-controls.
4. We perform 10,000 iterations, in which we:
   i. Count how many controls are required until the cumulative score of the sampled controls exceeds the benchmark identified above; this number of controls is our sample index. So if our benchmark score is 5 and the cumulative sum of our controls is 0, 2, 3, 6, 7, 9, 9, 12, then we take the fourth score of 6 as our sampled score and the sample index is four.
   ii. Evaluate the cumulative values above and below the sample index to determine which is closer to the benchmark. From the example above, we compare the third value of 3 and the fourth value of 6; as the benchmark of 5 is nearer to the fourth value than to the third, we take the fourth value as our number of controls. Had the benchmark been 5 and the sequence 0, 2, 4, 6, we would select at random between 4 and 6.
   iii. Record how many sampled case-controls were required to match the total of the cases; in the example above, this is four.
   iv. From the 1,000,000 samples, remove all values up to and including the sample index and recompute the cumulative sum. So in the example above, with the sequence 0, 2, 3, 6, 7, 9, 9, 12, we remove the first four values and subtract 6 from the remainder, so our sample now starts 1, 3, 3, 6.
   Repeat i-iv until we have completed 10,000 iterations.
5. Repeat steps 1-4 for each subsequent year (2013, 2014, ..., 2019).
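The benchmark-matching steps can be sketched as follows (Python; the study used R, and the per-control scores below are chosen to reproduce the cumulative sequence in the worked example):

```python
import random

# Sketch of the matching step: walk the cumulative sum of sampled control
# scores until it exceeds the benchmark, then keep whichever bracketing
# cumulative value is closer to the benchmark, breaking ties at random.
# Returns the number of controls consumed.
def controls_to_match(benchmark, control_scores, rng=None):
    rng = rng or random.Random(0)
    cum = 0
    for i, score in enumerate(control_scores, start=1):
        prev, cum = cum, cum + score
        if cum > benchmark:
            if benchmark - prev < cum - benchmark:
                return i - 1               # the value below is closer
            if benchmark - prev > cum - benchmark:
                return i                   # the value above is closer
            return rng.choice([i - 1, i])  # tie: choose at random
    return len(control_scores)

# Worked example from the text: per-control scores cumulating to
# 0, 2, 3, 6, 7, 9, 9, 12 against a benchmark of 5 -> four controls
print(controls_to_match(5, [0, 2, 1, 3, 1, 2, 0, 3]))  # 4
```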
We analyse this by comparing the number of cases in each year to the number of controls sampled in each iteration for that year. If, in a given year, more than 95% of iterations required a number of controls greater than the number of cases for that year, this indicates a significant difference between the number of cases and the random sample. The difference between the median of the samples and the number of cases gives the median number of excess cases.
We tested the results using a number of sensitivity analyses:
- The earlier years of Tiny Target deployments had a smaller extent of deployment and a greater number of cases, so we test the robustness of the results by excluding these years.
- We adjust the deployment scores using three methodologies. The first increases the scores of 2 and 3 by 1 point, so the scale is 0,1,3,4. The second is multiplicative, by two, so the revised scale is 0,2,4,6. The third is a square transformation, so the scale is 0,1,4,9.
- We conducted two separate leave-one-out analyses on the deployment impact analysis. In the first, we repeated the analysis, dropping in turn each case infected between 2011 and 2019 and recomputing the resulting probability. The second analysis takes the ranks of these probabilities and drops first the highest-ranked case, then also the 2nd, 3rd, ..., nth ranked cases, repeating the analysis after each drop to re-evaluate the remaining probability. A further analysis evaluates the resulting probability after adding further cases in the controlled zone (deployment score 3) in 2019.