Design
We used a sequential explanatory mixed-methods approach with iterative quantitative and qualitative methods from March 2018 to February 2020 to optimise maps that linked patient demographic, clinical and drug resistance profiles in order to identify areas where additional surveillance or containment efforts are needed. The quantitative component was grounded in the post-positivist theory, where a descriptive and exploratory spatiotemporal analysis was conducted, using trend and time-series decomposition analyses to define spatial and temporal data patterns for data linkage and mapping.39–41 The qualitative component used a pragmatist approach with co-design techniques to innovate and implement tools to bridge gaps identified from the ongoing spatiotemporal activities and analysis. The MEP and project team worked iteratively to improve data architecture, system and maps produced.
The quantitative component included data aggregations, curation, and analysis to generate data visualisations of the obtained data. These visualisations and summaries of the analysis were shared monthly with the MEP. The study team then worked with the MEP to design activities, tools and training to enhance data quality and improve surveillance metrics, including coverage, accuracy and linkage (Fig. 1). Data were grouped monthly and quarterly to estimate the trends. Monthly evaluations focused on measuring changes in data from the health information system over time. Quarterly evaluations were used to compare the flow of data and improvement of GPS coordinate data over time.
Spatiotemporal information in the malaria routine case data were used to produce draft maps of malaria incidence and molecular marker prevalence. These maps were then presented to the provincial malaria team to evaluate their ‘understandability’, using semi-structured interviews and feedback meetings. These fed back iteratively into the analysis and co-design process until the final maps were agreed upon between researchers and policymakers. This optimisation process involved repeatedly deploying and updating the tools and maps produced to enhance routine data and optimise the maps generated.
Study setting
While most of South Africa is considered malaria-free, approximately 5 million South Africans (10% of the country's population) reside in the malaria-endemic areas of Mpumalanga, Limpopo, and KwaZulu-Natal provinces.33,42 Most malaria cases in South Africa are imported, with some local transmission occurring in the low-altitude international border regions shared with Botswana, Eswatini, Mozambique and Zimbabwe. Malaria transmission in South Africa is seasonal occurring mainly during the summer rainy season (September to April).
This study was conducted in Nkomazi Sub-District, a pre-elimination area in Mpumalanga province, South Africa. All patients meeting the diagnostic criteria were tested for malaria using a falciparum-specific HRP2-based mRDT.38,43 Those who tested positive with uncomplicated malaria are treated with the WHO recommended weight-based 3-day artemether-lumefantrine (Coartem®) regimen as per the South African malaria treatment guidelines.37 Artemether-lumefantrine has been used in the study area since 2007.27,37,38 Additionally, all consenting malaria-positive patients, excluding pregnant women, breastfeeding mothers and children under 10kg or one year of age, are given a single low-dose of the WHO-recommended transmission-blocking antimalarial primaquine (0.25 mg base/kg (15 mg base adult maximal dose)44–46. An additional dried blood spot (DBS) on filter paper (Whatman Paper No 1) was collected from RDT malaria positive patients by dabbing the remaining blood at the mRDT finger prick site, labelled and barcoded, then sent to the malaria laboratory at the National Institute for Communicable Diseases (NICD) in Johannesburg, South Africa, together with its respective positive mRDT cassette, for antimalarial resistance marking. 47 An additional 10% of the negative mRDTs were also collected and sent to the NICD for quality assurance assessment.
As malaria is a notifiable condition in South Africa, demographic and malaria case information collected at the malaria diagnosis and treatment initiation phase are reported on the Notifiable Medical Condition (NMC) form or mobile application. If reporting is paper-based, forms are collected by a case investigator from the Malaria Elimination Programme (MEP) assigned to that healthcare facility, ideally within 24 hours of diagnosis and delivered to the sub-district malaria office for data quality verification and data capture. Within 24-72 hours of case notification, case investigators should visit the malaria patient's household for in-depth case investigation to assess for the presence of malaria risk factors (e.g. last indoor residual spraying of that household or nearby mosquito vector breeding sites) and to conduct contact tracing and testing. During these case investigations, the household's GPS location coordinates are recorded, the case investigation form completed, and information on malaria treatment adherence and response (including any side effects) captured. Once completed these forms are submitted to the sub-district malaria office for quality checking and electronic capture into the DHIS2.
Data
Malaria case data
Malaria case data consisted of notifiable medical condition form (NMC), case investigation and treatment adherence and response forms downloaded as individual case records from the DHIS2.
Geospatial data
The geospatial data consisted of four types. Firstly, each patient’s residential address and GPS coordinates were sourced from a) android tablets/handheld GPS devices and b) as recorded in DHIS2. Secondly, population settlement shapefile data were obtained from a) the Ehlanzeni District Municipality, and b) the open-source Global Administrative Areas (GADM) website for South Africa. Thirdly, a list of locality addresses of malaria cases were obtained from the Mpumalanga MEP office, curated and validated using Google Maps. Lastly, the modelled Facebook population of South Africa density raster was downloaded from The Humanitarian Data Exchange (HDX v1.52.1).48
Laboratory data
Parasite DNA was extracted from the mRDTs and DBSs using the Qiagen DNA mini extraction kit (Qiagen, Germany), according to the manufacturer’s instructions. The extracted DNA was subjected to multiplex PCR to confirm Plasmodium species.49 To assess possible decreases in lumefantrine susceptibility, samples containing only P. falciparum parasites were then subjected to molecular analysis to detect polymorphisms in the chloroquine resistance transporter (crt) and multi-drug resistance 1 (mdr1) genes. 49–51 Codons were classified as pure sensitive, pure mutant or mixed (both mutant and sensitive genotypes present in an individual patient’s sample). Genotyping assays were run in duplicate, with a third assay performed on any discordant results. When calculating overall prevalence of infections with mutant genotypes, codons with mixed genotypes were grouped with pure mutant codons. The copy number of the mdr1 gene was assessed using quantitative PCR (qPCR), with primers, probes and qPCR cycling conditions previously described.51 Every qPCR run contained three reference DNA samples from D10 and Fac8 clones, having an mdr1 copy number of one and three respectively, as well as a no-template control. Assays were repeated if the threshold cycle values were greater than 35. For the assessment of artemisinin resistance, the propeller domain of the K13 gene was amplified as previously described.52 The amplified products were sent to Inqaba Biotechnologies (South Africa) for Sanger sequencing. Sequences were then aligned against a reference Pf K13 gene (XM_001350122.1) using a BLAST search and BioEdit Software to detect polymorphisms in 27 codons associated with delayed parasite clearance in South East Asia.20,53 Molecular data were compiled monthly and shared with study investigators for further curation and analysis. Results were presented to the Mpumalanga MEP monthly and quarterly for them to take timely action with investigation and/or response in the event of any significant resistance-associated mutations.
Definition of metrics
Coverage
Four measures of coverage were used: 1) percentages of malaria cases with blood samples taken (mRDT/DBS), 2) percentages of cases assigned a correct barcode (necessary for linkage of laboratory results to NMC data), 3) percentage of cases investigated and 4) percentages of investigated cases with GPS coordinates relative to all reported malaria notifications captured in the DHIS2.
Accuracy
Two measures of accuracy used were: 1) percentage of investigated cases with GPS coordinates falling within the study’s residential areas, and 2) percentages of notified cases with correctly formatted barcodes, calculated monthly and quarterly.
Linkage
This was measured using the percentage of cases with accurate barcodes linked to the NMC data, the molecular markers results data and accurate GPS coordinates at the health facility, ward, locality, and household levels.
Study procedures
Using monthly timelines, we evaluated the accuracy, coverage and ability to link the malaria notifications to the case investigation, laboratory data and drug report data. Data from DHIS2 was downloaded monthly from the malaria programme and shared with the project team for curation and analysis. A checklist was used to record the settings of the different GPS devices, and their data was downloaded for further analysis. All data were securely downloaded, encrypted and transferred to the password-protected study computer for further compilation and analysis.
Analysis
All data analyses were conducted using R programming language (versions 3.6 and 4.0) and Esri ArcGIS ArcMap (version 10.8). The analysis focused on data linkage, spatiotemporal trends in molecular marker and usability assessments. Coverage, accuracy and linkage metrics were used as units of analysis for temporal trends in the numbers of malaria cases reported, malaria cases investigated, the laboratory received samples, and post-treatment case investigation reports.
Trend analysis
Monthly percentages were calculated using the monthly reported malaria case totals as the denominator. Time-series decomposition was used to evaluate for the non-stationarity of data and account for trend (t), seasonality (s) and random noise (r) 54. Loess regression was used to obtain the optimum distribution and the 95% confidence margins of the trend (Fig. S2(a), S3(a) & S4(a)). This time series was further decomposed to evaluate trend, seasonality and random errors using Sen's slope and Mann-Kendall test.39,41 Seasonality was further explored using box plots.
Molecular markers analysis
The classification of the 27 K13 mutations after codon 400 assessed in this study was guided by the WHO and the Worldwide Antimalarial Resistance Network (WWARN).20,55 We used the 2020 WHO categories of 'validated', 'associated/candidate', 'not associated' or 'wild type'.55 We renamed the 'wild type' parasite to 'sensitive' for further clarity. Mutations at codon 86 of the mdr1 gene and codon 76 of the crt gene together with increases in mdr1 gene copy number were assessed to determine susceptibility to lumefantrine. Parasites with the mdr86ASN and crt76LYS alleles but no increase in mdr1 copy number were categorised as less susceptible (or tolerant) to lumefantrine.
Spatial and usability analysis
All shapefile data and residential coordinates from malaria cases were converted to the HartebeestHoek94 Datum coordinate system and projected to the Universal Transverse Mercator zone 32. Two draft maps were then drawn to display 1) thematic maps for the distribution of malaria cases by ward and 2) density maps of cases distributed by settlement with their ward boundaries. Twenty-four case investigators with experience ranging between one and 24 years working in Nkomazi sub-district, reviewed both maps to identify and label the Nkomazi wards (administrative level four). Feedback obtained from malaria case investigators was used to develop malaria case distribution maps and evaluation of the shapefiles.
Thematic maps of the distribution of malaria cases by ward were produced by using a spatial join tool of GPS coordinates to the sub-district polygon. All cases falling in the same ward were summed, and an equal-interval scale and continuous colour ramp were used for displaying the distribution of cases by ward. Density maps of malaria cases per 1000 population were produced using kernel density estimation at 1 x 1 km and 0.5 x 0.5 km grids with a buffer around the sub-district polygon of 1 km and 0.5 km, respectively. The two grids were purposely selected for comparison. The Kernel density estimation used Quartic implementation as per the formula below:

where:
- i = 1,…,n are the input points. The sum of points was used if they were within the radius distance of the (x,y) location.
- popi is the population value of point i.
- disti is the distance between point i and the (x,y) location. 40
Feedback was obtained from the case investigators using semi-structured interviews to assess if the maps were well understood and whether the distribution of malaria cases corresponded with their local knowledge. A case-based orientation was used to optimise the maps to arrive at the most correct and easily understood versions.
Descriptive exploratory proximity analysis was further conducted on residential coordinates to ascertain the probability of these locations falling in the actual residential area (within 0.5 x 0.5 km) at a given time (t). Two types of analyses were used. Firstly, for identifying the progress of the accuracy and distribution of the malaria case residential coordinates over time, a time-series line using Loess regression was plotted, as were maps to assess the distribution of the coordinates. Secondly, the quadrats of the observed malaria cases at 0.5 km radius compared to expected cases calculated by a Poisson process using the known malaria incidence and population settlement data to obtain likelihood ratios and Chi-square test were analysed. To assess the sparsity of malaria case residential data, we used average nearest neighbour analysis to explore for precision in the GPS coordinate dataset.