Data for this study were obtained from the Demographic and Health Surveys (DHS) Program website (https://dhsprogram.com), funded by the United States Agency for International Development (USAID), following a written administrative clearance(29). The DHS program has been described elsewhere(27). Households are grouped into cluster points that are either urban (a city block or apartment building) or rural (a village or group of villages) (figure 1). To protect confidentiality, cluster coordinates are displaced by up to 2 km for urban clusters and up to 5 km for rural clusters. For this study, point and lattice data for the Cameroon DHS VI malaria survey were obtained from the DHS spatial data repository(30). The 2015 malaria survey data were linked with environmental covariate data (enhanced vegetation index (EVI), rainfall, drought episodes, population density, aridity, and proximity to water) and analyzed using ArcGIS 10.3 (ESRI, Redlands, California, USA).
Malaria and environmental data description
The WHO recommends that all cases of suspected malaria be confirmed using parasite-based diagnostic testing (either microscopy or a rapid diagnostic test) before treatment is administered. Thus, malaria cases were confirmed based on both rapid diagnostic tests and laboratory analysis. A clinical case was defined as a malaria-attributable febrile episode (body temperature in excess of 37.5 °C), accompanied by headaches, nausea, excess sweating and/or fatigue, censored by a 30-day window (6,27). Because households are the unit of analysis, a malaria year, as used in this study, is the average number of people per year who show clinical symptoms of Plasmodium falciparum malaria within the cells whose centroids fall within a radius of 10 km (for rural points) or 2 km (for urban points)(27). The environmental covariate data sets used for this study are described in table 1.
Permission to use the data was obtained through a written request and subsequent approval from the DHS division of the USAID. During the DHS project, interviews and blood test analyses are conducted only if the respondent provides voluntary informed consent. Written informed consent was obtained from all participants.
Getis-Ord Gi* statistics
The Getis-Ord Gi* statistic is a local statistic that identifies locations with statistically significant clusters of high values (hotspots) and low values (coldspots). It assesses each malaria household cluster (or feature) within the context of neighboring malaria household clusters and compares the local situation to the global situation. The 2015 malaria indicator survey data were analyzed for malaria hotspots.
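The Getis-Ord Gi* local statistic is given as:
$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}}$$
where $x_j$ is the attribute value for feature j; $w_{i,j}$ is the spatial weight between features i and j; n is equal to the total number of features; and:
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$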
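As an illustration only, the Gi* statistic can be sketched in a few lines of NumPy. The sketch compares the weighted sum of values around each feature with what the global mean would predict, scaled by the global standard deviation. It assumes binary fixed-distance weights (1 inside a distance threshold, 0 outside, with each feature counted as its own neighbor); the function name, the threshold, and the synthetic 5 x 5 grid are hypothetical and are not taken from the study, which used ArcGIS.

```python
import numpy as np

def getis_ord_gi_star(values, coords, threshold):
    """Return Gi* z-scores using binary fixed-distance weights (assumed scheme)."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = x.size
    # Pairwise Euclidean distances between all features.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # w[i, j] = 1 when j lies within the threshold of i; the diagonal is 1
    # because Gi* includes feature i in its own neighborhood.
    w = (d <= threshold).astype(float)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    # Note: the denominator is zero if every feature is a neighbor of i.
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Synthetic example: a 5 x 5 grid with high values in one corner.
grid = [(i, j) for i in range(5) for j in range(5)]
cases = [10.0 if (i <= 1 and j <= 1) else 1.0 for i, j in grid]
z_scores = getis_ord_gi_star(cases, grid, threshold=1.5)
```

In this toy grid, the corner holding the high values receives a large positive z-score (a hotspot at the conventional 1.96 cutoff), while the opposite corner receives a negative one.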
Spatial regression analysis
To investigate the spatial relationship between the distribution of malaria hotspots and environmental covariates, we used regression analysis. The global OLS model is given as:
$$y_i = \beta_0 + \sum_{k=1}^{6} \beta_k X_{ik} + \varepsilon_i \qquad (a)$$
where $y_i$ is the dependent or response variable for observation i (the process to be predicted or understood), for example, the malaria hotspot cases, and $X_{ik}$ is the value of the kth independent or explanatory variable for observation i, used to model or predict the dependent variable. The explanatory variables include: population density ($X_1$), enhanced vegetation index ($X_2$), rainfall ($X_3$), aridity ($X_4$), drought episodes ($X_5$), and proximity to water ($X_6$). The spatial model explains whether the distribution or occurrence of malaria hotspots is due to the combination of these explanatory variables. Because it captures the spatial relationship between the dependent variable (malaria hotspots) and the explanatory variables, it allows us to create a prediction map that can be used for public health resource allocation. $\beta_k$ is the regression coefficient for variable k; there is one coefficient for each explanatory variable, and it represents the strength and type of relationship that variable has with the dependent variable. $\beta_0$ is the regression intercept. It represents the expected value of the dependent variable if all the independent variables are zero. The residuals $\varepsilon_i$ represent the portion of the dependent variable that is not explained by the model(14).
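A minimal sketch of a global OLS fit of this kind is shown below, assuming synthetic data with two explanatory variables rather than the six used in the study. The function name `fit_ols` and all demo values are hypothetical; the study itself fitted its models in ArcGIS.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + sum_k b_k * X[:, k] by least squares; return (beta, residuals, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that beta[0] is the intercept.
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    residuals = y - A @ beta
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, residuals, r_squared

# Synthetic, noise-free demo data (not study data): y = 2 + 3*x1 - 1*x2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]
beta_demo, resid_demo, r2_demo = fit_ols(X_demo, y_demo)
```

Here the positive coefficient on the first variable and the negative coefficient on the second correspond to relationships in the same and in opposite directions, respectively.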
The global OLS method uses a single equation (a) to explore the relationship between variables. To account for the spatial component of the OLS variables for a specified model, the GWR local regression model creates an equation (b) for each element of the dependent variable data set, in order to capture geographic variations (23). The GWR model is computed as:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{6} \beta_k(u_i, v_i) X_{ik} + \varepsilon_i \qquad (b)$$
where $(u_i, v_i)$ are the coordinates of point i in space. Equation (b) thus recognizes that relationships may vary over space and provides a way to measure that variation. In a GIS environment, it emphasizes differences across space and the local disaggregation of global statistics, as thoroughly illustrated by Fotheringham(39). Besides capturing spatial heterogeneity, the GWR model makes it easier to move from a global OLS perspective to a local analysis, yielding greater detail and precision within the GIS environment(39).
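The idea of one regression per location can be sketched as a weighted least-squares fit repeated at every point, with weights that decay with distance from that point. The sketch below assumes a Gaussian kernel with a fixed bandwidth and a single explanatory variable; the function name `gwr_fit`, the bandwidth value, and the synthetic data are hypothetical, and the kernel used by ArcGIS may differ in detail.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Return one row of local coefficients (intercept first) per location."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        # Distance from regression point i to every observation.
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Gaussian kernel (assumed): nearby observations get more weight.
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        # Weighted least squares via square-root weights.
        sw = np.sqrt(w)
        betas[i] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return betas

# Synthetic demo (not study data): the slope grows from west to east.
rng = np.random.default_rng(1)
coords_demo = rng.uniform(0.0, 10.0, size=(400, 2))
x_demo = rng.normal(size=400)
y_demo_gwr = 2.0 + (1.0 + 0.5 * coords_demo[:, 0]) * x_demo
local_betas = gwr_fit(coords_demo, x_demo, y_demo_gwr, bandwidth=1.5)
west = np.argmin(coords_demo[:, 0])
east = np.argmax(coords_demo[:, 0])
```

A single global OLS slope would average over this west-to-east gradient, whereas the local coefficients in `local_betas[:, 1]` recover it.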
For this study, our regression models (OLS and GWR) were built using the Modeling Spatial Relationships tools in ArcGIS. Multiple models were built from a set of candidate variables until we selected a model with high explanatory power that contained the six explanatory variables ($X_1$ to $X_6$) relevant for a properly specified model with respect to malaria transmission(23). A positive coefficient for an OLS candidate variable means that the explanatory variable X and the response variable Y change in the same direction: if the environmental risk factor increases, the number of confirmed malaria cases increases. Similarly, a negative coefficient means that X and Y change in opposite directions (15). To have a properly specified GWR model, the OLS summary results for the model variables should pass six statistical checks (data pre-processing background analysis).
The GWR model was constructed using a fixed-distance kernel (the default kernel type) and AICc (the corrected Akaike Information Criterion) as the bandwidth method, to find the optimal distance for better model performance. The GWR output comprises maps, represented as raster surfaces, of the model predictions (the combined strength of the relationship amongst the variables used), residuals, local R squared (local model fit), condition number (a diagnostic of difficulty in identifying local spatial relationships) and coefficients (the strength of the relationship between the dependent and explanatory variables), together with an output table reporting the adjusted R2 (ranging from 0 to 1).
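To illustrate how an AICc-based bandwidth search might work, the sketch below evaluates a fixed Gaussian kernel at several candidate distances and keeps the one with the lowest AICc. It uses a commonly cited form of the GWR AICc, 2n ln(σ) + n ln(2π) + n(n + tr(S))/(n − 2 − tr(S)), where S is the hat matrix and σ² is the residual sum of squares divided by n. This form, the candidate list, the function name `gwr_aicc`, and the synthetic data are all assumptions; ArcGIS may search and score bandwidths differently.

```python
import numpy as np

def gwr_aicc(coords, A, y, bandwidth):
    """AICc of a fixed-bandwidth Gaussian-kernel GWR (assumed formula, see lead-in)."""
    n = len(y)
    fitted = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        Aw = A * w[:, None]                   # W A: rows of A scaled by weights
        C = np.linalg.solve(A.T @ Aw, Aw.T)   # (A'WA)^-1 A'W, shape (k, n)
        fitted[i] = A[i] @ (C @ y)            # local prediction at point i
        trace_s += A[i] @ C[:, i]             # i-th diagonal entry of the hat matrix
    sigma = np.sqrt(np.sum((y - fitted) ** 2) / n)
    return (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

# Synthetic demo (not study data): a slope that varies from west to east, plus noise.
rng = np.random.default_rng(2)
n_demo = 200
coords_bw = rng.uniform(0.0, 10.0, size=(n_demo, 2))
x_bw = rng.normal(size=n_demo)
y_bw = 2.0 + (1.0 + 0.5 * coords_bw[:, 0]) * x_bw + rng.normal(0.0, 0.5, n_demo)
A_bw = np.column_stack([np.ones(n_demo), x_bw])

# Hypothetical candidate bandwidths; 50.0 behaves almost like a global OLS fit.
candidates = [1.0, 1.5, 2.0, 3.0, 5.0, 50.0]
scores = {bw: gwr_aicc(coords_bw, A_bw, y_bw, bw) for bw in candidates}
best_bw = min(scores, key=scores.get)
```

Because the simulated relationship changes across space, the near-global bandwidth is penalized by its poor fit, and a smaller distance is selected. A grid search is used here for clarity; an iterative search over a continuous range would follow the same scoring logic.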