Geographically weighted regression modelling of the spatial association between malaria cases and environmental factors in Cameroon.

Background: Studies have illustrated the association of malaria cases with environmental factors in Cameroon, but few have addressed how these associations vary in space, which limits timely public health interventions. We therefore examined the spatial variability between malaria hotspot cases and environmental predictors using the geographically weighted regression (GWR) spatial modelling technique. Methods: The global ordinary least squares (OLS) tool in the Modeling Spatial Relationships toolset of ArcGIS 10.3 was used to select candidate explanatory environmental variables for a properly specified GWR model. The local GWR model used the global OLS candidate variables to examine, predict and explore the spatial variability between environmental factors and malaria hotspot cases generated from Getis-Ord Gi* statistical analysis. Results: The OLS candidate environmental variable coefficients were statistically significant (adjusted R² = 22.3%, p < 0.01) for a properly specified GWR model. The GWR model identified a strong spatial association between malaria cases and rainfall, vegetation index, population density, and drought episodes in most hotspot areas, and a weak association with aridity and proximity to water, with an overall model performance of 0.243 (adjusted R² = 24.3%). Conclusion: The generated GWR maps suggest that, for policymakers to eliminate malaria in Cameroon, malaria outreach programs should be created and further investigations carried out in areas where the environmental variables showed strong spatial associations with malaria hotspot cases.

communities in 2016 (2). Malaria transmission in Cameroon is high (> 1 case per 1000 population) for about 71% of the population (16.6 million people) and low (0-1 cases per 1000 population) for about 29% (6.8 million people); it affects people of all sexes and age groups, with children under five at greater risk of the disease (2). In 2014, malaria morbidity in Cameroon was 30% in children and 18% in adults (3). The government of Cameroon and its partners have been combating malaria through national intervention programs, including the distribution of free insecticide-treated nets (ITN) to populations at high risk (established in 2011), provision of sulfadoxine-pyrimethamine drugs to pregnant women, parasitological screening of suspected malaria cases, and the application of other WHO standard treatments (3,4). The socio-economic and environmental challenges that malaria poses to African countries are a global concern. The WHO's Global Technical Strategy (GTS) for Malaria 2016-2030 was developed with the aim of helping countries reduce the human suffering caused by the disease. Adopted by the World Health Assembly in May 2015, the strategy provides comprehensive technical guidance to countries and development partners for 15 years, emphasizing the importance of scaling up malaria responses and moving towards elimination. It also highlights the urgent need to increase investments across all interventions (including preventive measures, diagnostic testing, treatment and disease surveillance) as well as in harnessing innovation and expanding research (5). Intensifying investment in malaria research by endemic sub-Saharan African (SSA) countries is key to attaining the GTS targets and eradicating the disease from SSA.
The application of spatial statistical methods to geolocated health data has enabled complex malaria scenarios to be visualized through spatial maps created with Geographical Information Systems (GIS) technology (6-11). Studying the spatial variation between disease outcomes and associated socioeconomic or environmental factors using GIS has greatly improved our understanding of how these factors relate to the health outcome in question. Malaria has been reported to be associated with environmental and climatic factors such as rainfall, humidity, and temperature (12,13), and understanding the behavior of these factors in space through spatial regression statistics (14) will further improve timely control measures and resource allocation.
Regression analyses are statistical techniques that allow for the modelling, examining, and exploring of spatial relationships, to better understand the factors behind observed spatial patterns and hotspots, and to predict outcomes based on that understanding(14). Ordinary Least Squares regression (OLS) is a global regression method that provides a global model of the variable or process to be predicted or studied. It creates a single regression equation to represent that process.
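As a minimal sketch of what a global model means here, the following fits a single OLS equation to synthetic data and reports the adjusted R². The data, the variable roles and the function name fit_ols are assumptions made for illustration; this is not the ArcGIS tool used in the study, only the same underlying least-squares idea.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by least squares; return coefficients, R2 and adjusted R2."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])          # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # one equation for the whole area
    residuals = y - design @ beta
    rss = float(residuals @ residuals)                 # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())           # total sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)  # penalise extra predictors
    return beta, r2, adj_r2

# Synthetic data: two hypothetical predictors standing in for environmental variables.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)

beta, r2, adj_r2 = fit_ols(X, y)
print("coefficients:", beta.round(2))
print("R2:", round(r2, 3), "adjusted R2:", round(adj_r2, 3))
```

The sign of each fitted coefficient is read the same way as in the OLS results reported later: a negative coefficient means the dependent variable falls as that predictor rises.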
Geographically Weighted Regression (GWR) is a local spatial regression method that allows the modelled relationships to vary across the study area by fitting a regression equation to every feature in the dataset, using candidate explanatory variables from the OLS. It is a local form of linear regression used to model spatially varying relationships. The GWR statistical modelling technique has been applied in a range of malaria studies: Hasyim (15) used GWR to find the spatial association between malaria cases and environmental factors in South Sumatra, Indonesia, where altitude, distance from forest and rainfall were associated with malaria; Moise (16), studying the seasonal and geographic variation of pediatric malaria in Burundi, identified spatial variation between monthly rainfall and malaria prevalence. The GWR spatial modelling technique has been a powerful tool for understanding malaria prevention and the spatial variability of malaria cases and environmental factors (15-18). Its application has also been valuable in understanding other infectious diseases, such as the spatial association between dengue fever and socioeconomic and environmental determinants (19,20). In addition, GWR has been applied to other health outcomes and social science studies, including cancer events (21), mental depression (22), fire events (23), hospital accessibility (24), alcohol and violence (25), and the real estate housing crisis (26).
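The idea of fitting one regression per feature can be sketched as a weighted least-squares fit at each location, with nearby observations weighted more heavily. The Gaussian kernel, the fixed bandwidth, the synthetic data and the function name gwr_coefficients are all assumptions of this illustration; the kernel and bandwidth settings used by the ArcGIS GWR tool in the study may differ.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location (Gaussian kernel)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])           # intercept + predictors
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)       # weights decay with distance
        xtw = design.T * w                               # X^T W_i
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic study area in which the slope grows from west (x = 0) to east (x = 10).
rng = np.random.default_rng(1)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
true_slope = 0.2 * coords[:, 0]
y = 1.0 + true_slope * x + rng.normal(scale=0.1, size=n)

betas = gwr_coefficients(coords, x[:, None], y, bandwidth=1.5)
west = betas[coords[:, 0] < 3, 1].mean()
east = betas[coords[:, 0] > 7, 1].mean()
print("mean local slope, west:", round(west, 2), "east:", round(east, 2))
```

Mapping the columns of betas by location gives the kind of coefficient surface that a GWR coefficient map displays; a single global OLS slope would average this variation away.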

Massoda (27) compared malaria survey programs in different ecological zones in Cameroon and recommended intervention programs during the high-transmission rainy seasons.
Furthermore, Tewara (28), in a recent small-area spatial statistical analysis of malaria clusters and hotspots in Cameroon, illustrated the linear association between malaria cases and environmental factors using Pearson correlation statistics (29) but did not examine spatial variability, which is the main aim of this study. The specific objective of this study is to determine the spatial variability between malaria hotspot cases and environmental predictors using the GWR spatial modelling technique.

Methods
Due to technical limitations, the Methods section is only available as a download in the supplementary files.

Hotspot analysis
The analysis showed a high concentration of malaria hotspots in most areas (rural, urban, and urban-city centers) of the Northwest, Southwest, Littoral, Yaoundé, and Center regions and a low concentration elsewhere, as shown in figure 2.
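For readers unfamiliar with the hotspot statistic, the sketch below computes a Getis-Ord Gi* z-score for each point using binary fixed-distance weights that include the point itself. The toy grid, the distance band and the function name getis_ord_gi_star are assumptions for illustration; the study itself used the ArcGIS hot spot tool on household cluster data.

```python
import numpy as np

def getis_ord_gi_star(coords, values, band):
    """Gi* z-score per point with binary distance-band weights (self included)."""
    n = len(values)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (dist <= band).astype(float)          # d_ii = 0, so each point is its own neighbour
    mean = values.mean()
    s = np.sqrt((values ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ values - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Toy 10 x 10 grid with a 3 x 3 block of high case counts in one corner.
coords = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
cases = np.ones(len(coords))
in_cluster = (coords[:, 0] < 3) & (coords[:, 1] < 3)
cases[in_cluster] = 10.0

z = getis_ord_gi_star(coords, cases, band=1.5)
hot = z > 2.58                                # roughly the 99% confidence level
print("points flagged as hot spots:", int(hot.sum()))
```

A large positive z-score marks a point whose neighbourhood holds more cases than the global mean would predict, which is what a hotspot map shades most strongly.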

Ordinary Least Squares (OLS)
The OLS global regression illustrates the statistical significance of the model variables and their suitability for specifying the GWR model. The results showed that two of the explanatory variable coefficients (aridity and proximity to water) had a negative relationship (negative sign) with the dependent variable, while all the other explanatory variables had a positive relationship with it. The adjusted R-squared of the OLS model was 22.3% (table 3).

Geographically Weighted Regression (GWR)
In the current study, the GWR produced a map of the coefficient of the spatial association between the dependent variable and each independent variable. The coefficients (β) of population density (X1), enhanced vegetation index (X2), rainfall (X3) and drought episodes (X5) exhibited a strong correlation with malaria hotspots in most areas in the western portion of the country and a few elsewhere, while aridity (X4) and proximity to water (X6) showed a weak association, as exemplified in the GWR coefficient maps. The GWR output also produced a predicted malaria map, local R-squared (R²) and residuals.
For the model specification, the OLS model (table 2) illustrated that aridity and proximity to water had negative associations. For example, the negative coefficient for aridity in our model means that malaria cases decrease with lack of water, since aridity is a deficiency of moisture, probably due to the lack of rainfall. Conversely, an increase in rainfall in a given rainy season in Cameroon will promote malaria cases, and these periods can be targeted for malaria prevention programs, since rainfall creates breeding sites for female Anopheline mosquitoes (14). This understanding can inform malaria prevention campaigns, such as getting rid of stagnant water around habitable household clusters or discarding water cans to prevent the growth of malaria-transmitting mosquitoes. Moreover, filling potholes during the dry season in high-risk areas will help reduce mosquito breeding sites. The Koenker statistic (a background check for non-stationarity) was statistically significant (p < 0.01), which reflects that the relationships being modelled were not consistent across the study area and were thus nonstationary (except for drought episodes and EVI). Furthermore, the VIF values (< 7.5; table 2) indicate no redundancy among the explanatory variables and hence no multicollinearity.
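The VIF check mentioned above can be illustrated with a short sketch: each explanatory variable is regressed on the others, and VIF = 1 / (1 - R²) of that auxiliary regression. The synthetic columns and the function name variance_inflation_factors are assumptions for illustration; the 7.5 threshold is the one quoted in the text.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2) from regressing it on the others."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)            # nearly a copy of a, so redundant

vif_independent = variance_inflation_factors(np.column_stack([a, b]))
vif_redundant = variance_inflation_factors(np.column_stack([a, b, c]))
print("independent predictors:", vif_independent.round(2))
print("with a redundant predictor:", vif_redundant.round(1))
```

Values near 1 indicate that a predictor carries information the others do not; values above the threshold suggest that one of the redundant variables should be dropped before fitting GWR.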
The OLS model residuals were tested for clustering using the spatial autocorrelation tool (a pre-analysis check) in ArcGIS as described elsewhere (43), which indicated that the residuals were randomly distributed (no clustering). A statistically significant spatial autocorrelation in the model residuals would have indicated that we had neglected one or more key explanatory variables, so its absence is a positive indicator of a well-chosen model (44). The Jarque-Bera statistic was statistically significant (p < 0.01), indicating that our model predictions were biased (the residuals were not normally distributed); this may be due to the changing signs of some of the explanatory variable coefficients, which introduces variability.
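The two residual checks described above can be sketched as follows: Moran's I measures whether residuals cluster in space, and the Jarque-Bera statistic tests whether they depart from normality. The grid, the distance band, the synthetic residuals and both function names are assumptions for illustration, and the weighting scheme used by the ArcGIS spatial autocorrelation tool may differ.

```python
import math
import numpy as np

def morans_i(coords, residuals, band):
    """Global Moran's I with binary distance-band weights (self excluded)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = ((dist > 0) & (dist <= band)).astype(float)
    z = residuals - residuals.mean()
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)

def jarque_bera(residuals):
    """Jarque-Bera statistic and its asymptotic chi-square (2 df) p-value."""
    r = residuals - residuals.mean()
    n = len(r)
    m2 = (r ** 2).mean()
    skew = (r ** 3).mean() / m2 ** 1.5
    kurt = (r ** 4).mean() / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)             # exact survival function for 2 df
    return jb, p_value

grid = np.array([(i, j) for i in range(15) for j in range(15)], dtype=float)
rng = np.random.default_rng(3)

i_random = morans_i(grid, rng.normal(size=len(grid)), band=1.5)  # no spatial pattern
i_trend = morans_i(grid, grid[:, 0].copy(), band=1.5)            # smooth west-east trend

jb_normal, p_normal = jarque_bera(rng.normal(size=2000))
jb_skewed, p_skewed = jarque_bera(rng.exponential(size=2000))    # strongly skewed

print("Moran's I, random vs trending residuals:", round(i_random, 3), round(i_trend, 3))
print("Jarque-Bera, normal vs skewed residuals:", round(jb_normal, 1), round(jb_skewed, 1))
```

A Moran's I near zero is consistent with randomly distributed residuals, while a clearly positive value would point to a missing explanatory variable; a large Jarque-Bera statistic with a small p-value corresponds to the biased-model warning discussed in the text.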
Although the model was biased, we proceeded to the GWR model because recent studies (15) have reported spatial variations for variables similar to those in our specified model, and our main goal was to understand the behavior of these environmental variables with malaria cases for future research and intervention projects. The OLS adjusted R² of 22.3% indicates that the explanatory variables told about 22% of the spatial relationship story between the malaria cases and the environmental factors we were trying to model. This may seem low on the R² range (0-100%), where higher values depict better model performance. However, Hasyim (15), in the spatial modelling of malaria cases associated with environmental factors in South Sumatra, Indonesia, reported a low R² (6.2%) for the variation of malaria incidence explained by environmental factors in the OLS model. Moise (16) also reported a low R² (< 5%) for the OLS model variables in the seasonal and geographic variation of pediatric malaria in Burundi. That is, the OLS model R² varies with the explanatory variables under investigation, and a lower R² does not always signify poor model performance (14). The local GWR model was built on the validated variables from the global OLS model. A validated OLS model can support a global policy for malaria control programs in which similar statistically significant environmental or climatic predictors are targeted across many malaria-endemic countries, while a validated spatial relationship from GWR is appropriate for initiating prevention programs at the local or village level within endemic countries (15). The GWR coefficient maps (figure 3) indicated that population density, EVI, rainfall, and drought episodes had a strong spatial correlation with, or positive influence on, malaria cases in our study locations. The strong correlations were seen in the western part of Cameroon and in a few areas in the north.
The populations of these localities should be considered for malaria control programs during high-transmission seasons in Cameroon. In contrast, aridity and proximity to water had a weak association in the above-cited locations, meaning there was little or no spatial interdependence between these factors and malaria cases. The ability of the GWR model to capture spatial variability is shown in figure 4A, where the generated local R² maps the model performance (24.3% overall) between the environmental factors and malaria cases. The high spatial interdependence observed in the Northwest, Southwest, Littoral, Douala, Central and Yaoundé DHS regions calls for effective malaria surveillance. Likewise, active control measures are needed in areas with low spatial variability (East, Adamawa and North regions).
The need for efficient malaria surveillance systems is further demonstrated by our predicted map (figure 4B), which highlights strong spatial variability in some household clusters of the Northwest, Southwest, Adamawa and North regions, where public health interventions should be prioritized.
The GWR residuals (figure 4C) indicated that areas where the model did not work well were common in the southern part of the country. This implies that our model was unable to explain the spatial variability between malaria cases and environmental factors in the locations highlighted by the GWR residuals. Although our model had difficulty explaining the spatial interdependence in these areas, the condition numbers in the GWR attribute table output (a background analysis verification) indicate that the model did not have a hard time solving, since the condition numbers for the explanatory environmental variables were < 30 (values > 30 would mean the model had difficulty solving the spatial relationships) (44). Furthermore, table 3 illustrated that the GWR improved on the OLS in explaining the spatial relationship between malaria cases and environmental factors, as its AICc was lower and its R² higher. This signifies that the GWR is a better model for explaining spatial variability at the local level (15,24).
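The two diagnostics used in this comparison can be sketched briefly. The AICc below follows one commonly cited form for GWR, AICc = 2n ln(σ) + n ln(2π) + n(n + tr(S)) / (n - 2 - tr(S)), with σ² = RSS/n and tr(S) the effective number of parameters; the exact formula, the column scaling used for the condition number, the numbers passed in and both function names are assumptions of this illustration and may not match the ArcGIS output exactly.

```python
import math
import numpy as np

def aicc(rss, n, effective_params):
    """Corrected AIC from the residual sum of squares and effective parameter count."""
    sigma = math.sqrt(rss / n)
    return (2 * n * math.log(sigma) + n * math.log(2 * math.pi)
            + n * (n + effective_params) / (n - 2 - effective_params))

def local_condition_number(design, weights):
    """Condition number of the weighted design matrix after scaling columns to unit length."""
    weighted = design * np.sqrt(weights)[:, None]
    scaled = weighted / np.linalg.norm(weighted, axis=0)
    return np.linalg.cond(scaled)

# Hypothetical fits on n = 100 observations (illustrative numbers, not the study's results).
aicc_base = aicc(rss=80.0, n=100, effective_params=7)
aicc_better_fit = aicc(rss=70.0, n=100, effective_params=7)
aicc_more_params = aicc(rss=80.0, n=100, effective_params=20)

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
ones = np.ones(200)
well_posed = np.column_stack([ones, x1, x2])
collinear = np.column_stack([ones, x1, x1 + 1e-3 * rng.normal(size=200)])

cond_ok = local_condition_number(well_posed, ones)
cond_bad = local_condition_number(collinear, ones)
print("AICc:", round(aicc_base, 1), round(aicc_better_fit, 1), round(aicc_more_params, 1))
print("condition numbers:", round(cond_ok, 1), round(cond_bad, 1))
```

AICc rewards a smaller residual sum of squares but penalises extra effective parameters, so the model with the lower value is preferred; a condition number above 30 signals local collinearity that makes the local coefficients unreliable.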
We had the following limitations. Firstly, we used only the malaria hotspot locations to specify our GWR model. This was because running the model on all malaria case locations for the whole country would have missed key explanatory variables. Moreover, targeting vulnerable hotspot communities will help cut down on resource allocation. Secondly, our OLS model was biased: of the six OLS checks, it failed the Jarque-Bera test, which was significant (p < 0.01), indicating that the model residuals were not normally distributed. Thirdly, our model did not include other key environmental variables such as temperature and humidity, or socioeconomic predictors, for which data were limited or missing. Further investigation using practically available data for these environmental and socioeconomic predictors is required to provide detailed spatial variability coverage for malaria in Cameroon. Despite these limitations, our study has demonstrated a rigorous understanding of the spatial interdependence between malaria cases and environmental risk factors and has provided new insights into malaria at the local level in Cameroon by applying the GWR spatial modelling technique, which has seen limited use in previous studies.
Moreover, the methods used in this study, which have been applied to other health outcomes in other countries (17,22,24), can be used to study such outcomes in Cameroon.

Conclusions
This study demonstrated that rainfall, EVI, drought episodes, and population density had a strong spatial association with malaria distribution and could be targeted as important risk predictors for control programs at the local level in Cameroon. Given the greater availability of spatial data (31), desktop GIS packages and statistical techniques, the challenges faced in malaria investigation should ease in the future. The generated GWR maps suggest that, for policymakers to eliminate malaria by 2030, outreach programs should be created that target malaria hotspot locations and carry out further investigations in areas where the environmental variables showed strong spatial associations with malaria cases.
Table notes: Aridity is the average aridity index of the cells whose centroid falls within a radius of 10 km # or 2 km*. An asterisk next to a number indicates a statistically significant p-value (p < 0.01). VIF (Variance Inflation Factor) checks for redundancy among explanatory variables.
Figure caption: Map of malaria hotspot points and raster density.
