Estimating The Seasonally Varying Effect of Climatic Factors On The District-Level Incidence of Acute Watery Diarrhea Among Under-Five Children of Iran, 2014-2018: A Bayesian Hierarchical Spatiotemporal Model

: Background: Under-five years old acute watery diarrhea (U5AWD) accounts for most diarrheal diseases' burden due to rotavirus infection. It is claimed that many climatic and socioeconomic factors are associated with U5AWD. However, little information is available about the occurrence of U5AWD in Iran and the adjusted effect of potential determinants. Methods: We collected a dataset containing the seasonal numbers of U5AWD cases at the district level of Iran through MOHME. We calculated the district level Standardized Incidence Ratio and Moran's I values to detect the significant clusters of U5AWD over sixteen seasons from 2014 to 2018. In addition, we examined twelve Bayesian hierarchical models to recognize the strongest in predicting the seasonal number of incidents. Results: Iran has many hotspots of U5AWD, especially in southeast areas. An extended spatiotemporal model with seasonally varying coefficients and space-time interaction outperformed other models, becoming our proposal in modeling U5AWD. Temperature had a global positive relationship with seasonal U5AWD in districts (IRR: 1.0497, 95% CrI: 1.0254-1.0748), due to its varying effects in winter (IRR: 1.0877, 95% CrI: 1.0408-1.1375) and fall (IRR: 1.0866, 95% CrI: 1.0405-1.1357) seasons. Also, elevation (IRR: 0.9997, 95% CrI: 0.9996-0.9998), piped drinking water (IRR: 0.9948, 95% CrI: 0.9933-0.9964), public sewerage network (IRR: 0.9965, 95% CrI: 0.9938-0.9992), years of schooling (IRR: 0.9649, 95% CrI: 0.944-0.9862), Infrastructure-to-Household Size Ratio (IRR: 0.9903, 95% CrI: 0.986-0.9946), wealth index (IRR: 0.9502, 95% CrI: 0.9231-0.9781) and urbanization (IRR: 0.9919, 95% CrI: 0.9893-0.9944) of districts were negatively associated with seasonal U5AWD incidence. Conclusions: The development of specific alert systems could be a strategy to predict high-risk areas of U5AWD using climatic inputs. Also, the study anticipates a higher incidence of U5AWD in districts with weaker hygiene and socioeconomic status. Therefore, policymakers should take appropriate preventive actions in such areas.


Background
Diarrheal diseases are among fatal causes of death, ranking second for children younger than five years old [1]. World Health Organization (WHO) and the Institute for Health Metrics and Evaluation (IHME) respectively estimate more than 1.7 billion and 947 million incidents of diarrheal diseases, killing more than 500,000 children each year [2,3]. Under-five years old acute watery diarrhea (U5AWD) has more responsibility for the burden of diarrheal diseases than other clinical types, even though interventions such as rotavirus vaccine have dramatically reduced the morbidity and mortality caused by U5AWD. However, a limited number of 70 countries have taken such immunization programs into account till 2014, revealing the underuse of the vaccine in Asia and Africa [4]. If the dullness of vaccinations persists, U5AWD will inevitably remain an intensive issue, especially for the developing countries of these regions.
Iran is known as a seriously affected country by U5AWD, despite recent progress in rotavirus immunization programs [5]. According to the Global Burden of Disease (GBD) Study in 2019, diarrheal diseases account for more than 11.9 million incidents and 238 deaths in under-five children of Iran [2]. However, to date, no national study has focused on the district-level incidence of U5AWD in Iran. Hence, it is vital to recognize the geographical locations undergoing U5AWD outbreaks, especially in small areas with lower quality of life, to help the health officials meet Sustainable Development Goal 3 (SDG 3) by 2030.
Small-area studies are generating widespread interest in terms of the methodology required to perform such studies. Herein, spatiotemporal models attract considerable attention since they can presume the simultaneous dynamic trends inherent in space and time and take space-time interactions into account [44].
Another advantage is that they allow us to quantify the nonlinear effect of the suspicion factors on U5AWD.
Walking on this path, we believe critical attention should be paid to the potential nonstationary effect of environmental variables. For example, climatic factors (e.g. rainfall, temperature) are recognized as timevarying determinants of viral infectious diseases elsewhere [45].
Implementing such complex models is plausible in a Bayesian hierarchical modeling (BHM) setting, but there is still a tough numerical challenge. Markov chain Monte Carlo (MCMC) methods used for BHM often impose an unendurable burden working with large datasets [46]. In 2009, Rue et al. [47] introduced a relatively fast and accurate approach to do BHM named integrated nested Laplace approximation (INLA), which researchers broadly utilize as an appropriate surrogate for MCMC. They also developed a statistical tool for INLA in R Statistical Software which is downloadable via https://www.r-inla.org/download-install.
The present paper aims to: 1) demonstrate the spatial and temporal patterns of U5AWD in Iran; 2) locate the statistically significant hotspots in the area; 3) infer about the association of the mentioned determinants with U5AWD, and 4) propose an efficacious easy-to-use model with solid predictive ability.

Study area
From Apr 2014 to Mar 2018, the Ministry of Health and Medical Education (MOHME) reported a total of 1,552,255 children <5 years old who were confirmed to have diarrhea in Iran. Based on the latest divisions, Iran consists of 429 districts that define our study's geographical framework. We followed every district for sixteen seasons (spring: Apr-Jun, summer: Jul-Sep, fall: Oct-Dec, winter: Jan-Mar), demonstrating the study's timeline.

Data collection
We used average cumulative rainfall, temperature, wind speed, elevation, the prevalence of traveling history, piped drinking water, public sewerage network, Infrastructure-to-Household Size Ratio (IHSR), years of schooling (YOS), wealth index and urbanization as the independent variables. We calculated the rainfall, temperature and wind speed in each district-month by aggregating over the hourly reported data of Iran Meteorological Organization (IRIMO) stations. The 2016 Iran National Population and Housing Census supplied us with the district-level population data necessary to control for the population effect in subsequent analyses. The prevalence of remaining socioeconomic variables are taken from Households Income and Expenditure Survey (HIES). A full list of variables, data sources and the temporal domain is provided in Table   1.

Standardized Incidence (Morbidity) Ratio (SIR)
To illustrate the U5AWD risks, we notably used SIR [48], which is an adequate descriptive measure in estimating disease risks in different district-seasons and defined it as follows: where is the observed number of U5AWD cases in district , season and is the expected number of U5AWD cases in district , season . can be simplified as: where is the population in district , season . The maps on SIRs are presented in section 3.3 of the results.

Spatial autocorrelation analysis
Anselin outlined the term Local Indicators of Spatial Association (LISAs) [49] where is the Local Moran's I value in district , season ; is the observed number of U5AWD cases in district ; ̅ is the average number of U5AWD cases in season ; denotes the corresponding spatial weight between district and ; 2 is the sample variance of the district observation among other observations in season .
Positive signifies the homogeneity of U5AWD incidence with neighbors in district , whereas negative has an unlikely interpretation, revealing a heterologous pattern. Besides, the scaled incidence values in district and surrounding neighbors can also get positive or negative values. Taken together, Local Moran's I value can fit into four groups: 1) high incidence clusters with high incidence neighbors (High-High); 2) low incidence clusters with low incidence neighbors (Low-Low); 3) high incidence clusters with low incidence neighbors (How-Low); 4) low incidence clusters with high incidence neighbors (Low-High). We will display the significant clusters at three confidence levels (90%, 95%, and 99%) in section 3.3. The package used to detect the clusters was "spdep" in R Statistical Software (version 4.0.5) [50].

Bayesian hierarchical modeling
Using a Bayesian hierarchical structure, we aim to model the incidence rates of U5AWD, which is believed to have a seasonal trend: where is the observed number of U5AWD cases in district , season ; is the population in district , season . We used the logarithmic transformation of incidence rates in equation 6 to facilitate the running process by assuming the normal distribution for the response variable. We will compensate for this by reversing the transformations on predicted posterior samples to report case numbers.
We will display the national trend of incidence rates ( × 10.000) in results (section 3.2).
Among different options, we evaluated twelve candidate models. We should note that we adopted the same notations to shorten the explanations and streamline the comparison between models:  Model 1: The ordinary multivariate regression model (non-spatial and non-temporal) as: where 0 is the intercept; , is the value of -th covariate in district , season and is the estimated coefficient of -th covariate, showing its linear fixed effect through space and time.
 Model 2: The spatial ecological regression model as: where and are the spatial unstructured and structured random effect in district , respectively; is the number of districts sharing boundaries with -th district and 2 is the variance between -th district's effect and neighbors. In every model, we selected an independent and identically distributed (iid) Gaussian distribution as diffuse prior for and the intrinsic conditional autoregressive (ICAR) structure as the prior distribution of (equation 9-10). + togetherly construct a prominent convolution model known as Besag York Mollié (BYM) [51].

 Model 3:
The time series regression model as: which can be stated more straightforwardly as follows: where is the mentioned interaction term in district , season ; and are 429×429 and 16×16 identity matrices, respectively and ⊗ is the Kronecker product operator used to build a block matrix out of and .
Model 9-12 have the same formulation for and only the distribution of differs.
 Model 10: Model 8 + space-time interaction between spatially unstructured ( ) and temporally structured ( ) random effects (interaction type II) as: where is a 16×16 matrix with RW1 structure.
 Model 11: Model 8 + space-time interaction between spatially structured ( ) and temporally unstructured ( ) random effects (interaction type III), where is a 429×429 matrix with ICAR proximity structure which we specified the prior in equation 10.

Model validation
As theorized in the last section, we considered twelve models that seemed to fit right in modeling such data. In an effort to select the best model among the candidates, we considered a few model selection criteria which are commonly used to evaluate Bayesian models: Deviance Information Criterion (DIC); effective number of parameters for DIC ( ); Watanabe-Akaike Information Criterion (WAIC); effective number of parameters for WAIC ( ); Logarithmic score (LS). The lowest values of these criteria suggest the best mixture of fitness and parsimony. To also give a frequentistic vision, we computed some classic tools such as the coverage gained by 95% credible interval (CrI), mean length of 95% CrI, mean absolute error (MAE) and the elapsed time for fit to evaluate these models. We did the software implementation using a computer with an Intel (R) Core (TM) i7-4820K CPU @ 3.70GHz and a RAM of 16.00 GB.

Descriptive Characteristics
The mean cumulative rainfall, temperature, wind speed, elevation, usage of piped drinking water, public  We depicted a single map to show the heterogeneity in the geographical distribution of the cumulative number of U5AWD cases during this period in Iran (Figure 1b). Hamadan reported the highest incident value with 57379 cases and Simorgh reported the lowest (44 cases). It seems that the central parts have much lower incident cases than the southeast, where too many cases were reported. We will rigorously examine the clustering possibilities and find the anomaly areas in section 3.3.

Hotspot analysis
Before following LISAs, we presented the subnational estimates of SIRs. As illustrated in Figure 2, there is an explicit spatial dependency in SIR's values warning us about potential seasonal outbreaks in many districts.
We revealed a total of 185 significant High-High (HH) clusters of U5AWD by computing district-specific Moran's I in each year-season. Nevertheless, no Low-Low, High-Low, and Low-High cluster was detected. We should point out that HH clusters are hotspot districts also surrounded by high-risk neighbors. Figure 3 displays significant HH clusters' locations at three confidence levels (90-94%, 95-99%, and ≥99%). Ijrud and Khoramdareh were the most frequent HH clusters, followed by Jask and Mahneshan. Like Figure 2, the southeast of Iran overlays more HH clusters than other parts. A complete list of cluster names is provided in Supplementary Table 2.

Model selection
We listed the values of the mentioned selection criteria for all twelve models in Table 3. Model 9 and Model 12 show the ability to outperform others, having lower DIC, WAIC and LS and higher 95% coverage and MAE values. Choosing between the two, Model 12 seems to fit more considering DIC, , and LS as the main criteria and Model 9 has better performance in terms of WAIC, 95% coverage and MAE. Finally, we picked up Model 12 as the best since it is preferred based on Bayesian model validation tools.

Modelled trends
Considering Model 12, Figure 4 shows the modeled time trend of U5AWD on a national scale during sixteen seasons from 2014 to 2018. Visually, the seasonal variations in the raw case numbers seem to be wellcovered by the fitted values. As described before, every four years' high and low incident peaks occurred in summer (July to September) and winter (January to March), respectively. Moreover, the 95% credible interval has a perfect coverage overlaying all the observed national values, though the district level coverage is around 65.7%.

Bayesian inference
Lying under a Bayesian framework, we illustrated the marginal posterior distribution of the best model's coefficients (described in section 2.5: Model 12). We accordingly reported the exponentiated form of posterior samples of coefficients, named incidence rate ratio (IRR) in Table 4. We then estimated the combined IRR effects for climatic variables by multiplying the global and seasonally varying effects in every season (exp( + , ) = exp( ) exp( , )).

Discussion
It is plain to see that an apparent seasonal pattern exists in the temporal trend of U5AWD incidence in Iran, like trends in other countries [17,23]. This variation has also been observed elsewhere, focusing on various diarrheagenic Escherichia coli pathotypes [52]. We observed a decreasing trend from summer (Jul-Sep) to winter (Jan-Mar), which is in complete agreement with the diarrhea trend in Afghanistan[23], the eastern neighbor of Iran. However, Our overall trend is different from the observed seasonal trends in Australia[20], India [8] and Nepal [53], which could be due to the differences in the classification of seasons in these countries.
Unlike the Nepalese study mentioned above, some districts of Iran are significantly and spatially clustered in terms of U5AWD rates. Again, the significance of spatial autocorrelation in Iran concurs with the study in Afghanistan[23]. As displayed in Figure 2-3, the southeast of Iran has a worse situation than other parts. The first potential reason might be the immigrants and travelers who covey the rotavirus coming from eastern neighbors of Iran (e.g. Pakistan and Afghanistan). However, traveling history had a subtle global effect and did not show any remarkable association with U5AWD in this study.
Our results do not show any global or seasonal association between rainfall and U5AWD, being consistent with studies in Indonesia [12], Rwanda [7] and Afghanistan [23]. On the other side, our findings on rainfall do not fit several studies in Fiji [6], Taiwan [15], India [8], China [13,17], Ethiopia [16] and Senegal [11]. In 2014, Carlton et al. [9] found heavy rainfall as a key risk factor of diarrhea incidence, especially in periods of low rainfall. Iran is experiencing an unprecedented drought period, receiving less than a third of the global average precipitation [54]. From 2014 to 2018, we estimated that the average annual precipitation of Iran is equal to 69.74 mm. Our estimate is 183 mm lower than the mean annual precipitation from 1980 to 2004, showing a dramatic reduction. Meanwhile, Iran has recorded significant flood events during this period. It is predicted that flood events will increase in southern parts [54], where we detected significant HH clusters of U5AWD.
Over recent decades, Iran's warming trend has also been exacerbated as a consequence of global warming [54]. We quantified the risk of U5AWD to increase by 4.97% for 1 °C increase in temperature (Table   4). In 2003, WHO estimated that 1 °C increase in temperature is associated with 5% growth in global incidence of diarrhea [55]. After a decade, the influence of temperature on U5AWD in Iran is quite close to the global effect in 2003, recording 0.03% risk difference. Globally, there has been an extensive discussion about the link between temperature and the incidence of diarrheal diseases through decades. We found temperature positively and importantly associated with U5AWD, fitting with studies in Fiji [6], Bangladesh [10], Taiwan [15],  [11,33] have acknowledged that a lower incidence of diarrhea accompanies better economic status. The community-level effect of wealth status on the incidence of infectious diseases has made many global institutions (e.g., IHME) present the burden estimates by economic strata.
Urbanization was also another important predictor of U5AWD. Based on a systematic review and metaanalysis [43], Ethiopian under-five children living in urban areas were more likely to have diarrhea than those inhabiting rural areas, completely agreeing with what we found in Iran. The underlying reason might be the exceedance allocation of healthcare facilities to urban areas. Beyond that, the incompleteness of registered case numbers is more common in rural areas than urban areas due to the lower number of public health sectors. Also, a study in Ghana[41] found rural areas at higher risk of childhood diarrhea, considering interaction terms between urban-rural residence and wealth quintile of households.
We found the best blend of simplicity and effectiveness in Model 12, followed by Model 9. The most critical aspect that discriminates Model 12 is the inclusion of interaction type IV. This term assumes that the temporal trend of U5AWD in every district is correlated with the average time trends of adjacent districts [44].
The observed superiority of interaction type IV lends support to the findings of a similar small-area study watching other infection dynamics [57]. On the other side, interaction type I considers an unstructured spacetime relationship. We constructed a matrix of size 16×16 (Instead of 4×4) in both models to consider maximum long-term temporal variations placing year-season combinations on the rows and columns.
This study contains several strengths in both methodology and application. This study investigates the etiology of seasonal U5AWD incidence in an extensive geographical area, while most studies are restricted to local areas and cross-sectional time frames. The study also considers small areas (districts) as the study units, unlike most previous studies where individuals and their sociodemographic characteristics are assessed. The study tried to meet the recommended requirements for a reliable small-area study at the data cleaning stage as surveillance data should be dealt with and many design issues are likely to happen [59]. In this case, the misclassification of disease counts to districts is corrected by uniting the dataset and updating the older administrative divisions to the present. In addition, this study simultaneously discusses the spatial and seasonal distribution of U5AWD and the adjusted seasonally varying climatic effects by including the information of available socioeconomic determinants.
Certain limitations deserve to be mentioned. First, Iran's surveillance system for communicable diseases suffers from incompleteness. For example, it does not cover private health centers and self-medicated patients are not counted. To address this issue, MOHME should re-establish the surveillance system and improve the data gathering procedure. Second, stational data may not express all the heterogeneity of climatic variables in districts as a limited number of them collect the climatic data under the supervision of IRIMO. Third, we were forced to define a community-level frame because we had permission only to use the aggregated incident counts of U5AWD. Thus, any individual-level inference by the readers might be perplexing or inappropriate. Fourth, our results are not entirely immune against many forms of ecological fallacy since we assumed a monotone distribution for U5AWD within districts. Lastly, we failed to access the U5AWD data after the January of 2018, and further studies are required to provide us with an up-to-date picture of U5AWD in Iran.

Conclusions
Taken together, the current study provided strong evidence about the spatiotemporal dynamics of U5AWD in Iran applying novel geospatial methodologies. U5AWD showed a decreasing trend from summer to winter, and by utilizing spatial autocorrelation analysis, most hotspots of U5AWD were found in southeast areas. A Bayesian hierarchical spatiotemporal model consisting of seasonally varying coefficients (for climatic variables) and space-time interaction (type IV) outperformed the other eleven choices in predictive ability and flexibility. In this model, the estimated seasonal coefficients of meteorological factors suggested temperature as a critical factor in explaining U5AWD, especially in winter and fall. The selected model proposed elevation, piped drinking water, public sewerage network, IHSR, YOS, wealth index, and urbanization as deep-rooted community-level predictors of U5AWD. The development of specific alert systems could be a strategy to predict high-risk areas of U5AWD using climatic inputs. Also, the study anticipates a higher incidence of U5AWD in districts with weaker hygiene and socioeconomic status. Therefore, policymakers should take appropriate preventive actions in such areas. the findings of this study are available from MOHME but restrictions apply to the availability of these data, and so are not publicly available. Data may however be available from the authors upon reasonable request and with permission of MOHME.

Competing interests:
The authors declares that he has no competing interests.            The geographical map of measured Standardized Incidence Ratios (SIRs) for U5AWD in every season and district of Iran. The geographical location of signi cant High-High (HH) clusters of U5AWD in every season with 90-94%, 95-99%, and ≥99% con dence levels in Iran.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.