A comparison of modelling the spatio-temporal pattern of disease: a case study of schistosomiasis japonica in Anhui Province, China.

The construction of spatio-temporal models can be either descriptive or dynamic. In this study we aim to evaluate the differences in model fitting between a descriptive model and a dynamic model of the transmission for intestinal schistosomiasis caused by Schistosoma japonicum in Guichi, Anhui Province, China. The parasitological data at the village level from 1991 to 2014 were obtained by cross-sectional surveys. We used the fixed rank kriging (FRK) model, a descriptive model, and the integro-differential equation (IDE) model, a dynamic model, to explore the space-time changes of schistosomiasis japonica. In both models, the average daily precipitation and the normalized difference vegetation index are significantly positively associated with schistosomiasis japonica prevalence, while the distance to water bodies, the hours of daylight and the land surface temperature at daytime were significantly negatively associated. The overall root mean square prediction error of the IDE and FRK models was 0.0035 and 0.0054, respectively, and the correlation reflected by Pearson's correlation coefficient between the predicted and observed values for the IDE model (0.71; p<0.01) was larger than that for the FRK model (0.53; p=0.02). The IDE model fits better in capturing the geographic variation of schistosomiasis japonica. Dynamic spatio-temporal models have the advantage of quantifying the process of disease transmission and may provide more accurate predictions.


Introduction
With the progress of computer technology and geographic information systems (GIS), the popularity of geographic modelling contributes to the assessment of the risk patterns of diseases in a growing number of studies.In addition, full awareness of the spatio-temporal dynamic process of infectious diseases strongly influences prevention and control in general [1].In principle, spatio-temporal models are either constructed by a descriptive approach or a dynamic one [2].The former kind uses mean function and covariance to analyse the temporal and spatial variations of the variables of interest [3,4], while the latter describes the dynamic evolution quantitatively under different space-time states, which can easily be simulated and explained [2].
Dynamic methods, on the other hand, have been applied in various fields, such as environmental science [18], public health [19], etc., to explore the basic process of spatio-temporal transmission dynamics.However, there is a paucity of comparative studies on the performance of the descriptive spatio-temporal models and dynamic spatio-temporal models in practical applications.
Generally, when exploring the spatio-temporal variations of diseases, we are likely to select descriptive, spatio-temporal models in the absence of kinetic knowledge.Its main idea is to specify the dependence structure in the random process [2], which fails to capture the evolution of the dependent random process through time quantitatively.
By describing the mechanical process via scientific knowledge and probability distributions, the dynamic space-time model quantifies the evolution of current and future spatial fields from past fields [2], which is more suitable for prediction in the complex dynamic process [18].In addition, there is evidence that dynamic models are well suited for modelling dynamic processes [20,21,22] with high predictive accuracy.
These models quantify the dependencies of the underlying dynamic processes of disease transmission.The process of disease transmission may be influenced by many unknown and uncertain factors, especially in complex health environment.Elements contributing to this complexity include human behaviour, pathogen evolution, environmental change and socio-economic status, etc [23,24,25,26].Hence, dynamic spatio-temporal models are considered to have the potential to explore the apparently intractable complexities of disease dynamics [24].
In our study, we employed a descriptive method (Fixed Rank Kriging -FRK) [2] and a dynamic method (Integral-Difference Equation -IDE) [2] to model the spatio-temporal pattern of risk due to schistosomiasis japonica in Guichi, Anhui Province, China.The results of the models were compared, which could provide better options for spatio-temporal modelling in further studies.

Study area
Guichi, a district situated in the eastern part of Anhui Province, China, covers an area of about 2,500 km 2 , with a population of approximately 0.67 million (2018) (http://tjj.ah.gov.cn/oldfiles/tjj/tjjweb/tjnj/2019/cn.html).It's located in the middle and lower reaches of the Yangtze River close to Qiupu, another major river that runs across the region (Figure 1).The presence of the two rivers and the humid subtropical monsoon climate provide a favourable environment, on the one hand for the survival and breeding of the schistosomiasis intermediate snail host [27] and, on the other, for the sustained parasite life cycle that involves humans and other mammals as definitive hosts.

Parasitological data
The prevalence of Schistosoma japonicum infection from 1991 to 2014 were collected by cross-sectional surveys each year, which were through the organization of Health professionals in Anhui Institute of Parasitic Diseases (AIPD).The data were obtained by annual village-based area surveys including residents aged 5 to 65 years based on a two-pronged diagnostic approach: First, all participants were screened by the Indirect Hemagglutination Assay (IHA) [28] test for anti-schistosomiasis antibodies and then the Kato-Katz technique [29] was applied using three thick smears from stool samples of all seropositive individuals.The number of sample villages ranging from 60 to 132 in our study, and we included those with more than 100 participants .

Environmental data
The environmental factors studied included hours of daylight, distance to water bodies, average daily precipitation, the normalized difference vegetation index (NDVI), and land surface temperature at daytime (LSTday).Data on hours of daylight and average daily precipitation were acquired from the China Meteorological Data Sharing Service System (http://cdc.cma.gov.cn/home.do).Inverse distance weighting (IDW) interpolation was applied to analyse the meteorological data available at 610 weather stations nationwide by ArcGIS 10.4 (ESRI, Redlands, CA, USA), then selecting the data within the study area.NDVI and LSTday were downloaded from the Level 1 and Atmosphere Archive and Distribution System (http://ladsweb.nascom.nasa.gov)with all 8-day global 1-km products for LSTday and monthly global 1-km products for NDVI.Water-body data were extracted from the World Wildlife Fund's Conservation Science Data Sets (http://worldwildlife.org).Euclidean distance from each village to water bodies were calculated using ArcGIS 10.4.

Statistical analysis
A Box-Cox transformation [30] of the crude prevalence was performed to apply the Gaussian model before implementing the spatio-temporal model.For covariate selection, univariate analysis was used for preliminary screening with the exclusion criteria of p>0.1.A multicollinearity analysis was then conducted among the remaining variables.Correlation coefficients>0.6 were considered collinearity, and we eliminated meaningless variables by the backward stepwise regression method.
Finally, we tested the spatial autocorrelation of the residual of the multiple regression (non-spatial model) via semi-variogram [31].Lack of spatial autocorrelation with the residuals would indicate spatiotemporal variability of the variables in the non-spatial model; otherwise, spatio-temporal models considering spatial autocorrelation were employed.

The FRK model
The FRK model was used to explore the spatio-temporal pattern of schistosomiasis and to evaluate the influence of environmental factors.We assumed a two-stage model based on the idea of hierarchical model (HM), which was specified as a data model with the conditional probability distribution given by a latent process and some parameters, and a process model that described the process and its uncertainties given by some parameters.
1.The data model: let Z (s; t) be the observed value of schistosomiasis in village s at year t, which considered an additive measurement error.The data model used was expressed as: where Y (s; t) is the latent process, and ε(s; t) is the observation errors independent of Y(s; t), which is assumed to be the independent identically distributed (iid) mean-zero measurement error; and 2. the process model was expressed by the equation: (; ) = (; )′ + (; ) (2) where x (s; t) is a fixed term indicating the covariant environmental factors, and η (s; t) denotes a mean-zero random process that is dependent on space-time.Random effects can be divided into two parts, as follows: where {  (; ),  = 1, … ,   } , corresponding to location (s; t), specifies a spatiotemporal basis function, {  }, which are random effects explaining variations in schistosomiasis risk, and (; ) represents the small-scale spatiotemporal random effects not captured by the basis function.

The IDE model
To capture the spatiotemporal dynamic evolution of schistosomiasis in the villages, we also constructed a dynamic model based on HM, as follows: 1.The data model: where   is the data vector, including arbitrary position at time t;   an additive offset term that describes the main trend of   corresponding to non-dynamic spatio-temporal structure;   the observation mapping matrix which relates the process to the observations gained by mapping;   the latent dynamic spatio-temporal process; and   an additive error process that is time varying (but statically dependent in time, assumed to be a mean-zero, Gaussian process); and 2. the process model, an approach represented by a first-order, integral-difference equation in a continuous spatial field.It is generally assumed that the process satisfies a Markovian spatio-temporal process, which indicates that its value at the current, given position is composed of two parts: a weighted combination of processes in the entire space domain at the previous time, and an additive, Gaussian and spatially coherent "innovation" expressed as: where (, ;   ) is a transition kernel, depending on the parameters   ; and   () a mean-zero Gaussian spatial process independent of  −1 (•) that changes over time but is statistically independent.In a one-dimensional spatial domain, (, ;   ) is assumed to be a Gaussian-shape kernel as a function of x relative to the location s: where the coefficients of diffusion are  ,1 (the spatially varying amplitude) and  ,2 (the spatially varying kernel aperture); and the mean (shift) parameters  ,3 and  ,4 correspond to a shift of kernel relative to location s.Here we specified a spatially invariant kernel.All the parameters were estimated using the expectation-maximization (EM) algorithm and the inference for the latent process   was implemented using Kalman filtering and smoothing [32].The parametric significance (p-value) was calculated using a permutation approach with 99 random simulations of the observed prevalence.

Model validation
The accuracy of the models determines the quality of the prediction results.A 10-fold cross-validation was implemented to compare the performance of the FRK and IDE models.As validation data, 10% of the sample was randomly selected, with the remaining 90% used as training set.The root mean square error (MSPE) is given by the equation:  (7) where {  (  ;   )} are observations of 10% spatio-temporal validation samples; and {  ̂(  ;   )} predictions based on the rest of the observations.We also investigated the correlation between predicted and observed values and producing the correlation scatter plots of the two fitting models.All statistical analyses were performed using R3.6.3 software (R Development Core Team, Vienna, Austria -http://cran.r-Project.org).The study area was divided at the 1-km resolution level to obtain the above environmental variables, and then to make predictions.

Results
The median annual prevalence of schistosomiasis decreased from 0.037 in 1991 to 0.000 in 1998; however, a recurrent sharp upward move started in 2005 reaching a second lower top at 0.023 in 2007 after which a gradual decline followed over the next few years (Figure 2).The interquartile range (IQR) was 0.008-0.033 in 1991, decreased to 0 by 1998, increased again reaching 0.011-0.028 in 2007 to finally shrink back to 0-0.003 in 2014.

Figures 2 and 3 about here
Figure 3 shows that the residual semi-variogram of the regression model increases gradually with the increase of time and distance indicating a spatial autocorrelation of the residuals, with the spatio-temporal model outperforming the non-spatial one.The estimates for all environmental factors are presented in Table 1.In both models, FRK and IDE, the five covariates were all statistically significant, with the average daily precipitation (p<0.01) and NDVI (p≤0.02)positively associated with schistosomiasis risk, and distance to water bodies (p≤0.02),hours of daylight (p <0.01) and LSTday (p <0.01) negatively associated.

Table 1 about here
The results of cross validation of the two models are shown in gradually decline (shown as large blue patches in the figure).The FRK model showd a similar pattern except the higher values (more yellow parts).Figure 6 presents the corresponding standard errors of the predictions.The uncertainty of the IDE model (right) was higher at places far from the sample villages during our study period.Note that the standard errors after 2005 were slightly lower than previous years.The FRK model in Figure 6 (left) indicates that the standard errors were higher than that shown by the IDE model.

Discussion
This study compared the performance of two spatio-temporal models, FRK and IDE, by fitting to the prevalence of schistosomiasis in Guichi, China.We found that the dynamic approach is more accurate in terms of data fitting when exploring the evolution pattern of schistosomiasis.The IDE model had smaller MSPEs and a higher correlation between predictions and observations.Thus, the spatio-temporal patterns of disease dynamics are better captured by the IDE model, which should help to identify accurate areas for future schistosomiasis control and surveillance.
Our study shows that both models investigated fit better than non-spatial models, with the IDE model considered optimal.The process of disease transmission can be affected by many uncertainties, such as pathogen variation and evolution, climate change, variation of intermediate and definitive hosts and not the least active control activities, as coordinated interventions across times and locations would influence the patterns of disease transmission strongly [11].Therefore, quantitative analysis is crucial to explore the complex mechanism of space-time dynamics [33].In our study, the dynamic model predicts the disease more accurately as it is committed to quantify the dynamic changes of disease risk infection by kinetic equations, while the descriptive model is only based on functions that describe the distribution of the disease.This gives dynamic models advantages in predicting the likelihood of disease reoccurrence, particularly when the mechanisms governing its spreads are understood so that effective measures can be instituted at the right time [19].In assessing the association of schistosomiasis infection with covariates, the results of the two spatial models were similar (Table1).The environmental variables here have been investigated in previous studies [8,9,10,11] and are interpretable with respect to the epidemiology of schistosomiasis and snail biology.
The prevalence of schistosomiasis showed different distribution patterns during the study period.Figure 5 shows that the high-density incidence areas are mainly located near the rivers (parts of the central and northern regions), indicating that these regions should remain foci for active schistosomiasis control [34].The goal of schistosomiasis control is to prevent infection by interrupting the life cycle of the parasites, e.g. by eliminating the intermediate snail hosts [35].It is clear that the identification of high-risk areas for schistosomiasis is conducive to the implementation of effective integrated control measures, such as environmental modification, snail elimination and health education [36,37].Indeed, the national schistosomiasis control strategies (interupting transmission, controlling the source of infection, chemotherapy of the definitive hosts, etc.) implemented at different stages in Guichi [38] have effectively controlled the infection of schistosomiasis and kept it at a stable low level after 2005.
The limitations of this study also need to be elucidated.First, we considered only environmental factors, and some social factors associated with schistosomiasis should be considered, such as medical conditions and the general economy as expressed by the gross domestic product (GDP).Hence, more comprehensive data collection for research should be considered in the future.Second, since it is affected by many uncertain factors (human behaviour, pathogen variation, environmental change, etc.), the process of disease transmission is extremely complex and changing, which may not be adequately captured by linear models.Non-linear models will therefore be needed for the exploration of the more dynamic disease patterns in future studies.

Conclusions
Our study constructed two spatio-temporal models (the FRK model and the IDE model) to compare the differences in capturing changing pattern of schistosomiasis risk.The former used a descriptive approach and the latter a dynamic one.The results show that the IDE model outperforms the FRK model.Dynamic models should be given more consideration for spatio-temporal modelling of disease patterns in the future.

Ethics approval and consent to participate
The Ethics Committee of Fudan University approved the oral consent and other

Figure 4 .
The overall MSPE of the IDE model and the FRK model were 0.35e-02 and 0.63e-02, respectively.In addition, the MSPEs of the IDE model was lower than that of FRK model in most years.Pearson correlation analysis showed that the correlation between the predicted values and the observed values of the IDE model (r=0.71)(p <0.01) was larger than that of the FRK model (r=0.51)(p=0.02).Predictions of annual prevalence by the IDE model are presented in Figure 5. From 1991 to 2001, the disease indicated a generally decreasing trend; however, with an occasional relatively high rise of prevalence (>0.14, shown as strong shades of yellow and red in the figure) but remained relatively stable state from 2006 to 2009 to finally.

Figure legend Figure 1 .
Figure legend

Figure 2 .
Figure 2. Box plot for the observed prevalence of schistosomiasis for sample

Figure 3 .
Figure 3. Empirical spatio-temporal semi-variogram of the residuals in

Figure 4 .
Figure 4. MSPE of the FRK predictions (red), the IDE predictions (blue) and the

Figure 5 .
Figure 5. Plots for the annual prevalence of schistosomiasis predicted from 1991

Table Table 1
Environmental factors of schistosomiasis japonica: the spatio-temporal FRK model and the spatio-temporal IDE model.