Applying the spatial transmission network to the prediction of infectious diseases across multiple regions

Background: Timely and accurate forecasting of infectious diseases is essential for achieving precise prevention and control. A good forecasting method for infectious diseases should offer interpretability, feasibility and good forecasting performance. Since our previous research had illustrated that the spatial transmission network showed good interpretability and feasibility, this study further explored its forecasting performance for infectious diseases across multiple regions.
Methods: Under the topological framework of the spatial transmission network, the vector autoregressive moving average (VARMA) model was built in a systematic way for parameter learning. Moreover, we used the prediction function of the VARMA model to further explore the forecasting performance of the spatial transmission network. The fitting and forecasting performance of the spatial transmission network were subsequently evaluated by comparing their accuracy and precision with those of the classical autoregressive moving average (ARMA) model. The influenza-like illness (ILI) data in Chengdu, Deyang and Mianyang of Sichuan Province from 2010 to 2017 were used as an example for illustration.
Results: ① The estimated spatial transmission network revealed that influenza probably spread from Chengdu to Deyang during the study period. ② For fitting accuracy, the spatial transmission network performed differently in each city: slightly worse than the ARMA model in Deyang, but better in the other two cities. ③ For forecasting accuracy, the spatial transmission network outperformed the ARMA model by at least 1% for both mean absolute error (MAE) and mean absolute percentage error (MAPE). ④ The forecasting standard errors of the spatial transmission network were smaller than those of the ARMA model.
Conclusions: This study applied the spatial transmission network to the prediction of infectious diseases across multiple regions. The results illustrated that the spatial transmission network not only had good accuracy and precision in forecasting, but could also indicate, to a certain extent, the spreading directions of infectious diseases among multiple regions. Therefore, the spatial transmission network is a promising candidate for improving surveillance work.


week could be denoted as $\mathbf{x}_t$, a vector with three series components (boldface notation indicates vectors and matrices in this paper). The spatial transmission network contained two kinds of information. The first was structural information, which related to the existence and direction of disease transmission between places in the network. The second was parametric information, which measured the intensity of disease transmission between different regions. Correspondingly, the construction of the spatial transmission network consisted of structure learning and parameter learning, which extracted these two kinds of information, respectively, from the original data [6]. We used the dynamic Bayesian network model technique (such as the Kronecker indices method) before any of its further application.
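The following sketch shows how the two kinds of information can be read off a single lag coefficient matrix. It assumes the entry in row k, column l is the effect of region l's previous value on region k's current value, so a non-zero off-diagonal entry is read as a directed edge l → k. Under that assumption, the set of edges is the structural information and the edge weights are the parametric information. This is not the authors' dynamic Bayesian network procedure. The region ordering, the function `network_from_coefficients`, the tolerance `tol` and the example matrix are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical ordering of the three series components of x_t.
REGIONS = ("Chengdu", "Deyang", "Mianyang")

Edge = Tuple[str, str]


def network_from_coefficients(
    a: Sequence[Sequence[float]],
    regions: Sequence[str] = REGIONS,
    tol: float = 1e-8,
) -> Tuple[List[Edge], Dict[Edge, float]]:
    """Split a lag coefficient matrix into structural and parametric information.

    Assumption (hypothetical): a[k][l], k != l, is the effect of region l's past
    value on region k's current value, so a non-zero entry means an edge l -> k.
    """
    edges: List[Edge] = []           # structure: which edges exist and their direction
    weights: Dict[Edge, float] = {}  # parameters: intensity of each edge
    for k, row in enumerate(a):
        for l, coef in enumerate(row):
            if k != l and abs(coef) > tol:
                edge = (regions[l], regions[k])  # source region -> target region
                edges.append(edge)
                weights[edge] = coef
    return edges, weights


# Illustrative matrix only; these are NOT estimates from the ILI data.
a1 = [
    [0.8, 0.0, 0.0],
    [0.3, 0.7, 0.0],
    [0.0, 0.0, 0.6],
]
edges, weights = network_from_coefficients(a1)
```

With this made-up matrix, the only cross-region term sits in row 2, column 1. The sketch therefore returns a single edge from Chengdu to Deyang with weight 0.3. Diagonal entries are skipped because they describe each city's own persistence rather than transmission between cities.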

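A general VARMA(p, q) model can be written as

$$\mathbf{x}_t = \mathbf{a}_0 + \sum_{i=1}^{p} \mathbf{a}_i \mathbf{x}_{t-i} + \boldsymbol{\varepsilon}_t - \sum_{j=1}^{q} \mathbf{b}_j \boldsymbol{\varepsilon}_{t-j}, \qquad (1)$$

where p and q are nonnegative integers, $\mathbf{a}_0$ is a three-dimensional constant vector, $\mathbf{a}_i$ and $\mathbf{b}_j$ are 3×3 constant matrices, and $\{\boldsymbol{\varepsilon}_t\}$ is a sequence of independent and identically distributed random vectors. Once the identification problem has been settled, equation (1) could be written element-wise as equation (2) to clarify its meaning in public health:

$$\begin{pmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{pmatrix} = \begin{pmatrix} a_{0,1} \\ a_{0,2} \\ a_{0,3} \end{pmatrix} + \sum_{i=1}^{p} \begin{pmatrix} a_{i,11} & a_{i,12} & a_{i,13} \\ a_{i,21} & a_{i,22} & a_{i,23} \\ a_{i,31} & a_{i,32} & a_{i,33} \end{pmatrix} \begin{pmatrix} x_{1,t-i} \\ x_{2,t-i} \\ x_{3,t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{pmatrix} - \sum_{j=1}^{q} \begin{pmatrix} b_{j,11} & b_{j,12} & b_{j,13} \\ b_{j,21} & b_{j,22} & b_{j,23} \\ b_{j,31} & b_{j,32} & b_{j,33} \end{pmatrix} \begin{pmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \\ \varepsilon_{3,t-j} \end{pmatrix}. \qquad (2)$$

In equation (2), the autoregressive coefficient

In this study, the Kronecker index approach was used to perform the structural specification of the VARMA model [7]. For a multivariate time series $\mathbf{x}_t$, the Kronecker index approach seeks to specify an index for each component of $\mathbf{x}_t$.

(2) were determined. Tiao and Tsay proposed using the two-way p-value table of extended cross-correlation matrices to specify the order (p, q) [14]. Once the orders were

where $F_t$ denotes the information available at time t and l is the coefficient matrices. We used the minimum mean-squared error criterion for the forecasts of the VARMA(p, q) model.

After the spatial transmission network was built, future values of logarithmized ILI%

Residual covariance matrix:

$$\begin{pmatrix} 0.09 & 0.01 & 0.00 \\ 0.01 & 0.12 & 0.01 \\ 0.00 & 0.01 & 0.05 \end{pmatrix}$$

According to the estimated results of equation (6), the logarithmically transformed ILI%

of future events was the major concern for infectious disease surveillance.

were reduced by as much as 5.6277% and 5.4052% compared with those of the ARMA model.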
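To make the minimum mean-squared error forecasts and their standard errors concrete, the sketch below handles the special case p = q = 1 of equation (1). It uses the same sign convention, with the MA term entering with a minus sign. The point forecasts are conditional expectations given the information up to time t. The ℓ-step forecast error covariance is the sum of ψ_i Σ ψ_iᵀ for i < ℓ, where the ψ weights come from the MA(∞) form of the model: ψ_0 = I, ψ_1 = a_1 − b_1, and ψ_i = a_1 ψ_{i−1} afterwards.

Only the residual covariance matrix is taken from the text. The coefficient values, the last observation, the last residual and the name `varma11_forecast` are hypothetical. The orders actually identified in the study may differ from (1, 1).

```python
import numpy as np


def varma11_forecast(a0, a1, b1, x_t, eps_t, sigma, horizon):
    """Minimum-MSE forecasts and forecast standard errors for a VARMA(1, 1) model.

    Assumed form (equation (1) with p = q = 1):
        x_t = a0 + a1 @ x_{t-1} + eps_t - b1 @ eps_{t-1}
    x_t is the last observation, eps_t its fitted residual and sigma the
    residual covariance matrix. Returns two arrays of shape (horizon, k).
    """
    a0, a1, b1 = (np.asarray(m, dtype=float) for m in (a0, a1, b1))
    x_t = np.asarray(x_t, dtype=float)
    eps_t = np.asarray(eps_t, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = a0.shape[0]

    # 1-step forecast: the last residual is known at time t.
    forecasts = [a0 + a1 @ x_t - b1 @ eps_t]
    for _ in range(1, horizon):
        # Future innovations have zero conditional mean, so the MA term drops out.
        forecasts.append(a0 + a1 @ forecasts[-1])

    # psi weights: psi_0 = I, psi_1 = a1 - b1, psi_i = a1 @ psi_{i-1} for i >= 2.
    psi = [np.eye(k)]
    for i in range(1, horizon):
        psi.append(a1 - b1 if i == 1 else a1 @ psi[-1])

    # l-step error covariance = sum over i < l of psi_i @ sigma @ psi_i.T.
    mse = np.zeros((k, k))
    std_errors = []
    for psi_i in psi:
        mse = mse + psi_i @ sigma @ psi_i.T
        std_errors.append(np.sqrt(np.diag(mse)))
    return np.array(forecasts), np.array(std_errors)


# Residual covariance matrix as reported in the text; everything else is hypothetical.
sigma = [[0.09, 0.01, 0.00], [0.01, 0.12, 0.01], [0.00, 0.01, 0.05]]
a0 = np.zeros(3)
a1 = 0.5 * np.eye(3)
b1 = 0.2 * np.eye(3)
fc, se = varma11_forecast(a0, a1, b1, x_t=[1.0, 1.0, 1.0],
                          eps_t=[0.0, 0.0, 0.0], sigma=sigma, horizon=3)
```

Since the model was fitted to logarithmized ILI%, forecasts and standard errors from a sketch like this are on the log scale. The 1-step standard errors equal the square roots of the diagonal of the residual covariance matrix, and they grow with the horizon as more ψ terms accumulate.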

While it was plausible to infer that the spatial transmission network was generally better than the ARMA model in accuracy, the results also indicated that the spatial transmission network remained robust from fitting to forecasting. For example, it should be noted from Table 1 and Table 2

The results of this study illustrated that the spatial transmission network had advantages in forecasting infectious diseases across multiple regions.
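The accuracy comparisons in Table 1 and Table 2 rest on MAE and MAPE. The exact formulas are not reproduced in this section, so the sketch below assumes the standard definitions. It also shows how a percentage reduction of one model's error relative to another's can be computed. The short series and the helper names are hypothetical and are not the study's data. When the metric is itself a percentage (MAPE), a relative reduction and a difference in percentage points are different quantities, so the sketch computes both.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty series of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error (standard definition, assumed)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    _check(actual, predicted)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which the candidate's error is lower than the baseline's."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical observed values and forecasts from two competing models.
observed = [2.0, 4.0]
arma_fc = [1.0, 5.0]
network_fc = [1.5, 4.5]

mae_arma, mae_net = mae(observed, arma_fc), mae(observed, network_fc)
mape_arma, mape_net = mape(observed, arma_fc), mape(observed, network_fc)
mae_cut = relative_reduction(mae_arma, mae_net)       # relative reduction, in %
mape_cut = relative_reduction(mape_arma, mape_net)    # relative reduction, in %
mape_points = mape_arma - mape_net                    # difference in percentage points
```

Here both relative reductions are 50%, while the MAPE difference is 18.75 percentage points. This shows why it helps to state which kind of reduction is being reported.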

The accuracy and precision of its forecasts were superior to those of the ARMA model, a univariate time series model. Our previous research [6] found that the spatial transmission network also showed good interpretability and feasibility. Taken together with that finding, these results suggest that the spatial transmission network could be very helpful in guiding the practical prevention and control of infectious diseases. According to Stoto, a practical surveillance system should include three parts: continuous monitoring of multivariate data, algorithms that raise an alarm when something unusual is happening, and a protocol on how to respond to an alarm [18]. It followed from our results that the spatial transmission network could help improve all three parts of the surveillance system. Firstly, as a multivariate time series analysis model, the VARMA model could inherently integrate and extract information from multivariate data, which is not limited to a single variable (e.g., ILI%) but could also include other types of variables (e.g., the data involves lots of factors like ILI%

both strength of association and temporality, which are two key points of Hill's criteria for causality [19]. Meanwhile, under some mild conditions, the probability distribution of the VARMA model could be reliably represented as a causal network, and the latter is a commonly used tool in causal inference [20]. To this end, it is plausible that the spatial transmission network could at least partly serve the etiological study to

Therefore, it is anticipated that the spatial transmission network could provide a new way for causal inference in the surveillance of infectious diseases.

Ethics approval and consent to participate
The ILI surveillance was a routine surveillance activity. The analysis of ILI data was not considered human subject research.
No administrative permission was needed to access the data.

Not applicable.

Availability of data and materials
The data that support the findings of this study are available from Sichuan Center for Disease Control and Prevention, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

Data are, however, available from the authors upon reasonable request and with the permission of Sichuan Center for Disease Control and Prevention.

Competing interests
The authors declare that they have no competing interests.