Identifying the dynamic pattern and influencing factors of influenza in Northwest China from 2013 to 2020, based on dynamic regression model and wavelet analysis

Background: Influenza remains a serious global public health problem and imposes a substantial economic burden. The dynamic pattern of influenza differs considerably among geographic and climatological areas; however, the factors underlying these differences remain uncertain. This paper aims to characterize the dynamic pattern of influenza and its potential influencing factors in Northwest China. Methods: Monthly influenza cases in Ningxia, China, from Nov. 2013 to Jun. 2020 served as the influenza proxy. First, the seasonal pattern was analyzed and a baseline seasonal ARIMA model of influenza cases was fitted. Then, a dynamic regression model was used to identify potential influencing factors of influenza. Finally, wavelet analysis was used to explore the coherence between influenza cases and the significant influencing factors. Results: Influenza in Ningxia followed a winter outbreak pattern, with the peak in January. The seasonal ARIMA(0,0,1)(1,1,0)12 model was the optimal baseline forecast model. The dynamic regression models and wavelet analysis indicated that PM2.5 and public awareness were significantly positively associated with influenza, whereas minimum temperature was negatively associated. Conclusion: Meteorological (minimum temperature), pollution (PM2.5) and social (public awareness) factors may be significantly associated with influenza in Northwest China. Decreasing PM2.5 concentrations or increasing public awareness before the influenza peak may be useful ways to reduce the risk of influenza, which has important implications for policy-makers choosing the optimal timing of influenza prevention campaigns.


Introduction
Influenza is one of the most serious diseases worldwide, causing an estimated 1 billion cases, 3 to 5 million hospitalizations, and 290,000 to 650,000 respiratory deaths globally each year [1]. The virus is usually spread through the air by coughs or sneezes [1]. The disease is caused by influenza viruses, which are divided into three types: A, B and C. Influenza A viruses frequently undergo antigenic variation, so they are highly infectious, spread rapidly and can readily cause pandemics [2]. We now face a situation in which the virus has spread widely, many people in all age groups in many countries have some immunity to the new virus, and seasonal influenza A and B viruses are being reported successively in many provinces [3]. The influenza virus antigens [4] (mainly hemagglutinin and neuraminidase) mutate constantly, which is why influenza pandemics have so far not been effectively controlled [5]. Based on this overall picture, there is strong evidence that recent pandemic influenza patterns are transitioning towards seasonal patterns [6]. The WHO recommended [3] that all countries maintain influenza monitoring through routine respiratory disease surveillance and reporting, as well as by monitoring and investigating unusual disease patterns suggestive of potential changes in severity. Therefore, identifying the dynamic pattern of influenza and its potential influencing factors can help guide decision-making by disease prevention and control agencies and governments.
Time-series methods have served as powerful tools for exploring the dynamics of numerous epidemics [7]. Seasonality is a temporal pattern of systematic periodic oscillation within a predetermined cycle, characterized by peak timing, amplitude and duration [8]. The dynamic pattern of influenza continues to attract researchers' attention; however, there is likely no definitive explanation that covers all regions, climates and populations at all times.
Influenza in each area is most likely the result of a number of factors that contribute to different degrees to the observed incidence and timing of infections. Given this, several retrospective studies have analyzed the seasonality and trends of influenza data to describe trends in influenza incidence [9,10]. At the same time, mathematical models have recently become powerful tools for investigating time series of infectious diseases, and various models (such as generalized regression [11], stochastic models [12], compartmental models [13,14] and autoregressive moving average models [7,15]) have been used to forecast influenza, analyze its trends and investigate the underlying causes of influenza epidemics. In addition, wavelet analysis, following the method proposed by Grinsted, Moore and Jevrejeva [16,17], has recently become a powerful tool for investigating the potential interconnectedness between influenza and climate factors. These methods provide approaches for investigating the dynamic pattern of influenza, which is important for analyzing and understanding its epidemic situation.
The dynamic pattern of influenza may be related to many factors, including virus mutation, population susceptibility, public awareness and changes in climate. Among environmental factors, environmental pollution and climate are two important factors that can affect influenza, and the dynamic pattern of influenza differs considerably among geographic and climatological areas. For example, in Northwest China, coal-fired power plants, heating systems and vehicle emissions all contribute to air pollution (airborne fine particulate matter PM2.5, PM10, SO2, etc.), which has affected the transmission of respiratory infectious diseases [19,20]. Numerous epidemiological studies have consistently demonstrated that exposure to ambient PM2.5 is associated with increased adverse respiratory health outcomes [21,22]. Lowen and Steel [23] found that humidity and temperature play important roles in shaping influenza seasonality; experiments in guinea pigs indicated that influenza virus is stable at low relative humidity (RH) and relatively unstable at intermediate RH. Shaman, Goldstein and Lipsitch [24] pointed out that lower temperature and absolute humidity (AH) increase influenza virus survival and transmission in temperate regions. Some studies found that influenza was negatively associated with temperature and AH in Beijing, northern China [25]. In addition, some studies have shown that public awareness and preventive behaviors may be significantly associated with infectious diseases [12,18,26,27]. Health promotion efforts are known to raise preventive awareness during disease outbreaks, but most of this research was based on questionnaires, so how the effect of public awareness on influenza varies over time remains uncertain.
Thus, identifying the potential relationships between influenza and environmental factors (environmental pollution, climate and public awareness) is meaningful for providing a scientific reference for influenza prevention strategies.
Ningxia Hui Autonomous Region is located on the inland plateau of Northwest China, with a typical continental semi-humid to semi-arid climate and a population of approximately 6.5 million [28]. Existing surveillance results have shown that influenza is one of the major Class C notifiable infectious diseases in Ningxia, Northwest China, accounting for about 2.85% of all reported infectious diseases [29]. However, few studies have focused on influenza in Northwest China. To characterize the dynamic pattern of influenza and identify its potential influencing factors, in this study we analyzed the dynamic pattern of influenza using ARIMA and ARIMAX models, and explored various environmental factors, including environmental pollution (PM2.5, PM10, SO2, NO2, CO and O3), climate (minimum and maximum temperature) and public awareness (Baidu Index), as possible drivers of influenza in Ningxia, Northwest China, from 2013 to 2020. These analyses aim to provide evidence for future influenza prevention and control strategies.

Data collection
The data comprise three parts: influenza surveillance data, environmental (pollution and meteorological) data, and public awareness data for Ningxia. Monthly influenza cases in Ningxia Hui Autonomous Region from Nov. 2013 to Jun. 2020 (80 months) were obtained from the National Notifiable Disease Surveillance System (NNDSS) (see Figure 1 (a)). The environmental factors, including PM2.5 (particulate matter < 2.5 µm in diameter, unit: µg/m³), PM10 (particulate matter < 10 µm in diameter, unit: µg/m³), SO2 (sulfur dioxide, unit: µg/m³), CO (carbon monoxide, unit: mg/m³), NO2 (nitrogen dioxide, unit: µg/m³), O3 (ozone, unit: µg/m³), minimum temperature (unit: °C) and maximum temperature (unit: °C), were obtained from the National Meteorological Information Center (Figure 1 (c)-(j)) [30]. In addition, we used monthly Baidu Index data from China's largest search engine as a proxy for public awareness [18,31,32]. The monthly Baidu Index was extracted for the Chinese keyword "influenza" with the location set to Ningxia, China (see Figure 1 (b)). The full data set was divided into two parts: data from Nov. 2013 to Dec. 2018 were used as the training set, and data from Jun. 2019 to Jun. 2020 as the validation set.

Seasonal ARIMA model
Time series analysis is commonly divided into a time domain approach and a frequency domain approach. The time domain approach mainly describes the evolution of a time series through its autocorrelation structure, whereas the frequency domain approach mainly relies on Fourier analysis to study the series from the frequency viewpoint [33].
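The autoregressive integrated moving average (ARIMA) model is a time domain approach that can describe many of the important dynamics of a time series. In 1970, Box and Jenkins proposed the landmark Box-Jenkins method for parameter estimation and forecasting with ARIMA models [34]. Because the data contain seasonal components, we use the multiplicative (product) seasonal ARIMA model, ARIMA(p,d,q)×(P,D,Q)τ, which is given by

$$\phi_p(B)\,\Phi_P(B^{\tau})\,(1-B)^{d}\,(1-B^{\tau})^{D}\,y_t = \theta_q(B)\,\Theta_Q(B^{\tau})\,\epsilon_t,$$

where $y_t$ is the observed series, B denotes the backshift operator ($B y_t = y_{t-1}$), $\phi_p(B) = 1-\phi_1B-\cdots-\phi_pB^{p}$ and $\Phi_P(B^{\tau}) = 1-\Phi_1B^{\tau}-\cdots-\Phi_PB^{P\tau}$ are the nonseasonal and seasonal autoregressive polynomials, $\theta_q(B) = 1+\theta_1B+\cdots+\theta_qB^{q}$ and $\Theta_Q(B^{\tau}) = 1+\Theta_1B^{\tau}+\cdots+\Theta_QB^{Q\tau}$ are the nonseasonal and seasonal moving average polynomials, $\epsilon_t$ denotes the residual at time t with mean 0 and constant variance, and τ is the seasonal cycle length. p, P, d, D, q and Q are non-negative integers that must be determined when fitting the seasonal ARIMA model: p and P are the orders of the autoregressive and seasonal autoregressive parts, d and D are the orders of differencing and seasonal differencing, and q and Q are the orders of the moving average and seasonal moving average parts [35].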
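The differencing part of the seasonal ARIMA model, the operator $(1-B)^d(1-B^{\tau})^D$ built from the backshift operator B, can be illustrated with a short standard-library Python sketch. The function names and the series are hypothetical illustrations, not the data or software used in this study. One seasonal difference with τ = 12 removes a fixed 12-month shape, leaving only the year-over-year change.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: return y_t - y_{t-lag} for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=0, D=0, season=12):
    """Apply (1 - B)^d (1 - B^season)^D, the differencing part of the model."""
    out = list(series)
    for _ in range(D):  # seasonal differences first
        out = difference(out, season)
    for _ in range(d):  # then ordinary differences (the operators commute)
        out = difference(out, 1)
    return out


# Hypothetical monthly counts: a fixed 12-month shape plus a trend of 2 per month.
pattern = [90, 60, 30, 20, 10, 10, 10, 10, 15, 20, 40, 80]
y = [pattern[t % 12] + 2 * t for t in range(48)]
z = sarima_difference(y, d=0, D=1, season=12)
print(len(z), sorted(set(z)))  # 36 values, all equal to 24 (= 2 per month * 12 months)
```

Each seasonal difference shortens the series by τ observations, which matters for a short series such as the 80 months analyzed here.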
Generally, the following methodological steps should be carried out to establish the ARIMA models.
1. Examine the characteristics of the time series: it should be a non-white-noise sequence, which is usually checked with the Ljung-Box (Q) test.
2. The series must be stationary. Differencing or seasonal differencing is applied to a non-stationary original series, and stationarity can be tested with the Augmented Dickey-Fuller (ADF) test.
3. Since several models may meet the above conditions, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are used to identify the optimal model.
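The white-noise check in step 1 and the information criteria in step 3 can be sketched in plain Python. This is an illustration under stated assumptions: a maximum lag of h = 12, simulated hypothetical series, and the textbook formulas rather than any particular statistics package. The ADF test in step 2 requires a lagged regression and is omitted here.

```python
import math
import random


def autocorr(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    return num / denom


def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, h + 1))


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * loglik


# Under the white-noise null, Q is approximately chi-square with h degrees of
# freedom (fewer when applied to model residuals); 5% critical value for h = 12.
CHI2_CRIT_12 = 21.026

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(80)]
seasonal = [math.sin(2 * math.pi * t / 12) + 0.3 * noise[t] for t in range(80)]
for name, series in [("noise", noise), ("seasonal", seasonal)]:
    q = ljung_box_q(series, 12)
    verdict = "reject white noise" if q > CHI2_CRIT_12 else "white noise not rejected"
    print(name, round(q, 1), verdict)
```

BIC penalizes each extra parameter by ln n instead of 2, so with 80 monthly observations it favors smaller models than AIC does.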

Dynamic regression model
To further identify the influencing factors of influenza in Northwest China, we used a dynamic regression model (also known as a transfer function model, abbreviated as ARIMAX), which plays an important role in describing and analyzing the relationships among several time series [34]. The dynamic regression model not only captures the dynamic relationships over time between the influencing factors and seasonal influenza, but can also improve the accuracy of influenza forecasts by using additional information from the related environmental factor series.
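The dynamic regression model can be written as

$$y_t = \mu + \beta\,x_{t-l} + \epsilon_t, \qquad \epsilon_t = \frac{\Theta(B)}{\Phi(B)}\,a_t,$$

where $y_t$ denotes the response variable, $x_t$ denotes the input variable, $\mu$ is the intercept of the regression, $\beta$ is the regression coefficient of the input, l is its delay (lag) order, $\epsilon_t$ is the residual sequence of the regression, $\Phi(B)$ is the autoregressive coefficient polynomial of the residual sequence, $\Theta(B)$ is the moving average coefficient polynomial of the residual sequence, and $a_t$ is a white noise sequence with mean 0.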
To establish the optimal dynamic regression model, we added each meteorological or sociobehavioral factor (with the Baidu Index as a proxy for public awareness) to the model and checked whether it improved forecast accuracy relative to the baseline seasonal ARIMA model.
The dynamic regression model is fitted to the data by the following sequential procedure. First, the cross-correlation functions (CCF) between influenza cases and the meteorological or sociobehavioral factors are used to suggest potential influencing factors and their corresponding delay orders.
Then, to reduce the effect of autocorrelation within each series on the results, the input and response variables are prewhitened: the ARIMA model fitted to the input series is used to filter both the input and the response series, which can be used to [...] where $X_t$ is the real incidence at time t, $\hat{X}_t$ is the estimated incidence at time t, and n is the number of predictions.
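The CCF and prewhitening steps can be sketched in plain Python. As a simplifying assumption, the input is prewhitened with an AR(1) filter rather than the full ARIMA models fitted to each factor in this study, and the data are simulated, with the response constructed to lag the input by two months. The same filter is applied to the response, so that a CCF peak reflects the input-response relationship rather than the autocorrelation within each series.

```python
import random


def mean(v):
    return sum(v) / len(v)


def fit_ar1(x):
    """Least-squares AR(1) coefficient of the demeaned series x."""
    m = mean(x)
    xc = [v - m for v in x]
    num = sum(xc[t] * xc[t - 1] for t in range(1, len(xc)))
    den = sum(xc[t - 1] ** 2 for t in range(1, len(xc)))
    return num / den


def ar1_filter(v, phi):
    """Apply the filter (1 - phi*B) to a demeaned copy of v."""
    m = mean(v)
    vc = [a - m for a in v]
    return [vc[t] - phi * vc[t - 1] for t in range(1, len(vc))]


def ccf(x, y, lag):
    """Sample cross-correlation of x_{t-lag} with y_t (lag >= 0 means x leads y)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    s = sum((x[t - lag] - mx) * (y[t] - my) for t in range(lag, n))
    return s / (n * sx * sy)


# Hypothetical data: a persistent AR(1) input and a response lagging it by 2 months.
random.seed(0)
x = [0.0]
for _ in range(199):
    x.append(0.8 * x[-1] + random.gauss(0, 1))
y = [0.0, 0.0] + [1.5 * x[t - 2] + random.gauss(0, 0.5) for t in range(2, 200)]

phi = fit_ar1(x)            # model fitted to the input series only
alpha = ar1_filter(x, phi)  # prewhitened input (approximately white noise)
beta = ar1_filter(y, phi)   # response filtered with the same model
best_lag = max(range(0, 7), key=lambda k: ccf(alpha, beta, k))
print(round(phi, 2), best_lag)
```

Without prewhitening, the persistence of the input would spread high cross-correlations over neighbouring lags, making the delay order harder to read off the CCF.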

Wavelet analysis
In addition, an important approach to analyzing time series data in both the frequency and time domains is to investigate and exploit the properties of coherency.
Coherency is a frequency-domain measure of the correlation between two series at a given frequency.
Because epidemiological time series are typically noisy, complex and strongly non-stationary [36], wavelet analysis has recently been proposed to characterize non-stationary time series and to estimate dependencies among signals. Following the general approach proposed by Cazelles et al. [36], wavelet analysis performs a time-frequency decomposition of the signal, which provides an estimate of its spectral characteristics as a function of time and frequency [37]. Meanwhile, potential causal links between two time series can be explored through wavelet coherence (phase relationships), which reveals regions with high common power in time-frequency space [16].
Wavelet coherence (WTC) finds regions in time-frequency space where the two time series co-vary.
WTC shows significant coherence against red noise in time-frequency space, describing significant covariance at specific periods (frequencies) and the phase shift between two time series. The phase difference between the two series is indicated by arrows: arrows pointing to the right mean that the variables are in phase, and arrows pointing to the left mean that they are out of phase (anti-phase). Arrows pointing down show that the climate factor is leading, and arrows pointing up mean that influenza is leading. In-phase variables have a cyclical effect on each other, whereas out-of-phase (anti-phase) variables have an anti-cyclical effect on each other. To explore period synchronization and characterize the potential association between the time series of the potential influencing factors and influenza cases at monthly intervals from Nov. 2013 to Jun. 2020, we used the wavelet analysis approach.

Seasonal ARIMA model
A total of 10,745 influenza cases were reported during the study period (Figure 1 shows a time series plot of the evolution of cases over the years and the division into training and testing sets). Because the influenza case data show a significant periodic oscillation, we first used the seasonal index to assess the risk period and seasonal pattern. The seasonal index estimates the seasonal component and is a well-known, widely used technique for dealing with seasonal patterns in data [38,39]. To assess the risk in each month of the year, we calculated the seasonal index of influenza for each month. As shown in Figure 2, January has the highest risk of influenza each year, three times higher than that of the other months. Thus, influenza risk in Ningxia follows a winter outbreak pattern, with the peak of influenza occurring in December and January each year.
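A monthly seasonal index can be computed with the simple-average method sketched below. This is an assumption: the text does not state which variant was used, and the ratio-to-moving-average method is another common choice. An index above 1 marks a month with above-average risk; the counts are hypothetical.

```python
from collections import defaultdict


def seasonal_index(values, start_month):
    """Simple-average seasonal index: mean of each calendar month / overall mean.

    values: consecutive monthly counts; start_month: calendar month (1-12) of values[0].
    """
    by_month = defaultdict(list)
    for i, v in enumerate(values):
        month = (start_month - 1 + i) % 12 + 1
        by_month[month].append(v)
    overall = sum(values) / len(values)
    return {m: (sum(vs) / len(vs)) / overall for m, vs in sorted(by_month.items())}


# Hypothetical two years of counts starting in January (not the Ningxia data).
counts = [300, 120, 60, 40, 30, 20, 20, 20, 30, 40, 80, 200] * 2
si = seasonal_index(counts, start_month=1)
peak = max(si, key=si.get)
print(peak, round(si[peak], 2))  # January, index 3.75 (300 / overall mean of 80)
```

For a series that starts in November, as in this study, the call would use `start_month=11` so that each count is assigned to the correct calendar month.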
The Ljung-Box (Q) test of the influenza case time series gives P = 3.196 × 10⁻⁸ < 0.05, indicating that the influenza case series is not a white noise sequence. In addition, the ADF test gives P = 0.01 < 0.05, which means that the influenza case series is stationary. We can now determine the ranges of the seasonal ARIMA model parameters (p, q, P, Q) from the autocorrelation function (ACF) and partial autocorrelation function (PACF), shown in Figure 3. The ACF plot indicates that q = 0 or 1, and the PACF plot shows that p = 0, 1 or 2. Since the influenza case series has an annual cycle, the seasonal period is τ = 12.
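The ACF and PACF used to bound (p, q, P, Q) can be computed as follows, with the PACF obtained from the Durbin-Levinson recursion. This plain-Python sketch uses a simulated AR(1) series as hypothetical data: its PACF should cut off after lag 1 while its ACF decays gradually. Lags outside roughly ±1.96/√n are usually treated as significant.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical AR(1) series with coefficient 0.7.
random.seed(42)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
bound = 1.96 / math.sqrt(len(x))  # approximate 95% band for white noise
print([round(v, 2) for v in acf(x, 3)])
print([round(v, 2) for v in pacf(x, 3)], round(bound, 3))
```

For monthly data with τ = 12, the same functions would be inspected at lags 12, 24, ... to suggest the seasonal orders P and Q.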
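The seasonal ARIMA(0,0,1)(1,1,0)12 model of influenza then takes the form

$$(1-\Phi_1B^{12})(1-B^{12})\,y_t = (1+\theta_1B)\,\epsilon_t,$$

where $y_t$ is the monthly number of influenza cases, B is the backshift operator, $\Phi_1$ is the estimated seasonal autoregressive coefficient, $\theta_1$ is the estimated moving average coefficient and $\epsilon_t$ is the residual. The forecasting results are displayed in Figure 4, which shows that the observed data fall within the confidence interval.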
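For the ARIMA(0,0,1)(1,1,0)12 model written as $(1-\Phi_1B^{12})(1-B^{12})y_t=(1+\theta_1B)\epsilon_t$, expanding the operators gives the recursion $y_t = y_{t-12} + \Phi_1(y_{t-12}-y_{t-24}) + \epsilon_t + \theta_1\epsilon_{t-1}$, from which point forecasts follow by setting future shocks to zero. The Python sketch below assumes the coefficients have already been estimated; the values in the example are hypothetical, not the fitted values from this study, and prediction intervals are omitted.

```python
def sarima_forecast(y, phi_s, theta, h, s=12):
    """Point forecasts from (1 - phi_s B^s)(1 - B^s) y_t = (1 + theta B) e_t.

    phi_s and theta are assumed to be already estimated; y needs at least 2*s values.
    """
    y = list(y)
    w = {t: y[t] - y[t - s] for t in range(s, len(y))}  # seasonally differenced series
    e = {}
    for t in range(2 * s, len(y)):  # recover in-sample residuals recursively
        e[t] = w[t] - phi_s * w[t - s] - theta * e.get(t - 1, 0.0)
    n = len(y)
    for t in range(n, n + h):
        w_t = phi_s * w[t - s] + theta * e.get(t - 1, 0.0)  # future shocks set to 0
        w[t] = w_t
        y.append(y[t - s] + w_t)
        e[t] = 0.0
    return y[n:]


# Hypothetical history: three years of counts with a rising winter peak.
pattern = [90, 60, 30, 20, 10, 10, 10, 10, 15, 20, 40, 80]
history = pattern + [p + 10 for p in pattern] + [p + 15 for p in pattern]
print([round(v, 1) for v in sarima_forecast(history, phi_s=0.5, theta=0.3, h=6)])
```

Only the last residual enters the first forecast through θ1; beyond one step ahead, the forecasts are driven by the seasonal terms alone.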

Dynamic regression model
To find the optimal multivariate dynamic regression model, we added the meteorological and sociobehavioral factors to the baseline ARIMA(0,0,1)(1,1,0)12 model to examine which factors improve forecast accuracy. As shown in Table 3, optimal ARIMA models were fitted to the potential influencing factors; these models determined the transformation applied to the influenza series during prewhitening. The cross-correlation functions (CCF) between influenza cases and the environmental factor series, plotted in Figure 5, indicated significant associations between influenza cases and the meteorological or sociobehavioral factors, and were used to suggest potential lag orders. The optimal lag order of each predictor after prewhitening was then determined by the AIC criterion, as listed in Table 4. Three ARIMAX models had lower AIC and passed the parameter tests: ARIMA(0,0,1)(1,1,0)12 + PM2.5, ARIMA(0,0,1)(1,1,0)12 + minimum temperature, and ARIMA(0,0,1)(1,1,0)12 + Baidu Index. These results suggest that PM2.5, minimum temperature and the Baidu Index may be significant influencing factors of influenza in Northwest China.
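The AIC-based choice of lag order can be sketched as follows. As a deliberate simplification, each candidate lag is compared with ordinary least squares and a Gaussian AIC, ignoring the ARIMA error structure of the full ARIMAX models; the data are simulated, with the response driven by the input one month earlier. All lags are fitted on the same sample, because AIC values are only comparable across models fitted to the same observations.

```python
import math
import random


def ols_aic(y, x):
    """Fit y = mu + beta*x by least squares; return (beta, Gaussian AIC up to a constant)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    mu = my - beta * mx
    rss = sum((b - mu - beta * a) ** 2 for a, b in zip(x, y))
    k = 3  # mu, beta and the residual variance
    return beta, n * math.log(rss / n) + 2 * k


def select_lag(y, x, max_lag):
    """Compare lags 0..max_lag on a common sample; return the lowest-AIC lag."""
    results = {}
    for lag in range(max_lag + 1):
        yy = y[max_lag:]                       # y_t for t = max_lag .. N-1
        xx = x[max_lag - lag: len(x) - lag]    # matching x_{t-lag}
        results[lag] = ols_aic(yy, xx)
    best = min(results, key=lambda lag: results[lag][1])
    return best, results


# Hypothetical monthly data: the response depends on the input one month earlier.
random.seed(3)
x = [random.gauss(0, 1) for _ in range(120)]
y = [0.0] + [2.0 + 0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, 120)]
best, results = select_lag(y, x, max_lag=3)
for lag, (beta, aic_value) in results.items():
    print(lag, round(beta, 2), round(aic_value, 1))
print("lowest AIC at lag", best)
```

In the full procedure, each candidate lag would instead be added as a regressor to the baseline seasonal ARIMA model and the AIC of the complete ARIMAX fit compared.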

Wavelet analysis
According to the ARIMAX models, PM2.5, minimum temperature and the Baidu Index may be closely related to influenza cases; therefore, further identifying how these environmental factors co-vary with influenza, and the direction of their correlation, is meaningful for early warning and for vaccination strategies in influenza prevention. As shown in Figure 6, the wavelet transform and wavelet transform coherence (WTC) indicate whether two time series are correlated or co-vary at a particular time and frequency. In Figure 6 (a), the wavelet power spectrum of influenza reaches its maximum at a period of 12 months from Jul. 2017 to Sep. 2018, which implies that this may be the most relevant region. The common features visible by eye in the individual wavelet transforms stand out as significant at the 5% level. In Figure 6 (b), the WTC shows that influenza is significantly associated with PM2.5 in a band with a period of approximately 40 months (2015-2018) (P < 0.05), with arrows pointing to the right, indicating that influenza and PM2.5 were positively correlated from 2015 to 2018. We also found a significant negative correlation between minimum temperature and influenza in the band with a period of approximately 40 months (2015-2018) (P < 0.05), with the arrows pointing to the right (see Figure 6 (c)). For the Baidu Index, we observed one region of significantly high power in the 30-60 month band (2016-2018) that was in phase, which means that the two series were positively correlated during this period (see Figure 6 (d)).

[...] shown with dark red shade. The color code for power ranges from blue (low power) to red (high power). The x-axis is the time since Nov. 2013, and the y-axis denotes the period.

Conclusion
One important feature of influenza is the role of seasonal drivers, which tend to limit the spread of the virus to particular times of the year [9]. Because of these seasonal drivers, influenza virus infections can generate recurrent epidemics year after year, or even multiple waves of pandemic influenza [40]. Regardless of whether a place is in a pandemic situation, influenza viruses pose a risk of disease to many individuals, who should therefore take prudent steps to reduce their risk of infection [6]. Identifying the seasonal pattern may thus help predict future high-risk periods and has implications for outbreak preparedness.
In this study, we analyzed monthly influenza case data from Ningxia, Northwest China, from Nov. 2013 to Jun. 2020, and investigated the dynamic pattern of influenza over this period. The seasonal index and a seasonal ARIMA model of influenza cases indicated that the high-risk period of influenza followed a winter outbreak pattern, with the peak in January. A dynamic regression model was used to identify potential influencing factors of influenza, and wavelet analysis was further used to explore the coherence, or co-variation, between influenza cases and the significant influencing factors.
The results showed that PM2.5 and public awareness were significantly positively associated with influenza, whereas minimum temperature was negatively associated with influenza. The seasonal dynamic pattern of influenza may therefore be attributable to meteorological, pollution and social factors.
Many studies have provided evidence of an association between particulate air pollution and human illness, especially cardiovascular and respiratory diseases [28,41,42,43]. In China, research has mainly focused on the impact of PM2.5 on respiratory diseases in the developed provinces of southern China. Lei et al. [11] pointed out that children and the elderly are more likely to be affected by air pollution because of their relatively weak immune systems and/or less exposure to air pollutants. Many experiments have shown that exposure to air pollutants may result in lung function decline [44], and that particulate matter may cause neurogenic inflammation of sensory nerve endings in the trachea [45]. The association between PM2.5 concentration and influenza cases has been explored widely. However, meteorological and environmental conditions differ across regions, and the dominant influenza subtype varies markedly from year to year [4]. Ningxia can be viewed as representative of the arid and semi-arid climate of Northwest China, where environmental pollutants are difficult to degrade and diffuse. Thus, decreasing PM2.5 concentrations before the influenza peak may be a useful way to reduce the risk of influenza.
Several studies have examined the association between temperature and influenza, especially in warm regions [46]. Temperature variation is an important feature of climate that can strongly affect influenza virus transmission [47] and may trigger seasonal influenza outbreaks [48]. Su et al. [17] observed a negative association between average temperature and three influenza virus subtypes using Spearman's correlation and WTC. Chong et al. [49] verified that temperature may be connected with influenza epidemics, for both influenza A and influenza B viruses. In this study, we also observed that minimum temperature was negatively associated with influenza cases, which is consistent with the findings of related studies such as that of Lowen and Steel [17]. A possible mechanism is that, in guinea pig experiments, influenza transmission was most efficient at low temperature (e.g. 5 °C), virus transmission and survival decreased as temperature increased from 5 °C to 20 °C, and transmission was completely blocked at 30 °C [23]. Some studies also found that influenza virus is more stable at low temperature, and that low temperature may decrease protease activity, reduce mucus secretion and ciliary movement, and inhibit host defense and immunity against infection [50,51,52]. Thus, minimum temperature may shape the seasonal dynamic pattern of influenza by affecting host susceptibility and increasing the transmission of influenza virus in winter.
Several groups have reported that public awareness may help prevent influenza infection [26,27]. Specifically, using a structured questionnaire, Khowaja et al. [26] found that three types of preventive awareness emerged during an influenza epidemic and were related to preventive behaviors against influenza infection. Balkhy et al. [27] identified awareness, attitudes and practices related to influenza A (H1N1) among the Saudi public. Public awareness plays an important role in helping curb epidemics, and health messages are delivered through various channels, such as traditional media and the internet. Zhao et al. [18] pointed out that a Google index was an effective proxy indicator of public awareness, which had an important influence on infectious disease transmission. In this study, we used the Baidu Index as a proxy for public awareness and found that public awareness was significantly positively associated with influenza cases in Northwest China. Raising public awareness is one of the most important measures, as it may help prevent the spread of influenza and enable residents to take proactive preventive action. These results provide scientific support for further influenza prevention and control measures, including the timing of vaccination and of health education for high-risk populations.
However, this study has several limitations: 1) the time-series model gave inaccurate forecasts for some periods, possibly because of changes in the dominant strains each year, but our time-series data did not include subtype-specific influenza data (e.g., the H1N1 and H3N2 subtypes and the Victoria and Yamagata lineages).
2) The influenza surveillance data used in our study were limited to sentinel hospitals, to a single province and to a relatively short period of time.
Thus, our data may underestimate the number of influenza cases, but we believe that the consistent seasonal pattern in the trends of influenza cases still holds. 3) In addition, influenza-like illness should be included in future work, which could provide stronger evidence on regional characteristics. These issues are left for further consideration.
Despite these limitations, our work addresses the prediction of the seasonal dynamic pattern of influenza and the association between influenza cases and potential meteorological (minimum temperature), pollution (PM2.5) and social (public awareness) factors. These analyses may help policy-makers and health institutions explore the potential benefits of vaccination strategies (e.g., vaccine recommendations for the high-risk influenza season) with a view to controlling and/or eliminating influenza transmission during specific periods of time.
Ethics approval and consent to participate: Not applicable.
Consent for publication: All authors consent to publication.

Conflict of interest
The authors declare that they have no conflict of interest.