A Proposal of a Two-Step Autoregressive Model

The autoregressive (AR) model and the conditional autoregressive (CAR) model are regression models in which the independent variables and the dependent variable refer to the same object. They are powerful statistical tools for predicting values based on correlation in the time domain and the space domain, which makes them useful in epidemiological analysis. In this research, I combine them in a simple way in which AR and CAR are estimated in two separate steps so as to cover both the time domain and the space domain in spatio-temporal data analysis. Moreover, I integrate a logistic model into the CAR model, which aims to improve the capability of autoregressive models.


Two-step autoregressive model
The autoregressive (AR) model evaluates the new value of a variable based on its old values; in other words, the independent variables, called regressors, and the dependent variable, called the response, refer to the same concerned object (CO). The AR model is often used to predict new values in a time series, in which case it is called a temporal autoregressive model. The conditional autoregressive (CAR) model follows the same idea, where regressors and response refer to the same CO, but the CAR model adds the values of the neighbors of the CO. Therefore, the CAR model is often used to predict new values in the space domain, in which case it is called a spatial autoregressive model. Some studies, such as (Mariella & Tarantino, 2010), integrate the AR model and the CAR model, but here I combine them sequentially in a simple way in which AR and CAR are estimated in two separate processes. In other words, the research applies autoregressive modeling in two steps:

1. Step 1: The CAR model, known as the spatial autoregressive model, is estimated to predict the response based on its neighbors and its environmental features such as humidity, precipitation, etc. As a convention, the response is called a case, like a disease case in epidemiology.

2. Step 2: The AR model, known as the temporal autoregressive model, is estimated to forecast the next case based on current cases.

Let S = {1, 2,…, n} be the spatial domain, so that each site s corresponds to an area. Let nb(s) be the set of neighbor sites of site s. Let Xs = (xs1, xs2,…, xsk) be the vector of environmental features at site s, so that xsj is the j-th environmental feature at site s. Let Ys be the outcome number of cases at site s. When time is concerned, Xs and Ys are replaced by Xst and Yst, respectively. Supposing the time interval is T = 2, this research first uses the spatial autoregressive model to estimate the number of cases Yst = Ys at the current timepoint t. Afterwards, it uses the temporal autoregressive model to forecast the future number of cases Ys(t+1) at timepoint t+1 based on the current Yst. Therefore, the computational model of this research is called the two-step autoregressive model.
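The notation above can be sketched with a simple data structure. This is a minimal illustration only; the class name, fields, and all numbers are assumptions for demonstration, not part of the proposed model:

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    index: int                                  # site s in the spatial domain S
    neighbors: list                             # nb(s); never contains s itself
    features: list                              # X_s = (x_s1, ..., x_sk)
    cases: dict = field(default_factory=dict)   # timepoint t -> Y_st

# Example: three sites on a line, observed at the current timepoint t = 1.
sites = [
    Site(0, [1],    [0.7, 12.0], {1: 5}),
    Site(1, [0, 2], [0.6, 10.0], {1: 8}),
    Site(2, [1],    [0.8, 15.0], {1: 3}),
]

# Step 1 (spatial CAR) estimates Y_st at the current timepoint t;
# step 2 (temporal AR) then forecasts Y_s(t+1) from Y_st.
assert all(s.index not in s.neighbors for s in sites)
```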
In the first step, the spatial autoregressive model is estimated. Given the current timepoint, we have Ys = Yst and Xs = Xst; when building up the spatial autoregressive model, we ignore timepoints in the dataset. Because the number of cases fluctuates around an average number, a Poisson distribution would be more suitable for building the spatial autoregressive model; however, for simplicity, I use the normal distribution. Let W be the symmetric weighted adjacency matrix (Mariella & Tarantino, 2010, p. 227):

$$w_{ss^*} = \begin{cases} \varphi(s, s^*) & \text{if } s^* \in \mathrm{nb}(s) \\ 0 & \text{otherwise} \end{cases}$$

where φ(s, s*) measures the proximity of site s and site s*, with the note that nb(s) denotes the set of neighbors of site s; moreover, the set nb(s) does not contain s. Because W is symmetric, we have (Mariella & Tarantino, 2010, p. 227):

$$\forall s, s^* \in S: \quad w_{ss^*} = w_{s^*s}$$

Note that W is constant. Let WD be the diagonal matrix of normalization (Mariella & Tarantino, 2010, p. 227):

$$W_D = \mathrm{diag}(w_{1+}, w_{2+}, \dots, w_{n+}), \quad w_{s+} = \sum_{s^*=1}^{n} w_{ss^*}$$

Let A be the matrix of interaction (Mariella & Tarantino, 2010, p. 227):

$$A = W_D^{-1} W$$

Note that A is constant. Let ΣD be the diagonal matrix of variances (Mariella & Tarantino, 2010, p. 227):

$$\Sigma_D = \sigma^2 W_D^{-1}$$

where σ² is called the global variance. The probability density function (PDF) of the number of cases Ys at Xs given its neighbors Xs* is defined according to the normal distribution as follows (Mariella & Tarantino, 2010, p. 227):

$$f\!\left(Y_s \mid X_s, X_{s^*}, \alpha_s, \sigma_s^2\right) = \frac{1}{\sqrt{2\pi\sigma_s^2}} \exp\!\left(-\frac{\left(Y_s - \alpha_s^T X_s - \rho \sum_{s^* \in \mathrm{nb}(s)} a_{ss^*}\left(Y_{s^*} - \alpha_{s^*}^T X_{s^*}\right)\right)^2}{2\sigma_s^2}\right)$$

where αs = (αs1, αs2,…, αsk) is the site regressive coefficient vector and σs² is the site variance. The constant ρ is called the effective neighborhood constant, with |ρ| < 1. Note that the superscript "T" denotes the vector/matrix transposition operator. Each site variance (local variance) σs² is defined as

$$\sigma_s^2 = \frac{\sigma^2}{w_{s+}}$$

Therefore, only αs and σ² are parameters in the PDF of Ys at Xs given its neighbors.
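The matrices above can be made concrete with a small numerical sketch. All data, the proximity measure `phi`, and the coefficient values below are illustrative assumptions, not values from the paper; the sketch builds W, the normalization matrix W_D, the interaction matrix A, the site variances σs², and the conditional mean of Ys given its neighbors:

```python
import numpy as np

# Toy spatial domain: n = 3 sites on a line, binary adjacency.
n = 3
nb = {0: [1], 1: [0, 2], 2: [1]}
phi = lambda s, t: 1.0                 # simplest proximity measure (assumed)

# Symmetric weighted adjacency matrix W (w_{ss*} = 0 when s* not in nb(s)).
W = np.zeros((n, n))
for s in range(n):
    for t in nb[s]:
        W[s, t] = phi(s, t)
assert np.allclose(W, W.T)             # W must be symmetric

w_plus = W.sum(axis=1)                 # row sums w_{s+}
W_D = np.diag(w_plus)                  # diagonal matrix of normalization
A = np.linalg.inv(W_D) @ W             # interaction matrix A = W_D^{-1} W

sigma2 = 2.0                           # global variance sigma^2 (assumed)
sigma2_s = sigma2 / w_plus             # site (local) variances sigma_s^2

# Conditional mean of Y_s given its neighbors, with toy alpha, X, Y, rho.
alpha = np.array([[0.5, 1.0], [0.4, 0.9], [0.6, 1.1]])  # row s = alpha_s
X = np.array([[1.0, 2.0], [1.5, 1.0], [0.5, 3.0]])      # row s = X_s
Y = np.array([4.0, 6.0, 3.0])
rho = 0.5
mu = np.array([
    alpha[s] @ X[s]
    + rho * sum(A[s, t] * (Y[t] - alpha[t] @ X[t]) for t in nb[s])
    for s in range(n)
])
```

Note that each row of A sums to 1, which is why ρ with |ρ| < 1 controls how strongly the neighborhood pulls the mean away from the site's own regression term.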
The mean and variance of Ys at Xs given its neighbors are:

$$\mathrm{mean}\!\left(Y_s \mid X_s, X_{s^*}, \alpha, \sigma^2\right) = \alpha_s^T X_s + \rho \sum_{s^* \in \mathrm{nb}(s)} a_{ss^*}\left(Y_{s^*} - \alpha_{s^*}^T X_{s^*}\right)$$

$$\mathrm{var}\!\left(Y_s \mid X_s, X_{s^*}, \alpha, \sigma^2\right) = \sigma_s^2 \quad (1.6)$$

From the mean of Ys at Xs given its neighbors, the spatial autoregressive model of the number of cases is defined as follows:

$$\hat{Y}_s = \alpha_s^T X_s + \rho \sum_{s^* \in \mathrm{nb}(s)} a_{ss^*}\left(Y_{s^*} - \alpha_{s^*}^T X_{s^*}\right) \quad (1.7)$$

The essence of the first step in this research is to estimate all regressive coefficients αs when the effective neighborhood constant ρ is pre-defined. Let xs(i) be the i-th instance of the vector of environmental features Xs at site s, and let ys(i) be the i-th instance of Ys at site s. Let Xs be the set of Ns instances xs(i) and let Ys be the set of Ns instances ys(i). Each site s owns one dataset (Xs, Ys).
$$\mathcal{X}_s = \begin{pmatrix} x_s^{(1)} \\ x_s^{(2)} \\ \vdots \\ x_s^{(N_s)} \end{pmatrix}, \quad \mathcal{Y}_s = \begin{pmatrix} y_s^{(1)} \\ y_s^{(2)} \\ \vdots \\ y_s^{(N_s)} \end{pmatrix} \quad (1.8)$$

Let X and Y be the sets of all site datasets Xs and Ys, respectively. Let α and σ² be the sets of all αs and all σs², respectively. The joint probability of the global dataset (X, Y) is:

$$P\!\left(\mathcal{Y} \mid \mathcal{X}, \alpha, \sigma^2\right) = \prod_{s \in S} \prod_{i=1}^{N_s} f\!\left(y_s^{(i)} \mid x_s^{(i)}, \alpha_s, \sigma_s^2\right)$$

The log-likelihood function is the logarithm of this joint probability:

$$l(\alpha, \sigma^2) = \sum_{s \in S} \sum_{i=1}^{N_s} \log f\!\left(y_s^{(i)} \mid x_s^{(i)}, \alpha_s, \sigma_s^2\right)$$

Let αs* = (αs1*, αs2*,…, αsk*) be the optimal estimate of the regressive coefficients; thus, αs* is a maximizer of l(α, σ²):

$$\alpha_s^* = \underset{\alpha_{s1}, \alpha_{s2}, \dots, \alpha_{sk}}{\mathrm{argmax}}\; l(\alpha, \sigma^2)$$

When the first-order partial derivatives of l(α, σ²) with regard to αs are equal to zero, l(α, σ²) reaches a local maximum. In other words, αs* is a solution of the equation system resulting from setting these derivatives to zero:

$$\frac{\partial l(\alpha, \sigma^2)}{\partial \alpha_s} = 0, \quad \forall s \in S$$

It is not difficult to solve the resulting linear equation system. As a result, the spatial autoregressive model is re-written as follows:

$$\hat{Y}_s = (\alpha_s^*)^T X_s + \rho \sum_{s^* \in \mathrm{nb}(s)} a_{ss^*}\left(Y_{s^*} - (\alpha_{s^*}^*)^T X_{s^*}\right)$$

In the second step, the temporal autoregressive model is estimated. For simplicity, we follow the Markov condition in which future cases depend only on current cases. When building up the temporal autoregressive model, we ignore the environmental features Xs in the dataset, or equivalently, we focus on a given site s. Suppose the current number of cases Yst at site s was determined previously by the spatial autoregressive model. As a convention, the "s" index is removed from the response variable, so we have Yt+1 = Ys(t+1) and Yt = Yst. We will forecast the next (unknown) Yt+1 based on the current (known) Yt. The temporal autoregressive model is defined as follows (Eshel, 2003, p. 1):

$$Y_{t+1} = \beta Y_t + \varepsilon \quad (1.11)$$

where ε is the error random variable and β is the temporal coefficient, with the note that the mean of ε is 0, such that

$$E(Y_{t+1}) = \beta Y_t$$

Let yt be the value of Yt at timepoint t. Suppose there are T timepoints, with t = 1, 2,…, T. The sum of squared errors is defined as follows:

$$S(\beta) = \sum_{t=1}^{T-1} \left(y_{t+1} - \beta y_t\right)^2$$

Let β* be the estimate of β. The least squares method sets β* as the minimizer of S(β).
$$\beta^* = \underset{\beta}{\mathrm{argmin}}\; S(\beta)$$

By taking the first-order derivative of S(β) with regard to β, we obtain (Eshel, 2003, p. 1):

$$\frac{dS(\beta)}{d\beta} = -2 \sum_{t=1}^{T-1} y_t \left(y_{t+1} - \beta y_t\right)$$

The temporal coefficient β* is the solution of the equation resulting from setting this derivative to zero.
It is easy to determine the formula for calculating β* (Eshel, 2003, p. 1):

$$\beta^* = \frac{\sum_{t=1}^{T-1} y_t y_{t+1}}{\sum_{t=1}^{T-1} y_t^2} \quad (1.12)$$

Please pay attention that the values yt are calculated by the CAR model estimated in the first step. The temporal autoregressive model is re-written as follows:

$$\hat{Y}_{t+1} = \beta^* y_t \quad (1.13)$$

Equation 1.12 for calculating β* is the simplest form of the well-known Yule-Walker equation.
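The second step can be sketched numerically. The case series below is an illustrative toy sequence (in practice the yt would come from the first-step CAR model); the sketch computes β* by the closed-form least-squares formula and forecasts the next value:

```python
import numpy as np

# Toy case series y_1, ..., y_T (assumed numbers; T = 5).
y = np.array([5.0, 8.0, 6.0, 9.0, 7.0])

# S(beta) = sum_{t=1}^{T-1} (y_{t+1} - beta * y_t)^2 is minimized by
# beta* = sum_t y_t * y_{t+1} / sum_t y_t^2.
beta_star = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

# Forecast the next (unknown) value from the last observed one.
y_next = beta_star * y[-1]
```

A quick way to check the derivation is to verify that S(β) evaluated at β* is no larger than at nearby values of β.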

Integration with the logistic function
In the normal CAR model, the response Ys, which is the outcome number of cases at site s, is a real variable; but in the situation that Ys is a categorical variable whose G values are v1, v2,…, vG, a new modification of the CAR model is required. The multinomial logistic function of Ys with regard to value vg at site s, which is associated with spatial autoregression, is proposed as follows:

$$p\!\left(Y_s = v_g \mid X_s, X_{s^*}, \alpha_{sg}\right) = \frac{1}{1 + \exp\!\left(-\left(\alpha_{sg}^T X_s + \rho \sum_{s^* \in \mathrm{nb}(s)} a_{ss^*}\left(Y_{s^*} - \alpha_{s^*g}^T X_{s^*}\right)\right)\right)} \quad (2.1)$$

Note that αsg = (αsg1, αsg2,…, αsgk), the site regressive coefficient vector with regard to value vg, is the main parameter estimated here. Let pg be the probability that Ys receives value vg, where g runs from 1 to G. If the G values v1, v2,…, vG are not ordered, pg is the same as the proposed multinomial logistic function of Ys:

$$p_g = p\!\left(Y_s = v_g \mid X_s, X_{s^*}, \alpha_{sg}\right) \quad (2.2)$$

If the G values v1, v2,…, vG are ordered such that v1 ≤ v2 ≤ … ≤ vG then, according to Bender and Grouven (Bender & Grouven, 1997, p. 547), pg becomes the cumulative probability that Ys ≤ vg, as follows:

$$p\!\left(X_s \mid \alpha_{sg}\right) = P\!\left(Y_s \le v_g \mid X_s, X_{s^*}, \alpha_{sg}\right) = \sum_{h=1}^{g} p\!\left(Y_s = v_h \mid X_s, X_{s^*}, \alpha_{sh}\right) \quad (2.3)$$

Obviously, p(Xs|αsg) is the CAR model integrated with the logistic function, which is called the LCAR model. Essentially, LCAR is determined by estimating the parameters αsg. As a convention, let

$$p_{sg} = p\!\left(X_s \mid \alpha_{sg}\right)$$

The logit of psg is defined as follows:

$$\mathrm{logit}(p_{sg}) = \log\frac{p_{sg}}{1 - p_{sg}}$$

Supposing Ys obeys a multinomial distribution, the parameters αsg at site s are determined by the maximum likelihood estimation (MLE) method (Czepiel, 2002). The joint probability of the global dataset (X, Y) in reduced form is:

$$P\!\left(\mathcal{Y} \mid \mathcal{X}, \alpha\right) = \prod_{s \in S} \prod_{i=1}^{N_s} \prod_{g=1}^{G} \left(p_{sg}^{(i)}\right)^{y_{sg}^{(i)}}$$

where psg(i) is the evaluation of psg with the instance Xs = xs(i), and ysg(i) is the indicator that the i-th instance at site s takes the value vg. The log-likelihood function is the logarithm of this joint probability:

$$l(\alpha) = \sum_{s \in S} \sum_{i=1}^{N_s} \sum_{g=1}^{G} y_{sg}^{(i)} \log p_{sg}^{(i)}$$

Let αsg* = (αsg1*, αsg2*,…, αsgk*) be the optimal estimate of the regressive coefficients, which is a maximizer of l(α) under the constraint $\sum_{g=1}^{G} p_{sg} = 1$; this is equivalent to maximizing the Lagrange function:

$$L(\alpha, \lambda) = l(\alpha) + \sum_{s \in S} \sum_{i=1}^{N_s} \lambda_i \left(1 - \sum_{g=1}^{G} p_{sg}^{(i)}\right)$$

where the λi are Lagrange multipliers such that λi ≥ 0, with the note that each λi(sg) implies λi at site s given value vg.
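The category probabilities of equations 2.1-2.3 can be sketched as follows. The linear predictors `eta` here are illustrative stand-ins for the full terms αsgᵀXs + ρ∑…, and the explicit renormalization is an assumed shortcut for the constraint ∑g pg = 1 that the text enforces via Lagrange multipliers during estimation:

```python
import math

def logistic(eta):
    """Logistic function 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# Toy linear predictors for G = 3 category values v_1 <= v_2 <= v_3.
eta = [-1.0, 0.5, 2.0]

raw = [logistic(e) for e in eta]       # per-category logistic values (eq. 2.1)
total = sum(raw)
p = [r / total for r in raw]           # renormalized so probabilities sum to 1

# Ordered case: cumulative probabilities P(Y_s <= v_g) (eq. 2.3).
cum = [sum(p[:g + 1]) for g in range(len(p))]
```

In the ordered case, the cumulative probabilities are non-decreasing in g and the last one equals 1, which is what equation 2.3 requires.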
Because the logarithm is a concave function, we apply Jensen's inequality to approximate the Lagrange function as follows: