An Interpretable Hybrid Predictive Model of COVID-19 Cases using Autoregressive Model and LSTM

The Coronavirus Disease 2019 (COVID-19) has posed a severe threat to global human health and the economy. Building reliable data-driven prediction models for COVID-19 cases is an urgent task that can improve public policy making. However, COVID-19 data shows special transmission characteristics such as significant fluctuations and non-stationarity, which are difficult to capture with a single predictive model and pose grand challenges for effective forecasting. In this paper, we propose a novel Hybrid data-driven model combining the autoregressive model (AR) and long short-term memory neural networks (LSTM). It can be viewed as a new neural network model in which the contributions of AR and LSTM are automatically tuned during the training procedure. We conduct extensive numerical experiments on data collected from 8 counties of California that display various trends. The numerical results demonstrate the Hybrid model's predictive advantage over AR and LSTM: it achieved 4.195% MAPE on average, outperforming AR (5.629%) and LSTM (5.070%). We also provide a discussion on interpretability.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has posed a severe threat to global human health and economies while producing some of the richest data we have ever seen for infectious disease tracking. The quantity and quality of these data have placed epidemic modelling and forecasting at the forefront of worldwide public policy making. Compared to previous infectious diseases, COVID-19 shows special transmission characteristics, yielding significant fluctuations and non-stationarity in the new COVID-19 cases. This poses grand challenges for effective forecasting and, on the other hand, draws the attention of the global community to epidemic tracking and forecasting.
In the last three years, various models and methods have been developed to forecast COVID-19 cases (see the survey in 1 and references therein). These models can be roughly grouped into two categories: mechanistic models and data-driven models. The mechanistic models aim at directly characterizing the underlying mechanisms of COVID-19 transmission. Typical examples of mechanistic models are based on differential equations, such as the compartmental models SIR and SEIR [2][3][4][5] . The data-driven models formulate the prediction of COVID-19 cases primarily as a regression problem and exploit fully data-adaptive approaches to learn the functional relationship between COVID-19 cases and a set of observable variables. Data-driven models include classical statistical models such as autoregressive models (AR) [6][7][8] and Support Vector Regression (SVR) [9][10][11] , and deep learning models [12][13][14][15][16][17][18] . In this paper, we focus on data-driven models.
Autoregressive models express the response variable as a linear function of its previous observations, and enjoy a simple structure and strong interpretability. Furthermore, AR models are found to be powerful in capturing short-term changing trends in a time series. However, they may fail to capture highly nonlinear patterns caused by long-term effects in the dynamics. In contrast, deep learning models have been well-examined in their ability to learn complex patterns, but they fall short on interpretability due to their black-box nature. This lack of interpretability prevents people from drawing useful conclusions from model outputs, and thus hinders effective policy making 19 . This observation motivates us to consider a Hybrid model that additively combines the two types of models and aims to possess the advantages of each.
Specifically, we consider the Hybrid modeling of COVID-19 data using AR and LSTM (Long Short-Term Memory neural network). AR 20 is a popular tool in time series analysis, with numerous applications in various fields, including infectious disease modelling 21,22 . For the neural network part, we consider LSTM 23 for its impressive power in capturing complex dependence structures in sequential data; LSTM has been used to achieve the best-known results for many problems on sequential data. We build a Hybrid model that additively combines LSTM and AR and examine its performance on COVID-19 case prediction in California counties. In this paper, we collect and consider data from 8 counties in California, but our model is applicable to other time series forecasting tasks. All code is accessible through links on the reference page 24 . A long-term mission is to stretch the application of Hybrid models beyond COVID-19 forecasting: toward other fast-moving epidemics and settings that require accurate prediction and interpretability.
Modeling and forecasting the spread of COVID-19 is an important yet challenging task. We propose a Hybrid model that outperforms neural networks in prediction yet enjoys the interpretability of linear models. The analysis of the cumulative and new numbers of COVID-19 cases helps to predict its prevalence trends, and to inform policy makers so as to improve pandemic planning, resource allocation, and the implementation of social distancing measures and other interventions. We tailor the data analysis to California counties, helping local governments obtain a more accurate reference for the prediction and early warning of infectious diseases.
Although in this paper we focus on confirmed case prediction, we note that the proposed framework can be easily extended to tackle other COVID-19 or more general epidemiological tasks (e.g., hot spot prediction). Furthermore, the proposed method has its own research significance from a methodological perspective. For example, it raises open questions on its theoretical guarantees, the mathematical quantification of prediction, and interpretability.

Related work
Using a combination of different models has proven to be an effective way of improving empirical predictions in various applications. One can refer to the Hybrid modelling of other types of infectious disease [25][26][27][28][29] and sunspot monitoring 30 . The idea of using an additive combination of the AR model (or, more generally, the ARIMA model) with LSTM has recently appeared in 31,32 for time series forecasting with applications in gas and oil well production and sunspot monitoring.
We note a significant difference from previous methods: our approach includes a different data-processing procedure and trains the two components of the model jointly, while previous Hybrid modeling techniques take a sequential approach to training. Specifically, previous methods first perform filtering using AR, or apply a prior decomposition into AR and LSTM parts, before modeling the counterpart. For example, in 31 , the preprocessed data is used to fit an ARIMA model first, and the residual term is then used as the input to train an LSTM model. In contrast, we design a general network architecture that includes both the AR part and the LSTM part additively, and we jointly train the whole architecture by minimizing the empirical risk. By doing so, we do not arbitrarily give preference to either of the two additive components. Instead, the relative weights of the interpretable AR part and the predictive LSTM part are determined fully by the data.

Methods
In this section, we first briefly overview the two building blocks of our additive Hybrid model, namely the AR and the LSTM, and their relative advantages. Then we present our Hybrid model which combines these two building blocks additively, and we intuitively elaborate why it is better than the two individual components.

AutoRegressive (AR) Model
In time series, we often observe similarities between past and present values. For example, knowing the price of a stock over the past few days, we can often make a rough prediction about its value tomorrow. AR is a simple model that utilizes this empirical observation and can yield very accurate predictions in applications. It represents the time series values as a linear combination of past values. The number of past values used is called the lag number and is often denoted by p. Mathematically, let ε_t denote the Gaussian noise at time t with mean 0 and variance σ². The structural equation of the AR(p) model can be represented as

Y_t = a_0 + a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t,

where a_0 is the intercept and a_1, · · · , a_p represent the coefficients. The AR model is often effective on stationary data. To ensure stationarity, a common trick is to apply the differencing operation to the time series. A time series value at time t that has been differenced once, Y_t^(1), is defined as follows:

Y_t^(1) = Y_t − Y_{t−1},

and higher order differencing operations can be defined recursively. However, an AR model is not sufficient to capture the nonlinear dependence structure, which is found to be an important feature of the COVID-19 data, as indicated by Figure 1. A purely AR based model is thus often insufficient for the task of COVID-19 case prediction.
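As an illustration, an AR(p) fit can be carried out by ordinary least squares on lagged values. The following sketch is ours, not the paper's code: the function names and the synthetic series are hypothetical, and it shows the differencing operator together with a one-step AR forecast.

```python
import numpy as np

def difference(y, order=1):
    """Apply the differencing operator `order` times: Y^(1)_t = Y_t - Y_{t-1}."""
    for _ in range(order):
        y = np.diff(y)
    return y

def fit_ar(y, p):
    """Fit AR(p) by ordinary least squares.

    Returns (a0, a) where a0 is the intercept and a[i-1] multiplies Y_{t-i}.
    """
    # Design matrix: the row for target Y_t holds [1, Y_{t-1}, ..., Y_{t-p}]
    X = np.column_stack([y[p - i: len(y) - i] for i in range(1, p + 1)])
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

def predict_ar(a0, a, history):
    """One-step forecast from the most recent p values (newest last)."""
    p = len(a)
    lags = history[-p:][::-1]  # [Y_{t-1}, ..., Y_{t-p}]
    return a0 + float(np.dot(a, lags))
```

On a series generated exactly by Y_t = 2 + 0.5 Y_{t−1}, this recovers the intercept 2 and coefficient 0.5.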

Long Short Term Memory Network (LSTM)
RNNs (Recurrent Neural Networks), unlike traditional neural networks, are able to utilize previous knowledge to predict future values. However, they suffer from the long-term dependency problem: as the network grows larger through time, the gradient decays quickly during backpropagation, making it impossible to train RNN models with long unfoldings in time. To solve this problem, Hochreiter and Schmidhuber (1997) introduced a special type of RNN called LSTM, together with a proper gradient-based learning algorithm 23 .
We employ an LSTM regression model, which is represented as

Y_t = G(Y_{t−1}, . . . , Y_{t−p}; θ) + ε_t,

where we use Y_{t−1}, . . . , Y_{t−p} as the sequential input data; G represents the neural network architecture (Supplementary Figure 1 in the Supplementary material) and θ represents the weight parameters of the neural network. An LSTM layer consists of several cells. The core concepts of an LSTM cell are the cell state and its various gates; see Supplementary Figure 2 in the Supplementary material. The cell state C_{t−1} at time step t − 1 acts as a transport highway that transfers relevant information all the way down the sequence chain, which intuitively characterizes the "memory" of the network. The cell states, in principle, carry relevant information throughout the processing of the sequence, so even information from earlier time steps can make its way to later time steps, reducing the effects of short-term memory. The Forget Gate decides what information should be kept. The Input Gate decides what information is to be added from the current step and updates the cell state C_t at time step t. The Output Gate determines what the next hidden state h_t should be. These gates are implemented as fully connected feed-forward layers.
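To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM cell step. It is our own illustration, not the paper's implementation: the weight layout stacking the forget, input, candidate, and output blocks is one common convention, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step.

    W (input weights), U (recurrent weights), and b (bias) each stack the
    forget, input, candidate, and output blocks, in that order.
    """
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # all four pre-activations at once
    f = sigmoid(z[0:d])               # forget gate: what memory to keep
    i = sigmoid(z[d:2*d])             # input gate: what new info to add
    g = np.tanh(z[2*d:3*d])           # candidate cell content
    o = sigmoid(z[3*d:4*d])           # output gate: what to expose
    c_t = f * c_prev + i * g          # updated cell state (the "memory highway")
    h_t = o * np.tanh(c_t)            # new hidden state
    return h_t, c_t
```

With all weights at zero, the gates open halfway (sigmoid(0) = 0.5), so the cell state is simply halved at each step, which illustrates how the forget gate scales the carried memory.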
To achieve optimal prediction results with an LSTM model, careful hyperparameter tuning is crucial, including the choice of the number of units (the dimension of the hidden state), the number of cells (i.e., the number of time steps), and the number of layers. This is usually a difficult task in practice. For example, too few LSTM cells are unlikely to capture the structure of the sequence, while too many LSTM cells might lead to overfitting.
However, just like other neural networks, LSTM has a well-known limitation: its lack of interpretability 19 .

The Hybrid Model
As discussed above, both AR and LSTM have their relative strengths and limitations in their respective domains.
To retain the advantages and overcome the limitations, we combine the two models additively into one single Hybrid model, which is expressed as

Y_t = α (a_0 + a_1 Y_{t−1} + · · · + a_p Y_{t−p}) + (1 − α) G(Y_{t−1}, . . . , Y_{t−p}; θ) + ε_t,

where p is the lag number and α weights the contribution of the two components: by tuning the value of α, one can strike a balance between the predictions given by the AR and LSTM parts, and thus between linear and nonlinear signals. Addition as the combination method is preferable since it preserves low complexity and enjoys better interpretability than other methods do, for example combination by multiplication.
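The additive combination itself is straightforward. A hedged sketch follows: the α / (1 − α) weighting reflects our reading of the formulation, `nonlinear` stands in for the trained LSTM G(·; θ), and all names are ours rather than the paper's code.

```python
import numpy as np

def hybrid_predict(lags, a0, a, nonlinear, alpha):
    """Additive Hybrid one-step forecast.

    lags: [Y_{t-1}, ..., Y_{t-p}]; (a0, a) are the AR coefficients;
    `nonlinear` is any callable standing in for the trained LSTM G(.; theta);
    alpha weights the AR part against the nonlinear part.
    """
    ar_part = a0 + float(np.dot(a, lags))
    lstm_part = float(nonlinear(lags))
    return alpha * ar_part + (1.0 - alpha) * lstm_part
```

Setting alpha = 1 recovers the pure AR forecast and alpha = 0 the pure nonlinear forecast; in the actual model the weighting is learned jointly with the other parameters rather than fixed by hand.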
We will compare the performance of the Hybrid model against its two separate components. We illustrate the structure of the Hybrid model in Figure  and train the weights of each of the two components in a fully data-adaptive manner by minimizing the empirical risk. We compare the contributions of the Hybrid model's AR component and LSTM component in Section Results.

Results
The results include three sections: Model Evaluations, Prediction, and Interpretability. In Model Evaluations, we introduce the metrics we use to evaluate the models and on which we compare their performance. In Prediction, we exhibit visualizations of several interesting trials and compare the numerical predictions and evaluations of the three models. In Interpretability, we compare the AR component of the Hybrid model with the pure AR model, to examine how we may interpret the Hybrid model. We leave other training details to the Appendix.

Model Evaluations
We use a quantitative measure to evaluate and compare the performance of the models: the Mean Absolute Percentage Error (MAPE), defined as

MAPE = (100% / n) Σ_{t=1}^{n} |(Y_t − Ŷ_t) / Y_t|,

where Ŷ_t is the predicted value, Y_t is the ground truth, and n is the number of predictions. We use this relative error because the data sizes vary greatly, making a relative error more reasonable to consider. A model is desirable when its MAPE values are close to 0.
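In code, MAPE is a one-liner. This sketch (ours, not the paper's) assumes all ground-truth values are nonzero, which holds for case counts once cases are present:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumes every entry of y_true is nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```

For example, predictions of 110 and 180 against ground truths of 100 and 200 are each 10% off, giving a MAPE of 10%.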
We examine the performance of all 3 models on different time periods within the available range. This is essential in our research, since the performance of a model is not constant across different trends; intuitively, a model performs better on smooth curves than it does on steep ones. By repeating our evaluation process on different time periods, and thus on different trends, we wish to learn on which trends the models give their best performance. Such understanding helps us decide to what degree we may trust the performance of the models. Another reason for the repetition is that, while the AR method yields the same model given the same piece of data, the LSTM and Hybrid methods do not. We evaluate the models repeatedly to reduce the influence of the instability of model training.
For the repetition, we choose a step number of 7: we leave 7 days between the first dates of any two consecutive training sets. Although a larger number of repetitions seems desirable, increasing the repetition number comes at the cost of making neighboring training sets closer to each other. However, the difference in performance between two neighboring training sets that are too close to each other would be attributed more to the instability of model training than to the difference in trend; such results give us little information about model performance over trends. In the end, we let the step number equal our lag number. By doing so, we assume that the concept of a week is important in forecasting.
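The repetition scheme amounts to sliding the training window's start date forward by 7 days at a time. A small sketch (the helper name and dates are illustrative, not from the paper):

```python
from datetime import date, timedelta

def rolling_starts(first_day, last_day, step_days=7):
    """Start dates of consecutive training sets, `step_days` apart.

    step_days=7 matches the paper's choice of step number = lag number.
    """
    starts, d = [], first_day
    while d <= last_day:
        starts.append(d)
        d += timedelta(days=step_days)
    return starts
```

Each returned date would anchor one trial's training window, so consecutive trials see mostly overlapping but weekly-shifted data.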

Prediction
In this section, we present the visual and numerical results for all three models. We perform a comprehensive comparison of the performance of the three models in various situations and across multiple counties, showing the advantage of the Hybrid model. All predictions are transformed back to the original scale.

Visualizations
We compare the three models' performance on COVID-19 case forecasting in 8 counties of California. For each county, we test the models' performance in several different situations: for example, when the training data has an upward trend and the testing data has a downward trend. From all the trials we ran, we choose the following trials as representatives of different combinations of training and testing data, since they reflect the general model performance well.
To ensure the results above are representative, we run each selected trial 100 times, visualize the mean and standard error of these trials, and present the averaged MAPE. While AR outperforms LSTM in some cases, the Hybrid model outperforms both in most cases, except in Case 2 and Case 6. The MAPE, averaged over the 100 trials, shows that LSTM (4.469%) slightly outperforms Hybrid (4.993%) in Case 2. However, as shown in the right panel of Figure 3 row (b), the Hybrid model captures the general trend of the ground truth better than LSTM does. Similarly, in Case 6, AR (3.675%) slightly outperforms Hybrid (3.718%). Yet, as shown in the right panel of Figure 4 row (c), the Hybrid model captures the general trend of the ground truth better than AR does.
Besides, interestingly, the Hybrid model always seems to capture the ground truth's trend. In fact, the shape of the Hybrid model's forecasts resembles either that of the AR model, that of the LSTM model, or a combination of both. When the AR model captures the trend better than the LSTM does, the Hybrid model resembles the AR model in forecast shape: for example, in Case 2, San Francisco 2020-02-17 to 2020-05-14, and in Case 5, Santa Barbara 2022-01-17 to 2022-04-14. When the LSTM model captures the trend better than the AR does, the Hybrid model resembles the LSTM model in forecast shape: for example, in Case 4, San Francisco 2022-06-10 to 2022-09-05, and in Case 5, Riverside 2022-02-16 to 2022-12-20. On jagged testing data, where AR performs better on one part and LSTM better on another, the Hybrid model presents the advantages of both models: for example, in Case 6, the Hybrid model resembles AR at the two ends, where AR performs better, and it resembles LSTM in shape between day 5 and day 15, where LSTM seems to capture the trend better.

General Performance
After looking at the representative trials in Section Visualizations, we evaluate the model performances numerically, on the 8 California counties and over multiple trials, with the method explained in Sections Training and Model Evaluations. The results are given in Table 1, General Performance.
We can see that the Hybrid model outperforms the AR and LSTM models stably: it generally yields the smallest average MAPE. Specifically, the general MAPE of each model (AR, LSTM, LSTM with 2 layers, and Hybrid), averaged over the results for all 8 counties, is 5.629%, 5.070%, 6.205%, and 4.195%, respectively. In general, the Hybrid model has the best performance, outperforming the AR model by approximately 1.5 percentage points. The LSTM model suffers from overfitting when a second LSTM layer is added.
Besides, as shown in Section Visualizations, the Hybrid model captures the general forecasting trend more stably than the other two models do: it always takes the shape of the component model that captures the trend better.

Performance on latest data
We also checked the model performances for the same 8 counties on the latest trial, from 2022-06-10 to 2022-09-05, to provide some insight into the timely prediction of our models. The results are presented in Table 1.
Notably, the AR model takes a much smaller amount of time to train. While it takes approximately 0.1 time units to train an AR model, it takes approximately 10 time units to train an LSTM or a Hybrid model. There is no significant difference between the training times of LSTM and Hybrid, possibly because they differ by only an AR layer, which is relatively fast to train.

Interpretability
In this section, we study how the AR and LSTM components contribute to the Hybrid model when fitting the data. Our purpose is to seek insights into why the Hybrid model enjoys better performance in general. More importantly, we seek to use the interpretation of the fitted Hybrid model to provide practical guidance to the public health policy making process.
Note that all models are trained on the normalized data as described in Section Training. Consequently, all figures below report predictions on the normalized scale.
In Figure 5, we present three settings with different signal strength ratios (represented by the value of α) between the AR and LSTM components in the prediction of the Hybrid model. Specifically, a larger value of α indicates that the AR component dominates the LSTM component in prediction, and a smaller value of α indicates the opposite. We found that the component with the stronger signal characterizes the general trend in the data, while the other helps to stabilize the variance. This observation sheds light on why the Hybrid model generally provides better predictive performance than a single model.
Moreover, the fitted value of α provides a characterization of the intrinsic nonlinearity of the data, and consequently of the difficulty of extracting interpretation from the linear component of the fitted Hybrid model. The smaller the value of α, the higher the weight the nonlinear LSTM fit has in the final prediction. In such a setting, coefficients of the AR component should be given less weight when generating interpretations for policy making. Equivalently, for a larger value of α, it is more trustworthy to derive coefficient interpretations from the dominant AR part. This observation helps public policy makers distinguish among different virus transmission stages.
Finally, we observe interesting patterns in the coefficient estimates of the AR component of the Hybrid model compared with the coefficients of the pure AR model. As shown in Table 2, across the three settings of different values of α, the pure AR model tends to put heavier weight on coefficients of larger lags, say Y_{t−7}. In contrast, the AR component in the Hybrid model tends to focus on capturing the short history, i.e., the coefficients associated with smaller lags (e.g., Y_{t−1}) tend to have larger estimates. This indicates that the short-history pattern in the data can be well approximated by a simple (say, linear) model, while the longer history possesses a more complicated nonlinear structure that requires an LSTM component to fit.

Discussion
In this paper we introduce a novel Hybrid model that borrows strength from a highly structured autoregressive model and an LSTM model for the task of COVID-19 case prediction. Through intensive numerical experiments, we conclude that the Hybrid model yields more desirable predictive performance than either the AR or the LSTM counterpart alone. In principle, the Hybrid model enjoys the advantages of each of its two building blocks: the expressive power of LSTM in representing nonlinear patterns in the data, and the interpretability of the simple structure of AR. Consequently, the proposed Hybrid model is useful for simultaneously providing accurate predictions and shedding light on the transitions between virus transmission phases, thus providing guidance to the public health policy making process. It is also noteworthy that the predictive performance of the proposed Hybrid model can be further improved by properly choosing the hyperparameters. Furthermore, while we considered LSTM as the nonlinear component of the Hybrid model, it can be substituted by any other deep learning model.