Recurrent Neural Network:
Rumelhart et al. (1986) introduced the Recurrent Neural Network (16). The unfolded architecture of the RNN is given in Fig. 1. The initial hidden state (internal memory) is denoted as h0. The hidden state at time step t, denoted ht, is updated from the previous time step and has the shape 'number of neurons by 1'. Two weight matrices and a column vector of biases are associated with each RNN layer, and a further weight matrix and bias are associated with the output layer (Fig. 1).
Workflow of RNN
Kavita et al. (2017) reviewed the evolution of the RNN model over the last three decades (17). The RNN extends the feed-forward neural network by adding recurrent edges that span adjacent time steps, introducing a notion of time into the design. The hidden state ht is computed from the current sample xt and the previous hidden state ht−1 as \({h}_{t}=\text{tanh}(U*{x}_{t}+W*{h}_{t-1}+{b}_{h})\), where U is the weight matrix between the input and hidden layer, W is the weight matrix between the hidden layer and itself, and bh is the bias vector of the hidden layer. The output yt is then computed by applying the activation function f to the state ht and the weight V, i.e. \({y}_{t}=f\left(V*{h}_{t}+{b}_{o}\right)\), where V is the weight matrix between the hidden and output layer and bo is the bias vector of the output layer. Hence, the input xt−1 at time t − 1 may affect the output yt at time t through such recurrent links.
After computing the outputs (forward pass), calculate the loss/error. Using this loss, compute the gradient of the loss function for back-propagation. With the gradients obtained, update the weights (U, W, and V) and biases (bh and bo) in the model so that future computations on the input data produce more accurate results. This entire process of calculating the gradients and updating the weights is called back-propagation. Combined with the forward pass, back-propagation is repeated for a given number of epochs, and the optimal values of the weight matrices and biases are obtained. Unlike a feed-forward ANN, the same weight matrices are shared across all time steps, as the sketch below illustrates.
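For illustration, the forward pass above can be written in a few lines of NumPy. The sizes below are arbitrary, and the same U, W, and V are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_steps = 4, 5                    # illustrative sizes
U = rng.normal(size=(n_neurons, 1))          # input -> hidden weights
W = rng.normal(size=(n_neurons, n_neurons))  # hidden -> hidden weights
V = rng.normal(size=(1, n_neurons))          # hidden -> output weights
b_h = np.zeros((n_neurons, 1))               # hidden-layer bias
b_o = np.zeros((1, 1))                       # output-layer bias

x = rng.normal(size=(n_steps, 1, 1))         # univariate input sequence
h = np.zeros((n_neurons, 1))                 # initial hidden state h0
for t in range(n_steps):
    # h_t = tanh(U*x_t + W*h_{t-1} + b_h); U, W, V are shared across steps
    h = np.tanh(U @ x[t] + W @ h + b_h)
    y_t = V @ h + b_o                        # y_t = f(V*h_t + b_o), identity f
```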
LSTM models:
LSTM is a neural network model proposed by Hochreiter and Schmidhuber (1997) (15). The architecture of the LSTM module is given in Fig. 2; it contains a forget gate, an input gate, an internal state, and an output gate (18).
Workflow of the LSTM module:
The workflow of the LSTM model was described by Wang et al. (19). The LSTM first determines what information will be excluded from the cell state. This operation is implemented by a 'sigmoid' layer called the forget gate. The forget gate reads the last hidden state \(h\left({t}_{i-1}\right)\) and the current input \(x\left({t}_{i}\right)\), and outputs a value \(f\left({t}_{i}\right)\) between 0 and 1, i.e. \(f\left({t}_{i}\right)=\sigma \left({w}_{f}x\left({t}_{i}\right)+{w}_{hf}h\left({t}_{i-1}\right)+{b}_{f}\right)\).
Then, it determines what information is stored in the current cell state. This operation includes two parts. The first part uses a sigmoid layer to decide the updating target from the last hidden state \(h\left({t}_{i-1}\right)\) and the current input \(x\left({t}_{i}\right)\), that is, \(a\left({t}_{i}\right)=\sigma \left({w}_{a}x\left({t}_{i}\right)+{w}_{ha}h\left({t}_{i-1}\right)+{b}_{a}\right)\).
The second part uses a 'tanh' layer to construct the candidate vector \({c}^{\text{'}}\left({t}_{i}\right)=\text{tanh}\left({w}_{c}x\left({t}_{i}\right)+{w}_{hc}h\left({t}_{i-1}\right)+{b}_{c}\right)\), and \(c\text{'}\left({t}_{i}\right)\) plays a role in the subsequent cell state update. Then the last cell state \(c\left({t}_{i-1}\right)\), weighted by the forget gate, takes part in the following calculation to obtain the current cell state \(c\left({t}_{i}\right):\)
$$c\left({t}_{i}\right)=f\left({t}_{i}\right)\times c\left({t}_{i-1}\right)+a\left({t}_{i}\right)\times c\text{'}\left({t}_{i}\right)$$
Lastly, the LSTM determines the output for the next moment. The \(c\left({t}_{i}\right)\) calculated in the previous step has two output directions: one is directly connected to the unit at the next moment as its input, and the other undergoes a filtering process.
The process first passes the current input \(x\left({t}_{i}\right)\) and the last hidden state \(h\left({t}_{i-1}\right)\) into the 'sigmoid' layer, giving the output \(o\left({t}_{i}\right)=\sigma \left({w}_{o}x\left({t}_{i}\right)+{w}_{ho}h\left({t}_{i-1}\right)+{b}_{o}\right)\). Then, \(\text{tanh}\left(c\left({t}_{i}\right)\right)\) is multiplied by \(o\left({t}_{i}\right)\), which gives the current hidden state, i.e. \(h\left({t}_{i}\right)=o\left({t}_{i}\right)\times \text{tanh}\left(c\left({t}_{i}\right)\right)\). Finally, \(h\left({t}_{i}\right)\) is passed to the unit of the next moment.
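The gate equations above translate directly into code. The following NumPy sketch, with arbitrary illustrative sizes and weight names mirroring the notation above, computes one LSTM step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations in the text."""
    f = sigmoid(p["w_f"] @ x_t + p["w_hf"] @ h_prev + p["b_f"])       # forget gate
    a = sigmoid(p["w_a"] @ x_t + p["w_ha"] @ h_prev + p["b_a"])       # input (update) gate
    c_cand = np.tanh(p["w_c"] @ x_t + p["w_hc"] @ h_prev + p["b_c"])  # candidate state c'
    c = f * c_prev + a * c_cand                                       # new cell state
    o = sigmoid(p["w_o"] @ x_t + p["w_ho"] @ h_prev + p["b_o"])       # output gate
    h = o * np.tanh(c)                                                # new hidden state
    return h, c

# illustrative sizes: 3 hidden units, univariate input
n, rng = 3, np.random.default_rng(1)
p = {k: rng.normal(size=(n, 1)) for k in ("w_f", "w_a", "w_c", "w_o")}
p.update({k: rng.normal(size=(n, n)) for k in ("w_hf", "w_ha", "w_hc", "w_ho")})
p.update({k: np.zeros((n, 1)) for k in ("b_f", "b_a", "b_c", "b_o")})
h, c = lstm_step(np.ones((1, 1)), np.zeros((n, 1)), np.zeros((n, 1)), p)
```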
The following two LSTM models have been demonstrated:

(i) Simple LSTM

(ii) Convolutional Neural Network (CNN) LSTM
Steps to construct the LSTM models are given below:

- Plot ACF/PACF and choose the time step / lag
- Data pre-processing
- Reorganize the data into the input format of the model
- Model building
- Fit the model and evaluate the performance
- Predict the future
Step 1: Plot ACF/PACF and choose time step / lag
Using the time series data, draw the autocorrelation function (ACF) / partial autocorrelation function (PACF) plots, then choose the parameter (p or q) that defines the lag (the number of time steps over which observations are correlated). The lag size has a significant impact on the performance of time series forecasting, and it should be identified using the ACF/PACF plots.
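For example, assuming the daily counts are held in a pandas Series named `series` (a hypothetical name), the ACF/PACF plots can be drawn with statsmodels:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=30, ax=axes[0])    # autocorrelation function
plot_pacf(series, lags=30, ax=axes[1])   # partial autocorrelation function
plt.show()
# Choose the lag at which the ACF/PACF cuts off or drops sharply.
```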
Step 2: Data pre-processing
Unlike classical time series models, an LSTM model requires input (X) and outcome (Y) pairs to train, just as in supervised learning tasks such as regression or classification. Therefore, we have to split the data into input (X) and outcome (Y) using the lag or time step.
For instance, let {Xi, i = 1, 2, 3, …, n} be the daily incident cases. Suppose the lag is three; then the fourth day's count depends on the previous three days (the third, second, and first), the fifth day's count depends on the fourth, third, and second days' counts, and so on. Therefore, the data should be organized as follows:
| Input (X) | Outcome (Y) |
|---|---|
| X1, X2, X3 | X4 |
| X2, X3, X4 | X5 |
| X3, X4, X5 | X6 |
| … | … |
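A short helper that performs this split (a sketch, not the exact code from the linked notebook) might look like:

```python
import numpy as np

def split_sequence(sequence, lag):
    """Split a univariate series into (X, Y) pairs using the chosen lag."""
    X, Y = [], []
    for i in range(len(sequence) - lag):
        X.append(sequence[i:i + lag])   # lag consecutive observations
        Y.append(sequence[i + lag])     # the observation that follows them
    return np.array(X), np.array(Y)

# e.g. with lag = 3: X[0] = (X1, X2, X3) predicts Y[0] = X4
X, Y = split_sequence([10, 20, 30, 40, 50, 60], lag=3)
```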
Step 3: Reorganize the data into the input format of the model
The input data X has the shape [samples, timesteps] and must be reshaped before being fed into the model. The required input shape differs from model to model. The input shape of the simple LSTM model is [samples, timesteps, features], where features is always one for univariate time series analysis.
The CNN-LSTM model has one more dimension, the subsequence. The lag is split into a number of subsequences and sub-steps, where the number of sub-steps is a multiple of the kernel size of the convolutional layer. For example, if the lag is six days, the six steps can be split into three subsequences of two sub-steps each (3 × 2 = 6). The input shape of the CNN-LSTM model is [samples, subsequences, sub-steps, features]. A reshaping sketch is given below.
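As a sketch, assuming X was produced by a split such as `split_sequence` above and X6 denotes a hypothetical lag-6 version of it, the reshaping is:

```python
n_features = 1                                   # univariate series

# Simple LSTM input: [samples, timesteps, features]
X_lstm = X.reshape((X.shape[0], X.shape[1], n_features))

# CNN-LSTM input with lag 6 split into 3 subsequences of 2 sub-steps:
# [samples, subsequences, sub-steps, features]
n_seq, n_substeps = 3, 2
X_cnn = X6.reshape((X6.shape[0], n_seq, n_substeps, n_features))
```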
Step 4: Model building
Simple LSTM:
Hochreiter and Schmidhuber (1997) proposed the LSTM model, whose simplest configuration has one hidden LSTM layer and an output layer (15). In our model, however, we used one LSTM layer followed by four to eight dense layers, the number of which was determined based on model performance. The structure of the LSTM model is given in Fig. 3.
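A minimal Keras sketch of such a configuration follows; the number of units, the number of dense layers, and the lag are illustrative, not the exact values tuned in this study:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

lag = 3                                        # time step chosen in Step 1
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(lag, 1)))  # one LSTM layer
for _ in range(4):                             # four to eight dense layers
    model.add(Dense(50, activation='relu'))
model.add(Dense(1))                            # output: next day's count
model.compile(optimizer='adam', loss='mse')
```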
CNN-LSTM model
LeCun et al. (1998) developed the CNN, a type of neural network designed to work with two-dimensional image data (20). A CNN can also be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series. It can be used in a hybrid model with an LSTM backend, where the CNN interprets subsequences of the input that are together provided as a sequence for the LSTM to interpret (21). This hybrid model is called a CNN-LSTM.
The TimeDistributed wrapper is used to read each subsequence of the data separately. The CNN model first has a convolutional layer for reading across the subsequence, which requires a number of filters and a kernel size to be specified. The number of filters is the number of reads, or interpretations, of the input sequence; the kernel size is the number of time steps included in each 'read' operation. The convolutional layer is followed by a max-pooling layer that distils the feature maps down to half their size, retaining the most salient features. These structures are then flattened into a single one-dimensional vector that serves as a single input time step to the LSTM layer.
Our CNN-LSTM model has a one-dimensional CNN layer followed by a max-pooling layer and a flatten layer, then two LSTM layers with 500 neurons each, followed by four to eight dense layers. A sketch of this architecture is given below.
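The following Keras sketch assumes a lag of six split into three subsequences of two sub-steps; the filter count, kernel size, and dense-layer widths are illustrative assumptions, not the tuned values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv1D, MaxPooling1D,
                                     Flatten, LSTM, Dense)

model = Sequential()
# TimeDistributed applies the CNN to each subsequence separately
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'),
                          input_shape=(None, 2, 1)))   # (subsequences, sub-steps, features)
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))  # halve the feature maps
model.add(TimeDistributed(Flatten()))                  # one vector per subsequence
model.add(LSTM(500, activation='relu', return_sequences=True))
model.add(LSTM(500, activation='relu'))
for _ in range(4):                                     # four to eight dense layers
    model.add(Dense(100, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```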
Step 5: Fitting and evaluating the model performance
Fit the model using the training data and evaluate the performance. The Mean Squared Error (MSE) and Mean Absolute Error (MAE) can be used as metrics to evaluate the model performance on both the train and test data:
\(MSE=\frac{1}{n}\sum _{i=1}^{n}{\left({Y}_{i}-\widehat{{Y}_{i}}\right)}^{2}\) and \(MAE=\frac{1}{n}\sum _{i=1}^{n}\left|{Y}_{i}-\widehat{{Y}_{i}}\right|\)
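As a sketch, assuming train/test splits named X_train, Y_train, X_test, and Y_test (hypothetical names, shaped as in Step 3), fitting and scoring might look like:

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error

model.fit(X_train, Y_train, epochs=200, verbose=0)  # epoch count is illustrative
Y_pred = model.predict(X_test, verbose=0)
print("MSE:", mean_squared_error(Y_test, Y_pred))
print("MAE:", mean_absolute_error(Y_test, Y_pred))
```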
Step 6: Future prediction
In order to predict a future value, we have to feed past data as the input. In the absence of observed past values, we can use the predicted value of the corresponding day as the input (recursive forecasting), as sketched below. The Python code for all the above steps is provided in the link given in the Data section for future work.
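A sketch of this recursive scheme, assuming a fitted model, the chosen lag, and a list `history` of observed counts (hypothetical names):

```python
import numpy as np

lag, horizon = 3, 14                    # illustrative values
window = list(history[-lag:])           # last observed counts
forecasts = []
for _ in range(horizon):
    x = np.array(window[-lag:]).reshape((1, lag, 1))
    yhat = float(model.predict(x, verbose=0)[0, 0])
    forecasts.append(yhat)
    window.append(yhat)                 # feed the prediction back as input
```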
Data:
Daily COVID-19 cases in India were taken from https://covid19tracker.in/; data from 1st December 2021 to 10th February 2022 were used to train the models. Daily COVID-19 cases in the UK were extracted from https://covid19.who.int/info, and data from 1st May 2021 to 10th February 2022 were used to train the models. The models were validated with the data from 11th February to 25th February 2022. The analysis was done using Python on the Google Colab platform. The code and data are available at https://drive.google.com/drive/folders/16G92yYDemYELBF7fNVvUjhRdxZKGz5y8?usp=sharing