The deep learning approaches used here were variations of the long short-term memory (LSTM) network, a type of artificial recurrent neural network (RNN). LSTMs were introduced by Hochreiter & Schmidhuber in 1997 and can learn long-term dependencies [23]. LSTMs have varied uses and can be implemented in several ways. The methods studied here address the time-series prediction problem and are as follows:
- LSTM Network for Regression
- LSTM for Regression with Time Steps
- LSTM with Memory Between Batches
- Stacked LSTMs with Memory Between Batches
A generic LSTM unit has three gates that regulate the flow of information within the unit: the input, output, and forget gates. For all models, the dataset was split into training and testing sets, with two-thirds of the data used to train the models and the remaining one-third used to test them. All models were trained for both 100 and 1,000 epochs.
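A minimal sketch of the two-thirds/one-third split described above is given below; the variable names and the dummy series standing in for the PM2.5 data are illustrative assumptions.

```python
import numpy as np

def split_train_test(dataset, train_fraction=0.67):
    """Split a univariate series into training (first 2/3) and testing (last 1/3) portions."""
    train_size = int(len(dataset) * train_fraction)
    train, test = dataset[:train_size], dataset[train_size:]
    return train, test

# Dummy data standing in for the scaled PM2.5 series.
dataset = np.arange(100, dtype="float32").reshape(-1, 1)
train, test = split_train_test(dataset)
print(len(train), len(test))  # 67 33
```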
3.1. LSTM Network for Regression
The network has three layers: a visible layer with one input, a hidden layer made up of 4 LSTM units, and an output layer producing a single-value prediction. The model is then fit to the data, from which its performance on the training and testing datasets can be estimated. After this, the model is used to make predictions on both the training and testing datasets, so that its skill can be assessed visually.
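A minimal sketch of this regression network in Keras is shown below. The layer sizes (4 LSTM units, one output) follow the text; the input framing of one time step with one feature, the optimizer, the batch size, and the dummy sine series are illustrative assumptions rather than the exact configuration used in this study.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_regression_lstm():
    model = Sequential()
    model.add(LSTM(4, input_shape=(1, 1)))  # hidden layer of 4 LSTM units
    model.add(Dense(1))                     # single-value prediction
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model

# Dummy data: predict x[t+1] from x[t].
series = np.sin(np.linspace(0, 20, 200)).astype("float32")
X = series[:-1].reshape(-1, 1, 1)  # [samples, time steps=1, features=1]
y = series[1:]

model = build_regression_lstm()
model.fit(X, y, epochs=100, batch_size=1, verbose=0)
train_pred = model.predict(X, verbose=0)
```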
Fig. 3(a) shows the PM2.5 values against time for 100 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1552×10⁻⁴ mg/L on the training dataset and 0.1289×10⁻⁴ mg/L on the testing dataset. The R² values were 0.77 and 0.67 for the training and testing datasets, respectively. Fig. 3(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets.
Fig. 4(a) shows the PM2.5 values against time for 1,000 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1553×10⁻⁴ mg/L on the training dataset and 0.1276×10⁻⁴ mg/L on the testing dataset. The R² values were 0.77 and 0.68 for the training and testing datasets, respectively. Fig. 4(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets. It can be inferred that running for 100 or 1,000 epochs makes no major difference to the results, and that the model fits both the training and testing datasets well.
3.2. LSTM for Regression with Time Steps
Time steps can be used as inputs to predict the output at the next step, providing another way of framing the time-series problem. In this framing, past observations are presented to the LSTM as a sequence of time steps rather than as separate input features, so that any point of failure or surge, and the conditions leading up to it, are captured in the input sequence.
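The sketch below illustrates this framing: the previous look_back observations are shaped as time steps of one feature each. The look_back value of 3 and the helper name create_dataset are illustrative assumptions, not taken from the text.

```python
import numpy as np

def create_dataset(series, look_back=3):
    """Build supervised pairs (X, y) where X holds the last look_back observations."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)

series = np.arange(10, dtype="float32")
X, y = create_dataset(series)
# For the time-steps framing, X is shaped [samples, time steps=look_back, features=1].
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```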
Fig. S4(a) shows the PM2.5 values against time for 100 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.15×10⁻⁴ mg/L on the training dataset and 0.1329×10⁻⁴ mg/L on the testing dataset. The R² values were 0.79 and 0.65 for the training and testing datasets, respectively. Fig. S4(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets.
Fig. S5(a) shows the PM2.5 values against time for 1,000 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1483×10⁻⁴ mg/L on the training dataset and 0.1394×10⁻⁴ mg/L on the testing dataset. The R² values were 0.79 and 0.62 for the training and testing datasets, respectively. Fig. S5(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets. It can be inferred that running for 100 or 1,000 epochs makes no major difference to the results, and that the model fits both the training and testing datasets well.
3.3. LSTM with Memory Between Batches
The LSTMs were implemented in Python through the Keras deep learning library, which supports both stateless and stateful LSTMs. Stateful LSTMs provide finer control over the internal state of the LSTM and over when that state is reset. This can be exploited to build up state over the entire training sequence when making predictions.
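A minimal sketch of a stateful LSTM in Keras is shown below: batch_input_shape fixes the batch size, stateful=True carries the cell state across batches, and the state is reset manually after each pass over the training data. The batch size of 1, the number of passes, and the dummy series are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, 1, 1), stateful=True))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")

series = np.sin(np.linspace(0, 20, 200)).astype("float32")
X = series[:-1].reshape(-1, 1, 1)
y = series[1:]

# With stateful=True, shuffling is disabled and the state is reset by hand,
# so memory is built over the whole training sequence within each epoch.
for _ in range(100):
    model.fit(X, y, epochs=1, batch_size=batch_size, shuffle=False, verbose=0)
    model.reset_states()
```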
Fig. S6(a) shows the PM2.5 values against time for 100 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1602×10⁻⁴ mg/L on the training dataset and 0.1653×10⁻⁴ mg/L on the testing dataset. The R² values were 0.76 and 0.46 for the training and testing datasets, respectively. Fig. S6(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets.
Fig. S7(a) shows the PM2.5 values against time for 1,000 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1582×10⁻⁴ mg/L on the training dataset and 0.1648×10⁻⁴ mg/L on the testing dataset. The R² values were 0.76 and 0.46 for the training and testing datasets, respectively. Fig. S7(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets. It can be inferred that running for 100 or 1,000 epochs makes no major difference to the results, and that the model fits both the training and testing datasets well.
3.4. Stacked LSTMs with Memory Between Batches
Stacked LSTMs are an extension of the standard LSTM, which has a single hidden layer; stacked LSTMs instead have multiple hidden layers, each with multiple memory cells. Stacking LSTM layers makes the model deeper and thus better justifies the term deep learning.
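A sketch of stacking two stateful LSTM layers is given below: the first layer must return its full output sequence (return_sequences=True) so that the second LSTM layer receives a 3D input. The layer width of 4 units mirrors the earlier models; the two-layer depth and the remaining settings are illustrative assumptions, and training would follow the same manual state-reset loop as in the previous subsection.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size = 1
model = Sequential()
# First stacked layer: emit the whole sequence so the next LSTM gets 3D input.
model.add(LSTM(4, batch_input_shape=(batch_size, 1, 1),
               stateful=True, return_sequences=True))
# Second stacked layer: consumes the sequence and returns its final output.
model.add(LSTM(4, stateful=True))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
```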
Fig. S8(a) shows the PM2.5 values against time for 100 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1597×10⁻⁴ mg/L on the training dataset and 0.1724×10⁻⁴ mg/L on the testing dataset. The R² values were 0.76 and 0.40 for the training and testing datasets, respectively. Fig. S8(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets.
Fig. S9(a) shows the PM2.5 values against time for 1,000 epochs, with green indicating the training dataset and red indicating the testing dataset. The RMSE values indicated that the model had an average error of 0.1595×10⁻⁴ mg/L on the training dataset and 0.1717×10⁻⁴ mg/L on the testing dataset. The R² values were 0.76 and 0.42 for the training and testing datasets, respectively. Fig. S9(b) shows the LSTM trained on regression for the dataset and compares the predicted values (blue) with the training and testing datasets. It can be inferred that running for 100 or 1,000 epochs makes no major difference to the results, and that the model fits both the training and testing datasets well.