After data preprocessing, we applied deep learning and several transfer learning models to our dataset. First, we tried multiple conventional algorithms such as NB, SVC, KNN, ETC, RF, LR, BgC, XGB, AdaBoost, and GBDT. Because their performance fell short, we developed a hybrid bidirectional LSTM model, which is our proposed model. A step-by-step breakdown of the model is given below:
5.1 Embedding Layer (embedding 5)
This Embedding layer takes input sequences of length 85 and maps each word to a 100-dimensional vector. Its 1,063,400 trainable parameters are fine-tuned during training to best capture semantic relationships between words.
5.2 Bidirectional LSTM Layer (bidirectional 10)
This Bidirectional LSTM layer handles sequences of length 85, using 512 units in each direction for a 1,024-dimensional output at each time step. Its 2,510,848 trainable parameters are adjusted during training, allowing it to capture intricate patterns in sequential data.
5.3 Bidirectional LSTM Layer (bidirectional 11)
This second Bidirectional LSTM layer also processes sequences. With an output shape of (None, 512), it uses 256 units per direction, bi-directionality doubling the output width. Its 2,623,488 trainable parameters are fine-tuned during training, helping the network capture temporal patterns and deepen its understanding of sequential data.
5.4 Dense Layer (dense 10)
This layer, a Dense layer with 64 units, transforms and consolidates features with an output shape of (None, 64). It contains 32,832 trainable parameters, adjusted during training for optimal predictive performance.
5.5 Dense Layer (dense 11)
This Dense layer has an output shape of (None, 1), producing the single value used for binary classification. Its 65 trainable parameters are fine-tuned during training, giving the network its final decision output.
5.6 Total Parameters
Total trainable parameters in the entire network: 6,230,633 (23.77 MB). All parameters are trainable, so the entire network is updated during training.
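The per-layer counts above can be reproduced arithmetically. The sketch below (plain Python; the vocabulary size of 10,634 is inferred from the embedding parameter count and is an assumption, not stated in the text) checks each figure:

```python
# Sanity-check of the per-layer trainable-parameter counts quoted above.
# Vocabulary size 10,634 is inferred from 1,063,400 embedding params / 100 dims.

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

embedding = 10_634 * 100                   # 1,063,400 (embedding_5)
bilstm_1  = 2 * lstm_params(100, 512)      # 2,510,848 (bidirectional_10)
bilstm_2  = 2 * lstm_params(2 * 512, 256)  # 2,623,488 (bidirectional_11)
dense_1   = (2 * 256) * 64 + 64            # 32,832    (dense_10)
dense_2   = 64 * 1 + 1                     # 65        (dense_11)

total = embedding + bilstm_1 + bilstm_2 + dense_1 + dense_2
print(f"{total:,}")  # 6,230,633
```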
5.7 Non-trainable Parameters
None. All parameters in this network are trainable; there are no fixed or non-trainable parameters.
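Assembling the layers described in Sections 5.1 through 5.5 yields a model along the following lines. This is a reconstruction for illustration, not the exact original code: the vocabulary size is inferred from the embedding parameter count, and the ReLU and sigmoid activations are assumptions.

```python
# Hedged Keras reconstruction of the architecture described above.
# VOCAB_SIZE is inferred (1,063,400 embedding params / 100 dims);
# the activation functions are assumptions, not stated in the text.
from tensorflow.keras import layers, models

VOCAB_SIZE = 10_634
SEQ_LEN = 85

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, 100),                              # embedding_5
    layers.Bidirectional(layers.LSTM(512, return_sequences=True)),  # bidirectional_10
    layers.Bidirectional(layers.LSTM(256)),                         # bidirectional_11
    layers.Dense(64, activation="relu"),                            # dense_10
    layers.Dense(1, activation="sigmoid"),                          # dense_11
])
print(model.count_params())  # 6230633
```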
The neural network model presented has a structured architecture tailored for sequential data processing, as arises in natural language or time-series analysis. Its key components, the Embedding and Bidirectional LSTM layers, efficiently capture intricate patterns within input sequences, while the Dense layers consolidate features and produce predictions. The parameters, distributed across the layers, give the model a balanced complexity that lets it learn and generalize effectively during training. Figure 2 above illustrates the architecture, which is well suited to tasks requiring an understanding of sequential dependencies, such as the binary classification indicated by the final Dense layer's single output.
In contrast to the traditional one-directional LSTM, the BiLSTM comprises two separate LSTM structures that perform feature learning on the input sequence in both forward and reverse order. This lets the model be trained both from input to output and from output to input, which effectively strengthens the dependencies the model can exploit and improves its predictive accuracy [21]. Figure 3 presents a more precise depiction of the LSTM model. An LSTM cell is composed of three gates: an input gate, a forget gate, and an output gate, denoted at time t as i_t, f_t, and o_t, respectively.
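For reference, in the standard LSTM formulation (common notation, not tied to a specific source) the gate activations and state updates at time $t$ are:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $\sigma$ is the sigmoid function, $\odot$ is element-wise multiplication, and $c_t$, $h_t$ are the cell and hidden states.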
Bidirectional LSTM networks work by presenting each training sequence both forwards and backwards to two independent LSTM networks connected to the same output layer [22]. This means that for every point in a given sequence, the Bi-LSTM holds complete, sequential information about all points before and after it. Figure 4 illustrates the architecture of the Bi-LSTM model.
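The pairing of forward and backward states can be illustrated with a toy sketch (plain Python, not the actual implementation): each output position concatenates the forward state at that position with the realigned backward state, so every output depends on the whole sequence.

```python
# Toy illustration of bidirectional sequence processing: run a step function
# left-to-right and right-to-left, then concatenate states per position.

def bidirectional(sequence, step, units):
    """Apply `step` forwards and backwards; concatenate states at each position."""
    fwd, h = [], [0.0] * units
    for x in sequence:                 # forward pass: left to right
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], [0.0] * units
    for x in reversed(sequence):       # backward pass: right to left
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                      # realign backward states with positions
    return [f + b for f, b in zip(fwd, bwd)]   # output width = 2 * units

# A stand-in "cell": any function mapping (input, state) -> new state.
toy_step = lambda x, h: [x + s for s in h]

out = bidirectional([1.0, 2.0, 3.0], toy_step, units=2)
print(len(out), len(out[0]))  # 3 4  (3 timesteps, each of width 2 * 2)
```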
To predict the amount of energy used by the residential and commercial sectors, a hybrid model called CNN-BiLSTM, combining a CNN, a BiLSTM, and a connection layer, has been presented. The input first enters the CNN layer, where convolution and max pooling are performed to create a new feature matrix. The BiLSTM then takes this feature matrix as its input and extracts the hidden output. The hidden output is passed to the connection layer, a linear layer, which produces the final prediction results.
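A minimal Keras sketch of that CNN-BiLSTM pipeline might look as follows. The sequence length, filter count, and unit sizes here are illustrative assumptions, not values from the cited work:

```python
# Hedged sketch of a CNN-BiLSTM pipeline: Conv1D + max pooling build a
# feature matrix, a Bi-LSTM extracts the hidden output, and a final linear
# Dense layer acts as the connection layer. All sizes are illustrative.
from tensorflow.keras import layers, models

cnn_bilstm = models.Sequential([
    layers.Input(shape=(24, 1)),                      # e.g. 24 hourly readings
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),                 # new feature matrix
    layers.Bidirectional(layers.LSTM(64)),            # hidden output
    layers.Dense(1),                                  # connection (linear) layer
])
print(cnn_bilstm.output_shape)  # (None, 1)
```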