Though the pioneering works in the domain of Fake News Detection revolved around classical machine learning techniques, the surge in improvements recorded for the task can be attributed to the advent of Deep Learning. Exploiting the memory capacity of vanilla Recurrent Neural Networks (RNNs), coupled with refined word representation techniques, early deep learning works recorded state-of-the-art results on standard datasets. Liu et al. [10] modelled the classification of propagation paths with RNNs for the early detection of fake news, and Wu et al. [18] used these vanilla networks to offer a deeper understanding of how spurious social content propagates.
Long Short-Term Memory (LSTM) cells advance these memory networks further: their gating mechanism lets them retain a greater amount of detail across long sequences. Bangyal et al. [6] adopted various machine learning and deep learning approaches with a TF-IDF vectorization strategy and achieved the best performance with LSTMs. Works that emphasized LSTMs varied chiefly in the embedding strategy used. Ahmad et al. [1] experimented with LSTMs using traditional representations such as term frequency-inverse document frequency (TF-IDF), count vectors, character-level vectors and N-gram-level vectors on the Fake News Detection corpus. Rodríguez et al. [14] used BERT word embeddings and exploited various models such as LSTMs and CNNs. Other works used one-hot representations to convert text into numerical form before feeding it into the LSTM network.
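To make the pipeline recurring across these works concrete, the following is a minimal sketch of an LSTM-based fake-news classifier in Keras. It is not any cited author's implementation; the vocabulary size, embedding width and layer sizes are illustrative assumptions.

```python
# Minimal LSTM fake-news classifier sketch (hypothetical hyperparameters).
import tensorflow as tf

VOCAB_SIZE, EMB_DIM = 20_000, 100  # assumed vocabulary and embedding sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM),  # token ids -> vectors
    tf.keras.layers.LSTM(128),                       # gated memory cell
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(label = Fake)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```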
As a counterpart to these LSTM networks, Gated Recurrent Units (GRUs) subsequently gained prominence, as in the work of Kaliyar et al. [7] on FNDNet with Word2Vec encoding. Moreover, the comprehensive study of neural-network and transformer-based language models carried out by Al-Yahya et al. [3] used GRUs on Arabic COVID-19 fake-news corpora. A range of embedding strategies has been tested with GRUs for this task: Bajaj et al. [5] resorted to GloVe on the Signal Media news dataset, while Verma et al. [17] used fastText in their work on detecting fake news with Natural Language Processing.
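A common thread in these GRU works is the injection of pretrained vectors (Word2Vec, GloVe or fastText) through a frozen embedding layer. The sketch below, with assumed sizes and a randomly filled matrix standing in for real pretrained vectors, illustrates the pattern:

```python
# GRU classifier with a frozen pretrained embedding matrix (sketch only;
# the random matrix below stands in for real GloVe/fastText/Word2Vec rows).
import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMB_DIM = 20_000, 100          # assumed sizes
emb_matrix = np.random.rand(VOCAB_SIZE, EMB_DIM).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        VOCAB_SIZE, EMB_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(emb_matrix),
        trainable=False,                   # keep the pretrained vectors fixed
    ),
    tf.keras.layers.GRU(128),              # update/reset gates, no cell state
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```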
While the above networks attend to memory retention, the remaining challenge was to instil a deeper semantic understanding of the sentence at hand. One way to bring this about is bidirectionality, where a better understanding is achieved by scanning the query from both directions. Bahad et al. [4] experimented with Bidirectional LSTMs on structured news articles using GloVe.6B embeddings, while Kashyap et al. [11] put forth an end-to-end model for evidence-aware credibility assessment using the same network. Reddy et al. [13] built a Bi-GRU-based misinformation detector for an Urdu corpus. Appending attention to these networks, Kumar et al. [8] proposed the Fake News Net with an ensemble approach combining CNN, Bi-LSTM and attention.
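As a rough illustration of this family of models (not the cited architectures themselves), a bidirectional LSTM whose hidden states are re-weighted by dot-product self-attention can be sketched in Keras as follows; all sizes are assumptions:

```python
# Bidirectional LSTM with dot-product self-attention (illustrative sizes).
import tensorflow as tf

VOCAB_SIZE, EMB_DIM = 20_000, 100          # assumed sizes

inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)
# Scan the sequence in both directions and keep every hidden state.
h = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True))(x)
# Self-attention re-weights the hidden states; pooling yields one vector.
context = tf.keras.layers.Attention()([h, h])
pooled = tf.keras.layers.GlobalAveragePooling1D()(context)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)
model = tf.keras.Model(inputs, outputs)
```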
Beyond exploiting the advantages of the available recurrent networks, further work amalgamated convolutional networks with recurrent ones. Convolutional Neural Networks (CNNs) are primarily used for images; however, their ability to extract local features has proven cardinal for text as well, as in the experiments of Lin et al. [9], where semantic information is captured with a multi-level CNN architecture, and in the novel TI-CNN of Yang et al. [20], which uses information from both text and image to detect fake news.
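On text, a one-dimensional convolution slides filters over consecutive token embeddings, so each filter acts as a learned n-gram detector. A minimal sketch, with assumed sizes, illustrates this use of CNNs for local feature extraction:

```python
# 1-D CNN as an n-gram feature extractor over token embeddings (sketch).
import tensorflow as tf

VOCAB_SIZE, EMB_DIM = 20_000, 100          # assumed sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM),
    # Each filter spans five consecutive tokens, i.e. a learned 5-gram.
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),  # keep each filter's strongest hit
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```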
Despite experiments incorporating hybrid structures of convolutional and recurrent networks, as in [1, 2, 8, 12, 15, 16, 19, 21], the existing implementations fall short of generalizing across news from disparate domains. Moreover, the correlation between the encoding strategy and the results obtained has been difficult to establish, so no unanimous choice can be arrived at. In this work, we propose a hybrid architecture that classifies the input information as Fake or Real, and we explore suitable architectures for the decoder of the model given a CNN encoder. We also aim to understand the working of different word embeddings and to identify the one with optimal performance for the task at hand.
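A minimal sketch of the kind of hybrid explored here, with a CNN encoder feeding a recurrent decoder, is shown below; the LSTM is only one decoder candidate among those compared, and all hyperparameters are illustrative assumptions rather than our final configuration.

```python
# Hypothetical CNN-encoder / recurrent-decoder hybrid (illustrative only).
import tensorflow as tf

VOCAB_SIZE, EMB_DIM = 20_000, 100          # assumed sizes

inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)
# Encoder: convolution extracts local n-gram features, and pooling
# shortens the sequence before it reaches the recurrent decoder.
x = tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu")(x)
x = tf.keras.layers.MaxPooling1D(pool_size=2)(x)
# Decoder: an LSTM here, but a GRU or Bi-LSTM could be swapped in.
x = tf.keras.layers.LSTM(64)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
```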