Owing to the growth in the number of internet and social media users, abusive language identification and hate speech detection have been active areas of study over the past decade. With the emergence of transformers and pre-trained language models, current solutions for detecting abusive language rely heavily on deep learning techniques. In addition, several shared tasks on low-resource languages have been organized to direct researchers' attention to these problems, and the community has developed models for them. Several of these initiatives are outlined below:
2.1 Shared tasks
In 2020, the first shared task1 on identifying abusive language in Dravidian languages such as Tamil, Malayalam, and Kannada was released. Its purpose was to engage academic researchers in building models that recognize abusive/offensive content in a code-mixed dataset of social media comments/posts in Dravidian languages. The findings of this shared task were reported in [17], where the authors also provided an overview of the dataset as well as the methodologies and results of the participating systems. This shared task stimulated interest in low-resource languages and encouraged further research. Another shared task2 was released to classify abusive comments into categories such as homophobia, misandry, counter speech, misogyny, and transphobia. Models based on machine/deep learning algorithms and transformers were proposed for this task. The results were analyzed in [18], which found that the transformer-based MuRIL model outperformed all other submissions.
Chakravarthi et al. [6] collected comments and posts in Dravidian languages (Malayalam-English and Tamil-English) from social media and released them as a shared task3. The results of this shared task on identifying offensive texts in Dravidian languages were summarized and published in [6]. The report reveals that numerous models employed transformers and pre-trained embedding systems. In addition, Chakravarthi et al. [19] released a shared task4 with the primary objective of detecting homophobic and transphobic texts in social media comments in Tamil, English, and Tamil-English, and also reported its results. Numerous pre-trained and transformer models, such as BERT, mBERT, XLM-RoBERTa, IndicBERT, and HateBERT, were utilized for this task. Moreover, the most effective approach used the pre-trained XLM-RoBERTa language model with zero-shot learning to address data imbalance and multilingualism. To detect hate speech and offensive content in both English and Indo-Aryan languages, a new shared task5 was posted. The authors of [20] gave an overview of this shared task, including descriptions of the tasks, the data, and the results. Thus, shared tasks are intended to motivate researchers to address and advance problems in abusive/offensive text recognition, and to draw attention to the need for further study of abusive content identification in under-resourced languages.
2.2 Deep learning models
Since machine learning based models depend on a well-defined feature extraction strategy, models with automated feature extraction have come into use. These models increasingly combine text representation and deep learning approaches to detect abusive comments and enhance performance. We provide a brief summary of such models below.
Ashraf et al. [21] investigated YouTube comments for the identification of offensive comments. Several baseline machine learning models, including Multi-Layer Perceptron (MLP), AdaBoost, Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT), as well as two neural network models, namely Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), were tested in this study, producing F1-scores of 91.96% for AdaBoost and 91.68% for CNN. Lee et al. [22] examined neural network-based models, including CNN, Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) models, in addition to conventional machine learning techniques, for hate speech and abusive language detection on Twitter. A bidirectional GRU network based on word-level features and Latent Topic Clustering modules outperformed the other models with an F1-score of 80.5%. Emon et al. [23] tested machine and deep learning algorithms such as Linear Support Vector Classifier (Linear SVC), LR, Multinomial Naive Bayes (MNB), RF, Artificial Neural Network (ANN), and RNN with Long Short-Term Memory (LSTM) for identifying abusive Bengali texts. With an accuracy of 82.20%, the RNN with LSTM outperformed the other algorithms. In another attempt [24], transformer-based deep neural network models such as BERT and ELECTRA were used and evaluated on a new dataset of 44,001 comments from Facebook posts; BERT and ELECTRA achieved test accuracies of 85% and 84.9%, respectively. Sharif et al. [25] proposed machine learning models (LR and SVM), deep learning techniques (LSTM and LSTM + Attention), and transformers (m-BERT, Indic-BERT, and XLM-R) to find offensive texts in the shared task1 dataset. The authors showed that XLM-R performed best for Tamil and Malayalam comments, while m-BERT achieved the best score for Kannada comments.
Around 6,175 user-generated comments in code-mixed Kannada were gathered by Hande et al. [26] from YouTube and classified as either hope speech or not-hope speech. They also developed DC-BERT4HOPE, a dual-channel model that uses the English translation of the comments as additional training input to strengthen hope speech recognition. This method achieved a weighted F1-score of 75.6%, outperforming the other models compared in their work. Pitsilis et al. [27] proposed a detection strategy based on an ensemble of RNN classifiers that integrates user-related information, such as a user's tendency toward racism or sexism. The user-related information and word frequency vectors derived from the text were fed to the classifiers, which were evaluated on a public corpus of 16k tweets; the results showed that the proposed classifiers distinguish racist and sexist messages from normal text better than existing state-of-the-art algorithms. The authors of [28] examined various machine learning techniques for identifying hope speech in short, informal texts written in English, Malayalam, and Tamil. They showed that, given enough training data, even very simple baseline algorithms perform reasonably well on this task; however, cross-lingual transfer learning with XLM-RoBERTa was found to be the best-performing approach. Glazkova et al. [29] created models for the 2021 shared task5 on Hate Speech and Offensive Content Identification in English and Indo-Aryan languages. The authors used a one-versus-the-rest technique based on Twitter-RoBERTa to identify hateful, offensive, and profane comments, obtaining F1-scores of 81.99% and 65.77% for the two subtasks, respectively. For the Marathi tasks, they applied a Language-agnostic BERT Sentence Embedding (LaBSE) system, which produced an F1-score of 88.08%. Steimel et al. [30] proposed machine/deep learning models to classify English and German comments as abusive or not abusive. The authors experimented with several promising architectures, including fully connected neural networks and CNNs, combined with different word embeddings, including BERT and Flair embeddings. This work [30] concluded that a multilingual optimization of classifiers is not possible, even in settings where comparable datasets are used.
El-Alami et al. [31] presented a transfer learning-based method for classifying offensive language in multilingual texts. The approach was built on transformer models, including BERT, mBERT, and AraBERT, fine-tuned for the multilingual offensive language detection problem. The findings of this study demonstrated that the proposed models were more accurate and attained higher F1-scores. Sundar et al. [32] proposed a multilingual model that uses a stacked encoder architecture to automatically detect hope speech. Because the dataset consisted of code-mixed YouTube comments, language-independent cross-lingual word embeddings were used. An empirical analysis was also carried out, and the proposed architecture was compared against traditional methods, transformers, and transfer learning methods; it achieved F1-scores of 61% for Tamil and 85% for Malayalam. Apart from developing classification models, researchers have also constructed benchmark datasets in various research efforts [33–35]. These efforts have produced publicly available datasets for hope speech detection, abusive text identification, and related tasks, along with a set of metrics for evaluating and categorizing a dataset. These datasets will spur further research.
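Most of the systems surveyed above follow a common fine-tuning recipe: a pre-trained transformer is topped with a classification head and all of its weights are updated on the labelled comments. The following is a minimal sketch of that recipe, assuming the Hugging Face transformers library; the model name, label count, and example data are illustrative and not taken from any of the cited works.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative checkpoint: any of the surveyed models (mBERT, XLM-R, etc.) fits here.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # e.g., abusive vs. not abusive
)

# Toy batch standing in for labelled social media comments.
batch = tokenizer(["example comment"], padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1])

# Fine-tuning: every pre-trained weight is trainable and updated.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```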
To summarize, we have surveyed the use of fine-tuned deep learning-based transformer models for detecting abusive comments. Despite the considerable amount of work on abusive comment detection using fine-tuned transformer models, we find that adapter-based models have not yet been tried. Adapters serve the same purpose as fine-tuning, but do so by adding small layers to the pre-trained model and updating only the weights of these additional layers while keeping the pre-trained model's weights frozen; fine-tuning, in contrast, updates the weights of the pre-trained layers themselves. As a result, adapters are significantly more time- and storage-efficient than fine-tuning. From the literature survey, we understand that no existing work uses an adapter-based transformer model, and we believe that such models will improve efficacy. In this work, therefore, we integrate adapters into transformer models and evaluate their performance.
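To make the contrast concrete, the sketch below illustrates the bottleneck adapter idea in plain PyTorch: the pre-trained weights are frozen and only the small adapter projections (and a task head, omitted here) receive gradient updates. The adapter placement, bottleneck size, and model name are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, add a residual connection."""
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

base = AutoModel.from_pretrained("xlm-roberta-base")  # illustrative checkpoint
for param in base.parameters():
    param.requires_grad = False  # pre-trained weights stay frozen

adapter = BottleneckAdapter(base.config.hidden_size)

# Only the adapter's small set of parameters is handed to the optimizer,
# which is what makes adapters cheaper to train and store than full fine-tuning.
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```

In practice, such a module is inserted into each transformer layer (typically after the feed-forward sub-layer), and libraries such as AdapterHub's adapter-transformers automate this insertion.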