In this section, most of the SA research on the Arabic language is reviewed and summarized according to the approach used and the solution proposed.
3.1 The Lexicon-Based Approach
The lexicon-based approach depends mainly on a list of sentiment terms, each with its own score: positive terms typically have a score greater than zero, negative terms a score less than zero, and neutral terms a score of zero. An example of a study that employed the lexicon-based approach is that of Al-Twairesh, Al-Khalifa, Al-Salman, and Al-Ohali (2017), who built an Arabic corpus of 17,573 Saudi tweets, extracted from a large dataset of 2.2 million Arabic tweets, for use in SA research. The corpus was then annotated manually with one of four labels: positive, negative, neutral, or mixed. The latter is a relatively new label indicating that the text contains both positive and negative opinions, such as {قارئ جرير رائع لكن الاسعار غالية} (The Jarir Reader [app] is fabulous but the prices are expensive). A set of benchmark experiments was conducted to verify that the dataset is suitable for the Arabic sentiment analysis (ASA) community. In addition, Abdulla et al. (2014) focused on lexicon-based SA for the Arabic language because it has received less attention than the other approaches. The authors proposed a new tool that handles some of the most important lexicon features, such as negation and intensification. They considered lexicon construction and SA tool design to be the main challenges of this approach, and therefore studied three different lexicon construction techniques, one manual and two automatic.
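The scoring scheme described above, together with the negation and intensification handling studied by Abdulla et al. (2014), can be sketched in a few lines of Python. The toy lexicon and the flip/boost rules below are illustrative assumptions, not the cited tool's actual implementation:

```python
# Minimal lexicon-based scorer (illustrative sketch, not the cited tool).
LEXICON = {"fabulous": 2.0, "good": 1.0, "bad": -1.0, "expensive": -1.5}
NEGATORS = {"not", "no", "never"}               # flip the next term's polarity
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}  # boost the next term's score

def score_text(text):
    tokens = text.lower().split()
    total, flip, boost = 0.0, 1.0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            flip = -1.0
            continue
        if tok in INTENSIFIERS:
            boost = INTENSIFIERS[tok]
            continue
        total += flip * boost * LEXICON.get(tok, 0.0)
        flip, boost = 1.0, 1.0  # modifiers apply to a single following term
    return total  # > 0 positive, < 0 negative, 0 neutral

print(score_text("the app is fabulous but very expensive"))  # -> -0.25
```

Note how intensification can tip the overall polarity: "fabulous" (+2.0) is outweighed by "very expensive" (1.5 × −1.5 = −2.25), so the mixed sentence comes out slightly negative.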
Adopting a different perspective, Al-Smadi, Al-Ayyoub, Jararweh, and Qawasmeh (2019) introduced a lexicon-based SA approach through in-depth research into a relatively under-explored form of SA, namely, aspect-based sentiment analysis (ABSA). The authors investigated this type of SA from two points of view: aspect category determination and aspect category polarity determination. They applied their technique to a dataset called the Human Annotated Arabic Dataset and reported that their methodology outperformed many models evaluated on the same dataset. In another approach, Tartir and Abdul-Nabi (2017) proposed a semantic method to extract user opinions from social media in both standard and dialectal Arabic. The authors also introduced a new Arabic lexicon of positive and negative keywords, which they named the Arabic Sentiment Ontology. One of the main objectives of their study was to understand the effect of sentiment terms on text polarity. Their methodology was applied to different Twitter feeds and different domains in order to measure the model's efficiency.
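The two ABSA sub-tasks mentioned above can be illustrated with a simple keyword-based sketch. The aspect keywords and polarity lexicon below are invented for illustration and are unrelated to the cited methodology, which is far more sophisticated:

```python
# Toy ABSA: determine aspect categories, then score polarity per category.
ASPECT_KEYWORDS = {"price": "PRICE", "prices": "PRICE",
                   "app": "SOFTWARE", "reader": "SOFTWARE"}
POLAR = {"fabulous": 1, "great": 1, "expensive": -1, "slow": -1}

def absa(text):
    tokens = text.lower().split()
    # Sub-task 1: aspect category determination.
    categories = {ASPECT_KEYWORDS[t] for t in tokens if t in ASPECT_KEYWORDS}
    # Sub-task 2: aspect category polarity -- attribute each polar word
    # to the nearest aspect keyword in the sentence.
    scores = {c: 0 for c in categories}
    for i, tok in enumerate(tokens):
        if tok in POLAR:
            nearest = min(
                (j for j, t in enumerate(tokens) if t in ASPECT_KEYWORDS),
                key=lambda j: abs(j - i),
                default=None,
            )
            if nearest is not None:
                scores[ASPECT_KEYWORDS[tokens[nearest]]] += POLAR[tok]
    return scores

print(absa("the reader app is fabulous but the prices are expensive"))
```

On the example sentence this yields a positive score for the SOFTWARE aspect and a negative one for PRICE, mirroring the "mixed" tweet discussed earlier, where each opinion targets a different aspect.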
Al-Rubaiee, Qiu, and Li (2016) proposed a model to classify opinions about Mubasher products given in tweets in the Saudi Arabian dialect. The model applies preprocessing and tokenization techniques before feeding the data into support vector machine (SVM) and naïve Bayes (NB) classifiers. The accuracy of the experimental results was found to be promising. In another approach, Al-Smadi et al. (2019) proposed a novel SA approach to detect the effect of social media news on Arabic readers. The study used news items collected from two of the most popular news agencies, Al-Jazeera and Al-Arabia, during the Gaza–Israel war in 2014. The research paid particular attention to ABSA and text context. Two ML classifiers were used in the study, conditional random fields and J48, and both gave good results with ABSA.
Furthermore, Farha and Magdy (2021) provided a thorough comparative overview of the most successful methods for Arabic SA. They re-implemented the majority of current approaches for Arabic SA and assessed their efficacy on three of the most common Arabic SA benchmark datasets. They also investigated the use of transformer-based language models for Arabic SA and showed that these outperform current methods, with the best implementation achieving F-scores of 0.69, 0.76, and 0.92 on the SemEval, ASTD, and ArSAS benchmark datasets, respectively. In another study, Abdul-Mageed (2019) reported on efficient lexical input models in Arabic, a language with a rather complex morphology. Specifically, the study assessed the effects of both gold and automated segmentation on the task and developed successful models that outperform the baselines: compared with the majority-class baseline over surface word types, using predicted segments improved subjectivity classification by 6.02% F1-measure and sentiment classification by 4.50% F1-measure. The study also provides an in-depth error analysis of the models' behavior, as well as a comprehensive description of subjectivity and sentiment expression in Arabic in the context of its morphological richness.
3.2 The Machine Learning Approach
Machine learning (ML) is a technique that aims to train a machine to act automatically like a human. This goal can be achieved by training the machine on actual data so that the learned patterns can be recognized when needed. In SA, ML is applied in the same way, using a standard set of algorithms. The ML approach is more widely used in SA than the lexicon-based approach for many reasons, but primarily because it generally achieves better accuracy. For example, Al Shboul, Al-Ayyoub, and Jararweh (2015) proposed a new SA approach called multi-way sentiment analysis, in which they tried to classify text using a star-rating system, where 1 star means very negative and 5 stars means very positive. They applied their approach to the Large Arabic Book Reviews dataset, extracted from an online Arabic book reviews website. The J48 decision tree, k-nearest neighbor (KNN), SVM, NB, and multinomial NB algorithms were used as ML classifiers. The results were less accurate because the proposed task is more complex than traditional SA. In contrast, Abdul-Mageed (2017) focused on the segmentation of morphologically rich languages such as Arabic and its effect on accurate subjective sentiment analysis. He created a new annotated dataset from the Arabic TreeBank and presented detailed error analyses of the behavior of several lexical models. He used an SVM classifier to measure model accuracy.
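As a concrete illustration of the ML approach, the sketch below trains a tiny multinomial naïve Bayes classifier, one of the standard algorithms listed above, on a handful of labeled token lists. The training examples are invented; real systems like those surveyed here use far larger corpora and library implementations:

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (tokens, label). Returns class counts, word counts, vocab."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Pick the label maximizing the log-posterior with Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_logp = None, float("-inf")
    for label, count in class_counts.items():
        logp = math.log(count / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

model = train_nb([
    (["great", "service"], "pos"),
    (["fabulous", "app"], "pos"),
    (["bad", "prices"], "neg"),
    (["slow", "service"], "neg"),
])
print(predict(model, ["fabulous", "service"]))  # -> pos
```

The same train/predict split applies to the heavier classifiers named in the studies above (SVM, J48, KNN); only the decision rule changes.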
In another study, Alayba, Palade, England, and Iqbal (2018) explored the advantages of convolutional neural network (CNN) and long short-term memory (LSTM) approaches for ASA. As a consequence, the accuracy obtained for ASA on multiple datasets was improved. Furthermore, a DL approach for ASA was introduced by Al-Azani and El-Alfy (2017). On two freely accessible datasets, the authors studied different variants of word embeddings (skip-gram and CBOW) as input to CNN and LSTM classifiers. The best results in terms of accuracy and other efficiency measures were obtained with the proposed combined LSTMs. To detect sentiment polarity in Arabic microblogs, Al-Azani and El-Alfy (2018) used the LSTM and its simpler variant, the gated recurrent unit (GRU). They compared the efficiency of deep learning with that of conventional machine learning approaches as a benchmark. According to the findings, the LSTM- and GRU-based variants outperformed the other classifiers. In fact, integrating deep learning models for Arabic SA is a successful alternative to conventional machine learning approaches that helps to improve accuracy.
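The GRU mentioned above simplifies the LSTM by using only two gates. A single GRU time step can be written directly from its update equations in a few lines of NumPy; the weights below are random placeholders standing in for trained parameters, so this is a structural sketch rather than any cited model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU time step: x is the input vector, h the previous hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # blend old and candidate states

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3  # e.g. word-embedding size and hidden size
params = [rng.standard_normal((d_hid, d_in)) if i % 2 == 0
          else rng.standard_normal((d_hid, d_hid)) for i in range(6)]
h = np.zeros(d_hid)
for x in rng.standard_normal((5, d_in)):  # iterate over a 5-token "sentence"
    h = gru_step(x, h, params)
print(h.shape)  # final hidden state, used as the sentence representation
```

In a sentiment classifier, the final hidden state `h` would be passed to a small output layer to predict the polarity label.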
Furthermore, Mubarak, Rashed, Darwish, Samih, and Abdelali (2020) built the largest Arabic dataset of its kind to date, complete with special tags for vulgarity and hate speech. They closely examined the dataset to determine which topics, dialects, and genders are most often correlated with offensive tweets, as well as how Arabic speakers use offensive language. Finally, they performed several experiments on the dataset. According to the findings, using an Arabic-specific BERT model (AraBERT) and static embeddings trained on tweets provided competitive results on the dataset.
In addition, Elnagar, Al-Debsi, and Einea (2020) introduced large, impartial datasets for single-label (SANAD) and multi-label (NADiA) Arabic text categorization tasks. All of these corpora are publicly accessible to the Arabic computational linguistics research community. The authors also investigated the effect of using word2vec embedding models to increase classification efficiency. Their test results demonstrated that the models performed well on the SANAD corpus, with a minimum accuracy of 91.18 percent obtained by the convolutional-GRU and a maximum accuracy of 96.94 percent reached by the attention-GRU.

Al-Smadi et al. (2019) used supervised machine learning to offer an improved method for aspect-based sentiment analysis (ABSA) of Arabic hotel reviews. Drawing on cutting-edge research, they trained a series of classifiers with morphological, syntactic, and semantic features to tackle three research tasks: T1, aspect category identification; T2, opinion target expression (OTE) extraction; and T3, sentiment polarity identification. Naïve Bayes, Bayes networks, decision tree, k-nearest neighbor (KNN), and support vector machine (SVM) classifiers were among those used. The method was tested on a benchmark dataset from the Semantic Evaluation 2016 workshop (SemEval-2016: Task 5). The results demonstrate that the supervised learning method outperforms comparable studies using the same dataset.
In addition, Abdi, Shamsuddin, Hasan, and Piran (2019) introduced a deep-learning-based approach for classifying a user's opinion in reviews (called RNSA). The RNSA employs a recurrent neural network (RNN) composed of long short-term memory (LSTM) units to take advantage of sequential processing and to overcome several shortcomings of conventional approaches, such as the loss of word order information. The experimental results indicate that combining statistical, linguistic, and sentiment-aware feature vectors, sentiment shifter rules, and word embeddings can increase the classification accuracy of sentence-level sentiment analysis.
3.3 The Hybrid Approach
The hybrid approach exploits the strengths of the lexicon-based and ML approaches by combining them, with the lexicon-based approach executed first. If the required accuracy level is not reached, the lexicon algorithm and its output are used as input for an ML algorithm. Another way of combining the two approaches is to use the lexicons and other linguistic resources to convert the text into vectors, which makes the subsequent ML step more accessible and more practical, because most ML classifiers deal directly with vectors and numbers rather than raw text.
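The second combination strategy described above, using a lexicon to turn raw text into a numeric feature vector that an ML classifier can consume, can be sketched as follows. The word lists and the three chosen features are illustrative assumptions:

```python
# Hybrid step: lexicon-derived features as input for an ML classifier.
POSITIVE = {"fabulous", "great", "good"}
NEGATIVE = {"expensive", "bad", "slow"}

def to_features(text):
    """Map a text to [positive-term count, negative-term count, net score]."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [pos, neg, pos - neg]  # this vector is what the ML stage consumes

print(to_features("the reader is fabulous but the prices are expensive"))  # -> [1, 1, 0]
```

A classifier trained on such vectors can learn, for instance, that reviews with a net score of zero are often of the "mixed" class, something a lexicon threshold alone cannot express.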
Many studies have employed a hybrid approach, such as Pandey, Rajpoot, and Saraswat (2017), who proposed a new metaheuristic model based on the hybridization of cuckoo search and k-means clustering, named CSK for short. The primary purpose of this hybridization is to find the best cluster heads in a dataset's opinionated content. The authors prepared the data and converted it into feature sets before starting the evaluation process. They applied the proposed model to several benchmark datasets, namely, Twitter-sanders-apple2, Twitter-sanders-apple3, the Twitter dataset, and Testdata.manual.2009.06.14. The experimental results were compared with the results produced by other methods, namely, differential evolution, particle swarm optimization, cuckoo search, improved cuckoo search, Gauss-based cuckoo search, and two n-grams. The comparison showed that the proposed hybrid model outperformed these traditional methods.
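The k-means component of such hybrid schemes can be sketched in plain Python on one-dimensional sentiment scores. Cuckoo search, which CSK uses to seed the cluster heads, is omitted here; the random initialization below is a stand-in for it:

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Plain k-means on 1-D sentiment scores; returns the final cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # CSK would seed these via cuckoo search
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)  # assign each point to its nearest center
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]  # recompute centers
    return sorted(centers)

# Toy sentiment scores forming negative / neutral / positive groups.
scores = [-2.1, -1.8, -2.4, 0.1, -0.2, 0.0, 1.9, 2.2, 2.0]
print(kmeans_1d(scores, 3))
```

The appeal of the metaheuristic hybrid is precisely in the first line of the loop setup: a good choice of initial centers avoids the poor local optima that plain random seeding can fall into.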
In another work focused on tweet text, Al-Twairesh, Al-Khalifa, Alsalman, and Al-Ohali (2018) presented a hybrid, dialect-independent approach by engineering a suitable feature set with backward selection. They then applied this methodology to a dataset of several Arabic dialects, particularly Saudi tweets, to test the model and the efficiency of the feature selection. They used two-way (positive and negative), three-way (positive, negative, and neutral), and four-way (positive, negative, neutral, and mixed) SA. The F1 scores showed that the two-way option was the most accurate, followed by the three-way and four-way options.
In addition, Al-Moslmi et al. (2017) introduced a new Arabic senti-lexicon containing 3,824 polar words. These words are categorized by polarity (positive and negative), polarity score, part of speech (POS), inflected forms, and dialect synsets. The authors also presented a publicly available multi-domain Arabic sentiment corpus (MASC) that contains 8,860 positive and negative texts. Texts from several domains, such as politics and software, were collected from several resources, including the Jeeran and Qaym websites, Google Play, Twitter, and Facebook. After applying the necessary preprocessing steps, they converted the data into vectors by choosing feature sets such as the number of positive words in the review or tweet. They then fed the resulting feature sets into SVM, NB, KNN, NN, and logistic regression classifiers. The experimental results showed that some features worked better with some classifiers. The present research will use the aforementioned senti-lexicon and the MASC dataset to verify the results of the proposed model.
In a recent work, Alrefai et al. (2018) analyzed the research studies on ASA and categorized them by methodology and technique into ML, lexicon-based, and hybrid approaches. The authors presented the advantages and disadvantages of each approach and how each could be refined. They also counted the research studies by ML technique and then proposed a new hybrid technique that combined the most-used classifier, the SVM, with an evolutionary algorithm (EA), one of the fastest optimization algorithms for solving complex cases. The aim of this hybridization was to tune the SVM parameters with the EA instead of adjusting them manually.