As stated earlier, two models are proposed in this work: one for supervised learning on labeled data and the other for unsupervised learning on unlabeled data. The labels in a dataset are used to predict one attribute of the data from the others. Labeling, for example, is the process of determining the sentiment of a data sample (positive, negative, or neutral) based on the words in the text or other information. Figure 1 illustrates the difference between labeled and unlabeled data.
Supervised learning is a predictive machine learning approach that works with annotated data and forecasts the label of incoming data samples based on previously seen examples. Unsupervised learning, on the other hand, works with unlabeled data, i.e., a dataset that has features but no prediction target. A dataset is a logically organized collection of data, usually tied to a particular piece of research; numeric, bivariate [28], multimodal [29], and qualitative [30] data sets are some examples. A dataset may, for instance, record each student's quarterly grades in a specific curriculum. The datasets used in this research are listed in Table 3.
Table 3 – Details of datasets used

| Dataset | Name | Labeled/Unlabeled | Balanced/Imbalanced | Total count of data | Class of data | Specifications |
|---|---|---|---|---|---|---|
| Dataset 1 [31] | Sentiment140 | Labeled | Balanced | 1,600,000 | Binary | Positive – 50,000; Negative – 50,000 |
| Dataset 2 [32] | IMDB review | Labeled | Balanced | 50,000 | Binary | Positive – 12,500; Negative – 12,500 |
| Dataset 3 [33] | Clothing | Labeled | Imbalanced | 22,640 | Binary | Positive – 18,539; Negative – 4,101 |
| Dataset 4 [34] | Amazon MP3 dataset | Labeled | Imbalanced | 28,469 | Multi-class | Positive – 21,987; Negative – 6,482; Neutral – 2,531 |
| Dataset 5 | IMDB review | Unlabeled | Balanced | 50,000 | Binary | NA |
| Dataset 6 | Clothing | Unlabeled | Imbalanced | 22,640 | Binary | NA |
| Dataset 7 [35] | Unlabeled dataset | Unlabeled | Imbalanced | 20,500 | Multi-class | NA |
The following subsections elaborate on the methods suggested for the above two scenarios in detail.
a) Supervised Sentiment Analysis on labeled datasets
The main aim of this first part of the research is to unearth the subtleties of the BERT model [9] and to reduce its computational cost and time without degrading the performance of the original model. For researchers with limited computational resources, BERT's [9] lengthy training and inference times are a significant impediment [36]. Beyond merely cutting training time, improving training speed allows experiments to be repeated more quickly and, eventually, leads to faster solutions. Figure 2 depicts a comprehensive flowchart of the proposed methodology.
i) Basic data cleaning
The first stage of the proposed model entails gathering relevant datasets and preparing them for the subsequent modules. Given the substantial amount of unstructured information available on the web today, extracting precise sentiments from unclean data is a difficult undertaking. Natural Language Processing plays a critical role here by analyzing and interpreting large volumes of raw text. The first step in preparing the datasets is to remove noise from the data. Noise is defined here as special characters, brackets, square brackets, extra white space, URLs, and punctuation. BeautifulSoup [37], a Python module, has been used here to remove such noise. The next step is to use the NLTK Tokenizer package to split the text into tokens. The third step focuses on making all of the words consistent: every word is converted to lowercase. After this, stopwords are removed from the corpus to further reduce the dimensionality of the data. Stopwords are English words that do not add meaningful content to a sentence [38]; they can be dropped without jeopardizing the meaning of the statement. Words like "are", "he", and "the", for example, are treated as stopwords in datasets intended for text processing. Following the removal of stopwords, the remaining pre-processing options are stemming and lemmatization. Stemming removes suffixes from words in order to reduce them to their root form, whereas lemmatization identifies the inflected form of a word and maps it to its base form. "Moving," for example, is stemmed to "mov," while "worst" is lemmatized to "bad." In this work, the lemmatized version of the words has been used for further processing.
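A minimal sketch of this cleaning pipeline is given below, assuming BeautifulSoup and NLTK are installed and the relevant NLTK resources (punkt, stopwords, wordnet) have been downloaded; function names such as clean_text are illustrative and not taken from the authors' implementation.

```python
# Illustrative cleaning pipeline: noise removal, tokenization, lowercasing,
# stopword removal, and lemmatization, as described in the text.
import re

from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Requires: nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(raw: str) -> list[str]:
    """Strip markup and noise, tokenize, lowercase, drop stopwords, lemmatize."""
    text = BeautifulSoup(raw, "html.parser").get_text()   # remove HTML markup
    text = re.sub(r"https?://\S+", " ", text)              # remove URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)                # drop punctuation, brackets, specials
    tokens = word_tokenize(text.lower())                    # tokenize and lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]      # remove stopwords
    return [lemmatizer.lemmatize(t) for t in tokens]        # lemmatize remaining words

print(clean_text("<p>He is moving to the WORST place! http://example.com</p>"))
```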
ii) Fine-tuning of the BERT-base model
Once the basic pre-processing is completed, the inputs are passed to the BERT model [9] for classification. BERT [9] has been acclaimed as one of the most groundbreaking innovations in the NLP arena since its introduction in 2018. It is a deep learning model based on the transformer architecture. The transformer-based design enables BERT [9] to read input bidirectionally, attending to both the left and right context simultaneously. Every output element is connected to every input element in the BERT model [9], and the weightings between them are computed dynamically based on their association. This bidirectionality is primarily due to the incorporation of Transformers, which can process information in any order. Transformers contribute to the design and effectiveness of the BERT model [9] by relying on the self-attention mechanism and permitting training on massive amounts of data. The BERT model uses special tokens and inputs such as [SEP], [CLS], token IDs, mask IDs, segment IDs, and position embeddings. BERT is designed to accept one or two sentences as input; it uses the [SEP] token to separate two sentences and the [CLS] token to classify them according to the task at hand. BERT [9] is effective for task-oriented problems, can produce excellent outcomes in several languages, and fine-tuning BERT [9] yields a significant boost in performance. The BERT-base architecture [9] has 12 encoder layers and roughly 110 million trainable parameters, while the larger BERT-large architecture has 24 encoder layers and about 340 million parameters. However, despite its consistently excellent results, BERT [9] has a number of disadvantages, most of which are related to its magnitude. BERT's massive size stems from its structure and volume: there are a great many weights to update and the computation is complicated, so training is costly. Although fine-tuning BERT [9] is always an option, fine-tuned models raise issues of their own, such as non-convergence of the outputs and over-fitting. Traditional strategies for shrinking BERT [9] include changes to the basic design, modifying layers based on their priority, effective optimization policies, shrinking or thresholding models, and model distillation. As researchers started to explore BERT in a myriad of tasks, new models such as ALBERT, RoBERTa, and ELECTRA [39] emerged to address specific challenges and reduce the complexity of the basic BERT model [9]. In order to improve the training speed of the proposed method, the authors suggest freezing the first layers of BERT as a means of optimizing BERT [9]. Freezing the early layers of BERT [9] reduces the number of parameters to be updated, which yields a speed boost and lower memory usage during training. During the initial phases of learning, the fundamental layers are frozen and are only unfrozen once the model has stabilized. Choosing which layers to freeze during training is a delicate task: it should be done in a manner that minimizes over-fitting and prevents large gradient changes from destroying the pre-trained properties. Another key consideration during freezing is to keep an eye on the dataset size and learning efficiency. The objective of this work is to freeze the initial encoder layers of the BERT model [9], because the early layers of deep learning models are assumed to capture generic properties while the upper layers are more task-oriented.
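To make the role of these special inputs concrete, the short sketch below encodes a sentence pair and prints the resulting token IDs, mask IDs, and segment IDs. The Hugging Face transformers library is assumed here purely for illustration; the paper does not specify which BERT implementation was used.

```python
# Illustrative only: shows how a sentence pair is packed as
# "[CLS] sentence A [SEP] sentence B [SEP]" with the associated IDs.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer(
    "the movie was great",
    "the acting felt flat",
    padding="max_length",
    truncation=True,
    max_length=16,
    return_tensors="pt",
)

print(encoded["input_ids"])       # token IDs, including [CLS], [SEP], and [PAD]
print(encoded["attention_mask"])  # mask IDs: 1 for real tokens, 0 for padding
print(encoded["token_type_ids"])  # segment IDs: 0 for sentence A, 1 for sentence B
```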
The BERT-base model [9] has been used in this experiment; its specifications and its 12-layer encoder architecture are illustrated in Table 4 and Fig. 3, respectively.
Table 4 – BERT-base [9] specifications

| Model | Parameters | Layers | Hidden units |
|---|---|---|---|
| BERT-base | 110 million | 12 | 768 |
The input to the BERT-base model [9] consists of a sequence of tokens that are mapped to feature vectors before being processed by the model. The output is a sequence of vectors, each of which corresponds to the input token at the same position. The output of each encoder layer can be regarded as a sequence of context-specific embeddings, so in the forward pass each layer receives as input the sequence of contextual embeddings produced by the layer below it. As the input progresses through the stack, different aspects of the source are extracted, and each subsequent layer builds on the patterns revealed by the preceding one. The motivation for this work lies in the fact that stacking encoder layers in this manner can easily lead to over-fitting. Hence, the first n encoder layers are frozen, but not the embedding layer. Any information lost by freezing these layers is then compensated for by channelling the resulting projections through the fuzzy logic module.
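A minimal sketch of this freezing scheme is shown below, again assuming the Hugging Face transformers implementation of BERT-base; the value of n is an illustrative placeholder rather than the setting used in the experiments.

```python
# Freeze the first n encoder layers of BERT-base while leaving the embedding
# layer (and the remaining encoder layers) trainable.
from transformers import BertModel

N_FROZEN = 4  # illustrative choice of n, not the paper's tuned value

model = BertModel.from_pretrained("bert-base-uncased")

for layer in model.encoder.layer[:N_FROZEN]:
    for param in layer.parameters():
        param.requires_grad = False   # frozen: excluded from gradient updates

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")
```

Freezing in this way excludes the selected layers from gradient updates, while the embedding layer and the upper, task-oriented encoder layers continue to be fine-tuned.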
iii) Type-2 fuzzy module
Fuzzifier, rules, inference engine, and defuzzifier are the components of a fuzzy logic system (FLS), also referred to as a fuzzy inference system or a fuzzy controller [40]. Zadeh introduced the notion of Type-2 fuzzy sets as an extension of the conventional, Type-1, fuzzy set. Type-2 fuzzy sets have membership grades that are themselves fuzzy [41]. A Type-2 fuzzy membership grade can be any subset of the primary membership, and a secondary membership corresponding to each primary membership weighs the possibilities for that primary membership. For instance, consider the quandary of "neutral" being defined as either 10 meters (m) or 1,000 centimeters (cm) for a certain distance. Both phrases signify the same thing, but depending on this threshold of 10 meters, different people may have different interpretations of neutral. Similarly, how close a measurement must be to the threshold to count as neutral may also vary from person to person: 10.1 m, which just crosses the threshold, or 9.8 m, which is approaching it, may both be rounded off to the neutral value. If the opinions of many people are taken into account, the neutral distance eventually becomes increasingly fuzzy, and every such interpretation of neutral changes the fuzzy set, which then resembles a three-dimensional function. This idea, that concepts mean different things to different people, is what motivates Type-2 fuzzy logic. A Type-2 fuzzy set is simply the aggregate of all the points that make up the set along three axes.
Figure 4 represents the membership function of a general Type-2 fuzzy set along three axes. The uncertainty in the primary memberships of a Type-2 fuzzy set is represented by the footprint of uncertainty (FOU), the region between the upper and lower membership functions. The membership degree is thus a fuzzy set rather than a precise number, and the value of the membership function at each point of the FOU, which is a two-dimensional region, forms the third dimension. While classical Type-1 fuzzy logic struggles to handle such degrees of uncertainty, Type-2 fuzzy logic systems (T2FLSs) handle them efficiently. In comparison to Type-1 fuzzy sets, this added dimension provides more freedom for representing ambiguity. To retain the theoretical benefits of general Type-2 fuzzy sets while keeping processing tractable, the authors use interval Type-2 fuzzy sets in this work. The working of an interval Type-2 fuzzy logic system comprises the following steps:
a) The initial phase in the operation of an interval Type-2 fuzzy logic system is fuzzification, in which the crisp inputs are transformed into interval Type-2 fuzzy input sets.
b) Subsequently, the membership functions are generated. Both the primary and secondary membership functions have in this instance been implemented using the triangular membership function [43], as shown in Fig. 5.
c) As with the membership functions, the rule base is the same as in a Type-1 fuzzy logic system. A set of seven rules has been considered in this case, as mentioned in [12]; the rules are selected depending on the categorization of the datasets (binary or multi-class).
d) The interval Type-2 fuzzy input sets, together with the rule base, are fed into the inference engine, which produces output Type-2 fuzzy sets.
e) The inference engine then combines all the fired rules to produce the output of the interval Type-2 fuzzy logic system.
f) The Type-2 fuzzy outputs are subsequently type-reduced to Type-1 fuzzy sets.
g) Finally, defuzzification is performed to produce crisp output sets.
The intermediate extremities method has been utilized to demonstrate the affinity for a label, as stated later in the findings section.
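For illustration, the toy sketch below shows how a crisp input is fuzzified against a pair of triangular upper and lower membership functions and how the resulting interval grade can be collapsed to a single value. The membership parameters, the single fuzzy set, and the simple averaging type-reduction are assumptions made here for brevity; they are not the seven-rule base or the intermediate extremities method used in this work.

```python
# Toy interval Type-2 fuzzification with triangular membership functions.
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with corners a <= b <= c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def interval_t2_membership(x: float) -> tuple[float, float]:
    """Return the (lower, upper) membership of x for an illustrative fuzzy set.

    The region between the two triangles is the footprint of uncertainty (FOU).
    """
    upper = triangular(x, 0.2, 0.6, 1.0)          # upper membership function
    lower = 0.8 * triangular(x, 0.3, 0.6, 0.9)    # lower membership function
    return min(lower, upper), upper

def type_reduce(lower: float, upper: float) -> float:
    """Collapse the interval grade to a single (Type-1) value by averaging."""
    return (lower + upper) / 2.0

score = 0.55                                      # crisp input, e.g. a sentiment score
lo, hi = interval_t2_membership(score)            # fuzzification -> interval grade
print(f"FOU interval: [{lo:.3f}, {hi:.3f}], type-reduced: {type_reduce(lo, hi):.3f}")
```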
b) Unsupervised Sentiment Analysis on unlabeled datasets
The second part of this study focuses on extracting sentiments from unlabeled datasets. Because most real-world datasets are not labeled, building a model that effectively uncovers sentiments from such datasets is a sensible endeavor. The key challenge in these tasks is determining the best label for accurate model prediction. The successful use of BERT [9] for computing text embeddings has been demonstrated in this case as well. As previously noted, BERT's size [9] poses a problem in terms of both time and resources, necessitating fine-tuning in order to lower the model's computational complexity. Here, only the first n layers of the BERT model [9] have been used, so that the time and space costs incurred by the original BERT architecture [9] do not affect the prediction model. However, as the datasets in this scenario are unlabeled, considering only a subset of the layers in the BERT model [9] affects the effectiveness of the original BERT [9] to some extent. As a result, a neural network with one hidden neuron has been placed on top of BERT [9] during classification to help achieve good results in identifying sentiments from unlabeled data, reducing the loss caused by the deliberate layer selection. The approach for extracting sentiments from unlabeled datasets is depicted in Fig. 6.
The first step in identifying sentiments from unlabeled data is to clean or pre-process the data so that it is suitable to be passed as input to the BERT model [9] for determining word embeddings; pre-processing covers the basic procedures for transforming raw data into cleaner datasets. As previously indicated, the BERT-base model [9] has been employed for both experiments, with 12 transformer blocks, a hidden size of 768, and 12 attention heads. Furthermore, any input to BERT [9] must correspond to the basic tokens it already uses for its training tasks. After the embedding is completed, the target variables are chosen for the clusters: the desired labels for a binary classification task would be 'Positive' and 'Negative', while the intended labels in a multi-class task could be 'Positive', 'Negative', and 'Neutral'. Finally, the affinities between the vectorized forms of the words are calculated, and each sample is assigned to the cluster with the closest proximity. Once the predictions have been obtained from the model, their quality is evaluated using several measures.
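The sketch below illustrates this labelling idea under several simplifying assumptions: the Hugging Face transformers library is used, the layer index and the seed words for the target labels are arbitrary placeholders, and the single-hidden-neuron network described above is omitted for brevity.

```python
# Assign a label to an unlabeled review by comparing its embedding, taken from
# an early encoder layer of BERT-base, with embeddings of the target labels.
import torch
from transformers import BertModel, BertTokenizer

N_LAYERS = 4  # use only the first n encoder layers (illustrative value)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pooled sentence embedding taken from the n-th encoder layer."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # embedding output + 12 layers
    return hidden_states[N_LAYERS].mean(dim=1).squeeze(0)

# Prototype embeddings for the target labels (seed words are placeholders).
targets = {"Positive": embed("good excellent wonderful"),
           "Negative": embed("bad terrible awful")}

review = "the plot was dull and the acting was painful to watch"
review_vec = embed(review)
label = max(targets, key=lambda k: torch.cosine_similarity(
    review_vec, targets[k], dim=0).item())
print(label)
```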