An integrated fuzzy neural network with topic-aware auto-encoding for sentiment analysis

Recent advanced deep learning architectures, such as neural seq2seq and transformer models, have demonstrated remarkable improvements in multi-typed sentiment classification tasks. Although recent transformer-based and seq2seq-based models successfully capture rich contextual information from texts, they still pay little attention to incorporating the global semantic information that can substantially improve the performance of downstream SA tasks. Moreover, emotional expressions of users normally take the form of natural human-written text, which contains many noises and ambiguities that impose great challenges on textual representation learning as well as sentiment polarity prediction. To meet these challenges, we propose a novel integrated fuzzy neural architecture with a topic-driven textual representation learning approach for handling the SA task, called TopFuzz4SA. Specifically, in the proposed TopFuzz4SA model, we first apply a topic-driven neural encoder–decoder architecture, incorporating latent topic embeddings and an attention mechanism, to learn both the rich contextual and the global semantic information of the given textual data. Then, the obtained rich semantic representations of texts are fed into a fused deep fuzzy neural network to effectively reduce feature ambiguity and noise, forming the final textual representations for the sentiment classification task. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed TopFuzz4SA model compared with contemporary state-of-the-art baselines.


Introduction
Normally considered an important task in the natural language processing (NLP) domain, sentiment analysis (SA) (Zhang et al. 2021, 2018; Do et al. 2019) aims to automatically analyse the underlying opinions/emotions towards specific entities (Abboud and Tekli 2019; Chen et al. 2019), e.g. news, products, services, etc. Recently, thanks to the tremendous development of the Internet, a huge number of people are frequently and actively involved in multiple activities on multi-typed digital platforms, such as social networks and e-commerce platforms. As a result, a large amount of data is generated every day, containing emotional expressions, opinions, attitudes, etc. of people about the entities they have interacted with. These emotional expressions are valuable resources whose analysis can help companies/organizations gain deep insights into their products or services. Thus, sentiment analysis, aka opinion mining, is considered an important task of NLP due to its primitive applications in multiple areas. To identify the sentiment polarity of a user towards a specific product/service, multiple machine learning (ML)-based techniques have been applied to formulate, model and characterize the underlying emotional aspects from the raw data, normally in the form of texts (e.g. comments, reviews, micro-blogs, etc.). In fact, textual data in the form of natural language is a common medium for users to express opinions, emotions or feelings about the entities they interact with on online platforms. In the past, multiple studies formulated the SA task as a text classification problem in which handcrafted textual representation learning techniques (e.g. BOW and its family) and ML-based classification algorithms (SVM, logistic regression, etc.) were applied to predict the sentiment polarity from texts.
Specifically, a SA task is designed as a text analysis and binary classification model that categorizes a given user's comments/reviews into positive or negative emotional states. However, these traditional techniques encountered limitations related to the ambiguity and sparsity of short textual data, which hinder the effective exploitation of sentimental aspects. Recently, deep learning has shown promising performance in multiple domains of computer science, including NLP. The utilization of neural network architectures in text analysis has dramatically alleviated the effort of handcrafted feature engineering in text representation learning. Advanced deep neural architectures, such as recurrent neural networks (GRU, LSTM, Bi-LSTM, etc.) (Cheng et al. 2017; Rao et al. 2018) and convolutional neural networks (CNN) (Huang et al. 2017), have been utilized to deeply learn and characterize the sentimental aspects of given texts for the SA task. Complex deep neural network-based approaches (Cheng et al. 2017; Rao et al. 2018; Huang et al. 2017) have demonstrated state-of-the-art performance in multiple downstream sub-tasks of SA (aspect-level, sentence-level, document-level, multiple domains, etc.). However, previous deep learning-based models still have drawbacks in capturing the rich semantic and contextual information of texts needed to fine-tune well for multiple downstream SA sub-tasks. Moreover, the application of deep neural architectures to sequential textual representation learning with an SA-oriented training objective might severely suffer from extracted feature ambiguity and noise, which can dramatically reduce the accuracy of sentiment polarity prediction. Recently, with the appearance of pre-trained language models, aka transformers (e.g. ELMo (Zhang et al. 2021), GPT-2 (Zhang et al. 2018), BERT (Do et al. 2019), etc.), multiple NLP tasks, including sentiment classification, have been significantly improved in both efficiency and accuracy. These pre-trained language architectures effectively capture rich syntactic and contextual information, having been carefully trained beforehand on large-scale text corpora of a specific language. For the SA task, the utilization of pre-trained language models such as the well-known BERT can sufficiently characterize sentimental features from input texts and achieve state-of-the-art accuracy on various downstream sub-tasks of SA, such as the recent works BERT4ABSA (Abboud and Tekli 2019), ABSA-BERT-pair (Chen et al. 2019), SentiLARE, etc. These pre-trained BERT-based models have demonstrated significant improvements in the accuracy of sentiment polarity identification under context/aspect-varied textual representation learning. However, recent pre-trained BERT-based SA models still suffer limitations, mostly in the capability of capturing the global semantic information of texts to better fine-tune for the SA task, as well as in the elimination of feature noise and ambiguity from the obtained text representations.

Motivations and our contributions
In this paper, we propose an integrated fused fuzzy deep neural network with a topic-driven transformer-based encoder-decoder architecture to handle multiple downstream tasks of SA, called TopFuzz4SA. Figure 1 illustrates the overall architecture of our proposed TopFuzz4SA model. First of all, to effectively learn the latent representations of distributed latent topics in a given text corpus, we apply a neural topic modelling architecture with a variational auto-encoding mechanism, mainly inherited from previous works (Miao et al. 2017; Srivastava and Sutton 2017), to capture the global semantic information of texts. Then the learnt topic representations are used to facilitate the self-attention mechanism of the neural encoder-decoder architecture. In our TopFuzz4SA model, to deal with challenges related to context-varied and sentimental aspect-diversified understanding of a given text corpus, a deep neural transformer-based encoder-decoder architecture is applied to comprehensively capture complicated syntactic and sequential features from the input texts. Moreover, the integrated topic-aware attention mechanism within the transformer-based network also helps to capture both globally salient latent topics and rich contextual relationships between words, enriching the quality of the textual representations. Then, the obtained rich semantic representations of input texts are fed into a fused fuzzy deep neural network (FDNN) (Cheng et al. 2017) to produce the final high-quality representations of sentences/documents for the sentiment classification problem. Motivated by the use of the fuzzy learning concept for data uncertainty and noise reduction in previous works (Deng et al. 2016; Nguyen et al. 2018), we apply a combined fuzzy and deep neural architecture with a fusion mechanism to learn and merge the deep textual representations obtained in previous steps and pass them through a fully connected layer with a softmax classification function to perform sentiment polarity prediction.
In general, mainly inspired by previous deep neural approaches to the sentiment analysis problem, our contributions in this paper can be summarized as threefold. First of all, we apply a neural topic modelling architecture with an auto-encoding mechanism (Rao et al. 2018) to efficiently learn the representations of the latent distributed topics over input texts. The learnt latent topic embedding vectors are used to facilitate the attention mechanism of our designed transformer-based encoder-decoder architecture to effectively extract topic-oriented global and sequential semantics of texts.
Secondly, the previously obtained topic-oriented rich semantic representations of input texts are then fed into a FDNN-based architecture to significantly alleviate feature noise and ambiguity before being passed to a fully connected layer for the SA task. The proposed FDNN architecture in our paper is designed with two separated components: a fuzzy learning-based component and a deep learning neural component. These two components simultaneously capture latent features of the input embedding vectors from both fuzzy and deep learning perspectives. Then, the learnt latent features of the two components are fused together by a fusion mechanism to produce the final representations of the input texts.
Finally, we conducted extensive experiments on benchmark SA datasets to demonstrate the effectiveness of our proposed ideas for multiple downstream sentiment classification tasks. The experimental results show that our proposed TopFuzz4SA outperforms recent state-of-the-art baselines on the SA task.

Differences between our proposed TopFuzz4SA and recent techniques

In general, most recent state-of-the-art transformer-based methods for the SA task, like BERT4ABSA (Abboud and Tekli 2019), ABSA-BERT-pair (Chen et al. 2019), SentiLARE, etc., are majorly designed to capture sentence-level sequential representations of texts to effectively predict the sentiment polarity. Thus, they might fail to preserve global semantic information such as topics within the representation learning process to better fine-tune for multiple downstream SA tasks. Moreover, recent mixture multitask-driven text embedding techniques also suffer limitations regarding the ambiguity and noise in learnt textual features, which might lead to downgrades in accuracy under different task-driven training objectives, including sentiment analysis. Thus, to cope with these challenges, in this paper we propose a novel global-semantic-enhanced deep fuzzy neural auto-encoding mechanism, called TopFuzz4SA. Different from previous transformer-based SA techniques, our proposed model combines neural topic modelling with sequential textual embedding under a deep neural AE to effectively learn joint global semantic and sequential representations of texts. Then, these rich semantic textual embeddings are fed into a fused fuzzy neural network architecture to reduce latent feature noises and ambiguities, and are later utilized for better fine-tuning on various downstream tasks of the SA domain.
The remaining content of our paper is organized into four sections. In the second section, we briefly review recent works on the SA task and discuss their pros and cons. Next, we formally present the methodology and implementation of our proposed TopFuzz4SA model in the third section. In the fourth section, we conduct extensive experiments on benchmark datasets to demonstrate the effectiveness of our proposed model compared with recent state-of-the-art baselines. In the last section, we conclude our work and highlight some potential improvements for future work.

Related works
In this section, we briefly discuss recent achievements and challenges of the sentiment analysis task, as well as state-of-the-art models related to our work in this paper.

Recent achievements and existing challenges in sentiment analysis
To deal with recent challenges of rich semantic representation learning, several advanced neural sequence-to-sequence (seq2seq)/auto-encoding-based architectures (Sutskever et al. 2014; Failed 2015; Vaswani et al. 2017) with attention mechanisms have been proposed to effectively capture the sequential representations of texts. Among significant achievements in NLP, attention is considered the most important framework for facilitating multiple RNN-based textual representation learning models. For the sentiment classification task, multiple integrated attention LSTM/Bi-LSTM approaches (Huang et al. 2017), such as the well-known Bi-LSTM + CRF (Peters et al. 2018), Sentic-LSTM (Failed 2018) and CDSC (Devlin et al. 2019), have been utilized to address the challenge of multiple emotional aspect learning objectives for the SA task. In fact, the neural attention framework has recently become an effective mechanism for most proposed SA models. However, contemporary attention-integrated RNN-based architectures are still insufficient to capture the diversity in categories/attributes of texts (aka sentimental aspects), which might implicitly express a user's different emotional expressions towards a specific entity. Moreover, relying on an attention-based mechanism alone to make the system fully concentrate on sentimental aspects is also considered ineffective and time-consuming.
As discussed above, pre-trained transformer-based language models (e.g. ELMo (Zhang et al. 2021), GPT-2 (Zhang et al. 2018), BERT (Do et al. 2019), etc.) have significantly improved the sentiment classification task in both efficiency and accuracy, with notable BERT-based works such as BERT4ABSA (Abboud and Tekli 2019), ABSA-BERT-pair (Chen et al. 2019) and SentiLARE demonstrating significant improvements in sentiment polarity identification under context/aspect-varied textual representation learning. In fact, most recent pre-trained BERT-based opinion mining models, such as BERT4ABSA (Abboud and Tekli 2019) and SentiLARE, mostly focus on enriching the captured latent features of semantic and syntactic relationships between words in a given text corpus to correctly identify the emotional aspects and predict the sentiment polarity, rather than on global semantic information such as the topics/latent semantic structures of texts.
Moreover, due to the ambiguity of emotional aspects expressed in natural language, the learnt sentimental aspect-aware representations of texts obtained by pre-trained language models normally contain many noises and uncertainties in the extracted latent features. This might ultimately lead to significant downgrades in the accuracy of the sentiment polarity prediction task. Recently, there have been proposals (Cheng et al. 2017; Sun et al. 2019) for using the fuzzy neural learning concept to eliminate feature noise and ambiguity from learnt data representations, such as the recent fuzzy neural CNN model (FCNN) (Xu et al. 2019), which learns high-quality representations of texts to improve the performance of the sentiment analysis task. However, the previously proposed FCNN-based SA model (Xu et al. 2019) still lacked thorough consideration of the global and sequential semantic information of texts and is therefore considered unable to deal with the aspect-based sentiment classification problem.

State-of-the-art methods for sentiment analysis
In this subsection, we briefly review recent efforts related to the sentiment analysis domain. We classify the notable studies reviewed in this section into three main categories: traditional deep learning-based, graph/lexical-based and transformer-based methods.

Deep learning-based SA approach
Recently, most proposed SA models have adopted advanced deep learning architectures such as RNN and CNN to effectively handle rich semantic representation learning and sentiment polarity prediction from texts. RNN-based SA models utilize LSTM/Bi-LSTM to dynamically characterize and extract sentimental features from input documents by sequentially encoding and transforming words into fixed-dimensional embedding vectors without the intervention of handcrafted feature engineering techniques. For example, the IAN model (Ke et al. 2020) of Ma, D. et al. applied an interactive attention-based dual LSTM architecture to efficiently model input sentences and existing sentimental aspects for the aspect-based sentiment analysis task. Similarly, Rao, G. et al. with SR-LSTM (Miao et al. 2017) applied a two-layered LSTM-based network to capture the semantic relationships between sentences to deal with the length-varied document-level sentiment analysis problem. With the integration of an attention-based mechanism, RNN-based models for the aspect-based SA task can effectively learn and identify sentimental categories/attributes from input sentences/documents.

Graph-structured/lexical-based SA approach
Considering the sentiment analysis problem as a graph-based/lexical analysis task, several notable works have demonstrated the significance of applying prior knowledge to improve the performance of sentiment classification. Recent works of Dragoni, M. et al. on OntoSenticNet (2015), Cambria, E. et al. on SenticNet-5 (2017) and Fares, M. et al. on the LISA model (2021) have presented the potential of utilizing lexical knowledge to enrich the contextual information in the textual data representation learning process, which then explicitly facilitates the sentiment classification problem. In the OntoSenticNet model, Dragoni, M. et al. proposed the utilization of a lexical ontology containing over 100 K concepts, which supports properly identifying the conceptual hierarchy as well as the properties associated with correct sentiment values.
Similarly, in SenticNet-5, Cambria, E. et al. proposed an enhancement that integrates word embeddings with the external knowledge of the Sentic ontology to improve the performance of sentence-level sentiment polarity identification. Recently, Fares, M. et al. (2021) proposed an integration of lexical sentiment analysis with a prior-knowledge-based approach, called LISA. The LISA model learns rich lexical and contextual information from words to improve the sentiment categorization outputs. For the graph-structured sentiment analysis approach, Eliacik, A. B. et al. (2019) proposed a novel community-driven sentiment polarity classification over social microblogging services.

Transformer-based SA approach
In recent years, pre-trained transformer-based language models, such as GPT-2 (Zhang et al. 2018), have been widely adopted for the SA task. In fact, most recent pre-trained language models for SA suffer major drawbacks related to the capability of capturing global semantic information, such as the topics of input texts, in order to better fine-tune for context-varied and topic-diversified sentiment classification. Moreover, recent deep RNN/transformer-based models for SA still suffer from feature noises and ambiguities in the learnt representations of input texts, which might lead to downgrades in the overall accuracy of the sentiment polarity prediction process.

Neural topic modelling (NTM) approach
For many years, topic modelling has been a popular method for efficiently learning the global semantic representations of texts. However, traditional topic modelling methods like LDA rely mainly on mathematical inference processes to obtain the latent topic distributions over a given text corpus. This traditional approach encounters several challenges regarding high computational effort and low-quality/sparse textual representations. In recent years, the emergence of advanced deep neural architectures like the variational auto-encoding mechanism (VAE) has effectively supported tackling the problems of the classical topic modelling approach. A recent study (2020) proposed a novel topic modelling approach combined with SVM to identify the levels of customer satisfaction via customer reviews.
Majorly inspired by the achievements of transformer-based and neural topic modelling approaches, our work in this paper mainly focuses on finding a better SA task-driven textual embedding mechanism to address the aforementioned challenges. Different from recent transformer-based/AE-based text embedding approaches, our work majorly focuses on utilizing the fuzzy neural learning concept to reduce the uncertainty and noise in learnt textual representations, which are obtained by a topic-aware neural encoder-decoder architecture. Our designed deep neural auto-encoding mechanism can sufficiently capture both rich global and contextual semantics of texts by integrating a topic-aware attention mechanism facilitated by the preceding neural topic modelling approach (Ma et al. 2018; Rao et al. 2018).

Methodology and implementation
In this section, we formally present the methodology and detailed descriptions of the implementation of our proposed TopFuzz4SA model. In the first subsection, we introduce the use of neural topic modelling for learning the representations of distributed latent topics over the given text corpus. Then, the obtained topic representations are used to facilitate the topic-driven attention mechanism of a neural auto-encoding mechanism to fully capture the global and contextual semantic information of the input documents. Then, these rich semantic textual representations are fed into a fused fuzzy neural network (FDNN) architecture to reduce feature noise and ambiguity, improving the performance of the multitasked sentiment classification handled by a fully connected neural layer with a softmax classification function. Table 1 shows the notations commonly used in our paper.
Topic-oriented neural auto-encoding for sentiment analysis

Neural topic model encoding
To efficiently extract and learn the representations of distributed latent topics over a given text corpus, we apply a neural variational auto-encoding (VAE) mechanism, majorly inherited from previous works (Miao et al. 2017; Srivastava and Sutton 2017), with the involvement of latent topics $(z)$ analysed from the input documents. Within the topic modelling paradigm, each document $(d)$ with a set of words $d = \{w_1, w_2, \ldots, w_{|d|}\}$, where $W$ (with $w \in W$) is the vocabulary set of all unique words in the given text corpus, has a corresponding distributed topic proportion, denoted $z_d \in \mathbb{R}^{1 \times K}$, with $K$ the pre-defined number of latent topics. On the other hand, for each latent topic, $t_n$ is the topic assignment for an observed word, denoted $w_n$. In our applied neural topic modelling approach, we use two separated neural network architectures: a generative network (playing the role of the encoder) and an inference network (playing the role of the decoder).
For the generative network, it is used to encode the input texts into the latent topic representations. Then, these latent topic representations from the generative encoder are reconstructed into the original input texts by the inference/decoder network. The ultimate purpose of using the VAE-based architecture in this case is to learn and parameterize the multinomial probabilistic distributions of latent topics, in which no pre-defined distribution is required to guide the topic generative process. In general, the generative process for each input document $(d)$ can be formulated as follows (as shown in Eq. 1):

$$\mu_0 = W_x x_d + b_x, \qquad z_d \sim \mathcal{N}(\mu_0, \sigma_0^2), \qquad w_n \sim \mathrm{Softmax}(\beta^{\top} z_d) \tag{1}$$

Following previous works (Rao et al. 2018; Ma et al. 2018, 2017), we apply a diagonal Gaussian distribution to parameterize the topic distribution of each document and obtain the variational distribution with an unbiased gradient estimator. In Eq. 1, $\mu_0$ and $\sigma_0^2$ represent the mean and variance of the Gaussian distribution, and $W_x$ and $b_x$ are the trainable weighting and bias parameter matrices. Overall, the loss function of the VAE-based neural topic modelling architecture is formulated as:

$$\mathcal{L}_{NTM} = \mathbb{E}_{q(z_d)}\left[\log p(w \mid z_d)\right] - \mathrm{KL}\left(q(z_d) \,\|\, p(z_d)\right)$$

where $p(w \mid z_d)$ and $q(z_d)$ are the corresponding probabilistic distributions of the generative/inference networks, with the standard normal prior $\mathcal{N}(0, I)$. After this process, we obtain the latent topic-word distributions $\beta \in \mathbb{R}^{K \times |W|}$.

Table 1 Notations commonly used in our paper:
$z_d$: the latent topic distribution proportion of a specific document $(d)$
$t_n$: the topic assignment for a specific ($n$th) observed word $w_n$
$a_{t,i}$: the attention score of a specific ($i$th) word upon a ($t$th) latent topic
$a_i$: the general average topic-driven attention score for a specific ($i$th) word
$\mu_0$, $\sigma_0^2$: the mean and variance of the Gaussian distribution, respectively
$\beta$: the general topic-word distributions
$H$: the output hidden state as a weighting matrix
$a$: the attention weighting scores
$\mathrm{FFN}(\cdot)$: a feed-forward fully connected neural mechanism
$\mathrm{Softmax}(\cdot)$: the softmax function
$\sigma(\cdot)$: the sigmoid function
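To make the neural topic encoding step concrete, the following is a minimal NumPy sketch of a VAE-style topic encoder: a bag-of-words vector is mapped to the Gaussian parameters $\mu_0$ and $\sigma_0^2$, reparameterized into the latent topic proportion $z_d$, and decoded through the topic-word matrix $\beta$. The single-linear-layer encoder, the layer sizes and the class/method names are our illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class NeuralTopicModel:
    """Minimal NTM sketch: encode a bag-of-words vector into a K-dim
    Gaussian latent, reparameterize, and decode via topic-word matrix beta."""

    def __init__(self, vocab_size, num_topics):
        self.W_mu = rng.normal(0, 0.1, (vocab_size, num_topics))
        self.b_mu = np.zeros(num_topics)
        self.W_logvar = rng.normal(0, 0.1, (vocab_size, num_topics))
        self.b_logvar = np.zeros(num_topics)
        self.beta = rng.normal(0, 0.1, (num_topics, vocab_size))  # topic-word matrix

    def encode(self, x_bow):
        # Gaussian parameters (mu_0, log sigma_0^2) from the bag-of-words input
        mu = x_bow @ self.W_mu + self.b_mu
        logvar = x_bow @ self.W_logvar + self.b_logvar
        return mu, logvar

    def reparameterize(self, mu, logvar):
        # z_d = mu + sigma * eps, eps ~ N(0, I)
        eps = rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z_d):
        # Reconstruct word probabilities from the latent topic proportion
        return softmax(z_d @ self.beta)

    def kl_to_standard_normal(self, mu, logvar):
        # KL(q(z_d) || N(0, I)) term of the VAE loss
        return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```

A gradient-based framework would of course train these matrices end to end; the sketch only illustrates the forward pass and the KL term of the loss.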

Topic-oriented transformer-based encoder-decoder mechanism
From the latent topic embedding matrix $\beta$ obtained in the previous steps, we use the distributed latent topic embedding vectors to calculate the attention weights, denoted $(a)$, of the topic-oriented attention mechanism as follows (as shown in Eq. 2). In Eq. 2, the topic-word distributions previously obtained by the NTM-based architecture are first fed to separated linear and softmax layers to normalize the distributions of latent topics over the words of a text corpus. Then, these embedding outputs are used to produce the transformation component matrix, denoted $(P)$. In general, $P \in \mathbb{R}^{K \times h}$ is the transformation component of the latent topic embedding matrix $\beta$, with $h$ being the dimensionality of the hidden state vector in the given neural encoder-decoder architecture. Then, for each ($i$th) word $w_i$ in the input document $(d)$, the average attention weight over the $K$ latent topics is calculated as: $a_i = \frac{1}{K} \sum_{t=1}^{K} a_{t,i}$, with $a_{t,i}$ the attention score of the ($i$th) word upon a specific ($t$th) latent topic.

Finally, the average attention weight for each ($i$th) word $w_i$ is normalized with a softmax function, and the final topic-oriented attention is represented as $a = \{\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_{|d|}\}$. Then, the final topic-oriented attention scores are used to facilitate the neural encoder-decoder framework for handling the sentiment classification task. In the encoder part, a transformer-based architecture is applied to learn and transform the input word embedding vectors of document $(d)$, $\{e_{w_1}, e_{w_2}, \ldots, e_{w_{|d|}}\}$, into latent contextual representations through $k$ stacked transformer layers. Specifically, at each $l$th layer, the previous input is passed through to generate corresponding hidden states, denoted $H^{enc,[l]} = \{H^{enc,[l]}_1, H^{enc,[l]}_2, \ldots, H^{enc,[l]}_{|d|}\}$. Then, the last hidden state matrix of the encoder, denoted $H^{enc,[k]}$, is combined with the calculated topic-oriented attention scores $(a)$ to form the final contextual representation, denoted $(s)$. Finally, the encoded contextual representation vector $(s)$ is fed into the decoder part to reconstruct the original input as the embedding vector $(r)$, with a fully connected layer $\mathrm{FFN}(\cdot)$ at the end. The general process of these steps can be formulated as in Eq. 3, where $f_{att}(\cdot)$ is the topic-oriented attention-based mechanism, implemented as a self-attention mechanism over the output word embedding $\hat{y}_{[i-1]}$ and the input contextualized representation $(s)$ of the encoder.
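The topic-oriented attention computation above can be sketched as follows, under the assumption that $a_{t,i}$ is a dot-product score between the projected topic vector $P_t$ and the encoder hidden state $H_i$; the concrete projection `W_lin` and the dot-product scoring are our illustrative choices rather than the authors' exact formulation of Eq. 2.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_attention(beta, W_lin, H):
    """beta: (K, V) topic-word distributions from the NTM;
    W_lin: (V, h) linear projection producing P in R^{K x h};
    H: (n, h) encoder hidden states for the n words of a document.
    Returns the normalized topic-oriented attention weights (n,)."""
    P = softmax(beta, axis=-1) @ W_lin  # normalize topics over words, then project
    scores = P @ H.T                    # a_{t,i}: score of word i under topic t, (K, n)
    a = scores.mean(axis=0)             # a_i = (1/K) * sum_t a_{t,i}
    return softmax(a)                   # softmax-normalized attention over words
```

The returned weights can then be used to pool the last encoder hidden states $H^{enc,[k]}$ into the contextual representation $(s)$.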

Integrated FDNN for feature noise and ambiguity reduction
Finally, from the achieved embedding output (r) of the decoder, we feed it to a fused fuzzy deep neural architecture to alleviate the feature noise and ambiguity before applying the sentiment classification with a full-connected layer at the end. Our proposed FDNN is designed with two components, the first component is in charge of fuzzification/de-fuzzification input embedding vectors of the decoder part. This fuzzy neural learning component contain the separated multi-layered membership and fuzzy rule layers to handle the fuzzy logic representation learning. Then, the fuzzy-based learning outputs are fused with the deep learning-based transformed embedding vectors to form the final representation of (r). In the fuzzy learning-based component, each l th membership layer takes each ith dimension of the vector (r), denoted as: r i as the input variable and passes it through a fuzzy neuron, denoted as: u i ð:Þ with activation function is the Gaussian membership function to calculate the fuzzy degree, as: (o i ) of each ith input variable, as: u i r i ð Þ : R ! ½0; 1. Each l th fuzzy membership layer of the given FDNN is generally formulated as follows (as shown in Eq. 4): For each fuzzy neuron, the activation function is defined as the Gaussian membership function with l and r 2 are the mean and variance, respectively. Then, these output fuzzy degrees are fed into the fuzzy rule layer and perform the ''AND'' logic operation, as: o ; 8j 2 X i with X i being the set of all input output nodes in the l À 1 ð Þ th layer which are connected to the input variable (i). In the deep neural learning component which is also designed as a multi-layered neural architecture in which each input vector (r) is passed through different full-connected layer with the sigmoid activation function, in order to learn and exploit the high-level textual representations of previous neural encoder-decoder architecture. 
In general, each l-th neural layer of the deep neural learning component is formulated as: o^{deep,[l]} = σ(W^{deep,[l]} · r + b^{deep,[l]}), with W^{deep} and b^{deep} being the trainable weight and bias matrices of each fully connected neural layer. Finally, to fuse the different representations produced by the fuzzy and deep learning-based components, we apply a neural network-based fusion mechanism to effectively merge the type-varied representations of (r) into a unified embedding, denoted as (x), which is later used for the sentiment classification task. The neural fusion mechanism is defined as follows (as shown in Eq. 5). This neural fusion mechanism has a set of trainable parameters, Θ^{fuse} = {W^{fuse,fuzz}, W^{fuse,deep}, b^{fuse}}, which are optimized simultaneously with the previous encoder-decoder architecture. Finally, the feature noise-reduced representation of each input document (d), as x_d, is fed into a fully connected layer with the softmax activation function to handle the sentiment polarity prediction task, with the training objective formulated as the cross-entropy loss, as follows (as shown in Eq. 6). In this equation, T, y_d and ŷ_d are the training set, the vector-encoded ground-truth label and the predicted sentiment polarity of the given input document (d), respectively. To train our proposed architecture, we apply stochastic gradient descent (SGD) to optimize all the model's parameters upon the training objective L_TopFuzz4SA, as: Θ_TopFuzz4SA = argmin_{Θ = {Θ_fuse, Θ_enc-dec}} (L_TopFuzz4SA, η), with η being the predefined learning rate. In general, our proposed TopFuzz4SA is motivated by previous achievements of neural topic modelling and transformer-based encoder-decoder architectures in rich contextual text representation learning, as well as by the utilization of the fuzzy learning concept for feature noise and ambiguity reduction, in order to leverage the performance of the sentiment classification task.
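The fusion step (Eq. 5) and the softmax classification head with cross-entropy loss (Eq. 6) can be sketched in NumPy. The shapes and the linear-plus-sigmoid fusion form are our reading of the text, not the authors' exact implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(o_fuzz, o_deep, W_fuzz, W_deep, b):
    """Neural fusion (sketch of Eq. 5): merge the fuzzy- and deep-learning-based
    representations into the unified embedding x."""
    return sigmoid(W_fuzz @ o_fuzz + W_deep @ o_deep + b)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Per-document cross-entropy objective (sketch of Eq. 6)."""
    return -np.sum(y_true * np.log(y_pred + eps))

rng = np.random.default_rng(1)
o_fuzz = rng.normal(size=8)                    # fuzzy-component output
o_deep = rng.normal(size=8)                    # deep-component output
W_fuzz = rng.normal(size=(6, 8))
W_deep = rng.normal(size=(6, 8))
b = np.zeros(6)

x = fuse(o_fuzz, o_deep, W_fuzz, W_deep, b)    # unified embedding x
W_out = rng.normal(size=(5, 6))                # 5 sentiment classes
y_hat = softmax(W_out @ x)                     # predicted polarity distribution
loss = cross_entropy(np.eye(5)[2], y_hat)      # ground-truth class 2 (toy)
```

In the actual model these parameters would be trained jointly with the encoder-decoder by SGD, as described above; here they are random placeholders for illustration.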

Time complexity analysis
As a topic modelling-based neural auto-encoding approach to the sentiment analysis problem, estimating the latent topic distributions via the topic modelling-based approach is considered NP-hard, with time and space complexity approximately: O(|W|·K^a·|W|·(|W| + a)^3), in which |W|, K and a are the size of the corpus's vocabulary, the number of latent topics and the number of actual topics existing in the current corpus, respectively. Our main attention-based auto-encoding component with (k) layers costs about: O(k(n²d + nd²)), where O(n²d) is the time complexity of the self-attention mechanism over (n) tokens and O(nd²) is that of the position-wise feed-forward part. So, in general, our proposed TopFuzz4SA model has an approximate time/space complexity of: O(|W|·K^a·|W|·(|W| + a)^3 + k(n²d + nd²)). Therefore, compared with our main auto-encoding/transformer-based competitors in this paper, such as BERT4ABSA (Abboud and Tekli 2019), ABSA-BERT-pair (Chen et al. 2019) and SentiLARE, our proposed model needs more time and computational effort for extracting the latent topic distributions over the given text corpus to facilitate the textual embedding process for sentiment classification.
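As a back-of-envelope check of the transformer term above, the per-document operation count can be estimated directly from k, n and d; this is a rough cost model for intuition only, not a measured profile of any implementation:

```python
def transformer_cost(k, n, d):
    """Rough operation count for a k-layer transformer over n tokens with
    hidden size d: O(n^2 d) self-attention plus O(n d^2) feed-forward per layer."""
    return k * (n ** 2 * d + n * d ** 2)

# Doubling the sequence length roughly quadruples the attention term,
# while doubling the hidden size roughly quadruples the feed-forward term.
cost = transformer_cost(7, 256, 300)  # k = 7 layers, as configured later
```

The quadratic dependence on n is why the auto-encoding component, rather than the classifier head, dominates the runtime for long documents.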

Experiments and discussions
In this section, we present the benchmark datasets, experimental setups and comparative results of our proposed TopFuzz4SA model against the selected baselines.

Dataset descriptions
For our experiments in this paper, we used different benchmark datasets which are commonly applied to evaluate SA models in previous works:

• SST (Stanford Sentiment Treebank) [1]: is considered a common dataset for evaluating the performance of SA models. This dataset contains 11 K labelled sentences/documents as movie reviews collected from https://www.rottentomatoes.com/. The SST dataset for the SA task includes 8,544 samples for training, 1,101 for validation and 2,210 for testing, with five sentiment-level labels for classification.

• AR (Amazon Reviews) [2]: is a large-scale dataset for the SA task which contains > 34 M user reviews and ratings on 2.4 M specific products in different categories, collected from the Amazon e-commerce platform. For experiments on this dataset, we randomly selected 500 K, 50 K and 50 K reviews for the training, testing and validation sets, respectively.

• MR (Movie Reviews) [3]: is a traditional dataset for the SA task with 10 K short/single-sentence reviews of users on specific movies. These reviews are labelled with positive (5 K) and negative (5 K) classes. The MR dataset is considered less challenging than the other datasets, as its SA task is a classical binary classification problem. For this dataset, we divide it into three parts: training (8,534), testing (1,078) and validation (1,050).

• Yelp (Yelp-5) [4]: is a well-known dataset for multiple disciplines, including the SA task. The Yelp-5 dataset belongs to the Yelp Challenge Dataset collection, which contains > 1.6 M reviews of 366 K users upon 61 K local businesses/companies. The Yelp-5 reviews are categorized into 5 classes (1-5). Similar to previous works, we randomly divided this dataset into three parts: training (594,000), testing (56,000) and validation (50,000).

• IMDb [5]: similar to the MR dataset, this dataset contains 50 K user reviews upon specific movies, categorized into positive and negative attitudes.
For this dataset, we also applied the same split as Ke, P. et al. (2017): 22,500 reviews randomly selected for training, 2,500 for validation and 25,000 for testing.

• SemEval-2014 (Liu et al. 2017) [6]: is a common dataset for the aspect-level sentiment analysis task. For the aspect-based sentiment analysis problem on this dataset, we selected the "laptop" (3,045 samples for training and 800 for testing) and "restaurant" (3,041 for training and 800 for testing) categories to evaluate the performance of the implemented SA models.

Dataset pre-processing steps and configurations

For basic textual pre-processing steps, such as stop-word removal, word tokenization and stemming, we mainly used the Stanford CoreNLP library [7] to handle the textual data in each dataset. For the setup of the general BERT model (Do et al. 2019) used for the SA task, we reused the general pre-trained BERT model (large/uncased version) released by Google at this repository [8]. For the setup of SentiLARE, we used SentiWordNet 3.0 (Yoon and Kim 2017) [9], which is similar to the original implementation of Ke, P. et al. (Chen et al. 2017).

Experimental setups and evaluation methods
For the setup of the TopFuzz4SA model, we implemented it using the Python programming language with the support of the TensorFlow machine learning library. Our TopFuzz4SA model and the other comparative SA baselines are deployed on a single server with an Intel Xeon SKL-SP 4210 CPU and 64 GB of memory. For the detailed configuration of our proposed TopFuzz4SA model, we set the number of latent topics for the neural topic modelling architecture (described in Sect. 3.1.1) to K = 10. The dimensional size of the word embedding vectors for our transformer-based encoder-decoder architecture is d_w = 300, and the dimensionality of the hidden state vector (or the number of RNN-based cells) of each transformer-based layer in the topic-oriented auto-encoding architecture is h_RNN = 256. For the number of layers used in both the transformer-based encoder and decoder parts, we configured k_enc-dec = 7. For the number of fuzzy-based and deep learning-based layers in the FDNN architecture, we set k_FDNN = 5. Table 2 lists the other configurations of our TopFuzz4SA model implemented for the experiments in this paper.

Evaluation metrics and methods

Similar to previous works (Abboud and Tekli 2019; Chen et al. 2019, 2017), to evaluate the accuracy performance of the different models on SA tasks as a primitive classification task, we mainly applied the Accuracy and F-1 evaluation metrics. For the sentence/document-level SA task, we mainly used the SST, AR, MR, Yelp and IMDb datasets to evaluate the performances of our proposed TopFuzz4SA model and the other baselines. SST, AR, Yelp and IMDb are treated as multi-class sentiment classification tasks (with rating scores from 1 to 5), whereas MR is treated as a binary classification task with only two sentiment classes (positive/negative).
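The hyperparameter settings reported in this section can be gathered into a single configuration sketch; the dictionary keys are illustrative names of our own, not identifiers from the authors' code:

```python
# Hyperparameters of TopFuzz4SA as reported in the paper, collected in one
# place for reference. Key names are hypothetical, values come from the text.
CONFIG = {
    "num_topics": 10,        # K: latent topics in the neural topic model
    "embedding_dim": 300,    # d_w: word embedding dimensionality
    "hidden_size": 256,      # h_RNN: hidden state size per transformer layer
    "enc_dec_layers": 7,     # k_enc-dec: encoder and decoder depth
    "fdnn_layers": 5,        # k_FDNN: fuzzy and deep layers in the FDNN
}
```

Keeping the settings in one structure like this makes the later ablation studies (varying d_w, h_RNN, k_enc-dec and k_FDNN) straightforward to script.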
For the aspect-level SA task, we mainly utilized the SemEval-2014 dataset; the experimental setups and configurations for this dataset are similar to previous works (Chen et al. 2019).

Comparative baselines
To compare the accuracy performance of our proposed TopFuzz4SA model with other baselines for both sentence-level and aspect-level SA tasks, we implemented different well-known SA models:

• LDA4SA (Manshu and Bing 2019): is a recent proposal of Bashri, M. F. et al. which utilizes the traditional LDA topic modelling technique to characterize and learn sentiment polarities from texts. LDA4SA is mainly designed to distinguish positive/negative words from latent sentiment-driven topic-word distributions.

• SR-LSTM (Miao et al. 2017): is a dual LSTM-based architecture which effectively learns sequential representations of texts for the document-level sentiment classification problem. In this SR-LSTM model, Rao, G. et al. (Miao et al. 2017) proposed a two-layered LSTM neural network to jointly learn the semantic sequential representations and the latent relationship features between sentences of the input documents. As a sentence/document-level SA model, SR-LSTM is unable to deal with the aspect-based SA task.

• Sentic-LSTM (Radford et al. 2018): is a recent well-known RNN-based approach for handling the aspect-level SA task, in which Ma, Y. et al. proposed an attentive LSTM architecture that incorporates external commonsense knowledge for targeted aspect-based sentiment analysis.

• ABSA-BERT-pair (Chen et al. 2019): is a BERT-based approach which converts the aspect-based SA task into a sentence-pair classification problem by constructing auxiliary sentences. With different implementations, ABSA-BERT-pair can be adopted for both sentence/document-level and aspect-level SA tasks.

• SentiLARE: is the most recent BERT-based approach for the SA task and is also our main competitor in this paper. Recently proposed by Ke, P. et al. (2017), the SentiLARE model utilizes an external resource, SentiWordNet, as extra linguistic knowledge to assist the rich contextual representation learning mechanism of the pre-trained BERT model in handling multiple downstream SA subtasks.
Different from previous BERT-based SA models, such as BERT4ABSA and ABSA-BERT-pair, the SentiLARE model is considered more capable of properly modelling and capturing the sentimental aspects of input documents, owing to its capability of integrating external linguistic knowledge.
For the configurations of the above-listed comparative baselines, we kept the same configurations as described in their original works, with which these models achieved their highest performances on different downstream SA tasks. For the common configurations shared with our proposed TopFuzz4SA model, we set them to the same values as listed in Table 2.

Experimental results and discussions
In this section, we present the experimental outputs of the different SA models for both the sentence/document-level and aspect-level SA tasks on the benchmark datasets.

Experiments on sentence/document-based SA task
For the sentence/document-level sentiment classification task, all models are implemented to learn from the training set and predict the sentiment polarity of the documents in the testing set, as a classical text classification problem. Table 3 shows the experimental results in terms of the F-1 evaluation metric for the different SA baselines on the standard datasets, which demonstrate the superior performance of our proposed TopFuzz4SA model compared with recent state-of-the-art baselines.
In general, as shown by the average results over all datasets (Table 4), we can recognize that most of the transformer-based SA models (BERT, BERT4ABSA, ABSA-BERT-pair, SentiLARE and our proposed TopFuzz4SA) achieved better performance than the previous LSTM-based models (SR-LSTM and Sentic-LSTM), by about 6.48% across all datasets. This supports the view that transformer-based textual representation learning techniques can obtain richer contextual information from texts, thereby leveraging the performance of the SA task compared with previous RNN-based models. Specifically, compared with previous LSTM-based models, our proposed TopFuzz4SA model significantly improves the accuracy performance in terms of the F-1 metric by about 27.67% (LDA4SA), 13.95% (SR-LSTM) and 10.18% (Sentic-LSTM) over all benchmark datasets. Similarly, compared with the BERT-based SA models, our proposed TopFuzz4SA model also gains better results on the sentence/document-level SA tasks, by about 12.35% (general BERT), 6.69% (BERT4ABSA) and 5.53% (ABSA-BERT-pair). Against our main competitor in this paper, the SentiLARE model, our proposed TopFuzz4SA model also outperforms by approximately 2.32% in the accuracy performance of the sentence/document-level SA task over all datasets.

Experiments on aspect-based SA task
In this section, we conducted extensive experiments to compare the performance of the BERT-based models, including BERT, BERT4ABSA, ABSA-BERT-pair, SentiLARE and our proposed TopFuzz4SA, on the aspect-level sentiment classification task. Similar to the previous empirical studies of Ke, P. et al. on the SentiLARE (Chen et al. 2017) model, all models are evaluated in terms of the Accuracy and F-1 metrics on the aspect-based SA task at two levels: aspect term extraction and aspect term sentiment classification. Tables 5 and 6 show the experimental outputs for the aspect-based SA task at the aspect term extraction and aspect term sentiment classification levels, using the different models on the standard SemEval-2014 dataset.
The experimental outputs in Tables 5 and 6 demonstrate the superior performance of our proposed TopFuzz4SA model compared with recent BERT-based models at both the aspect term extraction and aspect term sentiment classification levels of the aspect-based SA task. In more detail, at the aspect term extraction level, our proposed TopFuzz4SA model achieves better performance than the BERT-based models (BERT, BERT4ABSA and ABSA-BERT-pair) by 4.81% and 4.86% on average in terms of the Accuracy and F-1 metrics, respectively. The TopFuzz4SA model also outperforms our main competitor, the SentiLARE model, by about 1.2% and 0.64% in terms of the Accuracy and F-1 metrics, respectively, on this task. Similarly, at the aspect term sentiment classification level, our proposed TopFuzz4SA model also achieves significantly better performance, by approximately 3.85% and 7.23% in terms of the Accuracy and F-1 metrics, compared with BERT, BERT4ABSA and ABSA-BERT-pair. Against the SentiLARE model, our model also slightly improves the performance, by about 1.47% and 2.45% in terms of the Accuracy and F-1 evaluation metrics. To sum up, through the experimental outputs on both the sentence/document-level and aspect-level SA tasks, we demonstrate the effectiveness of our proposed ideas in this paper: the combination of a topic-oriented auto-encoding mechanism with integrated fuzzy neural representation learning for feature noise and ambiguity reduction.

Ablation studies
In this section, we conduct extensive empirical studies on the parameter sensitivity of our proposed TopFuzz4SA model, including: the dimensionality of the word embedding vectors (d_w), the hidden state size of the RNN-based architecture (h_RNN) in our topic-driven auto-encoding mechanism, and the numbers of layers used in the topic-driven auto-encoding (k_enc-dec) and FDNN-based (k_FDNN) architectures.
To do this, we varied the values of the (d_w) and (h_RNN) parameters in the ranges of [10, 448] and [10, 324], respectively, and then reported the changes in the accuracy performance of our proposed TopFuzz4SA model on the sentence-level SA task for the AR and Yelp datasets. Figure 2 shows the experimental outputs of the studies on the influence of these two parameters on the accuracy performance of our TopFuzz4SA model. The experimental outputs show that our model is quite insensitive to these parameters, reaching high performance with values of d_w > 200 and h_RNN > 220. Similar to the previous studies on the (d_w) and (h_RNN) parameters, to evaluate the effects of the (k_enc-dec) and (k_FDNN) parameters on the overall performance of our model, we conducted extensive empirical studies on the same AR and Yelp datasets with different values of these two parameters within predefined ranges. The experimental outputs (in Fig. 3) show that our proposed model reaches stable performance with values of k_enc-dec ≥ 6 and k_FDNN ≥ 5 for both the AR and Yelp datasets.

Studies on fuzzy versus non-fuzzy approaches in TopFuzz4SA model
To evaluate the effectiveness of applying the fuzzy learning concept to reduce feature noise and ambiguity in the learnt textual representations, which effectively supports the improvement of multiple downstream SA tasks, we implemented two versions of our proposed TopFuzz4SA model. The first version is the original implementation of TopFuzz4SA with the support of the fuzzy neural learning mechanism in the FDNN-based architecture, named TopFuzz4SA-Fuzzy. The second version is the TopFuzz4SA model without the fuzzy neural layers in the FDNN-based architecture, named TopFuzz4SA-DL. Then, we utilized these two versions of the TopFuzz4SA model to handle the sentence-based SA task on the AR and Yelp datasets with different training set sizes (%) and reported the accuracy outputs of each version in terms of the F-1 evaluation metric. The experimental outputs in Fig. 4 show that the TopFuzz4SA-Fuzzy version achieves remarkably better performance than the TopFuzz4SA-DL version: the accuracy performance of the fuzzy-based version increases stably with the training set size (%) and remains higher than that of the purely deep learning-based version overall. These extensive empirical studies show the usefulness of applying the fuzzy neural learning concept to alleviate feature noise and ambiguity in input texts, which significantly improves the performance of several primitive SA tasks as a result.

Conclusions and future works
In this paper, we propose a novel approach that integrates the fuzzy neural learning concept with a topic-driven auto-encoding mechanism for handling multiple downstream sentiment analysis (SA) tasks, called TopFuzz4SA. In our proposed TopFuzz4SA model, we apply a neural topic modelling approach to model and learn the latent topic distributions over the text corpus, which facilitates the topic-driven attention-based mechanism in our textual auto-encoding mechanism for the SA task. Then, the textual representations achieved by the topic-driven encoder-decoder architecture are fed into a fused fuzzy deep neural network (FDNN)-based architecture to eliminate feature noise and ambiguity, which effectively supports leveraging the accuracy performance of the final sentiment classification task. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed TopFuzz4SA model compared with baselines for the SA task. For our future work, we intend to extend our proposed TopFuzz4SA model to handle the dynamic sentiment polarity classification task upon real-time textual chats or QA-based conversations.