High Performing Sentiment Analysis based on Fast Fourier Transform over Temporal Intuitionistic Fuzzy Value

Sentiment analysis, or opinion mining, is an extensive area of research. Today a huge amount of structured and unstructured data is available on the web for forming an opinion on a particular subject. Handling this surplus of data, termed big data, requires new technology. This paper addresses the need for fast sentiment analysis of such large data. The proposed algorithm (FFT-TIFS) expedites sentiment classification by applying the Fast Fourier Transform to a Temporal Intuitionistic Fuzzy Set generated from text. Fourier analysis converts a signal from its time domain to its representation in the frequency domain. Such a frequency-domain algorithm on a Temporal Intuitionistic Fuzzy Set is used in sentiment analysis for the first time. The algorithm is useful for binary sentiment classification of short Twitter text as well as at document level and sentence level. It is tested on the aclImdb, Polarity, MR, Sentiment140 and CR datasets, giving an average accuracy of 80%. The proposed method shows a significant improvement in processing time: it is 17 times faster than the sequential Fuzzy C-Means (FCM) method and at least 7 times faster than the distributed FCM method present in the literature. The novelty of the method lies in its fast processing time and its suitability for sentiment analysis of texts of various sizes.


Introduction
The need for opinions on a subject, a product or any particular object is of growing importance in the information age. Sentiment analysis is the key to obtaining such opinions quickly from user comments, and it is a blooming area of research. Sentiment analysis can be categorised from various perspectives: task oriented, granularity oriented and methodology oriented. The sentiment classification task mostly classifies polarity as positive, negative or neutral, using a binary or ternary classifier. Classification can also distinguish whether a specific text is an objective or a subjective sentence. Objective sentences refer to facts, happenings and neutral opinions, whereas subjective sentences carry positive or negative polarity expressing opinion, belief or judgement. In the granularity-oriented category, opinion is generated at document level (small or large), sentence level or word level. Researchers are exploring the analysis of large documents as well as of big data, where the volume of data is huge. Classification-oriented analysis gives binary, ternary or multi-level classification. Methodology-oriented analysis includes supervised, semi-supervised and unsupervised methods. Supervised methods include rule-based methods, SVM, regression, neural networks, deep learning etc. Although some supervised methods give high accuracy, their computational cost is a concern; moreover, over-fitting needs to be checked and a large amount of time is required for training. Semi-supervised methods include graph-based, wrapper-based and topic-based methods. Unsupervised methods handle large datasets in a fast, simple and effective way.
Recent research has explored sentiment analysis in different directions. The most tested area is machine learning. Evolutionary algorithms have also been applied to opinion mining, and the combination of machine learning and nature-inspired algorithms is currently a blooming area of research. Besides the algorithm, the volume of data is a major concern in sentiment classification. Recent research on big data sentiment extraction uses the Hadoop Distributed File System with MapReduce techniques, and researchers have experimented with distributed big data techniques combined with machine learning algorithms. All these experiments share a single aim: to reduce processing time so that large volumes of data can be processed efficiently. With such large volumes of data in view, the method proposed in this paper, which applies the Fast Fourier Transform to a Temporal Intuitionistic Fuzzy Set, offers a time complexity competitive with state-of-the-art techniques even with sequential processing. Moreover, the efficiency of the proposed method is also greater than that of some state-of-the-art algorithms.
In this paper the sentiment of a word is represented as a Temporal Intuitionistic Fuzzy Set. The Intuitionistic Fuzzy Set (IFS), developed by Atanassov, is a powerful extension of the classical fuzzy set. A classical fuzzy set assigns only a membership degree to each element, whereas in an Intuitionistic Fuzzy Set each element has both a membership degree and a non-membership degree. Here the positive and negative sentiment values of a word, generated from SentiWordNet, are represented as the membership degree and non-membership degree of the word in the context of positivity and negativity. The Temporal Intuitionistic Fuzzy Set is a variant of IFS, also developed by Atanassov, in which time-moments are taken into consideration. In this paper the positive and negative values of the words are represented as a Temporal Intuitionistic Fuzzy Set. These time-domain fuzzy values are then fed into frequency-domain analysis.
The proposed method is based on the Fast Fourier Transform (FFT) from signal processing. The FFT is an algorithm that computes the Discrete Fourier Transform of a sequence. It has a very wide range of applications in digital methods such as image compression and encryption, spectral analysis, Fourier spectroscopy, signal processing, speech processing and the solution of differential equations. The use of the FFT is prominent in image analysis, for example in fingerprint classification or microstructure evaluation. However, Fourier transform algorithms have not been used in text processing, and this is a novel method in which the Fast Fourier Transform is used for sentiment analysis. The motivation for using this algorithm is fast processing in the age of big data, together with the fact that no pre-processing for feature selection of the text is required. Moreover, the algorithm does not need any training to generate a bipolar opinion, i.e., positive or negative. In addition, the algorithm gives high efficiency in both document-level and sentence-level opinion mining. It is an unsupervised method, so the training phase is completely eliminated. The proposed method is expected to give very fast processing on future faster hardware or in distributed systems.

Motivation:
The motivation of the paper is five-fold:
1. In the literature the Fast Fourier Transform has never been explored in the field of sentiment analysis, so the proposed method is novel in this respect.
2. When processing big data, the time requirement is a critical issue. This issue is addressed in this research work: the method achieves a faster processing time than previous work.
3. In previous work a large training dataset is required in most cases to train the system. Such data is sometimes not available for a new field of study in sentiment analysis. In the FFT-TIFS method this requirement is completely eliminated.
4. In previous methods feature selection is a mandatory step in most cases. For large documents, cross-domain text or unstructured big data, feature selection is difficult and confusing. The proposed method does not require feature selection.
5. In most previous work the suitability of the method is tested on a dataset of one particular size. In this paper a variety of text types (short Twitter text, single sentences, 1 kb files and 3 kb files) are explored, and the method is found suitable for all of them.

Contributions:
The main contributions of the proposed work are as follows:
1. The proposed method computes sentiment with a faster processing time than state-of-the-art methods. It is 17 times faster than the sequential Fuzzy C-Means method and 7 times faster than the distributed FCM method presented in previous work.
2. The method has been tested on short Twitter text, single sentences, medium-sized (around 1 kb) files and large (around 3 kb) files. In all cases the proposed process has shown efficient speed and accuracy, so it can be concluded that the method is suitable for document-level sentiment analysis of any size as well as for sentence-level sentiment analysis.
3. No training phase or feature selection is required, so the availability of a large training set is not a constraint. The work is therefore applicable to new fields of study in sentiment analysis.
4. The Fast Fourier Transform is applied here for the first time to sentiment analysis.
The paper is organized as follows: Section 2 describes related work in the field of Sentiment analysis, Fast Fourier Transform and Intuitionistic Fuzzy set. Section 3 describes the technical details of Temporal Intuitionistic fuzzy, Fast Fourier transform, SentiWordNet. Section 4 describes proposed method. Section 5 gives the application of the algorithm on various datasets and their Results and Section 6 draws Conclusion from results and outlines the Future Scope.

Related Work
Researchers analyse opinion mining from various angles of the problem statement. The basic task of sentiment analysis is to obtain the polarity of a document or of sentences. A decade of research has divided the problem into different categories. One category is the level of the problem: sentence level, document level or aspect level. Another is the algorithm used for sentiment analysis, i.e., machine learning, fuzzy or evolutionary algorithms. The most recent concern is the size of the data in the age of big data. This section presents an overview of recent research.
Document-level sentiment analysis determines whether the sentences in a document together convey positive or negative polarity. Tripathy et al., 2017 [38] achieved document-level binary classification with a high accuracy of 96.4% using machine learning techniques. They compared their hybrid of SVM and ANN with other algorithms and showed that it reaches greater efficiency. Yessenalina et al., 2010 [44] proposed a two-level document analysis that extracts meaningful sentences for accurate classification; they also reached more than 90% accuracy using a variant of the SVM algorithm. Sentence-level analysis is finer grained, as it has two tasks: deciding whether a sentence is subjective or objective and, for a subjective sentence, determining its polarity as positive, negative or neutral. Dragoni et al., 2015 [14] worked on sentence-level opinion mining on the Blitzer dataset and obtained comparable efficiency with polarity aggregation using Simulated Annealing. Ruz et al., 2020 [33] studied sentiment analysis of Twitter data during critical events and social movements using Bayesian network classifiers, evaluated on two datasets: tweets about the Chilean earthquake and about the Catalan independence referendum. Another important point is that the texts used in experiments are mostly clean data, whereas Twitter texts are short, informal, ambiguous and polysemous. Naseem et al., 2020 [29] worked on such texts using transformer-based word representations fused with deep intelligent contextual embeddings. Jain et al., 2019 [20] worked on large (25 Kb) documents to perform ternary sentiment classification using evolutionary optimization.
Evolutionary optimizations [11,7,12,13,16,52] which are developing fast can be a potential source for the sentiment analysis.
As the proposed method is based on fuzzy values, some applications of fuzzy theory to sentiment analysis are discussed here. Krishna et al., 2018 [25] applied fuzzy set concepts to opinion mining of text reviews posted on social media sites. In the feature extraction stage, opinion-holding words are extracted and assigned a degree of polarity with the help of fuzzy sets. Jefferson et al., 2017 [21] applied a Tsukamoto fuzzy rule-based approach to sentiment classification and argued that a fuzzy system is suitable for capturing linguistic uncertainty. Vashishtha and Susan, 2019 [39] worked on Twitter comments and developed nine fuzzy rules to compute the sentiment of each tweet. They proposed an unsupervised approach that is suitable for any dataset or sentiment lexicon. Three sentiment lexicons, namely SentiWordNet, AFINN and VADER, were used in isolation, and time and efficiency were computed on nine publicly available Twitter datasets. Vashishtha and Susan, 2021 [40] applied the concept of fuzzy linguistic hedges to sentiment analysis, and also used a fuzzy entropy filter and k-means clustering on document-level sentiment datasets.

Machine Learning Approaches:
There are different directions to capture the research trends in Sentiment Analysis problem.

Intuitionistic Fuzzy Applications:
IFS and its variants have been applied in various fields such as electoral systems, medical pattern recognition, petrochemical farms, medical diagnosis, sociometry and pneumatic transportation processes. To name some examples, fuzzy methods have been successfully applied in image segmentation: Huang et al., 2015 [18] applied Intuitionistic Fuzzy Sets to segment MRI images using C-means clustering techniques. In network systems, IFS has been applied by Dutta and Sait, 2012 [15] for routing, and Wang et al., 2018 [42] detected anomalies in network traffic from flow interaction using IFS. Advances in multi-criteria decision making also make use of intuitionistic fuzzy values. Zhang et al., 2020 [49] showed how intuitionistic fuzzy values are used with a fuzzy rough set model in a multi-criteria decision making problem with n alternatives and m criteria. In the literature, however, the use of the Temporal Intuitionistic Fuzzy Set is very limited, if it is used at all. Here the sentiment architecture can be seen as a Temporal IFS, over which the FFT is applied to obtain the sentiment value.

Fast Fourier Transform Applications:
The use of the Fast Fourier Transform is a novel approach in sentiment analysis. Previously this fast algorithm has been used in many engineering and science domains. It was included among the top 10 algorithms of the 20th century by IEEE. To mention some recent developments, Zhang et al., 2018 [48] used the Fractional Fourier Transform in color image encryption: they used 2D compressive sensing to encrypt and compress, and then re-encrypted with the Fractional Fourier Transform. Parchami et al., 2016 [33] investigated the Short-Time Fourier Transform (STFT) method in speech processing. They concluded that this Fourier family of algorithms is suitable for handling different frequencies independently and gives flexibility with respect to noise statistics in speech processing. Dara and Panduga, 2015 [9] used the 2D FFT along with SVM in Telugu character recognition and reached 71% accuracy. More recently, Jeong and Shin, 2018 [22] used the Fast Fourier Transform to search for accurate data among different kinds of big data in a P2P cloud computing environment.

Intuitionistic Fuzzy Set
The fuzzy set, proposed by Zadeh, 1965 [45], states that the belongingness of an element to a set is a matter of degree, unlike a classical set, where membership is a matter of affirmation or denial. The Intuitionistic Fuzzy Set (IFS), proposed by Atanassov in 1986 [2], is a generalization of the fuzzy set. It assigns two values to each element, called the membership degree and the non-membership degree.
An Intuitionistic Fuzzy Set A in the domain E is defined as A = {⟨x, µₐ(x), νₐ(x)⟩ | x ∈ E}, where µₐ(x) is the degree of membership and νₐ(x) is the degree of non-membership of x in A. Both lie in the range [0,1] and satisfy 0 ≤ µₐ(x) + νₐ(x) ≤ 1. In this paper the positive sentiment is represented by µₐ(x) and the negative sentiment is represented by νₐ(x).
This logic differs from classical fuzzy logic through the notion of indeterminacy, or hesitancy, defined for an Intuitionistic Fuzzy Set as πₐ(x) = 1 − µₐ(x) − νₐ(x), the degree of hesitancy of x to A. When πₐ(x) is 0 for every x, the set reduces to a classical fuzzy set. Sentiment values align naturally with Intuitionistic Fuzzy Sets because a hesitant or indeterminate part remains whenever πₐ(x) = 1 − µₐ(x) − νₐ(x) is not equal to zero.
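As a small illustration of the definitions above, the following Python sketch builds one intuitionistic fuzzy value from a positive and a negative score and derives its hesitancy degree. The class name `IFSValue` and the example scores are hypothetical; they are not taken from the paper or from SentiWordNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFSValue:
    """One intuitionistic fuzzy value (hypothetical helper for illustration)."""
    mu: float  # degree of membership (positive sentiment score)
    nu: float  # degree of non-membership (negative sentiment score)

    def __post_init__(self):
        # Enforce mu, nu in [0, 1] and mu + nu <= 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitancy(self) -> float:
        """pi = 1 - mu - nu, the degree of hesitancy."""
        return 1.0 - self.mu - self.nu


# Illustrative scores for a single word (assumed values).
word = IFSValue(mu=0.625, nu=0.125)
print(word.hesitancy)
```

A value whose hesitancy is zero, such as `IFSValue(0.5, 0.5)`, behaves like an ordinary fuzzy membership value, which matches the remark that the set then reduces to a classical fuzzy set.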
There are a few variants of Intuitionistic Fuzzy Sets (Jain et al., 2020 [19]):
a) Interval-valued Intuitionistic Fuzzy Set: a combination of the Intuitionistic Fuzzy Set and the interval-valued fuzzy set.
b) Intuitionistic L-fuzzy set: here L may be a complete chain, a lattice or a complete ordered semi-ring.
c) Temporal Intuitionistic Fuzzy Set: here a time element is added to the Intuitionistic Fuzzy Set.
d) Intuitionistic Fuzzy Set of second type: here varied degrees of membership and non-membership are counted.
In this paper, rather than defining the sentiment value as an IFS, it is preferred to represent it as a Temporal IFS because of the time-domain requirement. Time is an important factor in real-world systems. An object can be defined at an instant of time, and time is required to record when an object changes its parameters. A Temporal Intuitionistic Fuzzy Set A(T) is defined over the non-empty sets E and T, where the elements of T are called 'time-moments'.

Temporal Intuitionistic Fuzzy Set
A(T) = {⟨x, µₐ(x, t), νₐ(x, t)⟩ | (x, t) ∈ E × T}, where µₐ(x, t) and νₐ(x, t) are the degrees of membership and non-membership of the element x ∈ E at the time t ∈ T, with 0 ≤ µₐ(x, t) + νₐ(x, t) ≤ 1. Every ordinary IFS can be represented as a Temporal IFS in which T is a singleton set. Here, too, the sentiment values are simple IFS values that are converted to the temporal representation with T as a singleton set.

Fast Fourier Transform
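Since the integrand exists only at the sample points, the Fourier integral of a sampled signal reduces to a finite sum over the N samples, which in its standard form is the Discrete Fourier Transform (DFT): X[k] = Σₙ x[n]·e^(−j2πkn/N), with n and k running from 0 to N−1. The disadvantage of the DFT is that it is an approximation, since it provides only a finite set of frequencies.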
There are two types of error in the DFT, namely aliasing and leakage. If the initial samples are not spaced closely enough to represent the high-frequency components present in the underlying function, the DFT values will be corrupted by aliasing; the solution is to increase the sampling rate. Leakage arises because the continuous Fourier transform of a periodic waveform requires the integration to be performed over an integer number of cycles of the waveform. If a non-integer number of cycles of the input signal is taken, the transform may be corrupted.
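The leakage error described above can be observed numerically. The sketch below evaluates the DFT directly for a sine wave sampled over an integer and over a non-integer number of cycles, and reports the fraction of spectral energy that falls outside the strongest bin. The helper names, the window length of 32 samples and the chosen cycle counts are assumptions made only for this illustration.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def sine_window(cycles, n=32):
    """n samples of a sine wave completing `cycles` periods in the window."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


def energy_outside_peak(x):
    """Fraction of spectral energy outside the strongest bin and its
    mirror image (the input is real, so the spectrum is symmetric)."""
    power = [abs(c) ** 2 for c in dft(x)]
    n = len(power)
    peak = max(range(1, n // 2), key=lambda k: power[k])
    inside = power[peak] + power[n - peak]
    return 1.0 - inside / sum(power)


whole = energy_outside_peak(sine_window(4.0))    # integer number of cycles
partial = energy_outside_peak(sine_window(4.5))  # non-integer number of cycles
print(f"leaked energy, 4.0 cycles: {whole:.6f}")
print(f"leaked energy, 4.5 cycles: {partial:.6f}")
```

With a whole number of cycles essentially all energy sits in one frequency bin, while the half-cycle offset spreads a substantial share of it over neighbouring bins.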
The time taken to compute the DFT depends mainly on the number of multiplications involved, since multiplication is the slowest operation. For a signal of length N this number is proportional to N²: the DFT requires N² complex multiplications. Highly efficient computer algorithms for computing the DFT, known as the Fast Fourier Transform (FFT), were developed in the 1960s on the basis that the standard DFT calculation involves a lot of redundant computation. At each stage of the FFT, N/2 complex multiplications are needed to combine the results of the previous stage. Since there are log₂N stages, the number of complex multiplications required is (N/2)·log₂N. This reduction in time complexity has made the FFT very useful and popular in the many fields of study where the DFT had been used. The FFT decomposes discrete data into its components at various frequencies. It works mathematically by divide and conquer, which requires a window of integer length 2ⁿ. In real-world problems it is difficult to obtain a window of exactly 2ⁿ samples, so although the FFT requires less processing power, it is difficult to achieve the same accuracy as the DFT of the original sequence. The resulting inaccuracy is called "harmonic leakage".
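The multiplication counts quoted above can be made concrete with a minimal sketch of a recursive radix-2 FFT next to a direct DFT. This is a textbook decimation-in-time formulation written for illustration, not the implementation used in the paper; the function names and the sample values are assumptions.

```python
import cmath
import math


def dft(x):
    """Direct DFT: N complex multiplications for each of N outputs."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be 2**m."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # One twiddle-factor multiplication per butterfly: N/2 per stage.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def multiplication_counts(n):
    """Counts from the text: N^2 for the DFT, (N/2) * log2(N) for the FFT."""
    return n * n, (n // 2) * int(math.log2(n))


signal = [0.5, 0.0, 0.25, 0.75, 0.125, 0.0, 0.625, 0.375]  # arbitrary samples
max_diff = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
print(f"largest |FFT - DFT| difference: {max_diff:.2e}")
print(multiplication_counts(1024))
```

For a power-of-two length the two routines agree up to rounding error, so the saving comes purely from removing redundant multiplications. A sequence of any other length has to be padded or truncated before this radix-2 variant can be applied.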

SentiWordNet
In

Proposed Method
The text of a large document, or the set of sentences to be classified, is parsed and POS-tagged. Nouns, verbs, adjectives and adverbs are used for sentiment value generation. The positive and negative scores of each word, obtained from SentiWordNet, are analogous to the IFS membership (µₐ(x)) and non-membership (νₐ(x)) components. This representation is valid because the positive, negative and objective scores in SentiWordNet sum to 1, so that 0 ≤ µₐ(x) + νₐ(x) ≤ 1.
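The pipeline described in this section can be sketched roughly as follows. Because the algorithm figure is not reproduced here, several parts are assumptions rather than the authors' procedure: the toy lexicon stands in for SentiWordNet lookups, the power measure is taken to be the sum of squared FFT magnitudes, and the decision rule simply picks the sequence with the larger power (ties count as positive). The input is assumed to be already tokenized and filtered to content words.

```python
import cmath

# Hypothetical (positive, negative) scores standing in for SentiWordNet.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.75, 0.0),
    "boring": (0.0, 0.625),
    "bad": (0.0, 0.75),
    "plot": (0.0, 0.0),
    "film": (0.0, 0.0),
}


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def pad_to_power_of_two(x):
    """Zero-pad so that the radix-2 FFT can be applied."""
    size = 1
    while size < len(x):
        size *= 2
    return list(x) + [0.0] * (size - len(x))


def spectral_power(x):
    """Assumed power measure: sum of squared FFT magnitudes."""
    return sum(abs(c) ** 2 for c in fft(pad_to_power_of_two(x)))


def classify(words, lexicon=TOY_LEXICON):
    """Return 'positive' or 'negative' for a list of content words."""
    scores = [lexicon.get(w, (0.0, 0.0)) for w in words]
    mu = [p for p, _ in scores]  # membership sequence (positive scores)
    nu = [q for _, q in scores]  # non-membership sequence (negative scores)
    # Assumed decision rule: the sequence with more spectral power wins.
    return "positive" if spectral_power(mu) >= spectral_power(nu) else "negative"


print(classify(["great", "film", "good", "plot"]))
print(classify(["boring", "plot", "bad", "film"]))
```

No training data and no feature selection appear anywhere in this sketch, which is the property the method relies on; only the lexicon lookup and the transform are needed.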

Computational Complexity of the Presented Method
As discussed in the previous section, the FFT has a better time complexity than the DFT: the FFT requires N log₂N operations, while the DFT requires N². In the proposed algorithm, the pre-processing step of tokenization and POS tagging (step 5 in Fig 3) requires O(N) operations to obtain the SentiWordNet scores, where N is the number of words in a sentence. The FFT requires O(N log₂N) operations. The power calculation (steps 9 and 10 in Fig 3) requires O(1) steps. The remaining comparison (step 12 in Fig 3) requires O(1) time. It can therefore be concluded that the time requirement of the algorithm is O(N log₂N), the same as that of the FFT.
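The step costs listed above can be combined in a single line. This only restates the argument of this section; the symbol T(N), denoting the total running time for a sentence of N words, is introduced here for convenience and does not appear in the original analysis.

```latex
T(N) = \underbrace{O(N)}_{\text{tokenization, POS tagging, scoring}}
     + \underbrace{O(N \log_2 N)}_{\text{FFT}}
     + \underbrace{O(1)}_{\text{power calculation}}
     + \underbrace{O(1)}_{\text{comparison}}
     = O(N \log_2 N)
```

The FFT term dominates the sum, which is why the overall bound coincides with that of the transform itself.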

Experimental Analysis and Results
This section presents a detailed experimental analysis and compares the method with state-of-the-art methods from the literature. The emphasis is on big data processing. Big data is currently of great importance because the heavy use of digital media leads to the storage of a very large number of documents, and processing such a volume of data requires a fast algorithm. Moreover, the unstructured nature of big data requires an algorithm that acts uniformly over all varieties of text. Here lies the strength of the proposed algorithm, which addresses the big data properties of volume, velocity and variety. The time required to process a text, whether it is a single sentence or a large document, is in the order of milliseconds while retaining 80% accuracy. Thus the claim that this method is suitable for big data is justified.
The proposed method is tested on documents as well as on sentences. Two document-level datasets are used: aclImdb and Polarity. AclImdb is a movie review dataset with texts of around 1 Kb, and the Polarity dataset has texts of around 3 Kb. Twitter and sentence-level datasets are used as well: Sentiment140, MR (Rt-Polarity) and CR. The method is found to be suitable for both document-level and sentence-level sentiment classification.

AclImdb dataset
AclImdb is a document-level dataset with texts of around 1 Kb (avg. 100 words per file). The IMDB dataset has 25000 highly polar positive and negative reviews, of which 9999 files are tested for each of the positive and negative categories. The results are compared (Table 2) with machine learning and deep learning methods from the literature (Behera et al., 2021 [6]), and it is found that FFT-TIFS gives the highest F-measure and accuracy among the machine learning and deep learning methods compared.

Polarity dataset
The Polarity dataset is a movie review dataset whose labels were created with an improved rating-extraction system. It contains 1000 positive and 1000 negative files with an average size of 3-5 Kb (avg. 350 words per file). The overall efficiency is 84.89%, with an average processing time of 0.356 seconds per file. A comparison with the fuzzy processes present in the literature is given in Table 4 and shows a significant improvement in overall efficiency.

Sentiment 140 dataset
Sentiment 140 is a Twitter dataset of 1600000 tweets extracted with the Twitter API. The dataset was created at Stanford University. Here 15856 tweets are taken for positive and 15065 for negative sentiment processing. As tweets are short texts of 7 words per sentence, processing them takes less time than document processing: the algorithm takes an average of 0.033 seconds per tweet. The overall efficiency is 79.6%. Precision and recall values are compared (Table 6) with a fuzzy approach (Vashishtha and Susan, 2019 [39]) and with some deep learning methods (Table 7) (Basiri et al., 2021 [5]). The comparison shows that the proposed method has precision, recall and accuracy comparable with the deep learning approaches as well. Deep learning approaches require an exhaustive training phase, whereas FFT-TIFS does not require training at all. The higher precision and recall values achieved are shown in bold.

Rt-Polarity dataset
The Rt-Polarity dataset contains 5331 positive and 5331 negative snippets. All snippets were labelled automatically. As the snippets are short sentences (avg. 17 words per sentence), the average processing time per file is very short, i.e., 0.026 seconds. The results on the MR dataset are compared with Bag-of-Words using machine learning methods (Table 9) and with the SAPCP method (Song et al., 2020 [35]), in which sentiment information is extracted from SentiWordNet, converted to PLTS and finally classified with SVM.
Although SAPCP shows greater accuracy than the proposed method, the presented method is suitable for sentences or documents of any size, whereas the SAPCP method is suitable only for shorter texts.

Method              Accuracy (%)
[24]                81.8
Tree-CRF [24]       81.4
CNN-rand [24]       79.8
Ensemble-HMM [24]   87
Proposed Model      86.6

The tables mentioned above show that the proposed model has comparable potential to machine learning and deep learning methods. In some cases it gives better results than machine learning algorithms. Although our focus is on the time complexity of the FFT-TIFS method, because of its suitability for large volumes of documents or sentences in big data, the method also performs well in the accuracy comparison. Feature selection is a mandatory step for algorithms based on machine learning. It is a time-consuming task for large documents and an unmanageable task for cross-domain documents. FFT-TIFS is completely independent of this feature selection task.

The work in [32] used Fuzzy C-Means for big data, as big data analysis is currently a blooming research field. Since the data size in this case is large, sentiment analysis needs to be faster. As shown in that paper, the time taken to classify positive and negative reviews in the Cloudera distributed environment is less than the time taken by the algorithm in a sequential environment. The table shows that the proposed method, running in a sequential environment on the same dataset, is at least 17 times and at most 28 times faster than the state-of-the-art method. It is therefore assumed that in a distributed environment the proposed process will give even greater efficiency, suitable for big data processing.
Table 15 compares the time taken for pre-processing in Eclipse and for processing by the FFT-TIFS algorithm in Matlab against the size of the dataset. For this comparison, one large file of 25 kb each for positive and negative sentiment was created from the Blitzer dataset. Table 15 lists the number of words in a single file or sentence versus the SentiWordScore processing time in Eclipse and the FFT algorithm processing time in Matlab. Figure 8 shows that the FFT processing time is almost linear in the logarithm (base 10) of the number of words per file.

Conclusion and Future Scope
In this paper a method based on the Discrete Fourier Transform is proposed for sentiment analysis on a Temporal Intuitionistic Fuzzy Set consisting of membership (positive) and non-membership (negative) values. The Fast Fourier Transform is used for fast calculation of the DFT. The method is novel, as such a frequency-domain transformation has never been used in sentiment classification. The time complexity and accuracy show that the proposed method can be very useful for the current trend of big data classification. Document-level, sentence-level and Twitter data were all passed through the system, and the method was found to be suitable for all of these categories. Moreover, pre-processing for feature selection is not required, which makes the method suitable for cross-domain and large-text opinion mining. The accuracy reached 86% on document-level and sentence-level classification and about 79.6% on Twitter classification. The timing analysis shows that even the sequential implementation of FFT-TIFS is far faster than the previously applied FCM method on big data in a distributed environment. Hence it can be inferred that in a distributed environment FFT-TIFS will give an even more efficient result.
In this paper we have shown the correspondence between the positive and negative values of the text and the Temporal Intuitionistic Fuzzy membership and non-membership values. The temporal relation is valid here because the values are taken in the time domain and converted to the frequency domain by the Fast Fourier Transform for further analysis. This representation is also novel, as the Temporal Intuitionistic Fuzzy Set has never been used in the literature for sentiment classification.
The efficiency of the method is lower than that of some hybrid machine learning and deep learning methods, whereas it is considerably better than singly used machine learning (NB, SVM) or deep learning (CNN) methods. The accuracy could be improved by hybridizing FFT-TIFS with machine learning, deep learning or nature-inspired algorithms. The scope of the paper can also be extended to evaluating such hybrids of FFT-TIFS with evolutionary or machine learning algorithms on big data processing. Recent developments in the fuzzy domain, such as multi-criteria decision making processes (Zhan et al., 2021 [46], Zhan et al., 2020 [49]), can be utilized for multiway sentiment analysis using FFT classification. In artificial text processing, incomplete information can be tested with fuzzy applications on the basis of the FFT. The method needs further study of decision making on unclean, incomplete texts, as well as in other branches of NLP such as topic modelling and question answering.
Compliance with Ethical Standards: