Rider Weed Deep Residual Network-Based Incremental Model for Text Classification Using Multidimensional Features and MapReduce Framework

The increasing demand for information and the rapid growth of big data have dramatically increased the volume of textual data. This abundance of heterogeneous data leads to information overload, so classifying texts is an imperative task for obtaining useful information. This paper develops a technique for text classification in big data using the MapReduce model. The goal is to design a hybrid optimization algorithm for classifying text. Here, pre-processing is done with stemming and stop word removal. In addition, imperative features are extracted, wherein SentiWordNet features, contextual features, and thematic features are generated. Furthermore, optimal features are selected using Tanimoto similarity, which estimates the similarity between features and selects the relevant ones with higher feature selection accuracy. After that, a deep residual network is utilized for dynamic text classification, trained by the Adam algorithm. In addition, dynamic learning is performed with the proposed Rider invasive weed optimization (RIWO)-based deep residual network along with fuzzy theory. The proposed RIWO algorithm combines invasive weed optimization (IWO) and the rider optimization algorithm (ROA). The method is solved under the MapReduce framework. The proposed RIWO-based deep residual network outperformed other techniques with the highest true positive rate (TPR) of 85%, true negative rate (TNR) of 94%, and accuracy of 88.7%.


Introduction
The massive demand for big data has led to evaluation of the sources and implications of data, and the fundamental aim of such analysis is to design novel frameworks for studying the data. One class of mathematical models is the similarity measure, which is utilized for classifying and clustering data; a fundamental assessment of common similarity measures is provided here. Similarity metrics, like Jaccard [10], Cosine [12], Euclidean distance [9], and Extended Jaccard [11], are utilized for evaluating the distance or angle between vectors. Similarity measures can be categorized into feature-content and topology-based measures. In topology-based measures, the features are organized in a hierarchical model, and the appropriate path length between features must be evaluated. Feature-content measures are devised on the basis of evidence: features with elevated frequency are treated as carrying elevated information, whereas features with lower frequency are treated as carrying less information. Pair-Wise and ITSim metrics fit into the class of feature-content metrics; an information content measure gives elevated priority to the highest-frequency features, so that a small difference between two data items can lead to improved outcomes. The Cosine and Euclidean measures belong to the class of topological metrics; they are susceptible to loss of information, as two similar data items can be offset by the existence of a solitary feature having a huge weight [4]. Methods such as clustering and classification are utilized in text mining-based applications and help transform massive data into small subsets for increasing computational effectiveness [2].
Text data contain noisy and irrelevant features that make learning techniques fail to generate improved accuracies, so various data mining methods are adapted for removing redundant data. Feature selection and feature extraction are two methods for preparing the data for classification; techniques such as clustering of text for selecting text data features and classification have been adapted recently. Feature selection is utilized for eliminating superfluous text features so that classification and clustering can be performed effectively. Previous techniques concentrated more on transforming huge data into small data using classical distance measures. Dimensionality reduction minimizes evaluation time and maximizes the efficiency of classification, while data and text retrieval are utilized in detecting synonyms and meanings of data. Authors have devised several techniques for performing the classification and clustering process; clustering is carried out using unsupervised techniques with different class label data [2]. The goal of classifying text is to categorize data into different parts by allocating pertinent labels based on the content [3].
The categorization of texts is considered an imperative part of natural language processing and is extensively employed in several applications. For instance, most news services must repeatedly arrange huge numbers of articles in a single day [13], and advanced mail services offer the function to discover junk mail automatically [14]. Other applications involve sentiment analysis [15], topic modeling [16], text clustering [32], language translation [17], and intent detection [18], [3]. Text classification technology assists people in filtering useful data and has many implications in real life. The design of text categorization has shifted from manual effort to machine learning [19], [20], [21], [22], in which learning from labeled text data and then categorizing unseen text is the basic task. There exist several text classification techniques [23] whose goal is to categorize textual data; the categorization outcomes can fulfill an individual's requirements for classifying text and are suitable for rapidly attaining significant data. MapReduce is utilized for handling huge volumes of unstructured data [5].
The aim is to devise an optimization-driven deep learning technique for classifying texts in big data using MapReduce. Initially, the text data undergoes pre-processing for removing unnecessary words; here, pre-processing is performed using stop word removal and stemming. After that, feature extraction is performed, wherein SentiWordNet features, thematic features, and contextual features are obtained. These features are employed in a deep residual network for classifying the texts, where the training of the deep residual network is performed with the Adam algorithm. Finally, dynamic learning is carried out, wherein the proposed RIWO-based deep residual network is employed for incremental text classification. Here, fuzzy theory is employed for weight bounding in order to deal with the incremental data. In this process, the training of the deep residual network is performed with the proposed RIWO, which is devised by combining the ROA and IWO algorithms. The key contribution of the paper is:

- Proposed RIWO-based deep residual network for text classification: The proposed RIWO-based deep residual network is employed for classifying texts by dynamic learning. Here, the developed RIWO, devised by combining the ROA and IWO algorithms, is adapted for deep residual network training. In addition, fuzzy theory is employed for handling the dynamic data by performing weight bounding.

Proposed RIWO-based deep residual network for text classification in big data
The mission of text classification is to categorize text data into different classes based on their content. Text classification plays an imperative role in natural language processing. However, it is considered a challenging issue because texts are high dimensional and noisy, which makes devising an improved classifier for huge textual data a complex process. This research devises a novel hybrid optimization-driven deep learning technique for text classification using big data. Here, the goal is to devise a classifier that takes text data as input and allocates pertinent labels based on the content. At first, the input text data undergoes pre-processing to eliminate noise and artifacts; here, pre-processing is performed with stop word removal and stemming. Once the pre-processed data is obtained, features like contextual features, thematic features, and SentiWordNet features are extracted. Then the imperative features are chosen with Tanimoto similarity, which evaluates similarity amongst features and chooses the relevant features having high feature selection accuracy. Once the features are selected, a deep residual network [26] is used for dynamic text classification. The deep residual network is trained using the Adam algorithm [11], [33], [34]. In addition, dynamic learning is performed using the proposed RIWO algorithm along with fuzzy theory. The proposed RIWO algorithm is the integration of IWO [27] and ROA [28]. Figure 1 shows a schematic view of the text classification model with big data using the proposed RIWO method. Assume the input text data is expressed as a database $B = \{B_{d,e}\}$, $1 \le d \le D$, $1 \le e \le E$, where $B_{d,e}$ refers to the text data contained in database $B$ with the $e$-th attribute of the $d$-th data point; hence $D$ data points are represented using $E$ attributes each. The next step is to eliminate artifacts and noise present in the data.
The data $B_{d,e}$ in the database is split into a number of parts equal to the mappers present in the MapReduce model. The partitioned data is given by $d_{r,l}$, where the splits are distributed over $N$ mappers. Assume the $N$ mappers in MapReduce are expressed as $\{m_1, m_2, \ldots, m_q, \ldots, m_N\}$; thus, the input to the $q$-th mapper is the split data $d_{r,l}$ assigned to it, where $m_q$ indicates the data in the $q$-th mapper.
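
For illustration, a minimal Python sketch of this partitioning step follows; the round-robin assignment and all names are illustrative assumptions, since the paper does not specify how documents are allocated to mappers.

```python
# Minimal sketch: splitting a text corpus across N mappers, as in the
# MapReduce partitioning step above. Names and the round-robin scheme
# are illustrative assumptions, not the paper's exact procedure.

def partition_for_mappers(documents, num_mappers):
    """Split the corpus into num_mappers roughly equal chunks d_{r,l}."""
    chunks = [[] for _ in range(num_mappers)]
    for index, doc in enumerate(documents):
        chunks[index % num_mappers].append(doc)  # round-robin assignment
    return chunks

corpus = ["first review ...", "second review ...",
          "third review ...", "fourth review ..."]
mapper_inputs = partition_for_mappers(corpus, num_mappers=3)
for q, chunk in enumerate(mapper_inputs, start=1):
    print(f"mapper {q}: {len(chunk)} documents")
```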

Pre-processing
The partitioned data $d_{r,l}$ from the text dataset is fed to the pre-processing phase to remove superfluous words through stop word removal and stemming. Pre-processing is an important step that arranges varied data into a smooth form for offering effective outcomes, and it assists downstream processing in obtaining improved representations. The dataset consists of unnecessary phrases and words that influence the process; thus, pre-processing is essential for removing inconsistent words from the dataset. Initially, the text data are accumulated in the relational dataset, and all reviews are divided into sentences and bags of sentences. Then the elimination of stop words is carried out to maximize the performance of the text classification model. Here, stemming and stop word removal are done to refine the data.

(i) Stop word removal
Stop word removal is a process of removing words with little representative value for the data, such as pronouns and articles. While evaluating data, there exist a few words that are not valuable to the text content; removing such redundant words is imperative, and this procedure is termed stop word removal [29]. Frequently occurring words, like articles, conjunctions, and prepositions (the, is, a, an, and, when, but), as well as some other high-frequency words, are adapted as stop words. Removing stop words shrinks the vocabulary and the vector space without losing meaning, since stop words do not hold any information. Eliminating stop words from a huge set of reviews saves a huge amount of space and makes processing faster, so an effective process is attained.

(ii) Stemming
The stemming procedure is utilized to convert words to their stems. In massive data, several words are utilized which convey a similar meaning; thus, the critical method utilized to reduce words to their roots is termed stemming. Stemming is a method of linguistic normalization wherein inflected words are reduced to a common base form. Moreover, stemming supports information retrieval by reducing redundant word variants to their root form or word stem. For instance, the words connections, connection, connecting, and connected are all reduced to connect [29].
The retained word stems are denoted $P_k$, where $P_k$ symbolizes the total words present in the text data from the database. The pre-processed outcome generated from pre-processing is expressed as $B'$, which is subjected as input to the feature extraction phase.
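
As a hedged sketch of the two pre-processing steps, the following uses NLTK's stopword list and Porter stemmer (assuming the stopwords and punkt resources have been downloaded); the paper does not specify its exact tokenization rules, so these choices are illustrative.

```python
# Pre-processing sketch: stop word removal followed by stemming.
# Requires nltk.download('stopwords') and nltk.download('punkt').
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(sentence):
    tokens = word_tokenize(sentence.lower())
    tokens = [t for t in tokens if t.isalpha()]          # drop punctuation/numbers
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [stemmer.stem(t) for t in tokens]             # stemming to root form

print(preprocess("The connections are connecting and connected"))
# -> ['connect', 'connect', 'connect']
```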

Acquisition of features for producing highly pertinent features
This phase produces imperative features from the input reviews; the implication of feature extraction is to produce pertinent features that facilitate improved classification of text. Moreover, the burden of the data is reduced, as the text data is expressed as a minimized feature set. Thus, the pre-processed partitioned data $d_{r,l}$ is fed to feature extraction, wherein SentiWordNet features, contextual features, and thematic features are mined.

Extraction of SentiWordNet features
From the pre-processed partitioned data $d_{r,l}$, the SentiWordNet features are obtained by extracting keywords from each review. Here, SentiWordNet [31] is employed as a lexical resource to extract the SentiWordNet features. SentiWordNet assigns each WordNet entry three numerical sentiment scores: a positive score, a negative score, and a neutral score. Different words carry different polarities, reflecting their various word senses. SentiWordNet covers different linguistic features, including verbs, adverbs, adjectives, and n-gram features, and provides a process to evaluate the score of a specific word in the text data. Here, SentiWordNet is employed to determine the polarity of an offered review, that is, for discovering its positivity and negativity. The resulting SentiWordNet feature is modeled as $F_1$.
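
A hedged sketch of word-level SentiWordNet scoring via NLTK follows (requires nltk.download('sentiwordnet') and nltk.download('wordnet')); the paper does not state how scores are aggregated across word senses, so averaging over all senses is an assumption.

```python
# Scoring words with SentiWordNet: positive, negative, objective scores.
from nltk.corpus import sentiwordnet as swn

def sentiwordnet_scores(word):
    """Return mean (positive, negative, objective) scores across senses."""
    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return (0.0, 0.0, 1.0)  # unseen words treated as neutral
    n = len(synsets)
    pos = sum(s.pos_score() for s in synsets) / n
    neg = sum(s.neg_score() for s in synsets) / n
    obj = sum(s.obj_score() for s in synsets) / n
    return (pos, neg, obj)

print(sentiwordnet_scores("happy"))     # leans positive
print(sentiwordnet_scores("terrible"))  # leans negative
```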

Extraction of contextual features
From the pre-processed partitioned data $d_{r,l}$, context-based features [1] are generated that describe relevant words by separating them from non-relevant reviews for effective classification. This requires discovering key terms together with context terms and semantic meaning, which together establish a pertinent context. The key term is considered a preliminary indicator of a relevant review, wherein the context terms act as validators that evaluate whether the determined key term is a true indicator. Here, the training dataset contains keywords with pertinent words. The context-based features assist in separating the pertinent and non-relevant reviews.
Consider a training dataset $N$ that contains $N_{rel}$ relevant reviews and $N_{non\_rel}$ non-relevant reviews. In this method, let $x_s$ represent a key term and $x_c$ indicate a context term.

-Detection of Key terms:
Consider $L$ to symbolize a language model built over each term. A metric $C$ (equation (6)) is computed by contrasting the two language models, where $L_{rel}$ symbolizes the language model for $N_{rel}$ and $L_{non\_rel}$ signifies the language model for $N_{non\_rel}$.

-Discovery of Context term:
After discovering key terms, the technique begins the context term discovery process, which first treats each term separately, as with key terms. The steps employed in determining the context terms are given as, with a sketch after this list:

i) Compute all instances of the key term occurring amid the relevant and non-relevant reviews, $N_{rel}$ and $N_{non\_rel}$.

ii) Using a sliding window of size $S$ around the key term $x_s$, the terms inside the window are mined as context terms; hence the window size $S$ serves as the context span.

iii) The pertinent texts generated are modeled as $d_r$, and the non-relevant ones are denoted $d_{nr}$; the set of pertinent texts is modeled as $R_{d_r}$, and the non-relevant set is referred to as $R_{d_{nr}}$.

iv) Thereafter, a score is evaluated for each distinctive term.
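
The following is an illustrative sketch of steps i)-iv); since the paper's scoring formula is only partially recoverable, a simple relevant-versus-non-relevant frequency ratio is used here as a stand-in, and all names are assumptions.

```python
# Context-term mining with a sliding window of size S around a key term.
from collections import Counter

def context_terms(documents, key_term, window=3):
    """Collect terms within +/- window positions of each key-term occurrence."""
    counts = Counter()
    for tokens in documents:
        for i, tok in enumerate(tokens):
            if tok == key_term:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != key_term)
    return counts

relevant = [["great", "battery", "life", "on", "this", "phone"]]
non_relevant = [["battery", "of", "chickens", "on", "the", "farm"]]
rel_counts = context_terms(relevant, "battery")
nonrel_counts = context_terms(non_relevant, "battery")
# stand-in score: frequency amid relevant reviews, damped by non-relevant hits
scores = {t: rel_counts[t] / (1 + nonrel_counts.get(t, 0)) for t in rel_counts}
print(scores)
```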

Extraction of thematic features
Here, the pre-processed partitioned data $d_{r,l}$ is given as input to find thematic features. The count of thematic words [30] in a sentence is an imperative feature, as words that occur frequently are most probably connected to the topic of the data. Thematic words are words that capture the key topics defined in a provided document; here, the top 10 most frequent words are employed as thematic words. The thematic feature $F_3$ is modeled from $T$, where $T$ expresses the count of thematic words in a sentence of the data. Thus, the feature vector combining the contextual, thematic, and SentiWordNet features is expressed as $F = \{F_1, F_2, F_3\}$, where $F_1$ symbolizes the SentiWordNet features, $F_2$ signifies the contextual features, and $F_3$ refers to the thematic features.
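
A short sketch of this thematic feature follows, under a simple reading of the description above: the ten most frequent words form the thematic set, and each sentence is scored by its count $T$ of thematic words.

```python
# Thematic feature: count of top-10 frequent (thematic) words per sentence.
from collections import Counter

def thematic_feature(sentences):
    all_words = [w for s in sentences for w in s]
    thematic = {w for w, _ in Counter(all_words).most_common(10)}
    # T: count of thematic words in each sentence
    return [sum(1 for w in s if w in thematic) for s in sentences]

doc = [["network", "train", "deep", "network"],
       ["deep", "residual", "network", "classify", "text"]]
print(thematic_feature(doc))
```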

Feature selection using Tanimoto similarity
The selection of imperative features from the extracted features $F$ is made using Tanimoto similarity.
The Tanimoto similarity computes the similarity amidst features and selects the pertinent features having elevated feature selection accuracy. The Tanimoto similarity is expressed as

$S(y_w, z_w) = \dfrac{y_w \cdot z_w}{\|y_w\|^2 + \|z_w\|^2 - y_w \cdot z_w}$

where $S$ indicates the Tanimoto measure and $y_w$ and $z_w$ represent features. The selected features are expressed as $R$. The feature selection output obtained from each mapper is given as input to the reducer $U$. The classification of texts is performed on the reducer using the selected features and is briefly illustrated below.
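
The following sketch computes the standard Tanimoto coefficient between two feature vectors; the vectors and the selection threshold would come from the extraction stage above.

```python
# Tanimoto similarity: S = (y.z) / (|y|^2 + |z|^2 - y.z)
import numpy as np

def tanimoto(y, z):
    dot = float(np.dot(y, z))
    return dot / (np.dot(y, y) + np.dot(z, z) - dot)

y = np.array([1.0, 0.5, 0.0])
z = np.array([1.0, 0.0, 0.5])
print(tanimoto(y, z))  # 1.0 / (1.25 + 1.25 - 1.0) = 0.666...
```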

Classification of texts with Adam-based deep residual network
Here, the classification of text is performed with an Adam-based deep residual network using the selected features $R$. Classifying text data assists in standardizing the infrastructure and makes search simpler and more pertinent. In addition, classification enhances the user experience and simplifies navigation, and it helps solve huge business issues in real time, like processing social media and e-mails, which speeds up work and takes less processing time. The deep residual network is effective with respect to both the number of attributes and computation: it is capable of building deep representations at each layer and can manage advanced deep learning tasks. The architecture of the deep residual network and its training with the Adam algorithm are described below.

Architecture of Deep residual network
Here, a deep residual network (DRN) [26] is employed to make an effectual decision in which the classification of text is performed. The DRN comprises different layers, namely residual blocks, convolutional (Conv) layers, a linear classifier, and average pooling layers. Figure 2 presents the structural design of the deep residual network.

-Convolutional (Conv) layer:
The two-dimensional Conv layer is utilized to reduce the free attributes in training, and it offers the benefit of weight sharing. The Conv layer processes the input with a sequence of filters known as kernels using local connections: it slides each filter over the input matrix and computes the dot product with the kernel. The evaluation of the Conv layer is represented as

$O(u, v) = \sum_{a=1}^{E} \sum_{s=1}^{E} G(a, s)\, \chi(u + a, v + s) \qquad (12)$

where $O$ expresses the CNN feature of the input, $\chi$ denotes the input feature map, $u$ and $v$ refer to the output coordinates, $G$ signifies the $E \times E$ kernel matrix treated as a learnable parameter, and $a$ and $s$ are the position indices of the kernel matrix. Hence, $G_Z$ expresses the kernel for the $Z$-th input neuron, and this sliding dot product is the cross-correlation operator, denoted $\ast$.

-Pooling layer:
This layer is associated with the Conv layer and is especially utilized for reducing the spatial size of the feature map. Hence, the average pooling is selected to function on each slice and depth of the feature map.
The output size of the average pooling (for a stride equal to the kernel size) is given by $a_{out} = a_{in} / Z_a$ and $s_{out} = s_{in} / Z_s$, where $a_{in}$ symbolizes the input matrix width, $s_{in}$ signifies the height of the input matrix, and $a_{out}$ and $s_{out}$ represent the respective output values. In addition, $Z_a$ and $Z_s$ symbolize the width and height of the kernel.
-Activation function: The nonlinear activation function is adapted for learning nonlinear and complicated features, so it is utilized to improve the non-linearity of the extracted features. The rectified linear unit (ReLU) is utilized for processing the data. The ReLU function is formulated as

$ReLU(\chi) = \max(0, \chi)$

where $\chi$ symbolizes a feature.
-Batch normalization: Here, the training set is divided into various small sets known as mini-batches to train the model. It attains a balance between evaluation and convergence complexity. Here, the input layers are normalized by scaling activations to maximize reliability and training speed.
-Residual blocks: A residual block contains a shortcut connection between Conv layers; the input is directly added to the output when the input and output are of equal size. The block is expressed as

$\varphi = \Im\big(\kappa(O) + M \cdot O\big)$

where $O$ and $\varphi$ signify the input and output of the residual block, $\kappa(\cdot)$ symbolizes the mapping relation, $M$ expresses the dimension matching factor, and $\Im(\cdot)$ signifies the activation function.
-Linear classifier: After the final Conv layer, the linear classifier maps the input features to class scores. It is the combination of a softmax function and a fully connected layer, expressed as

$\tau = \mathrm{softmax}(\omega \cdot \varphi + \beta)$

where $\omega$ expresses the weight matrix and $\beta$ represents the bias. Here, the output is represented as $\tau$, which assists in classifying the texts.

Training of Deep residual network with Adams algorithm
The deep residual network is trained with the Adam technique, which assists in discovering the best weights for tuning the deep residual network for classifying text. Adam [11] is a first-order stochastic gradient-based optimization method that is extensively adapted to fitness functions whose attributes change. The major implications of the method are computational efficiency and low memory needs. Moreover, problems associated with non-stationary objectives and the existence of noisy gradients are handled effectively. In addition, Adam has the following benefits: the magnitudes of the parameter updates are invariant to rescaling of the gradient, the step size is handled with a hyperparameter, the method works with sparse gradients, and it is effective in performing step size annealing. The steps of Adam are given as:

Step 1: Initialization
The foremost step initializes the weights and the moment estimates together with their bias corrections, wherein $\hat{q}_l$ signifies the bias-corrected first moment estimate and $\hat{m}_l$ represents the bias-corrected second moment estimate; both moments start at zero.

Step 2: Discovery of error
The error is computed to choose the optimum weights for training the deep residual network. Here, the error function drives the search toward the global optimum and is expressed as the minimization function

$e = \dfrac{1}{f} \sum_{l=1}^{f} \big(O_l - \tau_l\big)^2 \qquad (19)$

where $f$ signifies the total data, $\tau_l$ symbolizes the output generated by the deep residual network classifier, and $O_l$ indicates the expected value.

Step 3: Discovery of updated bias
Adam is used to improve convergence behavior and optimization; it generates smooth variation with effectual computational efficiency and low memory requirements. As per Adam [11], the first moment estimate is expressed as

$q_l = \beta_1 q_{l-1} + (1 - \beta_1)\, g_l$

and the second-order moment as

$m_l = \beta_2 m_{l-1} + (1 - \beta_2)\, g_l^2$

with the corrected biases represented as

$\hat{q}_l = \dfrac{q_l}{1 - \beta_1^l}, \qquad \hat{m}_l = \dfrac{m_l}{1 - \beta_2^l}$

where $g_l$ denotes the gradient of the error with respect to the weights, and $\beta_1, \beta_2$ are the exponential decay rates.

Step 4: Determination of best solution:
The best solution is determined with the error; the solution having the smaller error is employed for classifying text.
Step 5: Termination: The optimum weights are produced repeatedly until the utmost iterations are attained. Table 1 describes the pseudocode of the Adam technique.
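
A minimal NumPy sketch of the Adam update follows, implementing the standard equations above; the toy objective $e(w) = w^2$ stands in for the network error of equation (19), and the hyperparameter values are the usual defaults, not values reported by the paper.

```python
# One Adam step: first moment q, second moment m, bias-corrected estimates.
import numpy as np

def adam_step(w, grad, q, m, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    q = b1 * q + (1 - b1) * grad          # first moment estimate
    m = b2 * m + (1 - b2) * grad ** 2     # second moment estimate
    q_hat = q / (1 - b1 ** t)             # bias-corrected first moment
    m_hat = m / (1 - b2 ** t)             # bias-corrected second moment
    w = w - lr * q_hat / (np.sqrt(m_hat) + eps)
    return w, q, m

w, q, m = np.array(5.0), 0.0, 0.0
for t in range(1, 2001):
    w, q, m = adam_step(w, 2 * w, q, m, t)  # gradient of e(w) = w^2 is 2w
print(float(w))  # approximately 0
```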

Dynamic learning with proposed RIWO-based deep residual network
For incremental data B , dynamic learning is done using the proposed RIWO-based deep residual network. Here, the assessment of incremental learning with the developed RIWO-based deep residual network is done to achieve effective text classification with the dynamic data. The deep residual network is trained with developed RIWO for generating optimum weights. The developed RIWO is generated by integrating ROA and IWO for acquiring effective dynamic text classification.

Architecture of deep residual network
The model of the deep residual network is already explained in section 3.4.

Training of deep residual network with proposed RIWO
The training of the deep residual network is performed with the developed RIWO, which is devised by integrating IWO and ROA. Here, ROA [28] is motivated by the behavior of rider groups that travel toward a common target position so as to become the winner; the riders are chosen from the total riders of each group. This method produces enhanced classification accuracy, and ROA follows the steps of fictional computing for addressing optimization problems, but it suffers from slow convergence. IWO [27] is motivated by the colonizing characteristics of weed plants and provides a fast rate of convergence and elevated accuracy. Hence, the integration of IWO and ROA is carried out to enhance overall algorithmic performance. The steps of the method are expressed as:

Step 1) Initialization of population
The preliminary step is algorithm initialization, which is performed using the four rider groups, provided by $A = \{A_1, A_2, \ldots, A_\alpha, \ldots, A_\varsigma\}$, where $A_\alpha$ signifies the $\alpha$-th rider and $\varsigma$ is the total number of riders.

Step 2) Determination of error:
The computation of error is already described in equation (19).

Step 3) Update of rider positions:
The rider position in each set is updated for determining the leader; the position update of each rider is defined below. As per ROA [28], the overtaker position update (equation (27)) is used to increase the success rate by determining the overtaker's position. The attacker has a propensity to grab the position of the leader (equation (28)), and the bypass rider follows a familiar path (equation (29)); these updates involve random numbers drawn between 1 and $P$ and between 0 and 1. The follower has a propensity to update its position toward the leading rider's position to attain the target, where $h$ is the coordinate selector, $A_L$ indicates the leading rider's position, and $L$ represents the leading rider's index. Substituting the IWO update of equation (33) into the follower update of equation (31) yields the final update equation of the proposed RIWO.

Step 4) Re-evaluation of the error:
After completing the update process, the error of each rider is computed. The position of the leading rider is replaced by the position of a newly generated rider whenever the new rider's error is smaller.

Step 5) Update of Rider parameter:
The rider attribute update is imperative for determining an effectual optimal solution using the error.

Step 6) Riding Off time:
The steps are iterated repeatedly until the time attains the off time $N_{OFF}$, at which point the leader is determined. In summary, each iteration updates the bypass, follower, overtaker, and attacker positions with equations (27)-(29) and (31), ranks the riders using the error of equation (19), chooses the rider with minimal error, and updates the steering angle, gear, accelerator, and brake; the loop ends when the off time is reached.
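
A high-level skeleton of this loop is sketched below under stated assumptions: the exact ROA/IWO update equations (27)-(35) are only partially recoverable here, so the position update is an abstract stand-in; only the loop structure (evaluate error, update riders, keep the leader, iterate until off time) follows the steps above.

```python
# RIWO-style search skeleton; update rule is a placeholder, not equations (27)-(35).
import random

def riwo_optimize(error_fn, dim, num_riders=20, off_time=100):
    riders = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_riders)]
    leader = min(riders, key=error_fn)
    for _ in range(off_time):                 # Step 6: iterate until off time
        for i, rider in enumerate(riders):    # Step 3: update rider positions
            candidate = [x + random.gauss(0.0, 0.1) for x in rider]  # stand-in update
            if error_fn(candidate) < error_fn(rider):  # Step 4: re-evaluate error
                riders[i] = candidate
        leader = min(riders + [leader], key=error_fn)  # keep rider with minimal error
    return leader

best_w = riwo_optimize(lambda w: sum(x * x for x in w), dim=3)
print(best_w)  # near the zero vector for this toy error
```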
Hence, the output produced by the developed RIWO-based deep residual network is $\tau$, which classifies the text data under dynamic learning and thereby handles the dynamic data. Here, fuzzy bounding is employed for remodeling the classifier when the error on previous data, with respect to the present data, is high.

Fuzzy theory
Whenever incremental data is added to the model, the error is evaluated, and the weights are updated without reusing the previous weights. If the error evaluated on the present instance is less than the error of the previous instance, the weights are updated based on the proposed RIWO algorithm; otherwise, if the error computed on the current instance is more than the error of the previous instance, the classifier is remodeled by setting a boundary on the weights using fuzzy theory [1], and the optimal weights within that boundary are then chosen using the proposed RIWO algorithm. On the arrival of data $d_{i+1}$, the error $e_{i+1}$ is computed and compared with that of the previous data $d_i$: if $e_{i+1} < e_i$, prediction with RIWO-based training proceeds as before; else, fuzzy-bounding-based learning is performed by bounding the weights around the weight at the current iteration, with $F_s$ signifying the fuzzy score. For the dynamic data, the features $\{F\}$ are extracted, and a membership degree governs the bound; when the highest iteration is attained, the process is stopped.
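
A hedged sketch of this incremental rule follows; the fuzzy score $F_s$ and membership degree are simplified to a fixed bounding fraction, since their exact forms are not recoverable here, and the optimizer is a placeholder for the RIWO search of the previous section.

```python
# Incremental learning rule: retrain freely if error improved, else bound
# the weight search around the current weights before re-optimizing.
def incremental_update(weights, prev_error, new_error, optimize, bound_frac=0.1):
    if new_error < prev_error:
        # errors improving: ordinary RIWO-based retraining, unconstrained
        return optimize(weights, bounds=None)
    # errors worsening: remodel with fuzzy-style weight bounding
    bounds = [(w - bound_frac * abs(w), w + bound_frac * abs(w)) for w in weights]
    return optimize(weights, bounds=bounds)

def optimize(weights, bounds):
    # placeholder for the RIWO search; clamps to bounds when they are given
    if bounds is None:
        return [w * 0.9 for w in weights]
    return [min(max(w * 0.9, lo), hi) for w, (lo, hi) in zip(weights, bounds)]

print(incremental_update([1.0, -2.0], prev_error=0.3, new_error=0.5,
                         optimize=optimize))  # weights stay inside the bounds
```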

Results and Discussion
The competence of the technique is evaluated by analyzing the methods with various measures, like TPR, TNR, and accuracy. The assessment is done by considering mappers=3 and mappers=4 and by varying the chunk size.

Experimental setup
The developed model is implemented in Python on a Windows 10 machine with an Intel processor and 4 GB RAM. Here, the analysis is performed using the Reuter and 20 Newsgroups datasets.

Dataset description
The datasets adapted for text classification are the Reuter and 20 Newsgroups databases, explained below.

20 Newsgroups database:
The 20 Newsgroups data set [24] was contributed by Ken Lang for a newsreader to extract Netnews. The dataset was established by collecting 20,000 newsgroup documents, split amongst 20 different newsgroups, each covering a different topic. The database is popular for analyzing text applications with machine learning methods, like clustering and text classification.

Reuter database:
The Reuters-21578 Text Categorization Collection Data Set [25] was contributed by David D. Lewis. The dataset comprises documents that appeared on the Reuters newswire in 1987, arranged and indexed by category. The dataset has 21,578 instances with five attributes, and the number of web hits recorded for it is 163,417.

Evaluation metrics
The efficiency of the developed model is examined by adopting measures like accuracy, TPR, and TNR.

Accuracy:
It is described as the measure of data that is precisely classified and is expressed as

$Accuracy = \dfrac{P + Q}{P + Q + H + F}$

where $P$ signifies true positives, $Q$ symbolizes true negatives, $H$ denotes false negatives, and $F$ is false positives.

TPR
TPR refers to the ratio of the count of true positives to the total number of positives:

$TPR = \dfrac{P}{P + H}$

where $P$ refers to true positives and $H$ is the false negatives.

TNR
The TNR refers to the ratio of negatives that are correctly detected:

$TNR = \dfrac{Q}{Q + F}$

where $Q$ is true negatives and $F$ signifies false positives.
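
For reference, the three metrics can be computed from confusion-matrix counts as follows, using the notation above ($P$ true positives, $Q$ true negatives, $H$ false negatives, $F$ false positives); the example counts are illustrative and chosen to echo the reported TPR and TNR.

```python
# Accuracy, TPR, and TNR from confusion-matrix counts.
def metrics(P, Q, H, F):
    accuracy = (P + Q) / (P + Q + H + F)
    tpr = P / (P + H)  # true positive rate
    tnr = Q / (Q + F)  # true negative rate
    return accuracy, tpr, tnr

print(metrics(P=85, Q=94, H=15, F=6))  # -> (0.895, 0.85, 0.94)
```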

Comparative analysis
The assessment of the proposed technique is performed by adopting certain measures, like accuracy, TPR, and TNR. Here, the analysis is performed on two datasets, namely the Reuter dataset and the 20 Newsgroup dataset. In addition, the assessment of the techniques is performed considering mapper sizes of 3 and 4.

Analysis with Reuter dataset
The assessment of the techniques on the Reuter dataset considering the TPR, TNR, and accuracy parameters is described here. The assessment is done with mapper=3 and mapper=4 by varying the chunk size. The assessment with the accuracy, TPR, and TNR measures on the Reuter dataset using mapper=4 is described in figure 4. The assessment of the techniques with TPR is displayed in figure 4a). For chunk size=3, the TPR evaluated by LSS-CNN is 0.754, RNN is 0.768, SLKNN+MLKNN is 0.792, BPLion+LFNN is 0.810, and the proposed RIWO-based deep residual network is 0.828. Likewise, for chunk size=6, the TPR evaluated by LSS-CNN is 0.810, RNN is 0.820, SLKNN+MLKNN is 0.824, BPLion+LFNN is 0.826, and the proposed RIWO-based deep residual network is 0.850. The assessment with the TNR measure is depicted in figure 4b). For chunk size=3, the TNR evaluated by LSS-CNN is 0.839, RNN is 0.860, SLKNN+MLKNN is 0.863, BPLion+LFNN is 0.896, and the proposed RIWO-based deep residual network is 0.925. Likewise, for chunk size=6, the TNR evaluated by LSS-CNN is 0.855, RNN is 0.856, SLKNN+MLKNN is 0.876, BPLion+LFNN is 0.900, and the proposed RIWO-based deep residual network is 0.940. The assessment with the accuracy measure is displayed in figure 4c). For chunk size=3, the accuracy evaluated by LSS-CNN is 0.837, RNN is 0.843, SLKNN+MLKNN is 0.846, BPLion+LFNN is 0.862, and the proposed RIWO-based deep residual network is 0.880. Likewise, for chunk size=6, the accuracy evaluated by LSS-CNN is 0.833, RNN is 0.849, SLKNN+MLKNN is 0.852, BPLion+LFNN is 0.868, and the proposed RIWO-based deep residual network is 0.887. The performance improvements of the proposed RIWO-based deep residual network over LSS-CNN, RNN, SLKNN+MLKNN, and BPLion+LFNN in terms of accuracy are 6.087%, 4.284%, 3.945%, and 2.142%, respectively.

Analysis with 20 Newsgroup dataset
The assessment of the techniques on the 20 Newsgroup dataset with the TPR, TNR, and accuracy parameters is elaborated here. The assessment is done with mapper=3 and mapper=4 by altering the chunk size. Table 3 presents the assessment of the techniques.

Conclusion
A technique is presented for text classification in big data considering the MapReduce model. The purpose is to provide a hybrid optimization-driven deep learning model for text classification. Here, pre-processing is carried out with stemming and stop word removal. In addition, significant features are mined, wherein SentiWordNet features, contextual features, and thematic features are extracted from the pre-processed input data. Furthermore, the selection of the best features is carried out with Tanimoto similarity, which examines the similarity between features and selects the pertinent ones with higher feature selection accuracy. Then, a deep residual network is employed for dynamic text classification, trained by the Adam algorithm. In addition, dynamic learning is carried out with the proposed RIWO-based deep residual network along with fuzzy theory for incremental text classification, where the training of the deep residual network is performed using the proposed RIWO, the integration of IWO and ROA. The proposed RIWO-based deep residual network outperformed other techniques with the highest TPR of 85%, TNR of 94%, and accuracy of 88.7%. In the future, other datasets can be employed to validate the feasibility of the developed model.