Hindi Text Summarization Using Sequence to Sequence Neural Network

Text summarization reduces a large block of text to a precise, short, and intelligible text that conveys the essential meaning of the original in a few words while maintaining its context. Without such summaries, it is hard to grasp the main idea of a document quickly. Abstractive text summarization is well studied for English, but it is still in its infancy for Indian regional languages. In this study, we investigate the effectiveness of an attention-based sequence-to-sequence (Seq2Seq) neural network and its optimization for Hindi abstractive text summarization (HiATS), explicitly comparing the Adam and RMSprop optimizers. Our method takes a Hindi-language dataset as input and produces a concise summary that accurately reflects the gist of the original text. The performance of the models is evaluated using Rouge-1 and Rouge-2 metrics.


INTRODUCTION
Text summarization for Hindi, a low-resource language, is a crucial task in natural language processing. The limited availability of text data and resources for Hindi has made it challenging to develop accurate and reliable text summarization models. According to Statista [1], Hindi is the third most widely spoken language in the world, with over 602.2 million speakers, yet it is considered a low-resource language for natural language processing tasks. Additionally, the complex grammatical structure and distinct script of the Hindi language pose significant challenges for developing effective text summarization models.
One of the challenges of working with the Hindi language is the need for high-quality annotated data for training [2]. The available datasets for Hindi are limited in size and quality compared to those for other widely spoken languages. This makes it challenging to train deep learning models that can accurately capture the nuances of the language. Hindi has a rich vocabulary with many words borrowed from Sanskrit and other languages, and the meaning of a word can often be context-dependent [3]. Finally, Hindi has a different sentence structure from English, making it challenging to adapt pre-existing language models designed for English to the Hindi language [4].
Here is an example of a simple Hindi sentence: "मैं खाना खाता हूँ" (main khaana khaata hoon) [I eat food]. In this sentence, "मैं" (main) [I] is the subject, "खाना" (khaana) [food] is the object, and "खाता हूँ" (khaata hoon) [eat] is the verb. Another vital feature of Hindi sentence structure is the use of postpositions instead of prepositions. In English, prepositions show the relationship between nouns and other parts of the sentence; in Hindi, postpositions serve the same purpose. For example, in "मैं बाज़ार को जा रहा हूँ" (main bazaar ko ja raha hoon) [I am going to the market], "बाज़ार" (baazaar) [market] is followed by the postposition "को" (ko) [to], which shows the relationship between the subject and the object. Hindi also has a flexible word order, which means that the order of words in a sentence can be changed for emphasis or to convey different meanings. For example, the sentence "उसने मुझे एक किताब दी" (usane mujhe ek kitaab dee) [He gave me a book] could be rearranged as "मुझे उसने एक किताब दी" (mujhe usane ek kitaab dee) [he gave me a book] to emphasize that the speaker was the recipient of the book. Overall, the sentence structure of Hindi is unique and different from that of other languages, making it challenging to adapt pre-existing language models to Hindi. However, it also offers a rich and expressive means of communication for those who speak and understand the language.
Seq2Seq neural networks based on the attention model have been recognized as a powerful approach to text summarization in recent years [5]. These models utilize an encoder-decoder architecture, where the encoder processes the input text and extracts a condensed representation of its meaning, while the decoder generates the summary based on this representation. While these models have demonstrated success in high-resource languages, their effectiveness in low-resource languages like Hindi still requires further investigation. Extractive summarization can be likened to a highlighted summary, whereas abstractive summarization resembles a handwritten summary [6].
This study aims to contribute to the text summarization field by focusing on the Hindi language. The system is novel because it addresses the challenges of text summarization in Hindi, which has received far less attention than English. By examining the efficacy of abstractive summarization for Hindi, the study contributes methods specifically adapted to the distinctive linguistic properties of the language. A sequence-to-sequence (Seq2Seq) neural network with attention mechanisms generates concise and accurate summaries. Furthermore, the explicit comparison of the Adam and RMSprop optimizers provides valuable insights. The HiATS system, dedicated to Hindi abstractive text summarization, represents a novel contribution to the field. Figure 1 is a pictorial representation of the general architecture of text summarization.
In this research article, the main contribution is to investigate the effectiveness of a Seq2Seq neural network with an attention model for abstractive text summarization in Hindi. We evaluate the performance of the attention-based Seq2Seq model using two different optimizers: Adam and RMSprop. These are two commonly used optimizers in deep learning, each with unique properties: Adam is known for its fast convergence and ability to handle sparse gradients, while RMSprop is known for its robustness to noisy gradients. By comparing the performance of these two optimizers on the Hindi text summarization task, we aim to understand which optimizer is better suited to it. We also compare our system-generated summaries with those produced by other models such as BART [7] and T5 [8]. The research is conducted on a dataset of Hindi texts, and the performance of the models is evaluated using Rouge-1 and Rouge-2 metrics.

RELATED WORK
Single-document summarization involves the generation of a summary from a single document. Several evolutionary techniques have been applied to summarization, including the hybrid fuzzy Genetic Algorithms (GAs) and Genetic Programming (GP) approach [9], a fuzzy logic with particle swarm optimizer based method [10], a differential evolution cluster-based method [11], a fuzzy logic and evolutionary algorithms based method [12], a modified discrete differential algorithm [13], a GA-based method [14], semantic similarity-based methods [15, 16, 17], and an ontology similarity measure [18]. Topic-based summarization techniques, such as topic themes [19], automated topic signatures and lexical ranking [20], and the Graph-based Summarizer (GRAPHSUM) [21], have also been proposed for summarizing a single document. Mann and Thompson proposed Rhetorical Structure Theory (RST) for document summarization [22].
Hybrid machine learning-based models, including Support Vector Machines (SVMs), Naïve Bayes classification, mathematical regression, decision trees, and neural networks (multilayer perceptrons), have been applied for summary generation [23]. Clustering has also been used for summary generation, along with GA-based methods [24]. Recent advancements in deep learning have led to state-of-the-art abstractive summarization models, including Pointer-Generator Networks [25], Transformer-based models [26], and BERT-based models [27]. These models use an encoder-decoder architecture with attention mechanisms to generate summaries that are not limited to the input document's sentences; instead, they can generate novel sentences that convey the document's key information.
Multi-document summarization (MDS) involves generating summaries from multiple documents. Taner Uçkan et al. proposed extractive generic summarization of text documents using a maximum independent set [28]. A generative adversarial network has been proposed to reduce redundancy in the text [29]. Xiaojun et al. proposed a novel extractive approach based on manifold ranking [30]. Other approaches include thematic-based approaches using frequency and position information [31], the use of SVMs [32], sentence compression [33], and semantic-based sentence ordering [34]. In recent years, machine learning-based models have gained popularity in text summarization, including SVMs, Naïve Bayes classification, mathematical regression, decision trees, neural networks, and clustering techniques [35].
In Table 1, we survey text summarization by type, specifically for Indian languages. For this survey, we considered five widely spoken languages, i.e., Hindi, Tamil, Bengali, Telugu, and Marathi.
State-of-the-art techniques in abstractive text summarization include transformer-based models such as BERTSUM [55], MASS [56], and PEGASUS [57], among others. These models use attention mechanisms to generate summaries that capture the context and semantics of the input text. One promising direction is neural network-based models that generate summaries that are more fluent and natural than those of traditional methods. Zong et al. proposed a new approach called UniLM [58], which employs a transformer-based architecture that can generate summaries of varying lengths with improved coherence and fluency. In another study, Yonghua et al. introduced the Concept-Flow framework, which uses a graph-based neural network to capture important concepts in the source document and generates summaries based on these concepts [59]. Tianshui et al. proposed a new approach called SARL [60], which uses a self-attention mechanism to focus on important parts of the source document and reinforce the generation of high-quality summaries. Recently, there has been growing interest in unsupervised methods for text summarization; [61] proposed a novel unsupervised method that leverages a pre-trained language model to generate summaries without needing labeled data. In conclusion, text summarization is an active area of research, with many new and exciting developments in recent years, including neural network-based models, reinforcement learning (RL), unsupervised methods, multi-document summarization, and more interpretable and explainable methods. These developments are expected to significantly advance the state of the art in text summarization in the years to come.

METHODOLOGY
In this section, we describe our dataset, pre-processing techniques, word embedding technique, Seq2Seq neural network, and optimizers.

Problem Definition
We have a dataset of Hindi articles and their corresponding reference summaries. Each input document can be represented as X = {x_1, x_2, ..., x_n} and has a corresponding reference summary Y = {y_1, y_2, ..., y_r}, where n and r denote the lengths of the source document and the gold summary, respectively. We aim to implement a Seq2Seq neural network based on an attention model, called HiATS, that takes the source article X as input and produces a target summary Z = {z_1, z_2, ..., z_m}, where m is the length of the target summary.

Dataset
In this study, we used a dataset (described in Table 2) of three hundred Hindi news articles and their corresponding manually written reference summaries ("ref-summary"). The dataset is divided into three categories: Bollywood, politics, and religion. The text files and summaries are in Unicode Devanagari script, the standard writing system for Hindi. These files are then merged to form a CSV file, which is finally fed to the model.
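As a concrete illustration, the following is a minimal sketch of how such article/summary text files could be merged into a single CSV file. The directory layout and file naming are assumptions for illustration, not details from the original pipeline.

```python
import csv
from pathlib import Path

# Hypothetical layout: articles in data/articles/, reference summaries in
# data/summaries/, matched by file name (e.g., bollywood_001.txt).
articles_dir = Path("data/articles")
summaries_dir = Path("data/summaries")

with open("hindi_news_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["article", "ref_summary"])
    for article_path in sorted(articles_dir.glob("*.txt")):
        summary_path = summaries_dir / article_path.name
        article = article_path.read_text(encoding="utf-8").strip()
        summary = summary_path.read_text(encoding="utf-8").strip()
        writer.writerow([article, summary])
```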

Pre-Processing of Data
The collected dataset is pre-processed to remove any irrelevant information and to ensure that the data is in a format that the encoder-decoder models can use. This includes cleaning the text, tokenizing it, and applying any necessary data augmentation techniques. It is important to note that pre-processing is not only about cleaning and tokenizing the text but also about understanding the language and its structure and using tools specific to that language. Table 3 illustrates pre-processing and vocabulary formation with an example.
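The sketch below illustrates the kind of cleaning and tokenization steps described above and listed in Algorithm 1. The stop-word list is a small illustrative sample, not the one used in the study.

```python
import re

# Illustrative sample of Hindi stop words; the actual list used in the
# study is not specified here.
HINDI_STOP_WORDS = {"है", "के", "का", "की", "में", "और", "से", "को"}

def preprocess(text: str) -> list[str]:
    """Clean a Hindi string and return its content tokens."""
    text = re.sub(r"<[^>]+>", " ", text)             # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)        # remove URLs
    text = re.sub(r"[^\u0900-\u097F\s]", " ", text)  # keep Devanagari characters
    text = text.replace("।", " ")                    # drop the danda punctuation mark
    tokens = text.split()                            # whitespace tokenization
    return [t for t in tokens if t not in HINDI_STOP_WORDS]

print(preprocess("मैं खाना खाता हूँ।"))
```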

Word Embedding
Word embedding is a technique for representing words as numerical vectors, which can be used as input to neural network models. In the context of text summarization for Hindi, word embeddings represent the meaning of words in a low-dimensional vector space, which can then be fed to an encoder-decoder model. Non-contextual embeddings are generated by representing each word with a fixed-size vector trained on large amounts of text, without considering the context in which the word appears. These embeddings are useful for tasks that do not require understanding a word's meaning in the context of a particular sentence or document, such as named entity recognition or part-of-speech tagging. One popular method for generating non-contextual embeddings is Word2Vec [62], which learns embeddings by predicting the likelihood of a word given its surrounding words (skip-gram model) or predicting the surrounding words given a central word (continuous bag-of-words model). The Word2Vec algorithm trains a neural network to predict the probability of a word given its context, using a window of surrounding words; the embeddings are then extracted from the weights of the hidden layer of the network. Another method for generating non-contextual embeddings is GloVe [63], which learns embeddings by factorizing the co-occurrence matrix of words. The co-occurrence matrix counts how often each word appears with every other word in a large text corpus; GloVe factorizes this matrix to obtain embeddings that capture both global and local statistics of word co-occurrences.
Contextual embeddings, on the other hand, capture the meaning of a word in the context of a particular sentence or document. These embeddings are useful for tasks that require understanding a word's meaning in context, such as sentiment analysis, question answering, and machine translation. One popular method for generating contextual embeddings is ELMo [64], which learns embeddings by training a bidirectional LSTM on a large corpus of text. The embeddings are generated by concatenating the hidden states of the LSTM at each position in the input sentence. Because the LSTM is bidirectional, it can capture both the preceding and following context of each word.

Word2Vec Embedding
The mathematical foundation of Word2Vec embedding is a neural network that learns continuous word representations from large datasets. The model is trained on a corpus of text to predict the likelihood of a word occurring within the context of other words. The result is a set of high-dimensional vectors, each representing a word in the corpus. During the training of the Word2Vec model, the input layer is typically fed with one-hot vectors representing the target word, while the output layer predicts the probability distribution of the context words [65]. The goal of the model is to maximize the likelihood of observing the context words in the corpus, which is achieved by minimizing the cross-entropy loss between the predicted probability distribution and the actual distribution of context words. To do this, the model is trained using stochastic gradient descent (SGD): the gradients of the loss function with respect to the model parameters are computed for each training example, and the weights are updated accordingly. The final result of this training process is a set of word embeddings, where each embedding is a high-dimensional vector representing a word in the corpus. The need for word embeddings can be understood with an example. Consider two sentences: "आपका दिन शुभ हो" (aapaka din shubh ho) [have a nice day] and "आपका दिन शानदार हो" (aapaka din shaanadaar ho) [have a wonderful day]. The meanings of the two sentences are hardly different. The constructed vocabulary V will be {आपका, दिन, शुभ, शानदार, हो} (aapaka, din, shubh, shaanadaar, ho), so for one-hot encoding the vocabulary size is V = 5. A particular word is represented as, for example, आपका = [1,0,0,0,0], दिन = [0,1,0,0,0], and so on. This encoding places the words in a five-dimensional space where each word occupies its own dimension, a form from which one can only infer that "शुभ" and "शानदार" are as different from each other as any other pair of words, even though the two sentences mean nearly the same thing.
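As a minimal sketch of how such non-contextual embeddings could be trained, the snippet below uses the gensim library's Word2Vec implementation on a toy corpus of tokenized Hindi sentences. The corpus, vector size, and window are illustrative assumptions, not the settings used in this study.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of Hindi tokens (normally the output
# of the pre-processing step).
sentences = [
    ["आपका", "दिन", "शुभ", "हो"],
    ["आपका", "दिन", "शानदार", "हो"],
]

# sg=1 selects the skip-gram objective; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Dense vector for a word, and its nearest neighbours in embedding space.
vector = model.wv["शुभ"]
print(model.wv.most_similar("शुभ", topn=2))
```

Unlike the one-hot representation above, the learned vectors place शुभ and शानदार close together because they appear in similar contexts.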

Sequence-To-Sequence (Seq2Seq) Modeling
Seq2Seq models are popular text summarization approaches consisting of an encoder and a decoder built on a neural network architecture with an attention model. The encoder processes the input text and extracts a compact representation of its meaning, while the decoder generates the summary text based on this representation. One variation of the Seq2Seq model is the bidirectional LSTM architecture, where two LSTM layers are used in the encoder and decoder parts of the model: the first LSTM layer processes the input text in the forward direction, while the second processes it in the backward direction. The outputs of these two LSTMs are then concatenated and used as the input to the next layer of the model. This approach can improve performance by capturing the context of the input text from both directions [66]. Figure 2 depicts the working of the encoder-decoder architecture.
In this research article, the proposed methodology involves training a double-LSTM encoder-decoder model for text summarization on a dataset of Hindi texts. The model's performance is evaluated using Rouge-1 and Rouge-2 metrics and compared to other text summarization methods. The model can be fine-tuned and optimized by adjusting the parameters of the LSTM layers and the embedding layer, such as the number of hidden units, the dropout rate, and the learning rate.
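To make the architecture concrete, the following is a minimal Keras sketch of an attention-based encoder-decoder with a bidirectional LSTM encoder. The vocabulary sizes, sequence lengths, and layer dimensions are illustrative assumptions rather than the exact HiATS configuration.

```python
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Dense, Concatenate, Attention)
from tensorflow.keras.models import Model

SRC_VOCAB, TGT_VOCAB = 20000, 15000   # assumed vocabulary sizes
MAX_SRC, MAX_TGT = 300, 40            # assumed sequence lengths

# Encoder: bidirectional LSTM over the embedded Hindi article.
enc_in = Input(shape=(MAX_SRC,))
enc_emb = Embedding(SRC_VOCAB, 128)(enc_in)
enc_out, fh, fc, bh, bc = Bidirectional(
    LSTM(256, return_sequences=True, return_state=True))(enc_emb)
state_h = Concatenate()([fh, bh])     # merge forward/backward states
state_c = Concatenate()([fc, bc])

# Decoder: LSTM initialized with the encoder's final states.
dec_in = Input(shape=(MAX_TGT,))
dec_emb = Embedding(TGT_VOCAB, 128)(dec_in)
dec_out, _, _ = LSTM(512, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

# Dot-product attention between decoder states and encoder outputs.
context = Attention()([dec_out, enc_out])
merged = Concatenate()([dec_out, context])
probs = Dense(TGT_VOCAB, activation="softmax")(merged)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```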

Encoder.
The encoder reads the whole input one word at a time, gathering the information incorporated in the input sequence at every time step.
h_i = LSTM_1(x_i, h_{i-1}),    c_i = LSTM_2(h_i, c_{i-1})

Here, x_i is the Hindi word at time step i, h_i is the hidden state of the first LSTM layer in the encoder, and c_i is the context vector generated by the second LSTM layer in the encoder. The first LSTM layer processes the input Hindi word x_i and generates a hidden state h_i for each word. The hidden state h_i is then passed to the second LSTM layer, which produces a context vector c_i representing the meaning of the input Hindi text.

Decoder.
Based on the previous word, the decoder predicts the next possible word. Start and end tokens are added to the sentence so that the model knows where it starts and ends.
s_i = LSTM_1(c_i, s_{i-1}),    y_i = LSTM_2(s_i)

The decoder processes the context vector c_i and generates the summary text. The first LSTM layer in the decoder processes the context vector c_i and generates a hidden state s_i for each word in the summary. The hidden state s_i is then passed to the second LSTM layer, which produces the output Hindi word y_i.
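A minimal sketch of greedy decoding with start/end tokens, as described above. Here `decode_step` stands in for one pass through the trained decoder and is a hypothetical helper, not part of the original code.

```python
START, END = "<start>", "<end>"

def greedy_decode(decode_step, context, max_len=40):
    """Generate a summary one token at a time until <end> or max_len."""
    summary, prev_token, state = [], START, context
    for _ in range(max_len):
        # decode_step returns the most probable next token and the new
        # decoder state, given the previous token and current state.
        token, state = decode_step(prev_token, state)
        if token == END:
            break
        summary.append(token)
        prev_token = token
    return " ".join(summary)
```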

Optimizers
Optimization algorithms, also known as optimizers, are used to update the parameters of a model during training to minimize the loss function. Several optimizers can be used in training neural networks, each with its own strengths and weaknesses. In this section, we explain some of the most commonly used optimizers and their mathematical foundations for text summarization in Hindi using the encoder-decoder mechanism.
Recent literature on optimizers for text summarization focuses on deep learning-based approaches. Optimization techniques such as Adam [67], Adagrad [68], Adadelta [69], and RMSprop [70] have been applied to improve the performance of neural network-based summarization models [71, 72, 73]. Recent studies have also explored the effectiveness of newer optimization techniques, such as the AdaBelief optimizer, in improving the performance of summarization models. Additionally, recent research has focused on adapting RL techniques, such as policy gradient-based methods, for text summarization; RL-based methods have been shown to improve summarization models by optimizing the trade-off between the relevance and informativeness of the summary. Overall, recent research on optimization techniques for text summarization aims to improve the performance of deep learning-based models by optimizing their objective functions with various optimization techniques.

Adam (Adaptive Moment Estimation) is an optimization algorithm first proposed by Kingma and Ba in 2014. It extends the classic SGD algorithm and combines the benefits of both SGD and Root Mean Square Propagation (RMSprop) [67, 70]. Adam uses the gradients of the model's parameters to update those parameters so that the model's performance on the training set improves over time. It uses adaptive learning rates, meaning the learning rate for each parameter is adjusted dynamically based on historical gradient information; this helps overcome the oscillation or divergence that can occur with a fixed learning rate. Adam also uses momentum, which smooths the optimization process and helps prevent it from getting stuck in local minima. RMSprop is another optimization algorithm, first proposed by Hinton in 2012. It extends the classic gradient descent algorithm and is designed to improve the optimization process by using a moving average of the squared gradients to adjust the learning rate for each parameter.
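As a small illustration of how the two optimizers compared in this study could be configured, the Keras snippet below instantiates each with the learning rate reported in the Results section (0.001). The update rules in the comments follow the standard formulations of Adam and RMSprop; the remaining hyperparameters shown are library defaults, not values taken from the paper.

```python
from tensorflow.keras.optimizers import Adam, RMSprop

# Adam: adaptive learning rates from first- and second-moment estimates.
#   m_t = beta_1 * m_{t-1} + (1 - beta_1) * g_t
#   v_t = beta_2 * v_{t-1} + (1 - beta_2) * g_t^2
#   theta_t = theta_{t-1} - lr * m_hat_t / (sqrt(v_hat_t) + epsilon)
adam = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

# RMSprop: moving average of squared gradients scales the learning rate.
#   v_t = rho * v_{t-1} + (1 - rho) * g_t^2
#   theta_t = theta_{t-1} - lr * g_t / (sqrt(v_t) + epsilon)
rmsprop = RMSprop(learning_rate=0.001, rho=0.9)

# The same model can then be compiled with either optimizer, e.g.:
# model.compile(optimizer=adam, loss="sparse_categorical_crossentropy")
```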

RESULTS
This section is divided into three parts. First, we show the impact of pre-processing techniques on text summarization. Second, we compare the results of our system-generated summaries with those of other existing systems. Finally, we show the impact of different optimizers on the text summarization model. The performance of the system has been evaluated using Rouge-1 and Rouge-2; all results are analyzed and shown in Figure 3.
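For reference, the sketch below shows how the recall component of Rouge-1 and Rouge-2 can be computed for a whitespace-tokenized Hindi summary against its reference. Standard ROUGE tooling also reports precision and F-measure; the Hindi example strings here are illustrative, not drawn from the dataset.

```python
from collections import Counter

def rouge_n(reference: str, candidate: str, n: int) -> float:
    """N-gram recall: overlapping n-grams / n-grams in the reference."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())   # clipped n-gram matches
    return overlap / max(sum(ref.values()), 1)

ref = "एक चींटी जंगल से बाहर निकली"
hyp = "चींटी जंगल के बाहर निकली"
print("Rouge-1:", rouge_n(ref, hyp, 1))  # 4/6 unigrams matched
print("Rouge-2:", rouge_n(ref, hyp, 2))  # 2/5 bigrams matched
```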

Impact of Pre-Processing for Text Summarization
Tokenization, stop word removal, stemming, and lemmatization are standard pre-processing techniques used in natural language processing to improve the quality of text data before feeding it to a summarization model. These techniques can be applied to Hindi to improve the quality of the summaries generated by text summarization models. To understand the need for pre-processing, we produced summaries in two ways: one by applying the pre-processing techniques (Summary_With_Preprocessing) and the other by commenting out the pre-processing functions (Summary_WithOut_Preprocessing).
Based on the results shown in Table 4, we can see that the summary generated with pre-processing has higher accuracy than the summary generated without it. Therefore, pre-processing techniques such as tokenization, stop word removal, stemming, and lemmatization can improve the quality of summaries generated for Hindi text.

HiATS Performance Evaluation
We compared the results of our system-generated summaries with those of models implemented using BART and T5. From Tables 5, 6, and 7, we can conclude that our system matches, and often exceeds, the baseline summarizers.

Impact of Adam and RMSProp on Text Summarization
Adam and RMSprop are the optimization algorithms used to update the parameters of the network. The learning rate was set to 0.001, and the number of training epochs was 50. We set the compression ratio to fifty percent. The results of the model using Adam and RMSprop are shown in Figure 3. To illustrate the system's remaining limitations, consider the following source document and system-generated summary:
Document: "... thee. tabhee use ek jangalee tota ne dekh liya aur bola, hamesha cheentiyon ke saath mastee karate rahatee ho. kya tumhen pata hai ki tum jangal ke andar kyon ho? tumhen jangal ke baahar nikalana chaahie, vahaan sukh aur aanand hoga. cheentee ne use dhyaan se suna aur jangal ke baahar nikalane ka phaisala kar liya." ["Once upon a time, an ant was roaming in the forest. The ant was very happy with his life inside the forest. She was having fun sitting with some of her friends. Then a wild parrot saw her and said, 'You always have fun with ants. Do you know why you are inside the forest? You should come out of the forest; there will be happiness and joy.' The ant listened to him carefully and decided to get out of the forest."]
Summary: "ek cheentee ko ek jangalee tota ne bataaya ki vah jangal ke baahar nikalana chaahie, tab vah jangal se baahar nikalee." ["An ant was told by a wild parrot that it should come out of the forest, then it came out of the forest."]
Ambiguity: The summary does not fully capture the internal conflict and decision-making process that the ant went through before deciding to leave the jungle.
Unusual sentence structure: The summary uses a simple sentence structure, which may not fully convey the emotional weight of the ant's decision.
Named entity recognition failure: The summary does not mention the presence of the ant's friends or convey the joy they experienced together, which is a significant part of the original document.
These issues highlight the importance of carefully selecting the most vital information to include in a summary, and the need to consider the context and relationships between characters in the original text.

ALGORITHM 1: Pre-Processing of the Dataset
Input: An unstructured dataset
Output: Structured and cleaned dataset
1: Tokenization
2: Punctuation Mark Removal
3: Stop Word Removal
4: Removal of HTML Tags
5: Removal of URLs

Adam and RMSprop.
Adam and RMSprop are popular optimization algorithms widely used in deep learning and neural network models. Both are gradient-based optimization algorithms designed to update the model's parameters so that the model's performance on the training set improves over time.

Fig. 3. Pictorial representation of all the performance analyses: (a) performance of HiATS with the pre-processing technique; (b) performance of HiATS without the pre-processing technique; (c1), (c2) comparison of HiATS with BART and T5 for the Bollywood dataset; (d1), (d2) comparison of HiATS with BART and T5 for the Politics dataset; (e1), (e2) comparison of HiATS with BART and T5 for the Religion dataset; (f), (g) comparison of HiATS using Adam and RMSprop; and (h), (i) K-fold analysis of HiATS.

Table 2. Description of the Dataset Used in this Article

Table 3. Pre-Processing Steps and Vocabulary Construction for Abstractive Text Summarization