"An Automatic Text Summarization for Malayalam Using Sentence Extraction" by Renjith S R et al. is a research paper that describes a new way to extract and summarize text in Malayalam [1]. Five different steps in the method are explained and each one helps to make short, useful outlines. Steps in the process as mentioned are Preprocessing, Sentence Scoring, and Finding Similarity between Sentences, Ranking Sentences, and Making a Summary.
In the paper, "Complete Pre Processing Phase of Punjabi Text Extractive Summarization System," Vishal Gupta summed up text in Punjabi in 2010 and 2012, as shown in two separate studies. The extraction method was used to summarize both news stories and other types of texts [2]. There are two key steps to summarize news stories: pre-processing and processing. On the other hand, the TF-IDF method was used to summarize the text.
"Information Retrieval by Text Summarization for an Indian Regional Language" by Jagadish S Kallimani, Srinivasa K G, and Eswara Reddy B is a study paper that describes a text summarization method that is specially made for the Kannada language. This method was created based on the features of a program called AutoSum that summarizes. The method starts by reading a written piece that has been encoded in UTF-8 format. A vocabulary or tagging and analyzing the text are both ways to get the keywords out of the text. After that, lines are judged on a number of factors, including where they appear, how often they appear in the first line, their numerical value, and their relationship to the keywords that are pulled. The final report is made by choosing words based on how highly they are ranked, which effectively condenses all the information[3].
In their paper "Automatic Keyword Extraction from Dravidian Language", M. Hanuman Thappa, M. Narayana Swamy, and N. M. Jyothi describe a way to get keywords out of Dravidian languages like Tamil. Tokenization, an important step in natural language processing, is the first step in the process. Breaking the text into separate tokens, or words, is what tokenization does. Once the text has been tokenized, stop word removal is used to get rid of common words that don't have much meaning. This leaves behind a list of vocabulary words[4].
The authors of "Marathi e-Newspaper Text Summarization Using Automatic Keyword Extraction Technique" (ShubhamBhosale, Diksha Joshi, VrushaliBhise, and Rushali A. Deshmukh) created a program that can automatically pull keywords from Marathi e-newspapers for their study [5]. The Word Extraction Module and the Summarization Module are the two main parts of the method. A story from an e-newspaper is fed into the system's Word Extraction Module. After the piece is tokenized, any stop words are taken out. The steps in this process produce the keywords, which are then ranked by how important they are. In the Summarization Module, lines that have the buzzwords that were found are picked out and given a score. The top lines, which make up 10–40% of the original text, are the only ones that are shown.
"Test Model for Representation Rich for Semantic Hindi Text Graph using Abstractive Method" by ManjulaSubramaniam and VipulDalal (2015) used an abstractive method with advanced semantic graph techniques for the text summarization [6].
In the paper "Text Summarization for Telugu Document", M. Humera Khanam and S. Srivanani show how a frequency-based approach can be used to build a summarization technique that works well for Telugu texts. The raw text is first tokenized, that is, broken into individual words or tokens. Stop words such as adjectives and conjunctions are then removed, and the frequency of each remaining word is computed. Sentences containing frequently occurring words are extracted into the final summary. The scheme shortens the document by giving greater weight to key words that appear often in the text [7].
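A frequency-based extraction of this kind can be sketched as below; the choice to keep sentences containing the two most frequent content words is an assumption made for the example, and the stop-word list is again a small stand-in.

```python
# Sketch of frequency-based sentence extraction.
import re
from collections import Counter

STOP_WORDS = {"and", "but", "the", "a", "an", "very", "in", "of"}  # placeholder list

def frequency_summary(text, top_words=2):
    """Keep the sentences that contain the most frequent non-stop words."""
    sentences = [s.strip() for s in re.split(r'[.!?]', text) if s.strip()]
    words = [w for w in re.findall(r'\w+', text.lower()) if w not in STOP_WORDS]
    frequent = {w for w, _ in Counter(words).most_common(top_words)}
    return [s for s in sentences if frequent & set(re.findall(r'\w+', s.lower()))]
```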
In the paper "Bhasa: A Corpus-Based Information Retrieval and Summarizer for Bengali Text" Md. Tawhidul Islam and Shaikh Mostafa Al Masum introduced Bhasa, an innovative tool designed for Bengali text analysis. Bhasa is a corpus-based search engine and summarizer, leveraging advanced techniques for document indexing and information retrieval. It includes a vector space retrieval method to efficiently process user queries and deliver desire results [8].
In the paper "Bengali text summarization by sentence extraction" the author Kamal Sarkar used the harvesting approach and TF-IDF algorithms [9].
In the paper "Topic Based Bengali Opinion Summarization" Amitava Das and Sivaji Bandyopadhyayused the harvesting method, which included the k-means algorithm and the page rank standard method [10].
"An Extractive Approach of Text Summarization of Assamese Using WordNet" by Chandan Kalita, Navanath Saharia, and Utpal Sharma describe a text summary method that is especially useful for Assamese works [11]. The method has three key parts: preprocessing, guessing how many groups there are, and making a summary. Assamese WordNet is used to figure out how semantically close two words are during the preprocessing step. A resemblance function is also used to find out how similar two sentences are in terms of structure and meaning. Moving on to the next step, they figure out how many groups there are by looking at how many themes are in the text. The writers use the K-means method to group the words together so that a summary can be made. Then, they pick the most important lines from each group. After that, they figure out how much the unselected words match up with the title of the paper. Sentences that sound like the title have been taken out. At end, the words are put into groups based on how often they appear in the input and are shown in the summary.
"Query–Based Extractive Text Summarization for Sanskrit" by Siddhi Barve, Shaba Desai, and RaziaSardinha is a study that describes a query-based method for summarizing texts that is especially made for Sanskrit[12]. There are three main parts to the system: preparation, sentence extraction, and grading. Both very short and very long words are taken out of the text by the preprocessing tool. First, the text is tokenized, and then stop words are taken out. The words are broken down into their parts, and each one is put into one of two groups: compounds and sandhis. In sentence extraction, there are three ways to get the words out. Some of the methods used are Average TF-ISF, vector space model, and graph-based approach with PageRank. Using a query and these three methods together will give you sentences that answer the question. A question could be any word that is not broken up. At end, the sentences with the highest scores are picked to make the summary. The vector space model and PageRank did better than the average TF-ISF after being looked at.
In "Saaraansh: Gujarati Text Summarization System" [13], Jikitsha Sheth and Bankim Patel built a text summarizer, Saaraansh, designed specifically for Gujarati text. The major contributions of the work are the Dhiya stemmer, a GujStringSimilarity module, a stem weightage module, a LexRank module, an anaphora resolver, and a sentence generator. The text is split into sentences, and each sentence is numbered. The sentences are broken into words, and frequently occurring words that carry little meaning are identified and removed. The GujStringSimilarity module measures how similar two word strings are. Words are normalized into a standard form and then analyzed with the Gujarati stemmer, Dhiya. The stem weightage module computes the TF-IDF score for each sentence. The LexRank module computes the cosine similarity between pairs of sentences, from which a graph is built in which an edge between two nodes indicates how alike they are. The LexRank algorithm ranks the sentences in this graph and extracts those that belong in the summary. Anaphora resolution is performed by comparing the condensed text with the original, which in turn improves the selection of sentences for the summary.
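The LexRank step can be illustrated with the sketch below: TF-IDF sentence vectors, a cosine-similarity graph thresholded on edge weight, and power iteration to rank the sentences. The threshold, damping factor, and iteration count are assumptions for the example, not values from the paper.

```python
# Minimal LexRank-style ranking over a thresholded cosine-similarity graph.
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Represent each sentence as a sparse TF-IDF dictionary."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [{w: (c / max(len(d), 1)) * math.log(1 + n / df[w]) for w, c in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Rank sentences by power iteration over the similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(vecs)
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    ranks = [1.0 / n] * n
    for _ in range(iters):
        ranks = [(1 - damping) / n +
                 damping * sum(ranks[j] * adj[j][i] / max(sum(adj[j]), 1) for j in range(n))
                 for i in range(n)]
    return ranks
```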