Grammatical error detection and correction have received considerable research attention over the last few years. This chapter places the present work in context by providing an overview of the most recent research in the field.
2.1 Early Approaches to Grammatical Error Correction
Early attempts at automatic error correction relied on hand-coded rules. The first widely used grammar checkers, such as the Writer's Workbench [1], were based on simple pattern matching and string replacement. Other rule-based systems used manually created grammar rules and syntactic analysis: Aspen Software's Grammatik [4] relied on relatively shallow linguistic analysis, whereas IBM's Epistle [2] and Critique [3] performed full syntactic analysis. Rule-based methods are often simple to implement and can be quite effective for certain error types, which is why current grammar checking systems still make extensive use of them. For more complex errors, however, rules eventually become unwieldy and unmanageable. Because of the great productivity of language, it is impossible to write rules for every possible error, so rule-based methods are generally avoided as a comprehensive solution.
When large annotated resources first became available in the 1990s, researchers turned to data-driven methods and used machine learning techniques to build classifiers for specific error types [5]–[11]. Most of this work has targeted two error types, articles and prepositions, because they are among the most frequent and difficult errors for ESL learners and are easier to address with machine-learned classifiers than with manually written rules [12]. For such closed-class errors, a finite confusion set (or candidate set), such as the list of English articles or prepositions, is defined and contains all possible correction candidates. During training, examples drawn from native or learner data are represented as vectors of linguistic features deemed relevant to the specific error type, such as neighboring words, part-of-speech tags, grammatical relations (GRs), and dependency trees. Classifiers are trained on these features using various machine learning methods. Once a system has been trained, it compares the classifier's most likely choice with the word originally used in the text, which allows it to identify and correct new errors. It is important to build separate classifiers for each error type, since the most useful features often depend on the word class. For example, [6] trained a maximum entropy classifier on a large, diverse corpus to identify article errors and reported an accuracy of 88%, while [9] used maximum entropy models to correct errors involving 34 common English prepositions in learner text.
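To make the classifier-based setup concrete, the sketch below trains a maximum-entropy-style (logistic regression) classifier over a confusion set of English articles using local-context features. The feature names, toy training data, and helper functions are invented for illustration; they are not taken from any of the cited systems.

```python
# A minimal sketch of a confusion-set classifier for article errors,
# in the spirit of the maximum entropy approaches described above.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

CONFUSION_SET = ["a", "an", "the", ""]  # "" = no article

def extract_features(tokens, pos_tags, i):
    """Local-context features for choosing the article of the noun at tokens[i]."""
    return {
        "head_noun": tokens[i],
        "head_pos": pos_tags[i],
        "prev_word": tokens[i - 1] if i > 0 else "<s>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
    }

# Toy training data: (tokens, pos_tags, noun_index, gold_article)
train = [
    (["dog", "barked"], ["NN", "VBD"], 0, "the"),
    (["apple", "fell"], ["NN", "VBD"], 0, "an"),
    (["water", "flows"], ["NN", "VBZ"], 0, ""),
]

vec = DictVectorizer()
X = vec.fit_transform(extract_features(t, p, i) for t, p, i, _ in train)
y = [gold for _, _, _, gold in train]
clf = LogisticRegression(max_iter=1000).fit(X, y)

def suggest_article(tokens, pos_tags, i, original):
    """Flag an error if the classifier prefers a different member of the confusion set."""
    pred = clf.predict(vec.transform([extract_features(tokens, pos_tags, i)]))[0]
    return pred if pred != original else original

# Example: flag "a" before "apple" as an error if the classifier prefers "an".
print(suggest_article(["apple", "fell"], ["NN", "VBD"], 0, original="a"))
```

At prediction time, the classifier's preferred article is compared with the article the writer actually used, exactly as described above.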
Errors made by ESL learners are often influenced by their first language (L1) [13], and systems perform significantly better when the L1 is taken into account. For preposition error correction, [10] compared four linear machine learning classifiers. The results showed that discriminative classifiers perform best and that performance can be further improved by L1 adaptation. Rather than training a separate classifier for each native language, the authors proposed incorporating language-specific priors into Naive Bayes (NB) models at decision time. By injecting prior knowledge specific to each L1 into an NB model, which assumes independence between features, the system can account for language-specific error patterns and improve the accuracy of error detection and correction.
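As a rough illustration of this idea, the snippet below combines a shared (L1-independent) likelihood with a language-specific prior over a preposition confusion set; the prior values, the likelihood placeholder, and the feature names are purely hypothetical.

```python
import math

# A simplified sketch of L1-adapted decision making with Naive Bayes priors.
# All numbers are invented; real priors would be estimated from learner corpora.
confusion_set = ["on", "in", "at"]

# Per-L1 prior probability of each candidate preposition (hypothetical values).
l1_priors = {
    "spanish": {"on": 0.2, "in": 0.5, "at": 0.3},
    "japanese": {"on": 0.4, "in": 0.3, "at": 0.3},
}

def likelihood(candidate, features):
    """Toy stand-in for P(features | candidate) from a shared, L1-independent model.
    A real system would compute Naive Bayes feature likelihoods here."""
    return 0.8 if features.get("gold_hint") == candidate else 0.1

def l1_adapted_choice(features, l1):
    scores = {}
    for cand in confusion_set:
        # log P(cand | features, L1) is proportional to
        # log P(features | cand) + log P(cand | L1)
        scores[cand] = math.log(likelihood(cand, features)) + math.log(l1_priors[l1][cand])
    return max(scores, key=scores.get)

print(l1_adapted_choice({"gold_hint": "in"}, l1="spanish"))  # -> "in"
```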
Techniques based on classification by error type have the drawback that they focus on local context and treat errors in isolation, assuming that only one error is present and that all surrounding information is correct. In reality, the errors made by language learners often interact and combine in complex ways. This limitation restricts the practical usefulness of a correction system that targets only one specific error type; effective feedback requires addressing the interplay between different error types rather than considering them in isolation.
A frequent technique is to build several classifiers and then cascade them into a pipeline system. Systems that correct multiple error types are often built from a combination of classifier-based and rule-based components [11], [14]. The order of the classifiers matters, and this type of solution is labor-intensive, requiring numerous pre- and post-processing stages. Moreover, it does not address the issue of interacting errors, and the predictions made by different classifiers may disagree. The following is a typical example from [15]:
Example
electric cars is still regarded as a great trial innovation ...
Predictions made by a system that combines independently trained classifiers: cars is → car are. The noun-number classifier and the subject-verb agreement classifier each correct the other word, and their conflicting outputs produce a sentence that is still incorrect.
The issue of interacting errors has been addressed with a variety of strategies. Rather than making decisions independently, [16] built a beam-search decoder that iteratively generates candidate sentences at the sentence level and scores them using individual classifiers and a general LM. Five proposers generated new candidates by making five types of changes: spelling (orthography), article choice, preposition choice, punctuation insertion, and noun number. The decoder outperformed a pipeline system consisting of separate classifiers and rule-based steps, which indicated promising results. However, it only addresses these five error types; to cover a wider range of grammatical problems, additional proposers must be incorporated into the system, and designing them can be challenging because they must handle error types that are difficult to correct. Moreover, the number of candidates grows exponentially with the number of error types (i.e., the number of proposers) and the length of the sentence, so it is impossible to enumerate all candidates and building a reliable decoder becomes difficult. [17] proposed a joint inference model to resolve contradictions between different classifiers: Integer Linear Programming (ILP) was used to combine the output of individual classifiers with a set of linguistic constraints. These constraints were defined manually and hard-coded into the system, so accounting for new types of interacting errors requires additional manually defined constraints. [15] addressed two common interacting structures in language, subject-verb and article-NP head, by developing two joint classifiers. Instead of using two classifiers independently for each structure, a joint classifier simultaneously predicts the two words that form part of the same structure. Unlike the ILP model proposed by [17], the joint classifier does not require manually defined constraints because it can learn directly from the training data. On the other hand, it is harder to compile enough candidate pairs representing the required structures to use as training data. One joint classifier can only target one form of interaction, so new classifiers must be built for every new interaction type. These classifier-based systems still rely on individual classifier scores, making it time-consuming to train a classifier for every potential type of (interacting) error.
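The following toy sketch illustrates the propose-and-rescore loop behind such a decoder; the two proposers, the bigram "language model", and the example sentence are simplified placeholders rather than the actual components of [16].

```python
# A simplified sketch of the sentence-level beam-search idea described above:
# proposers generate candidate edits and a scoring function keeps the best hypotheses.
import itertools

def article_proposer(tokens):
    """Propose swapping one article at a time (toy version of an article proposer)."""
    swaps = {"a": ["the"], "the": ["a"]}
    for i, tok in enumerate(tokens):
        for alt in swaps.get(tok, []):
            yield tokens[:i] + [alt] + tokens[i + 1:]

def noun_number_proposer(tokens):
    """Toggle a trailing 's' on one token at a time (a crude noun-number proposer)."""
    for i, tok in enumerate(tokens):
        alt = tok[:-1] if tok.endswith("s") else tok + "s"
        yield tokens[:i] + [alt] + tokens[i + 1:]

GOOD_BIGRAMS = {("electric", "cars"), ("a", "great"), ("great", "innovation")}  # toy LM stand-in

def lm_score(tokens):
    """Placeholder for an n-gram LM score: count bigrams seen in 'correct' text."""
    return sum(1 for bg in zip(tokens, tokens[1:]) if bg in GOOD_BIGRAMS)

def beam_search(tokens, proposers, beam_size=5, iterations=2):
    beam = [tokens]
    for _ in range(iterations):
        candidates = list(beam)
        for hyp, proposer in itertools.product(beam, proposers):
            candidates.extend(proposer(hyp))
        beam = sorted(candidates, key=lm_score, reverse=True)[:beam_size]
    return beam[0]

sentence = "electric cars is still regarded as a great innovation".split()
print(" ".join(beam_search(sentence, [article_proposer, noun_number_proposer])))
```

A real decoder would plug in the five proposers and the classifier and LM scores described above instead of these toy components.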
Using n-gram LMs is a more general method for correcting many error types in ESL text [18], [19]. A single model is trained on a large collection of correct sentences and assigns probabilities to word sequences based on counts from the training data. In this approach, the target word sequence is replaced with alternatives drawn from a precompiled candidate set, and LM scores are computed for both the original text and the replacements; whichever has the highest probability is chosen as the corrected sequence. In theory, correct word combinations should receive high probabilities, while incorrect or unseen combinations should receive low probabilities, so parts of a sentence with low scores are presumed to contain errors. In practice, however, no training corpus, no matter how large, can cover every possible correct word sequence, and it is also difficult to distinguish genuinely incorrect word combinations from merely low-frequency ones. The LM technique is therefore widely used in conjunction with other approaches, for example to rank correction suggestions proposed by other models. [18] combined machine learning classifiers with an LM in addition to LM-based classifiers, and [16] used an LM together with classifiers to score correction candidates in a beam-search decoder.
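A minimal sketch of this candidate-substitution-plus-LM-scoring scheme is shown below, using a tiny add-one-smoothed bigram model; the corpus and confusion set are toy examples.

```python
# Train a bigram model on correct text, then score each confusion-set
# substitution and keep the most probable one.
from collections import Counter
import math

corpus = [
    "he is interested in science".split(),
    "she depends on her friends".split(),
    "they arrived at the station".split(),
]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    padded = ["<s>"] + sent + ["</s>"]
    unigrams.update(padded)
    bigrams.update(zip(padded, padded[1:]))

V = len(unigrams)

def bigram_logprob(sentence):
    padded = ["<s>"] + sentence + ["</s>"]
    # add-one smoothing so unseen bigrams get a small, non-zero probability
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
               for a, b in zip(padded, padded[1:]))

def correct_preposition(sentence, position, confusion_set=("in", "on", "at")):
    """Replace the word at `position` with each candidate and keep the best LM score."""
    return max(confusion_set,
               key=lambda c: bigram_logprob(sentence[:position] + [c] + sentence[position + 1:]))

print(correct_preposition("he is interested on science".split(), 3))  # -> in
```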
In addition, several attempts have been made to address learner errors that are particularly difficult to detect and correct. A linguistically motivated method for verb error correction was put forward by [20]: their approach combined a rule-based system with a machine learning component to first identify verb candidates in noisy learner text, and then used verb finiteness information to detect errors and identify the specific error type. [21] developed a computational approach for detecting problematic word and phrase choices in ESL essays; by comparing an ESL sentence with the output of commercial MT systems, their technique assigns high scores to words and phrases within the sentence that are likely to need revision. Using compositional distributional semantic models, [22] performed error detection for content word combinations, specifically adjective-noun and verb-object combinations, in learner data.
2.2 Machine Learning-Based Approaches and Error Correction
Two basic types of grammatical error correction methods for English compositions can be distinguished: rule-based methods and statistics-based methods. The former relies on hand-written grammar rules, while the latter uses a statistical model, such as an n-gram model, to correct the grammatical errors committed by the writer. Several early grammar correction tools used rule-based approaches [23]. Because Japanese lacks articles and singular/plural marking, which English expressions require, these phenomena must always be considered when translating from Japanese to English. The literature [24] proposes rules to determine, based on context, whether a singular or plural noun should be used in the translated sentence, and reports an accuracy of 89 percent for this method. A rule-based approach offers several advantages. Adding, modifying, or deleting grammar rules is remarkably simple. Users can receive more specific and targeted feedback when grammatical explanations are attached to each rule. Because the rules provide explicit information, problems can be debugged directly. Rule bases can also be written by linguists with little or no programming knowledge. It is difficult, however, to incorporate corpus statistics into hand-written rules because of exceptions, as noted in the literature [25]. Statistics can also be used to correct grammatical errors. Statistics-based grammar checking treats grammar checking primarily as a classification task, focusing mainly on article and preposition error checking [26]–[28]. The literature [29] typically uses lexical and part-of-speech features for classification, including language model scores, neighboring words, and part-of-speech tags. The literature [30] also adds parse features, resulting in higher accuracy and recall for preposition error correction. The classification algorithms used include the maximum entropy algorithm, the voted perceptron algorithm [30], and the naive Bayes algorithm [31]. Beyond prepositions and articles, some attention has also been paid to verb form error checking [32], [33]. The main advantage of a classification algorithm is its ability to correct specific types of errors. More recently, some work has aimed to correct various types of errors comprehensively: the literature [34] detects various errors using a high-order sequence labeling model, and the literature [35] uses an approach based on rules and syntactic n-grams, where syntactic n-grams, unlike traditional n-grams, encode syntactic information. Historically, most grammatical error correction has focused on articles and prepositions, since these are common errors made by non-native speakers. Grammar points such as clauses are also difficult to master, and they play an important role in written English expression. A study of a Chinese learner corpus found that related word (connective) errors are the most common type of clause-level grammatical error and that they are difficult to correct.
Nevertheless, little research has been conducted on automatic error correction for English clauses. The literature [36] uses an algorithm based on statistical machine translation, in which a language model is then used to correct all of the errors in CoNLL-2014, including false errors. Automatic grammar correction also arises in machine translation. Because Japanese lacks articles, for instance, an automatic editing system is required to enhance the output of a Japanese-to-English machine translation system: the output sentence must be corrected so that the right article is chosen. Using this type of system for grammatical correction, the literature [26] improves machine translation by solving the article selection problem.
A viable error correction system should be able to correct the different kinds of errors that ESL learners make. In more recent studies, MT approaches have been applied successfully to correct a wider range of error types.
MT algorithms automatically translate text from a source language into a target language. Error correction can therefore be viewed as a special translation problem from grammatically incorrect sentences into correct ones. Unlike traditional MT tasks, the source and target sentences are in the same language, although the source sentences may be grammatically incorrect. MT-based GEC systems learn correction mappings from parallel examples and use them to generate a corrected version of the original (incorrect) sentence that fixes as many errors as possible.
2.3 Machine Translation-Based Methods
As a sequence-to-sequence (seq2seq) method, the machine translation-based approach converts faulty sentences into correct ones. Several strategies have been proposed in recent years to improve the effectiveness of machine translation-based grammatical error correction models. Reference [37] applies a seq2seq model to a pre-trained masked language model, such as BERT. A copy-augmented architecture for grammatical error correction is proposed in reference [38]. Reference [39] uses translation models to create additional synthetic data for pre-training. Reference [40] explicitly introduces noise when back-translating standard sentences. Reference [41] builds a seq2seq model with multi-layer convolutions and an attention mechanism for Chinese text. A BiLSTM-based machine translation model is proposed in reference [42] to capture long-distance dependencies. Despite these improvements, machine translation-based models still generate their output from scratch, which inevitably leads to over-correction and generation errors.
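For illustration, an MT-style GEC system can be driven by any encoder-decoder model fine-tuned on (incorrect, correct) sentence pairs. In the sketch below, the checkpoint name is a hypothetical placeholder, not a model released by the cited works.

```python
# A minimal sketch of MT-style GEC with a seq2seq model: correction is treated
# as translation from "incorrect English" to "correct English".
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "your-org/seq2seq-gec-checkpoint"  # hypothetical fine-tuned GEC checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def correct(sentence: str, num_beams: int = 4, max_new_tokens: int = 64) -> str:
    """Generate a corrected version of the input sentence with beam search."""
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=num_beams, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(correct("She go to school yesterday ."))
```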
2.4 Sequence Tagging-Based Methods
Another line of work adopts a different perspective by framing grammatical error correction as a sequence tagging task. These models predict a tag from a predefined set for each token of the original sentence, where each tag encodes an edit applied to that token. In reference [43], the edits keep a token, drop it, or add a token from a fixed vocabulary. Reference [44] predicts token-level edits sequentially in a non-autoregressive manner for a fixed number of iterations. To produce more compact edits, reference [45] generates span-level tags. Reference [46] further develops the strategy by designing a more precise set of edit transformations and performs reasonably well on English grammatical error correction tasks. In addition, most of the models listed above concentrate on identifying grammatical problems rather than fixing them. Using BERT for text prediction, reference [47] adds a mask at the text positions identified as missing-word errors for Chinese grammatical error correction; this method, however, still requires a separate correction model for grammatical error correction.
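The sketch below shows the core idea of the tagging formulation: each source token receives an edit tag, and applying the tags deterministically yields the corrected sentence. The tag inventory is a simplified, invented subset of the kind used by systems such as GECToR.

```python
# A toy sketch of the sequence tagging view of GEC: per-token edit tags are
# applied to the source sentence to produce the correction.
def apply_edit_tags(tokens, tags):
    output = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            output.append(token)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$REPLACE_"):
            output.append(tag[len("$REPLACE_"):])
        elif tag.startswith("$APPEND_"):
            output.extend([token, tag[len("$APPEND_"):]])
    return output

tokens = ["She", "go", "to", "school", "yesterday"]
tags = ["$KEEP", "$REPLACE_went", "$KEEP", "$KEEP", "$KEEP"]
print(" ".join(apply_edit_tags(tokens, tags)))  # She went to school yesterday
```

In a full system, a neural tagger predicts one such tag per token, and tagging is repeated for a few iterations so that edits which depend on earlier edits can be applied.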
We employed the English grammatical error correction model GECToR [46], extended with a dynamic word embedding update and a residual connection network, to further improve its English grammatical error correction capacity. By training the model on an augmented dataset, we also increase the algorithm's capacity to handle complex tags.
2.5 Machine Learning and Deep Learning Approaches
Thanks to statistical techniques, these algorithms can model human language without being explicitly programmed, and they are now commonly used in GEC tasks. Machine learning and deep learning systems begin by examining the training set in order to extract information and build their classifiers and rules. Deep learning algorithms produce probabilistic outputs, and their primary benefit is that they learn from data and do not require manually coded rules or grammar. Furthermore, viewing error correction as a translation process is an attractive and comparatively simple approach: the basic premise is that a statistical machine translation (SMT) system can convert text written in "poor" (incorrect) English into "good" (correct) English. Studies such as references [48], [49] have used this technique. There have also been numerous attempts to create hybrid systems that combine rule-based and deep learning algorithms, including [50], [51], which used a grammar-based parser for text-to-SQL translation and deep learning to supplement rule-based grammar by fixing the syntax and removing typos. Earlier end-to-end GEC methods relied on recurrent neural networks (RNNs), used manually designed components, and were often limited to short sequences. In addition, models trained on limited sets of error-correction sentence pairs cannot correct sentences precisely, and a single round of correction is typically insufficient for sentences that contain many errors. Table 1 provides a summary of the literature review.
Table 1
A Glimpse of Previous Work
Reference | Model | Dataset | Reported Performance
[52] | Sequence to Sequence model | CoNLL-2014 Shared Task, JFLEG | F0.5: 61.15 (CoNLL-2014 Shared Task), GLEU: 61.0 (JFLEG)
[53] | Multilayer Convolutional Encoder-Decoder Neural Network | CoNLL-2014 Shared Task, JFLEG | F0.5: 54.79 (CoNLL-2014 Shared Task), GLEU: 57.47 (JFLEG)
[54] | Neural Network Translation Models | CoNLL 2014 test set | F0.5: 41.75*
[55] | Neural Network models | CoNLL-2014 Shared Task | 81.6 (CoNLL-2014 Shared Task) |
[56] | Contextualized word embeddings | CoNLL-2014 Shared Task | F0.5: 52.2 (CoNLL-2014 Shared Task), GLEU: 52.3 (CoNLL-2014 Shared Task) |
[57] | Sequence to Sequence model | JFLEG | F0.5: 46.17 (JFLEG), GLEU: 42.83 (JFLEG) |
[58] | BERT-based models | CoNLL-2014 Shared Task | F0.5: 56.4 (CoNLL-2014 Shared Task) |
2.6 Shared Tasks on Grammatical Error Correction
Over the past few years, four GEC shared tasks have given participating teams a venue for comparing results on common training and test data. After receiving a fully annotated training set, participants build their GEC systems over a few months using any publicly available data and resources. The systems are then evaluated on new, blind test data: within a few days of the test data being released, systems must identify grammatical errors in text written by non-native speakers and deliver corrected versions. The organizers then assess the output of each system and publish the final rankings.
2.6.1 HOO 2011 & 2012
The Helping Our Own (HOO) shared tasks of 2011 and 2012 were the first in the NLP community to promote the use of NLP tools and methodologies for building automated systems that can help non-native authors with their writing [59], [60]. Participants in the HOO-2011 shared task received a collection of texts written by non-native authors and drawn from the ACL Anthology, and all textual errors had to be found and corrected automatically. Errors were divided into 13 error types based on the CLC coding scheme. Six teams took part in the task, and some of them performed well by concentrating on only a small number of error types.
Because of the difficulty of HOO-2011, the HOO-2012 shared task considered only article and preposition errors. The FCE dataset was the designated training set. Fourteen teams participated, and the majority of them built machine learning classifiers. In both HOO shared tasks, evaluation was based on the precision (P), recall (R), and F-score computed between a system's edit set and a manually created gold-standard edit set.
2.6.2 CoNLL 2013 & 2014
The next two shared tasks were organized in conjunction with the Conference on Computational Natural Language Learning (CoNLL). The CoNLL-2013 shared task [61] added three new error types to the HOO-2012 scope: noun number (Nn), verb form (Vform), and subject-verb agreement (SVA). This revised error list is more comprehensive and, in addition to article (ArtOrDet) and preposition (Prep) errors, includes interacting errors. NUCLE v2.3 was used as in-domain training data. The test data consists of 50 new essays written in response to two prompts; one of the prompts also appeared in the training set, while the other was new.
The M2 scorer was used to evaluate systems, and the organizers suggested limiting edits by setting the maximum number of unchanged words per edit to three. Rankings were determined using F1, giving P and R equal weight. As there was initially only one set of gold annotations, participating teams were later given the opportunity to submit alternative answers (gold-standard edits), since there are often multiple acceptable corrections for many errors; the same procedure had been used in the HOO 2011 and 2012 shared tasks. There were therefore two evaluation rounds, the second of which allowed alternative answers. Ng et al. [61] pointed out that these new scores tended to favor the teams that provided alternative answers and therefore advised against using alternative answers in future evaluations in order to reduce bias. In total, 17 teams took part in CoNLL-2013. A frequent strategy was to build a classifier for each error type; other approaches included heuristic rules, MT, and LMs.
The CoNLL-2014 shared task [62] again attempted to push the limits of GEC by returning to an all-errors correction task. Three major changes were made compared with CoNLL-2013: the test essays were independently annotated by two human annotators, participating systems were required to correct all types of grammatical errors, and the evaluation metric was changed from F1 to F0.5, which weights P more heavily than R. NUCLE v3.0, a newer version of the NUCLE corpus, was used as official training data. A new collection of 50 essays written by non-native English speakers served as blind test data, and the CoNLL-2013 test set was available for unrestricted training and/or development. The M2 scorer was once again the official scorer.
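For reference, F0.5 is the standard weighted F-measure with beta = 0.5, which weights precision twice as heavily as recall; the snippet below computes it from illustrative precision and recall values.

```python
# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 0.5 emphasizes precision.
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(round(f_beta(0.6, 0.3), 4))  # 0.5 -- closer to precision than to recall
```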
In total, 13 teams submitted systems to CoNLL-2014. Most of them built hybrid systems that integrated several techniques: LM and MT approaches were used for correction that is not tied to specific error types, while rule-based methods and machine learning classifiers were preferred for correcting individual error types.
2.7 Deep Learning-Based Approach
Context-sensitive spelling correction generally proceeds in two steps: first, candidate correction words are generated, and second, each candidate is related to the surrounding context to determine the final correction. A typical candidate generation method computes the edit distance between the target word and its dictionary equivalents [63]. Building on this, the candidate generation method of [64] also takes into account keyboard distance corresponding to the edit distance, reflecting the keyboard input environment, and restricts candidate generation accordingly. More recently, an approach that uses contextual information [65] has been developed to eliminate the need for word-by-word comparisons; that study generated and searched for error words with a 3-gram candidate generation method based on contextual information, using a corpus of ten quadrillion English words to generate a variety of high-quality candidate words. The next research objective is to develop or select an appropriate language model for correction. Both statistical and deep learning methods are used for context-sensitive spelling error correction. Statistical corrections have usually been combined with the noisy channel model [63] or n-gram-based language models [66]. Research on Korean statistical methods includes smoothing, interpolation, and improved n-gram search structures based on the noisy channel model [64], [65], [67]. More recently, deep learning has been used to correct words with recurrent neural networks and convolutional neural networks [68], [69], as well as word embeddings [70]. Context-sensitive spelling errors have received relatively little attention in recent years, yet correcting them can be of substantial benefit to researchers and writers. This work addresses context-sensitive spelling error correction for a variety of words in a variety of documents. Because it is difficult to obtain correct answers for all spelling errors, we chose an unsupervised deep learning model, and our method uses a variety of deep learning language models to correct context-sensitive spelling errors.
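A compact sketch of this two-step pipeline is given below: candidates within a small edit distance of the target word are generated from a dictionary and then ranked by a context score. The dictionary, the cue-word "context model", and the example are all illustrative; a real system would use an n-gram or neural language model for the second step.

```python
# Step 1: candidate generation by edit distance; step 2: context-based ranking.
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

DICTIONARY = {"piece", "peace", "place", "police"}  # toy dictionary

def generate_candidates(word, max_dist=2):
    return [w for w in DICTIONARY if edit_distance(word, w) <= max_dist]

def context_score(candidate, left, right):
    """Toy context model: count co-occurrences with hand-picked cue words.
    A real system would score the candidate in context with a language model."""
    cues = {"peace": {"world", "treaty"}, "piece": {"cake", "paper"}}
    window = set(left + right)
    return len(cues.get(candidate, set()) & window)

def correct(word, left, right):
    candidates = generate_candidates(word)
    return max(candidates, key=lambda c: context_score(c, left, right), default=word)

print(correct("peice", left=["a"], right=["of", "cake"]))  # -> piece
```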