2.1. The Development and Current Situation of Machine Translation
Research on machine translation (MT) began in the late 1940s. Over more than fifty years of development, MT research has passed through several stages: an initial stage, a period of depression, a recovery period, and the present period of prosperity.
In 1946, Booth, a British engineer, and Weaver, a vice president of the Rockefeller Foundation in the United States, put forward the idea of automatic language translation while discussing the range of applications of the computer. In this period it was believed that machine translation research should concentrate on syntactic analysis. A distinguishing feature of the systems of this period was the separation of grammar from algorithm, which was itself a significant advance in machine translation technology. Machine translation in the recovery period was divided into six steps: lexical analysis, syntactic analysis, lexical transfer, syntactic transfer, syntactic generation and translation generation.
China was one of the first countries in the world to engage in machine translation research. In 1957, the Institute of Language and Computation of the Chinese Academy of Sciences took the lead in research on Russian-Chinese machine translation. Since then, many scholars have entered this field and have carried out research on French-Chinese, German-Chinese, Japanese-Chinese and other machine translation systems, with some notable achievements.
Direct translation is achieved through word-for-word translation, insertion, deletion and local word-order adjustment, without deep syntactic or semantic analysis. Syntactic transfer and semantic transfer are similar: their translation process can be divided into three stages, analysis, transfer and generation. In the analysis stage, the source-language sentence is analyzed to obtain its deep structure. For syntactic transfer the deep structure consists mainly of syntactic information; for semantic transfer it consists mainly of semantic information. In the transfer stage, the deep structure of the source language is converted into the deep structure of the target language. Finally, the target-language sentence is generated from the target-language deep structure. The interlingua method uses an intermediate language as the intermediary representation of translation and divides the whole translation process into two stages, analysis and generation. In the analysis stage, the source language is converted into the interlingua; in the generation stage, the interlingua is converted into the target language. The analysis process depends only on the source language, not the target language, and the generation process depends only on the target language, not the source language.
The methods used in early machine translation systems can be divided into direct translation, transfer and interlingua approaches. These methods differ in the depth to which the source language is analyzed. What they have in common is that they all require a large number of rules, such as language transfer rules, source-language analysis rules and target-language generation rules, as well as large-scale bilingual dictionaries. Among them, the transfer method analyzes the source language most deeply: it includes morphological, structural and semantic analysis, and carries out the transfer from source language to target language at the structural, semantic and morphological levels. Because the transfer method takes the characteristics of both the source and the target language into account, it obtains good translations more easily than the interlingua method. Therefore, early translation systems mostly adopted the transfer approach, dividing the whole translation process into three parts: source-language analysis, transfer and target-language generation.
2.2. Machine Translation Methods
In rule-based machine translation, knowledge is represented as rules. Its advantages are that rule granularity is highly scalable: coarse-grained rules have strong generalization ability, while fine-grained rules have very fine descriptive power. Rule-based machine translation can express linguists' knowledge directly. The method is highly adaptable, does not depend on a specific training corpus, and can integrate different linguistic features. A rule-based translation method uses a set of rules to understand natural language. This rule set covers three aspects: source-language analysis rules that describe the source language, transfer rules from the source language to the target language, and generation rules that produce the target language.
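As a concrete, if deliberately tiny, illustration of these three rule sets, the following Python sketch represents analysis, transfer and generation rules as plain data and applies them in sequence. The rule formats, the romanized Japanese example (with particles omitted) and the entries themselves are assumptions made purely for illustration, not the rules of any actual system.

```python
# Toy illustration of the three rule sets in a rule-based system:
# source-analysis rules, source-to-target transfer rules, and
# target-generation rules. Rule formats and entries are invented.

ANALYSIS_RULES = {
    # coarse-grained rule: a POS pattern is mapped to grammatical roles
    ("PRON", "NOUN", "VERB"): ("subj", "obj", "pred"),   # simplified Japanese SOV
}
TRANSFER_RULES = {
    # fine-grained rules: word-for-word lexical transfer
    "watashi": "I", "hon": "a book", "yomu": "read",
}
GENERATION_RULES = {
    # target-side surface order for each role pattern (English SVO)
    ("subj", "obj", "pred"): ("subj", "pred", "obj"),
}

def rule_based_translate(words, pos_tags):
    roles = ANALYSIS_RULES[tuple(pos_tags)]                            # analysis
    lexicon = {r: TRANSFER_RULES[w] for r, w in zip(roles, words)}     # transfer
    order = GENERATION_RULES[roles]                                    # generation
    return " ".join(lexicon[r] for r in order)

print(rule_based_translate(["watashi", "hon", "yomu"], ["PRON", "NOUN", "VERB"]))
# -> "I read a book"
```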
The core of rule-based translation is therefore to construct a complete, or at least adaptable, rule system, but such a complete rule system is difficult to build. The coverage of the rules is poor; in particular, fine-grained rules are difficult to enumerate comprehensively. The rules contain strong subjective factors and sometimes diverge from the objective facts. As the number of rules grows, there is no good way to resolve conflicts between rules. Finally, rules are highly language-dependent and apply only to specific systems, so the cost of developing and maintaining a rule base is very high.
Rule-based translation methods have achieved good results in certain restricted domains, but in most experiments they have not met users' requirements. With the rapid development of corpus linguistics and the widespread application of statistics and information theory in natural language processing, researchers began to try statistical methods for machine translation.
The statistics-based translation method derives from the idea of translation as "decoding a cipher" and is based on the source-channel model, also called the noisy channel model. The core idea of the noisy channel model is to regard machine translation as a process of information transmission through a channel. The decoding process under the noisy channel model is shown in Fig. 1.
After passing through the noisy channel, the target text T becomes the source-language text S, and decoding is the process of recovering the target text T from the source text S. The noisy-channel approach uses a language model and a translation model to represent the translation probability: P(T) is the probability that the target text T occurs, and P(S|T) is the probability that T is translated into S; the former is the language model and the latter is the translation model. The language model reflects the characteristics of the target language: it gives the probability that the word sequence T appears in the target language, that is, whether it conforms to the grammar of the target language. The translation model reflects the correspondence between the two languages: it gives the probability of observing the source text S given that the target text is T.
According to Bayes' formula, the above decoding process can be expressed as follows:

$$T^{*} = \arg\max_{T} P(T \mid S) = \arg\max_{T} \frac{P(T)\,P(S \mid T)}{P(S)} = \arg\max_{T} P(T)\,P(S \mid T),$$

where P(S) can be dropped because it is constant for a given source sentence S.
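The sketch below illustrates this decoding criterion directly: among a handful of candidate target sentences, the one maximizing log P(T) + log P(S|T) is chosen. The candidate list and the toy probability tables are assumptions for illustration; a real system estimates these models from corpora and searches a vastly larger space.

```python
import math

# Minimal sketch of noisy-channel decoding: among candidate target sentences T,
# pick the one maximizing P(T) * P(S|T). The candidates and probability tables
# below are toy values, not estimates from any real corpus.

def decode(source, candidates, lm_prob, tm_prob):
    """Return the argmax over candidates of log P(T) + log P(S|T)."""
    def score(t):
        return math.log(lm_prob[t]) + math.log(tm_prob[(source, t)])
    return max(candidates, key=score)

lm_prob = {"he reads a book": 1e-4, "he a book reads": 1e-7}          # P(T)
tm_prob = {("kare wa hon o yomu", "he reads a book"): 0.3,            # P(S|T)
           ("kare wa hon o yomu", "he a book reads"): 0.4}

print(decode("kare wa hon o yomu",
             ["he reads a book", "he a book reads"], lm_prob, tm_prob))
# -> "he reads a book": the language model outweighs the slightly higher
#    translation probability of the ungrammatical candidate.
```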
The advantage of statistical machine translation is that it is language-independent: as long as a corpus is available, it adapts easily to new domains or language pairs. Such systems have short development cycles and good robustness; no rules need to be written by hand, and the translation system is trained directly on the corpus. Although the translation quality of statistical methods is comparatively good, they also have disadvantages. First, they are costly in time and space. Second, data sparseness is a serious problem. Finally, it is difficult to integrate diverse linguistic features and to introduce complex linguistic knowledge. How to construct large-scale aligned bilingual corpora and how to estimate the model parameters are therefore the key problems to be solved, and designing a search algorithm with good performance helps to find the best translation.
In a case-based (example-based) machine translation system, the bilingual translation example base is the system's main source of knowledge. The example base has two fields: one stores source-language sentences and the other stores their corresponding translations. When a source-language sentence is entered, the system compares it with the source-language field of the example base, finds the most similar stored sentence, adapts the translation associated with that sentence, and finally outputs the result.
In a case-based machine translation system, translation knowledge is represented as examples and bilingual dictionaries, which are easy to add to or delete from, so the system is easy to maintain. If a large example base is used and matching is precise, high-quality translations can be produced, and the deep linguistic analysis required by traditional rule-based methods is avoided. However, the case-based approach also has a disadvantage, namely low coverage: a practical system needs a very large corpus. Two problems therefore need to be studied. The first is automatic bilingual alignment: in the example base, the target-language examples and example fragments corresponding to source-language examples and fragments must be located accurately. A practical case-based system requires alignment not only at the sentence level but also at the word level, and even at the phrase or sentence-structure level. The second is example matching and retrieval. Because the example base is large, an efficient retrieval mechanism is needed to quickly find the examples or example fragments that match the sentence to be translated. Moreover, the matching of examples and example fragments is usually fuzzy rather than exact, so a set of similarity criteria must be established to decide whether two sentences or phrase fragments are similar.
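The following sketch illustrates the retrieval step with a simple fuzzy match. The example base, the similarity measure (difflib's sequence ratio over whitespace tokens) and the threshold are illustrative assumptions rather than a prescribed standard.

```python
import difflib

# Toy sketch of case-based retrieval: find the source-side example most
# similar to the input and return its stored translation as a reference.
EXAMPLE_BASE = [
    ("watashi wa hon o yomu", "I read a book"),
    ("kare wa tegami o kaku", "he writes a letter"),
]

def retrieve(sentence, threshold=0.6):
    def sim(a, b):
        # token-level similarity; any other similarity criterion could be used
        return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()
    src, tgt = max(EXAMPLE_BASE, key=lambda pair: sim(sentence, pair[0]))
    score = sim(sentence, src)
    return (tgt, score) if score >= threshold else (None, score)

print(retrieve("watashi wa tegami o yomu"))
# -> ('I read a book', 0.8): a fuzzy match whose translation serves as a
#    reference to be adapted rather than output verbatim.
```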
Out of case-based translation, two more specialized methods have developed: translation memory and template-based translation. Translation memory works as follows: translated sentences are saved, and when a new sentence is to be translated it is looked up directly in the corpus. If an identical sentence is found, its translation is output directly; otherwise the sentence is given to a human translator, with the translations of similar sentences provided as references. This method is commonly used in computer-aided translation software. Template-based translation lies between rules and examples: its knowledge representation is more concrete than rules and more abstract than examples. Templates are lexicalized rules. A monolingual template is a string of constants and variables, and a translation template consists of two corresponding monolingual templates together with the mapping between their variables.
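A translation template of this kind can be pictured as in the sketch below, where the source template is a pattern of constants and variables, the target template is its counterpart, and a small slot dictionary fills the variables. The template pair and the dictionary are invented for illustration.

```python
import re

# Toy translation template: two corresponding monolingual templates
# (constants plus variables X1, X2) and a mapping between their variables.
SOURCE_TEMPLATE = re.compile(r"(?P<X1>\w+) wa (?P<X2>\w+) o yomu")
TARGET_TEMPLATE = "{X1} reads {X2}"
SLOT_DICTIONARY = {"kare": "he", "hon": "a book", "shinbun": "a newspaper"}

def apply_template(sentence):
    m = SOURCE_TEMPLATE.match(sentence)
    if m is None:
        return None                       # the template does not cover this input
    slots = {k: SLOT_DICTIONARY.get(v, v) for k, v in m.groupdict().items()}
    return TARGET_TEMPLATE.format(**slots)

print(apply_template("kare wa shinbun o yomu"))   # -> "he reads a newspaper"
```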
Phrase-based translation is a central topic in statistical machine translation. It outperforms word-based statistical machine translation because it captures local context dependencies better.
Phrase-based translation developed from word-based translation. The model is built on a word-aligned corpus: any continuous word sequence is treated as a phrase, bilingual phrase pairs are learned from the word-aligned bilingual corpus, and phrase translation probabilities are estimated. Translation is then performed with the resulting phrase translation table. For a given source-language sentence, the translation process is as follows: first, the source sentence is segmented into phrases; second, each phrase is translated according to the translation model; finally, the translated phrases are reordered.
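The three steps can be sketched as follows; the phrase table, the segmentation and the single swap used for reordering are illustrative assumptions, since a real decoder scores many segmentations and reorderings jointly.

```python
# Toy sketch of the phrase-based pipeline: segment, translate each phrase
# with the phrase table, then reorder. All entries are invented.
PHRASE_TABLE = {
    ("kare", "wa"): [("he", 0.9)],
    ("hon", "o"): [("a book", 0.8)],
    ("yomu",): [("reads", 0.7)],
}

def translate_phrase_based(tokens, segmentation):
    # 1. split the source sentence into phrases according to a given segmentation
    phrases = [tuple(tokens[i:j]) for i, j in segmentation]
    # 2. pick the highest-probability translation option for each phrase
    translated = [max(PHRASE_TABLE[p], key=lambda opt: opt[1])[0] for p in phrases]
    # 3. reorder: move the verb phrase from the end to second position
    #    (Japanese SOV -> English SVO); a real system scores many reorderings
    translated.insert(1, translated.pop())
    return " ".join(translated)

tokens = ["kare", "wa", "hon", "o", "yomu"]
print(translate_phrase_based(tokens, [(0, 2), (2, 4), (4, 5)]))
# -> "he reads a book"
```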
2.3. Decoding Algorithm
In statistical machine translation, different translation models call for different decoding algorithms. At present the most commonly used search algorithm is beam search, and the decoding process mainly consists of four steps:
(1) Given a source-language string, all possible translation options are obtained from the phrase translation table: the target-language translations of the source phrases are looked up, and translation candidates with their translation probabilities are generated.
(2) For the translation candidates generated in the first step, the translation probability is computed, and the minimal cost of the still-untranslated part is estimated.
(3) Beam search is used to expand the translation candidates, with pruning strategies applied to keep the search tractable.
(4) Finally, the search result is traced back and the target-language translation T is generated.
Pruning strategies include merging pruning (hypothesis recombination) and histogram pruning. Merging pruning stores the costs of two search paths when they are merged, so no information is lost. However, because the search space grows exponentially with sentence length, the global optimum cannot be found within an acceptable time by merging alone, and heuristic pruning strategies are therefore introduced. Histogram pruning is a risky, lossy pruning strategy: hypotheses judged unpromising are deleted, which greatly reduces the search space and thus the time complexity of the decoding algorithm.
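A simplified sketch of stack-based beam search with histogram pruning is given below. The phrase table is a toy assumption; hypothesis recombination, reordering and future-cost estimation are omitted, so translation is monotone and the output keeps the source word order.

```python
# Simplified stack decoding: hypotheses covering the same number of source
# words share a stack, and each stack is cut back to the best `beam_size`
# hypotheses (histogram pruning). Entries in the phrase table are invented.
PHRASE_TABLE = {  # source phrase -> [(target phrase, log probability)]
    ("kare", "wa"): [("he", -0.1)],
    ("hon", "o"): [("a book", -0.2), ("the book", -0.9)],
    ("yomu",): [("reads", -0.3)],
}

def beam_search(source, beam_size=2):
    n = len(source)
    stacks = [[] for _ in range(n + 1)]
    stacks[0].append((0.0, 0, ""))            # (score, words covered, partial output)
    for covered in range(n):
        stacks[covered].sort(reverse=True)    # histogram pruning: keep top hypotheses
        for score, _, output in stacks[covered][:beam_size]:
            # expand by translating the next untranslated source phrase (monotone)
            for length in range(1, n - covered + 1):
                phrase = tuple(source[covered:covered + length])
                for target, logp in PHRASE_TABLE.get(phrase, []):
                    stacks[covered + length].append(
                        (score + logp, covered + length,
                         (output + " " + target).strip()))
    # trace back the best complete hypothesis
    return max(stacks[n])[2] if stacks[n] else None

print(beam_search(["kare", "wa", "hon", "o", "yomu"]))
# -> "he a book reads" (source word order, since reordering is omitted here)
```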
2.4. Feature Extraction in Japanese Machine Translation
In the decoding stage of the hierarchical phrase model, a CKY-style algorithm is used to reduce the source-language sentence to the start symbol, and the best derivation found in this way yields the target-language sentence. The hierarchical phrase model combines a variety of features, and the feature weights are trained by minimum error rate training.
The hierarchical phrase model is currently one of the best statistical translation models. Its main advantages are the following. First, it integrates lexicalized features and contextual information to guide translation. Second, hierarchical phrases are a formal rather than linguistic syntactic structure, so no linguistic knowledge is required, hierarchical rule extraction is simple, and the model can easily be applied between any two languages. Finally, the model has a certain generalization ability, which to some extent alleviates the problem of long-distance reordering.
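The way multiple features are combined can be sketched as a log-linear (weighted-sum) score over a candidate derivation, as below. The feature names and weights are illustrative assumptions; in practice the weights are tuned, for example by minimum error rate training against a translation quality metric on a development set.

```python
import math

# Minimal sketch of log-linear feature combination: the score of a candidate
# derivation is a weighted sum of feature values. Names and weights are toy
# assumptions standing in for weights tuned by minimum error rate training.
FEATURE_WEIGHTS = {
    "lm": 0.5,              # target language model log probability
    "tm_fwd": 0.2,          # forward rule/phrase translation log probability
    "tm_bwd": 0.2,          # backward translation log probability
    "word_penalty": -0.1,   # number of target words (controls output length)
}

def derivation_score(features):
    """Weighted sum of feature values for one candidate derivation."""
    return sum(FEATURE_WEIGHTS[name] * value for name, value in features.items())

candidate = {"lm": math.log(1e-5), "tm_fwd": math.log(0.4),
             "tm_bwd": math.log(0.3), "word_penalty": 6}
print(round(derivation_score(candidate), 3))   # higher scores are preferred
```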
Syntactic analysis yields a parse tree that reflects the dependencies between words. How to use the information in these syntactic trees effectively has long been an active issue in machine translation. There are two kinds of linguistically syntax-based translation models: those based on phrase-structure (constituency) trees and those based on dependency trees. According to how the syntactic tree information is used, there are also three model types: tree-to-string, string-to-tree and tree-to-tree.
A phrase-structure (constituency) tree follows generative grammar and groups consecutive word sequences into phrase blocks that form meaningful units. A dependency tree describes the dependency relations between the predicate verb, which serves as the core, and the other words. The constituency tree contains more syntactic information, while the dependency tree has a simpler and clearer structure; compared with the constituency tree, the number of rules extracted from dependency trees is smaller. Figures 2 and 3 show examples of a phrase-structure tree and a dependency tree, respectively.
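For illustration, the two representations can be written down as simple data structures, as in the sketch below; the sentence, labels and relations are illustrative (an English gloss of a short Japanese sentence) and are not taken from Figures 2 and 3.

```python
# A constituency tree as nested (label, children) tuples, and a dependency
# tree as head-modifier arcs centred on the predicate verb. Labels invented.
constituency_tree = (
    "S",
    ("NP", ("PRON", "he")),
    ("VP", ("V", "reads"), ("NP", ("DET", "a"), ("N", "book"))),
)

dependency_tree = [          # (dependent, relation, head); "reads" is the root
    ("he", "subject", "reads"),
    ("book", "object", "reads"),
    ("a", "determiner", "book"),
]

def leaves(node):
    """Collect the surface words at the leaves of a constituency tree."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]                  # preterminal node: (POS, word)
    return [w for child in children for w in leaves(child)]

print(leaves(constituency_tree))              # -> ['he', 'reads', 'a', 'book']
print([dep for dep, rel, head in dependency_tree if head == "reads"])
# -> ['he', 'book']: the words that depend directly on the predicate verb
```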
The biggest advantage of syntax-tree-based methods is the introduction of syntactic structural knowledge. Compared with formal syntactic models, which can only perform local phrase reordering, models based on syntactic trees can reorder phrases globally, which matches people's intuitive understanding of language. At the same time, the main disadvantage of this kind of model is its dependence on syntactic analysis: the accuracy of parsing directly affects the accuracy of the translation model. In addition, because syntactic tree structures vary widely, the set of translation rules in a syntactic model is very large and decoding is time-consuming, so such models cannot be used in scenarios with strict real-time requirements.