4.1 Cultural translation
This study modifies the traditional model by constructing a new model of translation teaching. In the new model, teachers no longer lead the class but play a supporting role: they create a discovery-based learning environment in which students can make full use of their initiative and self-learning ability. Students use the dynamic corpus and retrieval software to observe, analyse and reflect on typical, authentic and varied translation examples, and then discuss language features and translation methods and techniques with the teacher. To evaluate translation quality, teachers can guide students to use the relevant statistical retrieval procedures so that translation quality and style are judged more comprehensively and objectively.
According to the latest CSSCI catalogue issued in January 2022, 15 foreign-language journals (including the expanded edition) were selected as source journals, and the CSSCI database was searched under the topics "corpus" and "corpus translation". After manual screening, conference notices and documents unrelated to corpus-based translation research were removed, leaving 378 relevant papers. Based on the titles and content of these documents, and taking time as the horizontal axis, the research trend of corpus-based translation studies is shown in Fig. 1.
Classical Chinese textbooks are the main collection objects of the dynamic corpus, which also includes the new HSK test text corpus released by the National Hanban, ordinary natural-language corpora, and classical Chinese textbooks for primary and secondary schools. The current scale is close to 200,000 sentences; see Table 1 for the composition. The Chinese textbook corpus, an important part of the dynamic corpus, takes into account attributes such as textbook type, application level, year of publication and influence.
Table 1
Composition of the dynamic corpus

| Serial No. | Corpus content | Number of sentences | Proportion |
| --- | --- | --- | --- |
| 1 | Classical Chinese textbook corpus | 140,000 | 65.7% |
| 2 | HSK past-paper text corpus | 21,000 | 9.9% |
| 3 | Natural-literature language materials | 40,000 | 18.8% |
| 4 | Contemporary Chinese textbook corpus | 12,000 | 5.6% |
Two translations of the same article are compared by searching the dynamic corpus. The analysis shows that translation 1 and translation 2 differ greatly in their use of keywords, as shown in Table 2.
Table 2
Comparison of keyness values of translation 1 and translation 2

| Serial No. | Function word | Translation 1 | Translation 2 | Keyness value | Significance |
| --- | --- | --- | --- | --- | --- |
| 1 | however | 79 | 15 | 47.10 | 0.000 |
| 2 | but | 7 | 58 | -46.37 | 0.000 |
| 3 | from then on | 63 | 16 | 29.38 | 0.000 |
| 4 | from | 19 | 57 | -20.38 | 0.000 |
| 5 | I | 140 | 34 | 68.27 | 0.000 |
| 6 | we | 67 | 26 | 18.25 | 0.000 |
| 7 | this | 60 | 107 | -14.03 | 0.000 |
| 8 | of | 1345 | 1112 | 20.45 | 0.004 |
Table 2 shows that the two translations differ significantly in their use of pronouns (I, we, this), which indicates that the dynamic search corpus is effective in cultural translation analysis.
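Keyness values of the kind reported in Table 2 are conventionally computed with a signed log-likelihood statistic. The sketch below is a minimal illustration, assuming two subcorpora of roughly equal size; the 10,000-token totals are made-up figures, since the paper does not report token counts.

```python
import math

def keyness(freq1, freq2, total1, total2):
    """Signed log-likelihood keyness of a word across two subcorpora.

    freq1/freq2: observed frequencies in translation 1 / translation 2.
    total1/total2: total token counts of each translation (assumed here).
    Positive values mean the word is overused in translation 1,
    negative values that it is underused, matching the signs in Table 2.
    """
    expected1 = total1 * (freq1 + freq2) / (total1 + total2)
    expected2 = total2 * (freq1 + freq2) / (total1 + total2)
    ll = 0.0
    if freq1 > 0:
        ll += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        ll += freq2 * math.log(freq2 / expected2)
    ll *= 2
    if freq1 / total1 < freq2 / total2:  # underused in translation 1
        ll = -ll
    return ll

# "however": 79 hits in translation 1 vs. 15 in translation 2.
ll_however = keyness(79, 15, 10000, 10000)  # roughly matches the 47.10 in Table 2
```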
4.2 Emotional Analysis
Sentiment analysis of text is an important trend in natural language processing. Texts are classified according to the emotions they carry, and the task can be divided into coarse-grained and fine-grained analysis. Coarse-grained analysis, usually simply called sentiment analysis, classifies texts by sentiment polarity, either into two classes (positive and negative) or into three (positive, negative and neutral). Fine-grained sentiment analysis typically covers the construction of sentiment dictionaries, multi-class sentiment classification, text preprocessing, multi-label sentiment analysis, and the construction of sentiment resource libraries.
In a previous experiment, a comparative analysis of the "anger" and "hate" emotions in the classified emotion corpus found that the two are quite similar, and manual classification of them is often controversial. For this reason, this paper uses "anger" as the unified label for both kinds of emotional data. On the basis of the original emotion classification, the criteria for manual classification are summarised in Table 3.
Table 3
Definition of emotion classification

| Emotion class | Classification criteria |
| --- | --- |
| Anger | Obvious mood swings, many modal particles, and even curses or very harsh words. |
| Fear | Mood fluctuations, reflected in a gloomy, frightening context and subjective expressions of fear. |
| Disappointment | No obvious emotional fluctuation; the context is negative. |
| No emotion | Objective opinions, advertisements, explanations, reports, etc. |
| Joy | Subjective expressions of emotional fluctuation, happiness and satisfaction. |
| Praise | Positive context, expressing appreciation and praise for an object. |
Compared with traditional text classification, sentiment classification is more difficult, and sentiment analysis based on natural language processing has attracted much attention in affective-computing research. Drawing on previous sentiment-analysis practice, this paper uses the Naver Sentiment Movie Corpus 1.0 as the experimental data set; the results are shown in Fig. 2.
As the chart shows, most of the translated texts are no more than 60 words long, so the LSTM input length is fixed at 60 words: shorter sentences are padded with zero vectors, and the parts beyond 60 words are truncated.
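The padding-and-truncation step described above can be sketched as follows; the word-vector dimension of 100 is an assumption for illustration, not a value reported in the paper.

```python
def pad_or_truncate(token_vectors, max_len=60, dim=100):
    """Fix every sentence to max_len time steps for the LSTM input.

    Sentences shorter than max_len are padded with all-zero vectors;
    anything beyond max_len is cut off. dim is the (assumed) vector size.
    """
    zero = [0.0] * dim
    fixed = token_vectors[:max_len]          # truncate long sentences
    fixed += [zero] * (max_len - len(fixed))  # zero-pad short ones
    return fixed

sentence = [[1.0] * 100 for _ in range(73)]  # a 73-word sentence
fixed = pad_or_truncate(sentence)            # now exactly 60 steps long
```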
Next, the algorithms are introduced. AdaBoost is one of the Boosting algorithms. Without changing the training data themselves, it redistributes the weights of the training samples so that each learner focuses on a different part of the data set. The final classifier is a linear combination of these weak learners:
$$H\left(x\right)=\text{sign}\left({\sum }_{t=1}^{M}{\alpha }_{t}{h}_{t}\left(x\right)\right)$$
13
The linear combination of the weak classifiers is constructed as:
$$f\left(x\right)={\sum }_{t=1}^{M}{\alpha }_{t}{h}_{t}\left(x\right)$$
14
The final AdaBoost classifier can be expressed as:
$$H\left(x\right)=\text{sign}\left(f\left(x\right)\right)$$
15
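Equations (13)-(15) can be sketched directly in code. The decision stumps and weights below are toy values for illustration, not learned ones.

```python
def adaboost_predict(x, learners, alphas):
    """H(x) = sign(sum_t alpha_t * h_t(x)), as in Eqs. (13)-(15).

    learners: weak classifiers h_t, each returning +1 or -1.
    alphas: their weights alpha_t from the boosting rounds.
    """
    f = sum(a * h(x) for h, a in zip(learners, alphas))  # Eq. (14)
    return 1 if f >= 0 else -1                           # Eq. (15)

# Three toy decision stumps on a scalar feature (illustrative only).
stumps = [
    lambda x: 1 if x > 0.3 else -1,
    lambda x: 1 if x > 0.5 else -1,
    lambda x: -1 if x > 0.9 else 1,
]
alphas = [0.8, 0.5, 0.2]
label = adaboost_predict(0.6, stumps, alphas)
```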
Bootstrap sampling (the "self-service" sampling method) is the core of the Bagging algorithm. It samples the original data set with replacement: repeating the draw N times yields a data set containing N samples. The expected proportion of original samples that never appear follows from:
$$\underset{N\to \infty }{\text{lim}}{\left(1-\frac{1}{N}\right)}^{N}={e}^{-1}\approx 0.368$$
16
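The limit above says that, for large N, about 36.8% of the original samples never appear in a given bootstrap sample. A quick simulation illustrates this:

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) samples with replacement (bootstrap sampling)."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)        # fixed seed for reproducibility
n = 10000
data = list(range(n))
sample = bootstrap_sample(data, rng)

# Fraction of the original samples that were never drawn:
left_out = 1 - len(set(sample)) / n   # should be close to 1/e ≈ 0.368
```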
The dependencies between the local text vectors and the learned word vectors are computed as follows:
$${Q}_{m}=elu({Q}_{3}\otimes {M}_{1})$$
17
$${K}_{m}=elu({K}_{3}\otimes {M}_{2})$$
18
$${M}_{1}^{{\prime }}=elu({V}_{1}\otimes {M}_{1})$$
19
$${M}_{2}^{{\prime }}=elu({V}_{2}\otimes {M}_{2})$$
20
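A minimal numeric sketch of Eqs. (17)-(20), reading ⊗ as an ordinary matrix product (an assumption, since the paper does not define the operator) and using toy 2x2 matrices:

```python
import math

def elu(x, alpha=1.0):
    """Exponential linear unit for a single value."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def matmul_elu(a, b):
    """elu(A ⊗ B) element-wise, with ⊗ read as a matrix product."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[elu(sum(a[i][k] * b[k][j] for k in range(inner)))
             for j in range(cols)] for i in range(rows)]

# Q_m = elu(Q_3 ⊗ M_1), Eq. (17), with toy values.
Q3 = [[1.0, -2.0], [0.5, 0.0]]
M1 = [[1.0, 0.0], [0.0, 1.0]]   # identity, so the product is Q3 itself
Qm = matmul_elu(Q3, M1)
```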
The sentence (string) similarity fuzzy matching method based on the N-gram model measures the difference between two sentences to obtain their similarity. N-gram similarity is calculated by splitting each sentence into overlapping segments of length n and comparing the segments.
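A simple character-level version of this fuzzy matching can be sketched as follows. Normalising by the larger n-gram set is one common convention, not necessarily the one used in the paper:

```python
def char_ngrams(sentence, n=2):
    """All overlapping character n-grams of a sentence."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]

def ngram_similarity(s1, s2, n=2):
    """Fuzzy match score in [0, 1]: shared n-grams over the larger set."""
    g1, g2 = set(char_ngrams(s1, n)), set(char_ngrams(s2, n))
    if not g1 or not g2:
        return 0.0
    return len(g1 & g2) / max(len(g1), len(g2))

score = ngram_similarity("translation one", "translation two")
```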
With the algorithms in place, text sentiment analysis based on natural language processing is carried out. In the experiment, the number of selected feature words affects both the accuracy and the efficiency of the PRE-TF-IDF algorithm, so the proportion of feature words that balances accuracy and running efficiency is determined experimentally. The number of training and test samples is set to 8,800. Under otherwise identical conditions, the proportion of feature words is adjusted, the resulting changes in running efficiency and accuracy are observed, and the best proportion is selected.
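The feature-word selection step can be illustrated with a plain TF-IDF scorer. The paper's PRE-TF-IDF variant is not specified, so this is only a sketch of the general idea of keeping a tunable proportion of the highest-scoring words:

```python
import math
from collections import Counter

def tfidf_top_features(docs, proportion=0.2):
    """Score words by TF-IDF over a tokenised corpus, keep the top share.

    proportion is the feature-word ratio that the experiment sweeps.
    """
    n_docs = len(docs)
    tf = Counter()   # corpus-wide term frequency
    df = Counter()   # document frequency
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    k = max(1, int(len(scores) * proportion))
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

docs = [["good", "movie", "good"], ["bad", "movie"], ["good", "plot"]]
features = tfidf_top_features(docs, 0.5)  # keep the top 50% of words
```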
In analysing the results of emotional-orientation tagging, texts whose orientation could not be judged are ignored; only texts whose emotional orientation was successfully judged are analysed and counted. Table 4 gives the statistics of the annotation results; the proportion of each item is calculated over the successfully annotated texts.
Table 4
Distribution of emotional tendency in the large-scale corpus tagged by natural language processing

| Group | Measure | Anger | Disgust | Fear | Sadness | Expectation | Joy | Surprise | Trust | Neutral |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Group 1 | Quantity | 3551 | 1977 | 6694 | 7579 | 15744 | 5946 | 1187 | 6779 | 19962 |
| Group 1 | Proportion (%) | 5.07 | 2.82 | 9.54 | 10.81 | 22.45 | 8.48 | 1.69 | 9.66 | 28.47 |
| Group 2 | Quantity | 3571 | 2009 | 6733 | 7679 | 15866 | 5987 | 1197 | 6885 | 19808 |
| Group 2 | Proportion (%) | 5.07 | 2.85 | 9.56 | 10.90 | 22.52 | 8.50 | 1.70 | 9.77 | 28.13 |
The annotation results in Table 4 are compared with the manual annotation of 6,500 small-scale translated texts, focusing on the proportion of each emotion; the comparison is shown in Fig. 4. The proportion of texts tagged as neutral by natural language processing is far smaller than in the manual annotation. One reason is that texts the system could not judge are excluded from the statistics, and a large share of these may in fact be neutral. In addition, the proportions of fear, expectation and joy obtained by natural language processing differ considerably from the actual situation.
The experiments show that the BERT pre-trained language model has obvious advantages over the word-vector model built with Word2Vec. To further improve text sentiment analysis, the richer text features that the BERT language model can extract should be fully exploited.