Application of facial expression recognition based on domain-adapted convolutional neural network in English smart teaching system

The application of facial expression recognition in a human–computer interaction system refers to recognizing human facial expressions through the system in real society, so that the system can perceive the specific state of the person being recognized; this is one of the main directions of human–computer interaction research. In this paper, a facial expression recognition system is designed with an algorithm that combines the expressions of students in classroom teaching with the system environment, so that recognition of students' facial expressions in the classroom environment is more accurate. The paper elaborates the recognition method of the system and conducts a detailed experimental analysis of the specific functions of the other modules. The experimental results show that the security and stability of the system are high, and that its accuracy in recognizing students' facial expressions in the classroom teaching environment is also high. This provides a strong theoretical basis and technical support for bringing a modern intelligent facial expression recognition system into education and teaching.


Introduction
Facial image recognition has become a research hot spot in recent years, and the related algorithms have gradually improved, so that expression recognition is increasingly recognized and accepted in society. Expression recognition is now used in many fields and industries (Weissman and Tanner 2018), but it has not yet been applied to the education and teaching process in the education system (Zhang and Tjondronegoro 2011). In the past few years, education and teaching work has focused mainly on teaching methods that keep up with the development of modern information technology, using a variety of means such as multimedia, but intelligence itself was not emphasized in the teaching process, so intelligent teaching has not developed (Zhou 2013). To achieve a breakthrough in intelligent education and teaching, it is necessary to mobilize students' enthusiasm and initiative so that their interest in learning increases significantly, and students' emotions during learning must also be well grasped. A facial expression recognition system can assist in mobilizing students' learning emotions, so in the process of making education and teaching intelligent, the facial expression recognition system is a basic component (Samadiani et al. 2019; Majumder et al. 2016).
In daily production and life we experience a rich variety of emotions, which are very important for communicating and for expressing our thoughts and opinions. Rich emotional expression lets people convey their inner thoughts more clearly and vividly, making them easier for others to recognize and accept. Applying expression recognition in a human-computer interaction system means recognizing human facial expressions through the system in real society, so that the person's specific state can be sensed; this is one of the main directions of human-computer interaction research. In the real world, facial expression recognition is a very complicated task, because it depends heavily on the angle of the face, interference from the surrounding environment, and the intensity of light. At present, facial expression recognition is carried out mainly in research institutions, where factors such as light and environmental interference are deliberately removed, which makes data collection more convenient. Human facial expressions are extremely rich, so many expression samples are needed, and the cost of collecting and sorting them is very high. Therefore, establishing an efficient method that can still recognize faces with few or even no samples is a major aspect of our research on facial expression image recognition.

Related work
In previous human-computer interaction systems, the social and recognition behaviors during communication came mostly from the person, so the machine was more passive in recognizing people (Li et al. 2016). In this way it is difficult to meet the intelligent control requirements of a human-computer interaction system. During communication, human facial expressions, body language and behavior express people's thoughts and feelings more accurately, and this is what machines should pay more attention to. In human-computer communication, people express the most information through facial expressions, followed by spoken language, with the least information expressed through written words (Memo and Zanuttigh 2018). When designing a human-computer interaction system, the standard to be reached is that human-computer communication should feel like communication between two people, thereby achieving the most natural exchange. To reach this effect, the human-machine communication system must be set up intelligently and implemented through an intelligent system. The intelligent design of the face recognition system is a major aspect of the intelligent development of human-computer interaction systems now and in the future, and it will receive broad attention (Gurumurthy and Tripathy 2012).
Intelligent face recognition systems have been highly valued by experts and scholars in recent years. The so-called emotional robot Nexi can not only perceive what people want to express from changes in their facial expressions, but can also express its own emotions by pouting, frowning and squinting while communicating with people (Salama AbdELminaam et al. 2020). The intelligent face recognition equipment currently being promoted can not only capture human facial expressions through smart devices, but can also read and recognize the acquired expressions in a very short time, thereby judging a person's facial expression characteristics better. Such equipment also judges people's sensory evaluation of certain things well; salesmen and advertisers often use these devices to judge customers' feelings and experience in order to determine their own goals (Kumar et al. 2019). The research and development of equipment or products that can recognize human facial expressions in real time follows the development of information technology and intelligent technology, but many problems remain to be solved, especially recognition errors caused by deformation of facial images in real society and by factors such as angle and light. To better solve these problems, much research has been conducted on face detection methods, on refining the extraction of facial expression features, and on recognizing facial expressions from multiple angles and levels (Hu et al. 2019). It is possible to extend the initial six basic human facial expressions to more detailed and subtle ones and to recognize these refined expressions across multiple dimensions.
Human facial expressions are divided into six different dimensions, from which the six most basic facial expressions are derived; these six expressions are then layered over different time series to show different expression intensities, which increases the difficulty of facial expression recognition.
When facial expression recognition is applied in the field of education and teaching, it is not easy to notice changes in students' cognitive psychology in online education (Jabid et al. 2010). Teachers and students are in different natural environments, and the network introduces a certain time delay, so teachers and students cannot communicate promptly and smoothly. Students therefore feel that learning is a cold and lonely activity without any emotional exchange (Zhang et al. 2018). Emotion-aware algorithms can sense and recognize students' emotions in real time and make predictions, so that students' emotions are recognized by the expression recognition system and sent to the teacher; the teacher can then understand the students' emotions and give timely guidance (Borch and Lange 2017; Mei et al. 2017). Teachers can recommend suitable learning content according to students' emotions and provide personalized teaching, realizing humanistic care for students, improving students' enthusiasm and learning efficiency, and improving teaching results.
3 Facial expression recognition method based on domain adaptive convolutional neural network

Label-guided domain adaptive convolutional neural network expression recognition method
The specific structure of the label-prior-guided generative adversarial network expression recognition model (LDAGAN) is shown in Fig. 1. The model mainly comprises three parts: a generator, a discriminator and a classifier. This section first briefly introduces the model and then describes the application of the model as a whole. The libfacedetection algorithm in the face recognition library is first used to detect and collect faces; all face images are then cropped to a uniform size and preprocessed. Data collected under experimental conditions form the source domain, and face data collected in the social environment form the target domain. Denote the source samples by Xs with emotion labels Ys, and the target samples by Xt with labels Yt. The emotion labels are converted into one-hot codes by the program; for example, the code corresponding to the sad expression in Fig. 1 is [0000001], and the code corresponding to the happy expression is [0000010].
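As a concrete illustration, the one-hot encoding of emotion labels can be sketched as follows; the label ordering used here is an illustrative assumption, not necessarily the ordering behind the codes in Fig. 1:

```python
import numpy as np

# Seven basic expression labels; the ordering is illustrative only.
EXPRESSIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def one_hot(label: str) -> np.ndarray:
    """Encode an expression label as a one-hot vector over EXPRESSIONS."""
    vec = np.zeros(len(EXPRESSIONS), dtype=np.float32)
    vec[EXPRESSIONS.index(label)] = 1.0
    return vec
```

With this ordering, `one_hot("sad")` sets only the sixth position, mirroring codes such as [0000001] and [0000010] in the figure.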
Adversarial loss: in the original generative adversarial network, the samples produced by the generator are meant to confuse the discriminator; that is, when a generated sample is passed to the discriminator, the discriminator cannot judge it correctly. Feeding the source image and the emotion label to the generator as preconditions yields the adversarial loss, which can be expressed as:

$$L_{adv} = \mathbb{E}_{x_s}\big[\log D(x_s)\big] + \mathbb{E}_{x_s, y_t}\big[\log\big(1 - D(G(x_s, y_t))\big)\big]$$

Reconstruction loss: the generated image $X_f$ should remain consistent with the original image in order to be more realistic, and the face recognition system should retain as much image detail as possible. Therefore the generated image $X_f$ is sent through the model again to produce a new image $X_r$, and this newly generated image is expected to be very similar to the original image, or even identical. The reconstruction loss is the $L_1$ pixel norm between the reconstructed image $X_r$ and the source image $X_s$:

$$L_{rec} = \big\|X_s - X_r\big\|_1$$

Classification task loss: the classifier recognizes and classifies human facial expressions, and is trained on both the generated images and the original images; the purpose is to improve the maturity and accuracy of the classifier, which can then also classify target-domain images. The classifier passes the classified image to the model, which modifies it under the command of the management system so that the changed image comes closer to the original image, or is even restored to the original state. The generated image should be the one that best matches a person's facial expression characteristics, and such classified images can then be used in the real social environment.
Therefore, the classification task loss added to the classifier for facial expression recognition is the softmax cross-entropy loss, which can be expressed as:

$$L_{cls} = -\mathbb{E}_{(x, y)}\Big[\sum_{k} y_k \log C_k(x)\Big]$$

Combining the three loss functions above, the loss function of the generator can be expressed as a weighted sum:

$$L_G = L_{adv} + \lambda_{rec} L_{rec} + \lambda_{cls} L_{cls}$$

From the perspective of the discriminator, there are two types of input samples: (1) target samples and (2) generated samples, so its loss function is:

$$L_D = -\mathbb{E}_{x_t}\big[\log D(x_t)\big] - \mathbb{E}_{x_s, y_t}\big[\log\big(1 - D(G(x_s, y_t))\big)\big]$$

To make the classifier fit the whole model system better, several commonly used models are analyzed and compared on the RAF-DB database, chiefly the most common deep learning model, the convolutional neural network. The results are shown in Table 1, from which it can be seen that the deep learning methods have a much higher recognition rate than the original methods. Table 2 compares the LDAGAN method with the currently common methods: the first row is the recognition result of Resnet50 on the RAF-DB dataset. The PixelDA method was originally unsupervised; in order to compare it with the LDAGAN model in this experiment, PixelDA was changed into a supervised method.
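The three loss terms above can be sketched in NumPy as follows; the weighting coefficients `lam_rec` and `lam_cls` are illustrative assumptions, not values reported in the paper:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy; the adversarial loss is BCE on the
    discriminator's real/fake predictions."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def reconstruction_l1(x_src, x_rec):
    """Mean L1 pixel loss between the source image and its reconstruction."""
    return np.mean(np.abs(x_src - x_rec))

def softmax_cross_entropy(logits, onehot):
    """Softmax cross-entropy classification loss over one-hot labels."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean((onehot * log_probs).sum(axis=1))

def generator_loss(d_fake, x_src, x_rec, logits, onehot,
                   lam_rec=10.0, lam_cls=1.0):
    """Weighted sum of the three terms; the weights are illustrative."""
    adv = bce(d_fake, np.ones_like(d_fake))   # generator wants D(fake) -> 1
    return (adv + lam_rec * reconstruction_l1(x_src, x_rec)
                + lam_cls * softmax_cross_entropy(logits, onehot))
```

In a real training loop these would be computed on autodiff tensors rather than NumPy arrays; the sketch only fixes the arithmetic of each term.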
To investigate the cross-domain performance of the LDAGAN method, two comparative experiments were performed on the SFEW and RAF-DB datasets; the results are shown in Table 3. The first row of each block uses no domain adaptation method: Resnet50 performs the cross-domain recognition test directly.

Conditional adversarial domain adaptation method based on convolutional neural network
The structure of the conditional adversarial domain-adapted expression recognition model (CADA) based on the generative adversarial network is shown in Fig. 2. Denote the experimental data as the source domain and the data in the social environment as the target domain. Define the source domain $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, i.e. $n_s$ labeled source samples. The goal is to learn a deep network $G: x \to y$ that greatly reduces the cross-domain data shift, so that potential problems in the target domain can be controlled and solved through the source domain and the new conditional domain. The CADA model comprises three parts. The first is to transform source-domain images into target-domain images through the generator of the LDAGAN model; the newly generated image is shown as $x_f$ in Fig. 2, which enriches the target-domain data.
The second is to combine the source-domain images with the newly generated images to form a new source domain, extract the image features of the source and target domains through the deep learning model, classify them, and learn the extracted features through the network. The extracted features can be expressed as $f = F(x)$ and the classifier output as $g = G(x)$. The third is to combine the image features with the classifier output, as shown by the blue arrow in Fig. 2, and let the discriminator ($D$ in Fig. 2) distinguish the source domain from the target domain.
Conditional generative adversarial networks (CGANs) can model the feature distribution more accurately by feeding additional information, such as labels or auxiliary image information, to both the generator and the discriminator. The conditional adversarial domain method is set up as two competing error-optimization problems. The domain discriminator error can be expressed as:

$$E(D) = -\mathbb{E}_{x_i^s \sim D_s}\log\big[D(f_i^s, g_i^s)\big] - \mathbb{E}_{x_j^t \sim D_t}\log\big[1 - D(f_j^t, g_j^t)\big]$$

and the optimization of CADA, the classification error on the labeled source domain, as:

$$E(G) = \mathbb{E}_{(x_i^s, y_i^s) \sim D_s}\, L\big(G(x_i^s), y_i^s\big)$$

The multilinear map by which the domain discriminator conditions the features $f$ on the classifier predictions $g$ can be expressed as:

$$T_{\otimes}(f, g) = f \otimes g$$

To handle the very complicated computation that arises when the dimension is high, a randomized multilinear map is introduced:

$$T_{\odot}(f, g) = \frac{1}{\sqrt{d}}\,(R_f f) \odot (R_g g)$$

where $R_f$ and $R_g$ are fixed random matrices and $\odot$ is the element-wise product. The final minimax objective can be expressed as:

$$\min_{F, C}\; E(G) - \lambda E(D), \qquad \min_{D}\; E(D)$$

To investigate whether the method is feasible, the source domains of the RAF-DB and SFEW databases are compared with some commonly used methods, which serve as the baseline methods in the source domain. The DANN model is divided into two layers, a feature layer and a classification layer; adding an adaptation layer reduces the gap between the source and target domains and feeds the resulting information back to the system. The PixelDA model builds a communication intermediary between the source and target domains within the adversarial network; its function is to transform source-domain images into target-domain images. These two methods adapt well to the source domain, but their representations are not very good, and their classification accuracy is relatively low. The specific results are shown in Table 4.
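The two conditioning maps can be sketched in NumPy; the embedding dimension `d` and the use of Gaussian random projections are illustrative choices:

```python
import numpy as np

def multilinear_map(f, g):
    """Exact conditioning: flatten the outer product f (x) g per sample.
    Output dimension is d_f * d_g, which explodes for deep features."""
    return np.einsum("bi,bj->bij", f, g).reshape(f.shape[0], -1)

def make_randomized_map(d_f, d_g, d=512, seed=0):
    """Randomized approximation: sample fixed projection matrices R_f, R_g
    once, then combine projections with an element-wise product scaled
    by 1/sqrt(d)."""
    rng = np.random.default_rng(seed)
    Rf = rng.standard_normal((d_f, d))
    Rg = rng.standard_normal((d_g, d))
    def mapping(f, g):
        return (f @ Rf) * (g @ Rg) / np.sqrt(d)
    return mapping
```

The discriminator then receives `multilinear_map(f, g)` (or its randomized counterpart) instead of the raw features, so domain alignment is conditioned on the predicted class.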
Table 4 shows that, compared with the other methods, the CADA method is considerably more accurate and its performance is greatly improved; its domain adaptability is relatively strong, with a domain gap nearly half that of the other methods, which shows that the CADA method is effective. Table 5 compares the CADA method with the baseline method for facial expression recognition: the first row is the recognition result of the baseline method, the second row uses no domain adaptation method, and the third row uses the adversarial-network facial expression recognition method. From the bottom row of Table 5 it is easy to see that the CADA method is significantly better than the others. The CADA method has strong domain adaptability, which again illustrates its effectiveness.
4 Design and implementation of facial expression recognition system for students in smart English teaching

The overall design of the facial expression recognition system for intelligent English teaching students
After the pictures are processed by grayscale conversion, histogram normalization and median filtering, wavelet filtering is used to extract features from the collected facial expression images. A support vector machine (SVM) algorithm is used as the expression classifier, which is now a common choice for student facial expression recognition in classroom teaching. The overall architecture of the system is shown in Fig. 3. As the figure shows, the architecture mainly includes the following parts:
1. System platform: provides the required operating system and network support, carries the interfaces for the various operations, and guarantees normal operation of the system.
2. User management: the administrator can manage user information and permissions.
3. Automatic facial expression recognition: teachers can automatically recognize students' facial expressions at any time during class.
4. Manual facial expression recognition: to obtain better recognition results, the recognition system can be debugged and adjusted repeatedly by hand.
5. Feature space construction: performs feature analysis and classifies the input facial expression images.
6. Data management: stores the information extracted from facial expression images in the database, then analyzes and classifies it and reads the data through the feature space.
Figure 4 is a network topology diagram of the system, which consists of cameras, computers, databases and switches. The camera must have high definition and be able to capture clear, realistic images.
A camera resolution of more than 800,000 pixels is preferable; the computer must be fast enough to obtain the desired results within the required time; the server needs large storage space for the large amount of data and good stability; and the switch must keep the network stable and smooth, for which a speed of 100 Mbit/s is sufficient. The specific configuration is shown in Fig. 5. The data tables in the database are the user information table, the expression feature space table and the data information table; the relationships between these tables are shown in Fig. 6.

Image preprocessing and recognition
Grayscale conversion formulas make the image clearer and of higher quality, and the conversion can be obtained in three ways. The first is the linear transformation, which expands or compresses the gray range and can be expressed as:

$$g(i, j) = \frac{d - c}{b - a}\big(f(i, j) - a\big) + c \quad (12)$$

The second is the piecewise linear transformation, which processes the gray range in segments, expanding the main range and compressing the non-main ranges, by applying a linear transformation of the form above to each segment separately. The third is the nonlinear grayscale transformation, which selectively expands a gray interval through a nonlinear mapping function. The logarithmic transformation expands the low gray levels of the picture and compresses the high gray levels:

$$g(i, j) = c \cdot \log\big(1 + f(i, j)\big)$$

Stretching the high grayscale interval is achieved with the corresponding inverse (exponential) transformation. The filtered image is then obtained by convolving the image with a two-dimensional filter kernel. For the SVM used in the student facial expression recognition system in the classroom environment, three kernel functions are considered. The polynomial kernel can be expressed as:

$$K(x, y) = (x \cdot y + 1)^q$$

the radial basis function kernel as:

$$K(x, y) = \exp\!\Big(-\frac{\|x - y\|^2}{2\sigma^2}\Big)$$

and the sigmoid kernel as:

$$K(x, y) = \tanh\big(a\,(x \cdot y) + b\big)$$

For multi-class classification, the discriminant function assigns a sample to the class $H$ with the largest decision value:

$$H = \arg\max_h \big(w_h \cdot x + b_h\big)$$

4.3 Detailed design of the facial expression recognition system for students in smart English teaching

The database is a tool for setting user permissions and recording operation records and other data information.
The database design must be scientific and reasonable, each table must be expressed accurately, and the data types must suit the database, so the design of the database is very important. This system uses the Microsoft network database SQL Server 2008 R2 Enterprise Edition. Details are shown in Tables 6, 7 and 8; the tables are linked through the user field.
The new pixel values are stored in the data2[length][height] matrix. The wavelet feature at a pixel is obtained by convolving the filter, at coordinates $(r, h)$, with the gray levels of the image, and can be expressed as:

$$T(r, h) = G(r, h) * I(r, h) \quad (25)$$
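A minimal NumPy sketch of the grayscale transforms and the wavelet feature computation described above; the Gabor-style kernel parameters and the valid-mode convolution are illustrative assumptions, not the system's exact filter:

```python
import numpy as np

def linear_stretch(img, a, b, c, d):
    """Linear grayscale transform: map the gray range [a, b] onto [c, d]."""
    return (d - c) / (b - a) * (img.astype(np.float64) - a) + c

def log_transform(img, c=1.0):
    """Logarithmic transform: expand low gray levels, compress high ones."""
    return c * np.log1p(img.astype(np.float64))

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0):
    """Small real Gabor kernel G(r, h); parameter values are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def convolve2d(img, kernel):
    """Valid-mode 2D convolution T = G * I of a grayscale image."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]              # convolution flips the kernel
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * flipped)
    return out
```

In practice the convolution would be done with an optimized FFT-based routine; the explicit loop above only fixes the definition of $T(r, h)$.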

System software development technology
This system is developed with Visual C++ 2010 under Microsoft Visual Studio 2010 Enterprise Edition. As Microsoft's main development tool, Visual Studio 2010 is very powerful, has a high safety factor and very good stability, and is very commonly used for C++ development. OpenCV is a widely used computer vision library that provides many optimized routines. It originated in the late 1990s, was founded by Intel Corporation, and is supported by Willow Garage. It contains more than 500 functions for image and vision computation, covers common algorithms for graphics processing and visual computing, and is a cross-platform library that can run on all versions of Windows as well as Apple iOS and Android. Version 3.0 beta was released a few years ago. The main modules of OpenCV are: (1) the imgproc image processing module, which includes image filtering, image transformations, histograms and structure analysis; (2) the nonfree module, which contains patented algorithms, mainly for feature detection and description; (3) the core module, which includes the basic structures, the command interpreter, dynamic structure operations and drawing; (4) the highgui high-level GUI module, which mainly covers I/O, video capture, encoding and decoding, and the interactive interface.
LIBSVM is a very powerful and widely used toolbox for support vector machine classification. It is an open-source C/C++ library that, in addition to classification, supports regression analysis and distribution estimation. It was developed by Taiwanese scholars at the beginning of this century and is also well suited to multi-class classification problems; the latest version, 3.2, was released a few years ago. LIBSVM is convenient for non-professionals, so its barrier to entry is low and ordinary users can use it easily. It has a friendly interface and supports multiple languages. LIBSVM mainly includes: (1) the SVM formulation; (2) multi-class classification; (3) cross-validation; (4) kernel function matrices; (5) unbalanced class weighting; (6) C++ and Java source code; (7) a GUI interface for SVM classification and regression.
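The three kernel functions such SVM toolboxes support can be sketched as follows; the default parameter values are illustrative, not LIBSVM's defaults:

```python
import numpy as np

def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel K(x, y) = (x . y + coef0)^degree."""
    return (np.dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """Radial basis function kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2 * sigma**2))

def sigmoid_kernel(x, y, alpha=0.01, coef0=0.0):
    """Sigmoid kernel K(x, y) = tanh(alpha * (x . y) + coef0)."""
    return np.tanh(alpha * np.dot(x, y) + coef0)
```

Each function maps a pair of feature vectors to a scalar similarity; a kernel matrix is built by evaluating the chosen kernel over all sample pairs.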

Experimental design
The robustness test of the system covers two aspects. First, the security of the login system is tested: if the entered user information does not exist or is wrong, the system refuses to start face recognition and monitoring. Second, if a picture containing no face is fed to the system, the stability of the system must not be affected. The main function of the system is to recognize students' facial expressions during class; it is therefore still an expression recognition system, but since it operates in a classroom environment, it is researched and designed primarily around students' expressions in the classroom, and the facial expression recognition algorithm is optimized for the expressions of students in class.
Based on the above design, the test experiments were carried out; the test conditions and results are shown in Tables 9 and 10.

Test case                    Input                                           Expected result
User information incorrect   username "abc", password "123"                  user information error prompt
Password too long            username "user1", password "1234567890abc"      password input error prompt
Password empty               username "user1", password ""                   prompt to enter the correct password
Picture without a face       blank white picture                             not recognized, but the system runs normally
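The login checks exercised by these robustness tests can be sketched as follows; the user store, length limit and message strings are hypothetical illustrations, not the system's actual implementation:

```python
# Hypothetical user store and password length limit for illustration only.
USERS = {"user1": "123456"}
MAX_PASSWORD_LEN = 12

def check_login(username: str, password: str) -> str:
    """Return an error prompt for invalid input, or "ok" on success."""
    if username not in USERS:
        return "user information error"
    if password == "":
        return "please enter the correct password"
    if len(password) > MAX_PASSWORD_LEN or USERS[username] != password:
        return "password input error"
    return "ok"
```

Each branch corresponds to one row of the test table: unknown user, empty password, over-long or wrong password, and a successful login.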

Analysis of experimental results
In the classroom, the students' facial expressions were tested: first whether system login is normal and secure, and then the system itself. The experiments show that the system logs in normally and runs well in the classroom teaching environment. The robustness test results show that if the user information entered at login is wrong, or the password is incorrect or empty, the system gives a different error message for each situation, and login errors cause no system abnormality. When an input image contains no facial expression to recognize, the system also runs normally, though login does not proceed. It can therefore be concluded that the system runs in good condition with high stability, that user information is protected to a large extent, and that the security and stability of the system are very good.
Analysis of the test tables shows that the system's accuracy in recognizing human facial expressions is relatively high, but without a classifier to classify the facial expression data, the recognition rate is comparatively low. This shows that building the classification database into the system is important. The fourth and fifth test groups also show that the order of recognition has no effect on the test results. The system meets the design requirements, and all functions can be used normally.

Conclusion
In modern society, education and teaching activities increasingly rely on modern technological means such as multimedia, so when monitoring the teaching state there is a growing tendency to use intelligent facial expression systems. In this paper, a facial expression recognition system is designed with an algorithm that combines the expressions of students in classroom teaching with the system environment, so that recognition of students' facial expressions in the classroom environment is more accurate. The paper elaborates the recognition method of the system and conducts a detailed experimental analysis of the specific functions of the other modules. The experimental results show that the security and stability of the system are high, and that its accuracy in recognizing students' facial expressions in the classroom teaching environment is also high. This provides a strong theoretical basis and technical support for bringing a modern intelligent facial expression recognition system into education and teaching.

Data availability Data will be made available on request.

Declarations
Conflict of interest The authors declare that they have no conflict of interests.
Ethical approval This article does not contain any studies with human participants performed by any of the authors.