A Chinese Stock Review Sentiment Analysis Based on the BERT Model

Abstract: With the rapid development of the Internet, a huge number of stock reviews have appeared online; sentiment analysis of these reviews therefore has profound significance for the study of financial markets. Owing to the lack of large amounts of labeled data, the accuracy of existing sentiment analysis of Chinese stock reviews remains to be improved. In this paper, a sentiment analysis algorithm for Chinese stock reviews based on BERT is proposed, which improves the accuracy of sentiment classification. The algorithm uses the BERT pre-trained language model to build sentence-level representations of stock reviews, and then feeds the resulting feature vectors into a classification layer. In our experiments, the method achieves nearly 8% and 9% improvements in F1 over TextCNN and TextRNN, respectively. The model obtains its best results via fine-tuning, which proves effective for Chinese stock review sentiment analysis.

reviews. An existing study [1] has shown that the sentiment in these messages is related to stock prices and can significantly improve the accuracy of stock market predictions. Therefore, this paper addresses the sentiment analysis of Chinese stock reviews. Sentiment analysis [2], also called opinion mining or opinion analysis, aims to determine the polarity of a text and classify it as positive, neutral, or negative. Its application areas include news comment analysis, product review analysis, movie review analysis, and other fields.
Recently, the mainstream of research on text sentiment analysis has been machine learning. Some early works depended on manually extracted features, e.g., sentiment lexicons [3][4]. Later, machine learning methods [5] were widely used for the sentiment classification task. The early approaches require manually built sentiment dictionaries, and machine-learning-based methods require manual feature extraction; both are time-consuming and labor-intensive. In recent years, assorted deep learning methods have become popular for sentiment analysis, as they do not require manual feature engineering. However, in many specific fields, especially stock commentary, large labeled datasets are unavailable, which makes it difficult to train complex models; sentiment analysis for stock reviews has therefore met a bottleneck. The basic idea of this paper is to rely on BERT (Bidirectional Encoder Representations from Transformers) [6], with its pre-training on an enormous corpus and its powerful architecture, to obtain better results in the sentiment analysis of Chinese stock reviews.
In this paper, we design different variants of BERT for sentiment analysis of Chinese stock reviews. These models use deep learning to extract features automatically. Meanwhile, we fine-tune the pre-trained model in the new application area, which further boosts sentiment classification performance. Our model can thus solve sentiment analysis in a specific field. The experiments were performed on a Chinese stock review dataset from Guba of Eastmoney, and the results show that the proposed model achieves higher accuracy than the TextCNN [7] and TextRNN [8] models.
In summary, the main contributions of this paper are as follows: (1) We propose a new solution for sentiment analysis of Chinese stock reviews that avoids building dictionaries and extracting features manually.
(2) We designed different variants of BERT for Chinese stock reviews and found that the BERT+Linear model achieves the best performance via fine-tuning. Experimental results indicate that the proposed method is highly effective.
(3) The benefit of this method is its good generalization ability, so it can be widely applied in specific fields. For the stock review sentiment classification task, the BERT model is pre-trained on a general text corpus and then fine-tuned on a domain-specific corpus, which effectively improves its performance.

2. Related Work
The goal of sentiment analysis is to identify sentiment polarity. Abundant methods have been used for sentiment analysis, including traditional methods and deep learning methods. In this section, we briefly introduce the related work. Sentiment analysis methods are mainly divided into three types: dictionary-based methods, machine-learning-based methods [9][10], and deep learning methods.
The dictionary-based method calculates the sum of positive and negative sentiment scores in the text: a sum greater than zero indicates that the text tends to be positive, otherwise negative. Its advantage is that it is simple and needs no labeled text; its disadvantage is that the sentiment dictionary depends on manual design, which leads to insufficient coverage of sentiment words. The machine learning method depends on machine learning algorithms and has become the most popular approach.
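The dictionary-based scoring described above can be sketched as follows; the tiny lexicon and its polarity values are hypothetical, for illustration only (real systems use large curated dictionaries):

```python
# Minimal sketch of dictionary-based sentiment scoring: sum the polarity
# of every lexicon word found in the text; a sum > 0 means positive,
# otherwise negative. The lexicon below is hypothetical.
LEXICON = {"涨": 1, "利好": 1, "牛": 1, "跌": -1, "亏": -1, "套牢": -1}

def dict_sentiment(text: str) -> str:
    score = sum(pol for word, pol in LEXICON.items() if word in text)
    return "positive" if score > 0 else "negative"
```

As the section notes, the weakness is coverage: any sentiment word missing from the lexicon contributes nothing to the score.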
This method converts a large number of features into feature vectors based on sentiment lexicons, then trains a classifier. The earliest such method was Naive Bayes [11]; later, important machine learning methods such as support vector machines (SVM) [12][13] and decision trees [14] were widely used in text sentiment classification. Machine-learning-based sentiment analysis has become more and more accurate, but it relies on the quality of the extracted features.
Nowadays, deep learning is widely used in text sentiment classification. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have been broadly applied to the task. Dos Santos et al. [15] proposed CharSCNN, which uses two convolution layers to extract features for sentiment analysis. Wang et al. [16] used an LSTM to predict the sentiment polarities of tweets by composing word embeddings.
However, RNNs cannot be computed in parallel and can suffer from exploding gradients. Vaswani et al. [17] proposed the Transformer, which solves the parallel computing problem and achieved the best results in many natural language processing tasks, including sentiment analysis. Devlin et al. [6] proposed the pre-trained model BERT, which achieved the best results on many NLP tasks.
Due to the great progress of pre-trained models, many researchers utilize BERT for downstream tasks. Dong et al. [18] proposed a BERT-CNN model to improve the accuracy of commodity sentiment analysis. Yu et al. [19] used BERT to achieve state-of-the-art results for ancient Chinese sentence segmentation.
The study in [20] used BERT to classify Chinese short texts. Liu et al. [21] proposed BERTSUM, based on BERT, for extractive summarization.

Methods
This paper addresses the sentiment classification of Chinese stock reviews based on BERT. The model is composed of two parts: a pre-trained BERT and a fine-tuned classification layer. The structure of the model is shown in Figure 1: after obtaining the sentence vector from BERT, we stack a classification layer on top of it. The output of BERT is the input of the classification layer, thereby capturing sentence-level features for sentiment classification of Chinese stock review text. Sun et al. [22] concluded that the 12th layer has the best classification capability; therefore, this paper adds several kinds of layers on top of the 12th layer, giving the BERT+Linear, BERT+LSTM and BERT+CNN models. These classification layers are jointly fine-tuned with BERT.
- BERT+Linear. The basic method is to append a linear classification output layer to the BERT model: BERT outputs a sentence vector, which is classified through a fully connected layer. In this paper, we call this model BERT+Linear.
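A minimal sketch of the BERT+Linear variant, assuming the Hugging Face `transformers` library (the paper does not name an implementation); a tiny randomly initialized config stands in for the pre-trained Chinese checkpoint so the sketch runs without downloads:

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class BertLinear(nn.Module):
    """BERT followed by a single linear classification layer (sketch)."""
    def __init__(self, bert: BertModel, num_classes: int = 2):
        super().__init__()
        self.bert = bert
        self.fc = nn.Linear(bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # The pooled [CLS] representation serves as the sentence vector.
        return self.fc(out.pooler_output)

# Tiny config with random weights, for illustration only; in practice one
# would load Google's pre-trained Chinese BERT checkpoint instead.
config = BertConfig(vocab_size=1000, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertLinear(BertModel(config))
logits = model(torch.randint(0, 1000, (4, 16)))  # batch of 4, seq len 16
```

During fine-tuning, both the linear layer and the BERT weights receive gradients, matching the joint fine-tuning described above.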
- BERT+LSTM. LSTM [23] is a special RNN that is good at handling sequence problems.
This paper uses an LSTM as the classifier attached to the BERT model; the input of the LSTM is the BERT output. Each step of the LSTM is calculated as:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)    (1)

Among them, f_t, i_t and o_t are the forget gate, the input gate and the output gate, respectively; h_{t-1} is the hidden state, c_t is the memory vector, and h_t is the output vector; the W and b terms are the weights and biases of the different layers. The final output is still fed to a linear layer.
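A sketch of the LSTM classification head, assuming PyTorch; a random tensor stands in for BERT's token-level outputs, and the layer sizes are illustrative rather than taken from the paper:

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    """LSTM classifier over BERT's token-level outputs (sketch)."""
    def __init__(self, hidden_size: int = 768, lstm_size: int = 128,
                 num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, lstm_size, batch_first=True)
        self.fc = nn.Linear(lstm_size, num_classes)  # final linear layer

    def forward(self, bert_out):            # (batch, seq_len, hidden_size)
        _, (h_n, _) = self.lstm(bert_out)   # h_n: (1, batch, lstm_size)
        return self.fc(h_n[-1])             # logits: (batch, num_classes)

head = LSTMHead()
bert_out = torch.randn(4, 16, 768)  # stand-in for BERT's sequence output
logits = head(bert_out)
```

The last hidden state h_n summarizes the whole review before the final linear layer, mirroring equations (1).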
- BERT+CNN. A convolutional neural network has been successfully applied in computer vision; its basic structure includes an input layer, convolutional layers, pooling layers, a fully connected layer and an output layer. The output of the BERT model is the input of the convolutional neural network; after the convolutional and pooling layers, it is passed to the fully connected layer to output the sentiment class of the Chinese stock review text. The generation process can be described as:

x^i = f(W^i * x^{i-1} + b^i)

where W^i represents the weight vector of the i-th layer's convolution kernel, b^i is the bias, and the operator * denotes the convolution of the kernel with the output of the (i-1)-th layer, followed by sampling.
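A sketch of the CNN classification head, again assuming PyTorch with a random tensor standing in for BERT's outputs; the filter count and kernel size are illustrative:

```python
import torch
import torch.nn as nn

class CNNHead(nn.Module):
    """Convolution + max-pooling classifier over BERT outputs (sketch)."""
    def __init__(self, hidden_size: int = 768, num_filters: int = 64,
                 kernel_size: int = 3, num_classes: int = 2):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, bert_out):         # (batch, seq_len, hidden_size)
        x = bert_out.transpose(1, 2)     # Conv1d expects channels first
        x = torch.relu(self.conv(x))     # convolution over the token axis
        x = x.max(dim=2).values          # global max pooling
        return self.fc(x)                # fully connected output layer

head = CNNHead()
logits = head(torch.randn(4, 16, 768))
```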

Dataset
In this paper, we used the pre-trained BERT model released by Google, trained on the Chinese Wikipedia corpus. There are few labeled datasets of Chinese stock reviews. The dataset used in our experiments comes from GitHub [24]. It contains a total of 9204 reviews, with equal numbers of positive and negative samples. The dataset's authors used logistic regression for sentiment classification and reached an accuracy of 88.09%, which shows that the quality of the dataset is high enough to train other models. This paper divided it into training, validation and test sets at a ratio of 6:2:2. Table 1 shows some examples of stock reviews.
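The 6:2:2 split can be sketched as follows (the shuffle seed is an arbitrary choice, not specified in the paper):

```python
import random

def split_622(samples, seed=42):
    """Shuffle and split samples into train/validation/test at 6:2:2."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_val = int(n * 0.6), int(n * 0.2)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

# With the 9204 reviews of the dataset, the split sizes come out as below.
train, val, test = split_622(range(9204))
```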

Evaluation Indicators
This paper is designed to realize the sentiment classification of Chinese stock review text, which is a typical classification problem. The most commonly used evaluation indicators for classification problems include accuracy, precision, recall and the F1 score. According to the two dimensions of model prediction and actual label, a binary classification problem yields four cases (true positive TP, false positive FP, false negative FN, true negative TN), as shown in Table 2. The formulas are as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
P = TP / (TP + FP)
R = TP / (TP + FN)

The F_β score is an indicator that combines the precision (P) and the recall (R):

F_β = (1 + β²) · P · R / (β² · P + R)

It can be seen from the formula that this is a weighted harmonic mean of the recall (R) and the precision (P). If β = 1, the formula becomes:

F1 = 2 · P · R / (P + R)

The F1 score is one of the significant indicators for evaluating the performance of a classifier; a value close to 0 indicates that the model performs badly.
To verify the classification ability of the BERT model on Chinese stock reviews more effectively, this paper selected two indicators: accuracy and F1.
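These indicators can be computed directly from the four confusion-matrix counts; a minimal sketch with illustrative counts:

```python
def metrics(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall (the beta = 1 case).
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts, not results from the paper.
acc, p, r, f1 = metrics(tp=40, fp=10, fn=10, tn=40)
```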

Experimental results on the hyperparameter settings of the model
The hyperparameters of the proposed sentiment analysis model for Chinese stock reviews include the number of epochs, the pad size, the learning rate, and so on. This section carries out related experiments to find the best values of these parameters.
(1) Epoch. The hyperparameters of the model have an important impact on its performance, so we executed several experiments to select the best values. For these experiments, the dataset was divided into a training set and a test set at a ratio of 6:4.
An epoch is one complete pass over the training set. Generally, as the number of epochs increases, the performance of the model improves; however, too many epochs may cause overfitting. Therefore, it is important to choose the number of epochs correctly. Figure 2 shows the relationship between the number of epochs and F1.
Figure 2 Relationship between Epochs and F1.
As can be seen from Figure 2, as the number of epochs increases, the F1 score also increases, which means the classification performance of the model improves. When the number of epochs is 50, the performance of the model is the best. When it reaches 60, the model stops training early, and the accuracy also decreases.
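The early-stopping behavior described above (halt training once the validation score no longer improves) can be sketched as follows; the patience value and the score sequence are illustrative, not taken from the paper:

```python
def train_with_early_stopping(scores_per_epoch, patience=2):
    """Stop when validation F1 has not improved for `patience` epochs.

    `scores_per_epoch` is an iterable of validation F1 scores; returns the
    best score and the (1-based) epoch at which it was reached.
    """
    best_f1, best_epoch, bad_epochs = 0.0, 0, 0
    for epoch, f1 in enumerate(scores_per_epoch, start=1):
        if f1 > best_f1:
            best_f1, best_epoch, bad_epochs = f1, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # stop training early
    return best_f1, best_epoch

best, at = train_with_early_stopping([0.70, 0.80, 0.85, 0.84, 0.83, 0.82])
```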
(2) Pad size. The pad size is the length to which the model normalizes each sentence. Its basic principle is "short filling and long cutting": if a sentence is shorter than the set length, it is padded with zeros; if it is longer, it is truncated to the set length.
Pad size has an enormous influence on the performance of the model, so a suitable value must be chosen: if the pad size is too large, too many zeros are padded into the data; if it is too small, plenty of information is lost. The distribution of review lengths is shown in Figure 3: stock reviews are very short, and most are less than 50 characters. We experimented with different pad sizes, and Figure 4 shows the results. As the pad size increases, the F1 score shows an upward trend and reaches its maximum at a pad size of 16; when the pad size increases further, the machine runs out of memory.
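The "short filling and long cutting" rule can be sketched as follows (the token ids and pad size are illustrative):

```python
def pad_or_truncate(token_ids, pad_size=16, pad_id=0):
    """'Short filling and long cutting': pad short sequences with zeros and
    truncate long ones, so every sequence has exactly `pad_size` tokens."""
    if len(token_ids) >= pad_size:
        return token_ids[:pad_size]          # long cutting
    return token_ids + [pad_id] * (pad_size - len(token_ids))  # short filling

short = pad_or_truncate([5, 6, 7], pad_size=5)
long_ = pad_or_truncate([1, 2, 3, 4, 5, 6], pad_size=5)
```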
It can be concluded that when the pad size is too small, too much information is lost; in this experiment, a pad size of 16 was the best. As baselines for comparison, we used TextCNN and TextRNN. TextRNN is based on a recurrent neural network and uses a multi-task learning framework for joint training. Both models have achieved favorable results in text classification.

Experimental results on sentiment analysis for different methods
The TextCNN model is one of the most commonly used methods for text classification. After many trials and comparisons, we chose the best hyperparameters, as shown in Table 3. The results are shown in Figures 5 and 6: Figure 5 gives the accuracy of the three models, and Figure 6 the F1 scores. From Figure 5, it can be seen that the BERT+Linear model has the highest sentiment classification accuracy on review text, with a 12% improvement over TextCNN and 9% over TextRNN. Figure 6 shows that the BERT+Linear model also has a nearly 9% improvement over TextRNN in F1 score, which demonstrates that BERT+Linear has a better classification effect.

Discussion
We tested and evaluated our proposed BERT+Linear method to address the sentiment analysis of Chinese stock reviews, which proved the advantages of our approach.

Conclusion
With the advancement of Internet technology, investors can easily post relevant comments online. Mining these data is valuable research, so we propose the BERT+Linear model for sentiment analysis of Chinese stock reviews, which avoids building dictionaries and extracting features manually. Experimental results indicate that the proposed method is highly effective.
The main contribution of this paper is the BERT+Linear model for Chinese stock review sentiment classification. The advantage of this method is its good generalization ability, so it can be widely used in specific fields.
This paper innovatively applied the BERT model to sentiment classification of stock reviews, eliminated the dependence on domain-specific emotion dictionaries, overcame the difficulty of manually extracting features, and improved the accuracy of sentiment classification. However, there were still some defects: the dataset was small, and the conclusions have not been verified on larger datasets. In subsequent research, we will make further improvements to the BERT model and expand the sample size for further experiments.
Figure 1 The structure of the sentiment analysis model