2.1. Literature review of pre-trained language models
Pre-trained language models (PRLMs) provide better initialization parameters for downstream models and thereby benefit downstream tasks in natural language processing; after years of development, they have become mature. Xu et al. [10] proposed applying neural networks to language modeling in 2000, and Bengio et al. [11] proposed the classical neural network language model in 2003. The work of Xu et al. and Bengio et al. laid the foundation for the development of pre-trained language models. Building on this, Wilbur et al. [12] proposed representing a document as a bag of words, whose core idea is to describe a document by its terms and their frequencies of occurrence. However, this representation cannot measure the similarity between words, and its sparsity easily causes a dimension explosion. To address these problems, Mikolov et al. [13] proposed the Word2Vec word-vector model in 2013. The model can be trained on data sets with hundreds of millions of examples, and the resulting word vectors measure the similarity between words well. However, Word2Vec assigns each word a single context-independent vector, so it cannot resolve polysemy.

To make word vectors context-dependent, Google Research [14] proposed the Bidirectional Encoder Representations from Transformers (BERT) model in 2018. BERT adopts the Transformer encoder structure, and its training is divided into two stages: pre-training and fine-tuning. Because the model also takes the context of the text into account, it can mine textual information at a deeper level, and it has greatly advanced the NLP field. Many scholars have since improved the BERT model and applied it to different fields. For example, Bobur et al. [15] combined BERT with a pierced-index model for anomaly detection in judicial documents and obtained better results when searching for outliers, and Zhou et al. [16] proposed a BERT-based transfer learning method that established a new network telecom crime monitoring and early warning platform and achieved good results. These BERT-based methods provide ideas for the research in this paper.
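The contrast between sparse bag-of-words representations and dense Word2Vec-style vectors can be illustrated with a small sketch. The four-dimensional vectors below are invented for illustration only (real Word2Vec embeddings typically have 100–300 dimensions learned from a large corpus); the point is that one-hot bag-of-words vectors for distinct words are always orthogonal, while learned dense vectors can express graded similarity:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical dense embeddings (values invented for illustration).
vec_king  = [0.90, 0.10, 0.80, 0.20]
vec_queen = [0.85, 0.15, 0.75, 0.30]
vec_apple = [0.10, 0.90, 0.05, 0.70]

# One-hot bag-of-words vectors: any two distinct words are orthogonal,
# so their similarity is always 0 regardless of meaning.
onehot_king  = [1, 0, 0]
onehot_queen = [0, 1, 0]

print(cosine(vec_king, vec_queen))        # high: related words
print(cosine(vec_king, vec_apple))        # lower: unrelated words
print(cosine(onehot_king, onehot_queen))  # 0.0: one-hot vectors carry no similarity
```

This is exactly the limitation the review attributes to the bag-of-words model: without learned dense vectors, word-to-word similarity cannot be computed.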
2.2. Literature review of the Big Five personality model
There are many personality tests based on psychological research [17], the most widely accepted of which is the Big Five personality model, also known as OCEAN [18]. It contains five personality traits: openness (OPN), conscientiousness (CON), extraversion (EXT), agreeableness (AGR) and neuroticism (NEU). Open people prefer abstract thinking and have a wide range of interests; conscientious people are usually efficient and organized; extraverted people display traits such as enthusiasm, sociability, decisiveness, activity, risk taking and optimism; agreeable people tend to trust others and believe that people are inherently good; neurotic people tend to have unrealistic thoughts and are more prone to negative emotions. The study of personality traits is of great significance for analyzing the psychological characteristics of a crowd. Many researchers have applied the Big Five personality model to analyze the psychological characteristics of people in different scenarios and have achieved good results [5, 19–21].
Mairesse et al. [19] used continuous modeling techniques to extract personality features from text in order to recognize personality traits automatically. However, the data set they used was small, and the proposed method did not account for possible over-fitting of the features, resulting in low recognition accuracy. To avoid using too small a data set, Sun et al. [20] proposed a group-level personality detection model based on AdaWalk, which traverses the entire text network while relying less on annotations. However, the text network constructed by this model is simulated rather than drawn from a real social network, so it lacks authenticity. Ren et al. [5] proposed a multi-label personality detection model based on neural networks to detect personality traits more accurately on real, small data sets; by combining semantic and emotional features, the model can accurately identify different personality traits even with a small amount of data. Kazameini et al. [21] proposed an automatic text-based personality detection model built on deep learning, which combines a support vector machine (SVM) with BERT to study personality traits in text without consuming a large amount of computing resources. In a recent paper, Kazemeini et al. [22] used a BiLSTM model with a max-pooling layer, which provides semantically rich sentence embeddings for mental statements at low computational overhead while better distinguishing personality traits. These scholars have made great contributions to the recognition of personality traits in text, and our research builds on their results.
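The max-pooling step in the BiLSTM approach of Kazemeini et al. [22] can be sketched in isolation: taking an element-wise maximum over the per-token hidden states yields a fixed-size sentence embedding regardless of sentence length. The three-dimensional token vectors below are invented stand-ins for BiLSTM outputs (real hidden states would be learned and much higher-dimensional); the BiLSTM itself is omitted:

```python
def max_pool(token_vectors):
    """Element-wise maximum over token vectors, producing one
    fixed-size vector for the whole sentence."""
    return [max(dims) for dims in zip(*token_vectors)]

# Hypothetical per-token hidden states for a three-token sentence
# (values invented for illustration).
tokens = [
    [0.2, -0.5, 0.1],
    [0.9,  0.3, -0.2],
    [-0.1, 0.7, 0.4],
]

sentence_embedding = max_pool(tokens)
print(sentence_embedding)  # [0.9, 0.7, 0.4]
```

Because each output dimension keeps only its strongest activation across the sentence, the resulting embedding is cheap to compute and length-independent, which is consistent with the low computational overhead the review attributes to this design.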