Predicting the next word using the Markov chain model according to profiling personality

Understanding human data has long been a focus of philosophers and scientists. Social media platforms encourage people to be creative and to share personal information. By analyzing these data, we can identify people's personalities as well as information relevant to specific profiles. The aim of this paper is to propose an approach that predicts the next word while a sentence is being written, based on the user's personality. To achieve this goal, our approach rests on two points: (1) an approximate extraction of the big five personality model for a specific user from their tweets; (2) prediction of the next word, while the user is writing a new tweet, based on their personality using the Markov chain model. On the basis of these two notions, our approach makes writing posts easier by predicting and suggesting next words that match the user's personality. Our experiments demonstrate the ease of predicting the next word during the writing of a new post according to individual potential.


In everyday life, we try to anticipate the preferences and actions of the people we interact with in order to establish effective cooperation. Personality recognition is applied in many fields, and several methods exist to extract it. Most of these studies have emerged from texts, focusing on the analysis and examination of textual samples. Several studies have found a strong correlation between personality traits and linguistic characteristics. Many models are used to profile user personalities [2].
Numerous strategies have been proposed to automatically determine a user's personality based on their content [3]. However, the accuracy of these strategies is a critical factor in their performance. Therefore, we chose the big five personality model as the foundation for extracting user personality in this paper, as it is the most widely used and appropriate model for predicting personality traits. The big five personality model has been extensively researched and validated in numerous studies [4], and it has been found to be more reliable and valid than other personality models such as the Myers-Briggs Type Indicator (MBTI) and the DISC model. Recent studies further support the superiority of the big five personality model: a meta-analysis by Ng et al. [5] found it to be more stable and better at predicting job performance than the MBTI, and a study by Spurk et al. [6] found it to be more strongly associated with career success than the MBTI and other personality models. In contrast, the DISC model has been criticized for lacking empirical support and for having poor reliability [7]. Overall, the big five personality model is considered the most robust and scientifically valid approach to personality assessment.
Meanwhile, people text each other frequently, and every time a user composes a text, a suggestion pops up that tries to predict the next word they want to enter. This prediction process is one of the applications addressed by NLP [8]. When a user is entering text on a mobile device, suggesting the next word can reduce typing time and help avoid errors. The problem we observed is that the suggested words are sometimes incompatible with the user's needs: smartphones suggest words based on historical user data or the selected language dictionary [9], while search engines such as Google, Yahoo, and Bing suggest next words based on geographic data or the user's profile [10]. In our approach, we aim to inject the user's personality as a parameter for predicting the next word using Markov chains.
Knowing one's personality can help a person gain a better understanding of themselves and the people around them. It can assist in identifying strengths and shortcomings, comprehending emotions and actions, and regulating conduct in various settings [11]. The availability of a significant amount of high-dimensional data has cleared the path for marketing initiatives to become more effective by targeting specific consumers. Personality-based communications are extremely effective at increasing the acceptance and attractiveness of products and services [12].
In our approach, we focus on profiling personality characteristics from users' tweets. Our program then predicts and suggests words based on the user's personality when they write a new tweet. This facilitates the writing of tweets for each user based on their personality and, in particular, provides suggestions and guidance that discourage posting aggressive texts.

To summarize, the document is structured as follows: in the first section, we provide an overview of relevant publications as well as the main problem that our technique addresses. The second section introduces the main idea of the solution. In the third section, we present our implementation and discussion. Finally, we provide a general conclusion and suggestions for additional research.

Related works
Recent research has shown that personality computing can be used to predict user behavior on social media platforms [13]. In a primary research study by Liu et al., the authors analyzed social media data to predict users' big five personality traits using a deep learning model [14]. The study found that their model achieved high accuracy in predicting users' personality traits, demonstrating the potential of machine learning techniques in personality computing. Other studies have also explored the use of machine learning to predict personality traits. For example, Schuller et al. used machine learning techniques to predict the big five personality traits from speech data [15]. Their study found that vocal features were strong predictors of personality and that machine learning techniques could effectively predict personality traits. Overall, these studies demonstrate the potential of machine learning techniques in predicting personality traits, which could have important implications for targeted advertising and content recommendation on social media platforms.
Much research has been conducted on predicting personality from social media [16]. For example, Pednekar and Duny applied a data mining method to social media to identify the essence of personality [17]. Kosinski et al. [18] identified the personality patterns of Facebook users. Various methods using different classifiers and feature spaces have been proposed for classifying human personality. Until recently, the majority of models relied on shallow learning techniques such as the support vector machine (SVM) [19], the naive Bayes classifier [20], k-nearest neighbors (kNN) [21], and logistic regression (LR) [22].
Currently, several models are used to profile a user's personality, such as the big five personality model or OCEAN model [23], the Myers-Briggs Type Indicator (MBTI) [24], and Dominance, Influence, Steadiness, Conscientiousness (DISC) [25].
Generally, most previous next-word suggesters/predictors concentrate on two models: the N-gram model or long short-term memory (LSTM). The statistical language model has a long history, with Katz introducing a nonlinear recursive algorithm to solve n-gram language prediction in 1987 [26] and Stolcke proposing a language model using a method called Bayesian learning [27]. Bengio et al. went on to develop a distributed representation for words that improves the n-gram model [28]. Large feed-forward networks were built for language modeling in [29]. More recently, Sundermeyer et al. [30] developed an upgraded LSTM, while Mikolov et al. [31] designed an RNN model.
Traditional models are easy to use and, in some situations, perform better than deep learning models [32]. We chose the n-gram model based on the Markov chain for our project for the following reasons: the Markov chain is very insightful, as it can identify the areas of any process where we are deficient, allowing us to make changes in order to improve. The memoryless quality of a stochastic process is referred to as the Markov property. In addition, its computation requirements remain very low or modest for a system of any size.
In general, our approach focuses on profiling the big five personality model from Twitter posts. Using the Markov chain model, we can then predict the next word when a new tweet is being written, targeted at the user's personality.

Problem formulation
To completely understand their users' activity, several research initiatives are working on gathering metadata from their products and platforms [33]. Understanding user behavior, in turn, pushes companies to improve the quality of their various products and services [34] and to present their products to fit the user's desires [35]. When searching in search engines or writing on a specific platform, however, we notice that most solutions suggest words or sentences that follow what we have already written as a proposition of what we want to write next. These recommendations are based on past searches done by other users and even on their regions [36]. Sometimes the user will not accept suggestions that contradict his principles, culture, or general personality; in this situation, companies will lose their users. Therefore, our objective is to analyze the user's personality through their historical data and to suggest targeted data appropriate to their personality and behavior. In this paper, we focus on extracting the big five personality model from a user's previous tweets and then suggesting the next word for their sentences while they are writing a tweet, using the Markov chain model.

Research method
By the 1990s, it was commonly understood that both situational and personality factors influence short-term behavior [37]. The research and refinement of the OCEAN, or big five personality model, is still in progress, and its influence endures today [38]. The big five personality traits, or "OCEAN traits," are presented as follows:
• Openness: a willingness to try new things and think outside the box, sometimes referred to as intellect or imagination. Insight, inventiveness, and curiosity are typical qualities.
• Conscientiousness: the ability to control the desire for immediate gratification by being careful, vigilant, and self-disciplined. Ambition, discipline, consistency, and dependability are characteristic traits.
• Extroversion: a state in which an individual draws energy from others and seeks social relationships or contact, as opposed to introversion. Outgoing, active, and self-assured are characteristic traits.
• Agreeableness: the way a person interacts with others, as measured by their level of compassion and collaboration. Wit, sociability, and loyalty are among the traits.
• Neuroticism: a tendency toward negative emotionality, emotional instability, and self-destructive thoughts. Pessimism, anxiety, insecurity, and fear are characteristic traits.
Our approach has two main points, as shown in Fig. 1. First point (1: extract big five personality model): we focus on a real database from one of the social networks and extract all the posts filtered by user. Through data profiling [39,40], we extract an approximate personality score for each user using an existing library and store the results in our repository. Second point (2: calculate distance and predict next word): we predict the next word for a new post based on the user's personality, and we can also adapt our algorithm to suggest words from another personality.
In the first point of our approach, we were inspired by the paper by Majumder et al. [41]. Its main idea is to use deep learning algorithms to detect personality from text based on document modeling [42]. The approach processes the input data hierarchically, analyzing and evaluating every single word, then combining words to make n-grams, n-grams to make sentences, and sentences to make a complete document (Fig. 2).
After adapting the program to extract the five characteristics needed to detect personality, we performed a separate analysis for each user to detect their personality and classify them among other users. In our case, we take all the tweets made by a single user and calculate the average value as a representation of their permanent personality. This analysis could also be done over an interval of time. When a user then wants to write a new post, the system already knows their personality, so it can suggest the next words for the tweet depending on their personality and behavior.
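The averaging step described above can be sketched as follows; the per-tweet trait scores and trait keys are illustrative assumptions, not the output of the actual library used in the paper:

```python
# Sketch: averaging per-tweet big five scores into one profile per user.
# The per-tweet scores below are made-up example values.

tweet_scores = [
    {"O": 0.6, "C": 0.4, "E": 0.8, "A": 0.5, "N": 0.2},
    {"O": 0.8, "C": 0.6, "E": 0.6, "A": 0.7, "N": 0.4},
]

# Mean of each trait across all of the user's tweets
profile = {t: sum(s[t] for s in tweet_scores) / len(tweet_scores)
           for t in "OCEAN"}
print(round(profile["O"], 2))  # → 0.7
```

Restricting `tweet_scores` to a given time window would yield the interval-based variant mentioned above.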
Jack Dorsey, the founder of Twitter, posted the first tweet on March 21, 2006. The billionth tweet was not reached until the end of May 2009, after a 3-year wait; nowadays, one billion tweets are sent in less than 2 days. We are therefore speaking of data that is extremely hard to process, and we propose a method to classify the results of the personalities we have already profiled. The main framework of personality psychology is the big five personality model, in which personality characteristics are conceptualized as five independent continuous dimensions. If we divide each dimension at the median to produce personality types, we get 32 different types, in which individuals are above or below the median in neuroticism, extroversion, openness, agreeableness, and conscientiousness. If these five dimensions were completely independent of each other, individuals would be distributed equally among the 32 types.
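The median split into 2^5 = 32 types can be sketched as below; the trait keys, median values, and user scores are illustrative assumptions:

```python
# Sketch: assigning a user to one of the 2^5 = 32 personality types
# by splitting each big five dimension at the median.
# The medians and user scores below are made-up example values.

MEDIANS = {"O": 0.5, "C": 0.5, "E": 0.5, "A": 0.5, "N": 0.5}

def personality_type(scores: dict) -> str:
    """Return a 5-bit key: '1' if above the median on that trait,
    '0' otherwise (order: O, C, E, A, N)."""
    return "".join("1" if scores[t] > MEDIANS[t] else "0" for t in "OCEAN")

user = {"O": 0.71, "C": 0.42, "E": 0.55, "A": 0.63, "N": 0.38}
print(personality_type(user))  # → 10110
```

Each distinct 5-bit key identifies one of the 32 clusters used later in the implementation.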
In the second point of our approach, we have to predict the next words when the user writes a new tweet. In fact, the system already knows the average personality of each user as well as their classification. Therefore, the system will suggest the next word in the cluster to which the user belongs. In this part, we used the Markov chain algorithm.
A Markov chain is a type of stochastic process that is distinct from others in that it must be "memoryless" [43]: future actions do not depend on the steps that led to the current situation. This is called the Markov property. Markov chain theory is important because so many discrete processes (X_n)_{n≥0} satisfy the following Markov property:

P(X_n = i_n | X_0 = i_0, …, X_{n−1} = i_{n−1}) = P(X_n = i_n | X_{n−1} = i_{n−1})   (1)

Equation 1 states that the probability of the next word in a sentence (i_n), given all the preceding words in the sentence (i_0 through i_{n−1}), equals its probability given only the immediately preceding word (i_{n−1}). To clarify the variables used in the equation, X_n represents the nth word in the sentence, and i_n represents the specific word we are interested in predicting; similarly, X_{n−1} represents the previous word in the sentence, and i_{n−1} represents the specific preceding word. By using this formula in our analysis, we can predict the most likely next word in a sentence, taking into account the context of the preceding word. This indicates that all the knowledge needed to forecast the future is included in the current state of the process and does not depend on previous states.
The time-homogeneous Markov chain [44] is a common example in probability theory in which the likelihood of state transitions is not affected by time. To visualize this process, a labeled directed graph can be used in which the labels of any vertex's outgoing edges add up to 1.
For example, a Markov chain (homogeneous in time) built on the two states A and B is shown in Fig. 3. To move from A to B after 2 steps, the process must either stay on A in the first step and then move to B in the second step, or move to B first and then stay on B.
To clarify the description of Fig. 3 and its relation to the formula: the figure illustrates a Markov chain with two states, A and B, where the directed links represent the transition probabilities between the states. The probability of moving from A to B after 2 steps is calculated by multiplying the probability of staying on A in the first step (0.3) by the probability of moving to B in the second step (0.7), and adding the probability of moving to B first (0.7) multiplied by the probability of staying on B (0.2). This gives a total probability of 0.3 × 0.7 + 0.7 × 0.2 = 0.35. Similarly, the probability of the process being on A after two steps is obtained by multiplying the probability of staying on A in the first step (0.3) by the probability of staying on A in the second step (0.3), and adding the probability of moving from A to B in the first step (0.7) multiplied by the probability of moving from B to A in the second step (0.8), which results in 0.3 × 0.3 + 0.7 × 0.8 = 0.65. The complement of this probability, 1 − 0.65, gives the likelihood that the process will be on B after two steps, which is 0.35.
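These two-step probabilities can be checked by squaring the transition matrix; the following is a minimal sketch using the Fig. 3 values (A→A = 0.3, A→B = 0.7, B→A = 0.8, B→B = 0.2):

```python
# Sketch: two-step probabilities of the Fig. 3 chain via matrix squaring.

P = [[0.3, 0.7],   # row 0: transitions out of A (to A, to B)
     [0.8, 0.2]]   # row 1: transitions out of B (to A, to B)

def mat_mul(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P2 = mat_mul(P, P)  # two-step transition probabilities
print(round(P2[0][0], 2))  # P(A after 2 steps | start in A) → 0.65
print(round(P2[0][1], 2))  # P(B after 2 steps | start in A) → 0.35
```

The entries of P² reproduce exactly the 0.65 and 0.35 computed by hand above.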
It is important to note that the example given in Fig. 3 is just a cyclic example with two steps, and Markov chains can have many more steps and even more complex structures, without necessarily creating cycles. In fact, Markov chains can have various shapes and topologies, including linear chains, branched chains, and more intricate networks. The calculation of probabilities and likelihoods for these more complex Markov chains can be done in a similar way as for the two-state cyclic example. However, the number of states and the transition probabilities between them will have to be taken into account to accurately compute the likelihood of a particular state or sequence of states.
Note again that formula 1 expresses the fact that, for a given history, the probability distribution of the following state is influenced only by the current state, not by previous states.

Fig. 3 Markov chain graph
Now, we will focus on predicting words. Assume we wish to create a system that, given an incomplete sentence, attempts to guess the next word in the phrase. To deal with word prediction cases like this, we model the task as a Markov chain problem: each word is treated as a state (i_t), and the next word is predicted based on the previous state (i_{t−1}). This situation is very well suited to the Markov chain model. To emphasize this point, all the unique words in our database can form different states, and the probability distribution consists of the probabilities of transitions from one word to another.
To fully understand the application of Markov chains in our approach, consider the following three sentences:
• I like Engineering.
• I like Science.
• I love Mathematics.
All of the unique words in the preceding sentences, "I," "like," "love," "Engineering," "Science," and "Mathematics," can form different states. In other words, the probability distributions describe the chances of a transition from one state to another, in this example, from one word to the next. In this scenario, it is clear from the example that the first word is always "I," so the first word of the sentence has a 100% chance of being "I." In the second state, we must choose between the two words "like" and "love." The probability distribution now represents the likelihood that the next word is "like" or "love," given that the previous word is "I." The word "like" appears after "I" in two of the three phrases, while the word "love" appears only once. Therefore, there is approximately a 67% or 2/3 probability of obtaining "like" after "I," and a 33% or 1/3 probability of "love." Similarly, the probabilities of "Science" and "Engineering" are fifty-fifty, and in our example "Mathematics" always follows "love." Figure 4 presents graphically the transition probabilities between words.
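The transition probabilities of Fig. 4 can be estimated directly from the three example sentences; the following is a minimal sketch of that counting step:

```python
# Sketch: estimating word-transition probabilities from the
# three example sentences, as illustrated in Fig. 4.
from collections import Counter, defaultdict

sentences = ["I like Engineering", "I like Science", "I love Mathematics"]

# Count how often each word follows each other word
counts = defaultdict(Counter)
for s in sentences:
    words = s.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

# Normalize the counts into transition probabilities
probs = {w: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
         for w, ctr in counts.items()}

print(round(probs["I"]["like"], 2))  # → 0.67
print(probs["like"]["Science"])      # → 0.5
print(probs["love"]["Mathematics"])  # → 1.0
```

This reproduces the 2/3, 1/3, and fifty-fifty probabilities derived by hand above.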

Fig. 4 State transition diagram for our example
To generalize, we propose the following algorithm 1 as the basic procedure for our approach.
The concept of our algorithm is to start by defining two variables:
• first_possible_words contains the first word of each sentence of the tweets, with its transition properties.
• transitions contains all the possible transitions between states (the words that are related to each other).
Our algorithm starts by decomposing every tweet from the previous selection into words. Through analysis of these words, we save all the unique first words in the first variable, together with their starting probabilities. Then, if a word is located at the end of a sentence, the program stores the word and proposes the next word as "END." Otherwise, we store all pairs of related words together with their transition probability.
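The procedure described above can be sketched as follows; this is a simplified reading of Algorithm 1, not the authors' exact implementation, and it stores raw counts (from which probabilities are obtained by normalizing) rather than probabilities directly:

```python
# Sketch of Algorithm 1: building first_possible_words and transitions
# (as counts) from a list of tweets, with "END" marking sentence ends.
from collections import Counter, defaultdict

def build_model(tweets):
    first_possible_words = Counter()
    transitions = defaultdict(Counter)
    for tweet in tweets:
        words = tweet.split()
        if not words:
            continue
        first_possible_words[words[0]] += 1      # candidate starting word
        for prev, nxt in zip(words, words[1:]):
            transitions[prev][nxt] += 1          # word-to-word transition
        transitions[words[-1]]["END"] += 1       # sentence-final marker
    return first_possible_words, transitions

firsts, trans = build_model(["I like Engineering", "I like Science",
                             "I love Mathematics"])
print(firsts["I"])                  # → 3
print(trans["I"]["like"])           # → 2
print(trans["Engineering"]["END"])  # → 1
```

Dividing each counter by its total converts these counts into the starting and transition probabilities used for prediction.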

Implementation and discussion
We started the realization by implementing the first point of our approach. We downloaded a database from the Kaggle platform [45], which was extracted through the Twitter API. It contains 1,600,000 tweets from 659,775 different users. We were inspired by a library called "big five personality model" from the official package manager for Node.js programs (npm). This library was created by Peter Hughes and uses big five personality lexica from the World Well-Being Project, released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license (CC BY-NC-SA 3.0). Through this library, we could analyze and extract approximate five-factor model features from each tweet.
To optimize our analysis, we evaluated the five personality traits on 100, 1000, and 10,000 tweets from the dataset. We obtained similar results from one input size to another; the error between runs is almost nonexistent (the standard error does not exceed 2%). Figure 5 shows that the mean average is close from one stage to another. Also, through analyzing the dataset, we profiled some metadata concerning the tweets of the population, as shown in Table 1. We concluded that we could generalize the mean average of the big five personality traits to the whole dataset. Next, we extracted the approximate personality of a specific user. Using our data source as a search engine, we selected a user with multiple tweets who was also active on Twitter (lost_dog). Between May 1, 2009, and June 25, 2009, this user posted 549 tweets. After profiling the personality, we obtained the results shown in Fig. 6.
In this study, we applied a clustering approach to decompose the database into 32 clusters based on the metadata extracted from the users' tweets. Each personality trait was used to split the database into two groups: one with a higher-than-average trait value and one with a lower value, resulting in 2^5 = 32 clusters for the five personality traits. To demonstrate the effectiveness of this approach, we applied it to the most frequent user in our dataset, lost_dog, who had 9917 tweets. The results of this analysis are presented in Table 2, where each vertical column represents the information for one of the big five personality traits. The row "top" describes whether the user tends toward a positive or negative value for each trait, the row "freq" indicates how many times the user exhibited a positive or negative value for each trait, and the row "avg" gives the average value for each trait. Overall, this clustering approach provides a useful method for analyzing and understanding the characteristics and behaviors of users on social media. We then establish a relationship between the user's personality, the personalities of others, and all the terms in the database. Since this dataset is only a small part of the actual database, we suggest calculating the distance between the user's average personality and the tweets included in his cluster, or even in the clusters closest to his personality. For example, using the Euclidean distance, we predict words only from the 1000 tweets at the closest distance.
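The Euclidean distance step can be sketched as below; the profile and per-tweet vectors are illustrative assumptions, and only the single closest tweet is kept for brevity (the paper keeps the 1000 closest):

```python
# Sketch: ranking tweets by Euclidean distance between the user's
# average big five vector and each tweet's big five vector.
# The vectors below are made-up example values (order: O, C, E, A, N).
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length trait vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

user_profile = [0.6, 0.4, 0.7, 0.5, 0.3]
tweets = {
    "tweet_1": [0.6, 0.5, 0.6, 0.5, 0.3],
    "tweet_2": [0.1, 0.9, 0.2, 0.8, 0.9],
}

closest = min(tweets, key=lambda t: euclidean(user_profile, tweets[t]))
print(closest)  # → tweet_1
```

Sorting all tweets by this distance and truncating to the first 1000 yields the restricted corpus from which the Markov chain is then built.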
In the second part of our approach, when the user composes a tweet, the system must predict the next word based on his personality through the Markov chain algorithm. The system is always on, so each time the user writes a word, the system analyzes the data (words and sentences) associated with his personality and suggests the most relevant words to continue the sentence, thereby respecting his personality. These suggestions are listed in order from most appropriate to least appropriate. In our system, we configure the program to suggest only the first four words, as shown in Fig. 7.
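Selecting the four best suggestions amounts to taking the most probable transitions out of the current word; a minimal sketch, with an illustrative (made-up) transitions table:

```python
# Sketch: suggesting the four most probable next words from a
# transitions table such as the one built by Algorithm 1.
from collections import Counter

# Made-up transition counts out of the word "I"
transitions = {
    "I": Counter({"like": 5, "love": 4, "miss": 3, "hate": 2, "see": 1}),
}

def suggest(word, k=4):
    """Return the k most likely next words, most probable first."""
    return [w for w, _ in transitions[word].most_common(k)]

print(suggest("I"))  # → ['like', 'love', 'miss', 'hate']
```

In the full system, the `transitions` table comes from the personality-filtered tweets, so the same prefix yields different suggestions for users in different clusters.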
Through our approach, we can suggest words while a sentence is being written according to a personality, either the user's own or another assigned personality. We applied our program in two experiments: in the first, without any modification, we take tweets randomly from the database and apply the second part of the approach directly; in the second, we follow the full procedure of our approach and apply both parts. The results obtained after training our algorithm are illustrated in Fig. 7. Each time the user enters a word, the program re-ranks the suggested words according to their probabilities.
Application 1: Applying the second part without profiling personality. Based on our observations, applications (1) and (2) have the same goal, which is to predict the next word when writing a new post. The difference is that application (1) does not filter the database but relies on historical tweets, while application (2) uses our approach (predicting the next word based on the user's personality). In fact, we found a large difference between the outputs of the two experiments. We focused on writing two sentences with the meanings "I just hate saying goodbye" and "I really miss you." Depending on the big five personality traits of the "lost_dog" profile, we calculated the distance between each sentence and his personality. We observe that the first result (without profiling) is at a distance of 0.54, while the second (with profiling) is only at a distance of 0.24, meaning that the second sentence is more relevant than the first. Table 3 summarizes the evaluation metrics for the two applications used to predict the next word in a sentence. These metrics were obtained from an experiment conducted to compare the effectiveness of the two applications: application (1) uses historical tweets to predict the next word, while application (2) incorporates personality profiling into the prediction. The evaluation metrics were calculated on a dataset of tweets collected and preprocessed for this experiment; this dataset was representative of the general population of Twitter users and was randomly sampled from a period of three months. The evaluation metrics used were precision, recall, F1 score, accuracy, and mean distance, which are commonly used to assess the performance of natural language processing applications.
Therefore, our approach successfully outperformed the previous forecasting methods.


Conclusion and future works
In this paper, we profiled user personalities based on the big five personality model in order to predict the next word when writing a new tweet using the Markov chain model. Through our approach, we found that there is a large difference between predicting the next word with and without profiling the user's personality. Moreover, from the comparison, it can be said that the proposed technique succeeded in accurately predicting personality and in suggesting target words for each user.
In the future, we plan to apply this approach to speech recognition and to develop a dedicated model to profile the user's personality more precisely. We also aim to improve our system using other prediction methods, such as the hidden Markov model.

Data availability Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Declarations
Conflict of interest All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethical approval Not applicable.