Sportsman’s Mental State Evaluation and Early Warning Method Using Decision Tree


To meet the need for intelligent assessment of university students' psychological state, this paper investigates an approach that identifies psychological problems from text information. The approach uses the text of internal university student forums as its database and introduces a convolutional neural network (CNN) model from deep learning, which contains a convolutional layer, a pooling layer, and a fully connected layer. After the convolution is completed, the convolution result is de-linearized by an activation function, and pooling is then performed, which improves the network's ability to fit nonlinearities. For data processing, behavioral features, attribute features, content features, and social relationship features are extracted from the text information with a decision tree and used as the input of the CNN. The LIWC psychological lexicon is used to make word frequency statistics more efficient during text content extraction. To evaluate the proposed method, simulations are performed on the open CLPsych 2017 ReachOut forum dataset, with the FastText method as a comparison. The results show that the CNN model achieves an accuracy of 0.71 on the full sample, higher than the 0.64 of the FastText model. In the early warning evaluation of mental states, the CNN also performs better than FastText.


Introduction
According to the latest research statistics from the World Health Organization (WHO), mental health disorders have become the fourth most common disease worldwide. Taking depression as an example, about 300 million people worldwide currently suffer from it [1][2][3][4][5]. With the development of China's economy and society, people have started to pay more attention to their mental health. At the university stage, the internal and external environment in which students live changes rapidly, so many students are unable to adapt in time and are prone to psychological problems. According to research statistics, the psychological problems of college students have obvious stage characteristics, and many students are unable to detect their own psychological changes in time, which leads to the deterioration of psychological problems and serious consequences [6][7][8][9][10][11][12]. To detect the psychological problems of college students and provide psychological help in time, this paper studies an intelligent psychological state evaluation method. Considering that psychological problems are difficult for students to detect themselves and that students are generally resistant to psychological counseling and investigation, textual information analysis is used to identify psychological problems. The Internet is an important platform for students' extracurricular life, and social networks generate a large amount of textual information every day that can reflect changes in students' psychological status. Based on the text resources generated by internal university student forums, this paper introduces artificial intelligence algorithms and investigates a psychological state evaluation and early warning model.

Related Work
This section discusses related work on text-based mental state evaluation and on convolutional neural networks.

Mental state evaluation based on textual information
Campus forums in higher education institutions are important places where students voice their views and express personal opinions, and automated monitoring of these forums is important for grasping students' current psychological status and predicting its future dynamics in time. Social-network-based assessment of college students' mental health requires collecting various characteristic indicators that reflect mental health, as shown in Figure 1 [13][14][15]. As Figure 1 shows, when mental states are assessed from text content, the focus is on four kinds of characteristics: behavioral characteristics, attribute characteristics, content characteristics, and social relationship characteristics.
These characteristics are defined as follows.
• Behavioral characteristics $W_0$: online behaviors that can portray the psychological characteristics of users. From the perspective of psychology, how often college students post, comment, and like on the forum, as well as how long they are active online, is influenced by their mental health.
• Attribute characteristics $W_1$: the characteristics left by college students on online forums that portray their basic personal information, such as age, gender, place of origin, major, and whether they are single.
• Content characteristics $W_2$: text messages left directly by college students on the forum, which can reflect their true inner thoughts.
• Social relationship characteristics $W_3$: in psychology, a social relationship is the interrelationship between students in contexts such as school and society that arises from behaviors such as studying and socializing. In social forums, students follow each other and connect with different users at different levels of intimacy. If a student is considered a node and the student's active and passive attention behaviors are considered connections, a social network can be mapped for each student, and this network is also important for assessing students' mental health status.

Based on the analysis in Figure 1, the task definition of intelligent mental state assessment is given here. Under the campus forum collection, the following sets are defined, as represented by Eq. 1, where D denotes the set of campus forums, P denotes the N different posts in the forum, H denotes the L different topics of the posts, and R denotes the coupling between posts.
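As a rough illustration of the social relationship characteristics, the sketch below treats each student as a node and each follow action as a directed edge, then derives simple degree-based features. The function names, the feature set, and the sample follow pairs are hypothetical and are not taken from the paper's implementation.

```python
from collections import defaultdict


def build_follow_graph(follow_pairs):
    """Build adjacency sets from (follower, followed) pairs."""
    following = defaultdict(set)  # student -> students they follow (active attention)
    followers = defaultdict(set)  # student -> students who follow them (passive attention)
    for follower, followed in follow_pairs:
        following[follower].add(followed)
        followers[followed].add(follower)
    return following, followers


def social_features(student, following, followers):
    """Return simple degree-based features for one student node."""
    out_set = following.get(student, set())
    in_set = followers.get(student, set())
    return {
        "out_degree": len(out_set),       # how many users the student follows
        "in_degree": len(in_set),         # how many users follow the student
        "mutual": len(out_set & in_set),  # two-way connections
    }


# Hypothetical follow relations, for illustration only.
pairs = [("a", "b"), ("b", "a"), ("c", "a"), ("a", "c"), ("d", "a")]
following, followers = build_follow_graph(pairs)
features_a = social_features("a", following, followers)
```

Features of this kind could sit alongside the behavioral, attribute, and content characteristics as one part of the input; which graph statistics the paper actually uses is not stated.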
At this point, the mathematical definition of intelligent psychological evaluation and early warning is as follows. For any element p in the set D, a mapping relation m and its corresponding set of features F are sought by using Eq. 2, where C is the result of classifying the text into a mental state. Eq. 2 indicates that each text message published by each user corresponds to a classification that characterizes the user's mental health status, so that mental health teachers in universities can be alerted to provide timely intervention. The mapping relationship m used in this paper is a convolutional neural network (CNN) from deep learning.

In [16], the study focused on a speech-based approach to monitoring depression recurrence in Chinese-language settings. The collected speech is divided into two parts, semantic features and acoustic features. Because foreign speech databases are more complete than domestic ones, that work draws on a foreign speech database and collects normal and depressed speech from domestic medical organizations. It also compares speech-processing algorithms from domestic and international studies, analyzes their advantages and drawbacks, and selects the best of them. Finally, a speech monitoring platform for depression recurrence is implemented.

In [17], Mel Frequency Cepstral Coefficients (MFCC), a commonly used feature, are applied to speech processing for depression monitoring. Low-frequency MFCCs can be used to recognize patient speech, but they are affected to some degree by noise. In practice, smart headsets are usually used to collect audio; noise in this process can be reduced effectively by high-band voice activity detection, a conservative speech segment selection strategy, and a personalized normalization algorithm.
For the detection of depression, [18] obtained clustering data with a Gaussian mixture (GM) model and maximum probability estimation. An i-vector classification method was used to combine speech quality and MFCC features, and the study also showed that speech quality features are consistent in detecting depression. In [19], the feasibility of a multilingual database for depression detection was established by comparing Turkish and German, and the feasibility of a multilingual fusion algorithm was explored, which provides a useful example for building a comprehensive database in China.
Most of the works above rely on simple, data-dependent features to detect depression. Compared with these works, the proposed deep learning CNN model performs better in sportsman's mental state evaluation and early warning.

Convolutional Neural Network
The basic structure of the CNN model used in the paper is shown in Figure 2. It contains an input layer, convolutional layers, pooling layers, and a fully connected layer. The convolutional layers perform the convolution operations and the pooling layers perform the pooling operations. In the input layer, the text content is first processed into a word sequence of length n with the help of LIWC and then converted into a sequence of word vectors of length n by word embedding (Word2Vec), using Eq. 3.
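The input-layer step can be sketched as a lookup from tokens to fixed-size vectors, with truncation or zero-padding to length n. This is a minimal sketch under stated assumptions: the random vectors merely stand in for trained Word2Vec embeddings, and `build_embedding_table`, `embed`, the vocabulary, and the dimensions are hypothetical names and values chosen for illustration.

```python
import random


def build_embedding_table(vocab, dim, seed=0):
    """Assign each vocabulary word a random vector (stand-in for Word2Vec)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed(tokens, table, dim, n):
    """Map tokens to vectors, truncating or zero-padding to exactly n rows."""
    vectors = []
    for token in tokens[:n]:
        # Unknown words map to a zero vector in this sketch.
        vectors.append(list(table.get(token, [0.0] * dim)))
    while len(vectors) < n:
        vectors.append([0.0] * dim)
    return vectors


vocab = ["i", "feel", "tired", "today"]
table = build_embedding_table(vocab, dim=4)
X = embed(["i", "feel", "really", "tired"], table, dim=4, n=6)
```

Here `X` has n = 6 rows of dimension 4; the out-of-vocabulary token and the two padding positions are all-zero rows.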

1) Convolutional Layer
In the convolution layer, the word vector sequence is first divided by using Eq. 4.
Convolution is the characteristic operation of a CNN. With convolutional kernel windows of different sizes, it extracts local semantic information at different locations of the text for feature detection and extraction. The divided vectors from Eq. 4 are processed one by one with the convolution operation in Eq. 5, where f is the convolution kernel function used in the convolution and its output is the feature value obtained after convolution.
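To make the division and convolution steps concrete, the sketch below splits a sequence of word vectors into overlapping windows and applies one kernel to each window. It assumes the common text-CNN formulation (a window of h consecutive word vectors, a kernel of the same shape, and a dot product plus bias); the exact forms of Eq. 4 and Eq. 5 may differ, and all names and values are illustrative.

```python
def windows(X, h):
    """Split n word vectors into n - h + 1 overlapping windows of h vectors."""
    return [X[i:i + h] for i in range(len(X) - h + 1)]


def conv_feature(window, kernel, bias=0.0):
    """Dot product between a window and a kernel of the same shape, plus bias."""
    return bias + sum(
        x * w
        for vec, kernel_vec in zip(window, kernel)
        for x, w in zip(vec, kernel_vec)
    )


def conv1d(X, kernel, bias=0.0):
    """Slide one kernel over the sequence and collect one feature per window."""
    h = len(kernel)
    return [conv_feature(win, kernel, bias) for win in windows(X, h)]


# Four 2-dimensional word vectors and one kernel covering h = 2 words.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
features = conv1d(X, kernel)  # one value per window position
```

Using several kernels with different window sizes h yields several such feature lists, one per kernel.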
After the convolution is completed, the activation function is used to de-linearize the convolution result, and the outputs are then concatenated. At this point, the feature matrix G output by the convolution layer is obtained as in Eq. 6.

2) Pooling Layer
The pooling layer downsamples the features obtained by convolution to reduce the feature dimensionality and keep the network from becoming too complex, which would reduce operational efficiency and cause overfitting. The pooling method used in this paper is maximum pooling, which is represented in Eq. 7.

3) Fully Connected Layer
The fully connected layer connects all the feature values obtained after the convolution and pooling operations and uses them as the final feature vector that characterizes the text information. The computation in the fully connected layer is performed by using Eq. 8.
Here the feature vector is the original feature information obtained after full concatenation, and y is the final classification result.
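The activation, maximum pooling, and fully connected steps can be sketched together as follows. This is an illustrative sketch only: ReLU as the activation function and a softmax output are assumptions, since the text does not state which functions Eq. 6 and Eq. 8 use, and the weights and feature maps are made-up values rather than trained parameters.

```python
import math


def relu(values):
    """De-linearize a feature map (assumed activation: ReLU)."""
    return [max(0.0, v) for v in values]


def max_pool(feature_map):
    """Maximum pooling over the whole feature map: keep the strongest response."""
    return max(feature_map)


def fully_connected(v, W, b):
    """Compute one score per class as W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def softmax(scores):
    """Turn scores into class probabilities (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical feature maps, one per convolution kernel.
feature_maps = [[-0.5, 2.0, 1.0], [0.3, -1.0, 0.1]]
pooled = [max_pool(relu(fm)) for fm in feature_maps]

# Made-up weights for four output classes (e.g. Crisis, Red, Amber, Green).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0, 0.0]
probs = softmax(fully_connected(pooled, W, b))
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Pooling reduces each feature map to a single value regardless of text length, so the fully connected layer always receives a vector with one entry per kernel.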

Proposed Approach
This section discusses the different phases of the proposed approach.

Data Pre-processing
Because the method targets internal forums for university students, the textual information of an existing open forum can be used to simulate the model while keeping the application scenario comparable. In this paper, the training set of the CLPsych 2017 ReachOut forum is selected. The dataset is described in Tables 1 and 2; Table 2 gives the structure of each data item. Each piece of data consists of 6 parts: time of posting, author, section, number of reads, number of likes, and content. The collected data are labeled into the following 4 categories, and Table 3 gives the amount of data in each category.
• Crisis: indicates a psychological problem that tends toward self-aggression.
• Red: indicates a psychological problem involving severe psychological distress.
• Amber: indicates a psychological problem that is likely to occur.
• Green: indicates a psychological problem that has a low probability of occurring.

Table 3. Amount of data for each sample category

Sample category    Training set    Test set
Crisis             40              42
Red                137             48
Amber              296             94
Green              715             216
Total              1188            400
Unlabeled data     65755           91806

The paper uses the LIWC dictionary for text data processing. Linguistic feature information is extracted with Eq. 9, which, for a post drawn from a sample of size |D|, gives the frequency of occurrence of category l (for example, Crisis) among the categories in Table 3. Based on the word frequency, the standard deviation of words is calculated with Eq. 11. The larger this indicator is, the greater the difference in such mental issues among the words in that category. The term frequency (TF) used in the standard deviation is calculated with Eq. 12.
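The word-frequency statistics can be illustrated with a small lexicon-based sketch. It assumes a plain definition of term frequency (the share of a post's tokens that belong to a lexicon category) and the population standard deviation of that value across posts; the exact forms of Eq. 9 to Eq. 12 may differ. The mini-lexicon and the posts are hypothetical and only stand in for LIWC categories and forum data.

```python
import statistics

# Hypothetical mini-lexicon standing in for LIWC-style word categories.
LEXICON = {
    "sad": {"sad", "hopeless", "alone"},
    "anx": {"worried", "afraid"},
}


def term_frequency(tokens, category_words):
    """Share of tokens in one post that belong to the given category."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return hits / len(tokens)


def category_tf_stats(posts, category_words):
    """Mean and population standard deviation of the category TF over posts."""
    tfs = [term_frequency(post, category_words) for post in posts]
    return statistics.mean(tfs), statistics.pstdev(tfs)


# Made-up tokenized posts, for illustration only.
posts = [
    ["i", "feel", "sad", "and", "alone"],
    ["great", "day", "today", "with", "friends"],
    ["so", "hopeless", "today", "really"],
]
mean_tf, std_tf = category_tf_stats(posts, LEXICON["sad"])
```

A larger `std_tf` means the category's words are distributed more unevenly across posts, which is the property the standard deviation indicator is meant to capture.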
The loss function used in the training is determined by Eq.14.
The number of training iterations must be chosen with care because the number of manually labeled samples used in the paper is small. With too many iterations, the CNN overfits; with too few, the accuracy of the model does not reach the requirement. Figure 3 shows the accuracy of the model on the training and validation sets for different numbers of iterations. When the number of iterations is small, the accuracy on the validation set is consistent with that on the training set, but the model accuracy is low. When the number of iterations is large, the accuracy on the training set increases, but the gap between the validation accuracy and the training accuracy widens, which indicates overfitting. Therefore, to balance model accuracy against overfitting, about 600 iterations are used.

Fig. 3. Relationship between the number of model iterations and model accuracy.
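One simple way to express this trade-off in code is to keep only the checkpoints whose train/validation gap stays below a tolerance and then pick the one with the best validation accuracy. This is a sketch of that heuristic, not the selection procedure used in the paper; `pick_iterations`, the `max_gap` tolerance, and the accuracy values are all hypothetical and are not read from Figure 3.

```python
def pick_iterations(history, max_gap=0.05):
    """Choose an iteration count from (iterations, train_acc, val_acc) records.

    Records whose train/validation gap exceeds max_gap are treated as
    overfitted and skipped. Among the rest, the highest validation accuracy
    wins, with ties broken in favor of fewer iterations.
    """
    candidates = [
        (iterations, train_acc, val_acc)
        for iterations, train_acc, val_acc in history
        if train_acc - val_acc <= max_gap
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda record: (record[2], -record[0]))
    return best[0]


# Made-up accuracy curve, for illustration only.
history = [
    (100, 0.50, 0.49),
    (200, 0.62, 0.60),
    (300, 0.72, 0.68),
    (400, 0.85, 0.66),  # training accuracy keeps rising, validation drops
]
chosen = pick_iterations(history)
```

In this toy curve the last checkpoint is rejected because its gap is too large, so the earlier checkpoint with the best validation accuracy is selected.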
Table 4 lists the final parameters of the proposed neural network model.

Simulation Results
To better evaluate the effectiveness of the model in identifying the psychological states of college students, the FastText model was introduced for comparison experiments. Before the experiments, the original 4 categories of Crisis, Red, Amber, and Green were reclassified to distinguish the different mental states. Table 5 shows the five reclassified categories. Two metrics that are commonly used in machine learning classification problems, F1 and Acc (classification accuracy), are used to evaluate the models. The test results of the two models are shown in Tables 6 and 7. Note that Non-Green F1 is the average F1 over the non-Green categories, which reflects the model's ability to identify all psychologically unhealthy students in the validation set, while Flagged F1 measures the separation of Green from non-Green samples, which reflects the model's ability to distinguish psychologically healthy samples from psychologically unhealthy ones.

Compared with the FastText model, the CNN model improved Non-Green F1 by 0.05 and Flagged F1 by 0.06, which indicates that the CNN model is better at both sample differentiation and the recognition of unhealthy samples. The CNN model also improved Urgent F1 by 0.11 over the FastText model, which indicates that it is more capable of distinguishing between general and urgent psychological problems. This can help students get help quickly for their psychological problems. In terms of per-category accuracy, the CNN model outperforms the FastText model; on the full sample, the CNN model achieves an accuracy of 0.71, which is 0.07 higher than the FastText model's 0.64. In summary, the CNN model performs better in the evaluation and early warning of psychological states.
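The evaluation metrics can be sketched as follows. The groupings are assumptions based on the usual convention for this dataset (Flagged treats Crisis, Red, and Amber as positive against Green; Urgent treats Crisis and Red as positive against Amber and Green); the paper's Table 5 may define them differently. The labels and predictions are made-up values, not results from the paper.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for a binary split where labels in `positive` count as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t in positive and p in positive)
    fp = sum(1 for t, p in pairs if t not in positive and p in positive)
    fn = sum(1 for t, p in pairs if t in positive and p not in positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def non_green_f1(y_true, y_pred):
    """Average of the per-class F1 scores over the three non-Green classes."""
    classes = ["crisis", "red", "amber"]
    return sum(f1_score(y_true, y_pred, {c}) for c in classes) / len(classes)


FLAGGED = {"crisis", "red", "amber"}  # assumed grouping: any non-Green label
URGENT = {"crisis", "red"}            # assumed grouping: most severe labels

# Made-up labels and predictions, for illustration only.
y_true = ["green", "green", "amber", "red", "crisis", "amber"]
y_pred = ["green", "amber", "amber", "red", "red", "green"]

flagged = f1_score(y_true, y_pred, FLAGGED)
urgent = f1_score(y_true, y_pred, URGENT)
acc = accuracy(y_true, y_pred)
macro = non_green_f1(y_true, y_pred)
```

In this toy example a Crisis post predicted as Red still counts as correct for the Urgent split, which is why the grouped scores can be higher than the four-class accuracy.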

Conclusions
To achieve intelligent evaluation and timely warning of psychological problems among students in higher education institutions, this paper extracts features related to psychological problems from the textual information that students generate in daily life, from the perspective of monitoring public opinion in campus forums. Compared with traditional psychological questionnaires and psychological counseling, this approach can detect students' psychological problems in study and life in a more unobtrusive, effective, and timely manner. The simulation results show that the proposed CNN-based intelligent psychological state recognition method has better accuracy and differentiation ability in recognizing various psychological problems, and can be applied to existing psychological work in universities.

Data Availability
The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The author declares that there are no conflicts of interest.