Accurate Recommendation Approach of Psychological Consultation Information Based on User Portrait and TAG_SVD_CF

A hybrid recommendation algorithm for psychological counseling information is developed, based on user profiles and item tag attributes combined with singular value decomposition (SVD). To address the data sparsity problem of the recommendation algorithm, SVD is applied to the collaborative filtering algorithm to optimize the user-item rating matrix. The recommendation algorithm includes two parts: generating the medical user profile for accurate recommendation of medical information, and realizing the storage, query and update of the user profile, where the index system of the medical portrait is established from the demographic attribute, the interest label dimension and the business social dimension. The performance of the developed algorithm is investigated by comparison with the traditional cosine similarity and Pearson similarity. The results show that the proposed similarity has a lower mean absolute error and significantly improves the accuracy of the system.


Introduction
With the explosive development of the Internet industry, more and more ways to obtain information are available. People are gradually changing from active access to information to passive access, and the amount of information is growing geometrically. Medical security has also been significantly improved with high-tech support and complete hospital facilities. However, in the eyes of many patients, even if the hospital services are perfect, the hospital is far from being a user-oriented place [1]. Although a large number of scientific and technological achievements have been rapidly put into medical and health applications, they have never reversed the bad user experience that hospitals give patients. Unlike traditional search engines, recommendation systems can actively help people filter information and provide personalized services by using data mining, artificial intelligence and other technologies. With the continuing expansion of medical health data, recommendation algorithms have drawn attention from many researchers due to their importance and wide application in efficiently handling the massive medical data collected.
SVD is one of the most classic and commonly used recommendation algorithms in industry and academia, and has been investigated by many researchers from theoretical and experimental perspectives. SVD is a commonly used matrix decomposition algorithm for reducing data dimension. Li Chunchun and others pointed out that the SVD algorithm used in collaborative filtering recommendation shows excellent prediction accuracy and stability, and it quickly became one of the most popular recommendation algorithms [2] [3]. Wang Jianfang and others put forward the svd2 algorithm, which adds a bias to users and items respectively. To reduce the number of parameters by regarding user features as a function of item features, the nsvd and nsvd2 algorithms were further proposed [4]. Wang Quanmin proposed an SVD++ algorithm based on implicit feedback information, which treats the set of items a user has scored as implicit information in addition to the explicit scoring information.
TimeSVD++ further considers that user and item characteristics change with time [5]. Domestic researchers have also made remarkable achievements in the research and application of user portraits: Fei Peng, Lin Hongfei and others proposed a method to construct user portraits within a multi-perspective fusion framework [6]. Wang Qiangbing et al. proposed a user portrait model that integrates content and user gesture behavior to improve the efficiency of user portrait construction [7]. In the medical field, Momeqi and Xia Zhiping analyzed the characteristics of medical data in detail and realized a hierarchical CF recommendation based on subject words [8]. This system, named Demo Meb PRS, is an early system for recommending medical information. Yu Baofu and others of Zhejiang University also proposed personalized medical information recommendation based on hobbies [9], but did not propose effective doctor recommendation and case recommendation for patient data in a specific hospital environment.
Generally speaking, research on recommendation algorithms in the field of medical health has made some breakthroughs, and recommendation technology that matches patients' portraits to their own conditions has also been developed. Because of the complexity of the psychological consultation environment, the problems of sparse data, cold start and user interest migration must be solved and optimized in recommendation algorithms for psychological consultation. The traditional collaborative filtering recommendation algorithm considers only the user's item rating information when generating recommendations, which makes it vulnerable to the many missing scores. To address the data sparsity problem, this paper proposes a hybrid recommendation algorithm for psychological counseling information based on user profiles and item tag attributes combined with SVD. The performance improvement of the proposed algorithm is examined through comparative studies with traditional algorithms.

User Portrait Label Model
The user medical portrait model proposed in this paper has three main features: the basic characteristics of users (demographic attributes), the medical domain characteristics of users, and the business and social dimensions of users [10].
In this precise recommendation system for health care, the social relationship and similarity between users, like the user's interest tendency, affect not only the choice of customer goals, the accuracy of the user's medical portrait and the rationality of doctor and hospital recommendations, but also the final effect of the personalized recommendation system. The user's social relationship (similarity) should therefore also be fully taken into account [11]. At the same time, it is necessary to fully mine the users' Internet consultation information and the registration information they provide, and to describe each user (including patients and doctors) in outline and in detail with the help of the disease knowledge base, so as to represent the user portrait information, as shown in Figure 1. Here Similarity denotes the similarity vector [12]. In the above multi-level user portrait model, the user's business dimension is merged into the tag vector Tag in the form of tags in the field of household medicine. During processing, the user's social relationship is taken out separately, which is more convenient for calculation. Each model is described in detail below.
... recommendation or recommendation algorithm rating [13]. ... between them can reflect their similarity to some extent [14].

User Portrait Generation
The generation of medical user portraits is actually the generation of three user models: Demographics, Relationship and MedAttr. Because demographic attributes are provided to the system in the registration information of patients, this dimension is a static attribute vector that can be extracted directly without specific training [15]. The following steps are used to build the user topic domain vector model:
Step 1: Collect the basic information that users fill in when they register on the website, together with their browsing behavior, search records and interactive text data.
Step 2: Apply the feature extraction algorithm to all the text data obtained in Step 1 and construct the text feature vector Doc.
Step 3: Use the Doc feature vectors obtained in Step 2 to compute, with the Naive Bayesian algorithm (formulas (1) and (2)), the probability that a patient belongs to each of the four areas mentioned above. The resulting probability vector $P = (P_1, P_2, P_3, P_4)$ gives the field classification.
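Formulas (1) and (2) are not reproduced in this text. As a hedged reminder only, the textbook Naive Bayes rule can be written as follows, assuming that Doc consists of extracted terms $t_1, \dots, t_n$ and that $c_1, \dots, c_4$ denote the four domains (these symbols are illustrative and may differ from the paper's own notation):

```latex
P(c_j \mid Doc) \;=\;
\frac{P(c_j)\,\prod_{k=1}^{n} P(t_k \mid c_j)}
     {\sum_{l=1}^{4} P(c_l)\,\prod_{k=1}^{n} P(t_k \mid c_l)},
\qquad
P_j = P(c_j \mid Doc), \quad j = 1, \dots, 4 .
```

Under this reading, the four posteriors sum to 1, so the vector of $P_j$ values can be used directly as a soft assignment of the user to the four domains.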
The dimension of the label vector should also be considered when building user label feature vectors, because some users have many more label dimensions than others. The length of the tag vector should therefore be truncated to facilitate updating, management and maintenance [16], while less active registered users may have few tag features, so their vectors need to be extended appropriately. In this paper, a fixed dimension is defined to handle the tag feature attributes of the user portrait more conveniently, and the length of the tag vector is set to 10. When the user's label feature vector has more than 10 dimensions, the labels are sorted by weight and only the 10 labels with the largest weights are kept as the user's label attribute vector. Conversely, when the number of tags is less than 10, the tag vector is extended: first, the user's tag feature vector (of dimension N < 10) is obtained through the API provided by the system; then the recommended tags provided by the system are obtained; finally, the first 10 label feature attributes are selected from the user's original labels and the recommended labels.
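The fixed-length rule can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the function name `fix_tag_vector`, the dictionary representation of tags and the choice to rank recommended tags by their own weights are all assumptions made here.

```python
def fix_tag_vector(user_tags, recommended_tags, length=10):
    """Return at most `length` (tag, weight) pairs for one user.

    user_tags and recommended_tags map a tag name to its weight
    (hypothetical representation). The user's own tags are always
    ranked ahead of recommended ones.
    """
    # Sort the user's own tags by descending weight.
    ranked = sorted(user_tags.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) >= length:
        # Too many tags: keep only the heaviest `length` of them.
        return ranked[:length]
    # Too few tags: extend with recommended tags the user does not have yet.
    extra = sorted(
        ((t, w) for t, w in recommended_tags.items() if t not in user_tags),
        key=lambda kv: kv[1],
        reverse=True,
    )
    return (ranked + extra)[:length]


# A user with 12 tags is truncated to the 10 heaviest.
many = {f"t{i}": float(i) for i in range(12)}
print(len(fix_tag_vector(many, {})))

# A user with 2 tags is extended from the recommended tags.
print(fix_tag_vector({"a": 0.9, "b": 0.5}, {"b": 0.8, "c": 0.7}, length=3))
```

If the recommended tags do not supply enough candidates, the result is simply shorter than `length`; how the remaining slots should be padded is not specified in the text.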

3) Generation of Indicators of User Social Relations
In this paper, the similarity between users is used to represent the characteristics of users' social relations. That is, $Topic_i$ represents the degree of similarity between the label vectors of users u and i. In the accurate medical information recommendation system of this paper, a label set is formed for each user by the label annotation engine. These labels are obtained through text processing and feature extraction and are not yet the final user portrait model. After the tag vectors are obtained, cosine similarity is used to calculate the tag similarity between users [17].
Sometimes, however, tags are synonyms, that is, they express the same meaning in different words. For example, the tag vector of user $u_1$ contains "hyperglycemia, educator", while the tag vector of user $u_2$ contains "teacher, hyperglycemia". Literal matching treats "educator" and "teacher" as unrelated (similarity 0), although they mean the same thing. Therefore, the collected tag features must first be preprocessed to unify synonyms before the similarity is calculated. A thesaurus of synonyms is established first, and the specific steps are then as follows:
Step 1: The system obtains the tag library by training on the dialogue data between users and the intelligent robot. When extracting tags, the tag attributes of user u and user i can be obtained directly through the API provided by the system.
Step 2: To remove the influence of synonyms, look up the labels of user u and user i in the established synonym dictionary. If a label has a synonym entry, it is replaced by the unified label keyword in the dictionary.
Step 3: The tag vectors obtained in Step 2 contain no synonyms. Take the label vector $Tag_u$ of user u and the label vector $Tag_i$ of user i, and extend the vector length or reduce the weights as needed.
The rule of extension is to add a blank weight to the processed label vector if a synonym exists.
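The effect of synonym unification on the cosine similarity can be illustrated with the example tags from the text. The sketch below is only an illustration: the thesaurus `SYNONYMS`, the unit weights and the decision to sum the weights of merged synonyms are assumptions, not details given in the paper.

```python
import math

# Hypothetical thesaurus: each synonym maps to its unified label keyword.
SYNONYMS = {"educator": "teacher"}


def normalize(tags):
    """Replace every tag by its unified keyword, summing weights that collide."""
    merged = {}
    for tag, weight in tags.items():
        key = SYNONYMS.get(tag, tag)
        merged[key] = merged.get(key, 0.0) + weight
    return merged


def cosine(a, b):
    """Cosine similarity of two sparse tag vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


u1 = {"hyperglycemia": 1.0, "educator": 1.0}
u2 = {"teacher": 1.0, "hyperglycemia": 1.0}

raw = cosine(u1, u2)                           # only "hyperglycemia" matches
unified = cosine(normalize(u1), normalize(u2))  # both tags match
print(round(raw, 3), round(unified, 3))
```

With unit weights, the raw similarity counts only the shared tag "hyperglycemia", while the unified vectors coincide, so the similarity rises to approximately 1.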

Collaborative filtering algorithm based on user rating
The similarity between users can be calculated not only from the users' ratings of the recommended items, but also from the users' tagging of those items. When two users label the target item similarly, this shows that they have similar interests and preferences to a certain extent, that is, the two users are highly similar, and the items recommended to one user should also meet the needs of the other user well [18].
If user u is interested in the label attributes of an item, then each user in the set of neighbor users similar to user u should share some interests and preferences with u. For example, if disease and doctor information dominates user u's subject interest domain classification, then the neighbor users of u should also be more interested in the subject domains related to disease and doctor. Therefore, more and more recommendation algorithms use user information and item information to improve the core algorithm, and the more complete the modeling of users and items, the more accurate the recommendation results.
The collaborative filtering process integrating tag attributes is as follows:
(1) Calculate the tag attribute preference matrix S and the user scoring data matrix R. The dimension of the attribute preference matrix S is $m \times k$, where m is the number of users and k is the total number of tag features. The entry $Weight_{ij}$ represents the total weight of the j-th tag feature over all items evaluated by active user i. The matrix R represents the scoring data of users. Its dimension is $m \times n$, where m is the number of registered users and n is the number of items to be recommended. $r_{ij} \neq 0$ indicates that user i has scored item j in the past period of time, with score value $r_{ij}$; $r_{ij} = 0$ means that user i has not scored item j in that period. Schematically, $S = (Weight_{ij})_{m \times k}$ and $R = (r_{ij})_{m \times n}$.
The comprehensive similarity is a reasonable fusion of the two similarities, that of user scoring and that of tag attribute preference. Finally, formula (9) is used to measure the comprehensive similarity between user u and user v, in which a weight w balances the two: the larger the weight given to a similarity, the higher its importance. The value of w can be set dynamically. Through a large number of experiments, a value of w can be found at which the recommendation system performs best on the MAE index; this weight is the value ultimately used as the similarity measurement parameter.
Using formula (9), the similarity matrix between the target user and other users is obtained, where $Similarity_{ij}$ represents the similarity of user i and user j. By symmetry, $Similarity_{ij} = Similarity_{ji}$.
(4) When recommending to the target user, the k users with the highest similarity in the matrix Similarity are selected as neighbor users, customarily called the k-neighbors, that is, the user's neighbor set. Formula (10) can then be used to predict the unknown score of the target user, denoted $pre_{u,i}$.
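Because formulas (8) to (10) are not reproduced in this text, the following NumPy sketch shows only one plausible reading of the procedure. It assumes cosine similarity for both the rating rows of R and the tag preference rows of S, a convex combination in which w weights the tag similarity, and the usual mean-centred weighted average for the prediction; the function names and the toy matrices are hypothetical.

```python
import numpy as np


def cosine_rows(M):
    """Pairwise cosine similarity between the rows of M (all-zero rows give 0)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = M / norms
    return unit @ unit.T


def hybrid_similarity(R, S, w):
    """Assumed fusion: w * tag similarity + (1 - w) * rating similarity."""
    return w * cosine_rows(S) + (1.0 - w) * cosine_rows(R)


def predict(R, sim, u, i, k):
    """Predict user u's score for item i from the k most similar raters of i."""
    rated = R > 0
    means = R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1)
    # Assumption: neighbors are restricted to users who actually rated item i.
    neighbors = [v for v in np.argsort(-sim[u]) if v != u and rated[v, i]][:k]
    if not neighbors:
        return means[u]
    weights = sim[u, neighbors]
    denom = np.abs(weights).sum()
    if denom == 0:
        return means[u]
    deviations = R[neighbors, i] - means[neighbors]
    return means[u] + (weights @ deviations) / denom


# Toy data: 4 users, 4 items (0 = not rated) and 3 tag features.
R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4]], dtype=float)
S = np.array([[2, 0, 1], [1, 0, 1], [0, 3, 0], [0, 2, 1]], dtype=float)

sim = hybrid_similarity(R, S, w=0.5)
print(predict(R, sim, u=1, i=1, k=2))
```

Sweeping w over a grid between 0 and 1 and keeping the value with the lowest MAE on held-out ratings corresponds to the experimental selection of w described above.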

Singular value recommendation algorithm
For a matrix M of dimension $m \times n$ ($m \ge n$), SVD decomposes it into three matrices:
$M = U S V^T$ (11)
The U in formula (11) is an orthogonal matrix of dimension $m \times m$, and an orthogonal matrix satisfies $U^T U = U U^T = I$.
Output: rating matrix $R''$.
Recommendation based on SVD optimization preprocesses the scoring data through dimension reduction, alleviates the sparsity of the data, and allows the system to find more hidden main features during processing [19]. A user's subject interests and hobbies can be represented by keyword vectors of varying dimension; similarly, the items to be recommended in the system can be represented by tag vectors (tag attributes) of varying dimension [20]. Finally, recommendation for the target user can match the user's portrait feature vector against the item tag vector to obtain the similarity.
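The decomposition and its rank-K truncation can be checked numerically with NumPy. The snippet below is a small sketch on a made-up 5 x 4 matrix; the helper name `truncated_svd` and the value K = 2 are chosen here purely for illustration.

```python
import numpy as np


def truncated_svd(M, k):
    """Return the full orthogonal U and the rank-k factors (U_k, S_k, V_k^T)."""
    # full_matrices=True makes U an m x m orthogonal matrix.
    U, s, Vt = np.linalg.svd(M, full_matrices=True)
    Uk = U[:, :k]           # first k left singular vectors
    Sk = np.diag(s[:k])     # k largest singular values on the diagonal
    Vtk = Vt[:k, :]         # first k right singular vectors (as rows)
    return U, (Uk, Sk, Vtk)


M = np.array(
    [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]],
    dtype=float,
)

U, (Uk, Sk, Vtk) = truncated_svd(M, k=2)
M_k = Uk @ Sk @ Vtk  # best rank-2 approximation of M in the Frobenius norm

print(np.allclose(U.T @ U, np.eye(M.shape[0])))  # orthogonality of U
print(np.linalg.norm(M - M_k))                    # approximation error
```

Keeping only the K largest singular values discards the directions that carry the least variance, which is the sense in which the truncation removes redundant features.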

Detailed steps of accurate recommendation algorithm based on TAG_SVD_CF
Step 1: SVD is used to mitigate the sparsity of the original scoring matrix R and then to fill in the missing values. Following the existing literature [21], for the data in the training set, the prediction $P_{ui}$ of the target user u for an unknown score is the predicted value of u for item i, and is calculated by formula (12). In formula (12), $\bar{R}_u$ is the average score of the target user u over all items that u has scored, which can be obtained by simple calculation. The matrices U, S and V are obtained by SVD. After the processing in formula (11), three matrices $U_k$, $S_k$, $V_k$ of dimension K are obtained, where K is the number of dimensions retained after processing. The value of K is selected in the experimental part below. A complete scoring matrix $R'$ without missing values is then obtained.
Step 2: For the matrix obtained in Step 1, calculate the similarity between users.
Step 3: When calculating the predicted value, the preference for the tag attributes of the items must also be calculated; the predicted value is obtained according to formulas (8), (9) and (10).
Step 4: Recommend according to the predicted value of Step 3.
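Formula (12) itself is not reproduced in this text. A form commonly used in SVD-based collaborative filtering, and consistent with the symbols defined in Step 1, is $P_{ui} = \bar{R}_u + (U_k \sqrt{S_k})(u) \cdot (\sqrt{S_k} V_k^T)(i)$. The sketch below fills the missing entries of R under that assumption; initialising missing entries with the item mean and then centring each row on the user mean are further assumptions made here, and `svd_fill` is a hypothetical name.

```python
import numpy as np


def svd_fill(R, k):
    """Fill the zero (missing) entries of R from a rank-k SVD; keep observed scores."""
    R = np.asarray(R, dtype=float)
    rated = R > 0
    # Mean score of each user over the items that user has scored.
    user_mean = R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1)
    # Mean score of each item; items nobody scored fall back to the global mean.
    item_count = rated.sum(axis=0)
    item_mean = np.where(
        item_count > 0, R.sum(axis=0) / np.maximum(item_count, 1), R[rated].mean()
    )
    # Assumption: start missing entries at the item mean, then centre on the user mean.
    filled = np.where(rated, R, item_mean[None, :])
    centred = filled - user_mean[:, None]
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    root = np.sqrt(np.diag(s[:k]))
    left = U[:, :k] @ root       # U_k * sqrt(S_k), one row per user
    right = root @ Vt[:k, :]     # sqrt(S_k) * V_k^T, one column per item
    P = user_mean[:, None] + left @ right
    # Only the missing entries are replaced by predictions.
    return np.where(rated, R, P)


R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4]], dtype=float)
R_filled = svd_fill(R, k=2)
print(np.round(R_filled, 2))
```

On a five-point scale the predictions could additionally be clipped to the range 1 to 5; the text does not say whether this is done.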

Output: recommended result sets
The accurate recommendation algorithm based on TAG_SVD_CF is applied to the recommendation of medical information. The recommendation system studied in this paper is built mainly on the study of user portraits and on the design of a personalized recommendation algorithm for accurate recommendation of medical information. The core recommendation algorithm uses SVD to optimize the matrix of user disease score data that has been obtained and processed, and thereby to solve the data sparsity problem caused by missing score values in the matrix. The similarity of the label attributes of user profiles is then introduced into the calculation of user similarity, with w as the coordination coefficient, to obtain the hybrid similarity.
Algorithm steps:
Step 1: Obtain all user profile data from the acquired user data, the disease knowledge base and the data obtained after processing. Extract the 10 labels with the highest weights from each user's portrait as needed, and calculate the user portrait label matrix PM.
Step 2: The user disease score matrix IM is mainly provided by the user's medical history. Each disease corresponds to the user's score $P_{ui}$ for that disease, on a five-point scale [22].
Step 3: SVD is used to optimize the user rating matrix and solve its sparsity problem [23]. The tag attribute preference matrix is then calculated from PM. Because the tag information of the user profile is applied here and the attribute tags of diseases are not used, the tag attribute preference can be replaced directly by the tag weight [24]. Next, the tag-based similarity between users and the traditional rating-based user similarity from IM are calculated. Combining them with the weight w gives the hybrid similarity $Similarity(u, v)$.
Step 4: Use $Similarity(u, v)$ from Step 3 as the final similarity to predict the unknown scoring data Pre of the target user.
Step 5: Fill in the initial scoring matrix IM of the target user with the predicted scoring data Pre. Since a patient may again suffer from a past disease in the future, the recommendation for the patient's condition should integrate the initial scoring data [25]. Finally, the relevant information is sorted and output back to the user.

3.4.1 Time complexity analysis
Input: user disease rating matrix

Spatial complexity analysis
The spatial complexity is determined by the number of parameters of the model. Because of the curse of dimensionality, the more parameters a model has, the more data is needed to train it. However, real-life data sets are usually not very large, which makes the model prone to overfitting during training [26].
When TAG_SVD_CF is used for information recommendation, SVD is carried out on the score matrix after collaborative filtering based on user scores. This greatly reduces the size of the feature vectors, discards redundant features, and finally yields a matrix of similarity between users. The number of parameters obtained in this way is smaller than that of traditional CF (where n and m are the input/output dimensions), which effectively alleviates the data sparsity problem. TAG_SVD_CF therefore has a lower spatial complexity than CF alone.

Experimental data
The experimental data are taken from 31 days of IIS log records of the intelligent psychological counseling system. The mean absolute error (MAE) is used as the evaluation index:
$MAE = \frac{1}{N} \sum_{i=1}^{N} |p_i - q_i|$
where $p_i$ is the predicted score, that is, the score generated by the system for an unknown rating according to the designed core algorithm, $q_i$ is the actual score given by the user to the item in the test set, and N is the number of predictions. The smaller the MAE, the closer the predicted score is to the actual score.
When the number of user neighbors K reaches 40, the calculated MAE value is the minimum, so, taking the computational cost into account, the optimal number of user neighbors is set to 40 in this paper.
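The MAE over a test set, with predicted scores $p_i$ and actual scores $q_i$, is the mean of the absolute differences and can be computed directly. The function name and the sample scores below are illustrative only, not experimental data from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(abs(p - q) for p, q in zip(predicted, actual))
    return total / len(predicted)


# Three made-up test ratings: absolute errors 1, 0 and 2.
print(mean_absolute_error([4.0, 3.0, 5.0], [5.0, 3.0, 3.0]))
```

Evaluating this function for each candidate number of neighbors or each retained SVD dimension and keeping the setting with the lowest value is one straightforward way to carry out the parameter selection described in the experiments.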

Analysis of experimental results
(2) Selection of K value in SVD optimization
The dimension K is the key of the next experiment. It can be seen from Figure 4 that the MAE value first decreases and then increases as the number of user neighbors K increases.
If the selected K value is too small (such as K=1), the users have fewer preference features and the main attribute information of the items is also scarce, which means that most attribute information is lost after decomposition. However, if the selected K is too large (such as K=30), SVD optimization loses its significance. Selecting an appropriate K value therefore improves the performance of the recommendation system, and reasonable values must be selected through experiments.
As shown in Figure 4, when the number of nearest neighbors is 5, the MAE value of the improved algorithm is smaller, and its range of variation is smaller, than that of the other algorithms. When the K value is more than 30, all the algorithms gradually become stable. The MAE value obtained by this algorithm is significantly lower than that of the other algorithms, which shows that the algorithm proposed in this paper is superior to the other algorithms in recommendation accuracy and can significantly improve the recommendation quality of the recommendation system.