Research on multi-context aware recommendation methods based on tensor factorization

Compared with traditional recommender systems, context-aware recommender systems are more in line with real application scenarios. However, existing research mostly focuses on single-context-aware recommendation, such as time-aware or location-aware recommendation, and lacks in-depth study of multi-context-aware recommendation. We therefore propose a multi-context-aware recommendation method based on high-order tensor factorization. First, after analyzing the influence of context on users' interest preferences, we detect users' sensitivity to multiple contexts using statistical methods. For context-sensitive users, a four-dimensional tensor is constructed from the rating matrix and contextual information, together with feature matrices that alleviate data sparsity. The stochastic gradient descent algorithm is then applied iteratively to fill in missing values and optimize the parameters. For context-insensitive users, we use matrix factorization to predict interest preferences. Finally, we validate the method on a multi-context movie dataset; the experimental results show that the proposed method effectively reduces the prediction error and improves recommendation quality.


Introduction
With the advent of big data, obtaining information from the Internet accurately and quickly under information overload has become a hot research direction. Recommender systems filter redundant information according to users' interest preferences and recommend relevant items. In context-aware recommender systems, however, users' interest preferences are also affected by context, and they may differ significantly from one context to another.
A traditional recommendation algorithm is usually a two-dimensional model of the User-Item relationship. It assumes that users' interest preferences and the attributes of items are static, and it ignores the influence of context. In the real world, users' interest preferences are influenced by the surrounding environment, and different contexts may influence users to different degrees. Taking context into account can therefore improve the accuracy with which users' interest preferences are determined. Adomavicius et al. [1] pointed out that, in addition to the User-Item interaction matrix, users' interest preferences are influenced by different types of contexts, and that integrating contexts into recommendation improves the performance of recommender systems. On this basis, they proposed the concept of Context-Aware Recommender Systems (CARS).
At present, most researchers have recognized the importance of context and have attempted to incorporate relevant contexts in their work. Some regard time as contextual information that cannot be ignored [2,3]: users' interest preferences change over time, and as new choices emerge, users' perception of items and the popularity of items change as well. In addition to time, space is also a very important contextual factor. Users' interest preferences vary from region to region, especially in location-aware recommender systems, where they are affected by spatial location information [4-6]. Studies have shown that recommendation algorithms that consider either time or space outperform traditional recommendation algorithms. However, how to select the relevant contexts and how to integrate multiple contextual factors into a recommendation algorithm or model remain major open questions. In CARS there may be many contexts, and different contexts may have different impacts. It is therefore very important to make effective use of multiple contexts for recommendation.
To address the above problems, we propose a recommendation method based on multi-context-aware higher-order tensor factorization, which detects the contexts that are effective for users and integrates them with users' interest preferences. Because users differ in their sensitivity to contexts, we first use the Chi-square test to detect each user's sensitivity to the contexts and thereby identify multiple effective contexts. We then integrate the sensitive contexts with the users' interest preference matrix to build a multi-dimensional tensor model. At the same time, the tensor is factorized jointly with multiple feature matrices to alleviate data sparsity. For context-insensitive users, the traditional matrix factorization method is used to predict their preferences.

Context-aware recommender system
Context is a very complex concept with different definitions in different application settings. In early work, Schilit et al. [7] referred to context as location, the collection of nearby people and objects, and the changes to these objects over time. Brown et al. [8] consider context to be location, the identity of people around the user, time of day, season, temperature, etc. Dey [9] enumerates context as the user's emotional state, focus of attention, location and orientation, date and time, goals, people in the user's environment, etc. The definition most widely cited in the field of context-aware computing is the one proposed by Dey et al., who argue that context is any information that can be used to characterize the situation of an entity. An entity can be a person, place, or object that is considered relevant to the interaction between the user and the application, including the user and the application themselves [10].
Context awareness is one of the most important research topics in pervasive computing, in which people are able to access and process information at any time, anywhere, and in any way. Context awareness enables a system to automatically discover and utilize contextual factors such as location and surroundings. CARS introduce context into the traditional recommender system and apply contexts such as time, location, device, and surroundings to generate recommendations for users, extending traditional two-dimensional recommendation to multi-dimensional recommendation. Whereas two-dimensional recommendation is based on the User-Item relationship, CARS are based on the three elements User-Item-Context.
CARS can be expressed formally as follows.
Given that $U$ is the set of users, $I$ is the set of items, and $C$ is the set of contexts, the utility function that measures the preference of user $u$ for item $i$ in a CARS is given by equation 1:

$$R: U \times I \times C \rightarrow \mathit{Rating} \tag{1}$$

where Rating is a totally ordered set.
One of the main issues studied in CARS is how to integrate context with traditional two-dimensional User-Item recommendation. Adomavicius et al. proposed three paradigms of context-aware recommendation according to the stage at which the context is integrated into the recommendation process, namely contextual pre-filtering, contextual post-filtering, and contextual modeling [2], as shown in Fig. 1.

Matrix factorization based methods
Traditional recommender systems regard the User-Item rating matrix as the primary data and attempt to predict users' interest preferences for unrated items. In practice, recommender systems deal with massive numbers of data items, and matrix factorization is used to reduce the dimensionality of the data so as to speed up calculation without losing important information. Liu et al. [11] added temporal and social factors to the matrix factorization approach. Shi et al. [12] used matrix factorization to mine emotion-specific movie similarities to obtain context-aware recommendations. Baltrunas et al. [13] argued that context-aware matrix factorization could improve on the accuracy of standard matrix factorization. Zheng et al. [14] proposed a matrix factorization method based on a sparse linear method, in which a user's rating for an item is aggregated from that user's ratings for other items. Kim et al. [15] incorporated a convolutional neural network (CNN) into the matrix factorization technique, an approach known as convolutional matrix factorization. The method applies maximum a posteriori estimation to optimize the parameters of the document latent vector model, the user latent vector model, and the item latent vector model.

Fig. 1 Three paradigms of context-aware recommender systems

Tensor factorization based methods
It is common to refer to scalars as zeroth-order tensors, vectors as first-order tensors, and matrices as second-order tensors, and more generally to refer to multi-dimensional data as tensors. A tensor of order d ≥ 3 can be regarded as a d-dimensional generalization of a matrix. Essentially, tensor factorization is a higher-order generalization of matrix factorization that provides a flexible and versatile way to integrate contextual information. It requires no pre-filtering or post-filtering step, each of which would add considerable complexity to the model.
Tucker decomposition and CP decomposition are the most commonly used methods for tensor factorization [16]. CP decomposition is a special case of Tucker decomposition; it decomposes a tensor into a sum of a finite number of rank-one tensors. Given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, the CP decomposition can be expressed by equation 2:

$$\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \tag{2}$$

where $\circ$ denotes the vector outer product, $R$ is a positive integer, and $a_r \in \mathbb{R}^{I}$, $b_r \in \mathbb{R}^{J}$, $c_r \in \mathbb{R}^{K}$. The factorization process of the third-order tensor is shown in Fig. 2.
At present, tensor factorization is widely used in recommender systems because it can handle multi-dimensional data [17]. Context-aware recommendation inevitably leads to a multi-dimensional space, and tensor factorization makes it more convenient to integrate context. Many of these methods target point-of-interest recommendation. Cai et al. [18] constructed a three-dimensional User-Item-Label tensor for label recommendation, used low-order polynomials to enrich the statistical information among users, items, and labels, and at the same time alleviated data sparsity. Luan et al. [19] proposed a cooperative tensor factorization method that uses a three-dimensional tensor together with three feature matrices to recommend points of interest, solved with an element-wise gradient descent optimization algorithm. Meanwhile, many scholars combine tensor factorization with neural networks to handle the multiple types of information in CARS. Chen et al. [20,21] proposed a model that combines tensor factorization and adversarial learning for context-aware recommendation, using deep neural networks and tensor algebra to capture nonlinear interactions among multi-aspect factors. Wu et al. [22] proposed a neural network based tensor factorization model for predictive tasks on dynamic relational data. They argued that users' preferences change over time and that the underlying factors driving the user-item relationship change over time as well.
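As an illustration of the CP form above, the following minimal NumPy sketch rebuilds a third-order tensor from three factor matrices whose columns are the vectors a_r, b_r, c_r. The function name `cp_reconstruct_3way` and the toy sizes are assumptions made for this example, not part of the original method.

```python
import numpy as np

def cp_reconstruct_3way(A, B, C):
    """Sum of R rank-one tensors a_r ∘ b_r ∘ c_r.

    A is I x R, B is J x R, C is K x R; column r holds a_r, b_r, c_r.
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Toy sizes (illustrative only)
rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 2, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

X = cp_reconstruct_3way(A, B, C)

# The same tensor written explicitly as a sum of outer products
X_explicit = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
```

Both constructions give the same I x J x K array; the `einsum` form is simply more compact.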

User multi-context sensitivity detection
Existing studies have shown that users differ significantly in their sensitivity to different types of contexts [23]; that is, users are context-sensitive. In movie recommendation, for example, some users are sensitive to their own emotions: when they are in a good mood they choose relaxing or cheerful movies, and otherwise they choose sad movies. We consider a context to be sensitive for a user when the user's interest preferences change significantly across the different dimensions (levels) of that context. In this paper, the Chi-square significance test is used to detect whether a user is sensitive to a given context. The Chi-square test is generally used for goodness-of-fit, independence, and homogeneity tests, and it measures the degree of deviation between observed and theoretical values. For each context, the user's preference statistics in each dimension of that context are used as the observed values of the Chi-square test, and the average number of the user's evaluations within that context is taken as the theoretical value. The statistic is calculated as in equation 3:

$$\chi^2 = \sum_{i=1}^{I} \frac{(A_i - E_i)^2}{E_i} = \sum_{i=1}^{I} \frac{(A_i - n p_i)^2}{n p_i} \tag{3}$$

where $A_i$ is the observed count at level $i$, $E_i$ is the expected count at level $i$, $n$ is the total count, $p_i$ is the expected frequency at level $i$, and $I$ is the number of levels. When $n$ is relatively large, the $\chi^2$ statistic approximately follows a Chi-square distribution with $I-1$ degrees of freedom.

Fig. 2 Third-order tensor factorization process
Based on the LDOS-CoMoDa movie rating dataset [24], we select two groups of users as test targets: those who have evaluated more than 5 items and those who have evaluated more than 10 items. The Chi-square value of each user is calculated for each single context. If the Chi-square value is greater than the critical value given in the Chi-square critical value table, the user is considered sensitive to that context; otherwise the user is not sensitive. We then count, for each context, how many times it was judged to be a sensitive context, and the contexts with the highest counts are taken as the user-sensitive contexts. The resulting statistics are shown in Fig. 3.
In the figure, the horizontal axis shows the sensitivity statistics for users evaluating more than 5 items and for users evaluating more than 10 items. Each part covers the 12 contexts in the dataset: time, daytype, season, location, weather, social, endEmo, dominantEmo, mood, physical, decision, and interaction. As can be seen from the figure, daytype and season have high counts in both parts, so daytype and season are taken as the user-sensitive contexts.

Tensor factorization for multi-context-aware recommendation methods
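The sensitivity check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the observed values are taken to be a user's rating counts per level of one context, the expected value is their mean, and the 5% significance level is assumed because the text does not state the level used. The names `chi_square_statistic` and `is_sensitive` are hypothetical.

```python
import numpy as np

# Upper 5% critical values of the chi-square distribution by degrees of freedom
# (the 5% level is an assumption; the text does not state the level used)
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_square_statistic(observed):
    """Chi-square statistic with the mean count as the expected value at every level."""
    observed = np.asarray(observed, dtype=float)
    expected = np.full_like(observed, observed.mean())
    return float(np.sum((observed - expected) ** 2 / expected))

def is_sensitive(observed, critical_values=CHI2_CRITICAL_05):
    """A user is judged sensitive when the statistic exceeds the critical value."""
    dof = len(observed) - 1
    return chi_square_statistic(observed) > critical_values[dof]

# Hypothetical counts of one user's ratings in the four seasons
skewed_counts = [12, 3, 2, 3]   # statistic = 13.2, above 7.815
even_counts = [5, 6, 4, 5]      # statistic = 0.4, below 7.815
```

With these toy counts, `is_sensitive(skewed_counts)` is true and `is_sensitive(even_counts)` is false, which mirrors the critical-value comparison in the text.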

User interest model based on four-dimensional high-order Tensor
The user multi-context sensitivity detection shows that most users are sensitive to both daytype and season. We therefore construct a User-Item-Daytype-Season tensor $\mathcal{X} \in \mathbb{R}^{U \times T \times D \times S}$ to represent users' interest preferences for items in the different dimensions of these contexts, where $U$, $T$, $D$, and $S$ denote the number of users, items, daytype dimensions, and season dimensions, respectively. For ease of reading, Table 1 lists the key notations of this article. The four modules are described in detail as follows. We use the rating of an item as an indication of the user's interest preference, with a higher rating indicating that the user likes the item more. Each entry $\mathcal{X}(u, t, d, s)$ denotes the rating of user $u$ for item $t$ with daytype $d$ and season $s$. If the user did not interact with the item in that context, then $\mathcal{X}(u, t, d, s) = 0$.
Fig. 4 shows a schematic diagram of the constructed four-dimensional tensor, which can be represented as a collection of three-dimensional (User-Item-Daytype) tensors, one for each season.
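A minimal sketch of the tensor construction described above, assuming the ratings are available as (user, item, daytype, season, rating) records with zero-based integer indices. The record layout, the function name `build_rating_tensor`, and the toy sizes are assumptions for illustration.

```python
import numpy as np

def build_rating_tensor(records, n_users, n_items, n_daytypes, n_seasons):
    """Fill a User x Item x Daytype x Season tensor; missing interactions stay 0."""
    X = np.zeros((n_users, n_items, n_daytypes, n_seasons))
    for u, t, d, s, rating in records:
        X[u, t, d, s] = rating
    return X

# Hypothetical records: (user, item, daytype, season, rating)
records = [
    (0, 1, 0, 2, 4.0),
    (1, 0, 2, 3, 5.0),
    (1, 1, 1, 0, 2.0),
]
X = build_rating_tensor(records, n_users=2, n_items=2, n_daytypes=3, n_seasons=4)

# Slice for one season: a three-dimensional User x Item x Daytype tensor
X_season_2 = X[:, :, :, 2]
```

Fixing the last index yields the per-season three-dimensional tensors mentioned in the text.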
Fig. 3 Single context belongs to user sensitive context statistics (panels: evaluating more than 5 items, evaluating more than 10 items; axis: the interaction counts). The higher the interaction count, the more likely the context is to belong to a sensitive context.

Table 1 Key notations
$Q_j$: the value of item Q in dimension j
$G_j$: the value of item G in dimension j
$u_r, t_r, d_r, s_r$: the rank-one vectors of user, item, daytype, and season
$X_{(1)}, X_{(2)}, X_{(3)}$: expansions of the tensor $\mathcal{X}$ according to different modes
$\lambda_0$: the regularization coefficient
$\lambda_1, \lambda_2, \lambda_3$: the weights of the feature matrices
Learning rate: the step size of the gradient updates
$k$: the number of hidden factors

Construction of feature matrices
Under contextual conditions each user evaluates only a few items, so the data are extremely sparse, and filling the missing entries using only the data present in the tensor would greatly reduce the accuracy of the prediction. To reduce data sparsity, we further construct three feature matrices, namely the Item-Item similarity matrix, the User-Daytype matrix, and the User-Season matrix, and factorize them jointly with the tensor. Each of the three feature matrices shares at least one dimension with the constructed four-dimensional tensor.
(1) Item-Item similarity matrix M1. Driven by their interests, users tend to watch movies of similar categories. In the same context, recommending movies of highly similar categories is more likely to satisfy a user's interest preferences. For example, if the user prefers mystery movies, recommending mystery movies meets the user's needs better than recommending comedies. Therefore, the first three attribute types (genre1, genre2, genre3) of the item features are used as item category features, and the Item-Item similarity feature matrix M1 is constructed based on cosine similarity. The similarity between items Q and G is calculated as shown in equation 4:

$$\mathrm{sim}(Q, G) = \frac{\sum_{j} Q_j G_j}{\sqrt{\sum_{j} Q_j^2}\,\sqrt{\sum_{j} G_j^2}} \tag{4}$$

where $Q_j$ and $G_j$ are the values of items Q and G in dimension j.
(2) User-Daytype matrix M2. According to the multi-context sensitivity detection, daytype is a sensitive context for most users. We therefore construct a User-Daytype context matrix to represent users' interest preferences with respect to daytype. To simplify the calculation, we use the average rating of each user on each dimension of daytype to construct the User-Daytype feature matrix M2. An example of this matrix is as follows.
(3) User-Season matrix M3. Similarly, the multi-context sensitivity detection shows that season is also a user-sensitive context. The User-Season matrix reflects users' interest preferences in the different dimensions of the season context. The User-Season matrix M3 is constructed from the average rating of each user for items on each dimension of season. A partial example of this matrix is as follows.
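The three feature matrices can be sketched in NumPy as below. This is an illustrative sketch under assumptions: item categories are given as a one-hot item-by-genre array, zeros in the rating tensor mean "no rating", and the average is taken over observed ratings only. The function names are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Item-Item cosine similarity from an items x features array (e.g. one-hot genres)."""
    F = np.asarray(features, dtype=float)
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    Fn = F / norms
    return Fn @ Fn.T

def mean_rating_matrix(X, context_axis):
    """Average observed rating of each user per level of one context.

    X is User x Item x Daytype x Season; context_axis is 2 (daytype) or 3 (season).
    """
    X = np.asarray(X, dtype=float)
    other_axes = tuple(ax for ax in (1, 2, 3) if ax != context_axis)
    sums = X.sum(axis=other_axes)
    counts = (X != 0).sum(axis=other_axes)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Toy data (hypothetical): 3 items with 3 genre flags, 2 users, 3 daytypes, 4 seasons
genres = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
X = np.zeros((2, 3, 3, 4))
X[0, 0, 0, 1] = 4.0
X[0, 1, 0, 2] = 2.0
X[1, 0, 2, 0] = 5.0

M1 = cosine_similarity_matrix(genres)        # Item x Item
M2 = mean_rating_matrix(X, context_axis=2)   # User x Daytype
M3 = mean_rating_matrix(X, context_axis=3)   # User x Season
```

Here user 0 has two ratings (4 and 2) on daytype 0, so `M2[0, 0]` is their mean; levels without any rating are left at 0.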

Context-aware collaborative Tensor factorization
The ultimate goal of both tensor factorization and matrix factorization is to fill in the missing entries based on the existing data. Tucker decomposition and CP decomposition are the most commonly used methods in tensor factorization [16]. CP decomposition is a special case of Tucker decomposition that can be applied to massive data and is more convenient to calculate, so we use CP decomposition. The tensor $\mathcal{X} \in \mathbb{R}^{U \times T \times D \times S}$ constructed in our experiments can be decomposed into $U \in \mathbb{R}^{u \times k}$, $T \in \mathbb{R}^{t \times k}$, $D \in \mathbb{R}^{d \times k}$, $S \in \mathbb{R}^{s \times k}$. The expression for its decomposition is given in equation 5:

$$\mathcal{X} \approx \sum_{r=1}^{R} u_r \circ t_r \circ d_r \circ s_r \tag{5}$$

where U, T, D, and S are called factor matrices and are combinations of rank-one vectors. $u_r \in \mathbb{R}^{U}$, $t_r \in \mathbb{R}^{T}$, $d_r \in \mathbb{R}^{D}$, $s_r \in \mathbb{R}^{S}$ ($r = 1, 2, \ldots, R$) denote the rank-one vectors of user, item, daytype, and season, respectively, and $R$ is a positive integer that represents the number of components, i.e., the rank. Usually, the rows of a factor matrix correspond to the entries of the respective tensor dimension and the columns to the rank $R$. The three constructed feature matrices M1, M2, and M3 can each be decomposed into two smaller matrices using matrix factorization (equation 6), where $k$ denotes the rank of the factorization, so that each factor matrix contains $k$ implied features. We consider that users' interest preferences for items are mainly determined by $k$ hidden features.
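The four-way CP form can be illustrated with a short NumPy sketch that reconstructs the tensor from the four factor matrices, with rows indexing users, items, daytypes, and seasons and columns indexing the R components. The function name and toy sizes are assumptions for this example.

```python
import numpy as np

def cp_reconstruct_4way(U, T, D, S):
    """X[u, t, d, s] = sum over r of U[u, r] * T[t, r] * D[d, r] * S[s, r]."""
    return np.einsum("ur,tr,dr,sr->utds", U, T, D, S)

# Toy factor matrices (illustrative sizes): 5 users, 6 items, 3 daytypes, 4 seasons, rank 2
rng = np.random.default_rng(1)
U = rng.normal(size=(5, 2))
T = rng.normal(size=(6, 2))
D = rng.normal(size=(3, 2))
S = rng.normal(size=(4, 2))

X_hat = cp_reconstruct_4way(U, T, D, S)

# One entry computed by hand from the corresponding rows of the factor matrices
entry = np.sum(U[2] * T[4] * D[1] * S[3])
```

Each predicted rating depends only on one row from each factor matrix, which is what makes element-wise gradient updates possible.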
To carry out calculations between the tensor and the matrices, the tensor must first be unfolded into matrices that share a dimension with the feature matrices. A third-order tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ can be unfolded along each of its three modes, and each mode unfolding yields a matrix, as shown in equations 7-9:

$$X_{(1)} \in \mathbb{R}^{n_1 \times (n_2 n_3)} \tag{7}$$

$$X_{(2)} \in \mathbb{R}^{n_2 \times (n_1 n_3)} \tag{8}$$

$$X_{(3)} \in \mathbb{R}^{n_3 \times (n_1 n_2)} \tag{9}$$

The feature matrix M1 shares the item dimension with the tensor $\mathcal{X}$, M2 shares the user and daytype dimensions with the tensor, and M3 shares the user and season dimensions with the tensor. Thus, the data in the matrices can be fused into the tensor through the shared dimensions. Given the tensor $\mathcal{X} \in \mathbb{R}^{U \times T \times D \times S}$ and the feature matrices M1, M2, and M3, the objective function is constructed as shown in equation 10.
Here $\|\cdot\|$ denotes the 2-norm, and $F$ denotes the least-squares error loss function for decomposing the four-dimensional tensor into the four factor matrices U, T, D, and S. This part of the objective can be further expressed as equation 11.
Tensor factorization is usually computed using Alternating Least Squares (ALS) or Gradient Descent (GD); considering their limitations, we use Stochastic Gradient Descent (SGD) for optimization. To multiply a tensor by a matrix, the tensor first needs to be matricized. Matricization rearranges the elements of an n-dimensional array into a matrix; that is, the tensor is unfolded into a matrix along one of its dimensions. The product is then obtained by multiplying the matrix produced by the mode-n matricization by the other matrix. The derivative of the objective with respect to the factor matrix U, obtained from the mode-1 matricization (n = 1), is given in equation 12.
The factor matrix U is then updated according to equation 13, in which the step size is set by the learning rate.
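The mode unfolding described above can be sketched in a few lines of NumPy. Note that this sketch orders the columns in NumPy's default row-major order, which is an assumption of the example; other references order the columns differently, and only the shapes are guaranteed to agree. The function name `unfold` is hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode unfolding: move the chosen axis to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

# Toy third-order tensor of size 2 x 3 x 4
X = np.arange(24).reshape(2, 3, 4)

X1 = unfold(X, 0)   # shape (2, 12)
X2 = unfold(X, 1)   # shape (3, 8)
X3 = unfold(X, 2)   # shape (4, 6)
```

The same function applies unchanged to the four-dimensional tensor, where the mode-0 unfolding has one row per user.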
Similarly, T, D, and S are updated according to equations 14-16.
The regularization term prevents overfitting. $\lambda_0$ is the regularization coefficient, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the model parameters controlling the weights of the different parts of the objective function.
The learning process is shown in Algorithm 1, where the input is the four-dimensional tensor $\mathcal{X}$ together with the three feature matrices M1, M2, and M3. In the algorithm, the four factor matrices are first initialized with small random values, then the optimal parameter values are learned using stochastic gradient descent, and finally the four dense factor matrices are output.
Algorithm 1 Tensor joint matrix factorization algorithm
Input: original tensor $\mathcal{X}$, feature matrices M1, M2, M3
Output: factor matrices U, T, D, S
1: Initialize the factor matrices with small random values according to the size of each dimension of the tensor $\mathcal{X}$
2: Iterate until the difference β between successive loss values is less than 0.0001
3: Calculate the gradient with respect to each factor matrix
4: Update each factor matrix according to its gradient

The complexity of the algorithm is analyzed below. Assume that the tensor $\mathcal{X} \in \mathbb{R}^{n \times n \times n \times n}$ has rank $R$, so that $S \in \mathbb{R}^{n \times R}$, $D \in \mathbb{R}^{n \times R}$, $T \in \mathbb{R}^{n \times R}$. When the rank $R$ is small, the time complexity of the algorithm is about $O(n^4 R)$. The algorithm needs to store the unfolded four-dimensional tensor, the feature matrices, and the associated products, so the space complexity is about $O(n^4)$.
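The loop structure of Algorithm 1 can be illustrated with the simplified sketch below. It is not the authors' implementation: the feature-matrix terms are omitted (so it corresponds to plain four-way CP with L2 regularization), the updates are element-wise over observed entries, and the initialization range, `lr`, `reg`, and `max_epochs` values are assumptions chosen so that the toy example converges. Only the stopping threshold of 0.0001 is taken from the text.

```python
import numpy as np

def cp_loss(X, factors, reg):
    """Squared error on observed (non-zero) entries plus L2 regularization."""
    pred = np.einsum("ur,tr,dr,sr->utds", *factors)
    mask = X != 0
    penalty = sum(np.sum(F ** 2) for F in factors)
    return 0.5 * np.sum((X - pred)[mask] ** 2) + 0.5 * reg * penalty

def sgd_cp_4way(X, rank=2, lr=0.01, reg=0.01, tol=1e-4, max_epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    # Random initialization; the range is an assumption for this toy example
    factors = [rng.uniform(0.5, 1.0, size=(n, rank)) for n in X.shape]
    observed = np.argwhere(X != 0)
    history = [cp_loss(X, factors, reg)]
    for _ in range(max_epochs):
        rng.shuffle(observed)  # visit observed entries in random order
        for idx in observed:
            # Copy the current rows so all four gradients use the same values
            rows = [factors[m][idx[m]].copy() for m in range(4)]
            err = X[tuple(idx)] - np.sum(np.prod(rows, axis=0))
            for m in range(4):
                others = np.prod([rows[j] for j in range(4) if j != m], axis=0)
                grad = -err * others + reg * rows[m]
                factors[m][idx[m]] -= lr * grad
        history.append(cp_loss(X, factors, reg))
        if abs(history[-2] - history[-1]) < tol:  # stopping rule from Algorithm 1
            break
    return factors, history

# Toy tensor (hypothetical ratings): 3 users, 3 items, 2 daytypes, 2 seasons
X = np.zeros((3, 3, 2, 2))
X[0, 0, 0, 0] = 5.0
X[0, 1, 1, 0] = 3.0
X[1, 0, 0, 1] = 4.0
X[2, 2, 1, 1] = 1.0
X[1, 2, 0, 0] = 2.0

factors, history = sgd_cp_4way(X)
```

Extending the sketch to the joint objective would add one gradient term per feature matrix to the factor matrices it shares a dimension with, weighted by the corresponding model parameter.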

Matrix factorization
For context-insensitive users, User-Item ratings are predicted using the traditional matrix factorization method [25]. The constructed User-Item rating matrix is denoted as M, where M(u, t) is the rating that expresses the interest preference of user u for item t, and the prediction model corresponds to the objective function in equation 17.

Top-N recommendation
The recommendation method uses the Top-N recommendation strategy. In the corresponding contexts, for users sensitive to multiple contexts, the sparse tensor is recovered from the outer products of the four output factor matrices obtained by higher-order tensor factorization, while for context-insensitive users the ratings are predicted by matrix factorization. The reconstructed tensor $\mathcal{X}_{new}$ and matrix $M_{new}$ are given in equations 18-19.
Here $\mathcal{X}_{new}(u, t, d, s)$ is the predicted preference rating of context-sensitive user u for item t in the context with daytype d and season s, and $M_{new}(u, t)$ is the predicted preference rating of context-insensitive user u for item t. Finally, the Top-N items are recommended to each user by sorting the items by predicted rating in descending order.
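The final ranking step can be sketched as follows. Excluding items the user has already rated is an assumption of this example (the text only specifies sorting by predicted rating), and the function name `top_n_items` is hypothetical.

```python
import numpy as np

def top_n_items(predicted, already_rated, n):
    """Indices of the n highest predicted ratings among items not yet rated."""
    predicted = np.asarray(predicted, dtype=float)
    scores = np.where(already_rated, -np.inf, predicted)
    order = np.argsort(-scores, kind="stable")   # descending by predicted rating
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

# Hypothetical predicted ratings of one user in one (daytype, season) context
predicted = np.array([3.2, 4.8, 4.1, 2.0, 4.5])
already_rated = np.array([False, True, False, False, False])

recommended = top_n_items(predicted, already_rated, n=2)   # -> [4, 2]
```

For a context-sensitive user, `predicted` would be the slice of the reconstructed tensor for that user and context; for a context-insensitive user, it would be the corresponding row of the reconstructed matrix.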

Multi-context movie dataset
The experimental dataset is the real movie dataset LDOS-CoMoDa, collected by Ante Odić, which contains multiple contexts [24]. The dataset contains not only basic information about users (age, sex, city, country) and items (director, country, language, year, genre, actor, budget), but also information about 12 contexts. The relevant information of the dataset is shown in Table 2. The 12 contexts in the dataset, each with different categories, are time, daytype, season, location, weather, social, endEmo, dominantEmo, mood, physical, decision, and interaction. The specific description of each context is shown in Table 3.

Baseline methods
(1) Standard-CP [26]: Only the four-dimensional tensor is used as input, and the contribution of the feature matrices is not considered. It is obtained by setting $\lambda_1$, $\lambda_2$, $\lambda_3$ in the objective function to zero, as in equation 20.
(2) HOSVD [27]: HOSVD is a common tensor factorization method that unfolds the tensor along its different modes, applies SVD to each unfolding in turn, and then fills in the missing data. It is often used in contextual recommender systems because of its applicability to higher-order data.
(3) NMF [28]: Non-negative matrix factorization is often used for recommendation on two-dimensional data and, like traditional SVD, does not consider the effect of context on the results.

Evaluation Metrics
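We used the classical Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as evaluation metrics. RMSE and MAE measure the deviation between the actual and predicted values and are calculated as in equations 21-22:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(u,t,d,s)} \left( \mathcal{X}_{(u,t,d,s)} - \hat{\mathcal{X}}_{(u,t,d,s)} \right)^2} \tag{21}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{(u,t,d,s)} \left| \mathcal{X}_{(u,t,d,s)} - \hat{\mathcal{X}}_{(u,t,d,s)} \right| \tag{22}$$

where $\mathcal{X}_{(u,t,d,s)}$ and $\hat{\mathcal{X}}_{(u,t,d,s)}$ denote the actual rating and the predicted rating, respectively, and $N$ denotes the number of predicted ratings.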
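The two metrics can be computed directly from paired arrays of actual and predicted ratings, as in this minimal sketch. The function names and the toy values are assumptions for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error over the N test ratings."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error over the N test ratings."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical test ratings and predictions
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]

example_rmse = rmse(actual, predicted)   # sqrt(0.375)
example_mae = mae(actual, predicted)     # 0.5
```

Because RMSE squares the errors before averaging, it penalizes the single large error (1.0) more heavily than MAE does.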

Parameter optimization
We first conducted optimization experiments on the parameters of the multi-context-aware recommendation method. The parameters to be optimized include the learning rate and $\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$. To ensure the accuracy of parameter optimization and to prevent overfitting caused by the complexity of the model, we adopted three-fold cross-validation to calculate the experimental results. For both context-sensitive and context-insensitive users, 80% of the data were randomly selected as the training set and the remaining 20% as the test set.
(1) Optimization of learning rate. When optimizing the learning rate, $\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$ are held at fixed values. The experimental results for different learning rates are shown in Fig. 5. As the learning rate increases, both RMSE and MAE first decrease and then increase. When the learning rate is 1e-4, both RMSE and MAE are minimal, and the recommendation model reaches its relatively optimal result.
When performing matrix factorization for context-insensitive users, the number of hidden factors k has a large impact on the results, and we choose the optimal value of k by the RMSE. It can be seen from Fig. 7 that RMSE decreases and then flattens out as k increases, and that the RMSE is minimal at k = 14 within the selected interval.
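The random 80%/20% split of the observed ratings can be sketched as below. The function name `split_observed`, the fixed seed, and the treatment of zeros as missing entries are assumptions of this example.

```python
import numpy as np

def split_observed(X, test_fraction=0.2, seed=0):
    """Randomly split the indices of observed (non-zero) entries into train and test."""
    idx = np.argwhere(X != 0)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(idx))
    n_test = int(round(test_fraction * len(idx)))
    return idx[perm[n_test:]], idx[perm[:n_test]]

# Toy tensor with 10 observed ratings (hypothetical)
X = np.zeros((2, 5, 1, 1))
X[:, :, 0, 0] = 3.0

train_idx, test_idx = split_observed(X)

# Training tensor: test entries are hidden by resetting them to 0
X_train = X.copy()
X_train[tuple(test_idx.T)] = 0.0
```

The held-out entries in `test_idx` are then the ones on which RMSE and MAE are evaluated.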

Method Comparison
Based on the parameter optimization, the proposed method was compared with three baseline methods (Standard-CP, HOSVD and NMF), and the experimental results on RMSE and MAE are shown in Fig. 8.
It can be seen from the figure that NMF has higher RMSE and MAE values than the other methods, because it considers only users' interest preferences and does not take contexts into account. Standard-CP and HOSVD are the most commonly used tensor factorization methods, and their values are close to each other. Our method adds the feature matrices to tensor factorization to alleviate data sparsity; its RMSE and MAE are 0.4765 and 0.3988, which are 5.09% and 5.32% lower than those of HOSVD, the best-performing baseline. Therefore, the proposed method reduces the recommendation error to a certain extent and effectively improves recommendation accuracy.

Conclusions
In context-aware recommender systems, making full use of multiple types of context information can effectively improve recommendation accuracy. In this paper, we proposed a recommendation method based on multi-context-aware higher-order tensor factorization. First, users' sensitivities to the contexts were tested, and users were divided into two categories accordingly. For context-sensitive users, a four-dimensional tensor was constructed to model the relationship between users, items, daytypes, and seasons. Further, three feature matrices sharing different dimensions with the tensor were constructed to alleviate data sparsity. Compared with standard tensor factorization, higher-order singular value decomposition, and traditional matrix factorization, the proposed method achieves higher accuracy and better recommendation results. In future work, we will study multi-context-aware recommender systems in more depth in terms of the differing influence of each dimension of the high-order tensor.