Next location recommendation: a multi-context features integration perspective

Next location recommendation aims to mine users' historical trajectories to predict the locations they are likely to prefer at the next moment. Although previous studies have explored incorporating location or social contextual information for recommendation, they still suffer from several major limitations: (1) not fully considering the semantic associations between locations, (2) not considering the heterogeneity in preferences of socially linked users, and (3) not fully utilizing contextual information from distinct sources to further improve recommendation performance. In this paper, we propose a novel multi-context-based next location recommendation model that incorporates location context, trajectory context, and social context to obtain comprehensive representations of users' preferences while allowing for interactions between contexts. Specifically, we first develop an efficient method combining both high-order location graphs and location semantic graphs to characterize subtle associations between locations. Then we explore the social contextual information and introduce a location subgraph that accounts for heterogeneous preferences among friends. Finally, we use the LSTM and geo-dilated LSTM to capture the spatio-temporal associations in users' trajectories and integrate the various contextual information to improve model performance. Extensive experiments on three real datasets show that our model outperforms baseline methods on the next location recommendation task.

of user check-in data. Recently, the next location recommendation problem that predicts the location a user is most likely to visit in the next moment has received widespread attention. Researchers have proposed sequential analysis models for trajectory data which make use of users' past check-in locations to mine users' preferences and predict the next location accordingly [17,27,42,52]. However, user check-in data may be very sparse, making it difficult to train the model adequately and thus limiting the recommendation performance [11,37]. As such, various models incorporating location and social context have been developed to improve learning effectiveness [12,19,33,38]. Although existing solutions to next location problems have considered auxiliary information from multiple sources, they suffer from three major limitations.
- First, extant models focusing on location context, while capturing location influence [5,21] and spatio-temporal correlations [31,39], pay less attention to location semantics and semantic correlations among locations that reflect users' activity topics or preferences [3,45,54]. As shown in Figure 1, the trajectory of Tom consists of "music bar - school - music hall - bookstore". Although "music bar" and "music hall" do not occur consecutively in the sequence, a strong semantic correlation exists between the two locations because they both relate to the topic of music. That is, the user's current activity topic might be "music appreciation". Such semantic correlations between non-consecutive locations have not been fully identified in previous studies.
- Second, models that consider social context [6,18,20,21,31] mainly focus on retrieving topological patterns but neglect the heterogeneity in preferences of socially linked users. Although homophily plays an important role in forming friendships, friends do not always have similar preferences because various factors may contribute to the formation of friendships. For example, in Figure 1, Tom and Bob are classmates exhibiting high social proximity but are quite different regarding location preference: Tom prefers music-related locations while Bob prefers entertainment-related locations such as bars and cinemas. Therefore, we believe that preference consistency between users, rather than friendship itself, determines the behavior similarity between users.
- Third, although various embedding methods with auxiliary information have been proposed to learn representations that capture users' sequential movements [26,33,43,44,53], their ability to learn complex patterns, especially hierarchical structures, is limited [9,34]. Meanwhile, contextual information from distinct sources is assembled and mapped to a uniform space for representation learning [2,46], which limits the model's capacity to exploit the fine-grained intrinsic interactions between contexts for better performance.

Figure 1 Example of User Semantic Trajectory
To address the aforementioned issues, we propose a multi-context-based next location recommendation model (MCLR), which incorporates location context, trajectory sequences, and social context to obtain effective and comprehensive representations of users' preferences while allowing for interactions between contexts. Specifically, we construct both location semantic graphs and high-order location graphs to characterize the subtle associations between locations and apply a graph autoencoder to obtain sophisticated location embeddings. To mine users' location preferences from trajectory sequences, we use the LSTM and geo-dilated LSTM to analyze the spatio-temporal associations between users' points of interest (POI). Moreover, we dive into the information conveyed through social context and identify the preference consistency among friends based on mobility similarity. Interactions between friends at different locations are analyzed using a location subgraph consisting of user trajectories.
Overall, the main contributions of this study are summarized as follows:
1. We construct both a high-order location graph and a location semantic graph to capture the multidimensional correlations between locations, which enables the model to learn high-quality location representations.
2. We further consider the preference consistency between the target user and his/her friends when using social context for users' preference mining. Our approach can thus efficiently aggregate trajectory information of selected friends to improve the utilization of social information.
3. We propose an integrated model that can fully exploit various contextual information and adaptively fuse multiple features to improve learning performance. Moreover, extensive experiments on three real datasets demonstrate the superiority of our model on next location recommendation tasks.
The remainder of the paper is organized as follows. Section 2 reviews the related work. Section 3 introduces the model framework. Details of the proposed MCLR are presented in Section 4. Section 5 reports the experimental results. Finally, Section 6 concludes the paper.

Related work
Next location recommendation is an important topic in location-based services and has attracted much attention from scholars in recent years. Most existing methods focus on user trajectory information, direct data that reflects the spatio-temporal regularity of user activities, and predict the user's next location by analyzing spatio-temporal correlations among check-ins, with less consideration for other contextual factors. However, the sparsity of user check-ins limits the improvement such methods can bring to location recommendation. Graph representation learning methods can improve the effectiveness of different recommendation tasks by analyzing the interactions between different types of information and integrating multidimensional data features, and have thus been increasingly applied to the location recommendation scenario in recent years. In this section, we review related work on next location recommendation and on graph representation learning for location recommendation.

Next location recommendation
Next location recommendation aims to predict the user's next visited location based on his/her historical trajectory. Compared to the general location recommendation task, next location recommendation needs to consider location sequence relations. Markov models and their variants [13,36,41] are commonly used in this task. They calculate the probability of future check-ins by modeling the transition matrix between locations based on user trajectories. However, such methods can only capture local and static user preferences [29,38]. Inspired by the effectiveness of recurrent neural networks in modeling sequential data, various RNN-based models have been proposed for next location recommendation. For example, ST-RNN [50], CARA [30], and STAN [29] model the time interval and spatial distance between check-ins to capture the spatial and temporal characteristics of different locations. These methods effectively improve recommendation results through fine-grained analysis of spatio-temporal characteristics in trajectories, but still mainly tap into the short-term preferences of users [38]. To learn users' long-term preferences, researchers have developed methods such as HVAM [51], ARNPP-GAT [20], and the model of [38]. They use attention mechanisms or trajectory segmentation to determine the effect of historical or remote check-ins on users' current location preferences and thereby capture the variability of user preferences.
Research shows that contextual information plays an important role in location recommendation tasks [10,34], and thus context-based location recommendation models focusing on points of interest (POI) have gradually been developed. To extract location context features, GT-HAN [27] used three factors (the geo-influence of POIs, the geo-susceptibility of POIs, and the distance between POIs) to model the geographical co-influence between two POIs. STP-DGAT [21] constructed three location graphs to represent the complex correlations between locations and used GAT to learn node representations. In terms of social context, CGA [49] and ARNPP-GAT [20] designed different attention layers to learn the impact of friends on target users' preference prediction. Prior researchers have also developed multi-context-based recommendation models to fully exploit the value of different contextual features in user location preference prediction. For example, Chang et al. [2] designed a multi-attention network that integrates location and social information. Xiong et al. [42] proposed a factor graph model to incorporate social relationships, textual reviews, and POIs' geographical proximity. Huynh et al. [15] developed a multi-context embedding method to integrate multiple representations for each user in LBSNs, accounting for social relationships, check-in time, location category, and other information. However, these methods either focus on the sequential correlation between locations and lack identification of the semantic associations between locations, or do not consider the preference consistency between friends when using social relations.

Graph representation learning for location recommendation
Recently, graph representation learning has shown excellent capabilities on a variety of recommendation tasks. The basic idea is to treat users and items as nodes in a heterogeneous graph, and the interactions between users and items can be modeled as links that connect them. Therefore, the recommendation task can be formulated as a link prediction problem.
In this approach, an LBSN can be represented as a heterogeneous graph composed of user and location nodes; heterogeneous graph-based representation learning methods are then designed to map user and location nodes into a uniform low-dimensional space, and location recommendation can be accomplished by calculating the distance between user and location nodes [23,53,55]. For example, JRLM++ [53] extracted location-level and segment-level relatedness in check-ins to obtain location embeddings. Zhou et al. [55] proposed a multi-context trajectory embedding framework to incorporate various context information. LBSN2vec++ [44] viewed LBSNs as hypergraphs containing multiple types of nodes and relationships, and then used a random walk-and-stay approach to learn location representations.
Graph neural networks (GNNs) are also increasingly used in location recommendation tasks to capture the complex dependencies between location nodes. After user trajectory information is represented as a location graph, a GNN can effectively learn location relationship features [35,45]. For example, STP-DGAT [21] constructed a POI-POI spatial graph, a POI-POI temporal graph, and a POI-POI preference graph to represent the multiple correlations between locations and used a graph attention network to obtain a multiple-correlation-based location representation. Chang et al. [1] constructed a directed POI-POI graph according to users' check-ins, based on which a graph autoencoder was used to learn location embeddings that capture the ingoing and outgoing geographical influences between locations.

Model framework
The purpose of MCLR is to deeply analyze the intrinsic interaction of different contextual information to capture multi-dimensional contextual features and then integrate those features to improve the location recommendation performance. The overall structure is shown in Figure 2. MCLR consists of three modules that are designed for location context, trajectory context, and social context, respectively.
The location context module aims to identify high-order correlations and semantic correlations between locations. High-order correlations indicate deep correlations between locations that are not directly reflected in user trajectories. For instance, two locations visited by users with similar trajectories should have an implicit relationship implying similar latent traits [15,40]. Analyzing such patterns can help identify global associations among locations, alleviating the problem that sparse individual check-in data cannot provide sufficient information for mining user preferences. We represent the check-in records of different users as a bipartite graph and use metapath-based random walks [7,25] to obtain location sequences, based on which a high-order location graph is constructed to characterize the high-order correlations between locations. Semantic correlations refer to the functional or thematic association between locations. Following extant literature [28,45], we calculate the semantic relationships between locations based on their categories. It has been shown that the higher the transition probability between two categories in the semantic sequences, the higher the semantic correlation of the two categories [45]. Therefore, we calculate the semantic relevance based on the number of co-occurrences between adjacent categories and then build the location semantic graph. After obtaining the high-order location graph and the location semantic graph, we apply a graph autoencoder to learn location representations, which are fed into the subsequent modules as pre-trained location features.

Figure 2 The overall framework of MCLR
User trajectory is the most direct and effective information for discovering users' location preferences. Inspired by [38], we consider users' long-term and short-term preferences and use LSTM and geo-dilated LSTM to extract the corresponding representations based on the spatio-temporal correlations of different check-in sequences. The overall user trajectory is divided into sub-trajectory sequences. For each sequence, we use LSTM to identify the sequential correlation between consecutive check-in locations and use geo-dilated LSTM to identify the spatial correlation between non-consecutive check-in locations. The sub-trajectory representation that shows the highest correlation with the target user's current location is selected to predict the current preference of the user.
Social relationships affect users' preferences, but the variability in preferences between the target user and his/her friends may attenuate the effect of the social relationships on user's preference prediction. Therefore, we consider the mobility similarity between users as a supplement to user friendships. For example, users A and B exhibit high mobility similarity if both of them would visit entertainment locations after work. We construct a location social context module to calculate the user mobility similarity and distinguish the influence of friends on the target user's preferences by analyzing the interaction between their check-in locations. Firstly, the social context module takes trajectories from the target user and his/her friends as input and models the locations contained in those trajectories as subgraphs derived from the high-order location network generated by the location context module. Then, the label that marks each location's structural role is calculated. Based on the labeled subgraph, we use a graph neural network to learn the similarity between users, which can be used as the weight to integrate friends' preferences when updating the target user's preference. The derived social-based user preference representation serves as input for next location prediction.

Model detail
In this section, we first introduce the preliminaries that are necessary to formulate the problem and develop the algorithms. Then, we explain the algorithm of each module in detail. After that, we show how to integrate multiple graphs for representation learning.

Problem Definition
The input of MCLR includes social relationships and user check-in records, which are represented by a social graph and a user-location bipartite graph in this paper. The formal definition of each concept is as follows:

Definition 1 Social graph: The social relationships between users are modeled as a graph G U = (V U , E U ), where V U denotes the set of users and E U denotes the edges between users. Given two users u i and u j , if they are friends, then there exists an edge e U (u i , u j ) ∈ E U indicating the friendship.

Definition 2 User trajectory:
The temporally ordered check-ins of user u ∈ V U are represented as τ (u) = {l 1 , l 2 , l 3 , ..., l n u }, where each location l is identified by its unique location ID and geocoded by a (longitude, latitude) tuple, i.e., (lon l , lat l ), and n u is the number of check-ins in τ (u). The set of trajectories of all users is denoted as τ (U).
To identify the semantic relations between different location categories, we construct user semantic trajectory as follows:

Definition 3 User semantic trajectory:
In the user's trajectory τ (u) = {l 1 , l 2 , l 3 , ..., l n u }, each location belongs to a specific category c j . Then, each location trajectory can be represented as a semantic trajectory consisting of different location categories τ C (u) = {c 1 , c 2 , ..., c m }, where m is the length of the semantic trajectory. All users' semantic trajectories form a set τ C (U ).
Based on all users' location trajectories τ (U ), we further construct the user-location bipartite graph.

Definition 4 User-Location bipartite graph:
A User-Location Bipartite graph G UL = (V U , V L , E UL ) is a bipartite graph where V U denotes the user set and V L denotes the location set, E UL is the set of edges between users and locations. If a user u has visited a location l, there is an edge e UL (u, l) ∈ E UL between them.
Problem definition: Given the social graph G U and all users' trajectories τ (U), for a user u ∈ V U with historical trajectory τ (u) = {l 1 , l 2 , l 3 , ..., l t }, where l t is the most recent location that u has visited, next POI recommendation aims to recommend the top-k locations that u may be interested in at the next time step t + 1.
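To make Definition 4 concrete, the following is a minimal sketch of building the user-location bipartite graph G_UL from raw trajectories; the function and variable names are illustrative, not from the paper:

```python
from collections import defaultdict

def build_bipartite_graph(trajectories):
    """Build the user-location bipartite graph G_UL from user trajectories:
    an edge (u, l) exists iff user u checked in at location l at least once."""
    edges = set()
    for u, locs in trajectories.items():
        for l in locs:
            edges.add((u, l))
    # Adjacency views for both node types of the bipartite graph.
    user_to_locs = defaultdict(set)
    loc_to_users = defaultdict(set)
    for u, l in edges:
        user_to_locs[u].add(l)
        loc_to_users[l].add(u)
    return user_to_locs, loc_to_users
```

Repeated check-ins at the same location collapse into a single edge, matching the definition that an edge exists if the user "has visited" the location.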

Location context module
The proposed module aims to capture high-order correlations and semantic correlations between locations for effective location representations. The modeling process is shown in Figure 3. It consists of two parts, high-order location relation learning and location semantics mining.
High-order location graph This step aims to infer the potential correlations between locations that do not necessarily appear in the trajectory of the same user. Specifically, we construct a user-location bipartite graph based on the check-in records of all users. For each location in G UL , we conduct a heterogeneous random walk along the bipartite graph on the pre-defined metapath. Specifically, we consider the metapath "LUL", which represents two locations (L) visited by the same user (U). In each step, the walker travels to one of its neighbors with equal probability. For instance, starting from location l i , we randomly select a user u j from those who have visited l i with equal probability. After that, we randomly select a location l k from the trajectory of user u j with equal probability. This way, a random walk sequence containing both locations and users can be obtained after several iterations. We retain only the location nodes in each sequence to examine the associations between locations that do not appear in the same user's trajectory but are visited by similar users. Based on multiple trials of random walks, we calculate the proximity of each location pair in terms of co-occurrence and choose the top 5 (top N = 5) neighbors of each location. Then we construct a dense location network G L , in which each link indicates a high-order correlation between two locations.

Location semantic graph Studies have shown that the continuity of location categories in user check-ins reveals the semantic relevance between locations [45], thereby reflecting the semantics and theme of user activities [54]. As such, a graph with location categories can capture the transitions between categories and further explore the temporal pattern in user preferences [24]. Therefore, we incorporate categories as a supplement to check-in location information to capture the intrinsic similarity between locations and improve the learning effect of user preferences.
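One possible reading of the "LUL" metapath procedure for the high-order location graph is sketched below: uniform heterogeneous walks over the bipartite graph, followed by co-occurrence counting and top-N neighbor selection. The names and the exact co-occurrence counting (consecutive location pairs within a walk) are illustrative assumptions:

```python
import random
from collections import defaultdict

def lul_random_walk(user_locs, loc_users, start, steps=20):
    """One metapath-guided walk alternating L -> U -> L; only locations kept."""
    seq = [start]
    loc = start
    for _ in range(steps):
        users = loc_users.get(loc)
        if not users:
            break
        u = random.choice(users)           # uniformly pick a visitor of `loc`
        loc = random.choice(user_locs[u])  # uniformly pick a location of u
        seq.append(loc)
    return seq

def build_high_order_graph(user_locs, top_n=5, walks_per_loc=10, steps=20):
    """Count location co-occurrences over many walks and keep the
    top-N neighbors per location (top N = 5 in the paper)."""
    loc_users = defaultdict(list)
    for u, locs in user_locs.items():
        for l in set(locs):
            loc_users[l].append(u)
    cooc = defaultdict(lambda: defaultdict(int))
    for l in loc_users:
        for _ in range(walks_per_loc):
            walk = lul_random_walk(user_locs, loc_users, l, steps)
            for a, b in zip(walk, walk[1:]):
                if a != b:
                    cooc[a][b] += 1
                    cooc[b][a] += 1
    return {l: sorted(nbrs, key=nbrs.get, reverse=True)[:top_n]
            for l, nbrs in cooc.items()}
```

The returned neighbor lists define the edges of the dense location network G_L.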
To obtain the location semantic features, we first calculate the category correlations based on users' semantic trajectories τ C (U ), and then build the location semantic graph G C . Specifically, for two categories c 1 and c 2 , if they are adjacent in u's semantic trajectory τ C (u), there is a sequential semantic correlation between these two categories [45,47,54]. The strength of the correlation can be measured by the number of occurrences of the adjacent category pair < c 1 , c 2 >. We define the semantic correlation between category pairs as (1), where Rel(c i , c i+1 ) refers to the semantic correlation score between categories c i and c i+1 , and f re i,i+1 is the number of co-occurrences of categories c i and c i+1 . The detailed calculation process is shown in Algorithm 1. Specifically, based on τ C (U ), the method traverses all semantic trajectories to extract adjacent category pairs and obtain the category pair set O C . Then, for each adjacent category pair < c i , c i+1 >, we calculate its number of co-occurrences in all users' trajectories and the correlation Rel(c i , c i+1 ).

Algorithm 1 Location semantic correlation calculation
After getting the location relevance scores, we construct the location semantic graph G C = (V C , E C , Φ C ), where V C refers to location categories. For each pair of categories < c p , c q >, if it belongs to the pair set O C , there exists an edge e (p,q) ∈ E C , with φ (p,q) ∈ Φ C as the weight and φ (p,q) = Rel(c p , c q ).
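The pair-counting logic of Algorithm 1 can be sketched as follows. Since the exact normalization in (1) is not reproduced here, the sketch assumes co-occurrence counts normalized by the total number of adjacent pairs; treat that normalization, and all names, as assumptions:

```python
from collections import Counter

def semantic_correlation(sem_trajs):
    """Traverse all semantic trajectories, count adjacent category pairs,
    and turn counts into correlation scores Rel(c_i, c_j)."""
    freq = Counter()
    for traj in sem_trajs:
        for a, b in zip(traj, traj[1:]):  # adjacent category pairs <c_i, c_{i+1}>
            freq[(a, b)] += 1
    total = sum(freq.values())
    return {pair: n / total for pair, n in freq.items()}

def build_semantic_graph(sem_trajs):
    """Weighted edge list (c_p, c_q, phi) for the location semantic graph G_C."""
    rel = semantic_correlation(sem_trajs)
    return [(p, q, w) for (p, q), w in rel.items()]
```

Each weighted edge corresponds to an observed adjacent category pair in O_C, with the correlation score as its weight φ.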
Location representation learning Based on the high-order location graph G L and the location semantic graph G C , we use an unsupervised graph autoencoder (GAE) [16] to learn location representations that contain both the high-order correlation and semantic features of locations.
GAE is an unsupervised graph neural network for graph node embedding. It consists of an encoder and a decoder: the encoder takes the graph adjacency matrix as input and learns node feature vectors through graph convolution operations, while the decoder takes the node vectors as input to reconstruct the adjacency matrix. To apply the GAE to the weighted graph G C , we modify the input adjacency matrix and present the process in (2), where Ã C = D̃ −1/2 (A C + I )D̃ −1/2 , A C is the weighted adjacency matrix of G C whose element a i,j equals the weight φ (i,j ) of the edge e (i,j ) between row node c i and column node c j , I ∈ R |V C |×|V C | is the identity matrix, and |V C | is the number of categories in V C . D̃ is a diagonal matrix with D̃ ii = Σ j (A C + I ) ij , W 0 and W 1 are weight matrices, and ReLU is the nonlinear activation function. Through (2), we can obtain location category embeddings Z C ∈ R |V C |×d C , where each row z i is the embedding of category c i and d C is the embedding dimension. Then, the decoder reconstructs the adjacency matrix by (3).
To obtain effective node representations, we reconstruct the matrix Â C such that the dissimilarity between Â C and A C is minimized. Specifically, we use the mean square error between the predicted scores of node pairs in Â C and the true correlation weights in A C as the loss function for model optimization, where |V C | is the number of nodes in G C , a C i,j is the element of matrix A C with index (i, j ), and â C i,j is the element at the corresponding position in matrix Â C . We use the location category embedding z i as the semantic attribute of each location l whose category is c i . The semantic embeddings of all locations in G L form the feature matrix X L ∈ R |V L |×d C . After that, a traditional GAE, which takes the location adjacency matrix A L and location features X L as inputs, is used to learn the final location embeddings Z L ∈ R |V L |×d C . To make the reconstructed adjacency matrix Â L = σ (Z L Z T L ) and the original matrix A L as close as possible, the loss function is defined accordingly, where |V L | is the number of locations in V L , a L i,j is the true value in the adjacency matrix A L , and â L i,j is the value of the corresponding element in Â L . To obtain more accurate location representations, we use an end-to-end approach for the joint optimization of (4) and (5). The total loss function L is shown as (6):
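A minimal NumPy sketch of the weighted-GAE forward pass and MSE reconstruction loss described around (2)-(5); the training loop, weight initialization, and layer sizes are omitted, and all names are illustrative:

```python
import numpy as np

def normalize_adj(A):
    """Compute A_tilde = D^{-1/2} (A + I) D^{-1/2} for a weighted adjacency."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gae_forward(A, X, W0, W1):
    """Two-layer GCN encoder plus inner-product decoder of a GAE."""
    A_norm = normalize_adj(A)
    H = np.maximum(A_norm @ X @ W0, 0.0)       # ReLU graph convolution layer
    Z = A_norm @ H @ W1                        # node embeddings
    A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))   # sigmoid(Z Z^T) reconstruction
    return Z, A_rec

def mse_loss(A, A_rec):
    """Mean squared error between true and reconstructed adjacency entries."""
    n = A.shape[0]
    return float(((A - A_rec) ** 2).sum()) / (n * n)
```

The two losses of the paper would instantiate `mse_loss` once for (A_C, Â_C) and once for (A_L, Â_L), then be summed for joint optimization.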

Trajectory context module
To conduct an in-depth analysis of the spatio-temporal correlations in the check-in data, we select the historical check-ins that are most relevant to the user's current context to improve the modeling of the user's current preferences. We apply LSTM and geo-dilated LSTM to learn temporal and spatial correlations in user trajectories, respectively. We split each user's trajectory τ (u) into temporally ordered trajectory sequences τ (u) = {S u 1 , S u 2 , ..., S u n−1 , S u n }, in which each sequence S u h contains several check-ins of user u in the time window indexed by h, i.e., S u h = {l 1 , l 2 , ..., l |S u h | }. S u n is the most recent trajectory sequence, containing the user's current location. By analyzing the temporal and spatial correlations between the user's historical sequences {S u 1 , S u 2 , ..., S u n−1 } and the current sequence S u n , we aim to extract the most relevant historical information to represent the user's current preferences.
For each location sequence in S u = {S u 1 , S u 2 , ..., S u n }, we can obtain the sequence representation learned by a standard LSTM with the input S u h = {l 1 , l 2 , ..., l |S u h | } as in (7), where h t is the hidden state of the LSTM and z t ∈ R d×1 is the learned feature embedding of the t-th location l t in S u h . Through (7), each sequence S u h = {l 1 , l 2 , ..., l |S u h | } in {S u 1 , S u 2 , ..., S u n−1 } can be represented as {h 1 , h 2 , ..., h |S u h | }. Considering that location popularity is time-varying, i.e., the popularity of the same location varies as time proceeds, we apply a time-weighted operation to integrate the time property [37,38]. Specifically, the time in a week is divided into 48 slots (24 slots for hours on weekdays and 24 slots for weekends). The time similarity η ij between the i-th and j-th time slots is calculated as in (8), where H i is the set of locations visited by at least one user in time slot i. Two time slots tend to be similar if their location sets overlap more. For each sequence S u h = {l 1 , l 2 , ..., l |S u h | }, we can generate a time slot sequence {p l 1 , p l 2 , ..., p l n } to mark the time of each check-in, where n is the length of S u h . After that, given the current time slot p l c of the target user, we can calculate the sequence representation of S u h by (9), where η p lc ,p l j is the time similarity between the current time slot p l c and the slot p l j that location l j belongs to. The motivation of (9) is that, within a sequence S u h = {l 1 , l 2 , ..., l |S u h | }, the check-ins whose slots are more similar to the current slot are likely to have a more substantial impact on the user's current preference prediction. Based on (9), we obtain the sequence representation s u h for each trajectory sequence S u h ∈ S u ; all historical sequence representations of u are denoted as {s u 1 , s u 2 , ..., s u n−1 }.
For the user's current sequence S u n , we use average pooling over time instead of time weights to calculate the sequence representation s u n . The motivation is that all locations in the user's current sequence have a significant impact on the user's current preferences. Therefore, s u n can be calculated as in (10), where |S u n | is the length of the user's current sequence. From (7)-(10), we can obtain all sequence representations {s u 1 , s u 2 , ..., s u n } for {S u 1 , S u 2 , ..., S u n }. Based on these, we further analyze the sequence-level similarity and spatial correlation between trajectories to select the most relevant sequences for the user's current preference prediction.
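The time-slot weighting of (8)-(10) can be sketched as below. The overlap measure in (8) is assumed here to be a Jaccard similarity over the slots' location sets, which is one plausible reading of the text; all function names are illustrative:

```python
import numpy as np

def slot_similarity(H):
    """Similarity eta_ij between time slots from location-set overlap
    (Jaccard similarity, an assumed form of eq. (8))."""
    n = len(H)
    eta = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = H[i] | H[j]
            eta[i, j] = len(H[i] & H[j]) / len(union) if union else 0.0
    return eta

def sequence_repr(hidden, slots, cur_slot, eta):
    """Time-weighted sequence representation in the spirit of (9): weight each
    check-in's hidden state by the similarity of its slot to the current slot."""
    w = np.array([eta[cur_slot, p] for p in slots])
    w = w / (w.sum() + 1e-12)
    return (w[:, None] * hidden).sum(axis=0)

def current_sequence_repr(hidden):
    """Average pooling over the current sequence, eq. (10)."""
    return hidden.mean(axis=0)
```

Here `hidden` is the stack of LSTM hidden states {h_1, ..., h_|S|} of one sequence, and `slots` maps each check-in to its time slot index.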
The sequence-level similarity between the user's current sequence representation s u n and a historical sequence representation s u h ∈ {s u 1 , s u 2 , ..., s u n−1 } is calculated as (11). Then, the sequence-based user preference s u * is given by (12), where W h is the parameter matrix. By (12), we can effectively distinguish the influence of the previous n − 1 sequences on the current sequence s u n and thus obtain a sequence-based representation of the user's preferences s u * . In the spatial similarity analysis, we calculate the spatial distance between the historical trajectories {S u 1 , S u 2 , ..., S u n−1 } and the user's current location l t to enrich the user's preferences. For a trajectory sequence S u h that contains multiple locations, we calculate the central coordinate by (13)-(14).
where lat l 1 and lon l 1 are the latitude and longitude of l 1 , respectively. Then the distance between S u h and l t can be calculated by (15). Based on that, we calculate the spatial similarity-based user preference s u + as (16), where s̃ u n = s u * + h t , s u * is the sequence-based user preference learned from (12), and h t is the hidden state of the current location.
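A sketch of the spatial part around (13)-(15): the central coordinate as a simple centroid and a haversine distance to the current location, with an assumed inverse-distance weighting standing in for the exact similarity form; all names are illustrative:

```python
import math
import numpy as np

def central_coordinate(seq_coords):
    """Central coordinate of a trajectory sequence: the mean (lat, lon) of its
    check-ins (an assumed simple centroid for eq. (13)-(14))."""
    lats = [c[0] for c in seq_coords]
    lons = [c[1] for c in seq_coords]
    return sum(lats) / len(lats), sum(lons) / len(lons)

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def spatial_preference(hist_reprs, hist_coords, cur_loc):
    """Weight historical sequence representations by inverse distance between
    their centroid and the current location (assumed form of the weighting)."""
    d = np.array([haversine_km(central_coordinate(c), cur_loc)
                  for c in hist_coords])
    w = 1.0 / (1.0 + d)
    w = w / w.sum()
    return (w[:, None] * np.stack(hist_reprs)).sum(axis=0)
```

Sequences whose centroids are geographically closer to the current location thus contribute more to the spatial preference vector.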
It has been shown that there is a strong spatial correlation between users' recent check-ins. A fine-grained analysis of users' recent check-in locations can effectively capture short-term preferences and improve the effectiveness of next location prediction [22,27]. Thus, for the user's current sequence S u n , we adopt a solution similar to the geo-dilated LSTM [38] to capture the spatial relations between recent check-in locations.
Unlike the traditional LSTM, the input of the geo-dilated LSTM is a location sequence rearranged by geographic distance, so the user's next check-in is influenced by the geographically closest locations. Specifically, given the current sequence S u n = {l 1 , l 2 , ..., l |S u n | }, we first select locations from S u n based on geographical distance to form the reconstructed sequence. For example, for S u n = {l 1 , l 2 , l 3 , l 4 , l 5 }, l 3 has two preceding locations, l 1 and l 2 . The geographical distance between l 1 and l 3 is smaller than that between l 1 and l 2 , indicating a geo-dilated pair {l 1 , l 3 }; in this way, the geo-dilated sequence of S u n can be constructed as S geo n = {{l 1 , l 3 }, {l 3 , l 5 }}. After that, the input for the geo-dilated LSTM can be represented as {{z 1 , z 3 }, {z 3 , z 5 }}, where z 1 is the embedding of location l 1 . The learning process of the geo-dilated LSTM is given by (17), where h t is computed from the last pair {z δ , z t }, δ is the skip length determined by geographical factors, and h δ is the hidden state of the last geo-dilated pair. Meanwhile, we use (7) to capture the temporal relevance in S u n and obtain the latent representation h t at the current location l t . The final representation of the user's short-term preferences based on spatio-temporal correlation can be expressed as (18). Through the above process, we can obtain the final user preference representation u:
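One assumed reading of the geo-dilated rearrangement: starting from the first check-in, repeatedly jump to the geographically nearest later location, which reproduces the pairs {l 1 , l 3 }, {l 3 , l 5 } from the example. Names and the exact pairing rule are illustrative:

```python
import math

def _geo_dist(p, q):
    """Great-circle distance (km) between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def geo_dilated_sequence(seq, coords):
    """Rearrange a check-in sequence by geographic proximity: from the current
    location, jump to the geographically nearest later location, producing the
    skip pairs fed to the geo-dilated LSTM."""
    pairs, t = [], 0
    while t < len(seq) - 1:
        later = range(t + 1, len(seq))
        nxt = min(later, key=lambda j: _geo_dist(coords[seq[t]], coords[seq[j]]))
        pairs.append((seq[t], seq[nxt]))
        t = nxt
    return pairs
```

Each emitted pair (l_δ, l_t) corresponds to one geo-dilated LSTM input {z_δ, z_t}, with the skip length δ determined by the geography rather than a fixed dilation.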

Social context module
The framework of this module is illustrated in Figure 4. Given a user pair composed of a target user $u_i$ and one of his friends $u_j$, the locations visited by $u_i$ or $u_j$ can be represented as a location subgraph $G_{ij} = (V_{ij}, E_{ij})$ (20), where $V_{ij}$ is the set of locations visited by user $u_i$ or $u_j$, and $E_{ij} \subseteq E_L$ are the edges between $V_{ij}$ in the location graph $G_L$. The trajectory-based user similarity can then be calculated by analyzing the interactions between the locations in this subgraph. In the subgraph $G_{ij}$, different nodes play different roles in representing user similarity; for example, co-occurrence location nodes better represent the location preference similarity of the user pair $(u_i, u_j)$. To distinguish the roles of different locations from a structural view, we utilize the location-trajectory distance, calculated by (21), to mark the structural role of each location.
where $d(l, k)$ is the location-wise distance, i.e., the length of the shortest path between $l$ and $k$ on the graph $G_L$, and $d_{lt}(l, \tau(u))$ is the location-trajectory distance from a location $l$ to the trajectory $\tau(u)$. Then, we use the location-trajectory distance to mark each location node in $G_{ij}$. Specifically, for $l_p \in \tau(u_i)$, the label of $l_p$ is calculated as $d_{lt}(l_p, \tau(u_j)) = \min_{l_q \in \tau(u_j)} d(l_p, l_q)$.
Obviously, for co-visited locations $d_{lt} = 0$, while for isolated locations $d_{lt} = \infty$. The more locations with small location-trajectory distances, the more similar the two users are in mobility preferences [40]. The detailed node labeling method is as follows: 1) A co-visited location always has the distinctive label "1", such as the label of $l_4$ in Figure 4.
2) The labels of non-co-occurring locations are derived from the location-trajectory distance in the subgraph. For example, for $l_1 \in \tau(u_i)$ in Figure 4, we first calculate the shortest location-wise distance from it to each location that belongs to $u_j$ in $G_{ij}$ and obtain $d(l_1, l_6) = 2$, $d(l_1, l_7) = 2$, $d(l_1, l_8) = 1$, so $d_{lt}(l_1, \tau(u_j)) = d(l_1, l_8) = 1$. We then increase $d_{lt}(l_1, \tau(u_j))$ by one and use it to label $l_1$, distinguishing it from co-visited locations. Similarly, $l_1$, $l_2$, $l_3$ are labeled 2, 2, 3, respectively. 3) Isolated nodes with $d_{lt} = \infty$ are given the null label 0.
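The labeling rules above can be sketched with a breadth-first search over the location subgraph; the function names and adjacency-list representation are ours:

```python
from collections import deque

def shortest_dist(adj, src, dst):
    """BFS shortest-path length between two nodes; None if unreachable."""
    if src == dst:
        return 0
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

def label_location(adj, l, other_traj):
    """Label one user's location by its distance to the friend's trajectory:
    co-visited -> 1, reachable -> min distance + 1, isolated -> 0."""
    if l in other_traj:
        return 1
    dists = [shortest_dist(adj, l, q) for q in other_traj]
    dists = [d for d in dists if d is not None]
    return min(dists) + 1 if dists else 0
```

On a toy subgraph mimicking the Figure 4 example, a location at shortest distance 1 from the friend's trajectory receives label 2, as described in the text.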
After obtaining the labels, a one-hot encoding vector $e_i \in \mathbb{R}^d$ is used to encode $l_i$'s label.
Based on the labeled location subgraph, we design a solution similar to DGCNN [48] to learn the subgraph representation as the mobility pattern similarity between the two users. DGCNN consists of several graph convolution layers and a graph aggregation layer. The convolution layers update node embeddings through information passing and aggregation between graph nodes. The $m$-th graph convolutional layer is $Z^{m+1} = f(\tilde{D}^{-1}\tilde{A}Z^m W^m)$, where $\tilde{A} = A + I$, $A$ is the adjacency matrix, and $I$ is the identity matrix. $\tilde{D}$ is the diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Let $d_m$ denote the embedding size of the $m$-th layer; then $W^m \in \mathbb{R}^{d_m \times d_{m+1}}$ is the trainable parameter matrix of the $m$-th layer, projecting the node features into a new space, and $f$ is a nonlinear activation function. $Z^m \in \mathbb{R}^{n \times d_m}$ is the output of the $m$-th convolution layer, and $Z^0$ denotes the initial node features containing the location node embeddings $Z_L$ learned in Section 4.1. The structural one-hot label embeddings are fused by an MLP: where $z^0_i$ is the $i$-th row vector of $Z^0$ for location $l_i$. After propagating through several graph convolutional layers, the node embeddings of each layer are concatenated as the final node representations, and a graph aggregation layer is used to obtain the graph's representation. To emphasize the structural roles of distinctive locations, in the SortPooling layer of DGCNN we sort all nodes $V_{ij}$ of the subgraph $G_{ij}$ in descending order of location-trajectory distance and select the top-$k$ node representations as input to a traditional 1-D CNN, which generates the user pair's final mobility similarity $v^\tau_{ij}$. $k$ is a hyperparameter that is discussed in the experiments.
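A single DGCNN-style propagation step can be sketched in a few lines of numpy. This is a simplified illustration only (the tanh activation and function name are our assumptions), not the authors' exact implementation:

```python
import numpy as np

def gcn_layer(A, Z, W):
    """One propagation step: Z' = tanh(D^-1 (A + I) Z W), where D is the
    diagonal degree matrix of the self-loop-augmented adjacency A + I."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))
    return np.tanh(D_inv @ A_tilde @ Z @ W)
```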
$v^\tau_{ij}$ effectively represents the mobility similarity between a target user and his friends, so we use it as the weight to integrate friends' preferences into the target user's preference and derive the social-based user preference representation.
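The similarity-weighted integration of friends' preferences could look like the following sketch; the softmax normalisation and all names here are our assumptions, since the paper's exact aggregation equation is not reproduced in this excerpt:

```python
import numpy as np

def social_preference(friend_prefs, similarities):
    """Weight each friend's preference vector by its (softmax-normalised)
    mobility similarity and sum.  friend_prefs: (n_friends, dim) array;
    similarities: (n_friends,) array of v_ij scores."""
    w = np.exp(similarities - np.max(similarities))  # stable softmax
    w = w / w.sum()
    return (w[:, None] * friend_prefs).sum(axis=0)
```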

Training process
After obtaining the user preference, we compute the probability distribution $r$ over all $|V_L|$ locations by (27).
The objective function can be formulated as the negative log-likelihood $\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\log r_n$, where $N$ is the total number of training samples and $r_n$ is the probability the model assigns to the ground-truth location of the $n$-th training sample.
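The objective described here (negative log-likelihood of the ground-truth location under a softmax over all locations) can be sketched in numpy; the function name is ours:

```python
import numpy as np

def nll_loss(logits, targets):
    """Mean negative log-likelihood over a batch: softmax over all
    locations, then -log of the ground-truth location's probability.
    logits: (batch, n_locations); targets: (batch,) integer indices."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))
```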

Experiment
In this section, we conduct extensive experiments to demonstrate the effectiveness of MCLR. The experiment setup, including data, evaluation metrics, and baseline models, is introduced in Section 5.1. In Section 5.2, we present the overall performance of our model in comparison with a comprehensive set of benchmarks. The results of the ablation experiment and parameter sensitivity analysis are also reported in Section 5.2.

Dataset
We conduct experiments on a real LBSN dataset collected from Foursquare [44], which contains social relations and detailed user check-in records, including check-in time, location ID, category ID, and location coordinates. Considering differences in regions and cultures, we select three different cities with numerous check-ins: NYC, Sao Paulo (SP), and Jakarta (JK). The statistics of the three datasets are shown in Table 1.
Following existing research [38], we divide each user's trajectory into sequences according to check-in date: all check-ins within one day form a check-in sequence. Each trajectory can then be represented as a sequence of check-ins ordered by date. We remove sequences with fewer than three check-ins and filter out users with fewer than three sequences. For each user, we use 80% of the sequences for training and the remaining 20% for testing.
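The preprocessing steps can be sketched as follows; the function name, input format, and parameter names are our assumptions:

```python
from collections import defaultdict

def build_sequences(checkins, min_len=3, min_seqs=3):
    """Group each user's check-ins by calendar day, drop sequences shorter
    than `min_len`, and drop users with fewer than `min_seqs` remaining
    sequences.  `checkins` is a list of (user, date, location) tuples."""
    by_user_day = defaultdict(lambda: defaultdict(list))
    for user, date, loc in checkins:
        by_user_day[user][date].append(loc)
    result = {}
    for user, days in by_user_day.items():
        seqs = [locs for _, locs in sorted(days.items()) if len(locs) >= min_len]
        if len(seqs) >= min_seqs:
            result[user] = seqs
    return result
```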

Evaluation metrics
Two commonly used metrics, Recall@K and NDCG@K (Normalized Discounted Cumulative Gain), are used to evaluate model effectiveness. They are defined as follows: where $M$ is the total number of users, $R(u)$ denotes the recommendation list generated by the model, $T(u)$ denotes the list of ground-truth locations visited by the user, and $K$ is the length of the recommendation list, i.e., the $K$ locations the user is most likely to visit. Recall@K measures the presence of the correct locations in the recommendation list; larger values are better. NDCG@K also evaluates the recommendation list: it is calculated from $DCG_u@K$, the value gain of each position in the list, where $rel_i$ is one if the $i$-th recommended location is relevant and zero otherwise. The larger NDCG@K is, the better the recommendation algorithm performs. We choose $K = \{1, 5, 10\}$ for evaluation.
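With binary relevance, the two metrics can be computed per user as in this sketch (function names are ours):

```python
import math

def recall_at_k(recommended, ground_truth, k):
    """Fraction of ground-truth locations present in the top-k list."""
    hits = len(set(recommended[:k]) & set(ground_truth))
    return hits / len(ground_truth)

def ndcg_at_k(recommended, ground_truth, k):
    """DCG with binary relevance, normalised by the ideal DCG."""
    rel = [1 if loc in ground_truth else 0 for loc in recommended[:k]]
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rel))
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(ground_truth), k)))
    return dcg / ideal if ideal else 0.0
```

For example, a single relevant location ranked first yields both Recall@K and NDCG@K of 1.0, while ranking it second lowers NDCG but not Recall.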

Baseline selection
We propose a multi-context-based model that integrates location, trajectory, and social contextual information to improve next location recommendation. To verify the effectiveness of the model, we select several cutting-edge next location recommendation models as benchmarks.
- Markov: Markov models are widely used in trajectory prediction tasks [4,32]. The model treats all the user's visited locations as "states" and represents the first-order transition probabilities between different states by building a transition matrix.
- LSTM [14]: LSTM can capture the sequential dependencies between check-ins and has shown effectiveness in handling sequential data.
- TMCA [19]: TMCA employs two attention mechanisms to fuse context features and spatio-temporal transitions between user check-ins for next location preference prediction.
- CARA [30]: CARA is also a context-based recommendation model. It leverages two GRU gates to capture the ordinary context and transition context information from user check-ins for dynamic preference learning.
- DeepMove [8]: DeepMove utilizes a multi-modal embedding layer to learn the representation of user trajectories and uses an attentional recurrent network to identify the check-in information in the user's historical trajectory that is most relevant to the current trajectory for next location prediction.
- STGN [52]: STGN adds spatio-temporal gates to LSTM to capture the spatio-temporal relationships between successive check-ins.
- LSTPM [38]: LSTPM considers the spatio-temporal correlations between discontinuous check-in locations; it uses a geo-dilated LSTM to learn the spatial correlations between a user's discontinuous check-in locations, while a non-local LSTM is used to learn the similarities between different trajectory sequences. We also adopt the geo-dilated LSTM module and the non-local method for spatio-temporal correlation analysis of trajectories.
Parameter setting

In our experiment, we set the dimension of the hidden layer to 500 and use Adam for model training. The learning rate is set to 0.0001 and the batch size to 32. The effect of different parameters on the model is analyzed in the parameter sensitivity analysis. For each baseline, we use the code and parameter settings provided in the original paper.

Overall performance
Experiment results are shown in Tables 2, 3 and 4, where bolded values are the best results for each metric and underlined values are the best baselines. From Tables 2-4, we can infer the following: (1) Analyzing user check-in periodicity helps improve location recommendation. DeepMove uses an attention mechanism to identify multiple periodicity features in user check-ins, and its results are better than the LSTM model, which only learns location sequence relationships within the trajectory. (2) Distinguishing the intrinsic characteristics of different contexts matters. TMCA maps user check-in time, location category, and other context information in a unified manner and uses the obtained representations as input for the subsequent process; it therefore cannot distinguish the intrinsic characteristics of each piece of contextual information and its specific impact on user preferences. CARA takes into account the spatial and temporal intervals between users' check-ins and uses a specially designed gate layer to capture users' dynamic preferences reflected in different transition contexts, thus improving on TMCA. (3) Modeling the spatial relationships between discontinuous check-ins is beneficial. LSTPM considers the spatial relationships between discontinuous check-ins and compares the spatio-temporal correlations between different historical check-in sequences to filter the information most useful for predicting users' current preferences. It therefore achieves better results than STGN, which only considers consecutive locations. Drawing on this idea, MCLR also uses a geo-dilated LSTM to learn the spatial correlations between non-consecutive check-in locations. (4) MCLR achieves the best results among all models, demonstrating the need to identify the intrinsic characteristics of different contextual factors and analyze the different effects of each context on user preference mining.
MCLR designs specific modules for distinctive contextual factors, including multiple location relationships, the spatio-temporal correlation of trajectories, and social influence, and thus achieves the best results. We also compare the runtime of MCLR with the best baseline, LSTPM; both are run on a Linux server with 64 Intel(R) CPUs and 4 Titan X Pascal 12GB GPUs. The convergence time of MCLR is slightly longer than that of LSTPM (by about one third), because MCLR adds a social context module to enhance user preference learning. However, the experimental results show that adding this module significantly improves model performance.

Ablation experiment
The main components and contributions of MCLR include three aspects: (1) Multiple location feature analysis, which identifies location associations based on both high-order correlations and semantic correlations to obtain more comprehensive location representations. (2) Social context analysis, which captures preference consistency between the target user and his/her friends. The trajectory association between users is considered in the model to improve performance by selecting friends whose preferences are similar to the target user's.
(3) A recommendation model that integrates multiple sources of contextual information. After an in-depth analysis of the intrinsic characteristics of different contextual factors and their impacts on user preference prediction, an integrated model is proposed to fuse multiple contextual features for next location recommendation. To validate the effectiveness of the key innovations of the proposed model, we conduct an ablation test and compare the performance of MCLR with two variants: M Non L, which removes the multiple location feature analysis, and M Non S, which removes the social context module. The results are shown in Table 5. Compared to M Non L, MCLR performs significantly better on all metrics and datasets, confirming the improvement brought by multiple location correlations and location semantics. Meanwhile, comparing MCLR and M Non S, we find that MCLR significantly outperforms M Non S after incorporating users' social information, which indicates that our preference consistency-based social context analysis can effectively utilize friends' information and thus improve model performance.

Parametric sensitivity analysis
In this section, we study the influence of two hyperparameters on the model: the number of location neighbors N used for location network reconstruction in Section 4.2, and the parameter k of the SortPooling layer in Section 4.4. For simplicity, we report results on Recall@5. We omit the analysis of other hyperparameters since they have less influence on model performance.
Location number N

The number of location neighbors affects the density of the constructed location network. If N is too large, many irrelevant locations will be connected, introducing noise; if N is too small, the location network will be too sparse to represent the location correlations. We test our model with N ranging from 1 to 10 and show the results in Figure 5a. As we can see, different settings of N do affect model performance, and the model performs best at N = 5. Therefore, we set N to 5 in MCLR.
SortPooling parameter k

k determines the number of aggregated location nodes when learning user similarity from the location subgraphs in Section 4.4. A large value leads to higher computational complexity, while a small value makes the obtained representations unrepresentative. We set k as a ratio k ∈ (0, 1) and select at most max{10, k × size(V_ij)} nodes as input to the DGCNN, where size(V_ij) denotes the number of nodes in the location subgraph G_ij. The results are shown in Figure 5b. The model achieves the best performance when k = 0.5.

Conclusion
In this paper, we proposed the MCLR model, which fuses multiple types of contextual information, i.e., location correlation, trajectory context, and social relationships, for next location recommendation. Specifically, we constructed a high-order location graph and a location semantic graph to represent multi-level location relations and applied a GAE to derive the location embeddings. Meanwhile, we used a location subgraph-based method to capture the preference similarity exhibited in friends' trajectories to identify the influence of social information. Experiments on three datasets demonstrated that MCLR achieves the best performance on the next location recommendation task. For future research, other contextual information, such as location text or user reviews, could be incorporated to further improve the model. Availability of data and materials All data can be found at: https://sites.google.com/site/yangdingqi/home/foursquare-dataset