Multimodal Interaction Aware Embedding for Location-Based Social Networks

Location-based social networks (LBSNs) have greatly promoted the development of human mobility mining. However, the sparse, multimodal, and heterogeneous nature of user check-in data remains a major obstacle to learning high-quality representations of users and other entities, especially for downstream tasks such as point-of-interest (POI) recommendation. Most existing methods model user preference from sequential POI tags without exploring the interactions between different modalities (e.g., user-user, user-timestamp, and user-POI interactions). To this end, we introduce a multimodal interaction aware embedding framework that generates reliable entity embeddings on the heterogeneous socio-spatial network. At its core, multi-modal interaction sub-graph sampling techniques first capture the heterogeneous contexts; a self-supervised contrastive learning technique then extracts intra-modality and inter-modality interactions in a lightweight way. We conduct experiments on next-POI recommendation tasks using three real-world datasets. Experimental results demonstrate the superiority of our model over state-of-the-art embedding learning algorithms.


Introduction
With the rapid development of the mobile Internet, numerous location-based social network (LBSN) application services have emerged, such as Yelp, Twitter and Uber. In an LBSN, users share digital footprints of daily life with their friends by checking in at a point of interest (POI), which provides fine-grained user mobility and social network information. Consequently, various location-aware data mining tasks, e.g., next-POI demand modeling Feng et al (2020); Yu et al (2020) and friend recommendation Yang et al (2020); Yu et al (2021), benefit from such fine-grained information. However, the multimodality (e.g., user, POI and time) and heterogeneity (e.g., user-user interactions, user-POI interactions and user-POI-timestamp interactions) of the data bring challenges for learning good entity embeddings in data-driven human mobility analysis Yu et al (2020) and social network analysis Yang et al (2020).
Currently, to exploit the complete semantic context information, most existing works Guo et al (2020); Han et al (2020); Feng et al (2020); Yang and Zhu (2019); Huang et al (2018); Wang et al (2017) treat the task as a heterogeneous information network mining problem Wang et al (2020); Zeng et al (2020). However, they usually split the heterogeneous information network into bipartite graphs to model multimodal interactions, which discards part of the multimodal interaction information and restricts embedding learning to the interaction between the user and the item.
It is realistic to consider multimodal interaction modeling in POI recommendation scenarios Zhang et al (2020). For example, Figure 1 represents the hierarchical relationship among the multiple modalities of user check-in activities.
In the friendship dimension, Emma's relevance score with Jack is lower than Cora's with Jack, which is the result a traditional social relationship-based recommendation system would produce. However, this result is unreasonable because Jack and Emma have similar activity schedules (both are active in the first and third time periods, as shown in the time period modal of Figure 1). Therefore, it makes sense to recommend Emma's favorite POIs to Jack. The interaction between modalities is thus significant for POI recommendation, and we need to incorporate as much multimodal information as possible into embedding representation learning.
In the process of fusing complex and variable multimodal information into an embedding representation, several key problems remain unsolved. 1) The relationship between the behavioral relevance of users and the interaction of multiple modalities has not been sufficiently investigated. 2) Traditional methods do not consider the interaction effects of multimodal data, which results in the loss of a large volume of multimodal interaction information. 3) Data from different modalities are independent, and in previous graph convolutional embedding learning models, fusing a large amount of information from neighboring nodes leads to over-smoothing and inefficient embeddings. To tackle these problems, we develop a sampling-aggregation POI recommendation embedding representation learning framework (SAPRec) based on multi-modal sub-graph sampling and a heterogeneous information aggregator. SAPRec aims to capture the interaction between different modalities and generate efficient embedding representations Yang et al (2020); Velickovic et al (2019). First, SAPRec generates a multi-modal interaction sub-graph for each user based on the user's social connections and recent check-ins. Second, we feed the multi-modal interaction sub-graphs into the aggregator for feature extraction. In the aggregation process, we focus on the most characteristic information in the users' multi-modal interest sub-graphs, which is exploited to learn the embedding representation of each node in the heterogeneous information network.
The main contributions of this paper are summarized as follows:
• We study the distribution properties of multimodal data on real datasets and analyze the influence of different modal information on each other.

POI Recommendations Embedding Learning
Few studies in recent years have focused on embedding representations for multi-modal POI recommendation. LBSN2Vec++ provides a reliable idea for learning multi-modal embedding representations Yang et al (2020). Nevertheless, it does not consider the topological relationship between the embedded nodes.

Preliminary Analysis
In this section, we first describe the features of the different modal data in the user check-in records. Then we explore the distribution preferences of user check-in records under different modalities. The symbols used in this paper are listed in Table 1.

Data Description
In the POI recommendation task scenario, users share their check-ins at POIs with their friends. Given the users' check-in records, we denote $U$ as the user set, $P$ as the POI set, $T$ as the time period set, and $\langle u, t, p \rangle$ as a check-in, which means user $u$ visited POI $p$ at time period $t$. We denote the check-in set as $C$, and each user has a check-in sequence consisting of that user's check-ins in $C$ ordered by time. Notably, human movement patterns are very intricate, and various factors have been studied in POI recommendation tasks Wang et al (2019a). In this paper, we only consider mixed interactions of three modalities, i.e., users, POIs, and time periods, which have been shown to be effective in next-POI recommendation Lu and Huang (2020).
User Modal Interaction. In user modal interactions, there are social interactions between users, interactions between users and POIs, and interactions between users and time periods, which we define respectively as $\langle u_i, u_j \rangle \in UU$, $\langle u_i, p_j \rangle \in UP$ and $\langle u_i, t_j \rangle \in UT$.
POI Modal Interaction. In POI modal interactions, there are geographic distance interactions between POIs, collaborative filtering interactions between POIs and users, and preferences between POIs and time periods, which we define respectively as $\langle p_i, p_j \rangle \in PP$, $\langle p_i, u_j \rangle \in PU$ and $\langle p_i, t_j \rangle \in PT$.
Time Period Modal Interaction. In this paper, we divide a week into 168 time periods. There are correlations between time periods, preferences of the public in a specific time period, and the popularity of POIs in different time periods, which we define respectively as $\langle t_i, t_j \rangle \in TT$, $\langle t_i, u_j \rangle \in TU$ and $\langle t_i, p_j \rangle \in TP$.
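As an illustration of how the interaction sets above could be collected from raw check-ins, the following minimal sketch maps each timestamp to one of 168 slots and gathers the pairs. It assumes that the 168 time periods are the hours of a week with Monday 00:00 as slot 0; the text only states the number of periods, and the function names `time_period` and `build_interactions` are hypothetical.

```python
from datetime import datetime

HOURS_PER_WEEK = 7 * 24  # 168 time periods per week, as stated in the text


def time_period(ts: datetime) -> int:
    """Map a timestamp to an hour-of-week slot in [0, 167].

    Assumption: one slot per hour, with Monday 00:00 as slot 0.
    """
    return ts.weekday() * 24 + ts.hour


def build_interactions(checkins, friendships):
    """Collect user-user, user-POI, user-time and POI-time pairs.

    checkins: iterable of (user, datetime, poi) tuples.
    friendships: iterable of (user, user) pairs, treated as undirected.
    """
    uu = {tuple(sorted(pair)) for pair in friendships}
    up, ut, pt = set(), set(), set()
    for user, ts, poi in checkins:
        slot = time_period(ts)
        up.add((user, poi))   # user visited POI
        ut.add((user, slot))  # user was active in this time period
        pt.add((poi, slot))   # POI was visited in this time period
    return {"UU": uu, "UP": up, "UT": ut, "PT": pt}
```

The reversed sets (PU, TU, TP) contain the same pairs with swapped order, so they are not stored separately in this sketch.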

Heterogeneous Information Network for POI Recommendation (HINPR).
Unlike an LBSN, HINPR is an open, heterogeneous network structure. Unlike knowledge graphs, edges in HINPR only represent the existence of connections between nodes and carry no entity relationship. These features allow us to focus on the correlations between data of different modalities and provide a unified framework for representing them. Combining the user modal, POI modal, and time period modal, we get a HINPR $G = (V, E)$, where $V = U \cup P \cup T$ and $E$ contains the modal interactions defined above.

Data Observation
We observed users' mobility behavior and the correlation between different modal data on three real-world datasets: New York City, Jakarta, and Tokyo Li et al (2018).
Figure 2 shows the size of POI intersections between users and their friends in the three cities. Figure 2(a) shows that the intersection size in New York City is generally less than 5. In Jakarta and Tokyo, however, the intersection size mostly ranges from 1 to 20. These differences show that user POI preferences in Jakarta and Tokyo are more likely to be influenced by social relationships than those in New York City.
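The intersection statistic discussed here can be computed with a few set operations. The sketch below is illustrative only: the input layout (a dictionary of visited items per user and a dictionary of friend lists) and the name `intersection_sizes` are assumptions, not the authors' preprocessing code.

```python
def intersection_sizes(visited, friends):
    """Count shared items for every (user, friend) pair.

    visited: dict mapping user -> set of visited POIs
             (or set of active time periods).
    friends: dict mapping user -> iterable of that user's friends.
    Returns a dict mapping (user, friend) -> intersection size.
    """
    sizes = {}
    for user, friend_list in friends.items():
        own = visited.get(user, set())
        for friend in friend_list:
            # Users without any recorded visit contribute an empty set.
            sizes[(user, friend)] = len(own & visited.get(friend, set()))
    return sizes
```

Passing sets of time periods instead of POIs yields the time period intersection size with the same function.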
Figure 3 shows the size of time period intersections between users and their friends in the three cities. Users in Tokyo have activity time periods more similar to those of their friends, because the time period intersection size between users and their friends in Tokyo is around 1-40. In contrast, Figure 3(a) and Figure 3(b) show that users in New York City and Jakarta have activity time periods less similar to those of their friends.
Fig. 3 The number of time period intersections between the user's friends and the user.
Combining the observations in Figure 2 and Figure 3, we can infer that New York users' check-in preferences are less influenced by their friends, while Tokyo users' check-in preferences are easily influenced by their friends. Although Jakarta users' check-in location preferences are easily influenced by their friends, the time periods of their activities are personalized. Therefore, it is necessary to model the influence of different modal factors on users' check-in preferences.
By analyzing the check-in datasets, we can conclude that implicit affiliations exist between users' check-in data in different modalities, and these implicit affiliations often follow a power-law distribution. Many studies have also reported that user check-in data have power-law distribution properties, but these studies only focus on the power-law distribution of check-in frequency and POI transitions Qiao et al (2020); Feng et al (2020). Therefore, the study of embedding representation based on multi-modal check-in data is significant.

The SAPRec Framework
In this section, we first introduce the proposed multi-modal sub-graph sampling method. Then we present a method named light deep graph infomax to learn the embedding representation of each node in the heterogeneous information network.

Multimodal-Aware Sub-graph Sampling (MSS)
Graph sampling-based methods are motivated by the challenges of scalability (in terms of model depth and graph size) Zeng et al (2020). The more modalities are considered, the larger the constructed HINPR (the heterogeneous information network for POI recommendation) becomes. In this context, the solution time and memory requirements of traditional methods, such as GCN, increase exponentially Kipf and Welling (2017). Therefore, we introduce graph sampling into HINPR, which effectively reduces the time and space complexity of the model. Based on the features of heterogeneous information networks, we introduce three modality-aware sub-graph sampling techniques.
Multi-modal sub-graph sampling can be applied to a part of HINPR. In this paper, we only consider the interaction between three modal data: user, POI, and time period. The HINPR matrix $M_{t,s} = \{m_{i,j} \mid i, j \in V\}$ is defined according to the interaction between modalities, where

$$m_{i,j} = \begin{cases} 1, & \text{if interaction } (i, j) \text{ is observed;} \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

As shown in Equation 1, a value of 1 for $m_{i,j}$ indicates that there is an interaction between node $i$ and node $j$. If POI $p_i$ and POI $p_j$ are visited successively in a user's check-in history and $dis(p_i, p_j) < D$, we set $m_{p_i,p_j} = 1$, where $dis(\cdot)$ is the distance function between $p_i$ and $p_j$, and $D$ is the maximum distance threshold. For temporal modal data, when two time periods $t_i$, $t_j$ are successive in the time sequence, we set $m_{t_i,t_j} = 1$.
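The POI-POI and time-time rules can be sketched as follows. This is a minimal illustration under stated assumptions: the distance function is taken to be the haversine great-circle distance in kilometres (the text does not specify dis(.)), edges are stored symmetrically, and the names `haversine_km`, `poi_edges` and `time_edges` are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: used here as a stand-in for the unspecified dis(.) function.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def poi_edges(sequence, coords, max_km):
    """Link POIs visited successively by one user and closer than max_km.

    sequence: the user's POIs in visit order; coords: poi -> (lat, lon).
    """
    edges = set()
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev != nxt and haversine_km(coords[prev], coords[nxt]) < max_km:
            edges.add((prev, nxt))
            edges.add((nxt, prev))  # assumption: the matrix is symmetric
    return edges


def time_edges(num_slots=168):
    """Link each time period to the one that follows it (no wrap-around)."""
    return {(t, t + 1) for t in range(num_slots - 1)}
```

Whether the last slot of the week should also be linked to the first is a design choice that the text leaves open; the sketch does not link them.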
With Equation 1, we can generate an adjacency matrix for each modal interaction, as shown in Equation 2. In Equation 2, $M_{t,s}$ is the adjacency matrix of the multi-modal interaction, $t, s \in T_M$, and $T_M$ is the set of modality types; in this paper, $T_M = \{u, p, t\}$. Finally, we combine the matrices $M_{t,s}$ into $M$ for sampling, as shown in Equation 3. We propose three sampling strategies to obtain sub-graphs: node-level hop sampler, random walk sampler, and user-based random walk sampler.
Node-level Hop Sampler. For each sub-graph $G_{sub} = (V_{sub}, E_{sub})$, we first sample a node $n_i$ from $V$ uniformly at random and put it into $V_{sub}$. Then, we get the neighbor set $S_n$ of node $n_i$. For each node $n_j$ in $S_n$, we randomly sample $h$ of its neighbor nodes and put node $n_j$ into $V_{sub}$. We repeat this process until $|V_{sub}| > k$, where $k$ is the size of the sub-graph.
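A minimal sketch of a hop-based sampler in the spirit of the description above is given below. It is not the authors' implementation: the graph is assumed to be an adjacency dictionary, the sub-graph is capped at exactly k nodes rather than stopped once it exceeds k, and the function name `node_hop_sample` is hypothetical.

```python
import random


def node_hop_sample(adj, k, h, rng):
    """Grow one sub-graph hop by hop from a uniformly chosen root.

    adj: dict mapping every node -> list of neighbours.
    k:   maximum number of nodes in the sub-graph.
    h:   number of neighbours sampled per visited node.
    rng: a random.Random instance (passed in for reproducibility).
    Returns (nodes, edges) where edges only link sampled nodes.
    """
    root = rng.choice(list(adj))
    nodes, edges = {root}, set()
    frontier = [root]
    while frontier and len(nodes) < k:
        next_frontier = []
        for node in frontier:
            neighbours = adj[node]
            for nb in rng.sample(neighbours, min(h, len(neighbours))):
                if nb not in nodes:
                    if len(nodes) >= k:
                        continue  # sub-graph is full, skip new nodes
                    nodes.add(nb)
                    next_frontier.append(nb)
                edges.add((node, nb))
        frontier = next_frontier
    return nodes, edges
```

Calling the function repeatedly with the same `rng` yields a set of sub-graphs; because expansion follows existing edges outward from the root, the sampled nodes stay close to observed interactions.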
Using the Node-level Hop Sampler, the probability of nodes with different modalities being sampled is given by Equation 4, where $T_M$ is the set of modality types, $T_{N_j}$ is the number of nodes with the same modality as node $j$, $h$ is the number of neighbors sampled for node $i$, and $\deg(n_i)$ is the degree of node $i$. The sampling probability for each modality depends on the network topology, the number of nodes in that modality, and the degree of the previous node. Thus, the Node-level Hop Sampler prefers to sample users' existing interactions and has difficulty mining users' potential interest preferences. Details can be found in Algorithm 1.

Algorithm 1
1: while […] do
2: Initialize sub-graph $G^i_{sub} = (V^i_{sub}, E^i_{sub})$.
3: Let i = RandomInt(0, ColNum(M)).
4: while […] do
5: Get neighbor node set N of node i.
6: Put node i into $V^i_{sub}$ and edge $(i, i_{adj})$ into $E^i_{sub}$.
7: end while
8: Put $G^i_{sub}$ into sub-graph set S.
9: end while
10: return S

Random Walk Sampler. Numerous random walk based samplers have been proposed in the literature Zeng et al (2020). In our experiments, we implement a regular random walk sampler (with r root nodes selected uniformly at random, and each walker going h hops).
Using the Random Walk Sampler, the probability of nodes with different modalities being sampled is given by Equation 5, where $r$ is the number of root nodes and $h$ is the number of walk steps. The value of $h$ in Equation 5 is usually larger than that in Equation 4. Therefore, the Random Walk Sampler prefers to explore the connections between users' different modal interactions and to recommend novel points of interest to users.
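The regular random walk sampler described above (r roots chosen uniformly at random, each walker taking h hops) can be sketched in a few lines. The adjacency-dictionary input and the name `random_walk_sample` are assumptions made for illustration.

```python
import random


def random_walk_sample(adj, r, h, rng):
    """Sample one sub-graph from r independent random walks of h hops.

    adj: dict mapping every node -> list of neighbours.
    rng: a random.Random instance (passed in for reproducibility).
    Returns (nodes, edges) visited by the walkers.
    """
    all_nodes = list(adj)
    nodes, edges = set(), set()
    for _ in range(r):
        current = rng.choice(all_nodes)  # root chosen uniformly at random
        nodes.add(current)
        for _ in range(h):
            if not adj[current]:
                break  # dead end: this walker stops early
            nxt = rng.choice(adj[current])
            nodes.add(nxt)
            edges.add((current, nxt))
            current = nxt
    return nodes, edges
```

With r roots and h hops, a sub-graph contains at most r(h + 1) distinct nodes, so the two parameters jointly control the sub-graph size.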

User-based Random Walk Sampler. In Section 3.2, we observed a correlation between user social relationships, POI check-ins, and time periods. Therefore, we propose a user-based random walk scheme to sample friendships, POI check-ins, and time periods jointly. For each sub-graph $G^i_{sub} = (V^i_{sub}, E^i_{sub})$, we run $h$ random walks of depth $d$ through the social network of user $u_i$ to obtain the set $S^i_{friend}$ of the user's friend nodes. Subsequently, for each user $u_j$ in the friend set $S^i_{friend}$, we sample $c$ check-in records of that user according to a normal distribution, where each check-in record contains a POI $p_j$ and a time period $t_j$; we then put $u_j$, $p_j$, and $t_j$ into $V_{sub}$. Compared with the first two sampling algorithms, the User-based Random Walk Sampler pays more attention to the impact of users' social relationships on POIs and active time periods. Details can be found in Algorithm 2.
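The joint sampling of friends, POIs, and time periods can be sketched as below. This is a simplified illustration and not the authors' algorithm: check-ins are drawn uniformly without replacement instead of according to a normal distribution, nodes are tagged with a modality prefix so that IDs from different modalities cannot collide, and the name `user_based_sample` and its input layout are hypothetical.

```python
import random


def user_based_sample(friends, checkins, user, walks, depth, c, rng):
    """Collect the nodes of one user-centred sub-graph.

    friends:  dict user -> list of friends (social network).
    checkins: dict user -> list of (poi, time_period) records.
    walks, depth: number and length of random walks started at `user`.
    c:        number of check-in records sampled per reached friend.
    rng:      a random.Random instance.
    Returns a set of (modality, id) nodes with modality in {"u", "p", "t"}.
    """
    friend_set = set()
    for _ in range(walks):
        current = user
        for _ in range(depth):
            neighbours = friends.get(current)
            if not neighbours:
                break
            current = rng.choice(neighbours)
            if current != user:
                friend_set.add(current)

    nodes = {("u", user)}
    for friend in sorted(friend_set, key=str):
        nodes.add(("u", friend))
        records = checkins.get(friend, [])
        # Simplification: uniform sampling stands in for the normal
        # distribution mentioned in the text.
        for poi, slot in rng.sample(records, min(c, len(records))):
            nodes.add(("p", poi))
            nodes.add(("t", slot))
    return nodes
```

Edges of the sub-graph can then be taken from the adjacency of the returned nodes in the full network.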
We construct the edges $e_{i,j} \in E_{sub}$ of sub-graph $G_{sub} = (V_{sub}, E_{sub})$ based on the adjacency of nodes $n_i \in V_{sub}$ and $n_j \in V_{sub}$ in HINPR.

Algorithm 2
[…]
3: Initialize sub-graph $G^i_{sub} = (V^i_{sub}, E^i_{sub})$.
4: Put user u into $V^i_{sub}$.
5: Initialize user u friend set
6: $S^u_f$ = GetUserFriendSet(u, f).
[…]
17: Put node p and node t into $V^i_{sub}$.
18: Let j = j + 1.
[…]
22: Let $G^i_{sub} = (V^i_{sub}, E^i_{sub})$.
23: Put $G^i_{sub}$ into sub-graph set S.
24: end while
25: return S

Zeng et al. propose that, in order to learn the global features of a large graph (such as HINPR) correctly, the total number of nodes must satisfy a condition on $|V|$, where $|V|$ is the number of nodes in HINPR Zeng et al (2020).

Light Deep Graph Infomax (lightDGI)
After getting the sub-graph set $S$, we need to generate node embeddings $e_i \in E^{|V| \times R}$, where $E$ is the embedding set of HINPR's nodes and $R$ is the embedding dimension. Numerous studies have focused on unsupervised graph embedding methods, such as GraphSAGE, LINE, and DeepWalk Hamilton et al (2017); Tang et al (2015); Perozzi et al (2014).
From the viewpoint of graph convolution, the convolution process already fuses its neighbor information, and after several such patch-level fusions, the neighboring nodes already have similar expression vectors.Thus, in the loss function part, we should pay more attention to the variability among neighboring nodes to generate a more efficient node embedding representation.
To this end, Velickovic et al. proposed the DGI method, which maximizes mutual information in graph convolutional neural networks Velickovic et al (2019). GCN was initially designed for graph node classification tasks, in which nodes have rich attributes that serve as input features. However, each node (user, POI or time period) in HINPR is only described by an ID, which has no concrete semantics besides being an identifier. In such a case, feature transformation brings no benefit and instead increases the difficulty of model training He et al (2020). Therefore, we propose lightDGI to improve model performance and reduce the number of parameters on HINPR.
In each embedding learning batch of lightDGI, we need to maximize local mutual information as much as possible.In section 4.1, we obtained the subgraph set S of HINPR.Moreover, for each sub-graph G i sub , we seek to obtain node representations that capture the entire sub-graph's global information content, represented by a summary vector s.
In order to obtain the sub-graph summary vector $s$, we define a graph readout function $R(x) = \mathrm{mean}(e_i)$, $e_i \in E_{sub}$ and $E_{sub} \subseteq E$, and use it to fuse the local representation embeddings into a sub-graph representation, i.e., $s = R(E_{sub})$.
Since lightDGI is based on an unsupervised graph embedding method, we need to obtain a negative sample $G^i_{neg}$ of sub-graph $G^i_{sub}$. Starting from the positive sub-graph $G^i_{sub}$, we keep the adjacency of the sub-graph edges $E_{sub}$ unchanged and use a normal distribution $\mathcal{N}$ to generate a node index in the range 0 to $|V| - 1$ for each node in $V_{sub}$. The negative graph $G^i_{neg}$ of $G^i_{sub}$ is defined in Equation 7, where $G(V, E)$ is a graph generation function based on node set $V$ and edge set $E$. $G^i_{neg}$ has the same adjacency structure as $G^i_{sub}$, but every node embedding changes. Next, we let information flow through the nodes, using a lite GCN to aggregate the information flow between nodes. The lite GCN is defined in Equation 8, where $D$ is a diagonal matrix of sub-graph $G^i_{sub}$, $I$ is the identity matrix of sub-graph $G^i_{sub}$, and $A$ is the adjacency matrix of sub-graph $G^i_{sub}$.
In Equation 9, $H^{(0)}$ is the node embedding matrix of sub-graph $G^i_{sub}$, and $\hat{A}$ is the normalized adjacency matrix of sub-graph $G^i_{sub}$. Inspired by contrastive learning, we define a discriminator function that calculates the probability score of each node with respect to the summary vector $s$ (Equations 10 and 11).
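To make the propagation step concrete, the sketch below implements one common form of parameter-free graph convolution together with the mean readout. It rests on assumptions: the normalized adjacency is taken to be the standard GCN form $D^{-1/2}(A + I)D^{-1/2}$ with $D$ the degree matrix of $A + I$, and each layer is the plain product $\hat{A}H$ without weights or nonlinearity. The exact definitions used by lightDGI may differ, and the function names are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix.

    Assumption: standard GCN normalization with self-loops.
    """
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # >= 1 because of the self-loop
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(a_hat, h, layers=1):
    """Apply H <- A_hat H repeatedly (no weight matrix, no activation)."""
    n, dim = len(h), len(h[0])
    for _ in range(layers):
        h = [[sum(a_hat[i][k] * h[k][d] for k in range(n)) for d in range(dim)]
             for i in range(n)]
    return h


def readout(h):
    """Mean of all node embeddings: the sub-graph summary vector s."""
    n, dim = len(h), len(h[0])
    return [sum(row[d] for row in h) / n for d in range(dim)]
```

Dropping the feature transformation means the only trainable parameters are the node embeddings themselves and the discriminator weights, which matches the motivation for a lighter model when nodes carry only IDs.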
In Equation 10, $G^i_{sub}$ is the positive sample in contrastive learning. We encode the positive graph $G^i_{sub}$ as a teacher that teaches the discriminator to distinguish positive graphs from negative graphs. In Equation 11, $E$ is the expand function that expands $s$ to the same size as $H^{(k+1)}$, and $W$ is the weight matrix used to calculate the probability score.
For the loss function, we maximize the Jensen-Shannon divergence Velickovic et al (2019) to evaluate the mutual information probability scores of positive and negative samples. It is defined in Equation 12, where $(V, E)$ is the positive set, $(\tilde{V}, E)$ is the negative set, $P$ is the size of the positive set, and $N$ is the size of the negative set.
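The contrastive objective can be illustrated with the sketch below. It follows the binary cross-entropy form used in the original DGI paper, with a bilinear discriminator $\sigma(h_i^\top W s)$; whether lightDGI uses exactly this form is an assumption. The corruption step draws replacement indices uniformly rather than from a normal distribution, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score(h_i, w, s):
    """Bilinear discriminator: sigma(h_i^T W s)."""
    ws = [sum(w[a][b] * s[b] for b in range(len(s))) for a in range(len(w))]
    return sigmoid(sum(h_i[a] * ws[a] for a in range(len(h_i))))


def dgi_loss(h_pos, h_neg, w, s, eps=1e-12):
    """Negative of the DGI objective, averaged over P + N samples.

    h_pos: embeddings of nodes in the positive sub-graph.
    h_neg: embeddings of nodes in the corrupted (negative) sub-graph.
    """
    pos = sum(math.log(score(h, w, s) + eps) for h in h_pos)
    neg = sum(math.log(1.0 - score(h, w, s) + eps) for h in h_neg)
    return -(pos + neg) / (len(h_pos) + len(h_neg))


def corrupt(node_ids, num_nodes, rng):
    """Replace every node index by a random one; edges stay untouched.

    Simplification: indices are drawn uniformly from [0, num_nodes).
    """
    return [rng.randrange(num_nodes) for _ in node_ids]
```

Minimizing `dgi_loss` pushes the scores of positive nodes toward 1 and those of corrupted nodes toward 0 with respect to the same summary vector.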

Experiments
In this section, we evaluate SAPRec on location/activity prediction tasks. First, we introduce the experimental setup and discuss the advantages of SAPRec over the baseline models. Then we explore the performance of SAPRec under different samplers. Finally, we analyze the embedding quality of SAPRec under different parameter settings.

Experimental Setup
We used a large-scale, long-interval LBSN dataset covering three cities, collected by Yang et al (2019). To reflect the cultural differences and social preferences of users in different regions, the three selected cities are Tokyo (TKY), New York (NY), and Jakarta (JK). We split our check-in data into two parts, i.e., the first 80% for training and the last 20% for testing. The hardware configuration of our experimental platform is as follows: AMD Ryzen 7 3700X 8-Core Processor, 64GB DDR4 memory and GeForce RTX 2070 (8G). The software configuration is as follows: Python 3.6.12 and PyTorch 1.7. We use Xavier uniform initialization for the embedding parameters and set the initial learning rate to 0.001. We set the embedding size to 128, the sub-graph size to 200, the number of sub-graphs to 2000, the node-hop to 3, the number of root nodes to 32, the length of the random walk to 16, the friend sample number to 8, the friend window size to 2, and the check-in sample number to 3. We divided the dataset into a training set, a validation set and a test set at a ratio of 8:1:1.
To avoid variability across the metrics of different models, we evaluate all compared models in cosine space. We use Top-K precision (Precision@K) as the evaluation metric.
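As an illustration of evaluation in cosine space, the sketch below ranks POIs by cosine similarity to a user embedding and computes Precision@K using its standard definition (the number of relevant items among the top K, divided by K). Whether this matches the exact formula used in the experiments is an assumption, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def precision_at_k(user_vec, poi_vecs, relevant, k):
    """Fraction of the k most similar POIs that the user actually visited.

    user_vec: embedding of the user.
    poi_vecs: dict poi -> embedding.
    relevant: set of POIs visited by the user in the test data.
    """
    ranked = sorted(poi_vecs,
                    key=lambda p: cosine(user_vec, poi_vecs[p]),
                    reverse=True)
    hits = sum(1 for p in ranked[:k] if p in relevant)
    return hits / k
```

For example, with a user embedding of `[1.0, 0.0]` and POI embeddings `a = [1.0, 0.0]`, `b = [0.9, 0.1]`, `c = [0.0, 1.0]`, the ranking is a, b, c; if the user visited a and c, Precision@2 is 0.5.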

Comparison Against State-of-the-Art Methods
We compare our method to the following state-of-the-art graph information embedding methods:
NetMF Qiu et al (2018) derives the closed form of DeepWalk's implicit matrix and factorizes this matrix to output node embeddings.
LBSN2Vec++ Yang et al (2020) proposes a POI recommendation embedding method and learns the embedding representation of nodes in cosine space.
IMP-GCN Liu et al (2021) performs high-order graph convolution inside sub-graphs to identify users with common interests by generating embeddings of user features and graph structure. IMP-GCN outperforms the state-of-the-art GCN-based recommendation models significantly.
SAPRec is the model we propose, and we evaluate it with three different sampling strategies: SAPRec_urw uses the user-based random walk sampler, SAPRec_rw uses the random walk sampler, and SAPRec_nh uses the node-level hop sampler.
As shown in Table 3, our model achieves state-of-the-art embedding results on the datasets of three cities.Through the previous analysis of the datasets in Figure 2 and Figure 3, Tokyo users have the most frequent interactions between different modal data.Hence, our model performs best on the Tokyo dataset, proving that our model can fully explore the interactions between different modal data and generate effective embedding representations.Furthermore, the three sampling strategies we propose have different performances on the three cities' dataset, and the random walk strategy has the best average performance.
NetMF is a matrix decomposition-based graph embedding model that learns the representation of each node in a graph by converting network embedding into matrix decomposition. However, NetMF considers neither the heterogeneous information in the LBSN graph nor the correlation of information between different modalities. As a result, NetMF performs poorly in this experiment.
DHNE proposes a deep hyper-network embedding model to embed hypernetworks with indecomposable hyper-edges.DHNE does not decompose the connections between nodes into bipartite graphs as in previous graph embedding models but learns the embedding directly on the heterogeneous network, thus preserving the rich structural information retained in the heterogeneous network.However, DHNE is a graph embedding model without a targeted design for LBSN heterogeneous graphs and thus has poor performance on the three city datasets.
LBSN2Vec++ is a heterogeneous hypergraph embedding approach explicitly designed for LBSN data for automatic feature learning.For the LBSN heterogeneous graph features, LBSN2Vec++ samples the nodes on the heterogeneous graph by the random walk.It embeds the multimodal node information into the same embedding space using a linear transformation to realize the homogeneous metric of heterogeneous information.However, the embedding learning method of LBSN2Vec++ does not consider the adjacency relationship between heterogeneous graph nodes, so its performance still has a gap compared to SAPRec despite considering different modal information on LBSN.
IMP-GCN performs high-order graph convolution inside sub-graphs to identify users with common interests by exploiting user features and graph structure. IMP-GCN achieves excellent performance in traditional recommendation scenarios by relying on user-item interaction, so it also achieves the second-best performance on LBSN-based POI recommendation. However, like DHNE, IMP-GCN is not explicitly designed for LBSN heterogeneous graphs, so a small gap to SAPRec remains on LBSN-based POI recommendation.

Effect of Different Sampling Strategies
To investigate the effect of three sampling strategies on the performance of the embedding representation of SAPRec, we experiment on the dataset of Tokyo.
Figure 5 shows that the random walk strategy has the best embedding representation performance on the Tokyo dataset. This is because the random walk can reach deeper neighboring nodes than the node-level hop and the user-based random walk, so more mutual information can be learned in lightDGI.

Ablation Experiments
In this section, we verified the effectiveness of lightDGI by ablation experiments and explored the different parameter settings of SAPRec.

Ablation Experiments for lightDGI
In Table 4, we compare the embedding representation performance of SAPRec with a feature transformation layer (FTL) and SAPRec without a feature transformation layer on the Tokyo dataset. As we can see, SAPRec without FTL achieves a large performance boost, with an average performance improvement of up to 35%. This result supports the effectiveness of lightDGI.

Ablation Experiments for Different Modalities
To verify the effect of different modal data on the performance of our model, and to demonstrate that multimodal data improves the embedding representation, we conducted ablation experiments on the multimodal data. γ indicates that all modalities are considered, α indicates that the effect of the time period on POIs is not considered, β indicates that the effect of the time period on POIs and users is not considered, and θ indicates that we only consider social influence and the interaction between users and POIs. As shown in Figure 6, on the Tokyo dataset, the time period has a strong influence on user check-in preferences. Moreover, the embedding representation performance of the model decreases whenever some of the modal data is removed, which indicates that SAPRec generates effective embedding representations from multimodal information.

Embedding Size for SAPRec
Through parametric experiments on SAPRec, we conclude that the model obtains the best embedding representation performance at an embedding dimension of 128. When the embedding dimension is too small, the fusion of multimodal information is hindered, which results in information loss. When the embedding dimension is too large, model convergence becomes difficult, which degrades model performance.

Conclusion
In this work, we propose a multimodal interaction aware embedding learning framework named the sampling-aggregated POI recommendation embedding representation learning framework (SAPRec). SAPRec generates reliable embeddings on the heterogeneous socio-spatial network through modality-aware sub-graph sampling and self-supervised contrastive learning. We conduct experiments to demonstrate the strengths of SAPRec: better information fusion capabilities and more effective embedding representations. In the future, we will further explore the integration of SAPRec with downstream recommender systems and further improve the accuracy and robustness of POI recommendations.

Ethical Approval and Consent to participate
Not Applicable.

Consent for publication
Not Applicable.

Availability of supporting data
Not Applicable.

Competing interests
The authors declare that they have no competing interests.

Fig. 1 A multi-modal information network contains the time period modal, POI modal, user modal, and other modalities. The solid lines in the figure indicate relationships between nodes in the same modal domain, and the dashed lines indicate relationships between different modal domains.
Figure 3(a) and Figure 3(b) show that users in New York City and Jakarta have activity time periods less similar to those of their friends.

Fig. 2 Number of POI intersections between the user's friends and the user.

Fig. 5 Effect of three sampling strategies on the performance of the embedding representation of SAPRec.

Fig. 6 Embedding representation performance of SAPRec with different modalities.

Fig. 7 Embedding representation performance of SAPRec with different embedding dimensions.
This work is supported by the National Natural Science Foundation of China (62072094), the LiaoNing Revitalization Talents Program (XLYC2005001), the Key Research and Development Project of Liaoning Province (2020JH2/10100046).

Related Work

Graph Neural Networks
Graph neural networks have recently become a critical research method in recommendation systems Wang et al (2019b). Li et al. propose a few-shot learning framework, which encodes geographical neighborhood information using graphs and models the dependence relationship among businesses using graph convolutional networks Li et al (2020). Wang et al. propose NGCF, which exploits the user-item graph structure by propagating embeddings Wang et al (2019b). NGCF leads to the expressive modeling of high-order connectivity in the user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. Ying et al. propose PinSage, which paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures Ying et al (2018). In this paper, we partition the heterogeneous information network (HIN) into sub-graphs using three sampling methods in order to better capture the personalized information of users and the attribute features of items. Compared with previous work, SAPRec focuses more on the local features of the HIN, i.e., the personalized information of the user.
• Extensive experiments were conducted on datasets of three cities to demonstrate our model's excellent performance on multimodal interaction aware embedding.

Table 1 Symbol Definition Descriptions
N: Negative sample set
Algorithm 2 User-based Random Walk Sampler
Input: HINPR matrix M
Parameter: Sub-graph size k, friend sample number f, friend window size w, check-in sample number c
Output: Sub-graph set S
1: while |S| ≤ n do

Table 2
Statistics of three city datasets

Table 3
Precision@K comparison results of different models on the three city datasets