A GNN-based proactive caching strategy in NDN networks

As people spend more time watching and sharing videos online, it is critical to provide users with a satisfactory quality of experience (QoE). Leveraging the in-network caching and name-based routing features of Named Data Networking (NDN), this paper aims to improve user experience through caching. We propose a graph neural network-gain maximization (GNN-GM) cache placement algorithm. First, we use a GNN model to predict users' ratings of unviewed videos. Second, we treat the total predicted rating of a video as the gain of caching that video. Third, we propose a cache placement algorithm that maximizes the caching gain and proactively caches videos. Cache replacement is implemented based on the caching-gain ranking of videos, with higher-gain videos replacing lower-gain videos. We compare GNN-GM with two state-of-the-art caching strategies, namely the NMF-based caching strategy and GNN-CPP, as well as with two traditional caching strategies, LCE+LRU and LCE+FIFO. We evaluate the five caching strategies using real-world datasets in a tree network topology, the real-world network topology GEANT, and various random topologies. The experimental results show that our caching policy significantly improves cache hit ratio, latency, and server load. Notably, GNN-GM achieves a 25% higher cache hit ratio, 5% lower latency, and 7% lower server load than the NMF-based strategy in GEANT.


Introduction
With name-based routing and in-network caching, NDN [1] offers many advantages. Unlike IP packets, which require host addresses to be forwarded to specific locations, NDN packets are identified by a unique content name, and all NDN nodes have caching ability. Users send an Interest packet, and any node that receives the Interest packet and holds the content can reply with a Data packet.
Suppose a network has a video provider far away from the user. In this case, an IP network requires the user to fetch the video from that content provider, which causes significant delay. However, NDN's name-based routing and in-network caching capabilities enable NDN nodes to cache the video requested by the user. Then, if the requested video is cached at a node near the user, the video can be sent back as soon as that node receives the user's request. This undoubtedly improves the user experience and reduces the traffic load of the whole network.
However, due to limited node cache sizes, only the most popular content can be cached. Recently, several papers [2-5] have addressed this challenge by applying deep learning-based models to predict the number of future content requests and proactively cache popular content on nodes. They all make caching decisions based on the expected number of content requests without considering user preferences. However, user preference is an essential factor in caching because it reflects user request patterns and can be used to predict which content users will be interested in in the future.
The authors of [6] predicted users' future demand through user preferences. They adopted the Non-Negative Matrix Factorization (NMF) [7] technique from recommender systems to predict user ratings of videos. Following that, they proactively cached popular videos and achieved promising results. However, the problem with the NMF technique is that it is transductive, so it cannot generalize to users or videos unseen during the training stage. To address this problem, they also considered the previous popularity of videos to help make caching decisions. However, the popularity of videos in the past does not strongly correlate with their popularity in the future: users who have watched a video in the past are unlikely to watch it again. To address these problems, we utilize the Inductive Graph-based Matrix Completion (IGMC) [8] technique, which is based on a Graph Neural Network (GNN), to predict user ratings of videos that have not been watched. Furthermore, we consider the total predicted rating of a video as the gain of caching the video. Videos are then cached according to their gains, ranked in descending order.
The contributions of this paper are as follows:
• We utilize an inductive GNN-based model to predict user ratings of movies that have not been watched, and we use the movies' total predicted ratings as the gains in the caching framework. To the best of our knowledge, we are the first to apply a GNN model to the caching problem.
• We propose a gain-based cache placement algorithm that utilizes the gains of caching the movies to make caching decisions.
• We deploy our proposed scheme and state-of-the-art caching algorithms on Mini-NDN. We evaluate the caching algorithms using a real-world dataset and different network topologies. Our proposed caching strategy achieves a 25% higher cache hit ratio, 5% lower latency, and 7% lower server load than the state-of-the-art NMF-based algorithm in the real-world network topology GEANT.
A preliminary version of this work [9] was presented at the 2022 IEEE ICC Workshop on Research Advancements in Future Networking Technologies (RAFNET). The rest of this paper is organized as follows. Section 2 overviews related work. Section 3 presents our proposed caching strategy. Section 4 presents the experimental results. Section 5 concludes the paper.

Related work
In NDN, caching can generally be divided into two main categories: reactive caching and proactive caching. In reactive caching, content is cached only as it passes through a node. Unlike reactive caching, proactive caching actively places content at nodes in advance. If a node caches the requested content in advance, it can immediately satisfy an Interest without forwarding it to the server, even if the content has never been requested before. This section reviews various reactive and proactive caching strategies, as well as papers that make caching decisions based on user preferences.

Reactive caching
A traditional cache placement algorithm, Leave Copy Everywhere (LCE) [1], caches packets at every node they pass through. However, a significant disadvantage is that it reduces cache diversity. Another method, Leave Copy Down (LCD) [10], caches content in the immediate neighbourhood of the original producer. However, since the cached content is only one hop away from the producer, it is still non-optimal. On the other hand, cache replacement is essential to evict undesired content and make room for more popular content. The traditional cache replacement methods are Least Recently Used (LRU), Least Frequently Used (LFU), and First-In-First-Out (FIFO) [11, 12]. LRU discards the least recently accessed content, while LFU discards the least frequently used content first. FIFO is less efficient than LFU and LRU: it discards the oldest content when no cache space is available, regardless of the content's popularity. In our paper, we compare the performance of our scheme with the LCE+LRU and LCE+FIFO schemes.

Recently, paper [13] proposed a cache placement and replacement strategy named CnS for 5G-enabled Information-Centric Networking (ICN) networks. The authors performed cache placement in two steps: (i) calculating the content popularity and (ii) determining, based on that popularity, whether the content needs to be cached. If so, the content is cached locally or pushed down towards the edge nodes. Once a node's cache store is full, the cache replacement policy is executed based on content popularities. Paper [14] proposed a Push Down popular, Push Up less-popular (PDPU) cache placement strategy in ICN. It aims to push popular content to the edge nodes while pushing less popular content to the core network. The authors also developed a one-hop cache notification to inform neighbouring nodes of cached contents. Cache replacement was likewise driven by content popularity. However, neither paper applies a powerful tool, deep learning, to predict the popularity of content. Moreover, they use reactive caching, which is less powerful than proactive caching because Interest packets can only be satisfied if the content was requested before.
The authors of [15] proposed a deep Q-learning caching algorithm for the ICN-based intelligent Internet of Vehicles (IoV) scenario. They focused on providing integrated computing and caching services at the edge server. The deep Q-learning-based algorithm was used to predict the popularity of vehicles' service requests, and joint computing and caching decisions were made on edge nodes. Paper [16] utilized multi-level federated reinforcement learning (named CoCaRL) to cache content in vehicular networks: reinforcement learning (RL) optimizes cooperative caching, and federated learning reduces communication and computation latency as well as cost. Paper [17] proposed a deep reinforcement learning (DRL) caching strategy to realize a QoE-driven roadside unit (RSU) caching update strategy in IoV. The QoE-driven RSU caching model was established based on their innovative user interest model.

Proactive caching
Recently, paper [18] proposed a proactive caching strategy based on the popularity and chunks of large content objects. The authors cast caching as an optimization problem, aiming to minimize the number of forwarding nodes and the number of content replications from consumers to cache nodes while obeying the cache capacity constraint. Paper [19] described a proactive caching strategy modelled as an ant colony process and demonstrated that it could place content close to the user and reduce access latency. Paper [20] proposed a mobility-aware proactive caching algorithm in ICN-IoV networks, modelling the mobility of vehicles and their connections with RSUs using a Markov model. None of these papers applies deep learning to predict content popularity.
The authors of [5] proposed DeepMEC, a strategy that applies deep learning to predict future request counts (popularity) of content and proactively caches content with high popularity scores. They utilized several types of deep learning models, Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Convolutional Recurrent Neural Networks (CRNN), to predict future content requests and compared their performance. The authors of [2] proposed a technique named IntellCache that increases caching efficiency by predicting future content popularity with deep learning models: a multilayer perceptron (MLP), Long Short-Term Memory (LSTM), and a combination of LSTM and CNN. The authors of [21] proposed a proactive sequence-aware content caching strategy (PSAC), based on a convolutional neural network and an attention mechanism, to make caching decisions.

User preferences-based content caching
Some papers make caching decisions by predicting user preferences. The authors of [22] proposed a proactive caching method in the 5G-ICN scenario. They applied NMF [7] to predict user ratings of movies that have not been watched. Furthermore, they considered content's historical popularity and achieved better performance than traditional reactive caching strategies. They later combined this caching approach with autonomous vehicle (AV) user mobility predictions in a highway scenario [6]. Paper [23] proposed a cooperative caching scheme that jointly considers caching locations, content popularities, and predicted future content ratings to make caching decisions in ICN-based vehicular networks; it also uses the NMF technique to predict future content ratings. Paper [24] applied a collaborative filtering-based caching strategy to optimize edge caching in an ICN-Internet of Things (IoT) architecture. The cache space of each edge node is divided into two halves: the first half caches content based on its local popularity, and the other half caches the content most likely to be requested in the future. The authors utilized collaborative filtering, calculating the cosine similarity between every pair of contents to predict each content's request probability at each edge node.
Machine learning techniques such as Matrix Factorization (MF) [25], Singular Value Decomposition (SVD) [25], and NMF are used to predict users' preferences, i.e., to predict users' ratings of videos they have not watched. These techniques characterize items and users by latent vectors, and the inner product of a user vector and an item vector approximates the corresponding rating. However, a major problem of matrix factorization is the cold-start problem: we cannot make predictions for items and users never seen during the training stage because their embeddings are not available. This means the MF approach is transductive.
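Concretely, a standard MF formulation (our summary, not taken from the cited papers) learns latent vectors $p_u$ for users and $q_m$ for items over the set $\Omega$ of observed ratings:

$$\hat{r}_{um} = p_u^{\top} q_m, \qquad \min_{P,\,Q} \sum_{(u,m) \in \Omega} \left( r_{um} - p_u^{\top} q_m \right)^2$$

Because $p_u$ and $q_m$ exist only for users and items that appear in $\Omega$, no prediction can be formed for a new user or item, which is exactly the cold-start limitation described above.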
Recently, GNNs have been used for user-item rating prediction. Recent research [26] applied an inductive node-level graph convolutional network (GCN) [27] framework to make item recommendations to users. The authors of [8] proposed the IGMC model to predict ratings between users and items with encouraging performance. They viewed the user-item rating matrix as a bipartite graph with two types of nodes, user-type and item-type. Edges exist only between users and items, with ratings as labels. In this way, the rating prediction problem is converted to an edge label prediction problem. IGMC tackles the cold-start problem encountered in the MF approach, and our paper uses this model to predict users' ratings of movies.

Proposed methodology

(A) System Model
We consider an NDN network consisting of F forwarders and C user communities, denoted by $F = \{f_1, f_2, \dots, f_F\}$ and $C = \{c_1, c_2, \dots, c_C\}$, respectively. Each user community is placed at a different forwarder. There are U users and M movies in our model, denoted by $U = \{u_1, u_2, \dots, u_U\}$ and $M = \{m_1, m_2, \dots, m_M\}$, respectively. All users are divided into the C user communities, with each $u_i \in c_i$ but $u_i \notin C \setminus c_i$, where $c_i \in C$. Users give ratings to movies they have watched, denoted as $r_{u_i m_i}$, where $u_i \in U$ and $m_i \in M$; for movies they have not watched, the ratings are empty. We consider each rating as a request in NDN. We assume that user communities send Interest packets following a Uniform Distribution with $\lambda_1$ requests per second or a Poisson Distribution with $\lambda_2$ requests per minute.
In our model, all forwarders have caching ability with a uniform cache size N, defined as a number of movies. We apply a binary variable $x_{f_i}^{m_i} \in \{0, 1\}$ to indicate whether forwarder $f_i$ caches movie $m_i$. Our proposed caching strategy aims to predict user $u_i$'s rating of movie $m_i$ and make optimized caching decisions within the limited caching space.

(B) IGMC Ratings Prediction Model
Unlike traditional matrix factorization techniques [7, 25], IGMC [8] trains a GNN model. We take an approach similar to IGMC to make rating predictions. Given a matrix containing ratings from users to movies, we build an undirected bipartite graph $G = (U, M, E)$, where U denotes the set of users, M denotes the set of movies, and E denotes the set of edges. Edges exist only between a user $u_i$ and a movie $m_i$, never between two users or two movies, and each edge carries a label $r_{u_i m_i}$, the rating. The rating prediction pipeline consists of three components.

The first component is enclosing subgraph extraction. From a $(u_i, m_i)$ pair, a breadth-first search (BFS) strategy is applied to extract $u_i$'s and $m_i$'s h-hop enclosing subgraph. We select a 1-hop subgraph, and each subgraph includes: (i) the target user $u_i$, (ii) the target movie $m_i$, (iii) all users that have watched the movie $m_i$, (iv) all movies that the user $u_i$ has watched, and (v) the known edges and corresponding labels between these users and movies. The subgraph is fed into a GNN model and mapped to the target rating $r_{u_i m_i}$.

The second component is node labelling. Its purpose is to distinguish the target user, target movie, user-type nodes, and movie-type nodes in the enclosing subgraph extracted in the first step. We give label 0 to the target user and label 1 to the target movie. Other nodes' labels are assigned according to the hop at which they are included in the subgraph: a user-type node included at the n-th hop is given label 2n, while a movie-type node included at the same hop is given label 2n + 1. One crucial point is that node labels depend on the local subgraph rather than the global bipartite graph. Therefore, we can predict ratings even for a subgraph from an entirely different bipartite graph. After labelling each node in the subgraph, we use the one-hot encoding of each label as the initial feature of that node.
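For illustration, a minimal sketch of these two components (our own code, not the authors'; it assumes the known ratings are stored in a SciPy sparse user-by-movie matrix):

```python
import numpy as np
from scipy.sparse import csr_matrix

def extract_1hop_subgraph(R: csr_matrix, u: int, m: int):
    """Extract the 1-hop enclosing subgraph of the pair (user u, movie m)
    and assign node labels: 0 = target user, 1 = target movie,
    2 = 1-hop user-type nodes, 3 = 1-hop movie-type nodes."""
    movies = R[u].nonzero()[1]            # movies the target user rated
    users = R[:, m].nonzero()[0]          # users who rated the target movie
    users = np.concatenate(([u], users[users != u]))
    movies = np.concatenate(([m], movies[movies != m]))
    # Known edges between the selected users and movies; the target edge
    # (whose rating we want to predict) is masked out.
    sub = R[users][:, movies].toarray()
    sub[0, 0] = 0
    labels = [0] + [2] * (len(users) - 1) + [1] + [3] * (len(movies) - 1)
    return sub, labels
```

One-hot encodings of these labels then serve as the initial node features fed to the GNN.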
The third component trains a GNN model to predict the rating from the $(u_i, m_i)$ pair's 1-hop subgraph. We utilize a graph-level GNN strategy that maps the subgraph to the target rating $r_{u_i m_i}$. The IGMC paper applies the relational graph convolutional operator (R-GCN) [28] to implement the message passing layers of the GNN. R-GCN is an extension of [27]; the main difference is that the former handles heterogeneous graphs with different edge types, while the latter does not. In our dataset, the ratings range from 1 to 5, each corresponding to an edge type, so R-GCN is adopted to handle the five edge types. It works as follows: (1) for a central node, aggregate its 1-hop neighbouring nodes' features; (2) update the central node's feature based on the neighbouring nodes' features and the edge types. The procedure is:

$$x_i^{l+1} = W_0^l x_i^l + \sum_{r \in R} \sum_{j \in N_r(i)} \frac{1}{|N_r(i)|} W_r^l x_j^l \qquad (1)$$

where $x_i^l$ is node i's feature at layer l, and $W_0^l$ is a learnable weight matrix applied to the node's self-loop connection. Note that each edge has a rating feature, and edges with the same feature have the same edge type. We use R to denote the set of all edge types. For each edge type $r \in R$, $N_r(i)$ is the set of 1-hop neighbour nodes of node i connected through edge type r, and $W_r^l$ is a learnable weight matrix corresponding to edge type r and message passing layer l. For each node i and message passing layer l, we compute a feature vector $x_i^l$. After computing L feature vectors, one per message passing layer, we concatenate them to form the final representation of each node. The next step is to concatenate the final representations of the target user and the target movie and treat the result as the graph representation. Finally, the ReLU activation function and an MLP are applied to the graph representation to predict the rating $r_{u_i m_i}$.

It is worth noting that the model only leverages subgraph patterns and ignores user and movie features, which are difficult to obtain due to information privacy and labelling cost. Besides, it is inductive, as it learns GNN parameters rather than user or movie embeddings. Therefore, the model generalizes well to users and movies unseen during the training stage. Furthermore, the model can transfer to new tasks, since different datasets may share similar subgraph rating structures.
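A minimal sketch of such a graph-level model, using PyTorch Geometric's `RGCNConv` (the layer width, number of layers, and MLP sizes below are our assumptions, not values taken from the paper):

```python
import torch
from torch_geometric.nn import RGCNConv

class RatingGNN(torch.nn.Module):
    """Graph-level rating predictor in the spirit of IGMC: stacked R-GCN
    layers, per-layer features concatenated, and an MLP applied to the
    concatenated [target user || target movie] representation."""
    def __init__(self, in_dim, hidden=32, num_layers=4, num_relations=5):
        super().__init__()
        dims = [in_dim] + [hidden] * num_layers
        self.convs = torch.nn.ModuleList(
            RGCNConv(dims[l], dims[l + 1], num_relations)
            for l in range(num_layers))
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(2 * hidden * num_layers, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 1))            # the predicted rating

    def forward(self, x, edge_index, edge_type, target_user, target_movie):
        # edge_type holds values 0..4, one per rating level 1..5.
        layer_feats = []
        for conv in self.convs:
            x = torch.tanh(conv(x, edge_index, edge_type))
            layer_feats.append(x)
        h = torch.cat(layer_feats, dim=1)       # final node representations
        g = torch.cat([h[target_user], h[target_movie]])
        return self.mlp(g)
```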
After predicting users' ratings of unwatched movies, we apply our gain-based caching decisions across the network. The proposed caching decision algorithm is described in the next section.

(C) Caching Decision
This section introduces the caching decisions for each forwarder in the network. In our paper, we consider each movie's total predicted rating in a user community $c_i \in C$ as the gain of caching the movie:

$$g_{c_i}^{m_i} = \frac{\sum_{u_i \in c_i} r_{u_i m_i}}{\max_{m_j \in M} \sum_{u_i \in c_i} r_{u_i m_j}} \qquad (2)$$

where $\sum_{u_i \in c_i} r_{u_i m_i}$ is the sum of movie $m_i$'s predicted ratings in the user community $c_i$, and $\max_{m_j \in M} \sum_{u_i \in c_i} r_{u_i m_j}$ is the maximum such sum over all movies in $c_i$. That is, we normalize the gains of each movie by the maximum gain of a movie in the same user community. The total rating of each movie reflects the movie's popularity across all users in that community. We aim to maximize the total gain G of caching movies in the network, which is mathematically formulated as follows:

$$\max \; G = \sum_{f_i \in F} \sum_{c_j \in S_{f_i}} \sum_{m_k \in M} x_{f_i}^{m_k} \, g_{c_j}^{m_k}, \quad \text{s.t.} \quad \sum_{m_k \in M} x_{f_i}^{m_k} \leq N, \; \forall f_i \in F \qquad (3)$$

where $S_{f_i}$ is the set of user communities whose requests pass through the forwarder $f_i$, and the constraint ensures that the number of cached movies in each forwarder $f_i$ does not exceed the maximum cache size N. Given a network topology and the gains $g_{c_i}^{m_i}$, our task is to make caching decisions for each forwarder so as to optimize Eq. (3). It is worth mentioning that our network topology is static and the routing policy is shortest-path routing. We first apply Dijkstra's algorithm to find the shortest path from each forwarder to the server and optimize content caching along the shortest path tree, in which the server node is the root.

Let V denote a node (i.e., a server or a forwarder). Each V is associated with the following attributes (see the sketch after this list):
• id: a unique id;
• gain_t: a hash table of (item, gain) pairs;
• gain_arr: a two-dimensional array storing the [item, gain] pairs of gain_t;
• u_set: a set storing the ids of the user communities whose requests pass through the current node;
• cache_size: an integer indicating the cache size; cache_size = 0 for the server and cache_size = N for forwarders;
• n: an integer indicating that the node should cache the item with the (n+1)-th highest gain; n = 0 by default;
• child_arr: an array storing the node's direct children;
• par: the node's direct parent; each node has at most one parent because of the shortest path tree extraction;
• local_arr: an array storing the items cached by the node;
• global_arr: an array storing the items cached by the node and its ancestors.
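The sketch below (hypothetical helper names; `pred` is assumed to map each user id to a dict of predicted movie ratings) computes the normalized gains of Eq. (2) and mirrors the node attributes just listed:

```python
from dataclasses import dataclass, field

def community_gains(pred, community):
    """Eq. (2) as code: sum each movie's predicted ratings over the
    community, then normalize by the maximum sum in that community."""
    totals = {}
    for u in community:
        for movie, rating in pred[u].items():
            totals[movie] = totals.get(movie, 0.0) + rating
    g_max = max(totals.values())
    return {movie: s / g_max for movie, s in totals.items()}

@dataclass
class Node:
    """Per-node state used by Algorithms 1-3."""
    id: int
    cache_size: int = 0                  # 0 for the server, N for forwarders
    n: int = 0                           # cache the (n+1)-th highest gain
    gain_t: dict = field(default_factory=dict)     # item -> gain
    gain_arr: list = field(default_factory=list)   # (item, gain), sorted
    u_set: set = field(default_factory=set)        # communities routed here
    child_arr: list = field(default_factory=list)  # direct children
    par: "Node" = None                              # direct parent
    local_arr: list = field(default_factory=list)   # items cached here
    global_arr: list = field(default_factory=list)  # here + ancestors
```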
Algorithm 1 illustrates our proposed gain-based caching strategy, where all user communities are provided as input. Each user community has a unique id and is placed at a different forwarder in the network topology. For each user community, we update its user community set (u_set) with its own id and then call the "Node-Initialization" function with the community c as its parameter.
Algorithm 2 represents the node initialization process, where a user community c is provided as input. In this function, we traverse the shortest path tree from c to the server. If c has a parent node, we update the parent node's gain_t by merging the child node's gain_t into it with the plus operator, and we update the parent node's u_set by taking the union with the child node's u_set. If the parent node receives exactly the same user community requests as its child node, we assign the child node's n plus 1 to the parent node's n (e.g., if n = 0, the child node caches the item with the highest gain, while the parent node caches the item with the second-highest gain). The idea is to place popular items near the user community and improve caching diversity. Otherwise, we assign 0 to the parent node's n, indicating that the parent node caches the item with the highest gain in its own gain_t. The process is repeated until the server node is reached, as sketched below.
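A compact rendering of this bottom-up pass, under the hypothetical `Node` structure sketched above:

```python
def node_initialization(c: "Node"):
    """Algorithm 2 (sketch): walk from the community's forwarder c up to
    the server, merging gain tables and community sets into each parent."""
    node = c
    while node.par is not None:
        parent = node.par
        for item, gain in node.gain_t.items():        # merge gains (plus)
            parent.gain_t[item] = parent.gain_t.get(item, 0.0) + gain
        parent.u_set |= node.u_set                    # union community ids
        if parent.u_set == node.u_set:
            parent.n = node.n + 1   # same requests: cache the next-best item
        else:
            parent.n = 0            # new requests join: restart at top gain
        node = parent
```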
After updating the nodes' information, Algorithm 3 is executed with the server node as input. We traverse the shortest path tree from the server towards the forwarders to make caching decisions. First, each forwarder's gain_t is updated by removing the items cached by its ancestors, so that downstream forwarders do not cache items already cached upstream. Next, the (item, gain) pairs in gain_t are appended to gain_arr, and gain_arr is sorted by gain in descending order. The subsequent loop runs cache_size times, once for each cache slot of the current forwarder. In each iteration, the forwarder caches the item at index n × cache_size of gain_arr. The cached item is inserted into the forwarder's local_arr and global_arr and removed from gain_t and gain_arr. The process is repeated until the user communities are reached, as sketched below.
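Algorithm 3 can then be sketched as a top-down traversal from the server (again using the hypothetical `Node` structure):

```python
from collections import deque

def make_caching_decisions(server: "Node"):
    """Algorithm 3 (sketch): traverse from the server towards the
    forwarders, skipping items already cached by ancestors."""
    queue = deque([server])
    while queue:
        node = queue.popleft()
        for item in node.global_arr:           # drop items cached upstream
            node.gain_t.pop(item, None)
        node.gain_arr = sorted(node.gain_t.items(),
                               key=lambda kv: kv[1], reverse=True)
        for _ in range(node.cache_size):       # one item per cache slot
            idx = node.n * node.cache_size     # offset selected by n
            if idx >= len(node.gain_arr):
                break
            item, _ = node.gain_arr.pop(idx)
            del node.gain_t[item]
            node.local_arr.append(item)
            node.global_arr.append(item)
        for child in node.child_arr:
            child.global_arr = list(node.global_arr)  # inherit ancestors'
            queue.append(child)
```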
Once the caching decisions for each forwarder are made, we proactively load the items in the forwarder's local_arr into its cache store. This proactive caching process ensures that, before users send Interest packets, the contents are already available in the forwarder's cache store to satisfy user requests. Regarding the cache replacement policy, when the forwarder's cache store is full, the content with the lowest caching gain is evicted first.

Experimental results
To evaluate our caching algorithm, we use Mini-NDN [29] to perform all experiments. Mini-NDN is an emulation tool that runs real instances of NDN packages. We deploy our GNN-GM and the NMF-based proactive caching strategy proposed in paper [6] on Mini-NDN. The authors of [6] also took user mobility into account when making caching decisions because of their highway simulation environment; since our setting differs, we do not consider user mobility. Apart from user mobility, [6] employs the same caching scheme as [22], which is the method we compare against in our paper. We utilize their caching decision module to calculate the gain of caching movies at each forwarder and then apply our proposed gain-based cache placement algorithm to make caching decisions for each forwarder. Besides, we also compare our proposed caching algorithm with GNN-CPP [30], which employs a GNN model to predict item caching probabilities. It is worth noting that GNN-CPP makes predictions based only on the number of times items were requested in the past. For GNN-GM, the NMF-based caching strategy, and GNN-CPP, we preload the items that need to be cached into the forwarders' cache stores before users send requests. In addition, GNN-GM and the NMF-based caching strategy employ cache replacement policies based on content caching gains, while GNN-CPP employs a cache replacement policy based on predicted content popularities. Furthermore, we compare against two traditional reactive caching strategies, LCE+LRU and LCE+FIFO.

Experimentation setup
This section presents the network topologies, traffic generation, dataset collection, and metrics we used to evaluate the caching algorithms GNN-GM, the NMF-based caching strategy, GNN-CPP, LCE+LRU, and LCE+FIFO.

Network topology
Similar to papers [32, 33], we employ the real-world network topology GEANT [31], which has 45 nodes and 71 edges. The server is placed at the "UK" node, and all other nodes are forwarders. In addition, we explore a tree network topology with 50 nodes, as well as random topologies with various numbers of nodes {10, 20, 30, 40, 50, 60}. All topologies have one content producer and two user communities, and each user community randomly attaches to a forwarder. In the tree topology, the root node is the content producer, and user communities can only attach to leaf nodes. It is worth noting that all forwarders have uniform caching capability.

Traffic generation
We employ the NDN Traffic Generator [34] to generate Interest and Data packets. We assume each user community sends Interest packets following a Uniform Distribution with one request per second or a Poisson Distribution with 50 requests per minute. Table 1 shows the key parameters and values used in our paper.
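As an illustration only (the actual packets are produced by the NDN Traffic Generator), the two request processes can be modelled as follows; we interpret the uniform case as a constant one-second gap:

```python
import random

def interest_times(n_requests, mode="uniform", poisson_rate_per_min=50):
    """Arrival times (seconds) of one user community's Interest packets."""
    times, t = [], 0.0
    for _ in range(n_requests):
        if mode == "uniform":
            t += 1.0                                   # one request/second
        else:                                          # Poisson process
            t += random.expovariate(poisson_rate_per_min / 60.0)
        times.append(t)
    return times
```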

Dataset collection
We use the public benchmark dataset MovieLens 100K [35], which includes 943 users and 1682 movies. We sort the dataset by timestamp and use the first 80% as the training dataset and the remaining 20% as the testing dataset to compare the performance of the various caching strategies. Similar to papers [6, 22], we consider a user's rating of a movie as that user's request for the movie. We randomly divide the 943 users into two user communities.
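For reference, a sketch of this preprocessing (the file name and column layout follow the standard MovieLens 100K `u.data` format; the community split below is our illustrative choice):

```python
import pandas as pd

cols = ["user_id", "movie_id", "rating", "timestamp"]
ratings = pd.read_csv("u.data", sep="\t", names=cols)

# Time-ordered 80/20 train/test split.
ratings = ratings.sort_values("timestamp")
split = int(0.8 * len(ratings))
train, test = ratings.iloc[:split], ratings.iloc[split:]

# Randomly partition the 943 users into two communities.
users = pd.Series(ratings["user_id"].unique())
community_a = set(users.sample(frac=0.5, random_state=0))
community_b = set(users) - community_a
```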

Evaluation metrics
The following three metrics are adopted to evaluate the caching algorithms:
• CHR (Cache Hit Ratio): the percentage of requests that can be satisfied by cached Data packets, calculated as

$$\mathrm{CHR} = \frac{cache\_hits\_num}{cache\_hits\_num + cache\_misses\_num} \qquad (4)$$

where cache_hits_num is the number of cache hits and cache_misses_num is the number of cache misses.
• ALT (Average Latency Time): the average delay between the time a consumer sends an Interest packet and the time it receives the corresponding Data packet.
• Server Load: the number of Interest packets served by the server.

Results
This section describes the experimental results for GNN-GM, the NMF-based caching strategy, GNN-CPP, LCE+LRU, and LCE+FIFO. We utilize 4 R-GCN layers in the GNN-based rating prediction model. Both the GNN and NMF models are trained and tested on the same dataset, with the Adam optimizer and the stochastic gradient descent (SGD) optimizer, respectively; the loss function is the mean squared error. The GNN-CPP model requires time-series data, so we divide the training dataset into four time periods with 20,000 requests per period. The testing dataset has 20,000 requests and can thus be considered a single period. We use the content request numbers of the previous two time periods to predict the content caching probabilities of the next period. The GNN-CPP model includes 3 GNN layers and is trained with the Adam optimizer and the binary cross-entropy loss. All experiments are run multiple times, and the results are averaged.

Effect of node cache sizes in tree topology

Figure 1a shows the cache hit ratio of the five caching algorithms under various forwarder cache sizes in a 50-node tree network topology. The cache size of a forwarder is varied over {2, 10, 20, 30, 40, 50, 60, 70}. We observe that the cache hit ratio increases with forwarder cache size for all caching strategies. GNN-GM achieves the best performance among the five. On average, the GNN-GM caching algorithm has a 20% higher cache hit ratio than the NMF-based one. Benefiting from accurate user rating predictions and from using total predicted ratings to make caching decisions, GNN-GM shows its largest improvement over the NMF-based algorithm (40% higher) when each forwarder can cache only two movies. The GNN-CPP algorithm performs worse than the other two proactive caching strategies because it considers only previous content requests when making predictions; in reality, users are unlikely to rewatch movies they have already seen. Besides, the GNN-GM caching algorithm performs nearly 200% better on average than the two traditional reactive caching algorithms, LCE+LRU and LCE+FIFO.

Figure 1b shows the average latency time of the five caching algorithms. At best, the GNN-GM caching algorithm achieves around 11% and 35% lower latency than the NMF-based caching algorithm and GNN-CPP, respectively. In addition, GNN-GM consistently achieves the lowest latency regardless of the cache size. LCE+LRU and LCE+FIFO perform worst, with a notable margin compared with the three proactive caching strategies. In the best case, the GNN-GM caching algorithm achieves 50% lower latency than LCE+LRU and LCE+FIFO.

Figure 1c shows that the server load decreases as the forwarders' cache sizes increase. Overall, the GNN-GM caching algorithm achieves a 20% lower server load than the NMF-based caching algorithm. The proactive caching algorithm GNN-CPP incurs a heavier server load than the GNN-GM and NMF-based caching algorithms, while LCE+LRU and LCE+FIFO incur the heaviest server load among the five. On average, the GNN-GM caching algorithm achieves an almost 60% lower server load than LCE+LRU and LCE+FIFO.

These results indicate that our GNN-GM caching algorithm performs outstandingly in a tree network topology. GNN-GM can capture user preferences and proactively place the movies most users are likely to watch near the users. In addition, our GNN-GM caching strategy improves cache diversity by ensuring that different movies are cached along the path.

Effect of node cache sizes in GEANT

Figure 2a shows the cache hit ratio of the five caching algorithms under various forwarder cache sizes in GEANT. As in Section 4.2.1, the cache size of a forwarder varies from 2 to 70 movies. GNN-GM performs best among the five caching methods in the GEANT network topology: its cache hit ratio is, on average, about 25% higher than that of the NMF-based caching algorithm, and when each forwarder can cache only two movies, GNN-GM achieves a 40% higher cache hit ratio than the NMF-based caching algorithm. In addition, GNN-CPP performs worse than the GNN-GM and NMF-based caching algorithms because it considers only the users' content request numbers of previous time steps. LCE+LRU and LCE+FIFO still perform worst because they are reactive caching strategies that do not capture users' future preferences.

Figure 2b shows the average latency time of the five caching algorithms. At best, GNN-GM achieves around 8% and 25% lower latency than the NMF-based caching algorithm and GNN-CPP, respectively, and it consistently achieves the lowest latency regardless of the cache size. The two traditional reactive caching algorithms again perform worst, with a notable margin relative to the three proactive caching strategies. In the best case, GNN-GM achieves around 30% lower latency than LCE+LRU and LCE+FIFO.

Figure 2c shows the server load of the five caching algorithms. The server load of all caching algorithms decreases as the forwarders' cache sizes increase. Overall, GNN-GM achieves a 7% lower and a 22% lower server load than the NMF-based caching algorithm and GNN-CPP, respectively. Because LCE+LRU and LCE+FIFO have the lowest cache hit ratios, more Interest packets are forwarded to the server, so their server loads are much higher than those of the three proactive caching strategies. On average, GNN-GM achieves a 30% lower server load than the two traditional caching algorithms.

In short, our GNN-GM can capture user preferences, resulting in a higher cache hit ratio. The lower latency demonstrates that Interest packets can be satisfied along the forwarding path before reaching the server. GNN-GM also places more popular content nearer to the users to improve the user experience. Overall, GNN-GM decreases the traffic workload and provides better QoS.

Table 2 shows the cache hit ratio, average latency, and server load for GNN-GM, the NMF-based strategy, GNN-CPP, LCE+LRU, and LCE+FIFO in GEANT when user requests follow a Poisson distribution with a rate of 50 requests per minute. All forwarders have a uniform cache size of 30. The table shows that GNN-GM achieves the best performance, with significant improvements over the other caching algorithms. In particular, compared to the NMF-based caching algorithm, GNN-GM achieves a 27% higher cache hit ratio, 6.3% lower latency, and 9.2% lower server load.

Effect of network sizes for arbitrary topologies

Figure 3 shows the cache hit ratio, average latency time, and server load of the five caching algorithms for different numbers of nodes {10, 20, 30, 40, 50, 60}. In each network topology, all forwarders have a uniform cache size of 30. From Figure 3a, we notice that the cache hit ratio decreases as the number of nodes increases. The reason is that the user communities are farther from the content provider in a large network topology, so an Interest packet must be forwarded through more nodes to reach the content provider, which makes the denominator of Eq. (4) much larger than the numerator. GNN-GM always performs best regardless of the number of nodes; in the best case, it achieves about a 36% higher cache hit ratio than the NMF-based caching strategy. The figure also shows that GNN-CPP performs worse when the network is large. GNN-CPP is sensitive to the model structure, i.e., the number of GNN layers used to train the model. Since all experiments use only 3 GNN layers, each node in the GNN can only see information from nodes at most 3 hops away, which is insufficient in larger network topologies. On average, the cache hit ratio of GNN-GM is 200% higher than that of the GNN-CPP algorithm. In addition, LCE+LRU and LCE+FIFO perform similarly and are the worst among the five caching strategies.

Figure 3b shows the average latency time of the five caching algorithms. GNN-GM has the lowest latency for all network sizes, demonstrating that it can capture user preferences and place content that most users will be interested in near the users: more Interest packets can be satisfied along the path, so users receive videos with little waiting time. At best, GNN-GM achieves a 5.8% lower and a 17% lower latency than the NMF-based caching algorithm and GNN-CPP, respectively. The figure also shows that LCE+LRU and LCE+FIFO have almost exactly the same average latency time and perform significantly worse than the three deep learning-based proactive caching strategies.

Figure 3c shows the number of requests served by the server node for the five caching algorithms. If more Interest packets are satisfied along the path, fewer of them are served by the server node. Accordingly, GNN-GM's server serves the lowest number of user requests, followed by the NMF-based caching algorithm, GNN-CPP, LCE+LRU, and LCE+FIFO. It is worth noting that GNN-GM alleviates a significant amount of server load compared with the other four caching strategies. The two traditional caching algorithms, LCE+LRU and LCE+FIFO, almost overlap in the figure and incur the highest server load.

We can conclude that our GNN-GM caching strategy performs best across different numbers of nodes in arbitrary network topologies. GNN-GM can capture user preferences, increase cache diversity, and cache popular videos near users. The GNN-GM caching strategy can therefore significantly ease the traffic load and enhance the user experience.

Conclusion
In this paper, we propose GNN-GM, a GNN-based proactive cache placement policy. Cache placement is based on the ranking of movies by caching gain, and movies with high caching gain replace movies with low caching gain. By using a GNN-based model to predict ratings, we can predict user ratings more accurately than NMF. By considering the total predicted ratings of movies as gains and applying our gain-based caching decisions, we can cache popular movies and place them near users. We compared our caching strategy with two state-of-the-art caching algorithms, the NMF-based strategy and GNN-CPP, and two traditional reactive caching algorithms, LCE+LRU and LCE+FIFO. We deployed these five caching algorithms on Mini-NDN and evaluated them using real-world datasets. We evaluated the five caching algorithms' performance on a tree network topology with various cache sizes, the real-world network topology GEANT with various cache sizes, GEANT with user requests following a Poisson Distribution, and random network topologies with different numbers of nodes. The evaluation results show that our GNN-GM consistently achieves the highest cache hit ratio, lowest latency, and lowest server load. More notably, our proposed caching algorithm has, on average, a 25% higher cache hit ratio, 5% lower latency, and 7% lower server load than the NMF-based caching algorithm in GEANT. In addition, our proposed caching algorithm performs much better than GNN-CPP, which uses only previous user movie requests for prediction. Note that GNN-CPP is an end-to-end algorithm, while ours is not; nevertheless, the experimental results show that GNN-GM performs much better. In particular, the average latency and server load of GNN-GM are about 13% and 25% lower than those of GNN-CPP, respectively, in GEANT. In addition, GNN-GM provides even more significant improvements over the two traditional caching strategies, LCE+LRU and LCE+FIFO.
Deep reinforcement learning is a technique widely used in caching decisions. In future work, we plan to combine GNNs with reinforcement learning to optimize the cache hit ratio and content access latency.