An adaptive replica configuration mechanism based on predictive file popularity and queue balance in mobile edge computing environment

In the current Internet of Things (IoT) era, various devices can provide more services by connecting to the Internet. However, the explosive growth of connected devices overloads the cloud core and causes significant network delays. To overcome these overload and delay problems, the mobile edge computing (MEC) network has been proposed to provide most of the computing and storage near the radio access network, reducing the traffic of the core cloud network and providing lower latency for terminals. Mobile edge computing can also work with third parties to develop multiple services, such as mobile big data analysis and context-aware services. While providing these services, however, a large amount of popular data may be accessed in a short period of time. Without proper handling of replica generation and deployment, increased access time can negate the benefits of MEC even in low-latency environments. Although many scholars have addressed related replication issues, there are still parts that can be improved. Replication is performed to avoid insufficient replica availability, but unbounded replication may lead to a significant increase in traffic and a waste of resources. Moreover, when deploying replicas, it is necessary to avoid placing them on congested nodes and to consider how to achieve better load balancing. To improve on the above problems, we synthesize the advantages of previous algorithms, make up for their shortcomings, and propose an adaptive replica configuration mechanism that predicts the popularity of files and replicates them to low-blocking nodes. This method spreads the subsequent access workload by copying popular files in advance to improve the system's overall performance.


Introduction
In general, cloud platforms can provide multiple service models, such as infrastructure as a service (IaaS) (Bhardwaj et al. 2010), platform as a service (PaaS) (Lawton 2008), and software as a service (SaaS) (Buxmann et al. 2008). In the IaaS model, computing resources and structures are provided to companies, including servers, storage, network topologies, and virtual machines. Based on the IaaS model, users can scale on demand to provide more flexible and innovative services to balance the dynamic workloads. In the PaaS model, the vendors offer a development environment to application developers, including operating systems, databases, web servers, and the programming-language execution environment. The last kind of service, called SaaS, allows users to use the applications over the Internet on demand by its authority. However, in recent years, many IoT applications have increased the load of the cloud core network and caused a long delay, making it impossible for users to obtain better service quality.
To solve the mentioned problems, the concept of the MEC (mobile edge computing) network (Younis et al. 2019; Hensh et al. 2021; Zhu et al. 2021; Siriwardhana et al. 2021; Xu et al. 2022) has been proposed in recent years to provide information technology (IT) and cloud services by the radio access network (RAN) (Demestichas et al. 2013). MEC provides computing and storage services near IoT devices, reducing the cloud load by filtering network data and reducing the network traffic to the cloud. Besides, the advantages of the MEC network include providing ultra-low latency, large bandwidth, real-time computing, and flexible services by authorizing third-party applications closer to the mobile clients (Schmalstieg and Höllerer 2017; Sukhmani et al. 2019), such as location tracking, mobile big data analysis, optimized video transmission, and context awareness.
Traditionally, the access method in the cloud environment is to upload data from the local device to the server for further calculation. When the access workload of the server gets higher, the system must duplicate more replicas to disperse the workload. In the meantime, the duplication process may lead to insufficient storage, and the availability and access performance of the file will decline. Also, maintaining hardware equipment usually requires higher costs (Shvachko et al. 2010; Yeh and Tu 2018). Therefore, proposing an effective replication strategy is essential in the cloud environment. This kind of issue is even more important in the MEC network. Basically, the MEC environment can provide localized content for users in the nearby service area, such as video streaming (Younis et al. 2019), AR/VR (Schmalstieg and Höllerer 2017; Sukhmani et al. 2019; Siriwardhana et al. 2021), and other services. While providing these kinds of services, popular files may be requested frequently and repeatedly. Although the MEC network structure can reduce the service's response time by deploying the server near the end-users, it may also increase the data access time once there are insufficient available replicas or the service nodes are congested. These problems will instead eliminate the original advantages of the MEC network.
In this research, an effective data replication strategy is proposed. The primary purpose of the protocol is to improve the problem of insufficient availability while highly popular replicas are under content localization. Besides, the congestion problem that arises when terminal devices request node services frequently will be improved, too. Here, the proposed method's main idea is to predict each file's popularity. When a file is in high demand, it will be duplicated in advance for access. This mechanism allows the system to respond to requests even under environmental changes. Besides, the system generates an appropriate number of replicas and assigns them to a service node in a low-blocking state with the most queue space to provide services. The proposed protocol can enhance availability, access efficiency, and load balancing under the MEC environment.
The rest of this article is organized as follows. Section 2 describes the related works on replication systems. Section 3 shows the details of the proposed protocol, and Sect. 4 describes the experimental design. The experiments and analysis of our proposed protocol are illustrated in Sect. 5. Finally, the conclusion and future work are presented in Sect. 6.

Related works
In this section, the concepts of the mobile edge computing network will be introduced first. After that, some famous file replication strategies will be discussed. The comparisons of the advantages and the disadvantages of these strategies will be described in this section, too.

Mobile edge computing network
The emergence of cloud computing has led computing technology into a new era. The main reason it has become popular is that its operating modes can reduce the overhead costs of cloud providers and improve the system's scalability. However, cloud data centers are usually far away from the terminal equipment and users, which is less conducive to applications that require low access latency. So far, some interesting research topics have been proposed in the MEC network (Taleb et al. 2017; Jararweh et al. 2016). For example, the internet of things (Yadav and Vishwakarma 2018), the Internet of Vehicles (Lim et al. 2017), and AR (Schmalstieg and Höllerer 2017; Sukhmani et al. 2019) are these kinds of applications. In other words, when the mentioned applications are run under the cloud computing environment, they may suffer long transmission delays, Internet congestion, degradation of QoS, etc. To solve the above problems, the MEC network structure, which is closer to the mobile users, has been proposed (Reznik et al. 2017; Hensh et al. 2021; Taleb et al. 2017).
The concept of the MEC network is deploying the computing center near end-users for data processing. This method can prevent a large amount of raw data from being transmitted to the cloud data center. Such a deployment can reduce the cloud data center load and the response time for end-users requesting services. Besides, the computing center under the MEC network is near mobile users. Under such a structure, the system can provide context-aware services for local users by collecting RAN messages in the area more efficiently. Unfortunately, popular local content may be requested simultaneously and frequently, causing the virtual machine nodes to become crowded and easily become the system's bottleneck. As a result, the efficiency of the entire system will be reduced. Hence, proposing an efficient data replication strategy to improve the mentioned problems is also an important issue under the MEC network. The related works on data replication strategies proposed in the past are introduced in the following subsection.

Data replication strategies
In the past, scholars have proposed data replication algorithms to improve the load balance or access efficiency of systems under different network architectures (Shvachko et al. 2010; Yan et al. 2015; Wei et al. 2010; Chang et al. 2008; Rahmani et al. 2017; Wang and Hsuan-Fu 2010; Zhang et al. 2018). For example, to enhance the capability of cloud storage systems, Qingson et al. proposed an efficient dynamic replication management scheme called CDRM (Wei et al. 2010). The main idea of CDRM is to distribute the replicas to nodes with a low blocking probability. Under such a mechanism, tasks can be processed more quickly, and service efficiency and load balancing can be improved. However, CDRM always selects the nodes with the lowest blocking probability as the service nodes, which causes the selected service nodes to be accessed all the time. Under such a circumstance, the workload of the service nodes becomes unbalanced, and the overall access efficiency decreases. To improve on this problem under the cloud computing network, scholars have proposed a dynamic data replication algorithm (DDRA) (Hsieh and Chiang 2019). The main idea of the DDRA algorithm is to provide more suitable service nodes for users depending on the nodes' blocking probability and the queue space within the reference nodes. This mechanism can prevent tasks from being distributed to congested nodes and achieve load balancing for the system. However, DDRA decides whether to add a new replica based on the ratio of file popularity to the number of replicas, and the popularity threshold is the average of all previous file accesses. Under such a circumstance, when there is a considerable gap in access times between files, a currently popular file may be judged as not popular because its original number of accesses is smaller than the average threshold.
Consequently, no new replica will be added by the system. The workload of the nodes will keep increasing, and the access efficiency will decrease. On the other hand, files with many original access times will still cause unnecessary resource waste by adding new replicas because the current number of accesses still exceeds the threshold.
Wang proposed an adaptive file replication mechanism called PARM (Yan et al. 2015) in the cloud environment. Its main idea is to predict the popularity of files for fast adjustment by applying the characteristics of atomic decay to the access counts of files. However, since PARM does not set a stop-loss point for replica generation, popular files will be duplicated constantly, and the system's workload will keep increasing.
To improve the problem of setting the popularity threshold, scholars have proposed an adaptive file replication strategy called ADRM. The primary method is duplicating popular files by predicting the popularity of the archive. By applying the ADRM strategy, the number of replicas can be controlled at an appropriate ratio by setting the ratio of the number of replicas, avoiding the excessive resource consumption caused by the excessive addition of replicas. Unfortunately, the ADRM strategy does not consider replica configuration, resulting in replicas not being deployed on suitable nodes to handle the access workload.
In this research, an adaptive replica configuration mechanism (ARCM) has been proposed to find the files that have high popularity in advance. This can help to duplicate the file to disperse the workload generated by subsequent popular files beforehand. After that, when a file has been analyzed as a popular file, and the ratio of the number of replicas is sufficient, the system will move the archive from the high-blocking node to the low-blocking node to avoid excessive replica generation. Finally, the service node is selected to achieve load balancing by allocating replicas to the low-blocking node with more space. The differences between the algorithms are shown in Table 1.

The proposed adaptive replica configuration mechanism
In this research, an adaptive replica configuration mechanism (ARCM) has been proposed to optimize the replica configuration in the MEC environment, and the system architecture is shown in Fig. 1. Here, the MEC servers are responsible for responding to the users' requests, and the users will request and access the data from the MEC servers. Furthermore, each MEC server will be in charge of allocating files to the virtual machines (VMs) and collecting the historical access records. The role of the VMs is to manage the files for users to access. When the MEC server has collected all the system information (including each file's access frequency and the blocking probability of nodes) from the VMs, the system will execute the ARCM algorithm to allocate the files. The detailed procedure of the ARCM algorithm is introduced as follows.
In the ARCM mechanism, there are two strategies at work: data replication and service node selection. The goal of data replication is to calculate the popularity of a file to decide whether to replicate it or not. This can help to improve the access efficiency of the replica storage architecture. For service node selection, the system will select a proper node to place the replicas. This can help to improve the load balancing of the overall system. The details of the procedures are described next.

Data replication strategy
To improve the access efficiency of the replica storage architecture, the first key strategy of ARCM is to adjust the number of replicas adaptively. Here, the system must get the time interval since the last access for each file, calculate the average time interval among all files, and predict the number of file accesses to evaluate whether the file will become popular in the future. Finally, the corresponding update operations are applied according to the popularity of the replica and the ratio of the number of replicas. The operations include adding more replicas, moving replicas, or deleting replicas. The details are described below.

Calculate average time difference
To adjust the number of replicas adaptively, the first thing is to decide whether a file is popular or not. To do so, the system must calculate the time interval since the last access for each file and then calculate the average time interval among all files, as shown in formula (1) (Yan et al. 2015). Here, Ave_time represents the average time interval of all files, T_i represents the i-th access time, and n represents the total number of accesses. When the newest average time interval is less than the last one, the file has a popular trend; otherwise, the file is not popular.
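Formula (1) is cited from Yan et al. (2015) but not reproduced in this excerpt. The sketch below is a minimal Python illustration, assuming Ave_time is the mean gap between consecutive access timestamps; the timestamps used are hypothetical.

```python
def average_interval(access_times):
    """Assumed form of formula (1): mean gap between consecutive accesses."""
    if len(access_times) < 2:
        return float("inf")  # a single access yields no interval
    gaps = [t2 - t1 for t1, t2 in zip(access_times, access_times[1:])]
    return sum(gaps) / len(gaps)

# A file accessed at t = 0, 10, 12, 13 has shrinking gaps, so the newest
# average interval drops below the previous one: a popular trend.
prev_avg = average_interval([0, 10, 12])       # (10 + 2) / 2 = 6.0
curr_avg = average_interval([0, 10, 12, 13])   # (10 + 2 + 1) / 3 ~ 4.33
trending_popular = curr_avg < prev_avg         # True -> popular trend
```

A shrinking average interval between the previous and the newest computation signals a popular trend, as described above.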

Predict the number of file access
In the ARCM mechanism, the concept of atomic decay (Kaxiras 2003) is applied to predict the number of file accesses. This prediction also aims to evaluate whether the file is popular or not. The procedure is described by formula (2) (Yan et al. 2015), where Pre_access is the predicted number of file accesses and PA_i represents the number of accesses in the i-th stage. Then, the number of file accesses predicted in this stage is compared with the number predicted in the previous stage. The file has a popular trend if more accesses are predicted in this stage. Conversely, it means that the file is not popular.
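Formula (2) is likewise cited but not shown in this excerpt. The following is a hedged sketch assuming a half-life-style weighting in which older stages contribute exponentially less, in the spirit of the decay analogy; the exact weights in the paper may differ.

```python
def predict_accesses(stage_counts, decay=0.5):
    """Assumed form of formula (2): decay-weighted prediction of accesses.
    stage_counts[i] is PA_i, oldest stage first; newer stages weigh most."""
    n = len(stage_counts)
    weighted = sum(c * decay ** (n - 1 - i) for i, c in enumerate(stage_counts))
    norm = sum(decay ** (n - 1 - i) for i in range(n))
    return weighted / norm

# Rising per-stage counts push the newest prediction above the previous one,
# which the text interprets as a popular trend.
prev_pred = predict_accesses([40, 60])       # biased toward the newest stage
curr_pred = predict_accesses([40, 60, 90])
trending_popular = curr_pred > prev_pred     # True -> popular trend
```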

Replica update strategy
When the average time interval result and the predicted number of file accesses both show a popular trend, the system will further apply the Ratio of the number of Replicas (RR) as the threshold for deciding whether to add more replicas. This is because maintaining the number of replicas based on the RR value helps the system achieve better access efficiency (Yu et al. 2012). Here, the system applies formula (3) to calculate the currently requested ratio of the number of replicas, Request_nr, and then compares it with the RR value. In formula (3), nr_i indicates the number of currently requested replicas of the file and tnr_i is the total number of replicas. When the currently requested ratio is smaller than the RR threshold, the system will add more replicas. In contrast, if the ratio is greater than or equal to the RR threshold, the number of replicas is sufficient. Under such a circumstance, the system will move a replica from the node with the highest blocking probability to the node with the lowest blocking probability. Furthermore, when the average time interval shows a popular trend but the predicted number of file accesses does not, the system will also move a replica from the node with the highest blocking probability to the node with the lowest blocking probability.
For the last situation, when neither the average time interval nor the predicted number of file accesses shows a popular trend, the system will delete a replica from the node with the highest blocking probability. Notably, while removing replicas, the system must ensure that at least three copies are kept to maintain basic usability. The overall flowchart is shown in Fig. 2.
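The replica update rules above can be summarized in a small decision routine. This is a sketch of the described flow, not the paper's implementation; the floor of three replicas and the 30% RR threshold are taken from the text, and the mixed case (interval not popular, predicted accesses popular) is not covered explicitly in the paper, so it defaults to no action here.

```python
MIN_REPLICAS = 3     # keep at least three copies for basic usability
RR_THRESHOLD = 0.30  # replica-ratio threshold stated later in the example

def update_action(interval_popular, access_popular, requested, total):
    """Replica update decision following the described ARCM flow.
    requested/total is formula (3)'s Request_nr = nr_i / tnr_i."""
    if interval_popular and access_popular:
        if requested / total < RR_THRESHOLD:
            return "add"   # demand ratio below threshold: add replicas
        return "move"      # enough replicas: move one to a low-blocking node
    if interval_popular and not access_popular:
        return "move"
    if not interval_popular and not access_popular:
        # delete from the most blocked node, but never drop below the floor
        return "delete" if total > MIN_REPLICAS else "keep"
    return "keep"          # mixed case not specified by the text
```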

Service node selection strategy
In the previous subsection, the concepts of deciding whether a file is popular or not and the conditions for adding, moving, or deleting files were introduced. Next, another critical strategy of the proposed ARCM algorithm is filtering the proper service nodes on which to place the replicas. This strategy can help to ensure the quality of service and the load balancing of the overall system. Here, the strategy can be divided into two phases, and the details are shown as follows.

Anti-blocking phase
When replicas are allocated to congested nodes, users must spend more time accessing the files, and new tasks must wait until the service node is available. This will increase the overall latency of the MEC environment. To improve on these dilemmas, the blocking probability of each node is obtained by calculating the node's arrival rate and the request delay time (Wei et al. 2010). Through this phase, the system can filter out the congested nodes and select more suitable nodes for services.
To calculate the arrival rate, formula (4) is provided in ARCM. Here, p_j represents the popularity of the file being accessed, r_j represents the number of replicas, and k is the actual arrival ratio over all requests. Through this formula, the system can calculate the arrival rate k_i of each node (Hsieh and Chiang 2019). After getting the arrival rate, the system can bring the result into formula (5) to get the blocking probability (BP). In formula (5), s_i represents the delay time, which refers to the time for the terminal device to read the replica, and c_i represents the number of memory blocks divided by the node. To make the simulation more realistic, the system applies the M/M/1 (Wei et al. 2010) rule to model the arrival of tasks. When the memory blocks are full of tasks, new tasks must wait in the queue, which is called blocking.
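Formulas (4) and (5) are not reproduced in this excerpt. As an illustrative sketch, the following assumes the per-node arrival rate is the file's popularity share spread over its replicas, and models blocking with the standard M/M/1/c loss probability, consistent with the M/M/1 queueing assumption mentioned above; the exact expressions in the paper may differ.

```python
def arrival_rate(popularity, replicas, k):
    """Assumed form of formula (4): a node hosting one of r_j replicas of a
    file with popularity p_j sees a p_j / r_j share of the arrival ratio k."""
    return (popularity / replicas) * k

def blocking_probability(lam, service_time, c):
    """Assumed form of formula (5): M/M/1/c loss probability, i.e. the chance
    an arriving task finds all c queue slots occupied."""
    rho = lam * service_time  # offered load
    if rho == 1.0:
        return 1.0 / (c + 1)  # limiting case of the formula below
    return (1 - rho) * rho ** c / (1 - rho ** (c + 1))

# Hypothetical values: k = 0.2 (as in the later example), 3 s delay, 5 slots.
lam = arrival_rate(popularity=0.5, replicas=2, k=0.2)  # 0.05 per time unit
bp = blocking_probability(lam, service_time=3, c=5)
```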
After calculating the blocking probability of each node through formula (5), the system will set up the AvgBP value as the low-blocking threshold, which decides whether a node is congested or not. To sum up, the system can select the low-blocking nodes for services, which helps to decrease the latency.

Reference queue balance phase
After selecting the nodes whose blocking probability is lower than the average value, the system will find the node with the most queue space among them as the preferential serving node. Here, the method in this phase applies the concept of the reference queue (RQ) proposed by Chiang et al. (2019). By preferentially assigning replicas to the nodes with the most queue space, tasks will be evenly distributed and processed quickly. This helps to improve access efficiency and achieve better load balancing. Figure 3 shows the entire flow of the service node selection strategy.
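Combining the two phases, a minimal sketch of service node selection might look as follows. The node names and values are hypothetical; the tie-break on the smallest blocking probability follows the rule stated in the worked example.

```python
def select_service_node(nodes):
    """Two-phase selection: keep nodes whose BP is below the average
    (anti-blocking phase), then prefer the most free queue space; ties break
    on the smaller BP. `nodes` is a list of (name, bp, free_queue_slots)."""
    avg_bp = sum(bp for _, bp, _ in nodes) / len(nodes)
    low_blocking = [n for n in nodes if n[1] < avg_bp]
    candidates = low_blocking or nodes  # fall back if none is below average
    # max free space first, then min BP as the tie-breaker
    return max(candidates, key=lambda n: (n[2], -n[1]))[0]

nodes = [("Node1", 0.0001, 3), ("Node2", 0.0002, 2),
         ("Node3", 0.0001, 3), ("Node8", 0.00005, 3),
         ("Node9", 0.0009, 5)]  # Node9 is congested despite its large queue
best = select_service_node(nodes)  # Node8: lowest BP among the 3-slot nodes
```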

Example
Basically, by applying the ARCM protocol, the system can get a more efficient file replication strategy to maintain the availability of replicas, and the replica access efficiency can be improved under the MEC environment. An example is given in this section to help understand the proposed ARCM protocol.
Once the terminal devices are connected to the MEC environment for requesting the file access services, the system will search for the related data according to the request and then set up the initial number of replicas based on the file type. At this time, the system must manage the record of the files accessed by nodes. Each record will also be compared to the data accessed in the past. Then, the system will dynamically adjust the number of replicas by analyzing the popularity of the file and calculating the replica ratios. Finally, the system will generate a new replica and move it to the appropriate service node. Here, the related assumptions and settings in the example are based on the DDRA algorithm (Hsieh and Chiang 2019), and the details are shown in Table 2.
When the terminal device is connected to the MEC environment to request access to the service, the system will analyze the file type and allocate replica numbers according to the file classifications proposed in Yan et al. (2015). If the files have a higher usage rate and require a longer time for storage, the system will allocate more replicas for the files. In contrast, for the files that have a lower usage rate and do not require too much time for storage, the system will allocate fewer replicas for them. The related allocations and file classifications are shown in Table 3. Besides, when the file size exceeds 64 MB, it will be stored in Block Level (Ghemawat et al. 2003). In contrast, it will be stored in File Level when the file size is less than 64 MB (Yu et al. 2012). By applying this mechanism, the system will access the files from different blocks to avoid the access delay caused by accessing the big file from a single node.
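The 64 MB storage decision can be expressed directly. The boundary case of exactly 64 MB is not specified in the text, so the sketch below treats only sizes strictly above the threshold as Block Level.

```python
BLOCK_SIZE_MB = 64  # threshold from Ghemawat et al. (2003) / Yu et al. (2012)

def storage_level(file_size_mb):
    """Files above the block size are split across blocks (Block Level) so a
    big file is not served from a single node; smaller files stay whole."""
    return "Block Level" if file_size_mb > BLOCK_SIZE_MB else "File Level"
```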
Next, when the terminal device accesses the files, the system will record the file ID, storage node, start  timestamp, end timestamp, and the file size of each file according to the system storage format under the MEC environment. The format is shown in Table 4. When a time interval has passed, the system will generate a log analysis based on the number of accesses and the time interval between each access. The relevant random examples are shown in Table 5 (Yan et al. 2015).
In the initial status, because the system has no historical data, the file will not be added, moved, or deleted at this time, and the record's status can be shown in Table 6. At the end of the second period of time, the system will compare the current access data (shown in Table 7) to the historical data. In predicting the number of accesses, it will apply the atomic decay method (Yan et al. 2015), and the result is shown in Table 8. Subsequently, the comparison of the average access time interval is shown in Table 9.
After the comparison, the system can analyze whether the file is in a popular trend or not. For the number of file accesses, the ARCM algorithm applies the atomic decay method to make predictions. It then compares the predicted value to the previous access record to analyze whether the file has a popular trend in access times. The results are shown in Table 10. We can see that file classifications B and C are in a popular trend. Subsequently, the historical average access time interval will be used as the threshold value to analyze whether the file is in a popular trend based on the access time interval. The comparison result is shown in Table 11. Finally, to avoid adding too many replicas for popular files, the system must maintain the number of replicas at a specific ratio by setting the replica ratio, avoiding excessive waste of resources under the ARCM algorithm. This can help to achieve better access efficiency. Based on the experimental results proposed by Chiang et al., the system will also set the RR threshold value at 30%. Subsequently, the system will calculate the current replica ratio of each file and compare it to the RR threshold value to analyze whether the number of currently requested file replicas is sufficient. The results are shown in Table 12. Finally, based on the results of Tables 10 and 11, the system will decide to add, move, or delete the replicas, and the related results are shown in Table 13.
So far, the system has finished the process of the data replication strategy. Next, the system will continue to filter the proper service nodes to place the replicas. Here, the system will calculate the blocking probability of each node through formula (5), where the arrival rate k is set to 0.2, which refers to the experimental results proposed by Wei et al. (2010). For the delay time s i , in order to help the system to compare the access performance more clearly, the setting environment will be consistent with the assumption of 1 s upon the request, read, and return of the files, respectively. Therefore, the delay time of the homogeneous node is set to 3 s. Finally, the related arrival rate and blocking probability of each node are shown in Tables 14 and 15, and the threshold value of low blocking probability AvgBP is 0.00026438.
After calculating the average AvgBP value, the system can get the set of low-blocking nodes whose BP values are smaller than the average. Subsequently, in addition to preventing end-users from choosing congested nodes for services, the system must consider whether tasks are evenly distributed. At this time, the system will also compare the queue space status of each node. Here, the node with a low blocking probability and the largest queue space will be the best service node. The results are shown in Table 16.
Assume that File A is currently predicted to be popular and has reached the condition for adding new replicas. The system will find the set of low-blocking nodes whose blocking probability is lower than AvgBP (Node 1, Node 2, Node 3, Node 6, Node 8) and, among them, the nodes with the most queue space (Node 1, Node 3, Node 8) to place the replicas for File A. When more than two nodes meet the above conditions simultaneously, the node with the smallest blocking probability will be selected as the serving node. In the overall example, a new replica for File A will be added to Node 8 for further services under the MEC environment.

In this research, the adaptive replica configuration mechanism called ARCM has been proposed to improve access efficiency and load balancing under the MEC environment. The main idea is to calculate the popularity of a file to decide whether to replicate it, and then the system selects a proper node to place the replicas. The related experiments and analyses will be given in the next section.

Experiments and analysis
In this section, the environment for experiments, results, and related analysis will be given to prove the performance of the ARCM mechanism.

Environment for experiments
In this research, the dynamic configuration algorithm experiments are simulated with Dev C++ under the CloudSim environment, and the parameter configurations are based on the settings proposed in Hsieh and Chiang (2019). Furthermore, the CDRM, DDRA, PARM, ADRM, and Random algorithms are invoked to compare performance in the simulation. The experiments simulate the number of nodes from a loose environment (20 nodes) to a dense environment (100 nodes) in a small MEC environment. In the experiments, we also simulate assigning different workloads (low, medium, and high) to the nodes. The number of nodes in the experiments is set to 20, 40, 60, and 100, respectively. In the low workload environment, 1000 tasks are assigned per cycle; in the medium workload environment, 5000 tasks are assigned per cycle; and in the high workload environment, 10,000 tasks are assigned per cycle. To observe the task allocation status of each algorithm, the node capacity is set to be homogeneous, and each node has 5 task queue spaces. The remaining parameters, based on the settings proposed in Hsieh and Chiang (2019) and Yan et al. (2015), are shown in Tables 17, 18, and 19. Finally, the proposed ARCM algorithm is compared to the other algorithms in terms of node utilization, Mean Average Deviation (MAD), the number of replicas, throughput, and completion time.

Performance analysis
In the subsection, we will analyze the performance of each algorithm based on the different environments, including loosely, ordinary, and densely environments. Furthermore, we will also observe and analyze each algorithm's performance and usage status under different workloads. The results are shown as follows.

6.2.1 Loosely environment {Node = 20; Workload = 1000~10,000}

Figures 4, 5, and 6 show the simulation results from low to high workload in the loosely environment when the number of nodes is set to 20. The Random algorithm does not dynamically adjust the number of replicas as the environment changes. Therefore, the nodes that initially own a replica have a higher utilization rate, while the nodes that have not been assigned a replica stay idle. The CDRM algorithm also shows a high utilization rate on specific nodes, mainly because nodes with better access efficiency are more likely to be assigned more tasks. For the PARM and ADRM algorithms, which lack a replica configuration scheme, the simulation distributes replicas randomly so that their behavior can still be compared with the other algorithms. Finally, the DDRA and ARCM algorithms both consider the blocking probability and queue occupancy of the nodes when assigning tasks; hence, these two algorithms achieve more balanced node utilization than CDRM. To describe the loading status of the system clearly, we calculate the MAD value through formula (6), in which n represents the total number of nodes and u_i represents the utilization rate of node i. A higher MAD value indicates that tasks are unevenly distributed over the system; conversely, the lower the MAD value, the better the load balancing. From Figs. 7, 8, and 9, it can be seen that the ARCM algorithm achieves the best load balancing, because it considers each node's blocking probability and queuing space when placing replicas. This helps to allocate the workload more evenly. In other words, the ARCM algorithm can dynamically adjust the number of replicas by predicting the popularity of files: when a file is predicted to become popular, the system adds replicas in advance so that the workload the file generates can be processed quickly. This is why the ARCM algorithm balances the system nodes better.
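Formula (6) itself is not reproduced in this excerpt; assuming it is the standard mean absolute deviation of the node utilization rates, it can be computed as follows (the function name is illustrative):

```python
def mean_absolute_deviation(utilizations):
    """Formula (6), assumed form: MAD = (1/n) * sum(|u_i - mean(u)|),
    where n is the number of nodes and u_i is the utilization rate of
    node i.  A lower MAD means the utilizations cluster around the
    mean, i.e. the workload is spread more evenly across the nodes."""
    n = len(utilizations)
    mean_u = sum(utilizations) / n
    return sum(abs(u - mean_u) for u in utilizations) / n

# A balanced system scores lower than a skewed one:
balanced = [0.50, 0.52, 0.48, 0.50]
skewed = [0.95, 0.10, 0.90, 0.05]
```

For instance, the `skewed` utilization vector above yields a much larger MAD than `balanced`, matching the interpretation used in Figs. 7, 8, and 9.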
In Fig. 10, we can see that under a low workload the non-dynamic Random algorithm does not adjust its replicas as the environment changes; the number of replicas remains at three. The default number of replicas in the CDRM algorithm is one. Although replica availability is adjusted while the algorithm executes, access efficiency suffers at the beginning because of the initially insufficient number of replicas. For the ARCM and ADRM algorithms, the average number of replicas used is lower than that of the DDRA algorithm, because the replica configuration conditions of ARCM and ADRM are stricter than those of DDRA; hence, their numbers of replicas are relatively stable.
Furthermore, the ADRM algorithm produces better results than the DDRA algorithm in terms of the number of replicas. However, its node utilization rate is worse than that of the DDRA algorithm because of its inadequate replica configuration method. Finally, the PARM algorithm does not set a stopping point for replica generation, resulting in unlimited replication of popular files.
When the workload increases to 5,000, we can observe that the ARCM and DDRA algorithms become comparable. The DDRA algorithm applies only a single factor to determine popularity: when the number of accesses to a file exceeds the average value, the file is judged to be popular and the system adds new replicas for it; otherwise, its replicas are deleted. Under such a rule, the number of replicas in the system keeps changing. The ARCM algorithm can predict environmental changes more accurately than DDRA. The related results can be seen in Fig. 11.

Figure 12 shows that, when the workload increases to 10,000, the number of replicas used by the ARCM algorithm is lower than that of DDRA at the beginning. As time goes by, the numbers of replicas used by the two algorithms become almost the same. This is because the growing number of user requests and the higher workload make the differences in file popularity more obvious, so the two algorithms are more likely to reach the same popularity judgments. Figures 13 and 14 show the throughput and mean job time for 20 nodes from a low-workload to a high-workload environment. The results show that the ARCM algorithm achieves the best throughput and completion time. As the MAD results in Figs. 7, 8, and 9 show, the ARCM algorithm performs better in load balancing and distributes the workload more evenly. Furthermore, by avoiding concentrating tasks on specific nodes, the ARCM algorithm mitigates delays in the overall working time. Finally, the experimental results show that the ARCM algorithm improves both throughput and completion time in the loosely MEC environment.
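The single-factor rule attributed to DDRA above can be sketched as follows; the function and variable names are illustrative, not taken from the paper's implementation. Files hovering near the average flip between the add and delete branches, which is why the replica count keeps changing:

```python
def adjust_replicas(access_counts, replicas, min_replicas=1):
    """Sketch of DDRA's single-factor popularity test as described in
    the text: a file whose access count exceeds the average gains a
    replica; otherwise one replica is removed (down to a floor).
    access_counts and replicas map file names to integers."""
    avg = sum(access_counts.values()) / len(access_counts)
    for f, count in access_counts.items():
        if count > avg:
            replicas[f] = replicas.get(f, min_replicas) + 1
        else:
            replicas[f] = max(min_replicas, replicas.get(f, min_replicas) - 1)
    return replicas
```

A file accessed 10 times when the average is 5 gains a replica, while files below the average each lose one, so the system oscillates as the observation window shifts.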

6.2.2 Ordinary environment {Node = 40, 60; Workload = 1000~10,000}

In this subsection, we present the performance analysis under the ordinary MEC environment when the workload increases from 1000 to 10,000 and the number of nodes is 40 or 60, respectively. The results are shown in Figs. 15 through 36. According to Figs. 15 through 20 and Figs. 26 through 31, we can observe that since the non-dynamic Random algorithm only allocates a fixed number of replicas to nodes, it again suffers from uneven task allocation. The dynamic replica configuration algorithm CDRM provides faster service to end-users by assigning tasks to the nodes with the lowest blocking probability. Unfortunately, this causes the nodes with better computing capabilities to always receive more task requests than the others, so the system cannot reach the goal of load balancing. The PARM and ADRM algorithms also cannot reach load balancing because they lack a proper replica configuration method. The ARCM algorithm, in contrast, predicts the future trend of each file by analyzing its historical data: when a file is predicted to become popular, the system adds more replicas to the nodes and distributes the workload in a balanced manner. Hence, the ARCM algorithm achieves better load balance.
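The contrast drawn above between CDRM's greedy node choice and the more balanced choice attributed to ARCM and DDRA can be sketched as follows. The node model and the weighting of the two factors are illustrative assumptions, not taken from the paper:

```python
def pick_node_cdrm(nodes):
    """CDRM-style greedy choice described in the text: always the node
    with the lowest blocking probability.  Faster nodes keep attracting
    tasks, which is the load-balancing problem the text identifies."""
    return min(nodes, key=lambda n: n["blocking_prob"])

def pick_node_balanced(nodes):
    """Sketch of a balanced choice in the spirit of ARCM/DDRA: weigh
    blocking probability against remaining queue space.  The 0.5 weight
    is an assumed parameter for illustration only."""
    return min(nodes, key=lambda n: n["blocking_prob"]
               - 0.5 * n["free_queue"] / n["queue_size"])
```

With a fast node whose queue is already full and a slower node with an empty queue, the greedy rule still picks the fast node, while the balanced rule diverts the task to the slower but idle one.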
Besides, the CDRM algorithm initially sets one replica for each file, which easily leads to insufficient file availability in the early stage. The ARCM algorithm, by contrast, classifies each task at the beginning: files in short-term storage are allocated three replicas, and files in long-term storage are allocated five. Hence, the ARCM algorithm requires fewer replicas in low-workload environments than the DDRA algorithm. In the middle-workload environment, the performance of the ARCM and DDRA algorithms is almost the same; in the high-workload environment, the numbers of replicas of the two algorithms are almost the same after 60 min. The related results can be seen in Figs. 21, 22, and 23 and Figs. 32, 33, and 34. According to the results shown in Figs. 24 and 25 and Figs. 35 and 36, a lower MAD indicates better load balancing, since the working capacity of the nodes is set to be the same. It can therefore be observed from the simulation results that, compared to the DDRA and CDRM algorithms, the ARCM algorithm achieves better load balancing.

6.2.3 Densely environment {Node = 100; Workload = 1000~10,000}

Figures 37 through 47 show the results of the algorithms under the densely MEC environment. In this circumstance, as the workload grows, the number of replicas used by the CDRM algorithm tends to increase. Since this environment contains more nodes than the previous ones, more replicas must be allocated to serve the requests. The ARCM algorithm uses the atomic decay method to predict the popularity of the files and pre-configures replicas for popular files to disperse the requested workloads. Hence, it achieves high throughput and low completion time.
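The atomic-decay popularity prediction referred to above is not given as a formula in this excerpt. A common form of such a score lets each past access contribute a weight that decays exponentially with its age; the half-life parameter below is an assumption for illustration, not a value from the paper:

```python
import math

def popularity_score(access_times, now, half_life=30.0):
    """Hedged sketch of an atomic-decay popularity score: each past
    access contributes exp(-ln(2) * age / half_life), so recent
    accesses dominate the score.  access_times and now are in the
    same time unit (e.g. minutes); half_life is an assumed parameter."""
    decay = math.log(2) / half_life
    return sum(math.exp(-decay * (now - t)) for t in access_times)
```

Under this form, a file with three accesses in the last few minutes scores higher than a file with three accesses an hour ago, which is the behavior needed to pre-configure replicas for files that are becoming popular.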
The ARCM and DDRA algorithms control the number of replicas by setting a replica ratio (RR) threshold to avoid generating so many replicas that they occupy excessive system space. Thus, these two algorithms achieve better throughput and completion time.
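The RR-based cap described above can be sketched as a simple stop condition: refuse a new replica once a file's share of all replicas in the system exceeds a threshold. Both the exact semantics of RR and the threshold value are assumptions here, since the excerpt does not define them:

```python
def can_add_replica(file_replicas, total_replicas, rr_limit=0.2):
    """Sketch of an RR-style stop condition: reject a new replica when
    this file would hold more than rr_limit of all replicas in the
    system after the addition.  rr_limit = 0.2 is an assumed value,
    not taken from the paper."""
    return (file_replicas + 1) / (total_replicas + 1) <= rr_limit
```

The point of such a cap, as the text notes, is that replication stops before any single popular file can consume an unbounded share of system space.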
Based on the results shown in Figs. 40, 41, and 42, we can observe that the non-dynamic Random algorithm has a higher MAD value than the other algorithms, which means it cannot improve the system configuration. Furthermore, Figs. 43, 44, and 45 show the number of replicas used by each algorithm. The CDRM algorithm allocates only one replica per file at the beginning, which causes an insufficient number of available replicas in the early stage of the system; hence, it cannot distribute the replica workload effectively and in time. The DDRA algorithm improves the setting of the initial number of replicas and refers to the remaining queue space to achieve load balancing when replicas are configured; hence, it performs better than the CDRM algorithm. The PARM algorithm does not limit the number of replicas, which easily leads to excessive replica creation. The ADRM algorithm improves on DDRA's way of generating popular replicas and therefore uses fewer replicas than the DDRA algorithm; however, because it lacks a proper replica configuration method, its node utilization, throughput, and completion time are slightly worse than those of the DDRA algorithm. Finally, the proposed ARCM algorithm requires more replicas than the DDRA algorithm, but it still outperforms the other algorithms in node utilization, MAD, throughput, and completion time. This is because the ARCM algorithm adds the concept of prediction: by analyzing the trend of historical data, it predicts the future status of each file and responds to changes in the system. Hence, the proposed ARCM algorithm improves the overall performance in the MEC environment.

Conclusions
Mobile edge computing is an emerging field that handles most computing and storage by deploying MEC servers near the radio access networks. This network architecture helps reduce the cloud core load and provides users with low latency, high bandwidth, and more diverse application services. However, when a large amount of popular data is accessed in a short period, the system must generate many replicas, which reduces access efficiency and causes additional traffic overhead. In this research, an adaptive dynamic replication mechanism called the ARCM algorithm is proposed for the MEC environment to predict file popularity and improve the configuration performance for popular files. Basically, the proposed ARCM mechanism applies the concepts of atomic decay and the access time interval to predict which files will become popular, and then refers to the replica ratio for replication. The system can then add new replicas or move them to appropriate service nodes to effectively disperse the workload among the nodes and enhance the system's overall performance.

In the first experimental environment, Figs. 4 through 9 show that the ARCM algorithm performs better in load balancing and distributes the workload more evenly under a loosely environment. This is because the ARCM algorithm considers each node's blocking probability and queuing space when placing replicas. Besides, Figs. 10 through 14 show that the ARCM algorithm improves the throughput and completion time in the loosely MEC environment.
In the second experimental environment, Figs. 24 and 25 and Figs. 35 and 36 show that the ARCM algorithm can evenly distribute the workload and avoid allocating tasks to specific nodes; therefore, ARCM achieves better throughput and completion time in the ordinary environment. In the last, densely environment, Figs. 37 through 47 show that the ARCM algorithm predicts the popularity of the files with the atomic decay method and pre-configures replicas for popular files to disperse the requested workloads. Although the proposed ARCM algorithm requires more replicas than the DDRA algorithm, it still performs better than the other algorithms in node utilization, MAD, throughput, and completion time in densely environments. As a result, the experimental results show that the system throughput is improved and the completion time is better under the loosely, ordinary, and densely environments. Therefore, the ARCM algorithm improves access efficiency and service quality and maintains better load balancing in the MEC environment. For future work, we will focus on improving the prediction model. Currently, predictions are made only by analyzing historical read records. If multiple factors such as forward-looking trend analysis and big data analysis can be added, a more diverse and advanced forecasting system can be provided.
Author contribution (1) M-LC conceived of the presented idea and verified the analytical methods.
(2) H-CH conceived the presented idea, verified the analytical methods, and wrote the manuscript. (3) T-YC verified the analytical methods and wrote the manuscript. (4) W-LL carried out the experiment. (5) H-WC carried out the experiment. Hereby, I consciously assure that for the manuscript, the following is fulfilled: (1) This material is the author's original work, which has not been previously published elsewhere. (2) The paper is not currently being considered for publication elsewhere. (3) The paper reflects the author's own research and analysis truthfully and completely. (4) The paper properly credits the meaningful contributions of co-authors and co-researchers. (5) The results are appropriately placed in prior and existing research contexts. (6) All sources used are properly disclosed (correct citation). Literally copying of text must be indicated as such by using quotation marks and giving proper reference. (7) All authors have been personally and actively involved in substantial work leading to the paper and will take public responsibility for its content. I agree with the above statements and declare that this submission follows the policies of Solid State Ionics as outlined in the Guide for Authors and the Ethical Statement.
Funding We would like to thank the Editor-in-Chief and the referees for many valuable comments and suggestions which have resulted in several improvements to the presentation of the paper. This research was partially supported by the National Science Council, Taiwan, ROC, under Contract No.: MOST110-2221-E-018-014.
Data availability In the Author contribution, the name of (4) should be corrected as: (4) T-LL carried out the experiment and grammar correction.

Declarations
Conflict of interest Our paper entitled ''An Adaptive Replica Configuration Mechanism Based on Predictive File Popularity and Queue Balance in Mobile Edge Computing Environment'' is submitted to you. To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.
Informed consent There is no informed consent statement in the research.