Modeling inter-DC bulk data transfer in optical networks with edge storage

In communication networks that span large geographic regions, transferring bulk data is often costly and hard to manage. One prominent reason is bandwidth contention among long-lasting flows, and between long-lasting and short-lived flows, on inter-datacenter links. Delaying the transfer of bulk data to off-peak hours, by temporarily storing it at the edge of the network, can reduce bandwidth contention during peak hours and improve overall bandwidth utilization and network utility. Although extensive research has been conducted on its use, to date no model has been developed to quantify the performance of storage-assisted bulk transfers across multiple congested links. This gap has limited our understanding of how storage-assisted transfer behaves under different network conditions. In this paper, we model bulk data transfer in inter-datacenter optical networks with edge storage, and derive closed-form equations for bulk data transfer on inter-DC links that exhibit diurnal traffic patterns with varying peak hours. Our study reveals that when data transfer requests arrive uniformly over a day, the probability of a successful transfer increases linearly with the time data is allowed to wait in edge storage. It also shows that, when the network is moderately loaded, this probability decreases linearly with β + D, where β is the duration of the background traffic peak and D is the duration of the data transfer. Moreover, the benefit of storage barely increases once the allowed waiting time exceeds β + D.


Introduction
Alongside the enormous amounts of traffic generated by mobile and interactive devices at the edge of the network (Xia et al. 2020), a significant portion of Internet traffic is bulky and delay insensitive (Luo et al. 2020). Examples of such traffic include data downloads from planetary-scale observatory instruments and new-generation high-throughput genomic sequencing equipment (Stephens et al. 2015). Such data are typically transferred to the core of the network and propagated through the inter-datacenter optical network (Nicolaescu et al. 2021; Lu et al. 2015). During peak hours, these bulky flows may have to compete for bandwidth with interactive applications and among themselves, resulting in congestion on inter-DC optical links and unpredictable completion times (Noormohammadpour and Raghavendra 2018; Wang et al. 2019).
One option for mitigating bandwidth contention is capacity expansion on congested links. However, increasing inter-DC link bandwidth in a wide area network is expensive (Greenberg et al. 2009), and capacity expansion technologies on existing optical fiber, such as dense wavelength division multiplexing (DWDM), are bounded by the nonlinear Shannon limit (Essiambre and Tkach 2012; Zhao et al. 2013). Real-time Internet backbone traffic statistics indicate that peak-time bandwidth requirements can exceed average values by a factor of two or three ("Center for Applied Internet Data Analysis." 2020), and designing a network to handle such peak demand inevitably leads to increased capital expenditure and inefficient bandwidth utilization. As an alternative to bandwidth expansion, putting off the delivery of delay-insensitive bulk transfer requests to off-peak hours, by storing them at the edge of the network, is a cost-effective way to reduce bandwidth contention during peak hours and improve overall network utilization (Nicolaescu et al. 2021), particularly when bandwidth demand is unevenly distributed in space and time (Li et al. 2018; Lu and Zhu 2017). However, introducing storage into the transfer process adds a further dimension of complexity to the problem (Lin et al. 2020; Lin et al. 2016), making performance evaluation more difficult. While a considerable amount of interesting research has been conducted experimentally or algorithmically, our understanding of the performance gain that storage may bring to data transfer across large geographic areas lags far behind.
In this paper, we aim to model the performance of data transfer across a large geographic area that traverses multiple links with slightly different diurnal traffic patterns (Laoutaris and Rodriguez 2018;Wu et al. 2017). We propose a visually intuitive tool named Network State Progression Diagram (NSPD) to capture the randomness of request arrivals and bandwidth availability. With the tool, we derive closed-form equations for data transfer with edge storage (ES) in a three-DC linear network. The equations are then used to demonstrate how data transfer demand, network load, and traffic peak variability may influence network performance.
The remainder of this paper is organized as follows. In Sect. 2, we review the existing work on storage assisted data transfer. In Sect. 3, we introduce the system model along with the assumptions. In Sect. 4, we mathematically model the network and scheduling with NSPD. In Sect. 5, we present numerical results. We conclude the paper in Sect. 6.

Related work
A review of the existing research on storage-assisted transfer reveals that such studies have primarily focused on scheduling algorithms, aiming to maximize flow, minimize cost, or achieve other objectives.
Research on flow maximization includes the study by Patel et al. (Patel et al. 2008), in which the allocation of bandwidth and buffer space was considered. Iosifidis et al. (Iosifidis et al. 2011; Iosifidis et al. 2017) developed a joint storage-control and routing policy for the max-flow problem, utilizing the minimum necessary storage capacity and proper storage placement. In (Tong et al. 2016), Tong et al. coped with randomly varying residual bandwidth and stabilized an inter-datacenter network without prior knowledge of the link residual bandwidth.
Research on cost minimization includes the online optimizer Postcard (Yuan et al. 2012), which was proposed to minimize operational costs of inter-datacenter traffic with store-and-forward (SnF). Nandagopal et al. (Nandagopal and Puttaswamy 2012) minimized the overall peak bandwidth to reduce billable bandwidth usage, incrementing the purchased capacity whenever the residual capacity is insufficient. Chhabra et al. (Chhabra et al. 2010) proposed a scheme in which varying capacities and costs were assigned to both the network and individual links to minimize cost in a layered storage network.
In (Wang et al. 2014), Wang et al. sought to reduce the temporal peak traffic load on links by applying an SnF transfer mode and a lexicographical minimization approach to balance the traffic load. Su et al. scheduled dynamic bandwidth resources for multiple bulk data transfers and employed max-min fairness to periodically optimize the concurrent allocation rate of the network (Su et al. 2014). Wu et al. proposed an underlying bulk data transfer service featuring optimal routing of distinct chunks over time to maximize the overall weight of all jobs (Wu et al. 2017). Luo et al. took into account the revenue earned from deadline-met transfers and the penalty paid for deadline-missed ones (Luo et al. 2018). Hou et al. investigated bandwidth scheduling for multiple reservations, with the objective of maximizing the ratio of successful scheduling (Hou et al. 2018).
In an effort to study real-world applications of storage-assisted transfer, Laoutaris et al. showed that, without additional transit cost, 24 terabytes of data may be transferred across different time zones (Laoutaris et al. 2013). The same authors earlier introduced NetStitcher, a system for stitching together unutilized bandwidth via unconstrained assistive storage (Laoutaris et al. 2011). In another work (Li et al. 2012), the authors introduced an ISP-friendly bulk data transfer strategy that significantly reduces inter-domain traffic. In (Shi et al. 2011), iDTT, an overlay architecture providing a delay-tolerant data transfer service for P2P applications based on underutilized network capacity, was introduced. Although storage-assisted transfer has been studied extensively in various contexts in the past literature, to the best of our knowledge, few analytic modeling efforts have been carried out.

System model and assumptions
The inter-DC network is simplified into a concatenation of N links, as in Fig. 1. Background traffic f(t) on the links exhibits diurnal patterns, with alternating busy and idle periods repeating with period T, as shown in Fig. 2(a). During busy periods, link bandwidth is heavily utilized by background traffic and bulk data is assumed to be off limits to avoid severe network congestion; hence the state of a link can be modeled as binary, i.e., busy or idle. Transfer through a link is only possible during idle periods, and in the case of multiple links, only when there exist overlapping idle periods on all the links. The center time and duration of the busy periods, μ and β respectively, are assumed to be randomly distributed on different links, with a joint probability distribution ψ(μ; β). The distribution of a network can be obtained by prediction based on historical traffic data or by studying the characteristics of the traffic (Alasmar et al. 2019; Krishnaswamy et al. 2020; Muelas et al. 2020; García-Dorado et al. 2011). Requests arrive at the network with distribution p_r(t) over T, and have a maximum allowed waiting time of W (i.e., a deadline). Upon arrival, a request under consideration will see the network (i.e., the links) in some random state, and its fate is determined by whether there exists an overlapping window wide enough to complete its transfer in the future. Benefiting from the sparsity of bulk requests (Jurkiewicz et al. 2021), we model only one such request at a time, assuming that other requests "disappear" into the background traffic.
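As a concrete illustration of these assumptions, the following sketch (all parameter values are our own, for illustration only) models each link as a binary busy/idle signal with period T and checks whether the links share a common idle window long enough for a transfer of duration D:

```python
def common_idle_fits(links, D, T=24.0, n=24000):
    """Check whether all links share a common idle window of length >= D.

    Each link is described by (mu, beta): busy within beta/2 hours of the
    peak centre mu (mod T), idle otherwise. We sample one period on a fine
    circular grid and measure the longest run of jointly idle samples.
    """
    step = T / n
    idle = []
    for k in range(n):
        t = k * step
        jointly_idle = True
        for mu, beta in links:
            # circular distance from the busy-period centre
            dist = abs((t - mu + T / 2) % T - T / 2)
            if dist < beta / 2:
                jointly_idle = False
                break
        idle.append(jointly_idle)
    if all(idle):
        return T >= D
    # rotate the grid so the scan starts at a busy sample, then measure
    # contiguous idle runs (this handles the wrap-around at t = T)
    start = idle.index(False)
    idle = idle[start:] + idle[:start]
    best = run = 0
    for flag in idle:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best * step >= D
```

For instance, two links with busy periods centred at 12 h and 13 h (β = 8 h) leave a common idle window of about 15 h, so a 2-hour transfer fits but a 16-hour one does not.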

The mathematical model
The basic idea is that, when a request arrives, it sees the network in some random state. Knowing the distribution of request arrivals and the future evolution of the network state, we can decide whether a request can be transferred, and thus determine the probability of successful transfers. Below, we first introduce the Network State Progression Diagram (NSPD), then discuss how calculations can be made with the NSPD, and present the modeling of a 2-link case as an example.

The network state progression diagram (NSPD)
Let D be the transfer time of a request. Given μ and β, we use g_{β,μ}(t) to denote the window within which a successful transfer may start on a certain link. The relation between g_{β,μ}(t) and f(t) is shown in Fig. 2a and b. Note that, since the "fate" of a request is solely determined by the network state evolution it sees upon arrival, what matters to us is the relative time difference between the arrival and the current network state, not the absolute time of arrival. Thus, we can further normalize g_{β,μ}(t) into ĝ_β(t), whose values in one period start at 0 and end at 1, as in Fig. 2(c); every point on ĝ_β(t) corresponds to a possible evolution a request could see. For example, the same network state evolution over time is seen by a request arriving at time t under g_{β,μ}(t) (Fig. 3a) and by a request arriving at time t + ∆t under g_{β,μ+∆t}(t) (Fig. 3b), because the relative time difference in both cases is μ − t. Likewise, the point x on ĝ_β(t) (Fig. 3c) represents the network state evolution in Fig. 3d, i.e., the same network state evolving from that time. Mathematically, all requests arriving at an arbitrary time t see the same network state evolution x if (1) is satisfied. When a request arrives at the network, the evolution it sees is the combination of the evolutions of all links starting from the time of arrival. We "attach" the ĝ_n(t) of all links to an N-dimensional space, such that any point in the space represents a certain combination of the evolutions of all links, as shown in Fig. 4. With (1), the probability that an arriving request sees evolution x can be obtained by (2), where x = (x_1, x_2, …, x_N), μ = (μ_1, μ_2, …, μ_N), and β = (β_1, β_2, …, β_N).
Considering the periodicity of ĝ_n(t), it is necessary to include only one period of ĝ_n(t) in the space; network state evolutions "wrap" around the boundary of the standard ĝ_n(t). Figure 4 shows an exemplary evolution. Starting from (x_1, x_2), the evolution begins at a point where the transmittable window has not yet opened on either link, followed by an open transmittable window on the vertical axis, and then by open transmittable windows on both axes. When the evolution reaches T on the vertical axis, the window on that axis closes, but the window on the x_1 axis remains open until T is also reached on the horizontal axis. After that, the process repeats. It is worth noting that, in the case of edge storage, evolutions always progress along some diagonal path, because requests stored at the edge of the network see the evolutions on all links progressing simultaneously at the same pace.
For requests arriving at point x (i.e., requests seeing that particular evolution), a successful transfer can be made if the interval [D + β_n, T] of every link n (n = 1, 2, …, N) can be reached simultaneously within the waiting time limit W (Fig. 5). We define S_W as the set of points from which a request can be successfully transferred within W. That is, x ∈ S_W if there exists a waiting time w ∈ [0, W] that satisfies

D + β_n ≤ x_n + w − i_n T ≤ T, n = 1, 2, …, N, (3)

where 0 ≤ x_n ≤ T, i_n = ⌊(x_n + w)/T⌋, and ∏_{n=1}^{N} i_n = 0; the last condition means that the interval [D + β_n, T] may also be reached in a following period on some of the links.
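This membership test can be checked numerically. The sketch below is our reading of condition (3): the waiting time w is scanned on a discrete grid, i_n is taken as ⌊(x_n + w)/T⌋, and the product constraint requires at least one i_n to be zero; the grid resolution and all example values are our own choices:

```python
import math

def in_SW(x, beta, D, W, T=24.0, steps=4801):
    """Numerical membership test for S_W (our reading of condition (3)).

    x[n] is the coordinate of the observed evolution on link n; the request
    succeeds if some waiting time w in [0, W] brings every coordinate into
    [D + beta[n], T] after unwrapping whole periods, with at least one
    i_n equal to zero (i.e., one link reached within its current period).
    """
    for k in range(steps):
        w = W * k / (steps - 1)
        i = [math.floor((xn + w) / T) for xn in x]
        if all(D + bn <= xn + w - ik * T <= T
               for xn, bn, ik in zip(x, beta, i)):
            if math.prod(i) == 0:
                return True
    return False
```

For example, with T = 24, β = 8 on both links, and D = 2, the target interval is [10, 24] on each link: the point (5, 6) reaches it with a wait of 5 h but not within a 3-h limit.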
With the discussions above, the probability of a successful transfer, denoted as P, is the integral of p_e over S_W, that is:

P = ∫_{S_W} p_e(x) dx. (4)

General picture for the two-link case
Even though the NSPD can be used to model networks with an arbitrary number of links, for simplicity we limit the following discussion to the two-link case. The center of the busy hours (μ) on the two links follows a quadrilateral pyramidal distribution (QPD) uniquely determined by the maximum offset interval d, as an alternative to a truncated normal distribution (Muelas et al. 2020) with the same interval d and the same maximum probability value, as shown in Fig. 6. The durations of the busy periods are assumed to be β with probability 1 on both links. Requests are assumed to arrive uniformly over a day. We also assume that d is smaller than three hours, as the center of the busy hours typically falls within a window of a few hours (Muelas et al. 2020).
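Under one plausible reading of the QPD — a density shaped as a pyramid over the offset square [-d, d] × [-d, d], maximal at the centre and vanishing on the boundary — peak-centre offsets can be drawn by rejection sampling, as in this sketch (the exact form of the QPD used here is the one defined by Fig. 6; the sampler below is an assumption for illustration):

```python
import random

def sample_qpd_offset(d):
    """Draw a peak-centre offset pair (u, v) from a pyramid-shaped density
    over [-d, d] x [-d, d]: maximal at (0, 0), zero on the square boundary.

    Rejection sampling: accept a uniform candidate with probability
    proportional to the pyramid height 1 - max(|u|, |v|)/d.
    """
    while True:
        u = random.uniform(-d, d)
        v = random.uniform(-d, d)
        if random.random() < 1.0 - max(abs(u), abs(v)) / d:
            return u, v
```

Samples concentrate around zero offset, mimicking the observation that peak hours on different links cluster within a few hours of each other.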
Using (1), (2), and the QPD assumption on the busy-hour distribution, we can obtain p_e(x); the derivation is not essential to our discussion and is given in Appendix A. For the two-link case, we obtain (5). The areas within which p_e(x) takes a non-zero value are shown in Fig. 7(a). Physically, these areas include all possible state evolutions on both links under the constraint of the maximum offset interval d. The stripe along the center diagonal contains the evolutions where the transmittable windows on both links are fairly aligned, while the two triangles in the corners contain the cases where the transmittable windows are further apart but still overlap. Note that p_e(x) integrated over all shaded areas in Fig. 7(a) equals 1, i.e., a probability of 1.
The calculation of S_W can be separated into three different cases, depending on the network load as measured by β + D. In the three cases, different parts of the shaded areas in Fig. 7(a) are covered. This process can be intuitively understood by sweeping S_0 diagonally (in the 2-dimensional space) toward smaller values of (x_1, x_2), as shown in Fig. 7(b-d). S_0 itself (with vertical line shades), together with the areas it sweeps through (with horizontal line shades), overlaps with the shaded areas in Fig. 7(a), and the resulting overlapping areas form the collection of points from which a successful transfer may take place. Like p_e(x), the calculation of S_W is placed in Appendix B for brevity.
From (5) we know that points along the same diagonal path have the same value of p_e. Thus, an increased W that "sweeps" S_0 diagonally toward the upper left corner will at first result in a linear increase in the overall probability of a successful transfer. When the sweeping exceeds the boundary at the upper left corner and wraps around, part of the swept-through area overlaps with S_0, slowing down the increase in the overall probability. As shown in Fig. 7, when d is small, this slowed-down increase happens only briefly. Once the entire space has been swept through by a long enough W, increasing W further brings no additional benefit, i.e., no further increase in the overall probability. It is thus clear that as W increases from zero, the probability of successful transfer first increases linearly and then quickly converges to a constant value. This holds under all three cases, i.e., 0 < β + D < d (light load), d ≤ β + D < T-d (medium load), and T-d ≤ β + D < T (heavy load). Based on these observations, the probability of a successful transfer can be depicted roughly as in Fig. 8, in which W_L and W_S are defined as the boundaries of the three sections. In [W_L, W_S], P exhibits a non-linear relation with W. When d is small (no more than 3 h in our study) relative to T = 24 h, this region is small, so its discussion is omitted here and the corresponding equations are placed in Appendix C instead.
Thus the relation between P and W can be expressed in general as

P = P_0 + kW for 0 ≤ W < W_L, and P = P_S for W ≥ W_S, (6)

with a short non-linear transition in [W_L, W_S]. Intuitively, P_0 is the probability of a successful transfer without any storage, and P_S is the maximum achievable probability of a successful transfer when W is large enough. Based on the NSPD presented above, P_0, P_S, and k can all be calculated, giving the equations of P for the three cases in Fig. 7(b-d). Based on the NSPD, W_L and W_S of the respective cases can also be determined: in all three cases W_L equals β + D, while W_S must be determined separately in each case. We summarize the components under all three cases in Table 1.
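Once the components in Table 1 are known, relation (6) can be evaluated directly. In the sketch below, the short non-linear region [W_L, W_S] is replaced by a simple linear interpolation (a stand-in for the exact Appendix-C expressions), and the parameter values in the usage example are illustrative only:

```python
def success_probability(W, P0, k, PS, WL, WS):
    """Evaluate the piecewise relation (6): P grows linearly from P0 with
    slope k up to W_L, saturates at P_S beyond W_S, and is linearly
    interpolated in the short [W_L, W_S] region (a simplification of the
    non-linear expressions given in Appendix C).
    """
    if W <= 0:
        return P0
    if W < WL:
        return P0 + k * W
    if W >= WS:
        return PS
    # placeholder interpolation over the short non-linear region
    p_l = P0 + k * WL
    return p_l + (PS - p_l) * (W - WL) / (WS - WL)
```

For example, with illustrative medium-load components P0 = 0.3, k = 1/24, P_S = 1, W_L = 10 h, and W_S = 13 h, P rises linearly to about 0.72 at W = 10 h and saturates at 1 beyond W = 13 h.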

Equations under medium load (d ≤ β + D < T-d)
As W increases from zero, S_0 "sweeps" toward the upper left corner, and the probability of successful transfer increases linearly with W until W reaches β + D (W_L in all three cases). As requests are assumed to arrive uniformly over T, a uniform increase in S_W results in a uniform increase in the probability of successful transfer; as such, the slope is 1/T. When W goes beyond β + D, i.e., when S_0 sweeps beyond the corner, points with negative coordinates in both dimensions are covered, but such points do not contribute to the probability of successful transfer, because they are equivalent to points around the lower-right corner, which have already been covered by S_0. This leads to a smaller and slowly decreasing (thus non-linear) slope between W_L and W_S. From Fig. 7(b), it can be seen that when W is no less than β + D + d, all shaded areas in Fig. 7(a) will have been swept through. This leads to a successful transfer probability of 1, and increasing W further results in no further increase in P. Thus, W_S equals β + D + d.

Equations under light load (0 < β + D < d)
From Fig. 7(c), it can be seen that when W < β + D, not only the areas along the diagonal are covered, but also the areas in the upper right and lower left corners. This leads to a slightly steeper slope k. Moreover, all shaded areas will already have been fully swept through when W reaches 2β + 2D, leading to a W_S of 2β + 2D. For the same reason, P_0 is higher than in the medium load case, with one additional component in the equation brought about by the partly covered areas in the upper right and lower left corners.

Equations under heavy load (T-d ≤ β + D < T)
From Fig. 7(d), not all of the shaded areas with non-zero probability can be swept through, no matter how large W is, and once W is larger than T, the swept area no longer increases. This results in the maximum successful transfer probability P_S given in Table 1, which decreases rapidly as the variability of the peak time and the load increase. The slope k is likewise given in Table 1. It is worth noting that in this study, bulk transfer requests are assumed to arrive evenly during a day, a key assumption that leads to the linear relation between the probability of successful transfer and the waiting time discussed above. If requests arrive unevenly, the relation between the probability of successful transfer and the waiting time will no longer be linear. At the same time, the maximum peak time offset d is assumed to be small (3 h in our numerical study). This results in a relatively small [W_L, W_S] interval, during which the probability of successful transfer increases non-linearly with W and gradually saturates to its maximum value. This interval grows when d increases.

Numerical results
In this section, we first validate the use of the QPD in place of the truncated normal distribution (TND) in the model; then the accuracy of the model is verified with simulations. We present the numerical results obtained with the equations above (under QPD), and compare them with simulated results under the same conditions and two-link topology as in Sect. 4.2, except under TND, to validate the use of the QPD. In all situations, only marginal differences are observed between the analytical results under QPD and the simulated results under TND. Further, the simulations also consider n requests arriving uniformly in a day, with durations following a negative exponential distribution of mean D (nTND). Here, the total network load cannot exceed the network limit, i.e., nD + β < T. In a WAN, flows carrying more than 500 MB of data account for around 10^-6 of all flows, and their number decreases exponentially as the request size doubles (Jurkiewicz et al. 2021; "Flow-models." 2021; Bauer et al. 2021). When there are 100 million flows in a day, the number of bulk flows is around 100, so we set n = 100. We average the results of 10 simulations, each with 15,000 periods (days) and a total of 1.5 × 10^6 bulky flows.

Figure 9 shows that a longer waiting time results in a linear increase in the successful transfer probability. Under light or medium load, successful transfer can be guaranteed regardless of the value of d, and to achieve a high transfer probability the storage time must match the sum of the request duration and the network load. The gap between QPD and nTND appears only at W = 0 h. When the load is high, the gap grows as the waiting time increases, and can be around 15% when W = 24 h, d = 3 h, β = 23 h, and D = 0.01 h. Both the maximal gain from storage and the gap are highly dependent on d.

Figure 10 investigates how the maximum offset interval d affects the successful transfer probability. Little probability change is observed even under medium load in Fig. 10(a) with β = 12 h. Intuitively, this is because under the QPD (and similarly the TND) peak hours occur most frequently around the center and decay quickly further from it; the small average offset has very little effect when there are enough network resources left. Under heavy load, a relatively large decrease in the successful transfer probability can be observed, and the longer the waiting time, the bigger the decrease (Fig. 10b). The gap between QPD and nTND grows when d increases and a long waiting time is allowed.

Figure 11 shows the impact of load on network performance, when d = 1 h and d = 3 h. Without storage (W = 0 h), the network performance degrades almost linearly as the load increases, and the difference between QPD and nTND also decreases. With storage, successful transfer can be guaranteed under low and medium load, and with a longer allowed waiting time the network can serve a higher load without any loss. When the allowed waiting time is half a day (12 h), the network serves without any loss under medium load, and still serves around half of the requests when operating at very high load. Under heavy load (β + D > T-d), the transfer probability deteriorates rapidly as the load increases, which explains the large difference between the one-request model and nTND with 100 requests when β = 23 h. The model adapts well to all but extreme loads; for example, the gap is less than 1.5% even when β = 22.8 h in Fig. 11(b). The gap between the model and nTND is small when the transfer probability decreases relatively slowly with increasing overall network load.

Figure 12 shows a 3-dimensional view of the successful transfer probability against the network load β + D and the storage time limit W, when d = 3 h.
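The qualitative trends above can be reproduced with a small Monte Carlo sketch of the two-link setup. For simplicity, the sketch draws the peak-centre offset uniformly from [-d, d] rather than from the QPD, and all parameter values are illustrative:

```python
import random

def fits(s, D, mu, beta, T):
    """True if a transfer [s, s+D] lies entirely in the idle period of a
    link whose busy hours span beta/2 on each side of the peak centre mu
    (mod T): the start must fall within the first T - beta - D hours of
    the idle window beginning at mu + beta/2."""
    offset = (s - (mu + beta / 2.0)) % T
    return offset <= T - beta - D

def simulate_P(W, beta, D, d, T=24.0, trials=3000, seed=1):
    """Monte Carlo sketch of the two-link case: uniform arrivals over a
    day, equal busy duration beta on both links, and a peak-centre offset
    drawn uniformly from [-d, d] (a simplification of the QPD). Returns
    the estimated probability that one bulk request of duration D
    completes within the allowed waiting time W."""
    rng = random.Random(seed)
    n = 240                               # grid resolution for start times
    ok = 0
    for _ in range(trials):
        t0 = rng.uniform(0.0, T)          # arrival time of the request
        mu1 = 12.0                        # reference peak centre (link 1)
        mu2 = mu1 + rng.uniform(-d, d)    # offset peak centre (link 2)
        if any(fits(t0 + W * k / n, D, mu1, beta, T) and
               fits(t0 + W * k / n, D, mu2, beta, T) for k in range(n + 1)):
            ok += 1
    return ok / trials
```

With β = 8 h, D = 1 h, and d = 1 h, the estimate rises roughly linearly with W and reaches 1 once W exceeds β + D + d, in line with the medium-load analysis.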
These simulation results suggest that our model provides a mathematically tractable yet practically accurate approximation of the system, except in extreme load situations.

Appendix B
The process of deriving S_W is as follows. With a maximum waiting time W no greater than T, it is sufficient to consider the cases where i_1 and i_2 take values no greater than 1 in (3).

Appendix C
The successful transfer probability when W_L ≤ W ≤ W_S:
a. Medium load (d ≤ β + D < T-d)
b. Light load (0 < β + D < d)
c. Heavy load (T-d ≤ β + D < T)