Continuous-Time Stochastic Analysis of Rumor Spreading with Multiple Operations

In this paper, we analyze a new asynchronous rumor spreading protocol to deliver a rumor to all the nodes of a large-scale distributed network. This protocol relies on successive pull operations involving k different nodes, with k ≥ 2, called k-pull operations. Specifically, during a k-pull operation, an uninformed node a contacts k − 1 other nodes at random in the network, and if at least one of them knows the rumor, then node a learns it. We perform a detailed study in continuous time of the total time $\Theta_{k,n}$ needed for all the n nodes to learn the rumor. These results extend those obtained in a previous paper, which dealt with the discrete-time case. We obtain the mean value, the variance and the distribution of $\Theta_{k,n}$, together with their asymptotic behavior when the number of nodes n tends to infinity.


Introduction
Randomized rumor spreading or gossiping is an important communication mechanism that allows the dissemination of information in large-scale and open networks.

A large-scale and open network comprises a collection of sequential computing entities (e.g., processes, processors, nodes, agents, peers) that join and leave the system at any time, and communicate with one another by exchanging messages. Randomized rumor spreading was initially proposed by Demers et al. [1] for the update of a database replicated at different sites, and has since been adopted in many applications owing to its robustness and simplicity. In contrast to reliable broadcasts, which must provide agreement on the broadcast value, possibly with additional ordering guarantees on the delivery of updates from different sources, a randomized rumor spreading procedure provides reliability only with some probability. A randomized rumor spreading protocol describes the rules required for one or more pieces of information known by an arbitrary node in the network (we call such a node an informed node) to be spread to all the nodes of the network. The push and pull operations are the basic operations used by the nodes to propagate information over the entire network [1,2]. With the push operation, an informed node contacts some randomly chosen node in the system and gives it the rumor, while with the pull operation, an uninformed node contacts some random node and asks it for the rumor. Note that in both cases the contacted node may or may not already know the rumor. A node can perform both operations, depending on whether or not it knows the rumor, which corresponds to the push-pull protocol, or only one of them, which corresponds to the push or the pull protocol, respectively. One of the important questions raised by these protocols is the spreading time, that is, the time needed for the rumor to be known by all the nodes of the network.
To answer such a question, one first needs to specify how synchronized the nodes are, in other words, whether all the nodes of the system act in a synchronous way or not. In the former case, the system model is synchronous, while in the latter case it is asynchronous. The synchronous model is the most studied one. This model assumes that all the nodes of the network act synchronously, which allows the algorithms designed in this model to divide time into synchronized rounds. During each synchronized round, each node a of the network selects at random one of its neighbors b and either sends the rumor to b if a knows it (push operation) or gets the rumor from b if b knows it (pull operation). In this model, the spreading time of a rumor is defined as the number of synchronous rounds necessary for all the nodes to know the rumor. When the underlying graph is complete, it has been shown by Frieze [3] that the number of rounds divided by $\log_2(n)$ converges in probability to $1 + \ln(2)$ when the number n of nodes in the graph tends to infinity. Further results have been established (see for example [4,5] and the references therein), the most recent ones resulting from the observation that the rumor spreading time is closely related to the conductance of the graph of the network, see [6]. Investigations have also been carried out for different network topologies [7][8][9][10], in the presence of link or node failures [11], in dynamic graphs [12], and in general graphs in terms of vertex expansion [13]. Another alternative consists of letting the nodes make more than one call during the push or pull operations [14]. The authors of [14] show that the push-pull protocol takes $O(\log n/\log\log n)$ rounds in expectation if the number of neighbors of a node is chosen independently according to a power law distribution with exponent $\beta \in (2, 3)$.
In large-scale networks, that is, in networks involving several thousands of nodes, assuming that all nodes act synchronously is a very strong assumption. Thus several authors, including [15][16][17][18][19], consider an asynchronous model, that is, a model in which nodes asynchronously trigger operations with randomly chosen nodes in the system, either to push, pull or push-pull information. The asynchronous gossip protocol is usually modeled by a continuous-time stochastic (Markovian) process [15][16][17][18][19]. This type of stochastic process belongs to the category of death processes, which have many applications in demography, queuing theory, performance engineering, epidemiology, biology and many distributed applications. For instance, in [20], an analysis of the susceptible-infected model, which corresponds to an asynchronous push-pull model, makes it possible in some cases to obtain the state probabilities explicitly by applying the Laplace transform to the Kolmogorov forward equations. However, these techniques prove ineffective when the transition rate is non-linear, since the inversion of the Laplace transform then becomes a tricky exercise. Most of the rumor spreading protocols studied in the asynchronous models rely either on push-pull operations or on push operations. Indeed, pushing the information spreads the rumor very quickly at the beginning, but then struggles to reach the last few uninformed nodes. In contrast, the pull algorithm has attracted very little attention because this operation was long considered inefficient for spreading a rumor within a large-scale network [21]. It is, however, very useful in systems fighting against message saturation (see for instance [22]). The inefficiency of the pull protocol stems from the fact that it takes some time before the rumor reaches a phase of exponential growth.
The objective of this paper is to further develop this line of inquiry by studying the k-pull protocol in the continuous-time case. This protocol counterbalances the slow start of pull-based rumor spreading protocols by increasing the chances of learning the rumor with each operation. A local clock following an exponential distribution with rate λ is associated with each uninformed node of the system. Each time the clock of an uninformed node rings, this node contacts k − 1 distinct nodes, with k ≥ 2, chosen uniformly at random among the n − 1 other nodes. If at least one of these contacted nodes knows the rumor, the initiator of the k-pull operation learns the rumor and clears its clock.
Concretely, the k-pull operation is of interest in all situations in which one would like to benefit from multiple concomitant responses to build up an opinion. This is typically the case in fault-tolerant distributed applications (including Byzantine fault tolerance, consensus, clock synchronization, leader election), as well as in large distributed applications (e.g., peer-to-peer communication, blockchain) where nodes require sufficiently many responses or votes to cope with the presence of faulty nodes. Another scenario that would benefit from such an operation is cybersecurity. When investigating cybersecurity incidents, experts typically conduct their investigation in the form of a knowledge graph to explore and discover complex attack paths. For instance, the Defants company [23] has developed a model for representing raw system and network information in the form of a knowledge graph. This graph can contain up to millions of nodes about an incident that took place on a network of several hundred machines and users. In such a graph, the nodes model the elements of the system: machines, user accounts, session information, files, services and so on. Two nodes are related if an action carried out on one of them has generated the other. These actions may include decryption, decompression, execution or packet transmission. By relying on the k-pull operation, the expert is able to compare or audit concomitant responses, leading to fast and informed decisions.
The remainder of the paper is organized as follows. In Section 2, we present the asynchronous k-pull protocol and introduce the random variable $\Theta_{k,n}$, which represents the total amount of time needed for all the nodes to know the rumor. We prove in Section 3 that the mean time needed to inform all the n nodes of the system, assuming that a single node initially knows the rumor, that is $E(\Theta_{k,n})$, is equivalent to $k\ln(n)/((k-1)\lambda)$ when the number of nodes n in the system tends to infinity. We also show that the limiting variance of $\Theta_{k,n}$ is equal to $(1 + 1/(k-1)^2)\pi^2/(6\lambda^2)$ when n tends to infinity. The distribution of the rumor spreading time $\Theta_{k,n}$ is analyzed in Section 4. We provide explicit limiting distributions of $\Theta_{k,n} - E(\Theta_{k,n})$ and $\Theta_{k,n} - k\ln(n)/((k-1)\lambda)$ when n tends to infinity. All these results are illustrated using numerical values. Section 5 concludes the paper.

The model
As we will see in this section, the results obtained in [24] for the discrete-time model of k-pull rumor spreading cannot be used to deal with the continuous-time model of k-pull rumor spreading. A much more sophisticated analysis is needed here.
We consider a complete network of size n in which each node may be asked for a piece of information (pull event). The algorithm starts with a single node informed of the rumor. A local clock following an exponential distribution with rate λ is associated with each uninformed node of the system. Each time the clock of an uninformed node s rings, this node contacts k − 1 distinct nodes, with k ≥ 2, chosen uniformly at random among the n − 1 other nodes. If at least one of these contacted nodes knows the rumor, node s learns it and clears its clock (i.e., s remains contactable but no longer contacts other nodes). We suppose that the k-pull operation, once triggered, is instantaneous, i.e., the time for a node to contact k − 1 other nodes and to receive their responses is zero. Since the clock of an uninformed node rings after a time that is exponentially distributed with rate λ, we naturally introduce the continuous-time Markov chain $Z = \{Z_t, t \ge 0\}$, where $Z_t$ represents the number of informed nodes at continuous time t ≥ 0. Specifically, the transitions of Z occur at successive instants $\tau_0 = 0, \tau_1, \ldots$, where the $\tau_i - \tau_{i-1}$, i ≥ 1, are independent and exponentially distributed with rate $(n-i)\lambda p_{k,n}(i)$, where

$$p_{k,n}(i) = 1 - \binom{n-1-i}{k-1}\Big/\binom{n-1}{k-1} = 1 - \prod_{h=1}^{k-1}\left(1 - \frac{i}{n-h}\right), \qquad (1)$$

with the convention $\binom{a}{b} = 0$ when a < b. Indeed, $p_{k,n}(i)$ is the probability, given that $Z_t = i$, that the set of k − 1 nodes chosen among the n − 1 other nodes at the next clock ring contains at least one of the i informed nodes. Hence the global clock of the process rings according to an exponential distribution whose rate is the number n − i of uninformed nodes multiplied by the per-node success rate $\lambda p_{k,n}(i)$. Note that each jump of process Z corresponds to a newly informed node.
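To make the protocol concrete, here is a minimal Monte Carlo sketch in Python of one run of the asynchronous k-pull protocol on a complete network. It is our own illustration, not the authors' simulator: the function name `simulate_k_pull` and its parameters are hypothetical, node 0 is assumed to be the initially informed node, and the n − i independent exponential clocks of the uninformed nodes are simulated by drawing a single exponential delay with rate λ(n − i) and picking the ringing node uniformly among the uninformed ones, which is equivalent by the memoryless property.

```python
import random


def simulate_k_pull(n, k, lam=1.0, rng=random):
    """Simulate one run of the asynchronous k-pull protocol on a complete
    network of n nodes; return the time at which every node is informed."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    informed = [False] * n
    informed[0] = True                  # a single node initially knows the rumor
    uninformed = list(range(1, n))      # nodes whose local clock is still running
    t = 0.0
    while uninformed:
        # The minimum of len(uninformed) independent Exp(lam) clocks is
        # Exp(lam * len(uninformed)), and the node that rings is uniform.
        t += rng.expovariate(lam * len(uninformed))
        pos = rng.randrange(len(uninformed))
        s = uninformed[pos]
        # Contact k - 1 distinct nodes chosen uniformly among the n - 1 others
        # (indices 0..n-2 are mapped onto all nodes except s).
        contacted = [j if j < s else j + 1 for j in rng.sample(range(n - 1), k - 1)]
        if any(informed[j] for j in contacted):
            informed[s] = True          # s learns the rumor and clears its clock
            uninformed[pos] = uninformed[-1]
            uninformed.pop()
    return t
```

Failed operations still consume time, so the holding time in state i is a geometric number of exponential delays, which is again exponential with rate $(n-i)\lambda p_{k,n}(i)$. Averaging many runs, e.g. `sum(simulate_k_pull(200, 3) for _ in range(300)) / 300`, gives a Monte Carlo estimate of the expected spreading time.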
Observe also that the continuous-time model of the rumor spreading corresponds to a physical time, that is the total amount of time needed for all the n nodes to learn the rumor, while the discrete-time model stands for the total number of k-pull operations needed for all the n nodes to learn the rumor.
We denote by $\Theta_{k,n}$ the random variable defined by

$$\Theta_{k,n} = \inf\{t \ge 0 \mid Z_t = n\},$$

which represents, in the continuous-time model, the total amount of time needed for all the nodes to know the rumor. The spreading time $\Theta_{k,n}$ can thus be expressed as a sum of independent and exponentially distributed random variables. Specifically, introducing the notation $U_{k,n}(i) = \tau_i - \tau_{i-1}$, we have

$$\Theta_{k,n} = \sum_{i=1}^{n-1} U_{k,n}(i), \qquad (2)$$

where $U_{k,n}(i)$ is exponentially distributed with rate $(n-i)\lambda p_{k,n}(i)$.

The authors of [24] used two technical lemmas (Lemmas 2 and 3 of [24]) to analyze the asymptotic behavior of the moments and the distribution of the rumor spreading time in the discrete-time case. These two lemmas allow them to provide simple lower and upper bounds for the probabilities $p_{k,n}(i)$. These bounds are then used to get other asymptotically equal bounds of the moments of the discrete-time rumor spreading time, which is the sum of geometric random variables with parameters $p_{k,n}(i)$. In the continuous-time case, the bounds of the $p_{k,n}(i)$ obtained in [24] do not lead to asymptotically equal bounds of the moments of the continuous-time rumor spreading time, which is the sum of exponential random variables with rates $(n-i)\lambda p_{k,n}(i)$. This is due to the multiplicative factor n − i which arises in these rates. We thus need much more refined bounds than those obtained in [24].
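Writing the spreading time as a sum of independent exponential holding times also gives a much cheaper way to sample it than simulating every clock ring. The Python sketch below uses hypothetical helper names `p_kn` and `sample_theta`. It evaluates $p_{k,n}(i)$, the probability that k − 1 distinct nodes drawn among the n − 1 others include at least one of the i informed nodes, as one minus the product $\prod_{h=1}^{k-1}(n-i-h)/(n-h)$ of the probabilities of missing the informed nodes at each successive draw. It then draws $\Theta_{k,n}$ as a sum of independent exponential variables with rates $(n-i)\lambda p_{k,n}(i)$. This is one way such samples could be produced; we do not claim it is the procedure used for the experiments of Section 4.

```python
import math
import random


def p_kn(i, n, k):
    """Probability that k - 1 distinct nodes drawn uniformly among the n - 1
    other nodes include at least one of the i informed nodes."""
    miss = 1.0
    for h in range(1, k):
        # Integer numerator: the product becomes exactly 0 once fewer than
        # k - 1 other uninformed nodes remain, so that p = 1.
        miss *= (n - i - h) / (n - h)
    return 1.0 - miss


def sample_theta(n, k, lam=1.0, rng=random):
    """Draw one realization of Theta_{k,n} as the sum over i = 1..n-1 of
    independent Exp((n - i) * lam * p_{k,n}(i)) holding times."""
    return math.fsum(rng.expovariate((n - i) * lam * p_kn(i, n, k)) for i in range(1, n))
```

Each sample costs n − 1 exponential draws regardless of how many k-pull operations fail, which is what makes large sample sizes practical.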

Moments of the rumor spreading time
We analyze in this section the first two moments of the rumor spreading time by using appropriate lower and upper bounds. The following technical lemma is used to obtain their asymptotic behavior. In this lemma, γ is the Euler-Mascheroni constant, γ ≈ 0.5772156649. Lemma 1. Let g be a Lipschitz function on interval [0, 1].
where ε(n) is the remainder of the Euler-Mascheroni representation of the harmonic sum in terms of the logarithm, which satisfies $\lim_{n\to\infty}\varepsilon(n) = 0$.
Proof. Using the integral form of the remainder for the Taylor series of function g, we get function g′ being the derivative of g. We then have since the last term is a Riemann sum. We then use the following well-known development of the harmonic sum:

$$\sum_{i=1}^{n}\frac{1}{i} = \ln(n) + \gamma + \varepsilon(n).$$

As usual, for two sequences $(u_n)$ and $(v_n)$, we introduce the notation $u_n \sim v_n$ when $\lim_{n\to\infty} u_n/v_n = 1$. In particular, from Lemma 1 we get, for any function g
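The harmonic-sum remainder ε(n) drives the constant terms in the results below, so a quick numerical look at it may help. The Python sketch below takes ε(n) = H_n − ln(n) − γ, with H_n = 1 + 1/2 + ⋯ + 1/n. This is our reading of the development used in the proof, and the name `harmonic_remainder` is hypothetical. It prints ε(n) next to 1/(2n), the known first-order behavior of this remainder, which is consistent with $\lim_{n\to\infty}\varepsilon(n) = 0$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (double precision)


def harmonic_remainder(n):
    """eps(n) = H_n - ln(n) - gamma, where H_n = 1 + 1/2 + ... + 1/n."""
    h_n = math.fsum(1.0 / i for i in range(1, n + 1))
    return h_n - math.log(n) - EULER_GAMMA


for n in (10, 100, 1000, 10000):
    # eps(n) decreases to 0 roughly like 1/(2n)
    print(n, harmonic_remainder(n), 1.0 / (2 * n))
```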

Expected rumor spreading time
Using (1) and (2), the expected rumor spreading time writes

$$E(\Theta_{k,n}) = \frac{1}{\lambda}\sum_{i=1}^{n-1}\frac{1}{(n-i)\,p_{k,n}(i)}.$$

Using the fact that 0 ≤ h ≤ k in Relation (1), we easily get Introducing the notation where We now show that a refined analysis of the terms $\beta_n$ and $\gamma_n$ leads to a precise description of the asymptotic behavior of the expected rumor spreading time $E(\Theta_{k,n})$.
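Because $E(\Theta_{k,n})$ is the finite sum $\frac{1}{\lambda}\sum_{i=1}^{n-1} 1/((n-i)p_{k,n}(i))$, it can be evaluated exactly for moderate n. The Python sketch below does so and compares the result with the leading term $k\ln(n)/((k-1)\lambda)$ announced in the introduction. The function names are hypothetical. For k = 2, the probability reduces to $p_{2,n}(i) = i/(n-1)$. The partial-fraction identity $1/(i(n-i)) = (1/i + 1/(n-i))/n$ then gives the closed form $E(\Theta_{2,n}) = 2(n-1)H_{n-1}/(\lambda n)$, where $H_m$ is the m-th harmonic number. We use this closed form as a sanity check.

```python
import math


def p_kn(i, n, k):
    """Probability that k - 1 distinct nodes drawn among the n - 1 others
    include at least one of the i informed nodes."""
    miss = 1.0
    for h in range(1, k):
        miss *= (n - i - h) / (n - h)   # exactly 0 when no uninformed choice is left
    return 1.0 - miss


def expected_theta(n, k, lam=1.0):
    """E(Theta_{k,n}) = (1/lam) * sum_{i=1}^{n-1} 1 / ((n - i) p_{k,n}(i))."""
    return math.fsum(1.0 / ((n - i) * p_kn(i, n, k)) for i in range(1, n)) / lam


def leading_term(n, k, lam=1.0):
    """First-order equivalent k ln(n) / ((k - 1) lam)."""
    return k * math.log(n) / ((k - 1) * lam)


for k in (2, 5, 10):
    for n in (10**3, 10**4, 10**5):
        print(k, n, expected_theta(n, k), leading_term(n, k))
```

With λ = 1, the value printed for k = 5 and n = 10^5 can be compared with the one quoted below for Table 1. For k = 2, the closed form and $H_{n-1} = \ln(n) + \gamma + o(1)$ show that the gap `expected_theta - leading_term` tends to 2γ/λ.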
For k ≥ 2, we introduce the function $f_k$ defined, for all x ∈ [0, 1], by Observe that function $f_k$ is clearly Lipschitz on [0, 1]. It is frequently used in the remainder of the paper. Lemma 2. For all k ≥ 2, we have Proof. For all k ≥ 2 and y ∈ ℝ, using the identity and by definition of function $f_k$, we have we obtain where ε(n) is such that $\lim_{n\to\infty}\varepsilon(n) = 0$. Using Lemma 1 twice, once with $g(x) = f_k(x)$ and once with We need to compute the quantity Coming back to the definition of $f_k$, we introduce the polynomial . These two observations lead to dx, and the variable change x := 1 − x to deal with the second difference leads to dx.
The key to computing this integral is to factor x out of the first difference $1/q_{k-2}(x) - 1/q_{k-2}(0)$, and 1 − x out of the second difference $1/q_{k-2}(x) - 1/q_{k-2}(1)$, so as to remove the apparent singularities and recover computable quantities.
Concerning the first difference, we observe that The other difference requires slightly more attention. We claim that the following formula holds Assuming this result for the time being, we obtain .
Finally, this provides where the last equality comes from the observation $q_{k-2}(x) = xq_{k-3}(x) + 1$. As a final result, since $(xq_{k-3}(x) + 1)' = xq'_{k-3}(x) + q_{k-3}(x)$, we recover the value It remains to prove formula (8). The formula can easily be proved by induction. Alternatively, using the fact that $1 - x^j = (1-x)\sum_{\ell=0}^{j-1} x^\ell$, one may write the relations which completes the proof.
We now consider the term $\gamma_n$.
Lemma 3. For all k ≥ 2, we have Proof. By definition of function $f_k$, we easily obtain The function $f_k$ being Lipschitz on interval [0, 1], and $f'_k$ denoting its derivative, we get, using Relations (7) and (9), This bound tends to 0 when n tends to infinity. Using Lemma 2, we have which completes the proof.
The following theorem gives the asymptotic behavior of the expected rumor spreading time. It will be used in Corollary 12 to get an asymptotic behavior of the distribution of $\Theta_{k,n}$. Theorem 4. For all k ≥ 2, we have Proof. Relations (4) and (6) give Using Lemma 2 and Lemma 3, we easily get the desired result.
In particular, from Theorem 4 we get

$$E(\Theta_{k,n}) \sim \frac{k\ln(n)}{(k-1)\lambda}.$$

We illustrate our results through Table 1, which gives the expected rumor spreading time $E(\Theta_{k,n})$ when λ = 1, for different values of k and n, together with the approximation given by Theorem 4. For that purpose, we introduce the notation We observe that the asymptotic value $F_{k,n}$ is very close to $E(\Theta_{k,n})$. Observe also, as expected, that $E(\Theta_{k,n})$ increases with n and decreases with k. Moreover, for example when k = 5, we see that if the local clock of each uninformed node rings at an expected rate of 1 per unit of time (i.e., λ = 1), then, when the number n of nodes is equal to $10^5$, the expected rumor spreading time is equal to 14.76602 units of time, which is quite small considering the large number of nodes.

Variance of the rumor spreading time
The following lemma is needed to obtain the limiting value of the variance of $\Theta_{k,n}$ when n tends to infinity. Lemma 5. Let g be a Lipschitz function on interval [0, 1].
Proof. Function g being Lipschitz on interval [0, 1], for all x, y ∈ [0, 1], we have and g′ is the derivative of function g. We then have, by taking x = i/n and y = 0, This means that which completes the proof.
Using (1) and (2), the variance of the rumor spreading time writes

$$V(\Theta_{k,n}) = \frac{1}{\lambda^2}\sum_{i=1}^{n-1}\frac{1}{(n-i)^2\,p_{k,n}(i)^2}.$$

As we did for the expected rumor spreading time and introducing the notation , where In the following two lemmas, we obtain the limiting values of both $\zeta_n$ and $\eta_n$ when n tends to infinity. Lemma 6. For all k ≥ 2, we have Proof. By definition of function $f_k$, we have Observing that we have We denote respectively by $\zeta_{n,1}$, $\zeta_{n,2}$ and $\zeta_{n,3}$ these three sums. Concerning $\zeta_{n,3}$, we have Applying Relation (3) twice, once with $g(x) = f_k^2(x)$ and once with Concerning $\zeta_{n,2}$, since we have Finally, for term $\zeta_{n,1}$, we have Applying again Lemma 5 with function g Putting these three limits together leads to which completes the proof.
We now analyze the limiting value of $\eta_n$ when n tends to infinity. Lemma 7. For all k ≥ 2, we have Proof. Using again the function $f_k$, we obtain The function $f_k$ being Lipschitz on interval [0, 1], and $(f_k^2)'$ denoting the derivative of $f_k^2$, we get, using Relations (12) and (14), Using Relation (13), we have It follows that This result together with the result of Lemma 6 leads to which completes the proof.
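Theorem 8. For all k ≥ 2, we have

$$\lim_{n\to\infty} V(\Theta_{k,n}) = \left(1 + \frac{1}{(k-1)^2}\right)\frac{\pi^2}{6\lambda^2}.$$

Proof. Relations (10) and (11) give The use of Lemma 6 and Lemma 7 leads to the desired result.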
We illustrate this result through Table 2, which gives the variance $V(\Theta_{k,n})$ of the rumor spreading time when λ = 1, for different values of k and n, together with its limiting value given by Theorem 8. For that purpose, we denote this limiting value by $V_k$, that is,

$$V_k = \left(1 + \frac{1}{(k-1)^2}\right)\frac{\pi^2}{6}.$$

We observe that $V(\Theta_{k,n})$ is very close to its limiting value $V_k$. Observe also that $V(\Theta_{k,n})$ decreases with k.
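As for the mean, the variance of a sum of independent exponential variables is the sum of the squared inverse rates. $V(\Theta_{k,n})$ can therefore be evaluated exactly and compared with the limit $(1 + 1/(k-1)^2)\pi^2/(6\lambda^2)$. In the Python sketch below, the names `variance_theta` and `limiting_variance` are hypothetical. For k = 2, the same partial-fraction argument as for the mean gives $V(\Theta_{2,n}) = ((n-1)/(\lambda n))^2\,(2H^{(2)}_{n-1} + 4H_{n-1}/n)$, where $H^{(2)}_m = \sum_{j=1}^{m} 1/j^2$. This closed form is used below as a check.

```python
import math


def p_kn(i, n, k):
    """Probability that k - 1 distinct nodes drawn among the n - 1 others
    include at least one of the i informed nodes."""
    miss = 1.0
    for h in range(1, k):
        miss *= (n - i - h) / (n - h)   # exactly 0 when no uninformed choice is left
    return 1.0 - miss


def variance_theta(n, k, lam=1.0):
    """V(Theta_{k,n}) = sum_{i=1}^{n-1} 1 / ((n - i) lam p_{k,n}(i))^2."""
    return math.fsum(1.0 / ((n - i) * lam * p_kn(i, n, k)) ** 2 for i in range(1, n))


def limiting_variance(k, lam=1.0):
    """Limit (1 + 1/(k-1)^2) pi^2 / (6 lam^2) as n tends to infinity."""
    return (1.0 + 1.0 / (k - 1) ** 2) * math.pi ** 2 / (6.0 * lam ** 2)


for k in (2, 5, 10):
    print(k, variance_theta(10**4, k), limiting_variance(k))
```

Unlike the mean, the variance does not grow with n. Its limit only collects the contributions 1/(λ(k − 1)i)² of the first informed nodes and 1/(λj)² of the last uninformed ones, which explains the two terms of the limit.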

Distribution of the rumor spreading time
This section provides explicit limiting distributions, such as that of $\Theta_{k,n} - E(\Theta_{k,n})$, when n tends to infinity. We introduce the notation $\mu_{k,n}(i) = \lambda(n-i)p_{k,n}(i)$. Recall that $U_{k,n}(i)$ is exponentially distributed with rate $\mu_{k,n}(i)$ and that $\Theta_{k,n} = \sum_{i=1}^{n-1} U_{k,n}(i)$. The main result of this section is Theorem 10, whose proof needs the following lemma. Lemma 9. For all k ≥ 2, we have and we obtain, using inequality (5), By definition of function $f_k$, we write Observing that function $f_k$ is increasing on interval [0, 1] and that $f_k(1) = 1$, we obtain, using Relation (13) with 2n + 1 instead of n, The $\lim_{m\to\infty}\limsup_{n\to\infty}$ of both terms is 0 because $\sum_i 1/i^2$ is a convergent series. This proves the first relation.
Concerning the second relation, introducing the notation we obtain in the same way, using inequality (5), As we did for term $\Delta_{m,n}(k)$, we have which in turn leads to the same result.
We are now able to prove the following theorem. Theorem 10. Let $(Z_i)_{i\ge 1}$ be a sequence of i.i.d. random variables exponentially distributed with rate 1 and let W be defined by Observe that the random variables V k,ℓ and V k,ℓ are independent. The rest of the proof consists of checking the hypotheses of the principle of accompanying laws of Theorem 9.1.13 of [25]. We introduce the notation Using the fact that $E(R_{k,n}(i)) = 0$ and that the $R_{k,n}(i)$ are independent, we have and, in the same way, Using now the Markov inequality, we obtain, for all ε > 0, Putting these results together, we deduce that for all ε > 0, we have Let us introduce the notation Using (15) and (16) and the fact that the $R_{k,n}(i)$ are independent, we have The hypotheses of the principle of accompanying laws of Theorem 9.1.13 of [25] are properties (18) and (19). We can thus conclude that This means, from Relation (17), that where $W^{(1)}$ and $W^{(2)}$ are independent and identically distributed as W. The same reasoning applies in the case where n = 2ℓ.
Corollary 11. For all x ∈ ℝ and k ≥ 2, we have Proof. L. Gordon has proved in [26] that which completes the proof.
We illustrate the result of Corollary 11 using simulation experiments. We introduce the notation We plot in Figure 1 the cumulative distribution function $P\{\Theta_{k,n} - E(\Theta_{k,n}) \le x\}$ for k = 2 and k = 10 when n = $10^4$ nodes, using a sample of $10^4$ values drawn from the distribution of $\Theta_{k,n}$. We also plot in Figure 1 the corresponding limits $F_2(x)$ and $F_{10}(x)$. We observe that the limiting distribution $F_k(x)$ is very close to the simulation results, which suggests that the convergence is quite fast. Indeed, the maximal absolute difference observed between $P\{\Theta_{k,10^4} - E(\Theta_{k,10^4}) \le x\}$ and $F_k(x)$ is less than 0.006.

Conclusion
We considered the asynchronous k-pull protocol in continuous time and analyzed the total amount of time needed for all the nodes to know a rumor that a single node initially possesses. We obtained asymptotic values of both the expectation and the variance of this total amount of time. We also provided explicit limiting results concerning its distribution. All this work has been done assuming that the network is complete, which means that each node may be asked for the rumor by any other node. Clearly, this analysis does not directly apply to more general networks, and extending it to such networks is one of our objectives for future work.

Declarations
Funding: There has been no support for this work.
Conflicts of interest/Competing interests: There are neither conflicts of interest nor competing interests for this work.
Availability of data and material: Not applicable.