Deep learning of stochastic contagion dynamics on complex networks

Forecasting the evolution of contagion dynamics is still an open problem to which mechanistic models only offer a partial answer. To remain mathematically and/or computationally tractable, these models must rely on simplifying assumptions, thereby limiting the quantitative accuracy of their predictions and the complexity of the dynamics they can model. Here, we propose a complementary approach based on deep learning where the effective local mechanisms governing a dynamic are learned automatically from time series data. Our graph neural network architecture makes very few assumptions about the dynamics, and we demonstrate its accuracy using stochastic contagion dynamics of increasing complexity on static and temporal networks. By allowing simulations on arbitrary network structures, our approach makes it possible to explore the properties of the learned dynamics beyond the training data. Our results demonstrate how deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks.

Our capacity to prevent or contain outbreaks of infectious diseases is directly linked to our ability to accurately model contagion dynamics. Since the seminal work of Kermack and McKendrick almost a century ago [1], a variety of models incorporating ever more sophisticated contagion mechanisms have been proposed, studied and used [2][3][4][5]. These mechanistic models have provided invaluable insights about how infectious diseases spread, and have thereby contributed to the design of better public health policies. However, several challenges remain unresolved, which call for contributions from new modeling approaches [6][7][8].
For instance, many complex contagion processes involve the nontrivial interaction of several pathogens [9][10][11][12], and some social contagion phenomena, like the spread of misinformation, require going beyond pairwise interactions between individuals [13][14][15]. Also, while qualitatively informative, the forecasts of most mechanistic models lack quantitative accuracy. Indeed, most models are constructed from a handful of mechanisms that can hardly reproduce the intricacies of real complex contagion dynamics. One approach to these challenges is to complexify the models by adding more detailed and refined mechanisms. However, mechanistic models rapidly become intractable as new mechanisms are added. Moreover, more complex models require the specification of a larger number of parameters whose values can be difficult to infer from limited data.
On another front, the range of applications of deep learning has grown remarkably fast in recent years [16], from computer vision [17] to natural language processing [18] and time series forecasting [19]. More specifically, deep learning is now being applied to problems related to Network Science using tools like Graph Neural Networks (GNNs), which have been designed to handle arbitrarily structured data [20][21][22]. Recent work showed great promise for applications in community detection and link prediction [23,24], in the prediction of dynamical observables [25], in network reconstruction [26] and the detection of structural perturbations [27], as well as in the context of discovering new materials and drugs [28,29]. In general, GNNs have been shown to adequately handle intricate tasks, making them prime candidates to tackle several challenges of contagion dynamics modeling.
Here, we demonstrate how deep learning can be used to build effective models of stochastic contagion dynamics taking place on complex networks. Instead of constructing a model by specifying the mechanisms driving the dynamics, we consider learning these mechanisms directly from data. We start by posing the machine learning problem explicitly and propose a deep GNN architecture with a reliable protocol to train it. We demonstrate the validity of our approach using various dynamics of increasing complexity on static and temporal networks. Finally, we show how our approach can provide predictions for previously unseen network structures, therefore allowing the exploration of the properties of the learned dynamics beyond the training data.
Our approach assumes that a hidden stochastic contagion dynamics takes place on a network G = (V, E), where V is the set of nodes and E is the set of links. As this hidden dynamical process evolves over time, it generates a multivariate time series X = {X(t) : t ∈ [0, T]}, where X(t) is the global state of the system at time t. This global state X(t) = {x_i(t) : i ∈ V} consists of the state of every node i at this time, denoted x_i(t). We also denote the finite set of the possible node states by Ω, such that x_i(t) ∈ Ω. We assume that this dynamical process can be entirely defined by its local transition probabilities (LTPs), denoted p_σ^{µ→ν}. These correspond to the probability that a node of degree k in state µ at time t transitions to state ν at time t + ∆t given the states σ of its first neighbors, where µ, ν ∈ Ω and σ ∈ Ω^k. By doing so, we assume that the hidden process is stationary, Markovian and discrete both in time and in terms of its available global states. Note that generalizing our approach to non-Markovian and/or continuous-state dynamics is straightforward; we choose to limit the presentation of our approach to these more restrictive assumptions for the sake of clarity and conciseness.
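To make this setting concrete, the following is a minimal sketch of a synchronous, discrete-time simulation driven by a set of LTPs. The dictionary-based network representation and the SIS-like example LTPs (with parameters τ and γ chosen for illustration) are our own choices here, not the implementation used in the paper.

```python
import random

def step(adj, state, ltp, rng=random):
    """One synchronous update of a Markovian dynamics defined by its LTPs.

    adj   : dict node -> list of neighbours (the network G)
    state : dict node -> current state x_i(t), drawn from Omega
    ltp   : callable (mu, sigma) -> dict {nu: p} giving the transition
            distribution of a node in state mu with neighbour states sigma
    """
    new_state = {}
    for i, nbrs in adj.items():
        sigma = tuple(state[j] for j in nbrs)
        dist = ltp(state[i], sigma)
        # sample the next state nu with probability p_sigma^{mu -> nu}
        new_state[i] = rng.choices(list(dist), weights=list(dist.values()))[0]
    return new_state

def sis_ltp(mu, sigma, tau=0.04, gamma=0.08):
    """Illustrative SIS-like LTPs: transmission tau, recovery gamma."""
    ell = sum(s == 'I' for s in sigma)      # number of infected neighbours
    if mu == 'S':
        p = 1.0 - (1.0 - tau) ** ell        # at least one transmission
        return {'I': p, 'S': 1.0 - p}
    return {'S': gamma, 'I': 1.0 - gamma}

# Generate a short time series X on a triangle graph.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
state = {0: 'I', 1: 'S', 2: 'S'}
X = [state]
for _ in range(50):
    state = step(adj, state, sis_ltp)
    X.append(state)
```

Any dynamics satisfying the stationarity and Markov assumptions above can be plugged in by swapping the `ltp` callable.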
We consider a trainable GNN model composed of a set of free parameters Θ that takes as inputs the current global state of the system, X(t), and the structure of the network at time t (e.g. its adjacency list), and outputs the predicted LTPs, denoted p̂_σ^{µ→ν}(Θ). Our objective is then to learn a set of parameters Θ* such that

    p̂_σ^{µ→ν}(Θ*) ≈ p_σ^{µ→ν}    (1)

for all transitions µ, ν ∈ Ω and all possible neighborhoods σ. Note that there exist other techniques that can also perform this task. For instance, Bayesian approaches have been used for parameter inference of simple and complex contagion under the assumption of a predefined model [30,31]. In general, these approaches work well on synthetic data generated by the predefined model, in some cases even without the need for likelihood computations [31]. However, these simple models are only coarse representations of real systems. A more general approach is then to infer the complete Markov chains directly [32], where each LTP is considered as a parameter of the model. We adopt a similar strategy in that we use GNNs to parametrize these Markov chains with an underlying network structure. With their easily adaptable architecture, GNNs have the advantage of combining their parameters hierarchically to compute the LTPs, as well as of being universal estimators [33,34], meaning that they can in principle estimate any set of LTPs. Furthermore, their parameter estimation is scalable and efficiently performed with stochastic gradient descent [35,36]. In practice, the objective described by Eq. (1) must be encoded into a loss function, denoted L(Θ). An appropriate choice is

    L(Θ) = (1/Z) Σ_{k∈K} Σ_{µ∈Ω} Σ_{σ∈Ω^k} [ −Σ_{ν∈Ω} p_σ^{µ→ν} ln p̂_σ^{µ→ν}(Θ) ] ,    (2)

where K is the set of degree classes, Z is the total number of possible input configurations (µ, σ), and the term between brackets corresponds to the cross entropy between the LTPs of the underlying process (the "ground truth") and the LTPs predicted by the GNN, p̂_σ^{µ→ν}.
Equation (2) quantifies the information difference [37] between the original LTPs and the ones predicted by the GNN, averaged over a uniform distribution of all possible inputs (µ, σ).
Two obstacles prevent the exact evaluation of Eq. (2) in realistic settings. First, the LTPs p_σ^{µ→ν} that we want the model to predict (the "ground truth") are a priori unknown. Second, finite datasets only explore a finite fraction of the configuration space. We propose to resolve both of these issues by approximating Eq. (2) from the dataset X directly. To do so, we consider the following approximate loss

    L̂(Θ) = −Σ_{t∈T'} Σ_{i∈V'} ω_i(t) ln p̂_{x_{N_i}(t)}^{x_i(t)→x_i(t+∆t)}(Θ) ,    (3)

where ω_i(t) is a weight assigned to node i at time t, x_{N_i}(t) is the state of the neighbors of node i at time t, and where V' ⊆ V and T' ⊆ [0, T] are subsets of the training dataset. Note that the complements of these subsets can also be used to validate the model during training (see Supplementary Material).
Two approximations are at play in Eq. (3). First, the first three sums of Eq. (2) are approximated using a sampling scheme where the weights ω_i(t) rebalance the importance of "overrepresented" inputs, i.e. inputs (µ, σ) whose frequency of occurrence in X, noted ρ(µ, σ), is high. Choosing ω_i(t) ∝ ρ(x_i(t), x_{N_i}(t))^{−λ} effectively evaluates Eq. (3) with a relaxed version of importance sampling [39], which reduces the importance of frequent inputs and increases that of the rare ones. This procedure is identical to standard importance sampling when λ = 1. Second, the average over ν is replaced by a Monte Carlo approximation since the "ground truth" LTPs, p_σ^{µ→ν}, are a priori unknown. Note that, for any specific input (µ, σ), this approximation converges to its expected value only if the corresponding number of transitions found in the subsets of the training dataset is sufficiently large. This second approximation will therefore necessarily be poor for rare inputs, and we set λ < 1 to limit their influence on the quality of the training. To isolate the effect of this second approximation, we consider a semi-exact training scheme where only the first three sums of Eq. (2) are approximated using the importance sampling scheme, and the sum over ν is computed using the "ground truth" LTPs, identically to Eq. (2).

[Fig. 1 caption: Lines and markers were obtained by averaging the LTPs over every σ corresponding to a same value of ℓ. The standard deviation around these average values is shown using a colored area around the lines (typically narrower than the width of the lines) and using vertical bars for the markers. The training datasets were generated using a BA network composed of |V| = 1000 nodes and with average degree ⟨k⟩ = 4, and we used the parameters (τ, γ) = (0.04, 0.08) and (η, γ) = (8, 0.06) for the simple and complex contagion dynamics, respectively. All technical details related to training are provided in Supplementary Material.]
The models trained under the semi-exact scheme, denoted GNN * , represent a best case scenario of the models trained using Eq. (3), and will therefore allow us to assess the impact of the dataset quality on the accuracy of the predictions.
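The reweighting scheme described above can be sketched as follows. The sample format and the helper name `weighted_nll` are hypothetical; only the proportionality ω ∝ ρ^(−λ) and the role of λ come from the text.

```python
import math
from collections import Counter

def weighted_nll(samples, p_hat, lam=0.5):
    """Relaxed importance-sampling loss in the spirit of Eq. (3).

    samples : list of observed transitions (mu, sigma, nu), with sigma a
              hashable summary of the neighbours' states
    p_hat   : callable (mu, sigma, nu) -> predicted LTP
    lam     : relaxation exponent; lam = 1 recovers standard importance
              sampling, lam < 1 dampens the boost given to rare inputs
    """
    # rho(mu, sigma): empirical frequency of each input configuration
    rho = Counter((mu, sigma) for mu, sigma, _ in samples)
    weights = [rho[(mu, sigma)] ** (-lam) for mu, sigma, _ in samples]
    norm = sum(weights)  # normalize so the weights sum to one
    return -sum(w * math.log(p_hat(mu, sigma, nu))
                for w, (mu, sigma, nu) in zip(weights, samples)) / norm

# Toy check: a predictor that is uniform over two states gives a loss of ln 2.
samples = [('S', ('I',), 'I'), ('S', ('I',), 'S'), ('I', ('S',), 'S')]
loss = weighted_nll(samples, lambda mu, sigma, nu: 0.5)
```

In a real training loop, `p_hat` would be the GNN output and the loss would be minimized by stochastic gradient descent.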
We propose the GNN architecture detailed in Supplementary Material. In a nutshell, the GNN is a nonlinear function f(x_i(t), x_{N_i}(t)) that receives the state of a node i and the states of its neighbors N_i, and then returns the LTPs providing the probabilities for node i to be in each of the available states at t + ∆t,

    p̂_i(t + ∆t) = f(x_i(t), x_{N_i}(t)) .    (4)

This function is identical for all nodes and, in practice, the GNN is applied in parallel to every node. To allow the GNN to be applicable to any neighborhood size, the states of the neighbors are aggregated with an attention mechanism inspired by [40].

[Fig. 2 caption: The dashed lines correspond to the GNN prediction error, i.e. the average Jensen-Shannon distance (JSD) between the LTPs of the dynamics used to generate the training datasets (labeled GT for "ground truth") and the predictions of the GNNs trained with these datasets. Similarly, the dotted lines correspond to the JSD error of GNNs trained on the same datasets but using a semi-exact training scheme (denoted GNN*). For comparison, the solid lines correspond to the JSD error of an uninformed baseline in which all local transitions are equiprobable, i.e. p_σ^{µ→ν} = |Ω|^{−1} where |Ω| is the number of node states. The prediction error of the MLEs is shown with markers to clearly illustrate where data is lacking in the training datasets. Lines and markers were obtained by averaging the JSD errors over every σ corresponding to a same value of k. The standard deviation around these average values is shown with a colored area around the lines and with vertical bars for the markers. The training datasets were generated using an ER or a BA network composed of |V| = 1000 nodes and with average degree ⟨k⟩ = 4. We used the same parameters as in Fig. 1 for the simple and complex contagion dynamics, and used (τ1, τ2, γ1, γ2, ζ) = (0.02, 0.01, 0.12, 0.13, 10) for the interacting contagion dynamics. All technical details related to training are provided in Supplementary Material.]
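A minimal NumPy sketch of such an attention-based neighborhood aggregation follows. The layer sizes, the tanh scoring function and the weight names are illustrative assumptions loosely in the spirit of graph attention networks; the exact architecture is given in the Supplementary Material.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                     # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attention_aggregate(h_i, h_nbrs, W, a):
    """Aggregate neighbour features with attention coefficients.

    h_i    : (d,) feature vector of node i
    h_nbrs : (k, d) feature vectors of its k neighbours
    W      : (d, dp) shared linear projection
    a      : (2*dp,) attention vector scoring each (i, j) pair
    """
    zi = h_i @ W                        # project node i
    zj = h_nbrs @ W                     # project each neighbour j
    scores = np.tanh(np.concatenate(
        [np.tile(zi, (len(zj), 1)), zj], axis=1)) @ a
    alpha = softmax(scores)             # one weight per neighbour, sums to 1
    return alpha @ zj                   # weighted sum, independent of k

d, dp, k = 4, 8, 5
W = rng.normal(size=(d, dp))
a = rng.normal(size=2 * dp)
h_i = rng.normal(size=d)
h_nbrs = rng.normal(size=(k, d))
out = attention_aggregate(h_i, h_nbrs, W, a)   # shape (dp,), for any k
```

Because the output is a weighted sum over neighbours, the aggregation is permutation invariant and applies unchanged to any neighborhood size, which is what makes the architecture usable on arbitrary networks.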
A notable advantage of this GNN architecture is its inductive nature: it learns how to combine the states of the neighbors locally, i.e. through the edges of the network. It is therefore independent of the global network structure, and can consequently be used on any network structure once it has been trained. We illustrate our approach by applying it to three types of stochastic contagion dynamics whose behaviors heavily depend on the structure of the underlying network [3,12,41]. We first consider a simple contagion dynamics: the discrete-time susceptible-infected-susceptible (SIS) dynamics, in which nodes are either susceptible (S) or infected (I). At each time step, infected nodes transmit the disease to each of their susceptible neighbors with probability τ, and recover from the disease with probability γ, thus becoming susceptible again. The LTPs of this dynamics are

    p_σ^{S→I} = 1 − (1 − τ)^ℓ ,    (5a)
    p_σ^{I→S} = γ .    (5b)

Note that the state of the neighbors of a node, σ, is fully specified by the number of infected neighbors, ℓ. We stress that the GNN is not a priori designed to compute the LTPs as a function of ℓ, but rather computes them as a function of the complete state of the neighbors, σ. The GNN must therefore learn that the number of infected neighbors is sufficient to compute the LTPs. Equation (5a) assumes that disease transmission events are independent. In other words, the probability for a susceptible node to be infected by any of its infectious neighbors does not depend on the state of its other neighbors. The second dynamical process we consider lifts this assumption by replacing Eq. (5a) with the "Planck-like" nonmonotonic infection probability

    p_σ^{S→I} = ℓ^3 / [ z(η) (e^{ℓ/η} − 1) ] ,    (6)

where z(η) is fixed such that p_σ^{S→I} equals 1 at its maximum, the position of this maximum being controlled by the parameter η > 0. We refer to this second dynamics as complex [43] since transmission is now a nonlinear process that depends on the state of the other neighbors of a node. While unrealistic in the context of disease spreading, Eq. (6) has an interesting interpretation in the context of the propagation of a social behavior. The probability that an individual adopts a new behavior increases monotonically with the number of friends who have adopted it while the behavior appears new or scarce (low ℓ). However, the same individual will become reluctant to adopt this behavior once it has become mainstream and popular (high ℓ), resulting in a monotonically decreasing probability of adoption.
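The nonmonotonic shape of this complex contagion rule can be sketched numerically. The code below assumes the Planck-like form p(ℓ) ∝ ℓ³/(e^{ℓ/η} − 1), with z(η) fixed numerically so that the maximum equals 1; the finite normalization range for ℓ is an illustrative shortcut.

```python
import math

def planck_infection(ell, eta, ell_max=100):
    """Planck-like nonmonotonic infection probability (illustrative sketch):
    p(ell) proportional to ell^3 / (exp(ell/eta) - 1), with the constant
    z(eta) fixed so that the maximum over ell in [0, ell_max] equals 1."""
    def raw(l):
        # The l = 0 case (no infected neighbours) gives probability 0.
        return 0.0 if l == 0 else l ** 3 / (math.exp(l / eta) - 1.0)
    z = max(raw(l) for l in range(ell_max + 1))   # numerical z(eta)
    return raw(ell) / z

eta = 8                                           # parameter used in the text
p = [planck_infection(l, eta) for l in range(31)]
# Non-monotonic: p rises for small ell, peaks near ell ~ 2.8 * eta, then decays.
```

Sweeping ℓ makes the two regimes visible: adoption becomes more likely as the behavior spreads among a few neighbors, then less likely once it is widespread.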
The third contagion dynamics we consider is an extension of the SIS dynamics to two interacting infectious diseases, in which nodes can be in four different states: susceptible to both diseases (S1S2), infected by one disease and susceptible to the other (I1S2 or S1I2), or infected by both diseases (I1I2). As in the SIS dynamics, nodes infected by disease g transmit it to a susceptible neighbor with probability τ_g and recover from g with probability γ_g. The interaction between the two diseases affects the probability of transmission whenever either node, the infected or the susceptible one, is already infected by the other disease. A node infected by disease g will then infect its neighbor susceptible to g with probability ζτ_g, where ζ controls the strength of the interaction between the two diseases. This interacting contagion dynamics is encoded in 12 LTPs, similar to Eqs. (5), whose expressions are given in Supplementary Material.

In Fig. 1, we show the LTPs predicted by the GNN and compare them with Eqs. (5)-(6) as well as with the maximum likelihood estimators (MLEs) of p_σ^{µ→ν}. The MLEs are computed from the time series and correspond to the frequency at which nodes in state µ with neighborhood σ transition to ν at the next time step. We find that the GNN fits the LTPs of the simple and complex contagion dynamics remarkably well. In fact, we found the predictions of the GNN to be systematically smoother than the ones provided by the MLEs. This is because the MLEs estimate the LTPs of each individual pair (µ, σ) from disjoint subsets of the training dataset. This implies that a large number of samples of each pair (µ, σ) is needed for the MLEs to provide accurate estimates of every LTP; a condition rarely met in realistic settings, especially for high degree nodes. This also means that the MLEs can neither interpolate in between nor extrapolate beyond the pairs (µ, σ) present in the training dataset.
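The MLE baseline described above amounts to a simple frequency count over observed transitions; a minimal sketch follows. Summarizing a neighborhood σ by the sorted tuple of neighbor states is an illustrative choice of ours, not necessarily the paper's bookkeeping.

```python
from collections import Counter, defaultdict

def mle_ltps(series, adj):
    """Frequency-based (maximum likelihood) estimates of the LTPs.

    series : list of snapshots, series[t][i] = state of node i at time t
    adj    : dict node -> list of neighbours
    Returns a dict mapping each observed input (mu, sigma) to the estimated
    transition distribution {nu: p}, where sigma is summarised as the sorted
    tuple of neighbour states.
    """
    counts = defaultdict(Counter)
    for t in range(len(series) - 1):
        for i, nbrs in adj.items():
            mu = series[t][i]
            sigma = tuple(sorted(series[t][j] for j in nbrs))
            counts[(mu, sigma)][series[t + 1][i]] += 1
    return {inp: {nu: n / sum(c.values()) for nu, n in c.items()}
            for inp, c in counts.items()}

# Tiny example: two connected nodes observed for three time steps.
adj = {0: [1], 1: [0]}
series = [{0: 'S', 1: 'I'}, {0: 'I', 1: 'I'}, {0: 'I', 1: 'S'}]
ltps = mle_ltps(series, adj)
```

Because each pair (µ, σ) is estimated from its own disjoint subset of samples, the estimator says nothing about pairs never observed, which is exactly the limitation discussed above.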
In contrast, the GNN works in a conceptually different manner: it is differentiable by construction, and all of its parameters are involved hierarchically in the estimation of the LTPs. Its predictions are therefore smoother, more consistent, and able to interpolate or extrapolate beyond the training dataset. It also means that the GNN benefits from every sample to improve its predictions for all LTPs. Figure 2 addresses the accuracy of the predictions of the GNN for unseen local network structures, i.e. predictions of LTPs corresponding to pairs (µ, σ) that are not necessarily present in the training dataset. These predictions were extracted from the GNN by applying it to a star graph of k + 1 nodes (k nodes of degree 1 connected to a central node of degree k). Because σ is typically multivariate, and thus cannot be summarized by a single scalar as for the simple and complex contagion dynamics presented in Fig. 1 (it is three-dimensional for the interacting contagion dynamics), we use the Jensen-Shannon distance (JSD) [37] to quantify the similarity between the LTPs predicted by the GNN and the "ground truth". Figure 2 shows averages of the JSD, with standard deviations, over all possible neighborhoods σ and states µ for various values of k. Note that the JSD is similar to the loss function of Eq. (2), but it has the notable advantages of being symmetric, of being more forgiving when an LTP is wrongly predicted to be equal to zero (something that happens with MLEs), and of allowing meaningful comparisons between more than two distributions at the same time (by virtue of being a metric). Figure 2 confirms that the GNN generally provides more accurate predictions than the MLEs. This is especially true in the case of the interacting contagion on BA networks, where the MLEs are only marginally different from the uninformed baseline for high degree nodes.
This is a consequence of how scarce the inputs are for this dynamics compared to both the simple and complex contagion dynamics for training datasets of the same size, and of how fast the size of the set of possible inputs scales, thereby quickly rendering MLEs completely ineffective for small training datasets. Figure 2(F) therefore provides a telling illustration of the challenge of inferring the parameters of slightly complex contagion models from limited data. The GNN, on the other hand, is less affected by the scarcity of the data, since any sample improves the predictions for all LTPs, as discussed above. In fact, the GNN accuracy is often close to the best case scenario, illustrated by the GNN * , trained under similar conditions but with a semi-exact training scheme. Figure 2 also suggests that the underlying network structure of the training dataset (mainly its degree distribution) plays a crucial role in the accuracy of the GNN's predictions for unseen local structures. Indeed, our results show that the GNN provides more accurate predictions when interpolating in between observed values of k than when extrapolating for values of k higher than those present in the training dataset. In other words, it is easier for the GNN to interpolate in-between "landmarks" provided by the dataset than to extrapolate beyond the training dataset without any bearings whatsoever. Heterogeneous networks, like Barabási-Albert networks, are therefore expected to yield accurate predictions over a wider range of local structures (µ, σ) than homogeneous networks, even if the training datasets are similar in size. This is also supported by the fact that the best case scenario in terms of GNN accuracy (GNN * ) is significantly poorer for homogeneous networks [Figs. 2(A-C)] than for heterogeneous networks [Figs. 2(D-F)]. 
The amount of valuable information contained in a given dataset could therefore be interpreted through the lens of the properties of the training dataset's underlying network structure. Altogether, these findings suggest a wide applicability of our approach for real complex systems, whose underlying network structures recurrently exhibit a heterogeneous degree distribution [47].
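For reference, the Jensen-Shannon distance between two discrete distributions is the square root of the Jensen-Shannon divergence; computing it with base-2 logarithms (an assumption made here) bounds it in [0, 1].

```python
import math

def jsd(p, q):
    """Jensen-Shannon distance between discrete distributions p and q:
    the square root of the JS divergence, with base-2 logarithms so the
    result lies in [0, 1]. Unlike cross entropy, it is symmetric and finite
    even when one distribution assigns zero mass to an outcome."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    def kl(a, b):  # Kullback-Leibler divergence, skipping zero-mass terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Identical distributions are at distance 0; disjoint ones at distance 1.
d0 = jsd([0.5, 0.5], [0.5, 0.5])
d1 = jsd([1.0, 0.0], [0.0, 1.0])
```

The symmetry and boundedness are exactly the properties that make the JSD a convenient error measure between predicted and "ground truth" LTPs.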
To further illustrate the inductive nature of our GNN architecture, we use it to recover the bifurcation diagram of the three contagion dynamics and to project their behaviors on real-world networks. In the infinite-size limit |V| → ∞, these dynamics have two possible steady states: the absorbing state, where all nodes are susceptible, and the endemic state, in which a fraction of nodes remains infected over time [3,12,41]. These steady states exchange stability during a phase transition which is continuous for the simple contagion dynamics and discontinuous for the complex and interacting contagion dynamics. The position of the phase transition depends both on the parameters of the dynamics and on the topology of the network on which they take place. Note that for the complex and interacting contagion dynamics, the stability of the absorbing and endemic states does not change at the same point, giving rise to a bistable regime where both states are stable. Figure 3 shows the bifurcation diagrams obtained by analyzing a mean-field framework that uses the LTPs extracted from the GNN, as well as those obtained by performing numerical simulations with these same LTPs. In particular, the mean-field perspective demonstrates the possibility of developing an approximate representation of the GNN predictions that, although not as accurate as the predictions obtained by numerical simulations, is amenable to mathematical analysis (i.e. position of fixed points, their stability and the critical thresholds of the phase transitions). The first row of Fig. 3 shows that the GNN correctly predicts whether the phase transition is continuous or discontinuous, as well as the existence of a bistable regime. Quantitatively, the predictions are also strikingly accurate (essentially perfect for the simple contagion dynamics), which is remarkable given that the bifurcation diagrams were obtained on networks the GNN had never seen before. The second row of Fig. 3 shows the same bifurcation diagrams but using GNN* models trained under the semi-exact training scheme. The almost perfect overlap of the curves and of their thresholds indicates that there is virtually no intrinsic limit to the accuracy of the GNN's predictions, and highlights once more the critical influence of the information contained in the training dataset. In other words, our GNN architecture is as accurate as its training data allows it to be.
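As an illustration of the kind of mean-field analysis mentioned above, here is a homogeneous (degree-blind) mean-field iteration for the simple SIS dynamics. It is a simplified stand-in for the framework used in the paper, with illustrative parameters; the continuous transition emerges at τ_c = γ/⟨k⟩ in this approximation.

```python
def sis_mf_steady_state(tau, gamma, k, steps=10000, x=0.5):
    """Iterate the homogeneous mean-field map for discrete-time SIS:
    a node is infected at t + dt if it was susceptible and was infected by
    at least one of its ~k neighbours (each infected w.p. x), or if it was
    infected and did not recover."""
    for _ in range(steps):
        x = (1.0 - x) * (1.0 - (1.0 - tau * x) ** k) + x * (1.0 - gamma)
    return x

# Sweep the transmission probability to trace a coarse bifurcation diagram.
gamma, k = 0.08, 4                       # recovery probability, mean degree
taus = [0.005, 0.01, 0.02, 0.04, 0.08]
diagram = {tau: sis_mf_steady_state(tau, gamma, k) for tau in taus}
# Continuous transition near tau_c = gamma / k = 0.02: below it, only the
# absorbing state x* = 0 is stable; above it, an endemic state x* > 0 appears.
```

Replacing the closed-form LTPs in the map with the LTPs extracted from a trained GNN yields the kind of learned mean-field bifurcation diagram described in the text.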
Finally, similar levels of accuracy are obtained when simulating the three contagion dynamics on static [ Fig. 4(A)] and temporal [ Figs. 4(B-D)] real-world complex networks. Recall that the GNN used to produce these results is the same as the one used in Fig. 3, and was trained on a synthetic static network of a different size. The projections obtained from the GNN remain in the same vicinity as those generated by the "ground truth" LTPs throughout the timespan of the dataset, and show similar patterns of population infections/recoveries when the configuration of the network changes.
In summary, we introduced a data-driven approach that learns the effective mechanisms governing the propagation of contagion dynamics on complex networks. We proposed a reliable training protocol, and we validated the projections of our GNN architecture on simple, complex and interacting contagion dynamics using synthetic as well as real-world static and temporal complex networks. Interestingly, we found that our approach performs better when trained on data whose underlying network structure is heterogeneous, which could prove useful in real-world applications of our method given the ubiquity of scale-free networks [47].
By recovering the bifurcation diagram of various dynamics, we illustrated how our approach can leverage time series from an unknown dynamical process to gain insights about its properties, such as the existence of a phase transition and its order. Most importantly, we showed how to explicitly extract the LTPs inferred by the GNN architecture, which in turn can help uncover the underlying mechanisms governing the dynamics and help build effective mechanistic models. In a way, we see this approach as the equivalent of a numerical Petri dish, offering a new way to experiment and gain insights about an unknown contagion dynamics that is complementary to traditional mechanistic modeling.
Although we focused the presentation of our method on contagion dynamics, its potential applicability reaches many other realms of complex systems modeling where intricate mechanisms are at play. We believe this work establishes solid foundations for the use of deep learning in the design of more effective and realistic models of complex systems.
We are grateful to Emily Cyr and Guillaume St-Onge for helpful comments and to François Thibault, Patrick Desrosiers, Louis J. Dubé, Simon Hardy and Laurent Hébert-Dufresne for fruitful discussions. This work was supported by the Sentinelle Nord initiative from the Canada First Research Excellence Fund (CM, EL, AA), the Conseil de recherches en sciences naturelles et en génie du Canada (CM, AA) and the Fonds de recherche du Québec-Nature et technologie (EL).