Model-based clustering of multiple networks with a hierarchical algorithm

The paper tackles the problem of clustering multiple networks, directed or not, that do not share the same set of vertices, into groups of networks with similar topology. A statistical model-based approach relying on a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood criterion. This is done by a hierarchical agglomerative algorithm that starts from singleton clusters and successively merges clusters of networks. As such, a sequence of nested clusterings is computed, which can be represented by a dendrogram providing valuable insights into the collection of networks. Using a Bayesian framework, model selection is performed automatically, since the algorithm stops when the best number of clusters is attained. When carefully implemented, the algorithm is computationally efficient. The aggregation of clusters requires a means to overcome the label-switching problem of the stochastic block model and to match the block labels of the networks. To address this problem, a new tool is proposed, based on a comparison of the graphons of the associated stochastic block models. The clustering approach is assessed on synthetic data. An application to a set of ecological networks illustrates the interpretability of the obtained results.


Introduction
Networks are key objects for describing interactions between individuals or entities in complex systems. Today, entire collections of networks emerge in more and more fields of application. To list a few examples, in social sciences, face-to-face contacts among individuals at different time periods are represented as a set of behavioral networks (Isella et al., 2011). In medical research, a brain connectome is a network describing a patient's brain activity (Donnat and Holmes, 2018). In biology, metabolic networks for hundreds of different bacteria are available (Weber-Zendrera et al., 2021). In ecology, foodwebs represent the interactions of species in different ecosystems (Poisot et al., 2016).
When analyzing multiple networks, most questions are related to graph comparison. We may wish to quantify the (dis)similarity between networks, detect outliers, or track some temporal evolution of networks. In general, it is informative to reduce the dimension of the data by finding groups of networks sharing similar characteristics. For instance, we may want to automatically group together patients with the same brain state, identify bacteria with roughly the same metabolism, or, in the context of climate change, find ecological networks with similar overall organization. The focus of this work is on clustering networks that may not share the same set of vertices and may vary in size, and we seek a method that partitions the networks according to their topology.

Graph comparison
The clustering task requires some notion of graph similarity. However, networks have a complex structure, so graph comparison is not trivial, and similarity measures or graph distances can be defined in many ways.
A widespread approach is based on graph embeddings. A graph embedding is a low-dimensional vector representation of a network encoding structural information about the network. Traditional embeddings are hand-crafted and composed of local or global network summary statistics such as the edge density, node degrees and clustering coefficients. Graph similarity is then defined via the distance between the embedding vectors, and graph clustering is easily performed using off-the-shelf machine learning algorithms such as k-means or spectral clustering. Clearly, the clustering result heavily depends on the chosen embedding.
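As a minimal illustration of this embedding route (a sketch of the generic approach, not the method proposed in this paper), the following Python snippet maps each directed adjacency matrix to a small hand-crafted feature vector and clusters the vectors with a basic k-means; the chosen statistics (edge density, degree dispersion, reciprocity) are illustrative assumptions.

```python
import numpy as np

def embed(A):
    """Hand-crafted embedding of a directed graph: edge density,
    out-degree dispersion and reciprocity (illustrative choices)."""
    n = A.shape[0]
    m = A.sum()
    density = m / (n * (n - 1))
    degree_sd = A.sum(axis=1).std() / (n - 1)
    reciprocity = (A * A.T).sum() / max(m, 1)
    return np.array([density, degree_sd, reciprocity])

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

Networks whose summary statistics differ markedly then land in different clusters, but the result is entirely driven by the chosen embedding, which is precisely the drawback noted above.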
An alternative to graph embeddings are model-based approaches. Here a statistical model is introduced and networks forming a cluster are assumed to be generated independently from a common probabilistic model. To put it differently, data are modeled by a finite mixture of random graph models, and mixture components correspond to clusters of networks. The problem of graph comparison is thus recast as a problem of estimating and comparing the probabilistic models that generated the observed networks (Stanley et al., 2016; Sabanayagam et al., 2022).
A major advantage of model-based approaches over graph embeddings is the possibility to quantify the uncertainty of the results. For instance, one may compute the posterior probability for a network to belong to a given cluster, or compare the likelihood of two clusterings. Furthermore, they provide a natural framework for model selection, that is, the automated choice of the best number of clusters.
For graph comparison, two general settings must be distinguished. In the first case, all networks are defined on the same set of vertices, as for networks with a temporal dynamic or connectomes, where a vertex always refers to the same brain region. Then graph distances can be based on local features comparing the structures of node neighborhoods. In the second case, the sets of vertices are completely different from one network to another, without any correspondence among the nodes of the different networks. This is the setting we are interested in. The mangal database (Poisot et al., 2016), for example, provides hundreds of foodwebs from all over the globe, where each foodweb describes an ecosystem coming with its own set of species. To compare such networks, local features are useless, and only the overall topology of the networks is meaningful.

Mixture models for sets of networks
Using finite mixtures to perform clustering has a long-standing tradition (Titterington et al., 1985; McLachlan and Peel, 2000), but only recently has this approach been explored for graph clustering. To define a mixture model, a random graph model for the mixture components has to be chosen. For networks with constant node sets, the stochastic block model and generalized linear models may be used (Stanley et al., 2016; Signorelli and Wit, 2019), or extensions of measurement error models, where networks are considered to be perturbations of some ground-truth graph (Mantziou et al., 2021; Young et al., 2022). Mukherjee et al. (2017) and Sabanayagam et al. (2022) propose nonparametric models, where the distribution of the mixture components is estimated by a graphon estimate. Shortcomings of the latter approach include the restriction to undirected graphs and the lack of interpretability, since analyzing graphons is not convenient.
In this paper, a new mixture model is proposed. As we desire an interpretable model, we choose the popular stochastic block model (SBM) (Nowicki and Snijders, 2001) for the mixture components. The SBM is a highly flexible model, which accommodates a large variety of heterogeneous graph topologies as often encountered in applications. A further advantage is the interpretability of the parameters of an SBM. Many model variants exist (see Matias and Robin (2014) for a review), which underlines the relevance of the SBM. In particular, extensions of the SBM to collections of networks include repeated measurements of a ground-truth SBM network (Le et al., 2018), a mixture of SBMs with fixed nodes (Stanley et al., 2016), and networks generated by SBMs with varying parameters (Chabert-Liddell et al., 2022).
The SBM is a discrete latent variable model, and parameter estimation is challenging due to its involved dependence structure. Several inference algorithms have been proposed, such as variational EM algorithms (Daudin et al., 2008), MCMC methods (Nowicki and Snijders, 2001; Peixoto, 2014), a pseudo-likelihood approach (Amini et al., 2013), a Bayesian approach based on the integrated classification likelihood (ICL) (Côme and Latouche, 2015), spectral clustering (Rohe et al., 2011) and, more recently, a variational autoencoder using neural networks (Mehta et al., 2019). None of them is perfect: some are time-consuming and do not scale to large networks, while others are fast but provide unstable results.

Graph clustering algorithms
A simple clustering approach is based on graph distances. That is, one computes a similarity matrix from the pairwise comparison of the networks, and a clustering is then derived via spectral clustering (Mukherjee et al., 2017; Sabanayagam et al., 2022). This approach does not account for the uncertainty of estimates and lacks a natural model selection device.
In a mixture model, the clustering task becomes an inference problem, since cluster labels correspond to latent variables of the model. In general model-based clustering, EM-type algorithms (McLachlan and Krishnan, 2008), MCMC (Liu, 2008) and hierarchical agglomerative algorithms (Fraley and Raftery, 2002) are traditionally used to jointly infer cluster labels and model parameters. In the case of graph clustering, for mixtures of networks with a constant node set, EM algorithms have been developed (Stanley et al., 2016; Signorelli and Wit, 2019) as well as Gibbs samplers (Mantziou et al., 2021; Young et al., 2022). They all have the disadvantage that the number of clusters must be set by the user.
In the present work, we explore the development of a hierarchical agglomerative algorithm. Starting from an oversegmented clustering with singleton clusters, clusters are successively merged into larger clusters while optimizing some criterion. Interestingly, the algorithm provides a whole cluster hierarchy that can be visualized by a dendrogram, and intermediate clusterings are easily inspected. If the criterion includes a penalization of the number of clusters, the algorithm automatically stops when any further merge of groups results in a deterioration of the objective. Thus, model selection is performed automatically. Such penalized criteria are naturally obtained by using Bayes factors (Robert, 2007).
For our mixture model of SBMs, we follow the line of research initiated by Côme and Latouche (2015), which consists in choosing the integrated classification likelihood (ICL) as the objective of the hierarchical agglomerative algorithm. We show that the algorithm can be implemented efficiently and assess its performance by numerical experiments.

Block-label matching
In our algorithm an interesting issue is encountered during the aggregation of two clusters.
Indeed, merging clusters amounts to combining the corresponding SBMs. However, due to the label-switching problem in the SBM, this is not simple. Using the graphon functions (Lovász and Szegedy, 2006) of the SBMs, we propose a new tool to match block labels in a computationally efficient way. This tool should also be of interest beyond our clustering algorithm, whenever two SBMs are compared and the problem of label-switching occurs.

Contributions
The contributions of the paper are as follows.
• A finite mixture model of SBMs is introduced for collections of networks (Section 2).
• A hierarchical agglomerative algorithm to cluster networks and estimate model parameters is developed (Section 3 and 4).
• We propose a new tool to match block labels of two SBMs (Section 5).
• A numerical study assesses the performance of the algorithm and illustrates its utility on a collection of foodwebs (Section 6).
• Details for an efficient implementation of the algorithm are provided in the Appendix.

Mixture of stochastic block models
In this section we first recall the definition of the classical SBM for a single network. Then we introduce the mixture of SBMs for a collection of networks without vertex correspondence. Throughout the paper we consider directed binary networks without self-loops, but extensions to other types of networks are straightforward.

Stochastic block model for a single network
Consider a network with n vertices. Denote (π, γ) the parameters of an SBM with K blocks, where π = (π_1, ..., π_K) ∈ (0, 1)^K are the block proportions with Σ_{k=1}^K π_k = 1, and γ = (γ_{k,l})_{k,l} ∈ (0, 1)^{K×K} is the connectivity matrix. Let Z = (Z_1, ..., Z_n) ∈ {1, ..., K}^n be a vector of independent discrete latent variables for the nodes, with P(Z_i = k) = π_k for all k ∈ {1, ..., K} and i ∈ {1, ..., n}. Conditionally on the node labels Z, the entries of the adjacency matrix A = (A_{i,j})_{i≠j} are independent and drawn as

A_{i,j} | {Z_i = k, Z_j = l} ∼ B(γ_{k,l}),

where B(γ) is the Bernoulli distribution with parameter γ. Denote SBM_n(π, γ) the distribution of A.
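To make the generative mechanism concrete, here is a short Python sketch (the function name and interface are ours, not from the paper) that samples a directed binary network without self-loops from SBM_n(π, γ):

```python
import numpy as np

def sample_sbm(n, pi, gamma, seed=None):
    """Sample (A, Z) from a directed SBM without self-loops:
    Z_i ~ pi, then A_ij | Z ~ Bernoulli(gamma[Z_i, Z_j]) for i != j."""
    rng = np.random.default_rng(seed)
    pi, gamma = np.asarray(pi), np.asarray(gamma)
    Z = rng.choice(len(pi), size=n, p=pi)      # latent block labels
    P = gamma[Z][:, Z]                         # P[i, j] = gamma[Z_i, Z_j]
    A = (rng.random((n, n)) < P).astype(int)   # independent Bernoulli edges
    np.fill_diagonal(A, 0)                     # no self-loops
    return A, Z
```

With a strongly assortative connectivity matrix, the sampled adjacency matrix shows dense diagonal blocks once rows and columns are sorted by Z.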

Mixture of SBMs for a collection of networks
Now we consider a collection of M networks A = (A^(1), ..., A^(M)) modeled by a finite mixture model, where each mixture component is an SBM. That is, networks belonging to the same cluster are independent realizations of the same SBM. Here A^(m) denotes the adjacency matrix of the m-th network. Networks may have different numbers n^(m) of vertices, and no correspondence among the nodes is assumed. We introduce independent discrete latent variables U = (U^(1), ..., U^(M)) for the networks, where U^(m) = c indicates that the m-th network belongs to cluster c ∈ {1, ..., C}, and each cluster c has its own SBM. The associated numbers of blocks, say K_c, are not constrained to be equal. We assume that all networks in cluster c are independent realizations of the SBM with parameter (π^(c), γ^(c)). Denote θ the collection of all parameters of the mixture model, and note that θ is identifiable only up to label switching. That is, switching cluster labels always results in the same probability distribution of A. In addition, in every SBM, the node labels are also identifiable only up to label switching. We adapt the notation of the node labels by adding the superscript (m), that is, Z^(m) = (Z^(m)_1, ..., Z^(m)_{n^(m)}) denotes the node labels of the m-th network.

Clustering and estimation using the ICL criterion
In a mixture of SBMs, graph clustering becomes the recovery of the latent variables U from the data A. We develop a clustering algorithm by maximizing the so-called integrated classification likelihood (ICL) criterion, defined as the log-likelihood function of the complete data, that is, the observations and the latent variables. Traditionally, this criterion has been used for model selection in various latent variable models, often in connection with the EM algorithm (Biernacki et al., 2000). More recently, Côme and Latouche (2015) showed that the ICL can also be used for directly estimating the latent variables. Compared to alternative approaches like EM, an unequivocal advantage is that model selection is performed automatically. Here we adapt the approach to mixtures of SBMs. In this section, the ICL is first introduced for a single cluster, then defined for our mixture model.

ICL criterion for a single cluster
In this subsection A is assumed to be a collection of i.i.d. networks drawn from a SBM with K blocks and parameters (π, γ). Considering a Bayesian framework, let p(π, γ) be a prior distribution on the SBM parameters and define the ICL criterion as

ICL_sbm(A, Z) = log ∫ p(A, Z | π, γ) p(π, γ) dπ dγ.

Interestingly, by integrating out the model parameters, the criterion only depends on the observations A and the latent node labels Z. The value of Z optimizing the ICL, that is,

Ẑ ∈ argmax_Z ICL_sbm(A, Z),

corresponds to the node labels maximizing the posterior distribution of Z and hence is a natural estimate of the latent variables. Using the classical conjugate prior

π ∼ Dirichlet(α_1, ..., α_K),   γ_{k,l} ∼ Beta(η_{k,l}, ζ_{k,l}) independently for all k, l,

where α_1, ..., α_K, η_{k,l}, ζ_{k,l} are hyperparameters, ICL_sbm has a closed-form expression, which is given in the Appendix.
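As an illustration, the closed form can be coded in a few lines. The sketch below is our own and assumes symmetric hyperparameters (α_k ≡ α, η_{k,l} ≡ η, ζ_{k,l} ≡ ζ) and directed networks without self-loops; it pools the block and edge counts over the collection, in line with the formula given in the Appendix.

```python
import numpy as np
from math import lgamma

def betaln(x, y):
    """log of the Beta function."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def icl_sbm(A_list, Z_list, K, alpha=1.0, eta=1.0, zeta=1.0):
    """Closed-form ICL of i.i.d. directed SBM networks (no self-loops) under
    symmetric Dirichlet(alpha) / Beta(eta, zeta) conjugate priors (sketch)."""
    n_tot = sum(len(Z) for Z in Z_list)
    s = np.zeros(K)                            # vertices per block, pooled
    a = np.zeros((K, K))                       # edges between blocks, pooled
    b = np.zeros((K, K))                       # ordered non-edges, pooled
    for A, Z in zip(A_list, Z_list):
        onehot = np.eye(K)[np.asarray(Z)]      # n x K indicator matrix
        sk = onehot.sum(axis=0)
        ak = onehot.T @ A @ onehot             # edge counts per ordered block pair
        s += sk
        a += ak
        b += np.outer(sk, sk) - np.diag(sk) - ak
    # Dirichlet-multinomial term for the block memberships
    icl = lgamma(K * alpha) - K * lgamma(alpha)
    icl += sum(lgamma(alpha + s[k]) for k in range(K)) - lgamma(K * alpha + n_tot)
    # Beta-Bernoulli term for every ordered pair of blocks
    for k in range(K):
        for l in range(K):
            icl += betaln(eta + a[k, l], zeta + b[k, l]) - betaln(eta, zeta)
    return icl
```

Declaring an extra, empty block can only decrease this criterion, which is what makes maximizing the ICL favor parsimonious models.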

ICL criterion for a mixture of SBMs
In a mixture of SBMs, there are two types of latent variables, namely the clustering U of the networks and the node labels Z. The ICL is then defined as

ICL_mix(A, U, Z) = log ∫ p(A, U, Z | θ) p(θ) dθ,

where p(θ) is a prior on the model parameters. The values (Û, Ẑ) that maximize the ICL are convenient estimates of the graph clustering and the node labels. They are defined as

(Û, Ẑ) ∈ argmax_{U,Z} ICL_mix(A, U, Z).    (4)

Again we consider classical independent conjugate priors given by

ρ ∼ Dirichlet(λ_1, ..., λ_C),   π^(c) ∼ Dirichlet(α_1, ..., α_{K_c}),   γ^(c)_{k,l} ∼ Beta(η_{k,l}, ζ_{k,l}),

where λ_c, α_k, η_{k,l}, ζ_{k,l} are hyperparameters and ρ = (ρ_1, ..., ρ_C) denotes the cluster proportions with P(U^(m) = c) = ρ_c. Let I_c be the set of indices of networks belonging to cluster c, that is, I_c = {m : U^(m) = c}. Then one can show that the ICL can be rewritten as

ICL_mix(A, U, Z) = Σ_{c=1}^C ICL_sbm(A^(I_c), Z^(I_c)) + log ∫ p(U | ρ) p(ρ) dρ.

The last term on the right-hand side has closed form given by

log ∫ p(U | ρ) p(ρ) dρ = log Γ(Σ_c λ_c) − log Γ(Σ_c λ_c + M) + Σ_{c=1}^C [log Γ(λ_c + |I_c|) − log Γ(λ_c)].

The ICL criterion is not exactly a similarity measure that compares clusters of networks, but a model-based likelihood criterion that defines which clustering is best.

Hierarchical clustering algorithm
To solve the discrete optimization problem given by (4), we propose a greedy hill-climbing algorithm. The algorithm is initialized with a mixture of M SBMs by setting U^(m) = m for m ∈ {1, ..., M}, that is, every network forms a cluster on its own. Then, at every iteration, two clusters are combined into a single larger cluster. More precisely, for any pair of clusters (c, c'), the variation of the ICL induced by their merger is evaluated,

Δ_{c,c'} = ICL_mix(A, U^{c∪c'}, Z^{c∪c'}) − ICL_mix(A, U, Z),

where U and Z are the current latent variables and U^{c∪c'} and Z^{c∪c'} are the ones obtained by merging the clusters c and c'. Finally, the cluster aggregation yielding the largest ICL increase is actually performed. The algorithm stops automatically when the ICL would decrease if any further clusters were merged. The granularity of the final clustering Û depends on the data and on the hyperparameters λ_c.
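The greedy scheme itself is generic: it only needs a score function over partitions. The toy sketch below is our illustration, with a made-up variance-based score standing in for the ICL; it shows the mechanics, including the automatic stop when no merge improves the objective.

```python
import numpy as np

def greedy_merge(items, score):
    """Agglomerative hill climbing: start from singleton clusters and repeatedly
    perform the merge with the largest positive score gain; stop automatically
    when every possible merge would decrease the score."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        base = score(items, clusters)
        best_gain, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                trial = [c for k, c in enumerate(clusters) if k not in (i, j)]
                trial.append(clusters[i] + clusters[j])
                gain = score(items, trial) - base
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break                                  # no merge improves the score
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def toy_score(x, clusters):
    """Made-up penalized criterion: within-cluster variance plus a cost per cluster."""
    return -sum(np.var([x[i] for i in c]) * len(c) for c in clusters) - len(clusters)
```

In the paper's algorithm, the role of the score is played by ICL_mix, and the gains Δ_{c,c'} are updated incrementally rather than recomputed from scratch, as detailed in the Appendix.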
The algorithm also requires initial values for the latent node labels Z. We propose to fit a simple SBM to each network A^(m), yielding an estimate (π^(m), γ^(m)) of the SBM parameters as well as node labels Z^(m). Our implementation uses the variational EM algorithm of the R package blockmodels (Leger, 2016).
The aggregation of two clusters raises an issue related to the non-identifiability of the block labels in a SBM. In fact, it may occur that node labels in the two clusters do not refer to the same type of blocks. However, in our algorithm, for a given cluster, node labels must designate the same SBM block in every network. If this is not the case, it is necessary to relabel the nodes before merging the clusters. In Section 5 we develop a new tool to find the best correspondence between the block labels of two SBMs.
After merging two clusters, the current node labels can be further improved by searching for the maximum of ICL_mix in Z, while keeping the clustering U fixed. This amounts to maximizing the term ICL_sbm for the newly created cluster. We propose an adaptation of the procedure by Côme and Latouche (2015) for fitting a SBM to a single network. Roughly, for every node we test whether changing its node label increases the ICL or not. This algorithm is presented in the Appendix.
Algorithm 1 summarizes the entire clustering algorithm. It provides the best clustering Û, the node labels Ẑ, and also parameter estimates for the SBM of every cluster.
In view of the computing time, it is important that the evaluation of the ICL variations Δ_{c,c'} is fast. The Appendix provides explicit and fast formulae for the computation of Δ_{c,c'}.

Matching of SBM node labels
Given the node labels, say Z^(c) and Z^(c'), and the SBM parameters of two clusters of networks, the goal is to find the best match between the block labels of the two SBMs. A naive strategy consists in ordering one part of the SBM parameters, for instance the block proportions π_1, ..., π_K or the diagonal elements of the connectivity matrix γ, in a monotone order. However, as no single part of the parameter contains all relevant information, there are always cases where such an approach fails. To take into account both parts of the parameter (π, γ), we propose to use the graphon of the SBM, as shown in this section.

Figure 1: Graphons of a SBM with two different orders of the block labels.

Graphon of a SBM parameter
The graphon, introduced by Lovász and Szegedy (2006), is a function g that can be used as a generative model for exchangeable random graphs, including the SBM. First, generate independent random variables U_i ∼ U[0, 1] for the vertices i ∈ {1, ..., n}. Then, conditionally on U_i and U_j, draw an edge A_{i,j} ∼ B(g(U_i, U_j)). The graphon of a SBM with parameter (π, γ) is given by

g(u, v) = γ_{k,l}  for (u, v) ∈ I_k × I_l,   where I_k = [ Σ_{k'<k} π_{k'}, Σ_{k'≤k} π_{k'} ),

that is, a piecewise constant function depending on the entire SBM parameter (π, γ). Clearly, it also depends on the order of the block labels. Changing the block labels implies a permutation of the piecewise constant parts of the graphon, as illustrated in Figure 1.

Label-dependent distance measure for two SBM parameters
To compare SBMs with parameters (π^(c), γ^(c)) and (π^(c'), γ^(c')), consider the L2-distance between their graphons g_c and g_{c'}. By the piecewise constant character of the graphons, the squared distance is a finite sum given by

d(g_c, g_{c'})² = ∫∫ (g_c(u, v) − g_{c'}(u, v))² du dv = Σ_{k,l,k',l'} |R_{k,l,k',l'}| (γ^(c)_{k,l} − γ^(c')_{k',l'})²,

where |R_{k,l,k',l'}| denotes the area of the rectangle R_{k,l,k',l'} defined as

R_{k,l,k',l'} = (I_k ∩ I'_{k'}) × (I_l ∩ I'_{l'}),

with I_k and I'_{k'} the intervals associated with g_c and g_{c'}, respectively. This distance is zero if and only if the parameter values are identical, (π^(c), γ^(c)) = (π^(c'), γ^(c')), as well as the order of the blocks. Thus, it is a label-dependent distance measure. Furthermore, the graphon distance is well-defined even when the numbers of blocks of the two models differ.
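A direct implementation of this finite sum is straightforward; the sketch below (our code) computes the interval overlaps from the cumulative block proportions and remains valid when the two SBMs have different numbers of blocks.

```python
import numpy as np

def graphon_distance(pi1, gamma1, pi2, gamma2):
    """L2 distance between the step-function graphons of two SBMs."""
    pi1, pi2 = np.asarray(pi1, float), np.asarray(pi2, float)
    gamma1, gamma2 = np.asarray(gamma1, float), np.asarray(gamma2, float)
    c1 = np.concatenate([[0.0], np.cumsum(pi1)])   # boundaries of the intervals I_k
    c2 = np.concatenate([[0.0], np.cumsum(pi2)])
    lo = np.maximum(c1[:-1, None], c2[None, :-1])
    hi = np.minimum(c1[1:, None], c2[None, 1:])
    ov = np.clip(hi - lo, 0.0, None)               # ov[k, k'] = |I_k ∩ I'_k'|
    d2 = 0.0
    for k in range(len(pi1)):
        for l in range(len(pi1)):
            for kp in range(len(pi2)):
                for lp in range(len(pi2)):
                    area = ov[k, kp] * ov[l, lp]   # |R_{k,l,k',l'}|
                    d2 += area * (gamma1[k, l] - gamma2[kp, lp]) ** 2
    return np.sqrt(d2)
```

For two Erdős-Rényi graphons (one block each) with edge probabilities p and q, the distance reduces to |p − q|, while permuting the block labels of one SBM generally yields a strictly positive distance to the unpermuted version.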

Matching SBM blocks
Our tool to match the block labels of two SBM parameters consists in finding the permutations yielding the smallest graphon distance. More precisely, let K_c and K_{c'} be the numbers of blocks of (π^(c), γ^(c)) and (π^(c'), γ^(c')), respectively. Denote by S_K the set of all permutations of {1, ..., K}, and denote a parameter with permuted blocks by

(π, γ)^σ = ((π_{σ(k)})_k, (γ_{σ(k),σ(l)})_{k,l}),   σ ∈ S_K.

We define the permutations σ̂_c and σ̂_{c'} as

(σ̂_c, σ̂_{c'}) ∈ argmin_{σ ∈ S_{K_c}, σ' ∈ S_{K_{c'}}} d( g_{(π^(c), γ^(c))^σ}, g_{(π^(c'), γ^(c'))^{σ'}} ).

The solution is not unique, as for any τ ∈ S_{K_c} the minimum is also attained by the permutations τ∘σ̂_c and τ∘σ̂_{c'}. But for our purpose, any solution is convenient for matching the block labels.
For the practical computation of σ̂_c and σ̂_{c'}, an exhaustive exploration of all permutations in S_{K_c} and S_{K_{c'}} is feasible when the numbers of blocks K_c and K_{c'} are not too large. However, we propose a general simplification based on an identifiability property of graphons (Bickel and Chen, 2009). Defining the marginal of a graphon g as ḡ(u) = ∫₀¹ g(u, v) dv, one can show for undirected networks that there exists a unique graphon g_can which defines the same distribution as g and whose marginal ḡ_can is monotone decreasing. The graphon g_can is called the canonical representation of g. Hence, instead of exploring all possible permutations of the block labels, we choose as σ̂_c and σ̂_{c'} the permutations providing the canonical representations of the graphons. This is justified for undirected networks.
In the directed case, the two marginals ḡ_row(u) = ∫₀¹ g(u, v) dv and ḡ_col(v) = ∫₀¹ g(u, v) du are not the same, and a reasonable adaptation is to first order the blocks according to one marginal, say ḡ_row. Then, if ḡ_row is constant over two SBM blocks, order these two blocks such that the other marginal ḡ_col is decreasing over them.
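For a step graphon, the marginals reduce to weighted row and column sums of γ, so this ordering rule can be sketched in a few lines (our code): sort blocks by decreasing out-marginal, break ties with the in-marginal, and apply the resulting permutation to (π, γ).

```python
import numpy as np

def canonical_order(pi, gamma):
    """Block permutation putting a directed SBM graphon in canonical form:
    decreasing out-marginal, ties broken by decreasing in-marginal."""
    pi, gamma = np.asarray(pi, float), np.asarray(gamma, float)
    out_marg = gamma @ pi        # out-marginal of block k: sum_l gamma[k, l] pi[l]
    in_marg = gamma.T @ pi       # in-marginal of block k:  sum_l gamma[l, k] pi[l]
    # np.lexsort sorts by the last key first, so out_marg is the primary key
    return np.lexsort((-in_marg, -out_marg))

def apply_order(pi, gamma, order):
    """Permute the blocks of an SBM parameter."""
    pi, gamma = np.asarray(pi, float), np.asarray(gamma, float)
    return pi[order], gamma[np.ix_(order, order)]
```

Two parameter sets that differ only by a relabeling of the blocks are mapped to the same representative, which is exactly what the cluster merge step needs.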

Relabeling nodes during cluster aggregation
Let us summarize all steps needed to relabel the nodes when merging two clusters. First, estimate the SBM parameters of both clusters by the maximum a posteriori (MAP) estimator defined by

(π̂^(c), γ̂^(c)) ∈ argmax_{π,γ} p(π, γ | A^(I_c), Z^(c)).

The estimator has simple closed-form expressions, detailed in the Appendix. Next, the permutations σ̂_c and σ̂_{c'} providing the canonical representations of the associated graphons are determined, and the node labels Z^(c) and Z^(c') are updated accordingly.

Numerical Study
We conduct numerical experiments to assess the performance of our clustering algorithm.
The algorithm will soon be available on CRAN as the R package grupchclust.

Estimation accuracy
In the single-network setting it is well known that parameter estimates converge to the true SBM parameter when the number of nodes increases. In the multiple-network framework, a different question is the accuracy of the estimators as a function of the number M of networks, when the network size is bounded. To study this question we consider a mixture with a single component. Concretely, we simulate multiple networks with only 10 vertices from a SBM with 3 blocks. Fitting a SBM to a single small network by a standard estimation algorithm, as done at the initialisation step of our algorithm, yields SBM estimates with only 2 blocks, since the data do not provide enough evidence to estimate the parameters of a more complex model. We apply to these synthetic data a variant of our hierarchical agglomerative algorithm with no stopping criterion, merging all networks into a single cluster. Contrary to a one-by-one analysis of the networks, we observe that our approach, which jointly analyzes all networks, allows us to discover the richer true SBM with 3 blocks, even when all networks are small. More precisely, Figure 2 displays the proportion of 100 simulated data sets on which the procedure correctly selects a SBM with 3 blocks (blue). Obviously, this proportion increases with M, and for 500 networks and more, no errors are made. We also compare the estimated node labels Ẑ to the true ones by the adjusted Rand index (ARI) (Hubert and Arabie, 1985). The ARI (red line) is strictly increasing in the sample size M, indicating that the fit gets better and better. Finally, we evaluate the quality of the estimated SBM by comparing the associated graphon with the one of the true SBM, as described in Section 5.4. This is a valid comparison even when the numbers of blocks are not the same, which is the case for small sample sizes. We see from Figure 2 that the graphon distance (black line) steadily decreases when providing more and more data to the algorithm, meaning that the estimation accuracy improves.

Graph clustering
To study the impact of both the number M of networks and the sizes n^(m) of the individual networks on the performance of the clustering algorithm, data from a mixture of four SBMs with equal block proportions are simulated. We consider two sample sizes M ∈ {20, 100} and networks with two different mean numbers of vertices, say n_mean. For n_mean = 30, the numbers of nodes n^(m) vary from 15 to 100. For n_mean = 100, all n^(m) are chosen in [20, 200].
For each of the four settings, the results are displayed in Figure 3. We can also compare our clustering algorithm to alternative methods. Most clustering procedures in the literature based on a graph distance consist in constructing a similarity matrix containing the graph distances for all pairs of networks, to which a spectral clustering algorithm is applied to derive a clustering. Here we propose to use the graphon distance defined in this work as a graph distance, evaluated on the SBM parameters estimated on the single networks. For the spectral clustering algorithm we specify the correct number of clusters to be found. We refer to this method as the graphon spectral clustering (GSC) approach. The procedure is applied to the same data as our model-based algorithm and the corresponding ARIs are also represented in Figure 3 b) (in blue). Obviously, in all settings our graph clustering approach has much better performance, although we provided the correct number of clusters to the GSC method. Moreover, no substantial improvement of GSC is observed when more data are available. This is in accordance with our understanding of such approaches, where the estimation uncertainty is not taken into account and networks are only analyzed separately. We conclude that model-based approaches like ours, where a common descriptor of each cluster is computed using all data associated with one cluster, have a real advantage over graph distance methods. Furthermore, 68% of the outliers (13 networks) are in clusters that do not contain any data from the three-component mixture model. Thus, the algorithm is able to make a distinction between data from the mixture model and most of the noisy observations.

Robustness to model assumption
In terms of the ARI, our algorithm attains a value of 0.95, which is considerably larger than the ARI of 0.065 obtained with the GSC procedure with the same number of clusters, that is, 16. Varying the number of clusters, the highest ARI for GSC is achieved with 4 clusters and reaches the value 0.72, which is still far below the ARI of the model-based approach. We conclude that our approach also gives very satisfying results when the data contain outliers or noisy observations.

Application to ecological networks
The mangal database (Poisot et al., 2016) provides a huge collection of ecological networks, available via the R package rmangal. We extract the 187 networks where interactions among different taxa (vertices) are of the type predation. The median number of vertices per foodweb is 19 (ranging from 5 to 708) and the median number of edges is 32 (ranging from 4 to 27,745). Our goal is the identification of foodwebs that have the same network structure, regardless of the taxa or the size of the foodwebs. Is there any kind of universal topology of foodwebs? How many different organization forms of an ecosystem exist, and how can they be described and compared?
Our agglomerative clustering algorithm applied to these foodwebs discovers 17 clusters. There is a dominant cluster containing 115 networks (61%), 5 clusters of intermediate size (6 to 14 networks), and none of the remaining 11 clusters contains more than 4 networks.
Figure 6 represents the SBM parameter associated with the dominant cluster. It contains six blocks, the block proportions are in the range [0.06, 0.28], half of the connectivity parameters γ_{k,l} are lower than 0.01, and the largest connectivity parameter is 0.87. To interpret the different blocks, we consider the probabilities of in-coming and out-going edges for a node in each block k. It is instructive to represent the clustering in connection with the geographic location of the foodwebs (Figure 6). Foodwebs of the dominant cluster (lightblue circles) are present all over the globe and correspond indeed to some global or universal structure of ecosystems.
Interestingly, the intermediate clusters are also all spread over several continents. This means that the different types of graph topology are not related to particular geographic regions. We conclude that the results of our algorithm provide many insights into the structure of foodwebs and raise new questions in ecology.

Conclusion
We have developed an approach to cluster networks according to their graph topologies. We illustrated that a model-based approach, where a description of each cluster is computed, outperforms methods based only on pairwise distances between networks, as we inherently take into account the estimation uncertainty. Another advantage of our hierarchical algorithm is the automated selection of the number of clusters, which is done in a single run of the algorithm, contrary to EM-type algorithms, where different numbers of clusters must be explored separately. Moreover, a finite mixture of SBMs is a highly interpretable model, which is important in practical applications. Finally, we propose a new tool to match the block labels of two SBMs, which may be useful in other contexts.
Writing the node labels in one-hot form, Z^(m)_i = (Z^(m)_{i,1}, ..., Z^(m)_{i,K}) ∈ {0, 1}^K, useful count statistics for the m-th network are

s^(m)_k = Σ_i Z^(m)_{i,k},   a^(m)_{k,l} = Σ_{i≠j} Z^(m)_{i,k} Z^(m)_{j,l} A^(m)_{i,j},   b^(m)_{k,l} = Σ_{i≠j} Z^(m)_{i,k} Z^(m)_{j,l} (1 − A^(m)_{i,j}),

where s^(m)_k is the number of vertices assigned to block k, a^(m)_{k,l} the number of edges that link a vertex of block k to a vertex of block l, and b^(m)_{k,l} the number of ordered pairs with a vertex in block k and a vertex in block l that are not connected. Moreover, denote the pooled counts s_k = Σ_m s^(m)_k, a_{k,l} = Σ_m a^(m)_{k,l} and b_{k,l} = Σ_m b^(m)_{k,l}. With these notations at hand, the ICL is given by

ICL_sbm(A, Z) = log Γ(Σ_k α_k) − Σ_k log Γ(α_k) + Σ_k log Γ(α_k + s_k) − log Γ(Σ_k α_k + Σ_m n^(m)) + Σ_{k,l} [log B(η_{k,l} + a_{k,l}, ζ_{k,l} + b_{k,l}) − log B(η_{k,l}, ζ_{k,l})],

where B(·, ·) denotes the Beta function.
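These counts are cheap to obtain with one-hot indicator matrices; a possible vectorized sketch (our code) for a single directed network without self-loops:

```python
import numpy as np

def count_stats(A, Z, K):
    """Count statistics of one directed network without self-loops:
    s[k]    number of vertices in block k,
    a[k, l] number of edges from block k to block l,
    b[k, l] number of ordered non-adjacent pairs from block k to block l."""
    onehot = np.eye(K, dtype=int)[np.asarray(Z)]   # n x K indicator matrix
    s = onehot.sum(axis=0)
    a = onehot.T @ A @ onehot                      # block-pair edge counts
    pairs = np.outer(s, s) - np.diag(s)            # ordered pairs (i, j), i != j
    b = pairs - a
    return s, a, b
```

Summing these per-network statistics over the networks of a cluster gives the pooled counts entering the closed-form ICL.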

Computation of ∆ c,c
In this section it is shown how to evaluate Δ_{c,c'} efficiently. Denote U^{c∪c'} the cluster labels after merging the clusters c and c', and denote K_max = max{K_c, K_{c'}} and K_min = min{K_c, K_{c'}} the maximal and minimal numbers of blocks in the clusters c and c'. An inspection of the explicit expression of Δ_{c,c'}, given in (10), reveals that only the last two terms depend on the current number of clusters C. In addition, the other terms do not change from one iteration to another if both c and c' have not been changed in the previous iteration, that is, if none of them is the result of the latest cluster aggregation. Hence, for those clusters the new value of Δ_{c,c'} is the previous value plus a constant κ_C that only depends on C, using the fact that the number C of clusters has diminished by 1 compared to the previous iteration. In short, for all pairs of clusters (c, c') where both clusters have remained unchanged in the previous iteration, the update is simply

Δ^new_{c,c'} = Δ^old_{c,c'} + κ_C.

For all pairs (c, c') where one of the clusters has been obtained by the last cluster aggregation, Δ_{c,c'} is computed according to (10). Moreover, we can avoid the computation of the count statistics s^(m)_k, a^(m)_{k,l} and b^(m)_{k,l} for all m at every iteration by storing them during the entire algorithm and only performing local updates when necessary.

Parameter estimates
The MAP estimate of the SBM parameters associated with cluster c is given by

π̂^(c)_k = (s_k + α_k − 1) / (Σ_l (s_l + α_l) − K_c),   γ̂^(c)_{k,l} = (a_{k,l} + η_{k,l} − 1) / (a_{k,l} + b_{k,l} + η_{k,l} + ζ_{k,l} − 2),

where the counts s_k, a_{k,l} and b_{k,l} are computed over the networks of cluster c only.

Update of nodes labels
After the aggregation of two clusters and the relabeling of the nodes, we can further improve the node labels Z^(c) of the new cluster c by maximizing the associated ICL criterion ICL_sbm. We propose an adaptation to multiple networks of the algorithm by Côme and Latouche (2015), which fits a SBM to a single network. Indeed, the proposed procedure is an algorithm to adjust one SBM to a collection of i.i.d. networks. The idea is to randomly choose a vertex and search for its best block assignment in terms of the ICL. So, one by one, node labels are changed until no other swap would further improve the ICL. In the context of graph clustering, the convergence of this procedure is fast, since the current node labels are very good initial points.
For notational convenience, we drop the superscript (c) of A^(c) and Z^(c) and simply write A and Z. All computations in this section only involve quantities related to the cluster under consideration. Now, an iteration of the procedure consists of the following steps. First, select a network index, say m* ∈ M, and one of its vertices, say i* ∈ ⟦n^(m*)⟧. Denote g the current block assignment of i*. For any block h ∈ ⟦K⟧, compute the impact on the ICL of moving node i* to block h, that is, ∆^{→h}_{m*,i*} = ICL_sbm(A, Z^{→h}_{m*,i*}) − ICL_sbm(A, Z), where Z denotes the current node labels with Z^(m*)_{i*} = g, and Z^{→h}_{m*,i*} the labels after moving node i* to block h, that is, with Z^(m*)_{i*} = h. Finally, choose the best block assignment as h* = arg max_h ∆^{→h}_{m*,i*} and set Z^(m*)_{i*} = h*. For the efficient computation of the ICL changes ∆^{→h}_{m*,i*}, two cases have to be distinguished: moving node i* to block h (i) does not empty block g; (ii) does empty block g, so that the number of blocks K diminishes. To handle the latter case, let us take a closer look at the ICL criterion to better understand its dependency on the model size K.
Let us compare the value of the ICL for an SBM with K blocks containing an empty block to the ICL value of the same data under the SBM where the empty block is deleted, that is, an SBM with K − 1 blocks. The relation is given by

ICL_sbm(K) = ICL_sbm(K − 1) + log [Γ(Kα) / Γ((K − 1)α)] + log [Γ((K − 1)α + Σ_m n^(m)) / Γ(Kα + Σ_m n^(m))].
The second and third terms on the right-hand side are a penalty, the price to pay for using a larger model containing an empty block. Thus, by maximizing the ICL, parsimonious models are automatically favored. Now, the change of the ICL ∆^{→h}_{m*,i*} is exactly the same term as in (12) plus the penalty term given in (13).
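The penalty in the relation above can be computed with the log-Gamma function; here `alpha` is the Dirichlet hyperparameter and `n_total` stands for Σ_m n^(m) (a minimal sketch, not the authors' code).

```python
import math

def empty_block_penalty(K, alpha, n_total):
    """Last two terms of the relation above: the ICL cost of keeping an
    empty block in a K-block SBM instead of deleting it (always negative)."""
    lg = math.lgamma
    return (lg(K * alpha) - lg((K - 1) * alpha)
            + lg((K - 1) * alpha + n_total) - lg(K * alpha + n_total))
```

Since this quantity is negative for any α > 0, deleting an empty block always increases the ICL, which is exactly how parsimony is enforced.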
The whole procedure to update node labels is summarized in Algorithm 2.Moreover, Algorithm 3 presents all steps of the aggregation of two clusters.

Algorithm 1 :
Agglomerative algorithm for graph clustering. Input: collection of networks A. Initialization: set U^(m) = m for m ∈ M and C = M.
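The outer loop of the agglomerative algorithm can be sketched as follows; `delta(c, cp)` is a hypothetical callback returning the ICL variation of merging clusters c and cp, and the lazy ∆ updates are omitted for brevity.

```python
def agglomerative_clustering(M, delta):
    """Skeleton of the agglomerative algorithm: start from singleton
    clusters U(m) = m and greedily merge the pair maximizing the ICL
    variation, stopping as soon as no merge improves the ICL (which also
    selects the number of clusters). Returns labels and merge history."""
    U = {m: m for m in range(M)}
    clusters = set(range(M))
    history = []                      # (kept, absorbed, gain): dendrogram data
    while len(clusters) > 1:
        best, c, cp = max((delta(a, b), a, b)
                          for a in clusters for b in clusters if a < b)
        if best <= 0:                 # no merge increases the ICL: stop
            break
        clusters.discard(cp)
        for m in U:
            if U[m] == cp:
                U[m] = c              # relabel networks of cp into c
        history.append((c, cp, best))
    return U, history
```

The recorded merge history is exactly the data needed to draw the dendrogram of nested clusterings.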

Figure 2: Proportion of collections of networks on which the number of SBM blocks is

Figure 3: Estimated number of clusters for our hierarchical algorithm, and ARI for both.
Figure 3 a) displays the estimated number of clusters on 100 simulated collections of networks. When networks are small (n_mean = 30), the cluster number is often underestimated (for both M ∈ {20, 100}). Increasing the network size to n_mean = 100 should increase the accuracy of the estimation of the underlying SBMs, as seen in the previous section, and indeed we observe in Figure 3 a) that the number of clusters is correctly estimated significantly more often. Moreover, Figure 3 b) illustrates the ARI of the obtained graph clusterings (in red) compared to the truth. Again, we see that increasing the number of vertices n_mean per network does not have the same effect as increasing the number of networks M. This confirms that a good clustering result goes hand in hand with high estimation accuracy of the model parameters.

Figure 4: Dendrogram of clustering obtained by the algorithm. In the bar below spurious

Figure 4 shows the dendrogram of the clustering obtained with our procedure.

Figure 5: Graphon of the SBM parameter of the dominant cluster with in-coming and out-

Figure 6: Geographical representation of the clustering of the foodwebs.
For simplicity, hyperparameters for all priors are set to identical values, that is, α = α_k, η = η_{k,l} and ζ = ζ_{k,l} for (k, l) ∈ ⟦K⟧². We use the one-hot encoding for the node labels Z. Denote U_{c∪c'} the cluster labels after merging clusters c and c', where U^(m)_{c∪c'} takes the label of the merged cluster if m ∈ I_c ∪ I_{c'} and U^(m)_{c∪c'} = U^(m) otherwise. Likewise, denote Z_{c∪c'} the node labels after aggregation and relabeling, with Z^(ℓ)_{c∪c'} = {σ_ℓ(Z^(j)), j ∈ I_ℓ} for ℓ ∈ {c, c'}, where σ_ℓ are the permutations that match the block labels. For convenience, denote by β(x, y) = log [Γ(x)Γ(y) / Γ(x+y)] the logarithm of the Beta function of x and y. Moreover, for any c ∈ ⟦C⟧ and (k, l) ∈ ⟦K_c⟧², denote the associated count statistics.
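The log-Beta function β is best evaluated on the log scale to stay numerically stable for the large counts arising in the ICL; a minimal sketch:

```python
import math

def log_beta(x, y):
    """beta(x, y) = log(Gamma(x) * Gamma(y) / Gamma(x + y)), computed with
    lgamma to avoid the overflow of the Gamma function for large arguments."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
```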

Changes in the statistics. Let s^(m*)_k be the count statistic before the swap and s'^(m*)_k its value after the swap; we apply the same notation to all other statistics. Clearly, s'^(m*)_k = s^(m*)_k − 1_{k=g} + 1_{k=h}, while the other terms remain unchanged. Define

δ_{k,•i*} = Σ_{i≠i*} Z^(m*)_{i,k} A^(m*)_{i,i*},   δ_{ℓ,i*•} = Σ_{j≠i*} Z^(m*)_{j,ℓ} A^(m*)_{i*,j}.

Then, for any k, ℓ ∈ ⟦K⟧,

a'^(m*)_{k,ℓ} = a^(m*)_{k,ℓ} − 1_{k=g} δ_{ℓ,i*•} + 1_{k=h} δ_{ℓ,i*•} − 1_{ℓ=g} δ_{k,•i*} + 1_{ℓ=h} δ_{k,•i*}.

When considering the matrix (a^(m*)_{k,ℓ})_{k,ℓ}, only the g-th and h-th rows and the g-th and h-th columns change when moving i* from g to h. We introduce the number of possible dyads from nodes in block k to nodes in block ℓ in graph m, defined as r^(m)_{k,ℓ} = s^(m)_k s^(m)_ℓ − s^(m)_k 1_{k=ℓ}, which is updated as

r'^(m*)_{k,ℓ} = r^(m*)_{k,ℓ} − s^(m*)_ℓ 1_{k=g} + s^(m*)_ℓ 1_{k=h} − s^(m*)_k 1_{ℓ=g} + s^(m*)_k 1_{ℓ=h} + 2·1_{k=g,ℓ=g} − 1_{k=g,ℓ=h} − 1_{k=h,ℓ=g}.

For any m ≠ m*, the statistics remain unchanged, that is, a'^(m)_{k,l} = a^(m)_{k,l}, b'^(m)_{k,l} = b^(m)_{k,l} and r'^(m)_{k,l} = r^(m)_{k,l}. Finally, we define the function Ψ : R+ × Z → R as Ψ(a, z) = log [Γ(a + z) / Γ(a)] 1{a + z > 0}.

Algorithm 2: ICL maximization algorithm for fitting one SBM to multiple networks.
Input: set of networks A, initial node labels Z.
while not converged do
    Select a network m* ∈ M and one of its vertices i* ∈ ⟦n^(m*)⟧.
    for h ∈ ⟦K⟧ do
        Compute the impact ∆^{→h}_{m*,i*} on the ICL of moving node i* to block h.
    end
    Determine the best block assignment h* = arg max_{h ∈ ⟦K⟧} ∆^{→h}_{m*,i*} and set Z^(m*)_{i*} = h*.
end
Output: updated node labels Z.
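The function Ψ translates directly into code via lgamma; the indicator guards the case where a move empties a block (a sketch under the definitions above):

```python
import math

def Psi(a, z):
    """Psi(a, z) = log(Gamma(a + z) / Gamma(a)) * 1{a + z > 0}: the
    log-Gamma ratio appearing in the local ICL updates, with the indicator
    returning 0 when the shifted argument is no longer positive."""
    if a + z <= 0:
        return 0.0
    return math.lgamma(a + z) - math.lgamma(a)
```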
Large values of d^in_k indicate that the species in block k are often eaten by other species, so we define vegetarians as species with a significant chance to be eaten (d^in_k ≥ 0.05). Our model contains two vegetarian blocks representing 43% of the species. Likewise, we define predators by a significant probability to eat others (d^out_k ≥ 0.05) and little chance to be eaten (d^in_k ≤ 0.05). Then 18% of the species (two blocks) are predators. The remaining 39% are somewhere in the middle of the food pyramid, with both good chances to be eaten and to eat others (d^in_k ≥ 0.05, d^out_k ≥ 0.05). So this is the typical structure of most foodwebs in the database. To compare this topology with others, consider, for instance, the cluster containing the largest network with 708 nodes. The fitted SBM has 29 blocks, which is explained by the very large network size. The question is whether this SBM is a kind of finer version of the SBM of the dominant cluster or whether there is a significant difference. Here the block proportions lie in [0.004, 0.14], two thirds of the connectivity parameters γ_{k,l} are lower than 0.01, and the maximal value is 0.91. Furthermore, 53% of the species are vegetarians, 24% are predators and 7% are in-between. The remaining 16% are species with very few interactions (d^in_k ≤ 0.015, d^out_k ≤ 0.015), and such inactive species are absent in the dominant cluster. It is clear that this network structure is very different from that of the dominant cluster.