Cost-effective Network Disintegration through Targeted Enumeration

Finding an optimal subset of nodes or links to disintegrate harmful networks is a fundamental problem in network science, with potential applications to anti-terrorism, epidemic control, and many other fields of study. The challenge of the network disintegration problem is to balance the effectiveness and efficiency of strategies. In this paper, we propose a cost-effective targeted enumeration method for network disintegration. The proposed approach includes two stages: searching for candidate objects and identifying an optimal solution. In the first stage, we use rank aggregation to generate a comprehensive ranking of node importance, upon which we identify a small-scale candidate set of nodes to remove. In the second stage, we use an enumeration method to find an optimal combination among the candidate nodes. Extensive experimental results on synthetic and real-world networks demonstrate that the proposed method achieves a satisfying trade-off between effectiveness and efficiency. The introduced two-stage targeted enumeration framework can also be applied to other computationally intractable combinatorial optimization problems, ranging from team assembly and portfolio investment to drug design.


I. INTRODUCTION
EXPLORING the internal correlation structure of complex networks is an important research paradigm for understanding complex systems [1], [2]. In most cases, we hope to ensure network connectivity, which has promoted research on network robustness in recent decades [3]-[6]. However, if a network is harmful, such as terrorist networks [7], criminal networks [8], epidemic spreading networks [9], financial contagion networks [10], and cancer networks [11], efficiently disrupting the structure and function of the network becomes a meaningful and challenging task. This so-called network disintegration problem has attracted increasing attention among researchers [12]-[16].
The core of the network disintegration problem (also known as "network attack" [12], [17], "graph fragmentation" [18], and "network dismantling" [15], [19]) is determining the node or link set to be removed under certain constraints and various disintegration goals [20], [21]. This problem is typically NP-hard for general graphs [22], and its mathematical essence is a combinatorial optimization problem. In addition to early research based on exact combinatorial optimization methods to find an optimal network disintegration solution [23]-[25], researchers have also attempted to calculate the centrality measures of the nodes and then remove them one by one, starting with the nodes with the highest centrality values, to develop a network disintegration strategy [12], [17], [26]. However, a set composed of individually important nodes may not be the most critical set of nodes, and with the increased availability of large-scale networks, novel heuristic or approximate algorithms have been proposed to find vital nodes in complex networks [27]-[29]. A recent study suggested an iterative algorithm to select multiple controlled nodes based on the spectral properties of the grounded Laplacian matrix obtained by deleting specific rows and columns from the Laplacian matrix of the network [30]. Furthermore, some studies introduced evolutionary algorithms to the network disintegration problem and attempted to find a near-optimal strategy in the considerable solution space [31], [32]. Inspired by advances in artificial intelligence for solving many practical problems, some studies have applied deep reinforcement learning or machine learning to find influential nodes in complex networks [33], [34].
An outstanding challenge in the network disintegration problem is to take the computational cost into account. Although considerable progress has been made in the study of network disintegration, it remains challenging to achieve a good balance between effectiveness and efficiency. Methods with good effectiveness (giving a more accurate estimate), such as mathematical programming, evolutionary algorithms, and deep learning approaches, typically have poor efficiency (effectiveness per running time), limiting their applications in large-scale networks. In contrast, high-efficiency methods, such as centrality methods and heuristic algorithms, are typically unsatisfactory in terms of effectiveness, yielding nonoptimal solutions. Similar trade-off problems have been studied for related tasks, such as estimating the highest degree and identifying the optimal individuals to vaccinate [35]. To find a compromise between effectiveness and efficiency, we propose a targeted enumeration method in this paper. We first extract a small-scale candidate set of nodes to reduce the scope of the enumeration, and then find the optimal combination among the candidate nodes through enumeration. The core and difficulty of the method is to efficiently determine the set of candidates.
We propose solving this problem using rank aggregation. The resulting two-stage targeted enumeration method has a highly flexible framework that does not require domain-specific knowledge and leads to a cost-effective network disintegration strategy.

II. NETWORK DISINTEGRATION MODEL
Consider an undirected and unweighted graph G = (V, E) with a finite, nonempty set of nodes V and a set of links E. Let N = |V| and W = |E| be the number of nodes and links, respectively, and label the nodes 1, 2, ..., N. This paper focuses on the removal of nodes and assumes that all links connected to a node are deleted when the node is removed. Let V̂ ⊆ V denote the set of nodes to be removed; thus, Ĝ = (V − V̂, Ê) is the network that remains after removing the nodes in V̂, and n = |V̂| is the strength of disintegration. As a reference, let G̃ be the residual network after randomly removing n nodes. We denote the network disintegration strategy as

$$X = [x_1, x_2, \ldots, x_N],$$

where x_i = 1 if node i satisfies i ∈ V̂ and x_i = 0 otherwise; thus, the disintegration strength is $n = \sum_{i=1}^{N} x_i$. Regardless of the type and scale of the attack to which the network is subject, the attack inevitably damages the inherent structure and functions of the network, which is reflected in the objective function of network performance. Based on this, we introduce the following objective function to measure the disintegration effect:

$$\Phi(X) = \frac{\Gamma(G) - \Gamma(\hat{G})}{\Gamma(G) - \Gamma(\tilde{G})}, \tag{1}$$

where Γ represents the measurement function of network performance, assumed to be strictly monotone under node removal. The monotonicity of Γ ensures that the network performance strictly decreases as the disintegration proceeds, which leads to Φ > 0 if n > 0. Φ reflects the disintegration effect of different network disintegration strategies: a larger Φ indicates a better disintegration effect. There is an important reference value, namely Φ = 1.
If Φ > 1, the disintegration strategy is superior to the random removal of nodes. Eq. (1) shows that the goal is to design a node removal strategy, that is, a subset of nodes to be removed, that maximizes the disintegration effect Φ. Thus, the optimization model for the disintegration strategy can be described by the following general mathematical model:

$$\max_{X}\ \Phi(X) \quad \text{s.t.} \quad \sum_{i=1}^{N} x_i = n, \quad x_i \in \{0, 1\},\ i = 1, 2, \ldots, N. \tag{2}$$

Usually, the disintegration effect is measured by the size of the largest connected component [17]. However, this quantity changes very little when a small number of nodes is removed from the network. Therefore, in this study, we employ natural connectivity [36], [37] as the measure function Γ among a variety of alternatives. Natural connectivity is a measure of structural robustness in complex networks. It characterizes the redundancy of alternative routes by weighting the total number of closed walks of all lengths in the network and can also be interpreted as the Helmholtz free energy of a network [38]. From a mathematical perspective, it can be derived from the graph spectrum as an average eigenvalue [36], [37]:

$$\bar{\lambda} = \ln\left(\frac{S}{N}\right) = \ln\left(\frac{1}{N}\sum_{i=1}^{N} e^{\lambda_i}\right), \tag{3}$$

where S is the total weighted number of closed walks and A(G) is the adjacency matrix of the network G with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_N. Natural connectivity has been shown to change strictly monotonically with the addition or deletion of links and thus provides a sensitive and reliable measure of the robustness of the graph [36], [37], [39], [40]. Moreover, for networks with a large spectral gap between the largest eigenvalue λ_1 and the second largest eigenvalue, we can consider the following approximation of natural connectivity [41]:

$$\bar{\lambda} \approx \lambda_1 - \ln N. \tag{4}$$
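As a minimal sketch of the closed-walk interpretation of natural connectivity, the following Python snippet computes ln(S/N) for a small graph, where S is accumulated as the sum over walk lengths k of trace(A^k)/k!. The function names, the dense list-of-lists adjacency format, and the truncation length `max_len` are illustrative assumptions rather than part of the original method; for large networks, an eigenvalue solver would be the more practical route.

```python
import math


def natural_connectivity(adj, max_len=60):
    """Natural connectivity ln(S / N) of a graph given as a dense 0/1 matrix.

    S = sum_k trace(A^k) / k! is the weighted number of closed walks.
    The series is truncated after max_len terms (an assumption that is
    adequate only when the largest eigenvalue is small).
    """
    n = len(adj)
    # term holds A^k / k!; for k = 0 this is the identity matrix.
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = float(n)  # k = 0 contributes trace(I) = N
    for k in range(1, max_len + 1):
        term = [[sum(term[i][m] * adj[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        total += sum(term[i][i] for i in range(n))
    return math.log(total / n)


def remove_nodes(adj, removed):
    """Return the adjacency matrix after deleting the given node indices."""
    keep = [i for i in range(len(adj)) if i not in removed]
    return [[adj[i][j] for j in keep] for i in keep]


# A triangle, and the single link left after removing one of its nodes.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
pair = remove_nodes(triangle, {2})
print(natural_connectivity(triangle), natural_connectivity(pair))
```

In this toy case the value drops when a node is removed, in line with the monotonic behavior the text relies on.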

AGGREGATION
From a mathematical perspective, network disintegration is a typical combinatorial problem that selects n nodes from N nodes without repetition. For a small network size N, we can directly obtain an optimal solution by enumerating all $C_N^n$ combinations. However, for large-scale networks, this leads to a combinatorial explosion. To construct a heuristic method, the selected n nodes should be important according to some criterion. If we extract a small-scale candidate set of vital nodes Ṽ in advance and then enumerate all combinations only within the candidate set, the efficiency of the enumeration improves dramatically. We use Ñ to denote the size of the candidate set Ṽ, where n ≤ Ñ ≤ N. Then, the enumeration range is reduced from $C_N^n$ to $C_{\tilde{N}}^n$. The core problem is now to find the candidate objects. There are numerous criteria that characterize the importance of nodes. If we use only a single criterion, some potential key nodes may be missed. Therefore, we consider multiple node importance criteria simultaneously using rank aggregation (RA). In network science, node centrality is a common approach to assess the importance of nodes [26]. Thus, we first generate multiple node rankings based on various centrality measures. Then, we combine these individual rankings into a consensus ranking using the rank aggregation method. Finally, we determine the candidate objects Ṽ based on the consensus ranking.
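To make the reduction in the enumeration range concrete, the short Python sketch below compares the number of combinations over all N nodes with the number over a candidate set of size Ñ. The function name and argument names are hypothetical; the first example uses the sample values N = 10, n = 2, and Ñ = 4 from Fig. 1, and the second uses arbitrary illustrative values.

```python
import math


def enumeration_counts(N, n, N_tilde):
    """Return (full, targeted): C(N, n) and C(N_tilde, n)."""
    if not n <= N_tilde <= N:
        raise ValueError("expected n <= N_tilde <= N")
    return math.comb(N, n), math.comb(N_tilde, n)


full, targeted = enumeration_counts(10, 2, 4)
print(full, targeted)  # 45 6

# The gap widens quickly with network size (illustrative values only).
full_large, targeted_large = enumeration_counts(1000, 5, 20)
print(full_large // targeted_large)
```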
In this study, among a variety of alternative methods, we choose the graph-based rank aggregation method [42], [43] to aggregate these individual rankings into a single consensus ranking R. The graph-based rank aggregation method has been shown to outperform other rank aggregation methods, particularly for high-dimensional ranking.
Consider M rankings of N nodes given by M node importance criteria, and use $R_i = [r_{i1}, r_{i2}, \ldots, r_{iN}]$ to denote the node importance ranking given by the criterion $c_i$, where $r_{ij}$ represents the rank of node j based on the criterion $c_i$ (i = 1, 2, ..., M). The transition matrix for the criterion $c_i$ is denoted by $P^{c_i} = (p^{c_i}_{st})_{N \times N}$, where $p^{c_i}_{st} = 1$ if node s outranks node t under $c_i$; otherwise, $p^{c_i}_{st} = 0$. Based on the transition matrices, we define the adjacency matrix of a competition graph as $A = (a_{st})_{N \times N}$, where $a_{st} = \sum_{i=1}^{M} p^{c_i}_{st}$. Furthermore, based on the adjacency matrix A, we denote the competition graph of the network nodes as $G_c$. The nodes in the directed and weighted graph $G_c$ represent the nodes in the real network, and each directed link $e_{st}$ represents an outranking relation from node s to node t. The weight of the directed link $e_{st}$ is the number of times node s is placed ahead of node t across all aggregated rankings. We also denote the in-degree and out-degree of node j in the competition graph $G_c$ by $d_j^- = \sum_{s=1}^{N} a_{sj}$ and $d_j^+ = \sum_{t=1}^{N} a_{jt}$, respectively. Thus, we can define the ratio of out-in degrees (ROID) as follows:

$$\mathrm{ROID}_j = \frac{d_j^+ + 1}{d_j^- + 1}, \tag{5}$$

which can be used to quantify the strength of node j and to rank all nodes according to their ROID [42]. The higher the ROID value, the higher the rank of the node.
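The competition-graph construction can be sketched in a few lines of Python. The snippet below builds the matrix of outranking counts from several rankings, computes each node's out-degree and in-degree, and sorts nodes by the ratio of the two. It uses the add-one form of the ratio so that a node that is never outranked does not cause a division by zero; this form, the function name, the 0-based node indices, and the toy rankings are assumptions made for illustration.

```python
def roid_consensus(rankings):
    """Aggregate rankings via the ratio of out-in degrees (ROID).

    rankings[i][j] is the rank of node j under criterion i (1 = best).
    Returns (roid, order), where order lists node indices from the
    highest to the lowest ROID value.
    """
    n_nodes = len(rankings[0])
    # a[s][t] counts the criteria under which node s outranks node t.
    a = [[0] * n_nodes for _ in range(n_nodes)]
    for ranks in rankings:
        for s in range(n_nodes):
            for t in range(n_nodes):
                if ranks[s] < ranks[t]:
                    a[s][t] += 1
    out_deg = [sum(a[j]) for j in range(n_nodes)]
    in_deg = [sum(a[s][j] for s in range(n_nodes)) for j in range(n_nodes)]
    # Add-one smoothing is an assumption that keeps the ratio finite.
    roid = [(out_deg[j] + 1) / (in_deg[j] + 1) for j in range(n_nodes)]
    order = sorted(range(n_nodes), key=lambda j: roid[j], reverse=True)
    return roid, order


# Three toy rankings of four nodes (hypothetical data).
rankings = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
roid, order = roid_consensus(rankings)
print(order)  # [0, 1, 2, 3]
```

The top Ñ entries of `order` would then serve as the candidate set.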
To better understand the process of searching for candidate objects, an illustration is shown in Fig. 1. For a sample network that contains 10 nodes and 23 links and has the topology shown in Fig. 1(a), we employ three common centrality measures: degree centrality (DC) [17], betweenness centrality (BC) [12], and eigenvector centrality (EC) [44]. The individual rankings of the nodes based on the three centrality measures are shown in Fig. 1(b), (c), and (d). The aggregated ranking R is shown in Fig. 1(e). Details on the rankings are provided in Table I. We set the disintegration strength n to 2 and the size of the candidate set Ñ to 4 and then obtain the candidate set {2, 3, 8, 9} based on the aggregated ranking, shown as the orange nodes in Fig. 1(e). The node rankings under the different centrality measures are compared in Fig. 1(f). Each curve represents a node, and the height of the curve represents the node ranking according to the corresponding criterion. The wavy curves suggest that there are distinct differences between the three individual rankings. For example, node 2 ranks first with DC but fifth with BC; node 10 ranks first with EC but sixth with DC. The aggregated ranks are presented at the far right of Fig. 1(f). The RA method integrates the information from all individual rankings into a comprehensive ranking, effectively overcoming the one-sidedness of any individual measure. To some extent, this method takes the "average" of multiple rankings.
Intuitively, the number of node importance criteria M and the combination of these criteria will affect the candidate objects and further influence the disintegration effect. To explore the effect of the node importance criteria on the candidate set Ṽ, Fig. 2 shows the Venn diagram of candidate sets obtained using various combinations of node importance criteria in three real-world networks. As we see in Fig. 2, if we use only a single criterion (M = 1), the candidate sets obtained with different criteria vary significantly. However, as M increases, the intersection of the candidate sets obtained with different combinations of criteria expands noticeably. For example, in the network shown in Fig. 2(a), there are only 4 overlapping nodes when M = 1 but 9 overlapping nodes when M = 3; these results indicate that rank aggregation can help us find a stable and credible candidate set. Without loss of generality, we choose D-B-E as the combination of node importance criteria in the following experimental analysis.

ENUMERATION
In the previous section, we proposed selecting Ñ candidate nodes by rank aggregation. Now, we need to find the optimal combination among the candidate set through enumeration. The size of the candidate set Ñ directly affects the effectiveness and efficiency of the proposed method. Considering that n ≤ Ñ ≤ N, we assume that Ñ = n + (N − n)α, where 0 ≤ α ≤ 1 is the redundancy coefficient. When α reaches the maximum value 1, the method becomes an exhaustive enumeration. When α < 1, we call it targeted enumeration (TE).
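The relation between the redundancy coefficient and the candidate set size can be illustrated with a small Python helper. The text does not say how a non-integer value of (N − n)α is handled, so rounding up with `math.ceil` is an assumption here, as is the function name.

```python
import math


def candidate_set_size(N, n, alpha):
    """Candidate set size n + (N - n) * alpha, rounded up to an integer."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0 <= n <= N:
        raise ValueError("expected 0 <= n <= N")
    return n + math.ceil((N - n) * alpha)


print(candidate_set_size(10, 2, 0.25))  # 4, the sample network of Fig. 1
print(candidate_set_size(10, 2, 0.0))   # 2, only the top-n nodes
print(candidate_set_size(10, 2, 1.0))   # 10, exhaustive enumeration
```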
A higher α leads to better effectiveness but worse efficiency. Fig. 3(a) shows the disintegration effect Φ as a function of the redundancy coefficient α in two typical synthetic networks: the Newman-Watts (NW) model of small-world networks [45] and the scale-free (SF) network [46]. The curves first increase and then flatten, indicating that a small value of the redundancy coefficient is sufficient for the targeted enumeration and that increasing α further contributes little to the disintegration effect. These results also suggest that the process of selecting candidate objects is effective to some extent. In practical applications, the value of α can be determined based on real needs.
The algorithmic process of the TE is summarized below. First, we choose Ñ candidate nodes based on the aggregated ranking of the nodes. Then, we enumerate all possible combinations among the candidate set. Finally, we find the optimal solution that corresponds to the largest disintegration effect Φ. In the example shown in Fig. 1, if the redundancy coefficient is set to α = 0.25, then there are $C_{\tilde{N}}^{n} = C_4^2 = 6$ combinations, among which the combination {2, 8} is the optimal solution.
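The three steps can be sketched as a small generic routine in Python. The function below takes an already aggregated ranking, keeps the top Ñ nodes, enumerates every n-subset with `itertools.combinations`, and returns the subset with the largest score. The `score` callable stands in for the disintegration effect Φ; in this toy example it simply counts the links deleted with the removed nodes, which is a hypothetical placeholder rather than the objective used in the paper. The edge list and ranking are likewise made up for illustration.

```python
from itertools import combinations


def targeted_enumeration(ranking, n, n_candidates, score):
    """Return the best n-subset among the top n_candidates ranked nodes.

    ranking: node labels ordered from most to least important.
    score:   callable mapping a tuple of removed nodes to a number
             (larger means a stronger disintegration effect).
    """
    candidates = ranking[:n_candidates]
    best = max(combinations(candidates, n), key=score)
    return best, score(best)


# Toy network: two hubs (1 and 5) plus one extra link between 2 and 3.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (5, 6), (5, 7), (5, 8)]
ranking = [1, 5, 2, 3, 4, 6, 7, 8]  # assumed consensus ranking


def links_removed(removed):
    """Placeholder objective: number of links touching a removed node."""
    removed = set(removed)
    return sum(1 for u, v in edges if u in removed or v in removed)


best, best_score = targeted_enumeration(ranking, 2, 4, links_removed)
print(best, best_score)  # (1, 5) 6
```

Passing the objective as a callable keeps the enumeration independent of the performance measure, which mirrors the flexibility the framework claims.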
Next, we briefly analyze the time complexity of the TE method. As described above, the time complexity of the TE method includes three parts: calculating the centrality of the nodes, aggregating multiple rankings, and enumerating among the candidate set. In the first part, the time complexity is O(W) for DC, O(NW) for BC [47], and O(N + W) for EC [48]. In the second part, the time complexity of the rank aggregation is O(Ñ²). Considering that n ≪ N and α ≪ 1 in most realistic cases, we can also assume that n = log(N) and α = log(N)/N and then obtain Ñ = n + (N − n)α ≈ 2 log(N). Thus, the time complexity of the second part is O(log²(N)). In the third part, with the assumption that n = log(N) and α = log(N)/N, the number of enumerations can be given as:

$$C_{\tilde{N}}^{n} \approx C_{2\log N}^{\log N} = \frac{(2\log N)!}{\left[(\log N)!\right]^{2}}. \tag{6}$$

The number of enumerations $C_{\tilde{N}}^{n}$ as a function of the network size N, assuming n = log(N) and α = log(N)/N, is shown in Fig. 3(b). We see that the number of enumerations is less than 1000 even for a network as large as N = 10^6, which is acceptable.
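The growth of the enumeration count under these assumptions can be checked numerically. The text does not state the base of the logarithm; the Python sketch below assumes base 10, under which N = 10^6 gives n = 6 and Ñ ≈ 12, consistent with the statement that the count stays below 1000. The function name is hypothetical.

```python
import math


def enumerations_under_log_scaling(N):
    """C(2 log10 N, log10 N), assuming n = log10(N) and N_tilde = 2 n."""
    n = round(math.log10(N))
    n_tilde = 2 * n
    return math.comb(n_tilde, n)


counts = {N: enumerations_under_log_scaling(N)
          for N in (10**2, 10**3, 10**4, 10**5, 10**6)}
for N, count in counts.items():
    print(N, count)
# 100 6
# 1000 20
# 10000 70
# 100000 252
# 1000000 924
```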

A. Experiments in synthetic networks
To demonstrate the applicability of the proposed method, we next evaluate its performance on two kinds of typical synthetic networks: the NW network and the SF network.We use five other methods for comparison: degree centrality, betweenness centrality, eigenvector centrality, collective influence (CI) [27] and tabu search (TS) [49].
Fig. 4(a) and (b) show the disintegration effect Φ as a function of the disintegration strength n for the different disintegration methods, with α set to 0.01. As shown in Fig. 4(a) and (b), the proposed method performs almost as well as the TS method, which achieves a good disintegration effect. Both methods consistently outperform the other methods on all synthetic networks. It is worth pointing out that, even for the heterogeneous SF network with γ = 2.5, in which the vital nodes are apparent and therefore all methods work well, the TE method still maintains a slight advantage over all other methods except the TS method. In addition to improved effectiveness, the TE is also markedly efficient. Fig. 4(c) and (d) show the computation time of the different methods as a function of network size. As shown in Fig. 4(c) and (d), as the network scale increases, the computation time of the TS method grows markedly faster than that of the other methods. In contrast, the proposed method is more efficient.

B. Experiments in real-world networks
Since synthetic networks cannot completely capture the typical properties of real-world networks, we apply the TE method and the aforementioned comparison methods to several realistic scenarios. Table II shows details of the real-world networks used in our study. The data sets are publicly accessible and are retrieved from the KONECT Project (http://konect.cc/), the Network Data Repository (https://networkrepository.com/index.php), and the Colorado Index of Complex Networks (https://icon.colorado.edu). We assume that the real-world networks considered in this paper are simple graphs with undirected, unweighted, and single links. We show the disintegration effect Φ and the running time of the six methods in Fig. 5. Along with the TS method, the proposed method achieves superior performance compared to the other four methods with respect to the disintegration effect. The disintegration effect of these two methods is also more stable across networks. For example, the effect of the disintegration strategy based on EC is second only to the TE and TS methods in the 9-11 Hijackers, Infect-Dublin, and Gnutella networks, but it is not as good in the Autobahn and Facebook networks. However, the TS method achieves good effectiveness at the price of poor efficiency.
In other words, the proposed method incurs a lower cost to obtain a disintegration effect similar to that achieved by the TS method. Although the efficiency of the proposed method is lower than that of centrality-based methods, it remains acceptable, indicating that the proposed method achieves a satisfying balance between effectiveness and efficiency.

VI. CONCLUSION
In summary, we proposed a cost-effective network disintegration method called targeted enumeration (TE). Specifically, the TE method was divided into two stages. In the first stage, we used rank aggregation to transform multiple rankings of nodes into a comprehensive ranking. We then selected the top Ñ nodes based on the aggregated ranking as the candidate set of nodes to remove. The size of the candidate set was controlled by the redundancy coefficient 0 ≤ α ≤ 1. We showed that rank aggregation can help to find a stable and credible candidate set. The second stage was a targeted enumeration, where, instead of enumerating all possible combinations in the general sense, we enumerated within the scope of the candidate set. The optimal solution was the combination of nodes corresponding to the largest disintegration effect Φ. We showed that a small value of the redundancy coefficient α was sufficient for the targeted enumeration, which is crucial for the feasibility of the TE. Numerical experiments on synthetic and real-world networks have shown that the TE significantly outperforms conventional methods and achieves results that are close to those of high-cost intelligent algorithms. In terms of efficiency, the TE was acceptable compared to conventional methods. The critical point of the proposed method was to determine a set of valid candidates. In this study, the introduction of rank aggregation ensured the validity of the candidate set. The aggregated ranking combined multiple node importance criteria and avoided missing potential key nodes from the candidate set. Although it is not the best method in terms of either effectiveness or efficiency alone, the proposed method achieves a satisfying trade-off between the two.
The proposed TE method has a highly flexible framework that does not require domain-specific knowledge. Various node importance criteria, rank aggregation methods, and different levels of the redundancy coefficient α can be used depending on the real situation. As a typical combinatorial optimization problem, selecting n objects among N objects (n ≪ N) is common in many application scenarios, including personnel selection, portfolio investment, and drug design. For these problems, finding an optimal solution in a condensed scope is an intuitive approach. The proposed method provides a general executable framework for implementation.

Fig. 1. Illustration of searching the set of candidates by aggregating the rankings. (a) The sample network, where N = 10, W = 23, n = 2, and Ñ = 4. The numbers represent the labels of the nodes. (b) to (d) Individual node rankings based on degree centrality, betweenness centrality, and eigenvector centrality, respectively. The size of the node is proportional to its ranking. (e) The aggregated ranking of the nodes. The orange nodes make up the set of candidates Ṽ. (f) Comparison of the ranking of nodes with various centrality measures.

Fig. 2. The Venn diagram of candidate sets based on various combinations of node importance criteria in real-world networks. In the figure, D, B, E, C, and S represent the degree centrality, betweenness centrality, eigenvector centrality, closeness centrality, and subgraph centrality, respectively. The size of the candidate set Ñ is 10. (a) The network contains friendships between boys in a small high school in Illinois, where a node represents a boy and a link between two boys shows that they are friends. The numbers represent the labels of the nodes. (b) The metabolic network of Caenorhabditis elegans. In this representation, the nodes are substrates, and the links connecting them are the actual metabolic reactions. (c) In the air traffic control network, the nodes represent airports or service centers, and links are created from the preferred routes recommended by the National Flight Data Center.

Fig. 3. The effectiveness and efficiency of the TE method. (a) The network disintegration effect Φ under different redundancy coefficients α. The results shown are the average of 10 network instances under the same parameters. The NW network has size N = 1000; it starts with a regular network with local connections in the range K = 6, and a new link is added between a randomly selected unconnected pair of nodes with probability p = 0.2. The scale-free network has size N = 1000 and degree exponent γ = 3.0. (b) The number of enumerations as a function of network size N according to Eq. (6) when assuming n = log(N) and α = log(N)/N.

Fig. 4. Performance of TE in synthetic networks. We set the geodesic distance to 2 for the CI method. For the TS algorithm, we set the tabu list length to 5, the number of candidate solutions to 5, and the maximum number of iterations without improving the optimal solution to 2000. The numerical results shown are the averages of 20 different network instances under the same parameters. (a) The disintegration effect of the TE method on NW networks of size N = 1000 with varying neighbor numbers K and connection probability p. (b) The disintegration effect of the TE method on SF networks of size N = 1000 with varying degree exponent γ. (c) The computation time of different methods as a function of network size N on the NW network. All simulation results are obtained on a desktop computer with an Intel Core i7-9700 CPU at 3.00 GHz and 16.0 GB of RAM. (d) The computation time of different methods on the SF network.

Fig. 5. Performance of TE in real-world networks. We evaluated the disintegration effect Φ and the running time of the six methods on nine real-world networks of different types and set the disintegration strength to n = ln N for each network.

TABLE I
THE RANKINGS AND VALUES OF NODES IN THE SAMPLE NETWORK BASED ON DIFFERENT CENTRALITY MEASURES AND RANK AGGREGATION