A method based on k-shell decomposition to identify influential nodes in complex networks

Identifying the most influential nodes in complex networks is an open research issue, which can be divided into two sub-problems: identifying and ranking the influential nodes based on their individual influence, and selecting a group of nodes for maximum propagation in the network. Prior research has typically focused on only one of these sub-problems. In this paper, a new method is proposed that measures the spreading power of influential nodes (the first sub-problem) and selects the best group from them (the second sub-problem). The proposed method partitions the input network into communities and weights the graph edges using the common neighbors and the degrees of the two end vertices in each community. Next, the method measures and ranks the nodes' propagation power in each community and selects a group of influential nodes to initiate the propagation process. The effectiveness of the proposed method is shown through experiments on both synthetic and real networks. The method is compared with previously known methods in terms of ranking accuracy, ability to discriminate between node ranks, and amount of influence spread. The results show that the proposed method outperforms the other methods on all test datasets, indicating its superiority in identifying the most influential nodes in complex networks.


Introduction
The rapid development of network science has drawn considerable attention to complex networks in various fields, including society [1], biology, physics [2], time series [3], transportation [4], and immunization strategy [5]. In our daily life, we encounter many complex networks such as communication networks, social networks, biological networks, and the World Wide Web. Such networks are composed of many nodes with non-obvious characteristics, which are the source of various research problems [6]. One of the most important research issues is identifying the nodes in these networks that have high propagation power. These nodes can play a critical role in dissemination. Diffusion in complex networks arises in various settings, such as the spread of epidemic diseases [7], technical innovations [2], product promotion [8], and behavior acceptance [9]. Studying it can help to better understand the mechanisms hidden in complex phenomena and guide human production and life.
The problem of finding the super-spreader nodes is divided into two parts: the first part concerns identifying influential nodes and ranking them based on the spread power of each node, while the second part aims to select a group of influential nodes that achieves maximum group propagation [1].
Many indicators have been presented to identify the most influential nodes, including degree, closeness, betweenness, k-shell, eigenvector centrality, and PageRank. In all these methods, the propagation power is calculated according to the network structure and the position of each node in it. Then, the influential nodes are selected from among the nodes at the top of the ranked list. In order to identify a group of influential nodes with the maximum spread in the network, it is usually necessary to find a seed set of a specific size such that activating the nodes in the set maximizes the final influence [10]. Finding a seed set of a given size is an NP-hard problem. A simple way to determine this set is to use the identified influential nodes: first, all the nodes are ranked by their propagation power, and then the required number of top-ranked nodes is selected as the target set. However, Kitsak et al. showed in [11] that selecting a group of nodes at the top of the ranked list is not efficient due to the considerable overlap of the nodes in this list. Therefore, there is a need for another method with sufficient speed and proper efficiency.
For this purpose, it is possible to use the characteristics of communities in complex networks. Within a community, nodes have many connections with each other and few connections with nodes in other communities. This feature is widely observed in real-world networks [12]. Communities greatly influence propagation in networks, which motivates selecting nodes from different communities as the starting points of propagation [3]. A further point is that the nodes and edges within communities differ significantly in the diversity and strength of their communication. By considering these differences, a better method can be provided to calculate the spread power of nodes.
Therefore, this paper proposes a method to identify the most influential nodes and select an optimal subset of them to maximize propagation in the network.

The proposed method utilizes the community structure feature to separate the network into distinct communities. It measures the spreading power of nodes and ranks them within each community, resulting in a diverse subset of nodes as the initial seed set. The innovations made in the article are as follows:
• Presenting a semi-local method for distinguishing nodes with similar local characteristics but different neighbors in terms of propagation power.
• Using the concept of the communication diversity of each node in measuring its spread power: in the proposed method, if a node has less communication diversity, it gets a lower score.
• Differentiating the edges of the graph by weighing the communication edges between the nodes in each community.
The structure of this article is as follows. In Sect. 2, an overview of related works is presented. The proposed method and its parts are introduced in detail in Sect. 3. The datasets, evaluation parameters, and the obtained results are reviewed in Sect. 4, and finally, in Sect. 5, the conclusion and future works are discussed.

Related work
In the presented approach, the spread power of each node is first measured to identify influential nodes, and then an optimal subset is selected. The related work is therefore divided into two parts: Sect. 2.1 examines studies on measuring the spread power of each node and identifying influential nodes, and Sect. 2.2 introduces studies that select the optimal subset of nodes to maximize influence.

Measuring the spread power of network nodes
The calculation of node spreading power and identification of influential nodes in a network have been the focus of numerous research studies. To this end, various indicators have been proposed, some of which will be discussed below.
Degree centrality (DC) determines a node's importance from its degree. The degree centrality of node i is determined using Eq. 1, where $k_i$ is the degree of node i. A node with a high degree is assumed to have a high influence [13].
Betweenness centrality (BC) [14] measures the importance of a node by the number of shortest paths that pass through it, and it is obtained by Eq. 2, where $N_{jk}$ is the number of shortest paths from node j to node k and $N_{jk}(i)$ is the number of those paths that pass through node i. The greater the number of shortest paths that pass through node i, the more influential the node is.
Closeness centrality (CC) [13] calculates the influence of a node as the inverse of the sum of its shortest-path distances to the other nodes, as shown in Eq. 3, where $d_{ij}$ represents the distance between node i and node j. The higher CC(i), the more critical node i is.
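As an illustration of the degree and closeness measures described above, the following is a minimal sketch in plain Python. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets, uses the unnormalized forms (the degree itself, and the inverse of the summed distances), and sums distances only over reachable nodes; the function names and these conventions are assumptions made for the example, not the exact definitions of Eqs. 1 and 3.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Unnormalized degree centrality: the number of neighbors."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Inverse of the sum of shortest-path distances (assumed unnormalized form)."""
    cc = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        cc[v] = 1.0 / total if total > 0 else 0.0
    return cc


# Toy path graph a - b - c - d (hypothetical example data).
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
dc = degree_centrality(adj)
cc = closeness_centrality(adj)
```

On this path graph the two inner nodes obtain both the larger degree and the larger closeness value, which matches the intuition that they are more central than the end nodes.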
Eigenvector centrality (EC) [15] uses the importance of neighbors in addition to the number of neighbors to calculate the influence of a node. EC(i) can be calculated with Eq. 4.
The largest eigenvalue of the adjacency matrix A is denoted by λ, and $x_j$ is the j-th entry of the eigenvector corresponding to λ.
PageRank (PC) [16] uses an iterative approach to obtain the influence of nodes. PC(i) of node i is calculated by Eq. 5, where the degree of influence of node i in step q is denoted by $PC_q(i)$. The higher the PC score, the more influential the node is.
The k-shell decomposition method was proposed by Kitsak et al. [17] to show the importance of nodes in the network. In this method, all network nodes whose degree is one are removed first. Because each removal lowers the degree of the neighboring nodes, the removal is repeated until no node with a degree less than or equal to 1 remains in the network. All the nodes removed at this stage are placed in shell 1. Then, the method continues in the same way to determine shell 2, shell 3, and so on. It should be noted that a node with a higher k-shell value is located in a more central position in the network [17]. In the k-shell method, it is assumed that the nodes located in a higher shell have a higher propagation power. Also, in this method, all the nodes in the same shell are given the same rank.
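The pruning procedure described above can be sketched in a few lines of plain Python. The sketch assumes an undirected graph given as a dictionary of neighbor sets; the function name `k_shell` and the convention of assigning isolated nodes to shell 0 are choices made for this example.

```python
def k_shell(adj):
    """Return {node: shell index} by repeatedly pruning low-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k belong to shell k.
        stack = [v for v in remaining if degree[v] <= k]
        if not stack:
            k += 1  # nothing left to prune at this level, move to the next shell
            continue
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned via another neighbor
            remaining.discard(v)
            shell[v] = k
            # Removing v lowers the remaining degree of its neighbors,
            # which may push them into the current shell as well.
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        stack.append(w)
    return shell


# Toy graph: a triangle (a, b, c) with a tail a - d - e (hypothetical data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
shells = k_shell(adj)
```

In this toy graph, node d has degree 2 but still ends up in shell 1, because removing the leaf e reduces its remaining degree to 1. This cascading effect is what distinguishes the shell index from the plain degree.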
When Zeng and Zhang [18] used the k-shell method to measure the propagation power of nodes, they observed that it considers only the remaining degree of each node. To solve this problem, they proposed a method called mixed degree decomposition (MDD), in which the contributions of the remaining degree and the removed degree of each node are considered simultaneously to calculate the strength of that node. If $k_r$ and $k_e$ are the remaining degree and the removed degree of node $v_i$, respectively, the MDD of node $v_i$ is calculated as in Eq. 6.
Bae and Kim [19] used a balanced combination of the degree and coreness of neighbors to solve the problem of assigning the same rank to a large number of nodes. Based on this, the neighborhood coreness of node $v_i$, denoted by CNC, can be calculated with Eq. 7, where Γ($v_i$) is the set of neighbors of node $v_i$ and $k_s(v_j)$ is the k-shell value of its neighbor node $v_j$.
Next, the extended neighborhood coreness CNC+ of node $v_i$ is recursively calculated according to Eq. 8.
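The two neighborhood coreness measures can be sketched as follows, assuming that CNC sums the k-shell values of a node's neighbors and that CNC+ sums the CNC values of its neighbors, as the description above suggests. The k-shell values are supplied by hand here; the function name and the toy data are hypothetical.

```python
def neighborhood_coreness(adj, ks):
    """Return (CNC, CNC+) dictionaries for all nodes.

    adj: {node: set of neighbors}; ks: {node: k-shell value}.
    """
    # CNC(v): sum of the k-shell values of the neighbors of v.
    cnc = {v: sum(ks[u] for u in adj[v]) for v in adj}
    # CNC+(v): sum of the CNC values of the neighbors of v.
    cnc_plus = {v: sum(cnc[u] for u in adj[v]) for v in adj}
    return cnc, cnc_plus


# Triangle (a, b, c) with a tail a - d - e; k-shell values given by hand.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
ks = {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1}
cnc, cnc_plus = neighborhood_coreness(adj, ks)
```

Although a, b, and c share the same shell, node a receives a higher CNC and CNC+ than b and c, which illustrates how these measures break ties that the k-shell index alone cannot.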
In the CLD (clustering coefficient and local degree) measure [20], the effect of topological connections between neighbors on the node's spread power is considered in addition to the number of neighbors. The more connections between the nearest neighbors of a node, the greater the influence of this node. Therefore, by combining the sum of the degrees of the nearest neighbors of a node with its clustering coefficient, the CLD centrality is presented as Eq. 9, where $k(v_j)$ is the degree of node $v_j$, Γ($v_i$) is the set of the nearest neighbors of node $v_i$, and $C(v_i)$ is the clustering coefficient of node $v_i$.
Ma et al. [21] used the gravity law to calculate the influence of one node on other nodes in spreading activity. They used the k-shell value of a node as its mass and the shortest-path distance between two nodes in the network as the distance in Newton's gravity formula. Equation 10 shows how to calculate the gravity for each node $v_i$, where $s(v_i, v_j)$ is the shortest distance between nodes $v_i$ and $v_j$, and Ψ($v_i$) is the set of nodes that are within r levels of node $v_i$. The authors of that article set r to 3; therefore, in this case, Ψ($v_i$) contains the nodes up to three levels away from $v_i$.
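A minimal sketch of this gravity measure is given below. It assumes the usual gravity form, in which each node within r hops contributes the product of the two k-shell values divided by the squared shortest-path distance; the k-shell values are passed in as a dictionary, and the helper names are hypothetical.

```python
from collections import deque


def neighbors_within(adj, source, r):
    """Return {node: hop distance} for all nodes within r hops of `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    del dist[source]
    return dist


def gravity_centrality(adj, ks, r=3):
    """Sum of ks(v) * ks(u) / d(v, u)^2 over nodes u within r hops of v."""
    return {
        v: sum(ks[v] * ks[u] / d ** 2 for u, d in neighbors_within(adj, v, r).items())
        for v in adj
    }


# Path graph a - b - c - d - e with all k-shell values equal to 1 (toy data).
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
ks = {v: 1 for v in adj}
gravity = gravity_centrality(adj, ks, r=3)
```

With r = 3, node e lies outside the neighborhood of node a and therefore does not contribute to its score, which shows how the radius bounds the cost of the computation.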
Namtirtha et al. [22] proposed the gradient neighborhood method and the weighted k-shell degree neighborhood method (ksd_w), which assigns weights to the edges using the degree and k-shell index of the two endpoint nodes. In the ksd_w method, the weights of all the edges connected to a node are added together, and the sum is used to measure the spread power of that node. Equation 11 is used to calculate the weight of each edge, and Eq. 12 is used to obtain the ksd_w of node $v_i$, where Γ($v_i$) is the set of nodes in the neighborhood of node $v_i$, and $c_1$ and $c_2$ are two adjustable parameters. The authors used a set of $c_1$ and $c_2$ values in [0, 1].
In reference [23], the authors developed a novel ranking approach called SHKS, which builds upon the strengths of the k-shell decomposition method and incorporates the concept of structural holes (SH). Unlike previous attempts that focused solely on enhancing the k-shell decomposition method, this research considers both the k-shell-based and SH-based centrality of a node, along with those of its immediate and second-order neighbors. By flexibly combining k-shell and SH, the SHKS algorithm can identify not only the highly connected core nodes with large k-shell indices but also the less-connected nodes that act as crucial bridges between different parts of the network. This approach thus has the potential to provide more comprehensive and accurate insights into network structures and dynamics.
Based on Newton's gravity formula, Li et al. [24] proposed another innovative method to measure the spread power of network nodes. In this method, the degree of each node is used in place of the mass of the objects. The authors proposed the local gravity model to reduce the computational complexity without reducing the accuracy of the final result. Only the effect of nodes up to r levels away is considered in this method. In the experiments, different values of r were tested. When r was close to half the diameter of the network (r ≈ D/2), the diffusion estimate was almost equivalent to that of the general method.
The multi-characteristics gravity (MCG) model presented in [25] is an advanced model that considers multiple characteristics of nodes and employs the gravity law to account for factors such as the quantity of neighboring nodes, their influence, the positioning of nodes, and the pathway information between them. This model can be relevant to the proposed method as it shares the common feature of considering multiple factors to identify influential nodes in complex networks.
Similarly, the innovative gravity model presented in [26] uses effective distance to identify influential nodes through information fusion and multi-level processing.
This model integrates both global and local information of complex networks. Additionally, it can uncover the concealed topological structure of real-world networks to provide more precise outcomes. While this method also considers the importance of nodes in the network, it differs from our proposed method in its use of effective distance and information fusion techniques.
The refined generalized mechanics model presented in [27] employs information entropy to assess the significance of each neighboring node on local information, while the shortest distance is used to determine the interaction between each node on global information. This approach can also be relevant to the proposed method as it considers both local and global information to identify influential nodes.
The approach presented in [28] proposes an extended degree measure and E-shell hierarchy decomposition method to determine the nodes' location within the network's hierarchical structure. By integrating these two components, a hybrid measure of characteristic centrality is suggested to assess the nodes' significance. This approach can also be relevant to the proposed method as it considers the network's hierarchical structure to identify influential nodes.
The multi-attribute decision-making strategy proposed in [29] utilizes various local and semi-local attributes to determine the node's spread power and classification. While this technique also considers node attributes and employs local and semi-local attributes, it differs from our proposed method in terms of the specific attributes used and the method of assessment.
The enhanced method for determining cluster rank presented in [30] takes into account both the shared hierarchy of nodes and their surrounding neighborhood to calculate a node's influence score. The primary benefit of this algorithm is that it examines the intricate correlation structure among neighborhoods and utilizes this information to uncover the most influential nodes. This approach can also be relevant to the proposed method as it considers both the network's structure and surrounding neighborhood to identify influential nodes, while it focuses on the connection structure similarity between the node and its neighbors.
Lastly, the DEMATEL method presented in [31] is a graph theory-based approach that allows for the efficient identification of a node's importance in the network by incorporating comprehensive information from the entire complex system. This approach can also be relevant to the proposed method as it aims to identify influential nodes by using comprehensive information from the entire complex network.

Select the optimal group of nodes to maximize influence
In [32], a method is proposed to select the initial seed set from super-spreaders by reducing the overlap that exists among members with high k-shell. This method is based on the k-shell decomposition method, which arranges the nodes of each community by the number of shells from large to small. The proposed CKS+ method measures and ranks the nodes' spread power in each community and selects a group of influential nodes with the highest rank to initiate the propagation process. This method is similar to our proposed method in terms of separating the input graph into communities, using the k-shell decomposition method, and selecting the initial seed set from influential nodes.
The community finding influential node (CFIN) algorithm introduced in [33] focuses on identifying k users from a network's community structure to maximize influence spread. This algorithm includes two key parts: seed selection and local community spreading. In the first component, seed nodes are chosen from relevant communities detected by a community detection algorithm. The second component focuses on independent influence spread within each community, utilizing a straightforward path with the final seed nodes. CFIN is relevant to our proposed method because it also considers community structure for seed selection.
In [34], the authors introduce an algorithm that improves time efficiency by using optimal pruning and minimizing dominating nodes. This algorithm also modulates node scores with high Rich-club coefficients to select seed nodes. The first step is to select an optimal set of nodes using minimum dominating nodes and node scores to reduce computational overhead. Afterward, scoring adjustment is performed to choose seed nodes that evade the Rich-club phenomenon and promote diffusion in large-scale social networks. This algorithm is relevant to our proposed method in terms of time efficiency and node selection based on node scores.
Finally, [35] proposes a method to increase influence in a social network by identifying influential nodes through community structure and influence distribution. The technique involves two stages: candidate and greedy. In the candidate stage, a heuristic algorithm selects potential nodes from the interior and boundary of each community, while in the greedy stage, the sub-modular property-based greedy algorithm is used to determine the seed nodes with the greatest incremental influence from the candidate set. This method is relevant to our proposed method because it also considers community structure and uses a greedy algorithm for seed selection.

Proposed method
Consider a network represented by a graph G = ⟨V, E⟩, where V denotes the set of nodes that correspond to individual users, and E ⊆ V × V is the set of edges that represent the relationships between the users. An edge e connecting two nodes v and u can be represented as e = {v, u}, and the nodes v and u are referred to as neighbors. To denote the set of neighbors of node v, we use Γ(v) ⊂ V. The degree of node v, which indicates the number of its neighbors, is given by k(v).

Definition 1 (Community structure)
One of the key characteristics of complex networks is their community structure, which refers to the way in which network nodes interact with one another in groups. A group of nodes in a graph can be classified as a community if the number of communication edges between them is substantially higher than the number of edges they share with nodes not belonging to the group. Let us denote a single community by com and the set of communities by communities.

Definition 2 (Common neighbors)
The number of common neighbors of an edge is the count of nodes that are directly connected to both of its endpoint nodes. To compute the number of common neighbors of the two endpoints of an edge, Eq. 14 is used, where Γ(v) and Γ(w) denote the sets of neighbors of nodes v and w.
… is used. Given that friends in higher cores tend to have a greater spreading power, the shell number can be incorporated as a coefficient in Eq. 16. By doing so, we obtain Eq. 17, which allows us to calculate the distribution of friends for each node across different cores.

Definition 5
The community diversity of node v is defined by Eq. 18, which quantifies the number of distinct communities to which the neighboring nodes of node v belong.
where n represents the number of communities and $p_i$ is the probability of the presence of friends of a node in community i. This probability is calculated as …

Definition 6 (min-max normalization)
In this method, the minimum value of each indicator is subtracted from its current value, and the result is divided by the difference between the maximum and minimum values, so that every normalized indicator falls in [0, 1]. Equation 19 illustrates the calculation of the min-max normalization:

$$\text{new\_value} = \frac{\text{current\_value} - \min}{\max - \min} \tag{19}$$

The general framework of the proposed method is illustrated in Fig. 1. The proposed method begins with the entry of the complex (social) network as a list of edges. Next, the network is partitioned into different communities, and the edges of each community are assigned weights. Then, within each community, the spread power of the nodes is measured and ranked. Finally, a group of influential nodes is selected to initiate the propagation process. The following sections will examine each part of the proposed method in detail.

Fig. 1 Steps of the proposed method

Partitioning the network graph into different communities
By partitioning the input network into communities, it becomes possible to select spreading nodes from various parts of the network. This approach is more effective than selecting sources from a particular region of the network, which can lead to a limited spread. Instead, by initiating spreading from different communities, it is possible to reach a larger number of nodes across the network. In this paper, the method introduced in [36] is used to partition the input graph into communities. This method is proposed for the rapid extraction of community structure in large networks and is in the category of bottom-up methods. For instance, if the algorithm is applied to a small network created by the authors, the outcome can be observed in Fig. 2.

Weighing the edges of each community
Upon identifying the communities, we weight the edges within each community to differentiate them. In most networks, the amount of real communication between network nodes is not readily available. Therefore, we use the number of common neighbors and the degrees of the nodes at both ends of an edge to calculate its weight. The choice of these criteria is motivated by the fact that a higher number of common neighbors between two nodes implies a stronger connection between them. Moreover, an edge that connects two nodes with a large number of friends is more critical in terms of communication. Equation 20 is employed to assign weights to the communication edges connecting nodes within each community.
Fig. 2 Extracted communities in sample network
In Eq. 20, a value of one is added to the number of common neighbors in order to ensure that the weight of the edge is not zero if the two endpoints of the edge do not have any mutual friends. Pseudocode 1 provides a detailed description of the process for calculating the edge weights within each community.
The algorithm takes as input a graph G = (V, E) and the communities of nodes in the graph (represented as a dictionary where the keys are the community names and the values are the sets of nodes in the community). The output of the algorithm is a list of weights for each edge in the graph. In summary, the algorithm calculates the weight of edges based on the number of common neighbors between nodes and their degrees.
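The following sketch illustrates the kind of computation that the edge-weighting step performs. Since Eq. 20 is not reproduced in this text, the combination used here, (common neighbors + 1) multiplied by the sum of the two endpoint degrees, is only an assumption and can be replaced through the hypothetical `combine` parameter. The sketch also assumes that only edges whose endpoints lie in the same community are weighted and that degrees and common neighbors are taken from the whole graph.

```python
def edge_weights(adj, communities,
                 combine=lambda cn, kv, ku: (cn + 1) * (kv + ku)):
    """Weight every intra-community edge.

    adj: {node: set of neighbors}; communities: {name: set of nodes}.
    combine: assumed stand-in for Eq. 20 (common neighbors, degree, degree).
    Returns {frozenset({v, u}): weight}.
    """
    weights = {}
    for members in communities.values():
        for v in members:
            for u in adj[v]:
                edge = frozenset((v, u))
                if u not in members or edge in weights:
                    continue  # skip inter-community edges and duplicates
                common = len(adj[v] & adj[u])  # number of common neighbors
                weights[edge] = combine(common, len(adj[v]), len(adj[u]))
    return weights


# Triangle (a, b, c) in one community, tail (d, e) in another (toy data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
communities = {"c1": {"a", "b", "c"}, "c2": {"d", "e"}}
weights = edge_weights(adj, communities)
```

Because one is added to the number of common neighbors, the edge d-e still receives a non-zero weight even though its endpoints share no neighbor.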

Calculate the spread power of the nodes of each community
After the edges within each community have been weighted, the algorithm calculates the spread power of each node within the community. The spread power is a metric that measures the potential influence of a node on the other nodes within the community. The ranking of nodes is then based on their corresponding spread power values, with higher power values indicating nodes that are more influential within the community.
After normalizing the three indicators for each community node using the min-max method, their diffusion power can be evaluated. Equation 21 can be utilized to calculate the spread power of each community node, where the adjustable parameters α, β, and γ fall within the range [0, 1]. As mentioned in Definition 6, the normalized indicators are scaled between 0 and 1 to ensure a fair comparison. Finally, to increase the monotonicity of the ranking, the expanded spread power of each node v is calculated by Eq. 22, where Γ(v) represents the set of neighbors of node v.
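The steps above can be sketched as follows. Because Eqs. 18, 21, and 22 are not reproduced in this text, the sketch makes three assumptions: community diversity is taken to be the Shannon entropy of the community labels of a node's neighbors, the spread power is a weighted sum of the three min-max-normalized indicators with coefficients α, β, and γ, and the expanded spread power adds the spread power of the neighbors to that of the node. The core-diversity indicator is passed in as precomputed values, and all function names are hypothetical.

```python
import math


def community_diversity(adj, membership, v):
    """Assumed form of Eq. 18: entropy of the communities of v's neighbors."""
    counts = {}
    for u in adj[v]:
        counts[membership[u]] = counts.get(membership[u], 0) + 1
    k = len(adj[v])
    if k == 0:
        return 0.0
    h = -sum((c / k) * math.log(c / k) for c in counts.values())
    return h if h > 0 else 0.0


def min_max(values):
    """Min-max normalization of a {node: value} dictionary to [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {v: 0.0 for v in values}
    return {v: (x - lo) / (hi - lo) for v, x in values.items()}


def spread_power(weight_sum, core_div, comm_div, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed form of Eq. 21: weighted sum of the normalized indicators."""
    w, c, d = min_max(weight_sum), min_max(core_div), min_max(comm_div)
    return {v: alpha * w[v] + beta * c[v] + gamma * d[v] for v in weight_sum}


def expanded_spread_power(adj, sp):
    """Assumed form of Eq. 22: own spread power plus that of the neighbors."""
    return {v: sp[v] + sum(sp[u] for u in adj[v] if u in sp) for v in sp}


# Star graph: hub h with leaves x and y in different communities (toy data).
adj = {"h": {"x", "y"}, "x": {"h"}, "y": {"h"}}
membership = {"h": "A", "x": "A", "y": "B"}
weight_sum = {"h": 4.0, "x": 2.0, "y": 0.0}   # sum of incident edge weights
core_div = {"h": 1.0, "x": 0.0, "y": 0.0}     # precomputed core diversity
comm_div = {v: community_diversity(adj, membership, v) for v in adj}
sp = spread_power(weight_sum, core_div, comm_div)
esp = expanded_spread_power(adj, sp)
```

In this toy example, the hub obtains the highest spread power because its neighbors lie in two different communities, whereas each leaf sees only one community and has zero diversity.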

Choosing the best group of powerful nodes
Once the diffusion power of nodes across network communities has been determined, Pseudocode 2 is employed to identify the optimal set of influential nodes.
In Pseudocode 2, the communities with a population smaller than the input threshold value, θ, are removed in lines 2 to 4. Then, the number of candidates that each community contributes to the seed set is determined in lines 8 to 10 based on the ratio of the community population to the total population (excluding the removed communities). Communities with larger populations can therefore contribute more candidates to the seed set. Once the number of candidate members for each community is determined, the nodes with the highest values of expanded spread power, calculated in the previous section, are selected from each community and added to the initial seed set.
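The selection procedure described above can be sketched as follows. This is not Pseudocode 2 itself: the function name, the use of `round` for each community's quota, and the toy data are assumptions, and with this rounding the number of returned seeds may differ slightly from the requested size k.

```python
def select_seeds(communities, esp, k, theta):
    """Pick seeds from each sufficiently large community.

    communities: {name: set of nodes}; esp: {node: expanded spread power};
    k: desired seed-set size; theta: minimum community size.
    """
    # Drop communities that are smaller than the threshold.
    kept = {name: m for name, m in communities.items() if len(m) >= theta}
    total = sum(len(m) for m in kept.values())
    seeds = []
    for members in kept.values():
        # Quota proportional to the community's share of the population
        # (rounding rule assumed for this sketch).
        quota = round(k * len(members) / total)
        ranked = sorted(members, key=lambda v: esp[v], reverse=True)
        seeds.extend(ranked[:quota])
    return seeds


# Toy data: three communities of sizes 6, 3, and 1.
communities = {
    "A": {"a1", "a2", "a3", "a4", "a5", "a6"},
    "B": {"b1", "b2", "b3"},
    "C": {"c1"},
}
esp = {"a1": 6, "a2": 5, "a3": 4, "a4": 3, "a5": 2, "a6": 1,
       "b1": 9, "b2": 8, "b3": 7, "c1": 100}
seeds = select_seeds(communities, esp, k=3, theta=2)
```

Although c1 has the largest score, it is never selected because its community falls below the threshold, and b2 is passed over in favor of a2 even though its score is higher, so the seeds are spread over both remaining communities.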

Time complexity of the proposed method
The proposed method's time complexity involves the following calculations:
(1) The community detection algorithm [36] employed in the proposed method has a time complexity of O(n log n) when applied to a graph with n nodes.
(2) Pseudocode 1 is used to weight the edges of each community. Assume that the number of communities is |communities| = c, that each community has n/c nodes on average (where n is the number of nodes in the graph), and that each node has an average degree of ⟨k⟩, which means it has ⟨k⟩ neighbors. Therefore, the time complexity of Pseudocode 1 is O(c × (n/c) × ⟨k⟩).
(3) Calculating the spread power of each community's nodes: in this step, the sum of the weights of the neighboring edges of each node is calculated with a complexity of O(⟨k⟩). Calculating the diversity of a node's friends in different cores first requires the k-shell algorithm, which has a complexity of O(|E|). Then, the diversity of a node's friends in different cores and communities is calculated for each node with a complexity of O(⟨k⟩ + ⟨k⟩). Finally, the expanded propagation power of the nodes can be calculated with a complexity of O(c × (n/c) × ⟨k⟩).
Finally, to select a suitable subset of influential nodes, Pseudocode 2 is used, which consists of two parts. The first part (lines 2-6) removes any communities with a size smaller than the threshold value θ and calculates the total population size of the remaining communities. The time complexity of this part is O(c). The second part (lines 8-17) selects a seed set by iterating over each community and selecting nodes based on their expanded spread power. The outer loop is repeated once per community, and in each iteration, the inner loop is executed at most as many times as there are members in the community; so, it has a complexity of O(c × (n/c)). As a result, the time complexity of the proposed method can be expressed as O(n log n).

Evaluation
In order to assess the effectiveness of the proposed approach, hereafter referred to as HKCD (hybrid k-shell-based methods using community detection), it was compared with various methods: degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), k-shell (KS) [17], extended neighborhood coreness (CNC+) [19], clustering coefficient and local degree (CLD) [20], mixed core, semi-local degree and weighted entropy (MCDE) [37], local gravity (LG) [24], and multi-characteristics gravity (MCG) [25]. All methods were implemented using the Python programming language and executed on a computer equipped with a Core i7 2.6 GHz CPU and 32 GB RAM.
These methods were applied to eight real-world datasets, including Karate Club, Dolphins, Copperfield, JazzMusician, NetScience, Hamsterster, PowerGrid, and PGP. Additionally, the proposed approach was tested on a sample network shown in Fig. 2, as well as on two synthetic datasets generated using the Lancichinetti-Fortunato-Radicchi (LFR) model [38], which is a standard for creating networks with community structure. The LFR model has several parameters such as the number of nodes, the average node degree, the mixing parameter of the community structure, and the exponent of the power-law degree distribution. For the LFR-200 dataset, the parameters used were γ = 2, ⟨d⟩ = 5, |V| = 200, and µ = 0.2, while for the LFR-1000 dataset, the parameters were γ = 2, ⟨d⟩ = 10, |V| = 1000, and µ = 0.2. Further details regarding the datasets used can be found in Table 1, which includes information such as the number of nodes (|V|), number of edges (|E|), average degree (⟨k⟩), and maximum degree (Max(k)).

Ability to assign separate ranks to different nodes
The initial experiment focuses on evaluating the ability of different methods to discriminate between nodes. To accomplish this, the study employs the monotonicity function [19], which is calculated using Eq. 23. The function takes into account the number of nodes assigned to each rank r in the ranking list R, denoted by $|V|_r$, as well as the total number of nodes, |V|. The resulting value falls between 0 and 1, with higher values indicating greater discrimination ability in the ranking list.
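As a concrete illustration, the sketch below computes a monotonicity value from a dictionary of node scores. Since Eq. 23 is not reproduced in this text, the sketch assumes the commonly used form M(R) = (1 − Σ_r |V|_r(|V|_r − 1) / (|V|(|V| − 1)))², which matches the quantities named above; the function name is hypothetical.

```python
from collections import Counter


def monotonicity(scores):
    """Monotonicity of a ranking given as {node: score} (assumed form of Eq. 23).

    Nodes with equal scores share a rank; the result is 1.0 when every node
    has its own rank and 0.0 when all nodes share a single rank.
    """
    n = len(scores)
    if n < 2:
        return 1.0
    rank_sizes = Counter(scores.values())  # |V|_r for every rank r
    tied = sum(size * (size - 1) for size in rank_sizes.values())
    return (1 - tied / (n * (n - 1))) ** 2


m_distinct = monotonicity({"a": 1, "b": 2, "c": 3})
m_same = monotonicity({"a": 1, "b": 1, "c": 1})
m_mixed = monotonicity({"a": 1, "b": 1, "c": 2, "d": 3})
```

A method such as k-shell, which places many nodes in the same shell, produces large rank sizes and therefore a low monotonicity value under this measure.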
The ranking of the nodes in Fig. 2 is presented in Table 2. Because of its size, the table is limited to the first 19 ranks and a few methods. As Table 2 shows, although the sample network has 50 nodes, the KS method places all nodes in only the first, second, and third ranks because it assumes that the nodes in a shell are equally powerful, whereas the HKCD method assigns an almost separate rank to each node by differentiating the nodes in terms of their spread power. So, the k-shell method performs the poorest, and the proposed method, HKCD, performs better than all other methods and has a higher monotonicity. Figure 3 depicts the distribution of nodes across different ranks. It is worth noting that the more ranks a method uses and the fewer nodes share each rank, the more desirable the method is in terms of the monotonicity index. The results presented in Fig. 3 indicate that the proposed method (HKCD) outperforms other methods, particularly in higher ranks. Notably, the closeness and CNC+ methods also exhibit good performance after the proposed method. It is quite clear in this figure that other methods have failed to rank the weak nodes. Only the proposed method and, to some extent, the CC method have been able to place the weak nodes correctly at the end of the list from rank 30 onward. Table 3 shows the discriminating power of the compared methods on the other datasets. The data in the table show the superiority of the proposed method compared to other methods. In the PGP dataset, the CNC+ method has results similar to those of the proposed method; in the other datasets, however, the proposed method has shown better results in assigning distinct ranks to nodes.
The superiority of the proposed method is in selecting nodes from different parts of the network and paying attention to local and general measures in the graph, which makes the results of the proposed method more significant than other methods. This superiority can be due to the use of different indicators, such as the diversity of the presence of friends of the node in different cores and communities, as well as the use of the total weight of the edges connected to the node because it is rare to find nodes in the graph for which all these indices are the same. In this way, the combination of these criteria can result in different spread powers for different nodes.

Ranking accuracy
In the second experiment, the accuracy of the various methods in ranking nodes by their spread power is examined. First, the spread power of each node is calculated using a diffusion model, and the actual rank of the node is determined from this spread power, which yields a ranked list of nodes (List 1). Then the nodes are ranked using each method under evaluation (List 2), and the correlation between List 1 and List 2 is calculated. The higher the correlation between the two lists, the more accurately the method ranks the nodes. Various models have been proposed to simulate the spread process in a network, including the linear threshold model (LT), the independent cascade model (IC), and the susceptible-infected-recovered model (SIR) [39]. In the LT model, a threshold must be assigned to each node; if the number of active neighbors of the node exceeds this threshold, the node becomes active and participates in the propagation process. Since determining this threshold is difficult or even impossible in practice, it is assigned randomly in many studies.
In the IC diffusion model, each node is either active or inactive. To calculate the spread power of a node with the IC model, that node is set to the active state and all other nodes are set to the inactive state. In each time step, every active node v is given one chance to activate each of its inactive neighbors, succeeding with probability p. After this attempt, node v becomes inactive and cannot try to activate nodes in subsequent steps. The process is repeated until no active node is left. Finally, the number of activated nodes is counted, which gives the spread power of the initially active node. Because the IC model is probabilistic, it must be executed many times and the results averaged.
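The IC procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the adjacency-dictionary input `adj`, the function names `simulate_ic` and `spread_power`, and the default activation probability `p=0.1` are assumptions made for the example, and the seed itself is counted among the activated nodes.

```python
import random


def simulate_ic(adj, seeds, p, rng):
    """Run one independent cascade and return the number of activated nodes.

    adj   : dict mapping each node to an iterable of its neighbors
    seeds : initially active nodes
    p     : probability that an active node activates an inactive neighbor
    rng   : random.Random instance used for the activation trials
    """
    activated = set(seeds)
    frontier = list(activated)  # nodes that still get their one chance
    while frontier:  # stop when no active node is left
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in activated and rng.random() < p:
                    activated.add(v)
                    next_frontier.append(v)
        # Nodes in the old frontier have used their chance and stay inactive.
        frontier = next_frontier
    return len(activated)


def spread_power(adj, seeds, p=0.1, runs=1000, seed=0):
    """Average number of activated nodes over repeated cascades."""
    rng = random.Random(seed)
    return sum(simulate_ic(adj, seeds, p, rng) for _ in range(runs)) / runs
```

Calling `spread_power(adj, [v])` for every node v and sorting the results gives a simulation-based ranking; passing several nodes in `seeds` estimates the spread of a whole seed set in the same way.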
The IC model is a special case of the SIR model, in which nodes are divided into three groups: susceptible, infected, and recovered. The infected group consists of active nodes that transmit the disease (message) to their neighbors in the network with probability β. Once a node has participated in spreading, it recovers with probability γ, meaning that it is removed from the list of active nodes and can no longer infect its neighbors; otherwise, it can try again to activate its neighbors in the next round. Because of its simplicity, widespread acceptance, and ability to simulate the spread process accurately, the IC model is used in this paper to create the reference ranking list of nodes. To increase the accuracy of the estimated spread of each node, the model was run 1000 times, and the average spread power obtained for each node is used.
To calculate the correlation between the ranking list generated by the IC model (List 1) and the ranking list generated by any other method (List 2), Kendall's tau is used. Kendall's tau is a statistical measure of the similarity between two rankings of the same set of items. It is based on the number of disagreements between the two rankings, that is, the number of pairs of items whose relative order differs between them. The Kendall's tau coefficient ranges between −1 and 1: a value of 1 indicates that the rankings are identical, −1 that they are completely opposite, and 0 that there is no association between them. Table 4 demonstrates that the proposed method ranks nodes by their centrality level more accurately than the other methods across all datasets; only on the Karate Club dataset does the MCG method perform as well as HKCD. Because HKCD incorporates the diversity of a node's friends across different clusters and communities and considers the weight of the edges connected to each node when computing its centrality, it maintains a high-quality ranking of nodes in terms of structural features despite variations across datasets.
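The comparison of the two lists can be sketched with a direct pairwise computation of Kendall's tau. The section does not state which variant of the coefficient is used, so this example assumes the simple tau-a form, (concordant − discordant) / (n(n − 1)/2), in which tied pairs count as neither. The function name `kendall_tau` and the dictionary inputs are illustrative.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts defined over the same nodes."""
    nodes = list(scores_a)
    if set(nodes) != set(scores_b):
        raise ValueError("both rankings must cover the same nodes")
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes")
    concordant = discordant = 0
    for u, v in combinations(nodes, 2):
        # Positive product: both lists order the pair the same way.
        product = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Here `scores_a` would hold the simulated spread power of each node (List 1) and `scores_b` the score assigned by the method under evaluation (List 2). The quadratic pair loop is adequate for small graphs; larger networks would call for an O(n log n) implementation.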

Spread amount of influence
To evaluate the effectiveness of the proposed approach in selecting the seed set, it was compared with several methods: the generalized degree discount (DD) [40], distance-based coloring with degree (DCD) [41], k-shell (KS), distance-based coloring with k-shell (DCK) [41], Maji [42], IMSN [37], and community-based k-shell (CKS) [35]. The spread power of the seed sets selected by the different methods was evaluated using the IC model. For each method, the nodes of its seed set were set as the initially active nodes in the IC model, and the dissemination process was launched from this set. The final number of activated nodes was taken as the propagation extent of the set. To ensure the accuracy of the computed propagation outcome, the IC model was executed 1000 times and the average spread power was calculated for each set. This averaging was adopted to keep the simulated propagation consistent with real-world propagation scenarios.
The spread amount of influence of each set chosen by the various methods in the eleven datasets is presented in Fig. 4. The figure shows the spread obtained with the IC model, with the seed set size on the horizontal axis and the spread amount on the vertical axis. The size of the seed set has been adjusted to the dataset size. Figure 4 shows the superiority of the proposed method in all eleven datasets. The k-shell method proved ineffective in all datasets because it chooses its seeds from the high-numbered shells, which leads to substantial overlap between the friends of the seed nodes. In contrast, the CKS method exhibited promising performance, coming after the proposed HKCD method and yielding comparable outcomes for some datasets. By selecting seeds from different communities, the diversity of nodes in the seed set was maintained as the seed set grew, so that more nodes were activated as the seed set size increased.
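The idea of drawing seeds from different communities, as opposed to taking the globally top-ranked nodes, can be illustrated with a simple round-robin selection. This sketch is only an assumption about how such diversity might be enforced and is not the selection rule of HKCD or CKS: the function name `pick_seeds_round_robin`, the input format (one best-first list of nodes per community), and the choice to visit larger communities first are all hypothetical.

```python
def pick_seeds_round_robin(ranked_by_community, k):
    """Pick up to k seeds by cycling through the communities.

    ranked_by_community : dict mapping a community id to its nodes,
                          ordered from most to least influential
    k                   : desired seed set size
    """
    # Visit larger communities first (an arbitrary choice for this sketch).
    queues = sorted(
        (list(nodes) for nodes in ranked_by_community.values()),
        key=len,
        reverse=True,
    )
    seeds = []
    position = 0  # index of the next-best node within each community
    while len(seeds) < k and any(position < len(q) for q in queues):
        for q in queues:
            if position < len(q) and len(seeds) < k:
                seeds.append(q[position])
        position += 1
    return seeds
```

Because each pass takes at most one node per community, the first seeds come from different parts of the network, which limits the overlap between the neighborhoods of the seed nodes.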

Ablation study
The proposed method is composed of several elements that collectively determine its outcome, so it is essential to assess the contribution of each element to the method's overall performance. The proposed method comprises three components: the summation of the weights of the edges connected to the node, the dispersion of the node's friends in different shells, and the dispersion of the node's friends in different communities. These components are controlled by the parameters α, β, and γ, respectively, and their impact on the quality of the method is investigated. Figure 5 illustrates the monotonicity and correlation of the method in different settings, with seven cases evaluated to examine the effect of the parameters. In Case I, only γ = 1 is used, while α and β are both zero, meaning that only the dispersion of the node's friends in different communities is considered. In the other cases, various combinations of the parameters are tested. The results in Fig. 5 show that the method performs best in Case VII (α = β = γ = 1), where all components of the method are considered. Therefore, utilizing all components of the model significantly enhances its quality.

Fig. 4 Spread power of the selected sets in different datasets
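The seven ablation settings correspond to the non-empty on/off combinations of the three parameters (2³ − 1 = 7), which can be enumerated as in the sketch below. The ordering of the intermediate cases and the plain weighted sum in `combined_score` are assumptions made purely for illustration; this section does not give the formula by which the method combines its components.

```python
from itertools import product


def ablation_cases():
    """Return the seven non-empty on/off settings of (alpha, beta, gamma)."""
    return [combo for combo in product((0, 1), repeat=3) if any(combo)]


def combined_score(weight_sum, shell_dispersion, community_dispersion,
                   alpha, beta, gamma):
    """Hypothetical stand-in: a plain weighted sum of the three components."""
    return (alpha * weight_sum
            + beta * shell_dispersion
            + gamma * community_dispersion)


# Print each setting with the score of one made-up node (values are invented).
for case_number, (alpha, beta, gamma) in enumerate(ablation_cases(), start=1):
    score = combined_score(2.0, 0.5, 0.25, alpha, beta, gamma)
    print(case_number, (alpha, beta, gamma), score)
```

With this enumeration the first tuple switches on only γ, matching the description of Case I, and the last tuple switches on all three components.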

Running time
In Fig. 6, the execution time of the proposed method is compared with that of other methods on some datasets. The results show that, although the proposed method, unlike many other methods, combines several different indices, its computational efficiency remains acceptable as the network size changes. The main reason for this acceptable execution time is that the method separates the network into communities and extracts the relevant indices for each node locally within its community.

Summary and future work
This paper proposes a novel method for selecting an optimal set of influential individuals to start the process of information propagation. The proposed method is based on a combination of the total weight of the edges connected to each node and the diversity of the presence of its neighbors in different shells and communities. It outperforms other existing methods in terms of ranking accuracy and the spread of influence in both small and large datasets. The proposed method stands out due to its intelligent selection of the initial seed set from different communities, leading to the spread of influence to different parts of the network. Additionally, selecting nodes based on the weight of the edges and the diversity of their friends' presence in different shells and communities further improves the method. By partitioning the network into different communities, the proposed method limits searches from the entire network to smaller communities, making it possible to use the method on more extensive networks.
The proposed method provides a framework for measuring the spread power and selecting a group of influential nodes to maximize influence. There is potential for further research to enhance the proposed method, including its application to weighted and directed networks. Additionally, the authors plan to assess the effectiveness of their proposed method on alternative models in future studies.

Author's contribution
B and A wrote the main manuscript. B, R, and A prepared all figures. R, A, B, and M edited the manuscript. All authors reviewed the manuscript.

Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data availability
The datasets analyzed during the current study are available in the Konect repository, Konect.cc.

Conflict of interest
The authors declare that they have no conflict of interest.

Ethical approval
Not applicable.