Since humans have become the main body in social networks, their social relationships have also been projected onto the network. This kind of network structure containing social relationships is called a social network. Due to the complex relationships between people, it can be called a complex network. [1] We can summarize some people with high similarity in social networks into a community. The so-called community is composed of a group of closely connected nodes in the network. These nodes are relatively closely connected to each other, but are rarely connected to other closely connected nodes. Most intuitively, the process of community discovery is a clustering process. Community discovery aims to discover the most reasonable and true community structure in social networks, and it has become one of the important tasks to explore the network structure [2]. In order to discover excellent community structure, many scholars and researchers have proposed many community discovery algorithms and many algorithms have also been developed to a certain extent. At present, the research results discovered by the community can be applied to many fields, such as: personalized recommendation, network public opinion analysis, disease infection network, etc. [3].
Because the membership of some nodes in the community structure of social networks is not single, the classic discovery algorithm for non-overlapping communities is no longer suitable for overlapping community structures. Therefore, the research on the discovery of overlapping community structures has gradually become a hot spot in this field. After years of development, a large number of relevant research results have appeared in this field. These methods can be roughly divided into: faction filtering algorithms, local expansion algorithms, edge division discovery Algorithms and label propagation algorithms and other 4 categories [4]. The following is a brief introduction to the classic algorithms in these categories and the research in recent years:
The faction filtering algorithm CPM was first proposed by Palla et al. [5]. They believe that a community is composed of some connected complete subgraphs. The operation of the algorithm is to form a fully connected subgraph composed of k nodes (that is, k Factions), search for neighboring factions consisting of k-1 identical nodes from the network to find overlapping community structures. Later, Palla et al. [6] proposed a CPMd algorithm that can handle directed graphs. The algorithm uses directed k factions instead of k factions in the CPM algorithm, completing the overlapping community discovery on the directed graph.. Lu et al. [7] proposed an overlapping community discovery algorithm based on the expansion of greedy factions. First, search for the largest faction in the network, calculate the link strength according to the degree of association between the factions, and convert the original network graph into the largest faction graph. Under the condition of maximizing the fitness function, greedily expand the seed factions in the largest faction graph for community discovery. Zhang et al. [8] merged the searched factions in the network according to the coupling strength, so as to hierarchically divide the obtained tree diagram to obtain the overlapping community structure. This type of method uses factions as the medium to explore the structure of overlapping communities, but its results are not ideal when dealing with relatively sparse network structures, and the time complexity of the algorithm is relatively high.
The basic idea of the local expansion algorithm is that in the network, the seed nodes are found according to the relevant strategies formulated, and then the community expansion is carried out according to the found seed nodes through the local optimization function to obtain the optimal community division [9]. For example, by Lancchinetti The proposed LFM algorithm [10] found the community based on the fitness function of the node, and then selected nodes outside the community as seed nodes for community expansion. Su et al. [11] use random walk strategy for community expansion. For such methods, the most important thing is the selection of seed nodes. For this reason, Wang et al. [12] proposed the concept of a structural central node, and used it as a seed node for local community expansion. Sun et al. [13] proposed the vertex cover growth rate to select the seed node, and combined the random walk strategy to expand the community to discover overlapping communities. Li et al. [14] proposed an overlapping community discovery algorithm based on the greedy expansion of seed nodes, which uses a greedy strategy based on the fitness function to expand the seed set according to the seed nodes. The algorithm can find high-quality overlapping community structures.
Based on the method of label propagation, each network node is assigned a label containing overlapping membership relationships, and through the propagation of these labels between neighboring nodes, the node's membership relationship with each community finally reaches a stable state, thereby obtaining community discovery results. Overlapping community discovery algorithm LPA [15] research. Typical representatives of this type of method in the field of overlapping community discovery are the multi-label COPRA [16] algorithm and the Speaker-listener model-based SLPA algorithm [17]. Lu et al. LPANNI [18] Algorithm
It is not difficult to find that the above algorithm is mainly for the study of nodes and their attributes in the network, in order to find overlapping community structures. In contrast, the edge division discovery algorithm starts from the perspective of edges and discovers the community structure. Wang [19] et al. proposed a label propagation algorithm based on edge propagation probability for the traditional overlapping label propagation algorithm COPRA; GUO [20] proposed an overlapping community discovery algorithm based on edge density clustering. First, take edges as The research object uses density clustering to detect closely connected core edge communities. Then, according to the boundary edge attribution strategy, the boundary edge is divided into the core edge community closest to it. For isolated edges, a community attribution based on edge degree and edge is proposed Isolated edge processing strategy, Wang [21] proposed an adaptive overlapping community discovery algorithm based on the mixed parameters of edge trust, which defines the set of neighbors on the network side and the trust function between its neighbors, through information transmission Obtain the total amount of information of the edge to realize adaptive discovery of overlapping communities. At present, edge partition discovery algorithms have become an important class of overlapping community discovery algorithms.
However, many existing community discovery algorithms are based on the topology of the network, that is, the local information between nodes, while ignoring the influence of the connection relationship between nodes. Real social networks are based on the relationship between people living in reality, and independent people lead to individual differences. A real community should fully consider the importance of connections between nodes. Only in this way can we study the behavior patterns of the entire community through individuals. Many algorithms only perform cluster analysis for the purpose of simply discovering communities, thus ignoring the influence of node connection relationships. In order to overcome this problem, we will propose the relationship similarity of nodes to evaluate the value of nodes to the community. For all nodes in the community, there is a certain degree of similarity between nodes in the same community. We express this value through the local clustering coefficient of the node; and for a node in the community, each node Nodes directly connected to themselves have a higher similarity, and we express this value through the similarity of the relationship between the nodes. On this basis, the two are combined, and an overlapping community discovery algorithm based on relationship similarity and local expansion is proposed. The algorithm sets the node with the highest number as the core node, and then judges whether the node in the community can belong to multiple communities at the same time according to the tendency of the nodes in the community. If it is, mark it as an overlapping node and divide it into the community. The node is removed from the network. Iteratively obtain the division result of overlapping communities. Experimental results show that the algorithm has better performance than several classic overlapping community partitioning algorithms.