A Novel Attributed Community Detection by Integration of Feature Weighting and Node Centrality


Community detection is one of the basic problems in social network analysis. Community detection on attributed social networks aims to discover communities that have not only a cohesive structure but also homogeneous node properties. Although community detection has been extensively studied, attributed community detection in large social networks with a large number of attributes remains a major challenge. To address this challenge, this paper develops a novel attributed community detection method that integrates feature weighting with node centrality techniques. The developed method includes two main phases: (1) Weight Matrix Calculation and (2) Label Propagation Algorithm-based Attributed Community Detection. The first phase calculates the weight between every two linked nodes using structural and attribute similarities, while the second phase proposes an improved label propagation algorithm for community detection in attributed social networks, which detects communities by employing the calculated weight matrix and node popularity. The performance of the proposed method is compared with several other state-of-the-art methods on benchmark real-world datasets. The results indicate that the developed method outperforms the compared methods and confirm its effectiveness for attributed community detection.


Introduction
Community detection is an essential social network analysis method that aims to detect communities of nodes that are similar to each other with respect to some similarity criteria [1][2][3]. In particular, community detection is a vital network analysis task that has been applied to many real applications, such as the analysis of social networks from Facebook, Twitter, or the blogosphere, information networks in web or IT infrastructures, and molecular complex networks from biomedical data such as protein-protein interaction networks [4][5][6]. Typical community detection algorithms attempt to identify clusters of nodes that have high intra-community similarity and low inter-community similarity, i.e., nodes inside a community are similar to each other and dissimilar to nodes in other communities [7][8][9][10].
Several types of community detection methods can be distinguished. The traditional stream of methods focuses solely on the structural dimension of social networks, i.e., the relationships between network nodes, and ignores node attributes [11][12][13]. Such approaches use the structural similarity of nodes as the basis for social network analysis and community detection [14][15][16]. However, most real-world social networks provide more information about their nodes than just the relationships between them. For instance, in several social network graphs, node attributes such as age, gender, and interests are commonly available as part of a-priori knowledge [17][18][19]. In the Facebook network graph, for example, one may find both information about the communication between users and the profiles of these users, which may include gender, age, and location.
In such cases, the network is called an attributed social network, in which attributes are assigned to each node. In an attributed social network, the network structure constitutes the first dimension, while the attributes constitute the second dimension of the social network [20][21][22].
Another stream of traditional community detection algorithms (in contrast to structural methods) relies only on node attributes to discover network communities and entirely disregards the relationships between the nodes of the social network. Clearly, methods that consider only the network structure or only the node attributes do not exploit all of the knowledge available in an attributed social network. Consequently, a method that simultaneously uses both the structure and the attribute vectors to detect communities can overcome this weakness and improve the accuracy of community detection [23][24][25][26].
It is expected that an ideal community should not only be well linked in terms of its node structure but should also present a similar distribution of its node attribute values. This motivates the development of a new and challenging stream of methods referred to as "attributed community detection methods". This research area is receiving increasing attention because the joint use of network structure and node attributes increases the information available about social network nodes, although combining network and attribute information is very challenging. In this case, similarities between nodes can be calculated using two criteria: structural similarity and attribute similarity. The structural and attribute similarities can be computed using the network topology and the nodes' attribute vectors, respectively. The network topology captures the relationships between nodes, while the node attributes capture the properties these nodes share. These similarities play vital roles in the organization of network communities [12,27,28]. Fig. 1 presents an example of a small co-authorship network, where each scholar is associated with two attributes: "research area" and location ("country"). When a traditional (structural) community detection method is applied to this attributed social network, the generated communities have a cohesive structure, but their attributes may take random values. An attributed community detection scheme, in contrast, considers both the topological structure of the network and the attributes of its nodes. The objective is to partition the social network into structurally cohesive and attribute-homogeneous communities. Fig. 1 illustrates two such communities. In the left community, the attribute "research area" shows a common value of "DM" (data mining), but the attribute "country" takes random values.
In contrast, in the right community, the authors are mainly labelled "US", but their research interests cover a wide range of topics. Intuitively, the left community is likely to be a closely working international team in data mining, which explains the spread in the location attribute, while the right community suggests an internal collaboration within a large lab in the US. Therefore, attributed community detection algorithms can shed light on the different communities that can be inferred while respecting both the node structure and the node attributes, so that the detected communities have both high structural cohesiveness and value homogeneity in many attributes [29]. Although many attributed community detection approaches have been proposed in recent years, they still suffer from several drawbacks, summarized below:
• Most of these community detection algorithms largely disregard node attributes and focus only on the network structure, despite claiming to handle both aspects simultaneously.
• Many previous attributed community detection methods use all attributes to calculate the similarity between two nodes, whereas nodes in some attributed social networks are described by a large number of features, some of which may be redundant or irrelevant. A large number of irrelevant and redundant features in an attributed social network degrades the performance of network analysis algorithms and also increases the computational complexity.
• In most previous attributed community detection methods, all attributes are weighted equally, although some attributes may be more important than others.
• One of the most widely used methods for community detection is the Label Propagation Algorithm (LPA). LPA initially assigns a unique label to each node and then, in each update iteration, lets each node adopt the label that is most frequent among its neighbors. As a result, the algorithm suffers from instability and low performance, because the equal importance assigned to nodes and the random behavior of the update iterations lead to the formation of giant communities.
To overcome the aforementioned disadvantages of previous attributed community detection approaches, this study devises an improved LPA algorithm. The proposed method attempts to detect high-quality communities in acceptable time. In this method, a combination of structural and attribute similarity is used to detect the final communities. Moreover, the importance of nodes, measured by a node centrality criterion, is incorporated into the label updating process. Compared with the standard LPA and other previous attributed community detection methods, the proposed method offers the following novelties:
• An improved LPA algorithm is proposed that combines structural and attribute similarity through a new similarity measure, reducing the number of iterations while preserving the original time efficiency.
• A new method for attribute weighting and attribute selection is proposed. It is shown, on the one hand, to increase the quality of the similarity measure by accommodating both structural and attribute information and, on the other hand, to reduce the computational burden by eliminating irrelevant and redundant attributes.
• An improved criterion of node importance in social networks is developed, which can be used to identify community heads. Using this node centrality criterion to calculate the importance of nodes and applying it during label propagation is shown to improve community detection performance.
• Comprehensive tests on publicly available datasets, including a comparison with state-of-the-art community detection methods, have been conducted, and the results demonstrate the technical soundness and practical usefulness of the proposal.
In the remainder of this study, Section 2 reviews previous work in the area of community detection. Section 3 details the proposed attributed community detection method. Section 4 reports the experimental results on different real-world attributed social network datasets.
Finally, Section 5 presents the conclusion and some perspectives for future work.

Related Works
Community detection has been a challenging research field in social network analysis, machine learning, and graph mining, and several extensive reviews of community detection methods have been produced [11,30,31,32,33]. In this section, some of the popular community detection methods are reviewed.
In terms of the type of social network, previous community detection approaches can be grouped into two categories: Non-Attributed and Attributed community detection approaches [34]. In Non-Attributed community detection, only the linkage structure of the network is considered, while node attributes are fully ignored. This category can be divided into four main classes: Hierarchical community detection, Modularity algorithms, Random Walk models, and Label Propagation models [35][36][37]. On the other hand, Attributed graph methods use both structural and attribute knowledge. This category can also be partitioned into four popular groups: Edge Weighting, Augmented Network, Fitness Function Optimization, and Unified Distance [38].
Hierarchical community detection is a useful and long-established approach for discovering groups in social network analysis [2,39,40]; it starts by defining a similarity criterion. Hierarchical community detection models can be grouped into two categories: agglomerative models and divisive models. In random walk-based models, each vertex holds an initial walker state, and at each step a walker moves to a randomly selected neighbor of its current vertex. Modularity-based models use the modularity criterion to identify clusters: they seek partitions whose communities have high modularity, i.e., they aim to find graph clusters that maximize modularity [41,42].
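To make the modularity criterion concrete, the following minimal Python sketch computes the standard Newman-Girvan modularity $Q = \sum_c \left[\frac{L_c}{m} - \left(\frac{D_c}{2m}\right)^2\right]$ of a given partition, where $L_c$ is the number of edges inside community $c$ and $D_c$ is the total degree of its nodes. It assumes an undirected, unweighted edge list; the function name `modularity` is illustrative and not part of any of the reviewed methods.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    community: dict mapping node -> community id
    """
    m = len(edges)
    intra = defaultdict(int)       # L_c: number of edges inside community c
    degree_sum = defaultdict(int)  # D_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, the natural two-community split gives $Q = 5/14 \approx 0.357$, whereas putting all nodes into one community gives $Q = 0$.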
The aforementioned community detection methods consider only the network structure and ignore node attribute information.
Attributed community detection methods employ information from both the network structure and the node attributes to detect communities in a social network [43]; the detected communities are clusters of vertices that are densely linked and also very similar to each other in their attributes.
Many community detection methods have been developed that exploit attribute knowledge in addition to the social network structure. These previous attributed community detection methods can be grouped into four classes: (1) methods converting an attributed social network into a weighted network, (2) distance-based models, (3) model-based methods, and (4) subspace models.
The first group converts the original attributed network into a weighted graph, whose edge weights are constructed from the attribute information of the nodes. In the distance-based group, the network structural knowledge is retained in a vertex similarity measure, which is integrated with an attribute similarity measure. The model-based group uses probabilistic models to avoid designing an unrealistic distance criterion. The subspace-based models detect communities only in the subspace of their relevant properties, i.e., a subset of all node attributes, which is particularly useful for large social networks. Fig. 2 presents a categorization of the previously mentioned community detection algorithms, distinguishing attributed and non-attributed community detection algorithms as well as their various ramifications. Overall, we noticed that detecting communities with previously proposed attributed community detection methods is too slow to be effective in big data scenarios, in terms of both computational complexity and execution time. Indeed, these methods use data structures that must be entirely reconstructed whenever the input data change, and sometimes the number of clusters must be specified by the user. Furthermore, most previous attributed community detection methods require categorical attributes and therefore force the user to discretize the ranges of non-categorical attributes, resulting in a loss of information in the similarity criteria. As a result, there is a need for novel, efficient community detection algorithms that consider both structural and attribute information and can detect communities in reasonable time.

The proposed Attributed Community Detection Method
In this section, we detail our attributed community detection method, which integrates the Label Propagation Algorithm with Feature Selection (LPAFS). LPAFS belongs to the class of attributed community detection methods and considers the importance of node attributes in its community detection process. LPAFS includes two main phases: (1) Weight Matrix Calculation and (2) Label Propagation Algorithm-based Attributed Community Detection. The first phase calculates the weight between every two nodes connected by an edge using structural and attribute similarities; for this purpose, attribute selection and attribute weighting techniques are used, as detailed later. In the second phase, the calculated weight matrix and node popularity are used to detect the different communities. The flowchart of the developed attributed community detection method is shown in Fig. 3, and the details of the two phases are explained in the remainder of this section.

Weight Matrix Calculation
Although several community detection approaches have been developed, community detection in real social networks with a large number of attributes remains challenging [13]. In this regard, the similarity of nodes can be calculated using two criteria: structural similarity and attribute similarity. Structural similarity is calculated from the network structure, while attribute similarity is evaluated using the internal properties of nodes, which are completely independent of the network structure.
More formally, let $G = (V, E, A)$ be an attributed social network, where $V$ is the set of vertices, $E$ is the set of links denoting the existing relationships between the vertices, and $A$ is the set of attribute vectors of the nodes. $n = |V|$ is the number of nodes, $m = |E|$ is the number of edges, and $a_i = (a_{i1}, a_{i2}, a_{i3}, \ldots, a_{ik})$ denotes the $k$ characteristics (attributes) of node $v_i \in V$, where $k$ is the dimension of the node attributes.
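As an illustration of this notation, the attributed network $G = (V, E, A)$ can be held in a small container such as the hypothetical `AttributedGraph` class below, which stores $A$ as a mapping from each node to its $k$-dimensional attribute vector and $E$ as adjacency sets. This is a sketch under our own representation choices, not the data structure used by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Hypothetical container for G = (V, E, A); V is the key set of `attributes`."""
    attributes: dict                          # A: node -> tuple of k attribute values
    adj: dict = field(default_factory=dict)   # E: node -> set of linked nodes

    def add_edge(self, u, v):
        """Add the undirected link (u, v); both endpoints must already be in V."""
        if u == v:
            raise ValueError("self-loops are not allowed")
        if u not in self.attributes or v not in self.attributes:
            raise KeyError("both endpoints must have attribute vectors")
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def neighbors(self, v):
        """First-order neighborhood of v."""
        return self.adj.get(v, set())

    @property
    def n(self):
        return len(self.attributes)                            # n = |V|

    @property
    def m(self):
        return sum(len(s) for s in self.adj.values()) // 2     # m = |E|

    @property
    def k(self):
        return len(next(iter(self.attributes.values()), ()))   # attribute dimension
```

Adjacency sets make neighborhood lookups and set intersections (needed later for Jaccard distances and motif counts) cheap.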
Nodes in attributed social networks are usually described by a large number of attributes.
Many of these attributes may be irrelevant to the given network analysis application, which may reduce efficiency and increase computational complexity. Therefore, reducing the dimensionality of network attributes is a fundamental task in social network analysis applications.
There are two main ways to reduce dimensionality: feature extraction and feature selection [44,45]. In feature extraction, the initial feature space is mapped to a smaller space [46][47][48][49], where a small number of new features are generated by combining the initial features so as to preserve the information in the original inputs [50][51][52][53]. In feature selection, on the other hand, a subset of the original attributes is selected according to a predefined criterion so as to improve prediction and performance [54][55][56][57].
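In the first step of this phase, the weight of each individual attribute is calculated using the Laplacian Score (LS) feature selection method [58]. Laplacian Score is a filter-based feature weighting method whose purpose is to measure the importance of features based on their ability to preserve locality. The Laplacian Score of attribute $r$ is calculated using Equation (1):

$$LS_r = \frac{\sum_{i,j} \left(f_r(i) - f_r(j)\right)^2 S_{ij}}{\sum_{i} \left(f_r(i) - \bar{f}_r\right)^2 D_{ii}} \quad (1)$$

where $f_r(i)$ denotes the value of attribute $r$ at the $i$-th node, $\bar{f}_r$ denotes the average of attribute $r$ over all nodes, $D$ is a diagonal matrix such that $D_{ii} = \sum_j S_{ij}$, and $S_{ij}$ describes the neighborhood relationship between nodes, calculated as in Equation (2):

$$S_{ij} = \begin{cases} e^{-\frac{d(v_i, v_j)^2}{t}} & \text{if } v_i \text{ and } v_j \text{ are neighbors} \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

where $t$ is a suitable constant and $v_i$ represents the $i$-th node. Nodes $v_i$ and $v_j$ are neighbors if there is an edge between them, and $d(v_i, v_j)$ denotes the Jaccard distance between nodes $v_i$ and $v_j$, calculated as follows:

$$d(v_i, v_j) = 1 - \frac{|\Gamma(v_i) \cap \Gamma(v_j)|}{|\Gamma(v_i) \cup \Gamma(v_j)|} \quad (3)$$

where $\Gamma(v_i)$ corresponds to the first-order neighborhood of node $v_i$.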
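The following sketch computes Equations (1)-(3) in plain Python for numeric attributes. Two details are assumptions on our part: the heat-kernel form $e^{-d^2/t}$ in Equation (2), which follows the original Laplacian Score [58] with the Jaccard distance in place of the Euclidean one, and returning $+\infty$ for a constant attribute, whose denominator vanishes. As in [58], a lower score indicates an attribute that better preserves local neighborhood structure; how the scores are converted into the attribute weights used later is not reproduced here. Function names are illustrative.

```python
import math


def jaccard_distance(adj, i, j):
    """d(v_i, v_j) = 1 - |Γ(v_i) ∩ Γ(v_j)| / |Γ(v_i) ∪ Γ(v_j)| (Eq. 3)."""
    gi, gj = adj.get(i, set()), adj.get(j, set())
    union = gi | gj
    if not union:
        return 1.0
    return 1.0 - len(gi & gj) / len(union)


def laplacian_scores(adj, features, t=1.0):
    """Laplacian Score of every attribute (Eq. 1); lower means better locality preservation.

    adj: node -> set of neighbours (undirected); features: node -> list of k numbers.
    """
    nodes = list(features)
    # S_ij for every ordered linked pair (Eq. 2, heat-kernel form assumed); 0 elsewhere
    S = {(i, j): math.exp(-jaccard_distance(adj, i, j) ** 2 / t)
         for i in nodes for j in adj.get(i, set())}
    D = {i: sum(S[(i, j)] for j in adj.get(i, set())) for i in nodes}  # D_ii = sum_j S_ij
    k = len(features[nodes[0]])
    scores = []
    for r in range(k):
        f = {i: features[i][r] for i in nodes}
        mean = sum(f.values()) / len(nodes)
        num = sum((f[i] - f[j]) ** 2 * s for (i, j), s in S.items())
        den = sum((f[i] - mean) ** 2 * D[i] for i in nodes)
        scores.append(num / den if den > 0 else math.inf)
    return scores
```

On two triangles joined by one edge, an attribute that is constant within each triangle scores lower (better) than one that alternates between neighbors, because only the bridge edge contributes to its numerator.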
After calculating the importance of all attributes, the weight of the edge between each pair of linked nodes is determined from both node attributes and network structure. In the proposed method, a convex combination of structural similarity and attribute similarity is used to calculate the weight between nodes. The weight of the edge between nodes $v_i$ and $v_j$ is calculated as follows:

$$w_{ij} = \alpha \, SS(v_i, v_j) + (1 - \alpha) \, AS(v_i, v_j) \quad (4)$$

where $SS(v_i, v_j)$ and $AS(v_i, v_j)$ denote the structural similarity and the attribute similarity between nodes $v_i$ and $v_j$, respectively, and $\alpha \in [0, 1]$ is a parameter that controls the balance between the structural and attribute components.
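Equation (4) needs only the two similarity functions, so the weight-matrix step can be sketched independently of how the structural and attribute similarities are defined. In the sketch below, the function and argument names, the assignment of $\alpha$ to the structural term, and the default $\alpha = 0.6$ (the value reported as best in the sensitivity analysis) are our assumptions; any structural and attribute similarity functions can be passed in.

```python
def weight_matrix(edges, structural_sim, attribute_sim, alpha=0.6):
    """Edge weights w_ij = alpha * SS(v_i, v_j) + (1 - alpha) * AS(v_i, v_j).

    edges: iterable of (u, v) pairs; structural_sim / attribute_sim: callables (u, v) -> float.
    Weights are stored in both directions so that W[(u, v)] == W[(v, u)].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    W = {}
    for u, v in edges:
        w = alpha * structural_sim(u, v) + (1.0 - alpha) * attribute_sim(u, v)
        W[(u, v)] = W[(v, u)] = w
    return W
```

Weights are computed only for linked pairs, so the matrix stays as sparse as the network itself.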
In this paper, the structural similarity between two nodes is calculated using the concept of network motifs [59]. For this purpose, network neighborhoods and triple censuses are used to compute a network motif-based similarity. The proposed similarity is calculated from the distribution of nodes across network motifs. Each pair of network nodes forms a triple (three-node) network motif with each of its common neighbors; therefore, each pair of network nodes can be a member of many three-node motifs. Accounting for these three-node patterns is the main difference between this similarity and other neighborhood-based similarity measures. The network motif-based similarity is defined by Equation (5), where $\Gamma(i)$ represents the neighbors of node $v_i$ and $\phi(i, j, k)$ represents the number of network motifs that include nodes $v_i$, $v_j$, and $v_k$.
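Equation (5) itself is not reproduced here, but the triple census it relies on is easy to illustrate. The sketch below enumerates, for a linked pair $(v_i, v_j)$, the three-node motifs formed with each common neighbor $v_k \in \Gamma(v_i) \cap \Gamma(v_j)$, and counts them for every edge. The function names are hypothetical, node identifiers are assumed to be orderable, and the aggregation of these counts into the final similarity is left to Equation (5).

```python
def pair_triples(adj, i, j):
    """Three-node motifs (v_i, v_j, v_k) formed by the pair (i, j) and each common neighbour."""
    return [(i, j, k) for k in sorted(adj[i] & adj[j])]


def triple_census(adj):
    """Number of common-neighbour triples each linked pair (i < j) participates in."""
    census = {}
    for i, nbrs in adj.items():
        for j in nbrs:
            if i < j:                      # visit each undirected edge once
                census[(i, j)] = len(adj[i] & adj[j])
    return census
```

For two triangles joined by a bridge, every triangle edge takes part in exactly one triple, while the bridge takes part in none.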
On the other hand, the attribute similarity is calculated using the proposed attribute similarity method. One drawback of previous methods is that all attributes receive the same constant weight when calculating similarity, whereas in many real networks attributes are not equally important. To address this limitation, this paper presents an improved similarity criterion called weighted attribute similarity, calculated as in Equation (6), where $k$, $k_n$, and $k_c$ represent the total number of attributes, the number of numerical attributes, and the number of non-numerical attributes, respectively. Moreover, the similarities over the numerical and non-numerical attributes are modelled using Equations (7) and (8), respectively, where $\omega_l$ indicates the weight of attribute $l$ calculated using the Laplacian Score, and the comparison of two attribute values in these equations is calculated using Equation (9).
Here, $x_l(v_i)$ and $y_l(v_i)$ denote the $l$-th numerical attribute and the $l$-th non-numerical attribute of node $v_i$, respectively.
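Because the exact forms of Equations (6)-(9) are not reproduced in this text, the following sketch shows one common way to build a weighted similarity over mixed attributes: a Gower-style score in which a numerical attribute contributes $1 - |a - b| / \text{range}$, a non-numerical attribute contributes 1 on an exact match and 0 otherwise, and each attribute contributes in proportion to a non-negative weight. This is a substitute technique chosen for illustration, not the authors' formula; the function name and arguments are hypothetical, and the weights could, for example, be derived from the Laplacian Scores.

```python
def weighted_attribute_similarity(x, y, weights, numeric, ranges):
    """Gower-style weighted similarity in [0, 1] between attribute vectors x and y.

    weights: non-negative weight per attribute index
    numeric: set of indices of numerical attributes (all others treated as categorical)
    ranges:  index -> (max - min) of that numerical attribute over all nodes
    """
    total = weight_sum = 0.0
    for l, (a, b) in enumerate(zip(x, y)):
        if l in numeric:
            r = ranges[l]
            s = 1.0 - abs(a - b) / r if r > 0 else 1.0   # scaled closeness of numbers
        else:
            s = 1.0 if a == b else 0.0                   # exact match for categories
        total += weights[l] * s
        weight_sum += weights[l]
    return total / weight_sum if weight_sum > 0 else 0.0
```

For example, comparing (25, "DM", "US") with (35, "DM", "FR") with weights (1, 2, 1) and an age range of 20 gives $(0.5 + 2 + 0) / 4 = 0.625$: the heavily weighted shared research area dominates the mismatched country.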

Label Propagation Algorithm-based Attributed Community Detection
Given an attributed social network $G = (V, E, A)$ and the number of clusters $K$, the community detection method proposed in this paper aims to partition the vertices of the social network into $K$ communities. The standard Label Propagation community detection method, which uses only the network structure to detect communities, first assigns a unique label to each node and then, in successive update steps, lets each node adopt the label that is most frequent among its neighbors. Nodes in a dense subgraph that end up with the same label are grouped into one community; the algorithm stops when every node carries a label shared by the maximum number of its neighbors [60].
Because it assumes that all nodes have similar influence and relies on random processes in the update step and in tie-breaking, the standard LPA suffers from instability and low performance. Each community contains some distinguished nodes of major importance (the center of the community); therefore, nodes located near the center of a community are more influential than border nodes. Typically, the nodes of a community have different influence scores: nodes with higher importance play a dominant role, while nodes with lower importance are non-dominant. In other words, not all nodes in a community are equally important, and a given node either affects other nodes (dominant) or is affected by other nodes (non-dominant).
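For reference, the standard LPA described above can be sketched as follows, including the random update order and random tie-breaking that are blamed for its instability. This is a minimal asynchronous variant written for illustration: a node keeps its current label when that label is already among the most frequent ones, which is a common stopping convention. The function name and parameters are our own.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=None):
    """Standard asynchronous LPA on an unweighted graph given as node -> set of neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                # random update order in each iteration
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)   # random tie-breaking
                changed = True
        if not changed:                   # every node holds a most frequent neighbour label
            break
    return labels
```

Different seeds can yield different partitions on the same graph, which is exactly the instability that the centrality-based ordering of the proposed method is meant to reduce.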
As mentioned, in the first phase the graph $G = (V, E, A)$ was converted into a weighted graph $G_w = (V, E, W)$ that encodes structure-attribute similarity. In this phase, the popularity of each node in the social network is calculated using Laplacian centrality. Nodes with higher popularity in the weighted social network exert more influence on their neighbors in terms of both structure and attributes. Nodes that occupy a central position in a community and have many links with other members play a major role in leading, guiding, and maintaining the consistency of the group, while members on the border of clusters may act as mediators between different clusters. Calculating centrality measures, i.e., identifying the more "central" and influential nodes, has been a major topic in social network analysis [61,62]. In this work, Laplacian Centrality (LC) [62] is employed to calculate the centrality of nodes.
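Formally, for the weighted matrix $W$ generated for graph $G$, the strength $s_i$ of node $v_i$ is defined as

$$s_i = \sum_{v_j \in N(v_i)} w_{ij} \quad (10)$$

i.e., the sum of the weights of the edges incident to $v_i$, where $N(v_i)$ is the set of neighbors of $v_i$. The Laplacian energy of $G$ is calculated as

$$E_L(G) = \sum_{i=1}^{n} s_i^2 + 2 \sum_{i<j} w_{ij}^2 \quad (11)$$

Finally, the Laplacian centrality $C_L(v_i, G)$ of node $v_i$ is calculated using the following formula:

$$C_L(v_i, G) = \frac{E_L(G) - E_L(G_i)}{E_L(G)} \quad (12)$$

where $G_i$ is the graph obtained by deleting $v_i$ from $G$.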
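Equations (10)-(12) translate directly into code. The sketch below stores the weighted matrix as a dictionary holding both $(i, j)$ and $(j, i)$ entries, computes the Laplacian energy, and obtains each node's centrality by recomputing the energy with that node deleted. It is a deliberately naive $O(n \cdot m)$ illustration; the function names are hypothetical and node identifiers are assumed to be orderable.

```python
def laplacian_energy(W):
    """E_L(G) = sum_i s_i^2 + 2 * sum_{i<j} w_ij^2 for a symmetric weight dict W[(i, j)]."""
    strength = {}
    pair_term = 0.0
    for (i, j), w in W.items():
        strength[i] = strength.get(i, 0.0) + w   # s_i accumulates over all (i, j) entries
        if i < j:
            pair_term += w * w                   # each undirected edge counted once
    return sum(s * s for s in strength.values()) + 2.0 * pair_term


def laplacian_centrality(W):
    """C_L(v, G) = (E_L(G) - E_L(G_v)) / E_L(G), where G_v is G with node v deleted."""
    energy = laplacian_energy(W)
    nodes = {i for i, _ in W}
    if energy == 0:
        return {v: 0.0 for v in nodes}
    centrality = {}
    for v in nodes:
        W_without_v = {e: w for e, w in W.items() if v not in e}  # drop v and its edges
        centrality[v] = (energy - laplacian_energy(W_without_v)) / energy
    return centrality
```

Expanding Equation (11) shows that deleting $v_i$ lowers the energy by $s_i^2 + \sum_{v_j \in N(v_i)} (2 s_j w_{ij} + w_{ij}^2)$, so in practice each centrality can be computed locally from the neighborhood of $v_i$ instead of by full recomputation. For a path 0-1-2 with unit weights, $E_L = 10$, and the sketch returns 1.0 for the middle node and 0.6 for each end node.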
After calculating the Laplacian centrality of each node, the Label Influence (LI) of each node is computed using Equation (13), where $LI(l, v_i)$ denotes the influence of label $l$ on node $v_i$. The pseudocode of the developed attributed community detection method is shown in Algorithm 1.

Input: Attributed social network G = (V, E, A)
Output: Detected communities C = {C_1, C_2, …, C_k}
1. Calculate the weight matrix using Eq. (4)
2. Allocate a unique label to each node of the initial social network
3. Compute the label influence using Eq. (13)
4. While the labels of nodes change and i < Threshold do
5.    Construct a vector of nodes ordered by their strength
7.    For each node in this vector, update its label acceptance using Eq. (14)
8.    In case of a tie, calculate the sum label of the neighbor nodes and choose the label with the highest importance
   End while
11. Generate communities according to equal labels
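Because the exact forms of Equations (13) and (14) are not reproduced in this text, the following sketch of Algorithm 1 makes its own assumptions explicit: a node adopts the label with the largest sum of $w_{uv} \cdot LC(u)$ over neighbors $u$ carrying that label, nodes are visited in descending order of strength (step 5), and ties are broken in favor of the label held by the most central neighbor, standing in for step 8. The function name and the `max_iter` threshold are hypothetical, and the centrality values can come from any Laplacian centrality implementation. The point of the sketch is that update order and tie-breaking are deterministic, unlike the random choices of the standard LPA.

```python
def centrality_weighted_lpa(W, centrality, max_iter=20):
    """Hypothetical sketch of Algorithm 1 on a symmetric weight dict W[(u, v)] = w_uv."""
    adj = {}
    for (u, v), w in W.items():
        adj.setdefault(u, {})[v] = w
    strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    labels = {v: v for v in adj}                                   # step 2: unique labels
    order = sorted(adj, key=lambda v: strength[v], reverse=True)   # step 5: by strength
    for _ in range(max_iter):                                      # step 4: bounded loop
        changed = False
        for v in order:                                            # step 7: update labels
            score = {}
            for u, w in adj[v].items():
                # assumed label influence: edge weight times the neighbour's centrality
                score[labels[u]] = score.get(labels[u], 0.0) + w * centrality[u]
            best = max(score.values())
            tied = [lab for lab, s in score.items() if s == best]
            if labels[v] in tied:
                continue
            # step 8 (assumed): prefer the tied label held by the most central neighbour
            labels[v] = max(tied, key=lambda lab: max(
                centrality[u] for u in adj[v] if labels[u] == lab))
            changed = True
        if not changed:
            break
    communities = {}
    for v, lab in labels.items():                                  # step 11: group labels
        communities.setdefault(lab, set()).add(v)
    return list(communities.values())
```

On two triangles joined by a weak bridge (weight 0.1) with uniform centrality, the sketch converges in two sweeps to the communities {0, 1, 2} and {3, 4, 5}.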

Computational Complexity
The complexity of the developed algorithm is analyzed in this subsection. Consider a graph $G = (V, E)$ with $n = |V|$ nodes and $m = |E|$ links, and let $d$ be the average node degree.
Algorithm 1 consists of several phases, each with its own computational complexity. In the first phase, initializing all nodes with unique labels takes $O(n)$ time. In the next phase, the label influence is calculated, which requires the weight matrix. Since weights are computed only for linked node pairs and the work per edge depends on the average degree $d$, this step can be approximated as linear in the number of edges because $d \ll n$ holds in large-scale social networks; node centrality can likewise be computed locally from each node's neighborhood. In the next phase, the nodes are sorted by their influence, which costs $O(n \log n)$. The following phase performs label propagation; each iteration visits every node and its neighbors, which costs $O(nd)$, equivalent to $O(m)$. In the final phase, each node is assigned to its community in $O(n)$ time. Overall, the complexity of the developed method is approximately $O(m)$.
Since real social networks tend to be sparse, the number of edges is of the same order as the number of nodes; therefore, $O(m) \approx O(n)$. In other words, the time complexity of the proposed method is approximately linear in the number of nodes of the network, which is a significant improvement over alternative implementation schemes.

Experimental Result
To measure the efficiency of the developed community detection approach, various experiments are designed. The performance of the developed approach is compared with three recent attributed community detection methods: Adapt-SA-Soft [63], Adapt-SA-PCA [63], and Subspace Stochastic Block Model (SSBM) [64]. The details of these methods are listed in Table 1.
In these experiments, the developed approach and the compared community detection methods were implemented in Python and run on an Intel Core i7 CPU with 8 GB of RAM. The reported results are averaged over ten independent runs to obtain more reliable assessments.

Table 1. Details of the compared methods

| Method | Description | Year |
|---|---|---|
| Adapt-SA-Soft | A weighted K-means method with local learning and an additional fuzzy K-means step for attributed community detection. | 2017 |
| Adapt-SA-PCA | A weighted K-means method with local learning and a PCA dimensionality reduction phase for attributed community detection. | |
| SSBM | Subspace stochastic block model (SSBM) for detecting community structures in attributed social networks; it treats both topological structure and attribute information as latent factors to analyze the organization of communities. | 2020 |
In the remainder of this section, the datasets, evaluation criteria, parameters of the proposed method, and experimental results are described.

Used Dataset
In this study, several datasets with different properties are used in the experiments to show the effectiveness and robustness of the developed attributed community detection approach.
These real-world attributed social network datasets are Citeseer, Cora, Cornell, Texas, Washington, Wisconsin, and Political Blogs.
Cora [65] contains a set of vertices representing scientific publications, where an edge between two vertices is a citation from one publication to another. The attributes of each publication are represented by a set of unique words.
Citeseer [65] is another citation network, where each vertex belongs to one of six classes: agents, artificial intelligence, databases, human-computer interaction, information retrieval, and machine learning.
WebKB [65] consists of the web page networks of four universities: 1) Cornell, 2) Texas, 3) Washington, and 4) Wisconsin. Each web page is classified into one of five classes: course, faculty, student, project, and staff.
The main specifications of these attributed social network datasets are detailed in Table 2.
These social network datasets have been selected according to various properties such as number of attributes, number of nodes, number of edges, and number of communities.
Because datasets with a large number of attributes are among the most challenging cases for attributed community detection, particular attention was paid to the number of attributes when selecting the datasets.

Evaluation Criteria
Two major categories of criteria, namely external and internal criteria, are used to measure the efficiency of our method and of the compared methods. These measures are detailed in the remainder of this subsection. Accuracy is a popular statistical criterion that measures how closely evaluations agree with a reference value. In community detection, a high accuracy value indicates close agreement between the detected communities and the real communities in the social network.
This criterion shows the ratio of the number of nodes whose community has been correctly detected to the total number of nodes.
The F-measure combines the precision and recall concepts from information retrieval. Precision and recall are calculated as follows:

$$Precision = \frac{|S \cap T|}{|S|}, \qquad Recall = \frac{|S \cap T|}{|T|}$$

where $S$ is the set of node pairs that are assigned to the same community and $T$ is the set of node pairs that have the same label.
The F1-score combines precision and recall into a single evaluation as follows:

$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
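The pairwise precision, recall, and F1-score defined above can be computed directly from the sets of co-assigned and co-labelled node pairs. The sketch below enumerates all node pairs, so it is quadratic in the number of nodes and is meant only to make the definitions concrete; the function name is illustrative.

```python
from itertools import combinations


def pairwise_f1(predicted, truth):
    """Pairwise precision, recall and F1 of detected communities against ground truth.

    predicted, truth: dict node -> community id / class label over the same node set.
    """
    nodes = sorted(predicted)
    S = {(u, v) for u, v in combinations(nodes, 2) if predicted[u] == predicted[v]}
    T = {(u, v) for u, v in combinations(nodes, 2) if truth[u] == truth[v]}
    inter = len(S & T)
    precision = inter / len(S) if S else 0.0
    recall = inter / len(T) if T else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if the detected communities are {a, b} and {c, d} while the ground truth is {a, b, c} and {d}, then $|S| = 2$, $|T| = 3$, and $|S \cap T| = 1$, giving precision 0.5, recall 1/3, and F1 = 0.4.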

Proposed Method Parameters
Unlike many community detection methods that have many parameters, the proposed method involves only one user-defined parameter: the parameter α in Equation (4), which controls the balance between structural and attribute similarity.

Experimental Results
In this subsection, to evaluate the efficiency of the proposed method, its results are compared with those of previous attributed community detection methods that employ both network structural knowledge and node attributes. The community detection results are reported in Tables 3 to 5, using the NMI, Accuracy, and F1-score criteria. Table 3 reports the average NMI values over ten independent runs, with the best average values marked in boldface. The results in Table 3 show that the proposed method outperforms the other community detection methods in all cases. The average NMI of the developed approach over all social network datasets is 0.375, which is 0.026 higher than that of the second-ranked method (SSBM). Moreover, Table 4 shows that the developed approach achieved the highest accuracy and ranked first among all compared attributed community detection methods on all datasets except Wisconsin. On the Wisconsin dataset, the developed approach ranked second in accuracy, 0.001 below the SSBM method.

Sensitivity analysis
In this subsection, we analyze the sensitivity of the proposed method to its parameter. Fig. 9 shows the NMI performance metric as the parameter varies from 0.2 to 0.8 on the different datasets. The results show that setting the parameter to 0.6 yields the best NMI value. The parameter values are restricted to the range 0.2 to 0.8 to avoid extreme cases, where the method would no longer trade off the attribute similarity and structural similarity scores.
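The Python sketch below illustrates the kind of tradeoff this parameter controls. It combines a structural score and a feature-weighted attribute score for one linked pair of nodes as a convex combination weighted by `alpha`. Everything here is an illustrative assumption rather than the paper's Equation (4). This includes Jaccard neighbourhood overlap as the structural score, a feature-weighted cosine as the attribute score, and the placement of `alpha` on the attribute term. The names `edge_weight` and `feature_weights` are likewise hypothetical. Sweeping `alpha` over 0.2 to 0.8 shows the weight shifting between the two signals without either one vanishing.

```python
import math


def jaccard(neigh_u, neigh_v):
    """Structural similarity: overlap of the two neighbour sets (assumed measure)."""
    union = neigh_u | neigh_v
    return len(neigh_u & neigh_v) / len(union) if union else 0.0


def weighted_cosine(x_u, x_v, feature_weights):
    """Attribute similarity where each (non-negative) feature weight scales a feature;
    a weight of 0 removes an irrelevant or redundant attribute entirely."""
    dot = sum(w * a * b for w, a, b in zip(feature_weights, x_u, x_v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(feature_weights, x_u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(feature_weights, x_v)))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def edge_weight(alpha, neigh_u, neigh_v, x_u, x_v, feature_weights):
    """Hypothetical tradeoff: alpha * attribute score + (1 - alpha) * structural score."""
    attr = weighted_cosine(x_u, x_v, feature_weights)
    struct = jaccard(neigh_u, neigh_v)
    return alpha * attr + (1 - alpha) * struct


neigh_u, neigh_v = {1, 2, 3}, {2, 3, 4}  # Jaccard = 2/4 = 0.5
x_u, x_v = [1, 0, 1], [1, 1, 1]
feature_weights = [1.0, 0.0, 1.0]  # second attribute treated as irrelevant
# With the second attribute ignored, the attribute score is 1.0, so weight = 0.5 + 0.5 * alpha
for alpha in (0.2, 0.4, 0.6, 0.8):
    print(alpha, round(edge_weight(alpha, neigh_u, neigh_v, x_u, x_v, feature_weights), 3))
```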

Statistical analysis
In this subsection, the nonparametric Friedman test [66] is applied to assess the statistical significance of our findings. In other words, the Friedman test is used to compare the performance of the different community detection approaches on the different social network datasets. For this purpose, the community detection approaches are ranked on each social network dataset, and the test is carried out with the SPSS statistical package.
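The ranking step behind the Friedman test can be reproduced without a statistics package. The Python sketch below (function names hypothetical) ranks the methods on each dataset, giving rank 1 to the best score and letting tied methods share the mean of their positions. It then averages the ranks per method and computes the Friedman chi-square statistic $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $N$ datasets, $k$ methods, and mean ranks $R_j$. It omits the tie correction that some statistical packages apply, so values can differ slightly when ties occur.

```python
def average_ranks(row):
    """Rank scores within one dataset: 1 = highest score, ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend the block while the next method has exactly the same score
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def friedman(scores):
    """scores[d][m] = metric of method m on dataset d. Returns (mean ranks, chi-square)."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


# Hypothetical scores (not the paper's results): 3 datasets (rows) x 3 methods (columns)
scores = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.3, 0.2, 0.1]]
print(friedman(scores))  # ([1.0, 2.0, 3.0], 6.0)
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom, for example via `scipy.stats.chi2.sf(chi2, k - 1)`.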

Discussion
The developed approach has at least three major innovations that make it perform better than other methods:
1-Most previous community detection approaches largely ignore node attribute information and consider only the network structure, despite claiming to handle both aspects of the network at the same time. In this study, an improved attributed community detection method is developed and successfully tested. It integrates structural and attribute similarity through a novel similarity measure.
2-Irrelevant and redundant attributes strongly affect the performance of the learning model and the result of an attributed community detection method. Therefore, an attributed community detection method should identify and weight the relevant attributes of nodes and ignore irrelevant and redundant attributes as far as possible. Most previous attributed community detection algorithms neglect the attribute weighting phase and consequently fail to ignore irrelevant and redundant attributes accurately. To address this, this study develops an attributed community detection method that can efficiently and effectively ignore irrelevant attributes.
3-Popular community detection algorithms such as LPA ignore node centrality and assume that all nodes are equally important. However, this assumption is known not to hold in many real datasets, including those considered in our study. Our approach therefore accounts for this discrepancy by incorporating node centrality into the label updating process.
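The idea in item 3, letting more central nodes count for more when labels are updated, can be illustrated with a minimal label propagation loop. The Python sketch below is an illustration under stated assumptions, not the paper's algorithm. It uses degree as the centrality score, and each node adopts the label that maximizes the sum of edge weight times neighbour centrality. Nodes are visited in a seeded random order. A node keeps its current label when it is among the best, and otherwise picks randomly among tied labels. The names `propagate_labels`, `weights`, and `max_iter` are hypothetical. `weights` plays the role of a precomputed weight matrix, stored as a dictionary of neighbour dictionaries.

```python
import random
from collections import defaultdict


def propagate_labels(weights, max_iter=100, seed=0):
    """weights[u][v] = similarity-based weight of edge (u, v); assumed symmetric."""
    rng = random.Random(seed)
    nodes = list(weights)
    centrality = {u: len(weights[u]) for u in nodes}  # degree as a stand-in for popularity
    labels = {u: u for u in nodes}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        rng.shuffle(nodes)  # asynchronous updates in random order
        for u in nodes:
            if not weights[u]:
                continue  # an isolated node keeps its own label
            score = defaultdict(float)
            for v, w in weights[u].items():
                # A neighbour's vote is scaled by the edge weight and by its centrality
                score[labels[v]] += w * centrality[v]
            best = max(score.values())
            candidates = [lab for lab, s in score.items() if s == best]
            if labels[u] in candidates:
                continue  # keep the current label on ties to help convergence
            labels[u] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    return labels


# Two triangles joined by one weak edge (2-3)
weights = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 0.1},
    3: {2: 0.1, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
print(propagate_labels(weights))
```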

Conclusion and future works
Community detection is one of the most important research topics in graph mining and social network analysis. It involves clustering the nodes of an input social network into several communities that satisfy specified measures. In the last decade, community detection in attributed social networks has been widely studied. Although a large number of community detection methods have been developed, most of them are not appropriate for social networks with a large number of attributes. The reasons are their low efficiency, their high computational complexity, and the fact that they are not parameter-free. To overcome these weaknesses, this paper develops a novel attributed community detection method that combines feature weighting and node centrality techniques. The performance of the developed method was compared with that of different community detection methods for attributed social networks on six popular real-world datasets. The results show that the proposed method outperforms the previous methods and confirm its effectiveness for the attributed community detection problem. They also lay the foundations for new social network analytics that support large-scale community discovery using information from multiple attributes.
Because this article develops a new measure for calculating similarity in attributed social networks, this measure can also be used in other areas of social network analysis, such as link prediction, information diffusion, and node reputation. For example, future research could employ this similarity measure to predict links in protein-protein interaction networks.

Declaration
• Ethics approval and consent to participate: Not applicable

• Consent for publication: Not applicable
• Availability of data and materials: The dataset used in this study can be obtained from the corresponding author on reasonable request.

• Competing interests:
The authors declare that they have no competing interests.
• Funding: This work is supported by the Academy of Finland Profi5 Project No. 3261291 on DigiHealth.

• Authors' contributions
The specific contributions made by each author are as follows: