A novel indicator weight tuning method based on fuzzy theory in mobile social networks

Mobile social networks support mobile communication and asynchronous social networking. How to measure the importance of nodes is crucial, and this problem remains open. Most existing methods are subjective, so determining the weights of the centrality indicators is the key to solving the problem. In this paper, 9 common centrality indicators are taken as our research object. We introduce fuzzy theory to partition the indicator weights; more specifically, we define a membership degree function to get the initial weight interval. With relative entropy, the weight of each centrality indicator can be obtained. Using random data generated by simulation, a genetic algorithm with single point crossover is used to optimize the weight of each indicator. Experiments show that the optimized weights are more effective and better at differentiating nodes.


Introduction
Mobile social networks (MSNs) [1][2][3][4][5][6][7] are networks that combine device mobility with social communication. With phones or tablets, people who share common interests can create profiles, publish multimedia posts, exchange instant messages and play social games. What's more, MSNs are used in many fields such as fitness, music, dating, mobile payments and mobile commerce.
With the development of a variety of online social platforms, these platforms (such as QQ, Weibo, circle of friends, etc.) have become much more than places for users to communicate; they are the main medium for the generation and dissemination of social information. A mobile social network (MSN) is a mobile communication network centered on "people", in which the efficiency of communication is usually guided by people's social relations. Figure 1 shows the MSN model. The Influence Maximization (IM) [8][9][10][11][12][13][14][15][16] problem was proposed for the study of social networks, and it originates from marketing in economics. Using social network methods to analyze the social relations of mobile users can further improve the efficiency of information transmission and forwarding. Social network analysis (SNA) [17][18][19][20] is a method for explaining social phenomena, and it can also reveal certain social laws through quantitative analysis of attributes such as social attributes and relationship attributes. Finding the top-k nodes is the key to solving the IM problem, so researchers have proposed a number of centrality indicators to measure the importance of nodes. However, when these indicators are synthesized, their weights are mostly set manually, which is strongly subjective and has low credibility. In this work, we focus on determining the weights of the centrality indicators, which is essential for identifying vital nodes. We combine relative entropy with fuzzy theory to obtain the search space, and we use a genetic algorithm to obtain the optimized weights. In summary, our contributions are as follows.
• To capture the differences among the 9 centrality measures, we use relative entropy to calculate their initial weights, and we introduce fuzzy theory to determine the weight interval of each chosen centrality indicator.
• A genetic algorithm is used to find the optimal weights, and the results show that our method increases the objectivity and credibility of this evaluation method.

Related work
There has been much research on the identification of important nodes in networks. From the perspective of network analysis, centrality indicators can be divided into two groups: local-based centrality indicators and path-based centrality indicators. The local-based influence measurement method uses the local properties or topology of nodes to calculate their influence. This method has the advantage of simplicity and ease of operation, and the disadvantage of low accuracy, because it ignores the role of nodes in the overall network. If we take only the links held by a node into consideration, the importance of the node can be denoted by degree centrality. Degree centrality (DC) [21,22] is a typical method based on local information, and it reduces the influence of a node to the number of its neighbor nodes. In a social network, a node represents a person and an edge represents a friendship, so DC holds that a person with more friends is more important. In the human protein-protein interaction network [40], hub proteins play a key role in realizing protein functions and life activities. DC considers hub proteins that interact with multiple proteins to be more important. Although DC is simple and easy to understand, it lacks precision and relevance in some cases, because nodes with fewer neighbors may be more important than nodes with more neighbors. It may be discarded because it considers only the direct ties of a node rather than the indirect ones. Suppose a node is linked to many neighbors that are not connected to each other within the network; in that case, we can say that the node is relatively central in the local scope. A node's influence depends on its ability to influence its neighbors. H-index centrality (HIC) [41] was introduced to measure the importance of nodes. When global information is considered, distance is a key factor, and Lü et al. [42] generalized the concept of HIC and proposed n-order HIC. K-shell centrality (KS) holds that the location of a node in the network determines its importance. The larger the shell value of a node, the closer the node is to the center of the network and the more important it is.

Path-based centrality usually takes information about the entire network into account. Closeness centrality (CC) [24,25] calculates the "closeness" of each node to the others in the network. Bavelas was the first to come up with betweenness centrality (BC), in 1948 [23]; it refers to the number of times a node lies in the favored position between other pairs of nodes in the network. In the degree centrality indicator, nodes with more connections are considered more important. In reality, however, having more friends does not ensure that a person is important, and having more important friends is deemed to provide more powerful information. That is to say, we try to summarize the importance of a node in terms of the importance of its neighbor nodes. Katz centrality (Katz) [26] can distinguish the importance of different neighbors by assigning different weights to neighbor nodes. Burt proposed the structural hole (SH) [43] in 1992; he regarded an SH as the "bridge" between two groups, located in a gap of the network.
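To make the contrast between a local-based and a path-based indicator concrete, the following minimal sketch computes DC and CC on a small graph. The graph `toy_graph` and the function names are hypothetical, the usual textbook normalizations are assumed (degree divided by n - 1, and n - 1 divided by the sum of shortest-path distances), and the graph is assumed to be connected and unweighted.

```python
from collections import deque

def degree_centrality(adj):
    """Degree of each node divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(n - 1) divided by the sum of BFS distances; assumes a connected graph."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[source] = (n - 1) / sum(dist.values())
    return result

# Hypothetical 5-node graph: a path 1-2-3-4 with an extra leaf 5 attached to node 3.
toy_graph = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
dc = degree_centrality(toy_graph)
cc = closeness_centrality(toy_graph)
```

In this toy graph both indicators rank node 3 highest, but they order the remaining nodes differently, which is the kind of disagreement that motivates weighting several indicators.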
Because networks are dynamic, several iteratively updated centrality indicators have been proposed in succession. Eigenvector centrality (EC) [29,30] is one such indicator, and it is a good "all-over" centrality indicator. Based on this indicator, PageRank [44], which can be used in weighted and directed networks, takes direction and weight into account. Qi

Problem description
Networks are a general language for describing complex relationships. An MSN can be seen as a graph G(V, E), which is composed of nodes V and links E. Individuals correspond to the nodes, and relationships between individuals are expressed by links. Inevitably, there is subjectivity in determining the weights of the multiple centrality indicators of a node, so we combine fuzzy theory with information entropy to design an optimization method for indicator weights and apply it to the determination of centrality indicator weights. The notations used in this article are shown in Table 1.
As shown in Figure 2, the algorithm designed in this paper introduces fuzzy theory in order to use special membership functions to turn the preliminary weights given by experts into intervals. These intervals are taken as the search space of the genetic algorithm, so that the initial weights are fine-tuned within a specific space.

Preliminary determination of indicator weight
Selection of centrality indicators
The influence of nodes can be reflected by assigning corresponding weights to nodes; these weights are also called the centralities of nodes. There is no uniform definition or standard for what an important node is. Different methods measure the importance of nodes from different perspectives. We choose the most commonly used centrality indicators and list them in Table 2.
Then we use formula (1), in which $x_i$ is the value of the $i$-th node, to normalize the column elements of Table 2. This maps the data uniformly to the interval between 0 and 1. To be technically accurate, information is a change of entropy [31]. That is to say, entropy is the amount of information that we do not know. So even though information is hard to define, the amount of information is easy to define, and there is a very simple way of measuring it in terms of bits. When you get information about a system, you reduce its entropy. We can then measure the weight of each centrality indicator by information entropy, as shown in formula (2), and calculate the corresponding value of each centrality indicator included in Table 2.

$$S = k \ln W \tag{2}$$
where $k$ is Boltzmann's constant and $W$ is the number of microscopic states or configurations. As a result, the smaller the information entropy of an indicator, the more the indicator varies, the more information it provides, and the larger its corresponding weight.

Figure 3 clearly shows the differences in the important nodes identified by different centrality indicators in the same network graph. In the first subgraph of Figure 3, we use 6 different colors to partition the nodes according to DC; node 7 has the maximum DC and is marked pink. The nodes with a DC value of 3 are node 3, node 6 and node 8, and they are all marked purple. Node 1 has the minimum DC and is marked orange. So if we choose DC as the criterion for node importance, node 7 is obviously more important than node 1. The second subgraph of Figure 3 shows that node 3, marked in pink, has the maximum BC because it is a cut vertex in the network graph. Node 1, node 6 and node 8 have the same BC value of 0 since they are not at the hub of the network. In the third subgraph, we also use 6 colors to divide the nodes. Node 3 and node 7 share the maximum CC, while node 1 has the minimum CC value of 0.31. In addition, the CC values of all nodes vary relatively little. The EC-based indicator is used to distinguish influential nodes in the fourth subgraph of Figure 3. Node 1 has the minimum EC and is marked orange. The most influential nodes are node 4 and node 5, as they are not far from any of the other nodes.

Calculation of indicator weight
We determine the indicator weights according to the data characteristics of the selected centrality indicators. Since we can obtain the centrality indicators of each node, we choose relative entropy (RE) [32,33] to calculate the initial weight of each centrality indicator. We first use the network in Figure 4 to introduce the calculation of the initial weight of each indicator. We calculated the centrality indicators for all nodes and listed the results in Table 3. Algorithm 1 describes the process of using RE to calculate the initial weights.

…
3: Place the calculated values in Table 2 as columns;
4: Normalize the columns of Table 2 by formula (1);
5: Measure the weight of each centrality indicator by formula (2);
6: end for
7: return

The indicator weight of each node can be found in Table 3.
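The normalize-then-weight steps of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: formula (1) is taken to be min-max normalization, and the weighting step uses the standard entropy weight method (Shannon entropy of each normalized column), which may differ from the paper's formula (2) and from its exact RE computation. The table values and function names are hypothetical.

```python
import math

def min_max_normalize(column):
    """Map a column to [0, 1]; assumed form of formula (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def entropy_weights(table):
    """Entropy weight method: columns with lower entropy get larger weights."""
    divergence = {}
    for name, column in table.items():
        norm = min_max_normalize(column)
        total = sum(norm)
        if total == 0:
            entropy = 1.0  # a constant column carries no information
        else:
            probs = [v / total for v in norm]
            entropy = -sum(p * math.log(p) for p in probs if p > 0)
            entropy /= math.log(len(column))  # scale entropy to [0, 1]
        divergence[name] = 1.0 - entropy
    total_divergence = sum(divergence.values())
    if total_divergence == 0:
        return {name: 1.0 / len(table) for name in table}
    return {name: d / total_divergence for name, d in divergence.items()}

# Hypothetical indicator values for 5 nodes (not the paper's Table 2 or Table 3).
table = {
    "DC": [1, 2, 3, 2, 5],
    "CC": [0.31, 0.40, 0.45, 0.40, 0.42],
}
weights = entropy_weights(table)
```

The design matches the statement that a smaller entropy leads to a larger weight: here the DC column is more unevenly distributed than the CC column, so it receives the larger share.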

Determination of membership degree function (MDF)
Firstly, the fuzzy sets [35,36] A1, A2 and A3 are taken to represent three levels of indicator weights, namely "small", "moderate" and "large" respectively, and the corresponding MDFs [34] are generated, as shown in Figure 4. In this paper, the Gaussian function [45] is used to represent the fuzzy sets.
where $c_i$ is the mean of the indicator weights and $\sigma$ is the corresponding variance. For parameter setting, in order to subdivide the indicator weights, the variance of the normal MDF is determined by the interval range formed by the initial weight values. In other words, the variance is adjusted until the intercept of the Gaussian MDF of the fuzzy set A2 on the X-axis is exactly equal to the interval formed by the initial weight values. In general, the three MDFs in this paper have the same variance. The means of the three normal MDFs are set to the minimum, the mean and the maximum of the initial indicator weight set, so that the MDFs cover the indicator weights more evenly. The relevant parameters are shown in Table 4.
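A minimal sketch of the three Gaussian MDFs is given below. It assumes the common form exp(-(x - c)^2 / (2σ^2)), with `sigma` used as the spread parameter; the exact expression in the paper is not reproduced here. The initial weights and the value of `SIGMA` are hypothetical, not the parameters of Table 4.

```python
import math

def gaussian_mdf(x, center, sigma):
    """Gaussian membership degree of x for a fuzzy set centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fuzzy_set_centers(initial_weights):
    """Centers of A1, A2, A3: minimum, mean and maximum of the initial weights."""
    mean = sum(initial_weights) / len(initial_weights)
    return {"A1": min(initial_weights), "A2": mean, "A3": max(initial_weights)}

def membership_degrees(x, centers, sigma):
    """Membership degree of x in each fuzzy set, using one shared sigma."""
    return {name: gaussian_mdf(x, c, sigma) for name, c in centers.items()}

# Hypothetical initial weights and spread.
initial_weights = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20]
SIGMA = 0.03
centers = fuzzy_set_centers(initial_weights)
degrees = membership_degrees(0.10, centers, SIGMA)
```

With these illustrative values, a weight of 0.10 has its highest membership degree in A2, the "moderate" set.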

Determination of weight interval
The initial value of each indicator weight is substituted into the MDF to calculate its membership degree (MD). According to the principle of maximum MD, the grade of each of the 9 initial indicator weights is determined. The purpose of this paper is to ensure that the change of an indicator weight does not exceed its existing level. From the MDFs, the X-coordinates of the intersections of the three MDFs in Figure 4 can be calculated as 0.08 and 0.185, respectively. Thus, the change interval of each indicator weight can be obtained, as shown in Table 5.
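The grading and interval steps can be sketched as follows, again assuming Gaussian MDFs with a shared spread. Under that assumption, two adjacent curves intersect at the midpoint of their centers. The centers below are hypothetical values chosen only so that the midpoints reproduce the 0.08 and 0.185 quoted in the text; the outer bounds `lower` and `upper` are also assumptions, since Table 5 is not available here.

```python
import math

def gaussian_mdf(x, center, sigma):
    """Gaussian membership degree (assumed standard form)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def grade(weight, centers, sigma):
    """Maximum-membership principle: pick the level with the largest MD."""
    return max(centers, key=lambda name: gaussian_mdf(weight, centers[name], sigma))

def weight_interval(weight, centers, sigma, lower=0.0, upper=1.0):
    """Return the level of `weight` and the interval it may vary in."""
    # With equal sigma, adjacent Gaussians cross midway between their centers.
    x12 = (centers["small"] + centers["moderate"]) / 2.0
    x23 = (centers["moderate"] + centers["large"]) / 2.0
    intervals = {
        "small": (lower, x12),
        "moderate": (x12, x23),
        "large": (x23, upper),
    }
    level = grade(weight, centers, sigma)
    return level, intervals[level]

# Hypothetical centers and spread.
CENTERS = {"small": 0.05, "moderate": 0.11, "large": 0.26}
SIGMA = 0.04
level, interval = weight_interval(0.10, CENTERS, SIGMA)
```

For an initial weight of 0.10, this sketch assigns the "moderate" level and restricts the search to roughly the interval from 0.08 to 0.185.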

Optimization of indicator weight
In order to simulate the evaluation process, this paper generates 9 sets of random numbers based on 9 centrality weight indicators, and each set of data contains 1 000 random numbers ranging from 0 to 1 as the initial scoring data of each indicator.
In this paper, the weights of a group of evaluation indicators are taken as the calculation unit, and the variance of the evaluation results calculated from them is taken as the fitness function. We use this fitness function to design the genetic algorithm (GA) [37] to solve the following mathematical problem, in which each weight is constrained by

$$x_{i1} \le x_i \le x_{i2}, \quad i \in \{1, 2, \cdots, 10\} \tag{7}$$

where $Y$ represents a set of generated evaluation results, $Z$ is the random score matrix of each indicator, $X$ refers to any set of indicator weights, $x_i$ indicates the weight of the $i$-th indicator in this group, and $x_{i1}$ and $x_{i2}$ respectively represent the lower and upper bounds of the weight fluctuation. Formula (5) maximizes the variance of a group of evaluation results. Formula (6) is the matrix representation of how the evaluation results are generated. Formula (7) indicates that no weight value may exceed its corresponding fluctuation range. Formula (8) guarantees that the indicator weights in the same group sum to 1.

Figure 6. Comparison of coverage between groups with different variances (axes: the range of weight, the probability density).
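The fitness evaluation described above can be sketched as follows. The sketch assumes that the evaluation results are the weighted sums Y = ZX and that the fitness is the population variance of Y; the bounds, the seed and all function names are hypothetical.

```python
import random
import statistics

def evaluation_results(scores, weights):
    """Y = Z X: one weighted score per row of the score matrix Z."""
    return [sum(z * x for z, x in zip(row, weights)) for row in scores]

def is_feasible(weights, bounds, tol=1e-9):
    """Check the interval constraint on each weight and that the weights sum to 1."""
    in_range = all(lo - tol <= x <= hi + tol for x, (lo, hi) in zip(weights, bounds))
    return in_range and abs(sum(weights) - 1.0) <= tol

def fitness(scores, weights):
    """Variance of the evaluation results, to be maximized by the GA."""
    return statistics.pvariance(evaluation_results(scores, weights))

# 1 000 random scores in [0, 1) for each of 9 indicators, as in the simulation setup.
rng = random.Random(0)  # hypothetical seed, for reproducibility only
N_INDICATORS = 9
scores = [[rng.random() for _ in range(N_INDICATORS)] for _ in range(1000)]

uniform_weights = [1.0 / N_INDICATORS] * N_INDICATORS
bounds = [(0.0, 0.185)] * N_INDICATORS  # hypothetical intervals
results = evaluation_results(scores, uniform_weights)
```

A GA would call `fitness` on each candidate chromosome and discard or repair candidates for which `is_feasible` returns `False`.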
In the current research, the basic idea of negative feedback adjustment of the indicator weights based on the variance of the evaluation results is as follows: if a change in an indicator causes no significant change in the evaluation result, the weight of that attribute should be 0. Conversely, the larger the difference it produces in the evaluation results, the larger the attribute weight. The variance statistically reflects differences in the level of an important indicator. Based on the idea of maximum variance, a set of weights should make the corresponding evaluation results reach the maximum total variance [38,39], so that the overall coverage of the evaluation results better matches the actual situation, as shown in Figure 6.
It is clear that the blue curve allows a wider range of weights than the green curve in Figure 6; that is, the distribution of evaluation results in group a, with larger variance, is more extensive than that in group b, with smaller variance. Obviously, the evaluation results of group a make it easier to distinguish importance in the process of node identification. The steps of the GA optimization algorithm that selects the final indicator weights with the maximum variance are shown in Algorithm 2.

…
6: Perform the single point crossover to generate offspring;
7: Calculate fitness for these offspring;
8: end for
9: end while
10: return The optimal solution with maximum variance.
In the GA design of this paper, a chromosome is composed of a set of indicator weights. The main problem is therefore to make sure that, after mutation and crossover, the offspring chromosomes still satisfy formula (8), that is, the weights sum to 1. We choose a single point crossover method: when crossover occurs at any gene of the two parent chromosomes, the sum of the genes changes, and this change is shared among the other 8 genes that did not cross, which ensures that the genes of each offspring chromosome sum to 1, as shown in Figure 7. As a result, this method does not change the sum of the genes. At the same time, it keeps the changes of the uncrossed genes from exceeding their fluctuation range as far as possible. The indicator weights were optimized using the randomly generated data, the weight intervals obtained by fuzzy theory, and the GA designed above. We ran the algorithm 6 times, each time on a different set of random scores. The crossover probability in the GA is 0.9, the maximum number of iterations is 1 000, and the convergence of the computation is shown in Figure 8.
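One possible reading of the sum-preserving crossover is sketched below. It assumes that the two parents exchange the gene at a single position and that the resulting change in the sum is spread equally over the other 8 genes; the paper does not state how the change is shared, so the equal split, the parent values and the function name are assumptions. The sketch does not enforce the weight intervals, which a full implementation would still need to check.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the gene at `point`, then share the change over the other genes."""
    child_a, child_b = list(parent_a), list(parent_b)
    child_a[point], child_b[point] = parent_b[point], parent_a[point]

    # Change in child_a's sum caused by the swap (child_b changes by -delta).
    delta = parent_b[point] - parent_a[point]
    share = delta / (len(parent_a) - 1)
    for i in range(len(parent_a)):
        if i != point:
            child_a[i] -= share  # uncrossed genes absorb the change equally
            child_b[i] += share
    return child_a, child_b

# Hypothetical parent chromosomes: 9 weights each, both summing to 1.
parent_a = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.10, 0.10, 0.10]
parent_b = [0.10, 0.10, 0.10, 0.15, 0.05, 0.20, 0.12, 0.08, 0.10]
child_a, child_b = single_point_crossover(parent_a, parent_b, point=4)
```

Because each child absorbs exactly the amount introduced by the swapped gene, both offspring keep a gene sum of 1 without a separate renormalization step.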

Experiment analysis
It can be seen from Figure 8 that, in multiple optimizations based on different random scores, the algorithm achieves convergence around 8 times. The average value of the optimal chromosomes over the 6 optimizations is taken and normalized to serve as the final indicator weights for calculating evaluation results in the node identification process. The optimization results of each indicator weight are shown in Table 6. The Op-average column gives the mean of the results of the 6 optimizations.
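The averaging and normalization of the optimal chromosomes can be sketched as follows. The run results below are hypothetical and use only 3 weights and 3 runs for brevity; they are not the values of Table 6.

```python
def average_and_normalize(runs):
    """Average each weight over all runs, then rescale so the weights sum to 1."""
    n_runs = len(runs)
    averaged = [sum(column) / n_runs for column in zip(*runs)]
    total = sum(averaged)
    return [w / total for w in averaged]

# Hypothetical optimal chromosomes from 3 runs of the GA.
optimal_runs = [
    [0.20, 0.30, 0.50],
    [0.25, 0.30, 0.45],
    [0.15, 0.35, 0.50],
]
final_weights = average_and_normalize(optimal_runs)
```

If every run already sums to 1, the average does too, and the normalization only removes rounding drift.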

Results
In this work, we use 9 generated datasets as the initial scores of the centrality indicators, with values ranging from 0 to 1. Then, with the proposed quantitative method, the optimal weights of the 9 centrality indicators can be obtained. What's more, the method avoids subjectivity and obtains the weight of each indicator more objectively.

Discussion
In view of the influence of important nodes on the robustness of network structure and the direction of network evolution, much research has focused on the identification of key nodes, and many centrality measures have been presented. Given the importance of centrality measures for identifying important nodes, we introduced fuzzy theory and the GA mechanism. In this paper, we proposed a quantitative method to solve the problem of indicator weight determination in vital node identification. Specifically, relative entropy is used to define the initial weights of 9 centrality indicators, and the MDF is applied to determine the weight intervals. Then, the GA is used to obtain the optimal weights. Through the comprehensive consideration of these centrality indicators, we gain a further understanding of the social relationships in mobile social networks, and at the same time, we can use these relationships to further consider how to improve the efficiency of communication.