A proposal for the EI index for fuzzy groups

In this article, we propose a measure that quantifies the relational structure within and between groups and that covers not only disjoint and non-disjoint groups, but also fuzzy groups. This measure is based on the existing measure known as the EI index. The current EI index is a measure of homophily applied to networks with disjoint groups, although disjoint groups rarely exist on a large scale in many empirical networks. In addition, we propose combining edge and node weights in the evaluation of the EI index. We tested the measure on two networks from different contexts. The first is a co-authorship network, where researchers, the actors in the network, are divided according to the time of Ph.D. completion. The second network is formed by trade relations between countries of the American continent, where countries are grouped according to the Human Development Index. The application of the proposed measure to these two networks is justified by the imprecision of the information or by the difficulty of allocating nodes to a specific group, which makes it necessary to define affiliation levels. Therefore, the new measure expands the analysis of social networks to different types of attributes, thus generating previously unexplored knowledge.


Introduction
In general, a social network is a structure formed by nodes (actors) and edges (interactions) used in studies of the relationships between individuals, groups or organizations. Focused essentially on topological structure, social networks studies apply a set of methods and measures to identify, visualize and analyze social networks looking for patterns of interactions and their implications (Newman 2001b, a).
In several networks, it is common to observe that actors tend to have affinities or similarities (attributes) with their peers. According to Crandall et al. (2008), there are two mechanisms behind this. First, actors can modify their behavior to bring it more in line with the behavior of their peers, a process known as social influence (Friedkin 2006). Second, in an effect termed homophily, actors tend to form relationships with others who are already like them. In other words, in homophily, individual characteristics drive the formation of links, while in social influence, the links existing in the network shape actors' characteristics. Kim and Altmann (2017) mention that the nature of homophily is shown in many empirical and theoretical studies. Their study also concluded that homophily affects network formation. Homophily is the term used for the preference of actors to connect with other actors who share common attributes (McPherson et al. 2001). In studies on homophily, we seek to know whether the nodes of a network disproportionately establish links with others that resemble them in some way, that is, we want to verify whether relations occur more often between actors that have similar attributes.
However, actors can belong to many associative groups simultaneously, with various levels of affiliation, and distinct disjoint groups rarely exist on a large scale in many empirical networks (Leskovec et al. 2008). Saha et al. (2014) also mention that people participate in a wide variety of groups. In addition, Lee and Brusilovsky (2017) point out that society is currently driven by information and knowledge, which generates new homophily dimensions. Information, knowledge and attributes such as economic blocs in commercial networks, communities on social networks such as Facebook and Twitter, and other attributes linked to behaviors, tastes and attitudes generate non-disjoint groups. Currently, publications that use the EI index as a measure of homophily concentrate on disjoint or mutually exclusive groups. Situations in which network actors are present in more than one group are not commonly explored. One of the barriers to the analysis of non-disjoint groups is the absence of a suitable measure, since the EI index is defined for disjoint groups (Andrade and Rêgo 2019).
Motivated by this gap, Andrade and Rêgo (2019) suggest a method that generalizes the E I index developed by Krackhardt and Stern (1988). This method quantifies the relational structure within and between groups that encompasses the analysis of both disjoint and non-disjoint groups. Furthermore, we observe that the process of social influence has already been studied in the context of fuzzy groups (Li and Wei 2019;Khalid and Beg 2019).
In this context, the objective of this work is to expand the generalized metric suggested by Andrade and Rêgo (2019), adapting it to also cover groups in which the nodes present several levels of affiliation, that is, fuzzy groups. An advantage of this approach is the ability to address, for example, networks that describe political behavior, studying relationships between voters with different positions in the political spectrum, and friendship networks with bilingual speakers, analyzing the relationships between speakers with different levels of language fluency. In our work, we analyzed two networks. The first is a co-authorship network formed by researchers with a Ph.D. in production engineering, where the time of Ph.D. completion defines the fuzzy groups. The other network is formed by trade relations between American countries, in which we use the Human Development Index (HDI) to form fuzzy groups. This paper is organized as follows. In Sect. 2, we briefly present the EI index proposed by Krackhardt and Stern (1988), which measures homophily in networks with disjoint groups. Then, in Sect. 3, we present our measure, which is a generalization of the current EI index, encompassing fuzzy groups. Two applications of the proposed measure are made in Sect. 4. Finally, we discuss the results of the applications in Sect. 5 and present conclusions.

EI index
The EI index, proposed by Krackhardt and Stern (1988), essentially quantifies the relational structure within and between groups (Everett and Borgatti 2012; Krackhardt 1994). The EI index was implemented in the popular social network analysis package UCINET (1999) as a measure for homophily. This measure analyzes the tendency of people to connect with others similar to them, as well as social insertion, i.e., how a node or group of nodes decides to connect to other nodes in a network (Hanneman and Riddle 2005).
Homophily is one of the most widespread and robust trends in human interaction, describing how people tend to seek out and interact with others who are more like them, often characterized as "birds of a feather" (McPherson et al. 2001). As a mechanism of social relations, it can explain the group composition in terms of social identities ranging from ethnicity to age (Lazarsfeld et al. 1954). Indeed, ethnicity, geography and kinship are the main motivating factors behind homophilic practices (McPherson et al. 2001). Everett and Borgatti (2012) are among the researchers who treat the EI index as a measure of homophily and heterophily, where smaller values (internal connections) indicate greater homophily and larger values (external connections) indicate lower homophily or greater heterophily. The EI index as a measure of homophily is essentially used to quantify the individuals' propensity to interact with similar actors (Burt 1991; McPherson et al. 2001). In addition, the EI index can be used as a segregation measure (Sweet and Zheng 2017), where segregation is defined as the "unequal" distribution of two or more groups of people in different units or social positions (Bojanowski and Corten 2014).
The EI index is defined as the difference between the intergroup and intragroup ties divided by the total number of ties for normalization. It is a simple and attractive measure of homophily because it does not depend on the density of the network (Everett and Borgatti 2012). Formally, the EI index is given by

$$EI = \frac{EL - IL}{EL + IL}, \tag{1}$$

where EL is the number of external links (links between nodes belonging to different groups) and IL is the number of internal links (links between nodes belonging to the same group). The EI index ranges from −1 (all ties are internal) to +1 (all ties are external). The index can be calculated for the entire network, for each group or for each individual actor.
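The definition can be illustrated with a minimal Python sketch for the whole-network, unweighted case. The function name `ei_index`, the toy edge list and the group labels are hypothetical and serve only as an illustration.

```python
def ei_index(edges, group):
    """Whole-network EI index: (EL - IL) / (EL + IL).

    edges: list of undirected ties (u, v)
    group: dict mapping each node to the label of its (disjoint) group
    """
    external = sum(1 for u, v in edges if group[u] != group[v])  # EL
    internal = len(edges) - external                             # IL
    if external + internal == 0:
        raise ValueError("EI index is undefined for a network without ties")
    return (external - internal) / (external + internal)


# Hypothetical toy network: three internal ties and one external tie.
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("c", "d")]
group = {"a": "G1", "b": "G1", "c": "G1", "d": "G2", "e": "G2"}

print(ei_index(edges, group))  # (1 - 3) / 4 = -0.5
```

The negative value reflects the predominance of internal ties in this toy network.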
Although commonly used in unweighted networks, the EI index has also been used in weighted networks by some authors, such as Andrade and Rêgo (2018) and Danchev and Porter (2016). In weighted networks, the EI index is calculated using the weights of the edges: EL is the sum of the weights of the edges that connect different cells of the partition and IL is the sum of the weights of the edges that connect actors of the same cell of the partition. As with the unweighted network, the EI index for weighted networks assumes values between −1 and +1. Generally, the weight of an edge represents the frequency or strength of the relationship. Therefore, a value of the EI index close to −1 means that the internal relations are stronger or more intense, while a value close to +1 shows that external relations are stronger or more intense.
In recent years, the inclusion of numerical attributes has been observed in the analysis of social networks. Attributes are resources of nodes and are used to give weight to them, representing their importance or contribution in the network (Andrade and Rêgo 2018; Liu et al. 2015; Benyahia and Largeron 2015). In this work, we will also consider the nodes' weights and insert them in the topological structure of the network. For this, we use the method proposed by Andrade and Rêgo (2018). By this method, the edge weight is equal to the frequency or strength of the relationship between two nodes multiplied by the average weight of the two nodes. The intuition is that, in cases where information about quantitative features of nodes is available, the weight of a link should not only depend on the strength of the connection (original edge weight), but also on the average importance of the connected nodes. Formally, if v_i is the weight of node i and w_{ij} is the original weight of the link between nodes i and j, then, including the nodes' weights, the new edges' weights are given by

$$z_{ij} = w_{ij}\,\frac{v_i + v_j}{2}.$$

The inclusion of the nodes' weights contributes to a more efficient analysis of the network by combining factors inherent to the network with external factors (Andrade and Rêgo 2018). External factors attribute a certain "status" to individuals in the network and through the EI index it is possible to verify whether this status also influences the formation of relationships. However, this conclusion is only reached by comparing it with the EI index without considering external factors.
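As a minimal sketch of this weighting rule, the Python code below multiplies each original edge weight by the average of the two node weights and then evaluates the EI index on the original and on the node-weighted edges. The function names and all numerical values are hypothetical.

```python
def node_weighted_edges(w, v):
    """Return new edge weights: original weight times the mean of the two node weights."""
    return {(i, j): w_ij * (v[i] + v[j]) / 2 for (i, j), w_ij in w.items()}


def weighted_ei(x, group):
    """EI index where EL and IL are sums of edge weights instead of tie counts."""
    external = sum(val for (i, j), val in x.items() if group[i] != group[j])
    internal = sum(val for (i, j), val in x.items() if group[i] == group[j])
    return (external - internal) / (external + internal)


# Hypothetical edge weights, node weights and disjoint groups.
w = {("a", "b"): 2.0, ("b", "c"): 1.0, ("c", "d"): 3.0}
v = {"a": 4.0, "b": 2.0, "c": 6.0, "d": 10.0}
group = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}

z = node_weighted_edges(w, v)
print(z)                      # node-weighted edge weights
print(weighted_ei(w, group))  # edge weights only
print(weighted_ei(z, group))  # edge and node weights
```

Comparing the two printed indexes mirrors the comparison described in the text: the effect of node "status" only becomes visible against the index computed without node weights.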

EI index: fuzzy case
Every day, when describing certain phenomena (characteristics), we use degrees that represent qualities or partial truths.
As an example, let us consider the group of elderly people. There are at least two approaches to mathematically formalize this set. The first is to fix the age from which an individual is considered elderly, for example, A = {x : x ≥ 65}, where x is an individual's age measured in years. In this case, the set is well-defined. The second, less conventional, approach considers individuals elderly to a greater or lesser extent, that is, there are elements that belong more to the elderly class than others. This means that the younger the individual, the lower his or her degree of belonging to that class. Thus, we can say that individuals belong to the elderly class with greater or lesser intensity. Mathematically, fuzzy sets are sets whose elements have degrees of membership. As opposed to traditional sets, to which elements either belong or not, a fuzzy set B is defined by specifying a membership function, μ_B : Ω → [0, 1], where, for an element w of the universe Ω, μ_B(w) represents the extent to which w belongs to B, and higher values of μ_B(w) indicate a higher membership degree. The formalization of fuzzy sets was presented by Zadeh (1996) as an extension of the classical notion of sets.
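The contrast between the two formalizations can be sketched in Python. The crisp set uses the threshold of 65 years from the example; the fuzzy membership function is a simple linear ramp whose bounds (50 and 80 years) are assumptions chosen only for illustration.

```python
def crisp_elderly(age):
    """Classical set A = {x : x >= 65}: membership is either 0 or 1."""
    return 1.0 if age >= 65 else 0.0


def fuzzy_elderly(age, lower=50.0, upper=80.0):
    """Fuzzy membership rising linearly from 0 at `lower` to 1 at `upper`.

    The bounds are hypothetical; any non-decreasing function into [0, 1] would do.
    """
    if age <= lower:
        return 0.0
    if age >= upper:
        return 1.0
    return (age - lower) / (upper - lower)


for age in (40, 64, 65, 90):
    print(age, crisp_elderly(age), fuzzy_elderly(age))
```

Under the crisp definition, ages 64 and 65 fall on opposite sides of the boundary, whereas their fuzzy membership degrees are almost the same.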
To explore cases of fuzzy groups, we have developed a new metric to obtain the E I index, which is an adaptation of the metric proposed by Andrade and Rêgo (2019) to generalize the original E I index measure for use with overlapping groups.
Let A be the set of all attributes for nodes in a social network with n nodes. For X ∈ A, let μ_X(v_i) be the membership level of node v_i in the group with attribute X, with 0 ≤ μ_X(v_i) ≤ 1. Moreover, for a generic set of nodes, S, consider the following sets of indices: [definitions missing in the source]. Thus, the number of external and internal links for a generic set of nodes, S, is given, respectively, by [equations missing in the source], where in the unweighted case x_{ij} is 1 or 0 depending on whether or not there is a link between nodes v_i and v_j, in the case of only edge weights x_{ij} = w_{ij}, and in the case of edge and node weights x_{ij} = z_{ij}. Alternatively, for X ∈ A, we can define the number of external and internal links for the group of nodes, S_X, which has attribute X, respectively, as [equations missing in the source], where x_{ij} is defined exactly as before.
Since membership functions by definition assume values between 0 and 1 and the definitions of external and internal links involve products of membership functions, in order to avoid overestimating the external links, we recommend the use of trapezoidal membership functions. In order to obtain the trapezoidal membership functions, we suggest performing the following steps: (i) Determine the highest value before which the degree of membership is known to be null. (ii) Determine the lowest value from which it is known for certain that the degree of membership is null. (iii) Determine the lowest value with degree of membership 1. (iv) Determine the highest value with degree of membership 1.
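The four steps map directly onto the four parameters of a trapezoidal membership function. In the Python sketch below, `a` and `d` are the values obtained in steps (i) and (ii) and `b` and `c` those of steps (iii) and (iv); the parameter names and the example values (years since Ph.D. completion for a hypothetical "experienced" group) are assumptions for illustration.

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership function with support (a, d) and core [b, c].

    a: highest value before which membership is null      (step i)
    d: lowest value from which membership is null again   (step ii)
    b: lowest value with membership 1                     (step iii)
    c: highest value with membership 1                    (step iv)
    """
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")
    if b <= x <= c:            # core: full membership
        return 1.0
    if x <= a or x >= d:       # outside the support: null membership
        return 0.0
    if x < b:                  # rising edge between a and b
        return (x - a) / (b - a)
    return (d - x) / (d - c)   # falling edge between c and d


# Hypothetical "experienced" group, in years since Ph.D. completion.
for years in (5, 7.5, 12, 22.5, 30):
    print(years, trapezoidal(years, a=5, b=10, c=20, d=25))
```

Setting `a == b` or `c == d` yields a one-sided (shoulder) function, which suits the first and last groups of an ordered attribute.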
To illustrate how the new metric works, we present a simple example on a specific network. Suppose there is a network with four nodes that belong, with different membership levels, to two groups, A and B (as shown in Figure 1). Let us calculate the EI index for the set of nodes {1, 2}. Note that nodes 1 and 2 have no connection and that node 0 is connected to both of them. Disregarding the edges' and nodes' weights, we have x_{10} = 1 or x_{01} = 1 and x_{20} = 1 or x_{02} = 1 [remaining equations of the example missing in the source]. It is easy to verify that the proposed metric is a generalization of the EI index proposed in Krackhardt and Stern (1988) in the sense that if groups are disjoint and the membership functions are either 0 or 1, then it coincides with (1).
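For intuition only, the following Python sketch shows one plausible product-based way of splitting each tie into internal and external parts. It is an assumption of this sketch, not the definition proposed in this paper: each tie contributes the product of same-group memberships to IL and the product of different-group memberships to EL. The memberships and ties are hypothetical and do not reproduce Figure 1. With crisp memberships of disjoint groups the sketch reduces to the classical index.

```python
def fuzzy_ei(edges, mu):
    """Whole-network EI under an ASSUMED product-based split of each tie.

    edges: dict mapping (i, j) to the tie value x_ij
    mu:    dict mapping node -> {group label: membership in [0, 1]}
    """
    internal = 0.0
    external = 0.0
    for (i, j), x_ij in edges.items():
        for g_i, m_i in mu[i].items():
            for g_j, m_j in mu[j].items():
                if g_i == g_j:
                    internal += x_ij * m_i * m_j   # same group on both ends
                else:
                    external += x_ij * m_i * m_j   # different groups
    return (external - internal) / (external + internal)


# Hypothetical four-node network with two groups, A and B.
edges = {(0, 1): 1.0, (0, 2): 1.0, (2, 3): 1.0}
fuzzy_mu = {
    0: {"A": 0.5, "B": 0.5},
    1: {"A": 1.0, "B": 0.0},
    2: {"A": 0.0, "B": 1.0},
    3: {"A": 0.25, "B": 0.75},
}
crisp_mu = {
    0: {"A": 1.0, "B": 0.0},
    1: {"A": 1.0, "B": 0.0},
    2: {"A": 0.0, "B": 1.0},
    3: {"A": 0.0, "B": 1.0},
}

print(fuzzy_ei(edges, fuzzy_mu))  # fuzzy memberships
print(fuzzy_ei(edges, crisp_mu))  # crisp case: (1 - 2) / 3
```

In this assumed form, a node with memberships of 0.5 in both groups always sends half of each tie to EL, which is one way to see why membership functions with a wide core of full membership limit the overestimation of external links.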

Homophily in co-authorship and trade networks
In this section, we apply the proposed method in two networks studied in previous publications. These networks present the fundamental element for our approach, which is the presence of fuzzy groups, in addition to information about the nodes' weights. As a means of comparison, we also analyze the cases of disjoint (Everett and Borgatti 2012) and non-disjoint (Andrade and Rêgo 2019) groups. In this way, the E I index will be obtained for 4 situations: without considering the weight of edges and nodes, unweighted (UW); regarding only the nodes' weight, Z_unweighted (ZU); considering only the edges' weight, weighted (W); taking into account both weights, Z_weighted (ZW).
To evaluate whether the E I index for a given group is compatible with what is expected when connections occur randomly, i.e., without preference of members for external or internal relations, for the unweighted and the Z_unweighted cases, we calculate the expected E I index for each one of the analyzed cases considering the average of 5000 randomly generated binomial graphs with the same density and size as that of the original graphs. We also added a probability, p-value, which expresses how unlikely it is to obtain an E I index at least as extreme as that observed in the randomly generated binomial graphs. We considered one-sided p-values calculated by the relative frequency of times that the simulated E I obtained a value greater (resp., smaller) than or equal to the observed E I , when the expected E I is smaller (resp., larger) than the observed one.
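The simulation procedure can be sketched in Python for the unweighted case with disjoint groups. The sketch relies only on the standard library; the function names, the toy network, the seed and the reduced number of simulations are assumptions for illustration. Simulated graphs without any tie are redrawn, since the index is undefined for them (a choice of this sketch).

```python
import itertools
import random


def ei_index(edges, group):
    """Whole-network EI index for a list of ties and a node -> group map."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)


def simulate_ei(edges, group, n_sims=5000, seed=0):
    """Return (observed EI, expected EI, one-sided p-value) from binomial graphs."""
    rng = random.Random(seed)
    nodes = sorted(group)
    pairs = list(itertools.combinations(nodes, 2))
    density = len(edges) / len(pairs)       # same density as the observed graph
    observed = ei_index(edges, group)

    simulated = []
    while len(simulated) < n_sims:
        graph = [pair for pair in pairs if rng.random() < density]
        if graph:                           # redraw empty graphs (EI undefined)
            simulated.append(ei_index(graph, group))

    expected = sum(simulated) / n_sims
    if expected < observed:                 # observed above expectation
        p_value = sum(s >= observed for s in simulated) / n_sims
    else:                                   # observed below expectation
        p_value = sum(s <= observed for s in simulated) / n_sims
    return observed, expected, p_value


# Hypothetical network: two groups of five nodes with internal ties only.
group = {node: ("G1" if node < 5 else "G2") for node in range(10)}
edges = [(u, v) for u, v in itertools.combinations(range(10), 2)
         if group[u] == group[v]]

observed, expected, p_value = simulate_ei(edges, group, n_sims=1000)
print(observed, expected, p_value)
```

In this extreme toy case the observed index is −1, far below what random ties would produce, so the one-sided p-value is very small.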

Data
To implement the proposed E I index, we use data from two real networks. Next, we give some details about these networks.

PQ network
First, we show how the arbitrary choice of disjoint groups, according to the Ph.D. completion time, affects the E I index of these groups. We delimit three cases of disjoint groups (T1, T2 and T3) varying the limits of the groups, Table 2, in the fuzzy regions, Table 3. Figure 2 shows the E I index for the entire network, for each of the arbitrary limits. As expected, the result is heavily dependent on these limits. The definitions of the groups formed according to the Ph.D. completion time for the disjoint, non-disjoint and fuzzy case, followed the criteria in Table 3. For the disjoint case, we consider the intermediate case T2.
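The sensitivity to the limits can be reproduced on a toy example. The Python sketch below uses hypothetical completion times, ties and cut-offs (not the values of Tables 2 and 3) and reuses the group names young, experienced and senior; it computes the whole-network EI index for three alternative disjoint partitions.

```python
def ei_index(edges, group):
    """Whole-network EI index for a list of ties and a node -> group map."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)


def assign_group(years, young_max, experienced_max):
    """Disjoint grouping by years since Ph.D. completion (hypothetical cut-offs)."""
    if years < young_max:
        return "young"
    if years < experienced_max:
        return "experienced"
    return "senior"


# Hypothetical researchers (years since Ph.D.) and co-authorship ties.
years_since_phd = {"a": 4, "b": 6, "c": 9, "d": 11, "e": 19, "f": 22}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]

# Three arbitrary choices of limits, labelled after the cases T1, T2 and T3.
cutoffs = {"T1": (5, 10), "T2": (8, 20), "T3": (10, 21)}

results = {}
for name, (young_max, experienced_max) in cutoffs.items():
    group = {node: assign_group(t, young_max, experienced_max)
             for node, t in years_since_phd.items()}
    results[name] = ei_index(edges, group)

print(results)
```

Even in this small example the sign of the index changes with the cut-offs, which is the kind of dependence that motivates graded membership levels.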
We use the researchers' h-index as the node weights. The h-index is a measure that combines, in a simple way, the number of publications and their impact; it is given by the maximum value h such that a researcher has published h works and each of these works has been cited h or more times (Hirsch 2010). Figure 3 shows how the relationships between researchers occur. In general, most nodes in the non-disjoint case have an EI index of −1 (60%). In the fuzzy case and in the disjoint case, the nodes are similar with respect to the proportion of EI indexes above and below zero; however, in the fuzzy case, the distribution of the EI index is more uniform. Figure 4 shows the EI index for the entire network. In general, when the nodes belong to non-disjoint groups, the EI indexes are smaller, with a predominance of in-group relationships. On the other hand, when the groups are disjoint, the network has higher but still negative EI indexes. The group-level EI indexes are shown in Fig. 5. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller. In the case of disjoint and fuzzy groups, the EI indexes are [text missing in the source]. The experienced group's EI indexes are negative, especially in the non-disjoint case. This shows that the internal connections of this group outnumber the external ones. The young and senior groups have positive EI indexes, with those of the young group being higher than those of the senior group. This shows that the external relations of these groups surpass the internal ones. Therefore, we can conclude that experienced researchers cooperate with each other, while young and senior researchers are more open to cooperating with other groups. It is worth mentioning that the EI indexes obtained do not reveal a tendency towards homophily or heterophily, as they do not differ significantly from the results obtained for the randomly simulated networks, since the p-values are all greater than 0.05.
Note that the edge weighting affected the EI index of the disjoint case the most, making the relationships more heterogeneous. This is most noticeable for the experienced group.
We also analyzed the behavior of groups of researchers with the same scholarship level in relation to the experience level group attributes. The scholarship levels, in order of importance, and their shares of the total number of researchers are: 1A (8%), 1B (5%), 1C (8%), 1D (19%) and 2 (59%). The analyses of the EI index of these groups are shown in Fig. 6 for the cases of disjoint, non-disjoint and fuzzy groups, and for the UW, ZU, W and ZW networks. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller, with in-group relationships predominating. On the other hand, when the groups are disjoint or fuzzy, the network has higher EI indexes.
As for scholarship levels, the EI indexes behave differently for the different connection types, weighted or not. Level 1A has the highest EI indexes in the unweighted network, with or without the inclusion of the node weights, and in the weighted network considering the node weights. Level 1A, the highest scholarship level, concentrates the most productive and influential researchers in the research area, being composed of 10 exclusively senior researchers and 2 exclusively experienced researchers. Although most are seniors, in-group relationships are predominant in the non-disjoint case and external relationships are more common when the groups are fuzzy or disjoint. Level 1A EI indexes are all negative in the weighted network. Level 1C, an intermediate scholarship level, also does not include young researchers. In the weighted network, with and without node weights, as well as in the unweighted network (only in the non-disjoint case), the EI index of level 1C is the smallest and is negative. Therefore, for researchers at this level, most connections occur between researchers in the same experience level group. It is noteworthy that the EI indexes obtained do not reveal a tendency towards homophily or heterophily, as they do not differ significantly from the results obtained for the randomly simulated networks, since the p-values are all greater than 0.05.

Trade of American countries network
We use the Human Development Index (HDI) to form groups and first show how the arbitrary choice of disjoint groups, according to the HDI, affects the E I index of these groups. We delimited three cases of the disjoint groups (T1, T2 and T3) varying the thresholds of the groups, Table 4, in the fuzzy regions, Table 5. Figure 7 shows the E I index for the entire network, for each of the arbitrary thresholds. As expected, the result is heavily dependent on these limits.
The definitions of the groups formed according to the HDI for the disjoint, non-disjoint and fuzzy cases followed the criteria in Table 5, where the intermediate case T2 was used for the disjoint case. Figure 8 shows the EI index at the individual level for the 30 countries. In general, countries have positive EI indexes, that is, more intergroup than in-group relations. In the non-disjoint case, it is possible to notice that in-group relations predominate for some countries. In-group relationships are also more visible when the network is unweighted. Figure 9 shows the EI index for the entire network. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller. On the other hand, when the groups are fuzzy, the network has higher EI indexes. The EI indexes are positive, except the EI index in the case of [text missing in the source]. The group-level EI indexes are shown in Fig. 10. In general, the low and medium groups have the highest EI indexes, close to 1. The countries of these groups have more intergroup than in-group relations, and the EI indexes are statistically significant, that is, these groups are prone to heterophily. The group with high HDI has the lowest EI indexes in the unweighted network, being the one with the strongest in-group relationships, but its EI indexes increase significantly in the Z_unweighted, weighted and Z_weighted networks. Thus, the relationships with other groups are stronger in these networks. The group of countries with very high HDI has the lowest EI indexes in the weighted network, with and without the node weights, revealing a closer relationship between countries in the group. The EI indexes of the groups with high and very high HDI do not differ statistically from those of the randomly simulated networks.
We also analyzed the behavior of groups of countries by region in relation to the HDI group attributes. The regional divisions are north, south and central, with 3, 12 and 15 countries, respectively. The analyses of the E I indexes of these groups are shown in Fig. 11 for the cases of disjoint, nondisjoint and fuzzy groups, and studying the UW, ZU, W and ZW networks. In general, when nodes belong to non-disjoint groups, it is observed that the E I indexes are smaller. On the other hand, when the groups are disjoint or fuzzy, the regions have higher E I indexes.
As for the regions, the EI index behaves differently depending on the connection type, weighted or unweighted. The northern region has the highest EI indexes in the UW and ZU networks. The northern region's EI indexes decrease in the weighted network, indicating that the northern region has stronger relations with countries in the same HDI group. The southern region has the lowest EI indexes in the UW network, positive in the disjoint and fuzzy cases and negative in the non-disjoint case. In weighted networks, with and without node weights, the EI indexes are positive and higher in the southern region, indicating that relations are more intense between countries of different HDI groups. The EI indexes of the regions do not reveal a tendency towards homophily or heterophily, as they do not [text truncated in the source]

Conclusion
In this work, we have proposed a new network measure, which is a generalization of the E I index to measure homophily in cases of fuzzy groups. Fuzzy groups are particularly important when actors may belong to many associative groups simultaneously and with various levels of affiliation. Therefore, for a better understanding of the structure of networks, the measure developed allows the analysis of multiple associations and different levels of association. We also show that incorporating node weights into the analysis can give us more insights into the homophily of relations.
We explored two networks with the new measure. In a coauthorship network, the Ph.D. completion time was used to form groups. In a commercial network among countries, we use the Human Development Index (HDI) to form groups. We obtain the E I index for the networks considering the cases of disjoint, non-disjoint and fuzzy groups, and analyzing different relational forces, unweighted, weighted, without and with node weights. As we have seen in these networks, the proposed measure allows expanding the analysis of social networks. Through a homophily analysis, it is possible to identify whether a certain group of nodes has a tendency to work together or not.
In general, it is clear that fuzzy groups generate more homogeneous cooperation or commercial relations. This was already expected due to the fact that the actors present multiple associations with the same degree of association, equal to 1. In the co-authorship network, we noticed that the researchers allocated as experienced are the ones that cooperate the most with each other. These relationships are favored because there are more experienced researchers. The smaller number of young and senior researchers also justifies the predominance of external relations among these researchers. In the trade network, we noticed that relations between countries with different levels of development are more common. In the case of the groups with low and medium HDI, we note that the EI index close to 1 is statistically significant, revealing a tendency towards heterophily in these two groups and their dependency on more developed countries.
In addition to the two networks used to illustrate the measure, other networks also present actors that belong to different attribute groups for which, due to the imprecision or limitations of the information, it is necessary to resort to fuzzy groups. Thus, we expect that many other studies may benefit from this measure.

Data availability
Enquiries about data availability should be directed to the authors.

Conflict of interest
The authors have not disclosed any competing interests.