Fuzzy clustering optimal k selection method based on multi-objective optimization

Because real-world data sets are complex, it is difficult to partition them clearly and effectively, so fuzzy clustering approaches are often preferred for their analysis. However, across the variety of fuzzy clustering algorithms, different numbers of clusters lead to different clustering results. Since the number of clusters is closely tied to the quality of the partition, determining the number of fuzzy clusters k is a central problem. Many researchers have proposed fuzzy clustering validity indexes to address it. However, such indexes can only be evaluated on partitions produced by the fuzzy c-means (FCM) algorithm, and when the range of candidate k values is large, running FCM for every k is quite time-consuming. From this perspective, this paper proposes a fuzzy clustering optimal k selection method based on multi-objective optimization (FMOEA-K). Unlike traditional methods, FMOEA-K combines a fuzzy clustering validity index with a multi-objective evolutionary algorithm (MOEA) and uses the MOEA to search for appropriate cluster centers concurrently. Because of this concurrency, the computation time is shortened. Experimental results show that, compared with the traditional method, FMOEA-K both shortens the computation time and improves the accuracy of the computed optimal k value.


Introduction
With the development of Internet technology, clustering analysis has been widely applied in machine learning and pattern recognition (Webb 2003; Fatehi and Asadi 2017), in the life sciences and medicine, including genetics, biology, and microbiology (Li et al. 2015a, b; Cui et al. 2016), and in computer science, including network mining, spatial database analysis, and image segmentation (Fu et al. 2015; Gu et al. 2016; Khan et al. 2016). Clustering is an unsupervised learning method that groups data points so that points in the same cluster are highly similar while points in different clusters differ greatly. Clustering algorithms can be divided into two types (Xu and Wunsch 2005). The first category is hard clustering algorithms, such as k-means, k-means++ (Arthur and Vassilvitskii 2006), and intelligent K-means (de Amorim 2008); a hard clustering algorithm divides a data set into multiple clusters, and each object belongs to only one cluster. The second category, fuzzy clustering algorithms, allows each object to belong to multiple clusters according to its degree of membership; examples include Fuzzy C-Means (Bezdek et al. 1984) and MAFC (Shang et al. 2019).
Determining the value of k plays a crucial role, whether for hard or fuzzy clustering algorithms. Because of the complexity of real-world data sets, it is difficult to determine k without prior knowledge of the data structure. Some researchers propose using clustering validity indexes to address this problem (Dunn 1973; Gath and Geva 1989). There are two categories of clustering validity index: external and internal. An external validity index evaluates partitions by comparing them with previously assumed results (Cui et al. 2017); an internal validity index evaluates a partition by examining the results themselves, generally measuring their rationality through compactness (comp) and separation (sep). The smaller the value of comp + sep, the more rational the choice of k. Through a clustering validity index, the structure of a data set can be better analyzed, so as to obtain the optimal number of classes for the data set (Vaidya et al. 2013). Compared with external validity indexes, internal validity indexes pay more attention to the degree of compactness within and separation between clusters. Over the past few decades, many internal validity indexes for fuzzy clustering have been proposed, for instance the partition coefficient (PC) (Bezdek 1974), partition entropy (PE) (Bezdek 1973), XB (Xie and Beni 1991), FS (Fukuyama 1989), SC (Zahid et al. 1999), the PBM index (Pakhira et al. 2004, 2005), the VM index, the PCAES index (Wu and Yang 2005), SSD (Wang et al. 2018), and the PBM indicators of Begum (2010). However, a fuzzy clustering validity index can only be evaluated on partitions produced by the FCM algorithm. When the range of k values is large, dividing the data into clusters with FCM for every candidate k is quite time-consuming. Moreover, some clustering validity indexes decrease monotonically as the number of clusters k increases only when the cluster centers are close to the correct ones; when the centers are not close to the correct centers, these indexes cannot determine the optimal number of clusters k.
This paper presents an algorithm for selecting the optimal k for clustering based on multi-objective optimization. Unlike traditional methods, this method combines a fuzzy clustering validity index with a multi-objective evolutionary algorithm and uses it to search for appropriate cluster centers concurrently. Because of this concurrency, the computation time is shortened; moreover, FMOEA-K converts the clustering validity index into objectives and, combined with MOEAs, provides correct cluster centers, reducing the error in computing the optimal k value.
In Sect. 2, we introduce some basic concepts of multi-objective optimization, the morphological similarity distance, and some fuzzy clustering validity indexes. We then describe the algorithm in detail in Sect. 3. In Sect. 4, FMOEA-K is evaluated experimentally and compared with other clustering validity indexes. Finally, we conclude our work and discuss related work.

Multi-objective optimization
The multi-objective optimization problem (MOP) is defined as minimizing F(x) = (f_1(x), f_2(x), ..., f_m(x)), where x = {x_1, x_2, ..., x_n} is a solution with n decision variables and the objective function F(x) is the objective vector of x. F: R^n → R^m represents the mapping from the n-dimensional decision variable space to the m-dimensional objective function space. Let a and b be two solutions, each with n decision variables; a is said to Pareto-dominate b if and only if f_l(a) ≤ f_l(b) for all l = 1, 2, ..., m and f_k(a) < f_k(b) for some k = 1, 2, ..., m. If a solution x ∈ R^n is not Pareto-dominated by any other solution, then x is called a Pareto optimal solution (Coello et al. 2007). The set of all Pareto optimal solutions is called the Pareto set, and its projection into the objective space is called the Pareto front (PF) (Miettinen 2012). The ideal point u_ideal collects the best objective values over the PF: u_ideal = (min_{x∈PF} f_1(x), ..., min_{x∈PF} f_m(x)).
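As a concrete illustration of these definitions, the following minimal Python sketch (the function names are ours, not the paper's) tests Pareto dominance under minimization and extracts the non-dominated set by brute force:

```python
from typing import Sequence

def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    # fa Pareto-dominates fb (minimization): fa is no worse in every
    # objective and strictly better in at least one
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def pareto_front(points: list) -> list:
    # brute-force O(n^2) filter that keeps the non-dominated objective vectors
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

print(pareto_front([(1, 5), (2, 2), (3, 3), (5, 1)]))  # drops (3, 3), dominated by (2, 2)
```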

Morphological similarity distance
Morphological similarity distance: Li et al. (2009) show that most clustering algorithms are based on the Euclidean distance, but this traditional distance cannot accurately measure similarity, as shown in Fig. 1. The morphological similarity distance (MSD) studies the relationship between the similarity and the difference of vectors, thereby reducing the error of the similarity measure. For a vector x_i = {x_i1, ..., x_in}, MSD is built from three quantities: ASD, the absolute value of the sum of the two vectors' componentwise differences; ED, the Euclidean distance; and SAD, the Manhattan distance (the sum of absolute differences).
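Since the display equation for MSD did not survive extraction, the sketch below follows our reading of Li et al. (2009), in which the Euclidean distance is scaled by a shape factor 2 − ASD/SAD; because ASD ≤ SAD always holds, this factor lies in [1, 2]. Treat the exact combination as an assumption to be checked against the original paper.

```python
import numpy as np

def msd(x: np.ndarray, y: np.ndarray) -> float:
    """Morphological similarity distance (sketch of Li et al. 2009)."""
    diff = x - y
    ed = np.linalg.norm(diff)        # ED: Euclidean distance
    sad = np.abs(diff).sum()         # SAD: Manhattan distance (sum of absolute differences)
    asd = abs(diff.sum())            # ASD: absolute value of the summed differences
    if sad == 0.0:                   # identical vectors
        return 0.0
    return ed * (2.0 - asd / sad)    # assumed combination: shape-weighted Euclidean distance

# Two pairs with equal Euclidean distance but different shapes get different MSDs:
a, b = np.array([1.0, 1.0]), np.array([2.0, 2.0])   # componentwise differences share a sign
c, d = np.array([1.0, 2.0]), np.array([2.0, 1.0])   # componentwise differences cancel
print(msd(a, b), msd(c, d))                          # ~1.414 vs ~2.828
```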

The fuzzy clustering validity index and its existing problems
Over recent decades, various clustering validity indexes (CVIs) have been proposed. Owing to the huge scale of real-world data and the complexity of its forms, research shows that no single index works well on all data sets.
(1) Membership-based indexes (the partition coefficient (PC) (Bezdek 1974) and partition entropy (PE) (Bezdek 1973)): these consider only membership information and ignore other sample information, such as the data structure, in their design.
(2) Separation-based indexes (the FS index (Fukuyama 1989)): FS uses the imbalance ratio of two clusters to expand the distance between their centers.
(3) Compactness-based indexes (the XB index (Xie and Beni 1991)): XB employs the Euclidean distance to measure intra-class compactness and inter-class separation, with the membership degree as the weighting factor of the distance measure.
(4) Wu and Yang (2005) propose the partition coefficient and exponential separation (PCAES) index; its full definition is given in Wu and Yang (2005). It assigns a normalized partition coefficient and a separation measure to each identified cluster, which better judges whether the selected approach has an outstanding clustering ability. However, PCAES lacks consideration for increments; in particular, when handling stream data it cannot supervise the evolution of the stream well.
(5) Zahid et al. (1999) propose the SC index, which considers not only the measured separation ratio and compactness but also the data structure and the geometric properties of the membership matrix. Yet SC always defines compactness and separation over the whole data structure, ignoring the definition of each individual cluster.
(6) Shang et al. (2019) optimize two different clustering validity criteria. Their improved MKFC is a single-objective clustering algorithm; in the clustering process it cannot take the overall distribution of the data into account, so clustering performance declines significantly as the number of clusters increases. They therefore propose another validity index, an improved XB index, which adopts a ratio form; in this way it accounts for both the separation and the compactness of clusters, effectively avoiding local optima and improving clustering performance. Their work focuses on optimizing validity indexes and depends on the partitions obtained by the clustering algorithm (FCM). FMOEA-K, by contrast, mainly solves the problem of the number of clusters k being unknown. To avoid the inaccurate partitions caused by FCM's sensitivity to the number of clusters, FMOEA-K builds a pair of conflicting objective functions from a fuzzy clustering validity index and predicts the best number of clusters k for FCM with an MOEA before the FCM algorithm is run.

FMOEA-K algorithm
In this section, to address the problems of the clustering validity indexes described above, we propose a fuzzy clustering optimal k selection algorithm based on multi-objective optimization, which combines a fuzzy clustering validity index with a multi-objective optimization algorithm to solve the optimal-k problem of fuzzy clustering.

The compactness and separation of fuzzy clustering indexes
In this paper, an existing fuzzy clustering validity index is improved and used as the pair of conflicting objective functions of the bi-objective model. The FS index (Fukuyama 1989) combines the membership degree u with the Euclidean distance between data points and cluster centers, which measures intra-cluster compactness well but ignores the geometric characteristics of the data and introduces a certain randomness (Yang et al. 2018). Therefore, to account for the geometric characteristics of the data, the Euclidean distance in the FS index is replaced by the morphological similarity distance (MSD) (Li et al. 2009; Rezaee 2010). Compared with the Euclidean distance, MSD takes the shape difference between vectors into account and reduces the randomness caused by geometric features.
Let X = {x_1, x_2, ..., x_n} be the data set for cluster analysis, where x_i = {x_i1, x_i2, x_i3, ..., x_iN} represents the N features of x_i. The compactness function based on MSD is then

$$\mathrm{Comp}(k, U) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m} \, \mathrm{MSD}(x_j, c_i)^2,$$

where k is the number of cluster centers, C = {c_1, c_2, ..., c_k} is the set of cluster centers, U is the membership matrix, u_ij ∈ U is the membership coefficient of the j-th datum in the i-th cluster, m is the fuzzy weighting exponent, and MSD(x_j, c_i) represents the deviation of the j-th datum from the i-th cluster center. The separation function tests the relationship between different clusters; the degree of separation is obtained from a distance measure between fuzzy clusters. Because some fuzzy clusters may have overlapping points (as shown in Fig. 2), the average degree of overlap between all fuzzy clusters can be computed with the separation function proposed in Rezaee (2010), which reduces the separation error between clusters. Let F_p and F_q be two fuzzy clusters belonging to a fuzzy partition (k, U). The separation function is given in formulas (6)-(9), where S(F_p, F_q), with p, q ∈ C, is the similarity of the fuzzy clusters F_p and F_q on the data set X, and F_p(x_j) is the covariance matrix of x_j in the p-th cluster. h(x_j) is a weight: it adjusts the emphasis placed on overlapping data points according to the degree to which they are shared between fuzzy clusters, as shown in formula (10). FDCS is defined as the sum of the compactness Comp and the separation Sep, as shown in formula (11): FDCS(k, U) = Comp(k, U) + Sep(k, U). The smaller the value of FDCS, the smaller the degree of overlap between fuzzy clusters and the more compact the clusters internally.
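Because formulas (6)-(11) did not survive in this copy, the sketch below only illustrates the FS-style compactness with MSD substituted for the Euclidean distance, as the text describes; the fuzzy exponent m and the squaring of the distance are assumptions carried over from the standard FS index, and the Rezaee (2010) separation term is left as an opaque callable.

```python
import numpy as np
# reuses the msd() function from the earlier sketch

def compactness(X: np.ndarray, C: np.ndarray, U: np.ndarray, m: float = 2.0) -> float:
    # membership-weighted MSD deviations of every point from every center:
    # an FS-style Comp with ED replaced by MSD (squared distance assumed)
    k, n = U.shape
    return sum(U[i, j] ** m * msd(X[j], C[i]) ** 2
               for i in range(k) for j in range(n))

def fdcs(X, C, U, separation_fn, m: float = 2.0) -> float:
    # FDCS = Comp + Sep (formula (11)); separation_fn stands in for the
    # overlap-based separation of Rezaee (2010), formulas (6)-(10)
    return compactness(X, C, U, m) + separation_fn(X, C, U)
```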

Bi-objective model establishment
As the number of clusters k increases, the value of FDCS decreases. But FDCS cannot be used directly as an objective function of the bi-objective model, because FDCS decreases with increasing k only when the cluster centers are correct, as shown in Fig. 3. If FDCS were used as an objective function under incorrect cluster centers, the originally correct non-dominated solution for k could be dominated and the result would fall into a local optimum (Handl and Knowles 2007).
FDCS is therefore transformed into the bi-objective form of equation (13), whose first component f_1(x) is, by Theorem 1, monotonically decreasing.

Theorem 1 f_1(x) decreases monotonically as k increases.

Proof Suppose s_1 − s_2 > 0; we must show f_1(s_1) − f_1(s_2) < 0. Here k_2 − k_1 ≤ −1 and −1 < exp(−FDCS(k_2, U)) − exp(−FDCS(k_1, U)) < 1, so from (13) we conclude that f_1(x) keeps decreasing monotonically as k increases. □

Theorem 2 Each Pareto optimal solution corresponds to an optimal value of min F(x).

Proof Let x* be the decision variable corresponding to an optimal (s, k) on the PF, i.e., in the current solution set, for all s ∈ X, x* is not dominated, so we obtain the minimized value f_1(x*). Since the second part of f_1(x*) is constant, when f_2(x) = k the term 1 − exp(−FDCS(k, U)) attains its minimum. Therefore the optimized value of min F(x) is x*. □

With FDCS transformed into equation (13) as justified by Theorems 1 and 2, f_1(x) is guaranteed to decrease monotonically as the objective f_2(x) increases, even under incorrect cluster centers.
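The exact form of equation (13) is not recoverable from this copy; what the proofs above do fix is that f_1 contains the term 1 − exp(−FDCS(k, U)) and that f_2(x) = k. A minimal sketch under that reading:

```python
import math

def bi_objectives(fdcs_value: float, k: int) -> tuple:
    # f1 embeds the validity index through 1 - exp(-FDCS(k, U)); since FDCS
    # shrinks as k grows, f1 decreases in k and conflicts with f2 = k,
    # which gives the MOEA a genuine trade-off to spread along the PF
    f1 = 1.0 - math.exp(-fdcs_value)
    f2 = float(k)
    return f1, f2
```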

Algorithm design
After the bi-objective model is established, it is optimized with an MOEA, finally yielding a PF composed of the optimal solution for each k value (Wang et al. 2020). If the PF is not uniformly distributed, the subsequent PF analysis may fall into a local optimum and the resulting k value may be inaccurate. The CDG algorithm proposed in Cai et al. (2017) uses grid positioning to ensure the diversity of the PF; therefore, CDG is used as the optimization algorithm for the bi-objective model.

Algorithm 1 Initialization

Require: X: data set; k_min: minimum number of clusters; k_max: maximum number of clusters; P: the current population; u_ideal: the approximate ideal point; u_nadir: the approximate nadir point; n: the number of grids
Ensure: membership matrix U
1: normalize the data;
2: randomly initialize the cluster centers c_i for k = k_min, ..., k_max;
3: repeat
4:   iteratively call FCM(U, V) for the current value of k;
5:   use FCM to cluster and obtain the membership matrix U(k, X);
6:   k++;
7: until k > k_max
8: initialize GS(P) of CDG: initialize the grid distance and the grid neighbors in GS;

The initialization of the algorithm-related parameters is shown in Algorithm 1. The membership matrix U of the data set is obtained by the FCM algorithm, and the number of grid neighbors, the grid distance, the ideal point, and the nadir point of the grid system GS are initialized.
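Steps 4-5 call the standard fuzzy c-means algorithm for each candidate k. The following self-contained numpy sketch of that call (Euclidean FCM with fuzzifier m = 2 and random initialization; not the authors' code) shows what FCM(U, V) computes:

```python
import numpy as np

def fcm(X: np.ndarray, k: int, m: float = 2.0, max_iter: int = 100,
        tol: float = 1e-5, seed: int = 0):
    """Fuzzy c-means: returns centers C (k x d) and memberships U (k x n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((k, n))
    U /= U.sum(axis=0)                                  # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        C = (Um @ X) / Um.sum(axis=1, keepdims=True)    # weighted center update
        d = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2)  # k x n distances
        d = np.fmax(d, 1e-12)                           # guard against division by zero
        U_new = d ** (-2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0)                      # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            return C, U_new
        U = U_new
    return C, U

# Algorithm 1, steps 3-7: one membership matrix per candidate k
# memberships = {k: fcm(X, k)[1] for k in range(k_min, k_max + 1)}
```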

Algorithm 2 Optimization of the bi-objective model by CDG

Require: mGen: maximum number of generations; N: the population size; T: the maximum grid distance for the neighborhood; k: the range of k values; P: the current population
1: update the ideal point and the nadir point;
2: initialize GS(P);
3: repeat
4:   initialize an empty set Q = ∅;
5:   for each x ∈ P do
6:     obtain the neighboring solutions as the mating pool of x by neighbor selection (NS);
7:     randomly select two solutions from NS, generate an offspring solution y from them by the DE operators, and add y to Q;
8:   end for
9:   update the ideal and nadir points, P = P ∪ Q;
10:  update GS(P);
11:  parent selection: if |P| < N, randomly select N − |P| solutions from P and add them to P; otherwise use rank-based selection (RBS) to select N solutions;
12:  gen++;
13: until gen > mGen
14: return the PF of the current population P;
15: return the optimal k from the PF by the DB index;

After the grid system GS is established, the bi-objective model is optimized by the CDG multi-objective optimization algorithm, as shown in Algorithm 2. In the following, we explain the steps of the algorithm in detail. In steps 1-2, the population P is randomly initialized, and the ideal and nadir points are determined based on P (Cai et al. 2017).
In steps 4-8, N children are produced from P, and an empty set Q is defined to collect them. For each solution x, its mating pool is obtained by neighbor selection (NS) (Cai et al. 2017). In step 7, two solutions x_i and x_j are randomly selected from NS, their child y is generated from the solutions x, x_i, and x_j through the DE operation, and y is then added to the set Q.
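The paper does not spell out which DE variant is used, so the sketch below assumes the common form y = x + F(x_i − x_j) followed by binomial crossover, with the CR = 1.0 and F = 0.5 settings reported later in the experiments (with CR = 1.0 every gene takes the mutant value):

```python
import numpy as np

def de_offspring(x: np.ndarray, xi: np.ndarray, xj: np.ndarray,
                 F: float = 0.5, CR: float = 1.0, rng=None) -> np.ndarray:
    # x is the 1-D decision vector of the parent; xi, xj come from its mating pool.
    # Differential mutation with the scaled difference of the two mates,
    # then binomial crossover against the parent (assumed DE variant).
    rng = rng or np.random.default_rng()
    mutant = x + F * (xi - xj)
    mask = rng.random(x.shape) < CR
    mask[rng.integers(x.size)] = True     # guarantee at least one mutant gene
    return np.where(mask, mutant, x)
```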

Algorithm 3 Rank-based selection (RBS)

Require: Q: the combined population
Ensure: a population P
1: for each x ∈ Q do
2:   initialize R(x) = (r_1(x), ..., r_m(x)) = (0, ..., 0)
3: end for
4: for l = 1 to m do
5:   for each subproblem S_l(k) do
6:     [S', I] = sort_l(S_l(k)), a CDG-based sort in which I stores the ranks
7:     for each x ∈ S_l(k) do r_l(x) = I(x)
8:   end for
9: end for
10: sort all x ∈ Q based on R(x) in lexicographic order
11: Q = LEXICOGRAPHIC-SORT(Q)
12: select the first N solutions: P = Q(1 : N)
In step 9, the ideal point and nadir point are updated using the combined population P = P ∪ Q, bringing them closer to their true values. In step 10, GS is updated with the new population P.
In step 11, N solutions are selected over all subproblems based on the RBS of Algorithm 3.
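A compact sketch of the lexicographic step of Algorithm 3: each solution carries a rank vector R(x) = (r_1(x), ..., r_m(x)) filled in by the per-subproblem CDG sorts, and the population is truncated to the N lexicographically smallest rank vectors (how the original CDG paper breaks ties is an assumption we leave to Python's stable sort):

```python
def rank_based_selection(Q: list, ranks: list, N: int) -> list:
    # Q[i] is a solution and ranks[i] its rank vector R(x) from Algorithm 3,
    # steps 1-9; keep the N solutions whose rank vectors are lexicographically smallest
    order = sorted(range(len(Q)), key=lambda i: tuple(ranks[i]))
    return [Q[i] for i in order[:N]]
```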
Finally, in steps 14-15, after obtaining the PF over the k values through the MOEA, the Davies-Bouldin (DB) index (Davies and Bouldin 1979) is applied, as shown in formula (14) (here, too, we replace the Euclidean distance in the DB index with MSD). The DB index transforms the current PF: the optimal k value is obtained from the ratio of the sum of the within-cluster scattering to the between-cluster separation. The smaller the DB index, the better the number of clusters k. When DB is evaluated over the specified range, the k with the minimum value is the optimal k.
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{d_{ij}}, \qquad S_i = \frac{1}{V_i} \sum_{x \in \text{cluster } i} \mathrm{MSD}(x, c_i), \qquad (14)$$

where S_i is the scattering value in the i-th cluster, V_i is the number of data points in the cluster, c_i is the cluster center, and d_ij = MSD(c_i, c_j) is the morphological similarity distance between the two cluster centers.
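A short sketch of formula (14) as reconstructed above, reusing the msd() function from the earlier sketch for both the within-cluster scattering and the center-to-center distances:

```python
import numpy as np
# reuses the msd() function from the earlier sketch

def db_index(X: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    # DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / d_ij, with MSD in place
    # of the Euclidean distance; labels[j] gives the cluster of point X[j]
    k = len(centers)
    S = [np.mean([msd(x, centers[i]) for x in X[labels == i]]) for i in range(k)]
    worst = [max((S[i] + S[j]) / msd(centers[i], centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))
```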

Computational complexity
In Algorithm 1, the setup of the grid system requires O(mN) computations, where m is the number of objectives and N is the population size. In Algorithm 2, the grid system is updated in every generation, so the overall cost grows with the number of generations mGen.

Experiment preparation
To show that the PF obtained by our algorithm is uniform, we compare it with the PFs obtained by other multi-objective optimization algorithms, EMO-KC (Wang et al. 2018) and NSGA-III (Jain and Deb 2013), when the range of k is large. Four UCI real data sets (Iris, BUPA Liver Disorders, Wisconsin Diagnostic Breast Cancer, and Wine) are used for experiments over the range k ∈ [2, 21]. The parameter settings of Cai et al. (2017) are adopted for the optimization algorithm experiments. The parameter configuration of the DE operation in FMOEA-K is the same as in Wang et al. (2020): CR = 1.0, F = 0.5, η = 20, and p_m = 1/n. Figure 4 shows the PFs obtained by the four optimization algorithms over the interval k ∈ [2, 18].
It can be observed that the PFs of all algorithms are evenly distributed on Wine, WDBC, and BLD. However, the PF of EMO-KC (Wang et al. 2018) is clearly unevenly distributed on the Iris data set, while the PFs of FMOEA-K and NSGA-III remain evenly distributed. The above experiments show that the PF obtained by FMOEA-K through the multi-objective optimization algorithm is uniform.

Next, we need to verify whether the optimal cluster number k obtained by FMOEA-K is accurate. We add 6 artificial data sets and 6 real data sets to the optimal-k comparison experiment and compare the results of FMOEA-K with the clustering validity indexes PC, PE, XB, PCAES, SC, and VW. The 6 artificial data sets are Data-3, Data-3Noise, Data-4, Data-5, Data-6, and Data-4X, shown in Fig. 5, where the number in each name is the number of clusters in the data set. Table 1 gives a brief description of these data sets; Data-4X exhibits strong overlap, and Data-3Noise contains noise points. These artificial data sets are similar to those used in Cui et al. (2017) and Zhang et al. (2008). In most of the literature, Iris has only two clusters: the optimal number of clusters is considered to be reached at k = 2, but some clustering algorithms can still generate three clusters, so generally k = 2 or 3 is considered reasonable.

Figure 6 shows the optimal k obtained by FMOEA-K on the different data sets; the horizontal axis shows the range of k, and the vertical axis shows the computed results (marked with filled triangles) after the data are normalized. To reduce randomness when testing the optimal k value obtained by FMOEA-K, we first normalize the values of all data sets to the interval [0, 1] and run FMOEA-K on each target data set 20 times; the average of the corresponding results is taken as the final test value. Table 2 shows the optimal number of clusters computed by the different algorithms on the artificial and real data sets, where k_opt denotes the expected optimal number of clusters.
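The normalization step used before every run is plain per-feature min-max scaling; a one-function sketch:

```python
import numpy as np

def minmax_normalize(X: np.ndarray) -> np.ndarray:
    # scale every feature into [0, 1]; constant features are mapped to 0
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)
```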

Experiment results analysis
Combining the results in Table 2 and Fig. 6, most of the algorithms obtain the expected results on Data-3, Data-4, and Data-6, whose clusters are clearly separated. The results of PCAES, VW, and FMOEA-K on Data-3Noise show that they are robust to noise. For Data-5, points in some regions of the data set overlap: algorithms such as PC, XB, and PCAES suggest that the optimal k is 4, while FMOEA-K, VW, and SC suggest that the optimal k is 5. VW and FMOEA-K can still find the expected optimal k on Data-4X, which has connection points and overlapping points. For the real data sets BLD, WDBC, and Wine, almost all algorithms compute the expected optimal k. Although the results of the different algorithms on the Iris data set are k = 2 or k = 3, the optimal k of FMOEA-K also falls in this reasonable range. Based on the comparison with the six clustering validity indexes, we conclude that the optimal k obtained by FMOEA-K is effective.

We also compare the computation time of FMOEA-K with the traditional method (the XB index combined with FCM). On the same Iris data set, the maximum of the k range is set to 10, 50, 100, and 150; the results are shown in Fig. 7. As the range of k values expands, FMOEA-K needs less time to obtain the optimal k value than the traditional method, owing to the parallel nature of MOEAs.
To further study the accuracy of the FMOEA-K results, we use FMOEA-K and the six validity indexes combined with the FCM algorithm to obtain k_opt on Data-4X, together with the corresponding clustering results, as shown in Fig. 8. The clustering results of FMOEA-K and VW on Data-4X under their corresponding k_opt match the experimental expectations, while the results of the other indexes under their corresponding k_opt do not and are disturbed. According to these clustering results, FMOEA-K identifies k_opt well even in the presence of interference points. In summary, the optimal k values found by FMOEA-K on the different data sets are effective.

Conclusions
This paper proposes the FMOEA-K algorithm, which incorporates a multi-objective optimization algorithm to address the problem that fuzzy clustering validity indexes become expensive and unreliable to evaluate as the number of clusters k grows. Unlike general validity indexes, we convert the clustering task into a bi-objective model whose objectives are the clustering index and the cluster number k. We then obtain the PF of the bi-objective model through multi-objective optimization. Finally, the optimal value of k is identified from the PF by the DB index, so that the number of clusters k can be determined accurately on different types of data sets.
The comparison with six other cluster validity indexes shows that FMOEA-K is effective for most of the data sets in this study. In future work, we could use more real data sets to test the method.

Data availability Enquiries about data availability should be directed to the authors.

Declarations
Conflict of interest The authors have not disclosed any competing interests.
Ethical approval All applicable international, national, and/or institutional guidelines for the care and use of animals were followed.
Informed consent Informed consent was obtained from all individual participants included in the study.