A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm
Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms require the number of clusters to be specified in advance, whereas in practice the number of clusters is usually unknown. Although the Elbow method is one of the most commonly used methods for determining the optimal cluster number, it depends on manual identification of the elbow point on the plotted curve. Consequently, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating the optimal cluster number of a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, these cosine values and the inverse cosine (arccosine) are used to compute the intersection angles between elbow points. Finally, the index of the minimal intersection angle is taken as the estimated optimal cluster number. Experimental results on simulated datasets and a well-known public dataset (the Iris dataset) show that the optimal cluster number estimated by the proposed method is better than that obtained by the widely used Silhouette method.
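The four steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes min-max normalization, treats each candidate as the point (k, normalized distortion), and measures the angle at that point between the segments to its two neighbours. The function names `normalize`, `elbow_angles`, and `estimate_k` are hypothetical.

```python
import math


def normalize(distortions, upper=10.0):
    """Min-max scale the distortion values to the range [0, upper] (assumed scheme)."""
    lo, hi = min(distortions), max(distortions)
    if hi == lo:
        raise ValueError("distortion curve is flat; no elbow to detect")
    return [upper * (d - lo) / (hi - lo) for d in distortions]


def elbow_angles(distortions, ks=None):
    """Return {k: angle in radians} for every interior point of the elbow curve.

    Assumption: the angle at point k is the one between the vectors pointing
    to its left and right neighbours on the normalized curve.
    """
    if len(distortions) < 3:
        raise ValueError("need at least three candidate cluster numbers")
    if ks is None:
        ks = list(range(1, len(distortions) + 1))
    y = normalize(distortions)
    angles = {}
    for i in range(1, len(y) - 1):
        ax, ay = ks[i - 1] - ks[i], y[i - 1] - y[i]  # vector to left neighbour
        bx, by = ks[i + 1] - ks[i], y[i + 1] - y[i]  # vector to right neighbour
        cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding drift
        angles[ks[i]] = math.acos(cos_t)
    return angles


def estimate_k(distortions, ks=None):
    """Pick the k whose intersection angle is smallest (the sharpest bend)."""
    angles = elbow_angles(distortions, ks)
    return min(angles, key=angles.get)


if __name__ == "__main__":
    # Illustrative distortion values for k = 1..6 (made up, not from the paper).
    sample = [100, 40, 10, 8, 7, 6.5]
    for k, a in elbow_angles(sample).items():
        print(k, round(math.degrees(a), 1))
    print("estimated k:", estimate_k(sample))
```

With the made-up distortion values in the usage block, the sharpest bend is at k = 3. The endpoints of the curve have only one neighbour, so under this assumption they cannot be selected.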
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Due to technical limitations, full-text HTML conversion of this manuscript could not be completed. However, the latest manuscript can be downloaded and accessed as a PDF.
Posted 11 Jan, 2021
Received 06 Jan, 2021