Title: An Efficient $k$-modes Algorithm for Clustering Categorical Datasets

Jun 06, 2020

Karin S. Dorman, Ranjan Maitra

Abstract: Mining clusters from datasets is an important endeavor in many applications. The $k$-means algorithm is a popular and efficient distribution-free approach for clustering numerical-valued data, but it cannot be applied to categorical-valued observations. The $k$-modes algorithm addresses this lacuna by taking the $k$-means objective function, replacing the dissimilarity measure, and using modes instead of means in the modified objective function. Unlike many other clustering algorithms, both $k$-modes and $k$-means are scalable, because they do not require calculation of all pairwise dissimilarities. We provide a fast and computationally efficient implementation of $k$-modes, OTQT, and prove that it can find clusterings superior to those found by existing algorithms. We also examine five initialization methods and three types of $K$-selection methods, many of them novel, and all appropriate for $k$-modes. By examining the performance on real and simulated datasets, we show that simple random initialization is the best initializer, that a novel $K$-selection method is more accurate than two methods adapted from $k$-means, and that the new OTQT algorithm is more accurate and almost always faster than existing algorithms.
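To make the $k$-modes idea in the abstract concrete, here is a minimal Python sketch of a generic alternating (Lloyd-style) $k$-modes iteration with simple random initialization. It is not the OTQT algorithm, whose details the abstract does not give. The use of Hamming (simple matching) dissimilarity is an assumption, since the abstract only says the dissimilarity measure is replaced, and the names `hamming`, `column_modes`, and `k_modes` are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of coordinates where two categorical vectors differ (assumed dissimilarity)."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Coordinate-wise mode of a non-empty list of categorical vectors."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=100, seed=0):
    """Generic alternating k-modes sketch; `data` is a list of equal-length tuples."""
    rng = random.Random(seed)
    # Random initialization: k distinct observations serve as the initial modes.
    modes = rng.sample(sorted(set(data)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest mode.
        new_labels = [min(range(k), key=lambda j: hamming(x, modes[j])) for x in data]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        # Update step: each mode becomes the coordinate-wise mode of its cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    # Objective: total dissimilarity between observations and their assigned modes.
    cost = sum(hamming(x, modes[lab]) for x, lab in zip(data, labels))
    return labels, modes, cost


if __name__ == "__main__":
    toy = [("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
           ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r")]
    print(k_modes(toy, k=2))
```

Each pass compares every observation only with the $k$ current modes, never with every other observation, which is the scalability property the abstract attributes to both $k$-means and $k$-modes.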

* 28 pages, 16 figures, 5 tables
