

Aravindakshan Babu

Scalable K-Medoids via True Error Bound and Familywise Bandits

May 27, 2019

Aravindakshan Babu, Saurabh Agarwal, Sudarshan Babu, Hariharan Chandrasekaran

Figures 1-4 for Scalable K-Medoids via True Error Bound and Familywise Bandits

Abstract: K-Medoids (KM) is a standard clustering method, used extensively on semi-metric data. Error analyses of KM have traditionally used an in-sample notion of error, which can be far from the true error and suffers from generalization error. We formalize the true K-Medoid error based on the underlying data distribution by decomposing it into two fundamental statistical problems: minimum estimation (ME) and minimum mean estimation (MME). We provide a convergence result for MME and bound the true KM error for iid data. Inspired by this bound, we propose a computationally efficient, distributed KM algorithm called MCPAM. MCPAM has expected runtime $\mathcal{O}(km)$ and provides massive computational savings for a small tradeoff in accuracy. We verify the quality and scaling properties of MCPAM on various datasets, and achieve the hitherto unachieved feat of calculating the KM of 1 billion points on semi-metric spaces.
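The abstract does not spell out MCPAM itself, so the following is only a generic sketch, under stated assumptions, of the underlying idea: instead of computing each candidate medoid's mean distance over the whole cluster, estimate it from a random sample of at most `m` points (loosely, the mean-estimation part of MME), then take the candidate with the smallest estimate (loosely, the minimum-estimation part). The function names `sampled_mean_distance` and `sampled_k_medoids` and the parameters `m`, `n_candidates`, and `iters` are hypothetical. This is a plain alternating (Voronoi-iteration) k-medoids with sampled medoid updates, not the authors' algorithm. Its assignment step still visits every point, so it does not reproduce the $\mathcal{O}(km)$ runtime stated for MCPAM.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sampled_mean_distance(c: T, sample: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Monte Carlo estimate of the mean distance from candidate c to the cluster."""
    return sum(dist(c, x) for x in sample) / len(sample)


def sampled_k_medoids(points: Sequence[T], k: int, dist: Callable[[T, T], float],
                      m: int = 100, n_candidates: int = 50,
                      iters: int = 10, seed: int = 0) -> List[T]:
    """Hypothetical sketch: alternating k-medoids whose medoid update uses sampling.

    dist only needs to be a semi-metric (non-negative, symmetric, zero on identical
    points); the triangle inequality is never used.
    """
    rng = random.Random(seed)
    medoids = rng.sample(list(points), k)  # random distinct initial medoids
    for _ in range(iters):
        # Assignment step: every point joins its nearest medoid (ties go to the lower index).
        clusters: List[List[T]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[j].append(p)
        new_medoids: List[T] = []
        for j, cluster in enumerate(clusters):
            if not cluster:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[j])
                continue
            # Estimate mean distances from at most m sampled points instead of the whole cluster.
            sample = rng.sample(cluster, min(m, len(cluster)))
            # Only a random subset of cluster members is considered as candidate medoids.
            cands = rng.sample(cluster, min(n_candidates, len(cluster)))
            if medoids[j] not in cands:
                cands.append(medoids[j])  # the current medoid always stays a candidate
            # Pick the candidate with the smallest estimated mean distance.
            new_medoids.append(min(cands, key=lambda c: sampled_mean_distance(c, sample, dist)))
        if new_medoids == medoids:  # stop once the medoids no longer change
            break
        medoids = new_medoids
    return medoids


if __name__ == "__main__":
    pts = list(range(10)) + list(range(100, 110))  # two well-separated 1-D groups
    print(sorted(sampled_k_medoids(pts, k=2, dist=lambda a, b: abs(a - b), seed=1)))
```

With `m` and `n_candidates` at least as large as every cluster, the sampled estimates become exact and the update reduces to the usual medoid step. Shrinking them trades accuracy for fewer distance evaluations per update, which is the kind of tradeoff the abstract describes, though the bound that justifies MCPAM's specific choice is in the paper, not in this sketch.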
