Bogdan-Adrian Manghiuc

Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs

Jun 16, 2023

Steinar Laenen, Bogdan-Adrian Manghiuc, He Sun

Abstract: This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta's cost function. For any input graph $G$ with a clear cluster structure, our algorithms run in time nearly linear in the input size of $G$ and return an $O(1)$-approximate HC tree with respect to Dasgupta's cost function. We compare the performance of our algorithm against the previous state of the art on synthetic and real-world datasets and show that it produces comparable or better HC trees with much lower running time.
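For readers unfamiliar with the objective, Dasgupta's cost of an HC tree $T$ on a weighted graph is the sum, over all edges $\{u,v\}$, of $w(u,v)$ times the number of leaves under the lowest common ancestor of $u$ and $v$ in $T$. The sketch below is only an illustration of that definition, not the authors' algorithm: the nested-tuple tree encoding, the `frozenset`-keyed weight dictionary, and the function names are all assumptions made for this example, and the code is quadratic rather than nearly-linear.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves under a tree given as nested tuples (hypothetical encoding)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weights):
    """Dasgupta's cost of `tree`.

    `weights` maps frozenset({u, v}) to the edge weight w(u, v);
    missing pairs are treated as weight 0.
    """
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no edge
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(group) for group in child_leaves)
    cost = 0.0
    # An edge whose endpoints lie in different children is separated at this
    # node, so it is charged the number of leaves under this node.
    for left, right in combinations(child_leaves, 2):
        for u in left:
            for v in right:
                cost += weights.get(frozenset((u, v)), 0.0) * size
    # Edges inside a single child are charged deeper in the tree.
    for child in tree:
        cost += dasgupta_cost(child, weights)
    return cost


# Toy path graph a - b - c - d with unit weights.
path_weights = {
    frozenset(("a", "b")): 1.0,
    frozenset(("b", "c")): 1.0,
    frozenset(("c", "d")): 1.0,
}
good_tree = (("a", "b"), ("c", "d"))  # keeps adjacent vertices together
bad_tree = (("a", "c"), ("b", "d"))   # separates every edge at the root
print(dasgupta_cost(good_tree, path_weights), dasgupta_cost(bad_tree, path_weights))
```

On this toy graph the tree that merges adjacent vertices first has the lower cost, which matches the intuition that heavy edges should be separated as low in the tree as possible.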

* This work was accepted at the 40th International Conference on Machine Learning (ICML'23)

Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs

Dec 16, 2021

Bogdan-Adrian Manghiuc, He Sun

Abstract: Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta, and present two polynomial-time approximation algorithms. Our first result is an $O(1)$-approximation algorithm for graphs of high conductance. Our simple construction bypasses the complicated recursive routines for finding sparse cuts known in the literature. Our second and main result is an $O(1)$-approximation algorithm for a wide family of graphs that exhibit a well-defined structure of clusters. This result generalises the previous state of the art, which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by an empirical analysis on both synthetic and real-world data sets, on which our presented algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure.
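As background for the phrase "graphs of high conductance", the sketch below computes the conductance of a vertex set $S$ under one common convention, $\Phi(S) = w(S, V \setminus S) / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))$, where $\mathrm{vol}$ is the sum of weighted degrees. The paper's exact normalisation may differ; the adjacency-dictionary representation and the helper names `build_adjacency` and `conductance` are assumptions made for this illustration.

```python
def build_adjacency(edges):
    """Build a symmetric weighted adjacency dict from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def conductance(adj, subset):
    """Conductance of `subset`: cut weight over the smaller of the two volumes."""
    inside = set(subset)
    cut = vol_inside = vol_outside = 0.0
    for u, neighbours in adj.items():
        for v, w in neighbours.items():
            if u in inside:
                vol_inside += w
                if v not in inside:
                    cut += w  # each crossing edge is counted once, from its inside endpoint
            else:
                vol_outside += w
    denominator = min(vol_inside, vol_outside)
    if denominator == 0:
        raise ValueError("subset or its complement has zero volume")
    return cut / denominator


# Two unit-weight triangles joined by the single edge c - d.
two_triangles = build_adjacency([
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
])
print(conductance(two_triangles, {"a", "b", "c"}))
print(conductance(two_triangles, {"a"}))
```

A triangle in this toy graph has low conductance because only one edge leaves it, which is the kind of well-defined cluster the second result targets; a single vertex has conductance 1 because all of its edges leave the set.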

* This work appeared at the 35th Conference on Neural Information Processing Systems (NeurIPS'21)
