Abstract: The Euclidean $k$-median problem is defined in the following manner: given a set $\mathcal{X}$ of $n$ points in $\mathbb{R}^{d}$, and an integer $k$, find a set $C \subset \mathbb{R}^{d}$ of $k$ points (called centers) such that the cost function $\Phi(C,\mathcal{X}) \equiv \sum_{x \in \mathcal{X}} \min_{c \in C} \|x-c\|_{2}$ is minimized. The Euclidean $k$-means problem is defined similarly by replacing the distance with squared distance in the cost function. Various hardness of approximation results are known for the Euclidean $k$-means problem. However, no hardness of approximation results were known for the Euclidean $k$-median problem. In this work, assuming the unique games conjecture (UGC), we provide the first hardness of approximation result for the Euclidean $k$-median problem. Furthermore, we study the hardness of approximation for the Euclidean $k$-means/$k$-median problems in the bi-criteria setting where an algorithm is allowed to choose more than $k$ centers. That is, bi-criteria approximation algorithms are allowed to output $\beta k$ centers (for constant $\beta>1$) and the approximation ratio is computed with respect to the optimal $k$-means/$k$-median cost. In this setting, we show the first hardness of approximation result for the Euclidean $k$-median problem for any $\beta < 1.015$, assuming UGC. We also show a similar bi-criteria hardness of approximation result for the Euclidean $k$-means problem with a stronger bound of $\beta < 1.28$, again assuming UGC.
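To make the two objectives concrete, the following is a minimal sketch (not from the paper; the function name and NumPy usage are my own illustration) of how the $k$-median cost $\Phi(C,\mathcal{X})$ and the corresponding $k$-means cost could be evaluated for a given set of centers:

```python
import numpy as np

def clustering_cost(X, C, squared=False):
    """Cost of assigning each point in X to its nearest center in C.

    squared=False: k-median objective, sum of Euclidean distances.
    squared=True:  k-means objective, sum of squared Euclidean distances.
    X has shape (n, d), C has shape (k, d).
    """
    # Distance from every point to every center, shape (n, k).
    dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    # Each point contributes its distance to the closest center.
    nearest = dists.min(axis=1)
    return float(np.sum(nearest ** 2) if squared else np.sum(nearest))
```

In the bi-criteria setting discussed above, the same cost would be evaluated for a set $C$ of $\beta k$ centers, while the approximation ratio is still measured against the optimal cost achievable with only $k$ centers.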
Abstract: The k-means++ algorithm due to Arthur and Vassilvitskii has become the most popular seeding method for Lloyd's algorithm. It samples the first center uniformly at random from the data set and the other $k-1$ centers iteratively according to $D^2$-sampling, where the probability that a data point becomes the next center is proportional to its squared distance to the closest center chosen so far. k-means++ is known to achieve an approximation factor of $O(\log k)$ in expectation. Already in the original paper on k-means++, Arthur and Vassilvitskii suggested a variation called the greedy k-means++ algorithm, in which, in each iteration, multiple possible centers are sampled according to $D^2$-sampling and only the one that decreases the objective the most is chosen as the center for that iteration. They left it as an open question whether this also leads to an $O(\log k)$-approximation (or even better). We show that this is not the case by presenting a family of instances on which greedy k-means++ yields only an $\Omega(\ell\cdot \log k)$-approximation in expectation, where $\ell$ is the number of possible centers that are sampled in each iteration. We also study a variation, which we call the noisy k-means++ algorithm. In this variation, only one center is sampled in each iteration, but no longer exactly by $D^2$-sampling. Instead, in each iteration an adversary is allowed to change the probabilities arising from $D^2$-sampling individually for each point by a factor between $1-\epsilon_1$ and $1+\epsilon_2$, for parameters $\epsilon_1 \in [0,1)$ and $\epsilon_2 \ge 0$. We prove that noisy k-means++ computes an $O(\log^2 k)$-approximation in expectation. We also discuss some applications of this result.
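As a rough illustration of the seeding procedures discussed above, the sketch below implements $D^2$-sampling and the greedy variant as they are described in the abstract; the function name and parameter choices are my own and are not taken from the paper. Setting $\ell = 1$ recovers standard k-means++, while $\ell > 1$ gives the greedy variant analyzed in the lower bound.

```python
import numpy as np

def greedy_kmeanspp_seeding(X, k, ell=1, seed=None):
    """Seed k centers from data X (shape (n, d)).

    ell = 1: standard k-means++ (pure D^2-sampling).
    ell > 1: greedy k-means++: sample ell candidates by D^2-sampling and
             keep the one that decreases the k-means cost the most.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # First center: uniform over the data set.
    centers = [X[rng.integers(n)]]
    # Squared distance of every point to its closest chosen center.
    d2 = np.linalg.norm(X - centers[0], axis=1) ** 2
    for _ in range(k - 1):
        probs = d2 / d2.sum()
        candidates = rng.choice(n, size=ell, p=probs)
        best_idx, best_d2, best_cost = None, None, np.inf
        for i in candidates:
            # Cost if candidate i were added as the next center.
            new_d2 = np.minimum(d2, np.linalg.norm(X - X[i], axis=1) ** 2)
            cost = new_d2.sum()
            if cost < best_cost:
                best_idx, best_d2, best_cost = i, new_d2, cost
        centers.append(X[best_idx])
        d2 = best_d2
    return np.array(centers)
```

The noisy variant studied in the abstract would differ only in the sampling step: instead of sampling exactly from `probs`, each point's probability may be perturbed by a point-wise factor in $[1-\epsilon_1,\, 1+\epsilon_2]$ (and renormalized) before one center is drawn.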