The fuzzy $K$-means problem is a generalization of the classical $K$-means problem to soft clusterings, i.e. clusterings where each points belongs to each cluster to some degree. Although popular in practice, prior to this work the fuzzy $K$-means problem has not been studied from a complexity theoretic or algorithmic perspective. We show that optimal solutions for fuzzy $K$-means cannot, in general, be expressed by radicals over the input points. Surprisingly, this already holds for very simple inputs in one-dimensional space. Hence, one cannot expect to compute optimal solutions exactly. We give the first $(1+\epsilon)$-approximation algorithms for the fuzzy $K$-means problem. First, we present a deterministic approximation algorithm whose runtime is polynomial in $N$ and linear in the dimension $D$ of the input set, given that $K$ is constant, i.e. a polynomial time approximation algorithm given a fixed $K$. We achieve this result by showing that for each soft clustering there exists a hard clustering with comparable properties. Second, by using techniques known from coreset constructions for the $K$-means problem, we develop a deterministic approximation algorithm that runs in time almost linear in $N$ but exponential in the dimension $D$. We complement these results with a randomized algorithm which imposes some natural restrictions on the input set and whose runtime is comparable to some of the most efficient approximation algorithms for $K$-means, i.e. linear in the number of points and the dimension, but exponential in the number of clusters.