We explore ways in which the covariance ellipsoid ${\cal B}=\{v \in \mathbb{R}^d : \mathbb{E} \langle X,v\rangle^2 \leq 1\}$ of a centred random vector $X$ in $\mathbb{R}^d$ can be approximated by a simple set. The data available for constructing the approximating set consist of $X_1,\dots,X_N$, which are independent and distributed as $X$. We present a general method for constructing such approximations and implement it for two types of approximating sets. We first construct a (random) set ${\cal K}$ defined as a union of intersections of slabs $H_{z,\alpha}=\{v \in \mathbb{R}^d : |\langle z,v\rangle| \leq \alpha\}$ (so that ${\cal K}$ is in fact the output of a neural network with two hidden layers). The slabs are generated using $X_1,\dots,X_N$, and under minimal assumptions on $X$ (e.g., $X$ can be heavy-tailed) it suffices that $N = c_1 d \eta^{-4}\log(2/\eta)$ to ensure that $(1-\eta) {\cal K} \subset {\cal B} \subset (1+\eta){\cal K}$. In some cases (e.g., if $X$ is rotation invariant and has marginals that are well-behaved in some weak sense), a smaller sample size suffices: $N = c_1 d\eta^{-2}\log(2/\eta)$. We then show that if the slabs are replaced by randomly generated ellipsoids defined using $X_1,\dots,X_N$, the same degree of approximation holds when $N \geq c_2 d\eta^{-2}\log(2/\eta)$. The construction we use is based on the small-ball method.
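
As a brief illustration of the objects involved, the two sets can be written out in formulas. The matrix $\Sigma$, the number of groups $m$, the index sets $I_j$ and the widths $\alpha_i$ are notation introduced here for illustration only; how the directions, widths and groupings are actually generated from $X_1,\dots,X_N$ is not described in this summary. Since $X$ is centred, $\Sigma=\mathbb{E}XX^T$ is its covariance matrix, and
\[
\mathbb{E}\langle X,v\rangle^2 = v^T\Sigma v, \qquad {\cal B}=\{v \in \mathbb{R}^d : v^T\Sigma v \leq 1\}.
\]
A union of intersections of slabs, and its indicator function, can then be written as
\[
{\cal K}=\bigcup_{j=1}^{m}\,\bigcap_{i\in I_j} H_{z_i,\alpha_i}, \qquad \mathbf{1}_{\cal K}(v)=\max_{1\leq j\leq m}\,\min_{i\in I_j}\,\mathbf{1}\{|\langle z_i,v\rangle|\leq \alpha_i\}.
\]
One plausible reading of the two-hidden-layer remark is that the slab tests $|\langle z_i,v\rangle|\leq\alpha_i$ and their intersections (the inner minimum) form the hidden units, while the union (the outer maximum) is computed at the output.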