Human-swarm interaction is facilitated by a low-dimensional encoding of the swarm formation that is independent of the (possibly large) number of robots. We propose using image moments to encode two-dimensional formations of robots. Each robot knows the desired formation moments and simultaneously estimates the current moments of the entire swarm while controlling its motion to better achieve the desired group moments. The estimator is a distributed optimization that requires no centralized processing and is self-healing, meaning that the process is robust to initialization errors, packet drops, and robots being added to or removed from the swarm. Our experimental results with a swarm of 50 robots, suffering nearly 50% packet loss, show that distributed estimation and control of image moments effectively achieve desired swarm formations.
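
To make the encoding concrete, the following is a minimal sketch of how image moments might summarize a two-dimensional formation. It is illustrative only and computes the moments centrally from all positions, so it does not show the distributed estimator. The function names `formation_moments` and `centroid_and_spread`, the truncation at second order, and the normalization by the number of robots are all assumptions made for this example, not details taken from the text above.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Moments = Dict[Tuple[int, int], float]


def formation_moments(positions: Sequence[Point], max_order: int = 2) -> Moments:
    """Return normalized raw moments M_pq = (1/N) * sum_i x_i^p * y_i^q
    for all p + q <= max_order.

    Assumption: dividing by N keeps the encoding the same size and scale
    regardless of how many robots are in the swarm.
    """
    n = len(positions)
    if n == 0:
        raise ValueError("need at least one robot position")
    moments: Moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            moments[(p, q)] = sum(x**p * y**q for x, y in positions) / n
    return moments


def centroid_and_spread(m: Moments):
    """Convert raw moments up to order 2 into a centroid and central second moments."""
    cx, cy = m[(1, 0)], m[(0, 1)]
    var_x = m[(2, 0)] - cx**2      # spread along x
    var_y = m[(0, 2)] - cy**2      # spread along y
    cov_xy = m[(1, 1)] - cx * cy   # orientation / skew of the formation
    return (cx, cy), (var_x, cov_xy, var_y)


# Hypothetical formation: four robots on the corners of a 2 x 2 square.
square = [(0, 0), (2, 0), (0, 2), (2, 2)]
m_small = formation_moments(square)

# Same shape with twice as many robots: the normalized moments are unchanged.
m_large = formation_moments(square * 2)
```

In this sketch the six numbers in `m_small` describe the formation whether it contains 4 robots or 8, which is the sense in which a moment-based encoding is independent of swarm size.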