Most of the recent advances in crowd counting have evolved from hand-designed density estimation networks, where multi-scale features are leveraged to address scale variation, but at the expense of demanding design efforts. In this work, we automate the design of counting models with Neural Architecture Search (NAS) and introduce an end-to-end searched encoder-decoder architecture, Automatic Multi-Scale Network (AMSNet). The encoder and decoder in AMSNet are composed of different cells discovered from counting-specific search spaces, each dedicated to extracting and aggregating multi-scale features adaptively. To resolve the pixel-level isolation issue in training density estimation models, AMSNet is optimized with a novel Scale Pyramid Pooling Loss (SPPLoss), which exploits a pyramidal architecture to achieve structural supervision at multiple scales. During training time, AMSNet and SPPLoss are searched end-to-end efficiently with differentiable NAS techniques. When testing, AMSNet produces state-of-the-art results that are considerably better than hand-designed models on four challenging datasets, fully demonstrating the efficacy of NAS-Count.