We propose a neural network component, the regional aggregation layer, that makes it possible to train a pixel-level density estimator using only coarse-grained density aggregates, which reflect the number of objects in an image region. Our approach is simple to use and does not require domain-specific assumptions about the nature of the density function. We evaluate our approach on several synthetic datasets. In addition, we use this approach to learn to estimate high-resolution population and housing density from satellite imagery. In all cases, we find that our approach results in better density estimates than a commonly used baseline. We also show how our housing density estimator can be used to classify buildings as residential or non-residential.