Abstract:Localizing and counting large ungulates -- hoofed mammals like cows and elk -- in very high-resolution satellite imagery is an important task for supporting ecological studies. Prior work has shown that this is feasible with deep learning based methods and sub-meter multi-spectral satellite imagery. We extend this line of work by proposing a baseline method, CowNet, that simultaneously estimates the number of animals in an image (counts), as well as predicts their location at a pixel level (localizes). We also propose an methodology for evaluating such models on counting and localization tasks across large scenes that takes the uncertainty of noisy labels and the information needed by stakeholders in ecological monitoring tasks into account. Finally, we benchmark our baseline method with state of the art vision methods for counting objects in scenes. We specifically test the temporal generalization of the resulting models over a large landscape in Point Reyes Seashore, CA. We find that the LC-FCN model performs the best and achieves an average precision between 0.56 and 0.61 and an average recall between 0.78 and 0.92 over three held out test scenes.