Object counting, whose aim is to estimate the number of objects from a given image, is an important and challenging computation task. Significant efforts have been devoted to addressing this problem and achieved great progress, yet counting the number of ground objects from remote sensing images is barely studied. In this paper, we are interested in counting dense objects from remote sensing images. Compared with object counting in a natural scene, this task is challenging in the following factors: large scale variation, complex cluttered background, and orientation arbitrariness. More importantly, the scarcity of data severely limits the development of research in this field. To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important geographic objects: buildings, crowded ships in harbors, large-vehicles and small-vehicles in parking lots. We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image. The proposed network consists of three parts namely attention module, scale pyramid module and deformable convolution module to attack the aforementioned challenging factors. Extensive experiments are performed on the proposed dataset and one crowd counting datset, which demonstrate the challenges of the proposed dataset and the superiority and effectiveness of our method compared with state-of-the-art methods.