Scene classification is a fundamental task in interpretation of remote sensing images, and has become an active research topic in remote sensing community due to its important role in a wide range of applications. Over the past years, tremendous efforts have been made for developing powerful approaches for scene classification of remote sensing images, evolving from the traditional bag-of-visual-words model to the new generation deep convolutional neural networks (CNNs). The deep CNN based methods have exhibited remarkable breakthrough on performance, dramatically outperforming previous methods which strongly rely on hand-crafted features. However, performance with deep CNNs has gradually plateaued on existing public scene datasets, due to the notable drawbacks of these datasets, such as the small scale and low-diversity of training samples. Therefore, to promote the development of new methods and move the scene classification task a step further, we deeply discuss the existing problems in scene classification task, and accordingly present three open directions. We believe these potential directions will be instructive for the researchers in this field.