Abstract:Rating how aesthetically pleasing an image appears is a highly complex matter and depends on a large number of different visual factors. Previous work has tackled the aesthetic rating problem by ranking on a 1-dimensional rating scale, e.g., incorporating handcrafted attributes. In this paper, we propose a rather general approach to automatically map aesthetic pleasingness with all its complexity into an "aesthetic space" to allow for a highly fine-grained resolution. In detail, making use of deep learning, our method directly learns an encoding of a given image into this high-dimensional feature space resembling visual aesthetics. Additionally to the mentioned visual factors, differences in personal judgments have a large impact on the likeableness of a photograph. Nowadays, online platforms allow users to "like" or favor certain content with a single click. To incorporate a huge diversity of people, we make use of such multi-user agreements and assemble a large data set of 380K images (AROD) with associated meta information and derive a score to rate how visually pleasing a given photo is. We validate our derived model of aesthetics in a user study. Further, without any extra data labeling or handcrafted features, we achieve state-of-the art accuracy on the AVA benchmark data set. Finally, as our approach is able to predict the aesthetic quality of any arbitrary image or video, we demonstrate our results on applications for resorting photo collections, capturing the best shot on mobile devices and aesthetic key-frame extraction from videos.