Abstract: Nowadays, social media has become a popular platform for the public to share photos. To make photos more visually appealing, users usually apply filters to their photos without domain knowledge. However, due to the growing number of filter types, choosing the best filter has become a major issue for users. Filter recommendation for photo aesthetics therefore plays an important role in image quality ranking problems. In recent years, several works have shown that Convolutional Neural Networks (CNNs) outperform traditional methods in image aesthetic categorization, which classifies images into high or low quality. Most of them, however, do not consider the effect of filters on images; hence, we propose a novel image aesthetic learning approach for filter recommendation. Instead of binarizing image quality, we adapt state-of-the-art CNN architectures and design a pairwise loss function to learn the aesthetic responses embedded in hidden layers for filtered images. Based on our pilot study, we observe that image categories (e.g., portrait, landscape, food) affect user preferences in filter selection. We therefore further integrate category classification into our proposed aesthetic-oriented models. To the best of our knowledge, there is no public dataset for aesthetic judgment of filtered images, so we create a new dataset called the Filter Aesthetic Comparison Dataset (FACD). It contains 28,160 filtered images derived from the AVA dataset and 42,240 reliable image pairs with aesthetic annotations collected via Amazon Mechanical Turk; it is the first dataset containing filtered images with user preference labels. We conduct experiments on the collected FACD for filter recommendation, and the results show that our proposed category-aware aesthetic learning outperforms aesthetic classification methods (e.g., a 12% relative improvement).
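Below is a minimal sketch of the kind of pairwise (margin ranking) objective the abstract describes, not the authors' exact loss: a shared CNN scores two differently filtered versions of a photo, and the annotated preference pushes the preferred version's score higher. The ResNet-18 backbone, margin value, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class FilterAestheticRanker(nn.Module):
    """Siamese-style CNN that outputs a scalar aesthetic score for a filtered image.

    Hypothetical sketch: the backbone and head sizes are placeholders,
    not the architecture used in the paper.
    """

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # stand-in for the paper's CNN
        backbone.fc = nn.Identity()               # keep the 512-d feature vector
        self.backbone = backbone
        self.score = nn.Linear(512, 1)            # scalar aesthetic response

    def forward(self, x):
        return self.score(self.backbone(x)).squeeze(1)


# One pairwise training step: img_a was preferred over img_b by annotators.
model = FilterAestheticRanker()
loss_fn = nn.MarginRankingLoss(margin=0.5)        # margin is an assumed value
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

img_a = torch.randn(8, 3, 224, 224)   # batch of preferred filtered versions
img_b = torch.randn(8, 3, 224, 224)   # batch of less-preferred filtered versions
target = torch.ones(8)                # +1 means score(img_a) should exceed score(img_b)

loss = loss_fn(model(img_a), model(img_b), target)
loss.backward()
optimizer.step()
```

In this sketch, filter recommendation would amount to scoring every filtered variant of a photo with the trained model and returning the filter whose variant ranks highest.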
Abstract: Due to the prevalence of mobile devices, mobile search has become more convenient than desktop search. Unlike traditional desktop search, mobile visual search must take into account the limited resources of mobile devices (e.g., bandwidth, computing power, and memory). State-of-the-art approaches show that the bag-of-words (BoW) model is robust for image and video retrieval; however, the large vocabulary tree might not fit in the memory of a mobile device. We observe that recent works mainly focus on designing compact feature representations on mobile devices for bandwidth-limited networks (e.g., 3G) and directly perform feature matching on remote servers (the cloud). However, the compact (binary) representation might fail to retrieve the target objects (images, videos). Based on the hashed binary codes, we propose a de-hashing process that reconstructs the BoW representation by leveraging the computing power of remote servers. To mitigate the information loss from binary codes, we further utilize contextual information (e.g., GPS) to reconstruct a context-aware BoW for better retrieval results. Experimental results show that the proposed method achieves retrieval accuracy competitive with BoW while transmitting only a few bits from the mobile device.
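Below is a minimal NumPy sketch of the de-hashing idea described above: the mobile device transmits only short binary codes per local feature, and the server maps each code back to a soft distribution over visual words to rebuild an approximate BoW histogram, optionally re-weighted by a GPS-conditioned prior. The bucket-to-word table, code length, vocabulary size, and location prior are hypothetical placeholders, not the paper's actual parameters or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
NUM_WORDS = 1000        # vocabulary size of the full server-side BoW model
NUM_BITS = 10           # length of each binary code sent from the device
NUM_BUCKETS = 2 ** NUM_BITS

# Offline, the server would estimate P(visual word | hash bucket) from a
# training corpus; a random placeholder table stands in for that here.
bucket_to_word_prob = rng.dirichlet(np.full(NUM_WORDS, 0.5), size=NUM_BUCKETS)


def dehash_to_bow(binary_codes: np.ndarray) -> np.ndarray:
    """Server-side de-hashing: reconstruct an approximate BoW histogram
    from the binary codes of one image's local features.

    binary_codes: array of shape (n_features, NUM_BITS) with 0/1 entries.
    """
    weights = 2 ** np.arange(NUM_BITS)
    buckets = binary_codes @ weights                  # integer bucket id per feature
    bow = bucket_to_word_prob[buckets].sum(axis=0)    # accumulate soft word assignments
    return bow / max(bow.sum(), 1e-12)                # L1-normalized histogram


# Mobile side: only the short binary codes (a few bits per feature) are sent.
codes = rng.integers(0, 2, size=(300, NUM_BITS))      # e.g., 300 local features
reconstructed_bow = dehash_to_bow(codes)

# Context-aware refinement (hypothetical): re-weight words by a prior
# conditioned on the query's GPS location, then renormalize.
location_prior = rng.dirichlet(np.full(NUM_WORDS, 0.5))
context_bow = reconstructed_bow * location_prior
context_bow /= max(context_bow.sum(), 1e-12)
```

The reconstructed (context-aware) histogram can then be matched against the server's BoW index as usual, which is how the sketch reflects the abstract's claim of BoW-level accuracy at a few transmitted bits per feature.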