Abstract:Spatial pooling is an important step in computer vision systems like Convolutional Neural Networks or the Bag-of-Words method. The spatial pooling purpose is to combine neighbouring descriptors to obtain a single descriptor for a given region (local or global). The resultant combined vector must be as discriminant as possible, in other words, must contain relevant information, while removing irrelevant and confusing details. Maximum and average are the most common aggregation functions used in the pooling step. To improve the aggregation of relevant information without degrading their discriminative power for image classification, we introduce a simple but effective scheme based on Ordered Weighted Average (OWA) aggregation operators. We present a method to learn the weights of the OWA aggregation operator in a Bag-of-Words framework and in Convolutional Neural Networks, and provide an extensive evaluation showing that OWA based pooling outperforms classical aggregation operators.
Abstract:Image search can be tackled using deep features from pre-trained Convolutional Neural Networks (CNN). The feature map from the last convolutional layer of a CNN encodes descriptive information from which a discriminative global descriptor can be obtained. We propose a new representation of co-occurrences from deep convolutional features to extract additional relevant information from this last convolutional layer. Combining this co-occurrence map with the feature map, we achieve an improved image representation. We present two different methods to get the co-occurrence representation, the first one based on direct aggregation of activations, and the second one, based on a trainable co-occurrence representation. The image descriptors derived from our methodology improve the performance in very well-known image retrieval datasets as we prove in the experiments.