Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Josep M. Gonfaus

EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

Jan 14, 2025

Diego Velazquez, Pau Rodriguez López, Sergio Alonso, Josep M. Gonfaus, Jordi Gonzalez, Gerardo Richarte, Javier Marin, Yoshua Bengio, Alexandre Lacoste

Abstract:This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Our dataset provides a wide spectrum of image data with varying resolutions, harnessed from different sensors and organized coherently into an accessible HuggingFace dataset in parquet format. This data spans five years, from 2017 to 2022. Accompanying the dataset, we introduce EarthMAE, a tailored Masked Autoencoder, developed to tackle the distinct challenges of remote sensing data. Trained in a self-supervised fashion, EarthMAE effectively processes different data modalities such as hyperspectral, multispectral, topographical data, segmentation maps, and temporal structure. This model helps us show that pre-training on Satellogic data improves performance on downstream tasks. While there is still a gap to fill in MAE for heterogeneous data, we regard this innovative combination of an expansive, diverse dataset and a versatile model adapted for self-supervised learning as a stride forward in deep learning for Earth monitoring.

* 2nd Workshop on Computer Vision for Earth Observation (CV4EO) Applications

Via

Access Paper or Ask Questions

Pay attention to the activations: a modular attention mechanism for fine-grained image recognition

Jul 30, 2019

Pau Rodríguez López, Diego Velazquez Dorta, Guillem Cucurull Preixens, Josep M. Gonfaus, F. Xavier Roca Marva, Jordi Gonzàlez Sabaté

Figure 1 for Pay attention to the activations: a modular attention mechanism for fine-grained image recognition

Figure 2 for Pay attention to the activations: a modular attention mechanism for fine-grained image recognition

Figure 3 for Pay attention to the activations: a modular attention mechanism for fine-grained image recognition

Figure 4 for Pay attention to the activations: a modular attention mechanism for fine-grained image recognition

Abstract:Fine-grained image recognition is central to many multimedia tasks such as search, retrieval and captioning. Unfortunately, these tasks are still challenging since the appearance of samples of the same class can be more different than those from different classes. Attention has been typically implemented in neural networks by selecting the most informative regions of the image that improve classification. In contrast, in this paper, attention is not applied at the image level but to the convolutional feature activations. In essence, with our approach, the neural model learns to attend to lower-level feature activations without requiring part annotations and uses those activations to update and rectify the output likelihood distribution. The proposed mechanism is modular, architecture-independent and efficient in terms of both parameters and computation required. Experiments demonstrate that well-known networks such as Wide Residual Networks and ResNeXt, when augmented with our approach, systematically improve their classification accuracy and become more robust to changes in deformation and pose and to the presence of clutter. As a result, our proposal reaches state-of-the-art classification accuracies in CIFAR-10, the Adience gender recognition task, Stanford Dogs, and UEC-Food100 while obtaining competitive performance in ImageNet, CIFAR-100, CUB200 Birds, and Stanford Cars. In addition, we analyze the different components of our model, showing that the proposed attention modules succeed in finding the most discriminative regions of the image. Finally, as a proof of concept, we demonstrate that with only local predictions, an augmented neural network can successfully classify an image before reaching any fully connected layer, thus reducing the computational amount up to 10%.

* IEEE Transactions on Multimedia, ECCV extension

Via

Access Paper or Ask Questions

Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery

Jul 24, 2018

Pau Rodríguez, Josep M. Gonfaus, Guillem Cucurull, F. Xavier Roca, Jordi Gonzàlez

Figure 1 for Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery

Figure 2 for Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery

Figure 3 for Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery

Figure 4 for Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery

Abstract:We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpasses the state of the art classification accuracies in CIFAR-10, the Adience gender recognition task, Stanford dogs, and UEC Food-100.

* Published at ECCV2018

Via

Access Paper or Ask Questions

Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks

Feb 06, 2018

Guillem Cucurull, Pau Rodríguez, V. Oguz Yazici, Josep M. Gonfaus, F. Xavier Roca, Jordi Gonzàlez

Figure 1 for Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks

Figure 2 for Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks

Figure 3 for Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks

Figure 4 for Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks

Abstract:Social media, as a major platform for communication and information exchange, is a rich repository of the opinions and sentiments of 2.3 billion users about a vast spectrum of topics. To sense the whys of certain social user's demands and cultural-driven interests, however, the knowledge embedded in the 1.8 billion pictures which are uploaded daily in public profiles has just started to be exploited since this process has been typically been text-based. Following this trend on visual-based social analysis, we present a novel methodology based on Deep Learning to build a combined image-and-text based personality trait model, trained with images posted together with words found highly correlated to specific personality traits. So the key contribution here is to explore whether OCEAN personality trait modeling can be addressed based on images, here called \emph{Mind{P}ics}, appearing with certain tags with psychological insights. We found that there is a correlation between those posted images and their accompanying texts, which can be successfully modeled using deep neural networks for personality estimation. The experimental results are consistent with previous cyber-psychology results based on texts or images. In addition, classification results on some traits show that some patterns emerge in the set of images corresponding to a specific text, in essence to those representing an abstract concept. These results open new avenues of research for further refining the proposed personality model under the supervision of psychology experts.

Via

Access Paper or Ask Questions

Regularizing CNNs with Locally Constrained Decorrelations

Mar 15, 2017

Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca

Figure 1 for Regularizing CNNs with Locally Constrained Decorrelations

Figure 2 for Regularizing CNNs with Locally Constrained Decorrelations

Figure 3 for Regularizing CNNs with Locally Constrained Decorrelations

Figure 4 for Regularizing CNNs with Locally Constrained Decorrelations

Abstract:Regularization is key for deep learning since it allows training more complex models while keeping lower levels of overfitting. However, the most prevalent regularizations do not leverage all the capacity of the models since they rely on reducing the effective number of parameters. Feature decorrelation is an alternative for using the full capacity of the models but the overfitting reduction margins are too narrow given the overhead it introduces. In this paper, we show that regularizing negatively correlated features is an obstacle for effective decorrelation and present OrthoReg, a novel regularization technique that locally enforces feature orthogonality. As a result, imposing locality constraints in feature decorrelation removes interferences between negatively correlated feature weights, allowing the regularizer to reach higher decorrelation bounds, and reducing the overfitting more effectively. In particular, we show that the models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present. Moreover, since our regularization is directly performed on the weights, it is especially suitable for fully convolutional neural networks, where the weight space is constant compared to the feature map space. As a result, we are able to reduce the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.

* Accepted at ICLR2017

Via

Access Paper or Ask Questions