David Eigen

Enhancing Worldwide Image Geolocation by Ensembling Satellite-Based Ground-Level Attribute Predictors

Jul 18, 2024

Michael J. Bianco, David Eigen, Michael Gormish

Abstract: Geolocating images of a ground-level scene entails estimating the location on Earth where the picture was taken, in the absence of GPS or other location metadata. Typically, methods are evaluated by measuring the Great Circle Distance (GCD) between a predicted location and ground truth. However, this measurement is limited because it only evaluates a single point, not estimates of regions or score heatmaps. This is especially important in applications to rural, wilderness and under-sampled areas, where finding the exact location may not be possible, and when used in aggregate systems that progressively narrow down locations. In this paper, we introduce a novel metric, Recall vs Area (RvA), which measures the accuracy of estimated distributions of locations. RvA treats image geolocation results similarly to document retrieval, measuring recall as a function of area: For a ranked list of (possibly non-contiguous) predicted regions, we measure the accumulated area required for the region to contain the ground truth coordinate. This produces a curve similar to a precision-recall curve, where "precision" is replaced by area in square kilometers, allowing evaluation of performance for different downstream search area budgets. Following directly from this view of the problem, we then examine a simple ensembling approach to global-scale image geolocation, which incorporates information from multiple sources to help address domain shift, and can readily incorporate multiple models, attribute predictors, and data sources. We study its effectiveness by combining the geolocation models GeoEstimation and the current SOTA GeoCLIP, with attribute predictors based on ORNL LandScan and ESA-CCI Land Cover. We find significant improvements in image geolocation for areas that are under-represented in the training set, particularly non-urban areas, on both Im2GPS3k and Street View images.
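
The Recall vs Area idea described in the abstract can be illustrated with a minimal sketch. It assumes each query image comes with a ranked list of predicted regions, each given as a hypothetical (area in km², contains-ground-truth flag) pair. The function name, the data layout, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# One predicted region: (area in square kilometers, does it contain the ground truth?)
Region = Tuple[float, bool]


def recall_vs_area(
    queries: Sequence[Sequence[Region]],
    budgets_km2: Sequence[float],
) -> List[float]:
    """Return recall at each area budget.

    Each query is a list of regions ranked best first. For each query we
    accumulate area down the ranking until a region contains the ground
    truth; recall at a budget is the fraction of queries whose required
    accumulated area fits within that budget.
    """
    if not queries:
        return [0.0 for _ in budgets_km2]

    required: List[float] = []
    for regions in queries:
        accumulated = 0.0
        needed = float("inf")  # stays infinite if no region contains the truth
        for area_km2, contains_truth in regions:
            accumulated += area_km2
            if contains_truth:
                needed = accumulated
                break
        required.append(needed)

    return [
        sum(1 for area in required if area <= budget) / len(required)
        for budget in budgets_km2
    ]


# Toy data (made up for illustration): four queries with ranked regions.
toy_queries = [
    [(100.0, False), (50.0, True), (200.0, False)],   # needs 150 km^2
    [(10.0, True)],                                    # needs 10 km^2
    [(500.0, False), (500.0, False)],                  # never covered
    [(300.0, False), (300.0, False), (400.0, True)],   # needs 1000 km^2
]
rva_curve = recall_vs_area(toy_queries, [10.0, 150.0, 1000.0])
```

Plotting `rva_curve` against the budgets gives the kind of curve the abstract describes, with search area taking the place of precision.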

Efficient Training of Deep Convolutional Neural Networks by Augmentation in Embedding Space

Feb 12, 2020

Mohammad Saeed Abrishami, Amir Erfan Eshratifar, David Eigen, Yanzhi Wang, Shahin Nazarian, Massoud Pedram

Abstract: Recent advances in the field of artificial intelligence have been made possible by deep neural networks. In applications where data are scarce, transfer learning and data augmentation techniques are commonly used to improve the generalization of deep learning models. However, fine-tuning a transfer model with data augmentation in the raw input space incurs a high computational cost, since the full network must be run for every augmented input. This is particularly critical when large models are implemented on embedded devices with limited computational and energy resources. In this work, we propose a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space. Our experimental results show that the proposed method drastically reduces the computation, while the accuracy of models is negligibly compromised.

Coarse2Fine: A Two-stage Training Method for Fine-grained Visual Classification

Sep 06, 2019

Amir Erfan Eshratifar, David Eigen, Michael Gormish, Massoud Pedram

Abstract: Small inter-class and large intra-class variations are the main challenges in fine-grained visual classification. Objects from different classes share visually similar structures and objects in the same class can have different poses and viewpoints. Therefore, the proper extraction of discriminative local features (e.g. a bird's beak or a car's headlight) is crucial. Most of the recent successes on this problem are based on attention models, which can localize and attend to discriminative local object parts. In this work, we propose a training method for visual attention networks, Coarse2Fine, which creates a differentiable path from the input space to the attended feature maps. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions in the raw image, which will guide the attention maps to better attend to the fine-grained features. We show Coarse2Fine and orthogonal initialization of the attention weights can surpass the state-of-the-art accuracies on common fine-grained classification tasks.

Finding Task-Relevant Features for Few-Shot Learning by Category Traversal

May 27, 2019

Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang

Abstract:Few-shot learning is an important area of research. Conceptually, humans are readily able to understand new concepts given just a few examples, while in more pragmatic terms, limited-example training situations are common in practice. Recent effective approaches to few-shot learning employ a metric-learning framework to learn a feature similarity comparison between a query (test) example, and the few support (training) examples. However, these approaches treat each support class independently from one another, never looking at the entire task as a whole. Because of this, they are constrained to use a single set of features for all possible test-time tasks, which hinders the ability to distinguish the most relevant dimensions for the task at hand. In this work, we introduce a Category Traversal Module that can be inserted as a plug-and-play module into most metric-learning based few-shot learners. This component traverses across the entire support set at once, identifying task-relevant features based on both intra-class commonality and inter-class uniqueness in the feature space. Incorporating our module improves performance considerably (5%-10% relative) over baseline systems on both mini-ImageNet and tieredImageNet benchmarks, with overall performance competitive with recent state-of-the-art systems.

* CVPR 2019

Gradient Agreement as an Optimization Objective for Meta-Learning

Oct 18, 2018

Amir Erfan Eshratifar, David Eigen, Massoud Pedram

Abstract: This paper presents a novel optimization method for maximizing generalization over tasks in meta-learning. The goal of meta-learning is to learn a model that lets an agent adapt rapidly when presented with previously unseen tasks. Tasks are sampled from a specific distribution which is assumed to be similar for both seen and unseen tasks. We focus on a family of meta-learning methods that learn initial parameters of a base model which can be fine-tuned quickly on a new task with a few gradient steps (MAML). Our approach is based on pushing the parameters of the model in a direction on which the tasks agree more. If the gradients of a task agree with the parameter update vector, then their inner product will be a large positive value. As a result, given a batch of tasks to be optimized for, we assign a positive (negative) weight to the loss function of a task if the inner product between its gradients and the average of the gradients of all tasks in the batch is a positive (negative) value. Therefore, the degree of the contribution of a task to the parameter updates is controlled by introducing a set of weights on the loss function of the tasks. Our method can be easily integrated with the current meta-learning algorithms for neural networks. Our experiments demonstrate that it yields models with better generalization compared to MAML and Reptile.
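
The weighting rule in the abstract can be sketched as follows. Each task gradient is treated as a flat list of floats, and each task's weight is its inner product with the batch-average gradient. Dividing by the sum of absolute inner products is an assumption made here to keep the weights bounded; the abstract does not specify the normalization. All names are hypothetical.

```python
from typing import List, Sequence


def agreement_weights(task_grads: Sequence[Sequence[float]]) -> List[float]:
    """Weight each task by the inner product of its gradient with the batch average.

    A task whose gradient points along the average direction gets a positive
    weight; a task that disagrees gets a negative weight.
    """
    n = len(task_grads)
    dim = len(task_grads[0])
    # Average gradient over all tasks in the batch.
    avg = [sum(g[d] for g in task_grads) / n for d in range(dim)]
    # Inner product of each task gradient with the average gradient.
    scores = [sum(g[d] * avg[d] for d in range(dim)) for g in task_grads]
    # Assumed normalization: divide by the sum of absolute scores.
    norm = sum(abs(s) for s in scores)
    if norm == 0.0:
        return [1.0 / n] * n  # no agreement signal: fall back to uniform weights
    return [s / norm for s in scores]


def weighted_update(
    task_grads: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine task gradients into one update direction using the weights."""
    dim = len(task_grads[0])
    return [sum(w * g[d] for w, g in zip(weights, task_grads)) for d in range(dim)]


# Toy batch: two tasks agree, one points the opposite way.
toy_grads = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
toy_weights = agreement_weights(toy_grads)
toy_update = weighted_update(toy_grads, toy_weights)
```

In this toy batch the disagreeing task receives a negative weight, so its gradient is flipped and reinforces the majority direction rather than cancelling it.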

A Meta-Learning Approach for Custom Model Training

Sep 21, 2018

Amir Erfan Eshratifar, Mohammad Saeed Abrishami, David Eigen, Massoud Pedram

Abstract:Transfer-learning and meta-learning are two effective methods to apply knowledge learned from large data sources to new tasks. In few-class, few-shot target task settings (i.e. when there are only a few classes and training examples available in the target task), meta-learning approaches that optimize for future task learning have outperformed the typical transfer approach of initializing model weights from a pre-trained starting point. But as we experimentally show, meta-learning algorithms that work well in the few-class setting do not generalize well in many-shot and many-class cases. In this paper, we propose a joint training approach that combines both transfer-learning and meta-learning. Benefiting from the advantages of each, our method obtains improved generalization performance on unseen target tasks in both few- and many-class and few- and many-shot scenarios.

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture

Dec 17, 2015

David Eigen, Rob Fergus

Abstract:In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional network that is able to adapt easily to each task using only small modifications, regressing from the input image to the output map directly. Our method progressively refines predictions using a sequence of scales, and captures many image details without any superpixels or low-level segmentation. We achieve state-of-the-art performance on benchmarks for all three tasks.

Unsupervised Learning of Spatiotemporally Coherent Metrics

Sep 08, 2015

Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun

Abstract: Current state-of-the-art classification and detection algorithms rely on supervised training. In this work we study unsupervised feature learning in the context of temporally coherent video data. We focus on feature learning from unlabeled video data, using the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We establish a connection between slow feature learning and metric learning and show that the trained encoder can be used to define a more temporally and semantically coherent metric.
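
To make the "regularized by slowness and sparsity" idea concrete, here is a minimal sketch of such an objective for one pair of adjacent frames. It assumes the codes and reconstructions have already been computed by some encoder and decoder. The squared-error reconstruction term, the L1 form of both penalties, and the weights `alpha` and `beta` are illustrative assumptions rather than the exact loss used in the paper.

```python
from typing import Sequence


def slow_sparse_loss(
    x_t: Sequence[float],
    x_next: Sequence[float],
    recon_t: Sequence[float],
    recon_next: Sequence[float],
    z_t: Sequence[float],
    z_next: Sequence[float],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> float:
    """Auto-encoder loss for two adjacent frames with slowness and sparsity terms.

    x_*     : input frames (flattened)
    recon_* : decoder outputs for those frames
    z_*     : encoder codes for those frames
    alpha   : weight of the slowness penalty (assumed hyperparameter)
    beta    : weight of the sparsity penalty (assumed hyperparameter)
    """
    # Reconstruction: squared error summed over both frames.
    reconstruction = sum((a - b) ** 2 for a, b in zip(x_t, recon_t))
    reconstruction += sum((a - b) ** 2 for a, b in zip(x_next, recon_next))
    # Slowness: codes of adjacent frames should change little.
    slowness = sum(abs(a - b) for a, b in zip(z_t, z_next))
    # Sparsity: codes should have few active units.
    sparsity = sum(abs(a) for a in z_t) + sum(abs(a) for a in z_next)
    return reconstruction + alpha * slowness + beta * sparsity


# Toy values chosen so each term is easy to check by hand.
toy_loss = slow_sparse_loss(
    x_t=[1.0, 2.0], x_next=[2.0, 2.0],
    recon_t=[1.0, 1.0], recon_next=[2.0, 2.0],
    z_t=[1.0, 0.0], z_next=[0.5, 0.0],
    alpha=2.0, beta=0.5,
)
```

With these toy values the reconstruction term is 1.0, the slowness term is 0.5, and the sparsity term is 1.5, so the weighted total is 1.0 + 2.0 × 0.5 + 0.5 × 1.5 = 2.75.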

* To appear at ICCV 2015

Unsupervised Feature Learning from Temporal Data

Apr 15, 2015

Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun

* arXiv admin note: substantial text overlap with arXiv:1412.6056

End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression

Nov 19, 2014

Li Wan, David Eigen, Rob Fergus

Abstract: Deformable Parts Models and Convolutional Networks each have achieved notable performance in object detection. Yet these two approaches find their strengths in complementary areas: DPMs are well-versed in object composition, modeling fine-grained spatial relationships between parts; likewise, ConvNets are adept at producing powerful image features, having been discriminatively trained directly on the pixels. In this paper, we propose a new model that combines these two approaches, obtaining the advantages of each. We train this model using a new structured loss function that considers all bounding boxes within an image, rather than isolated object instances. This enables the non-maximum suppression (NMS) operation, previously treated as a separate post-processing stage, to be integrated into the model. This allows for discriminative training of our combined ConvNet + DPM + NMS model in end-to-end fashion. We evaluate our system on PASCAL VOC 2007 and 2011 datasets, achieving competitive results on both benchmarks.
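
For context, the conventional NMS post-processing stage that the abstract contrasts with can be sketched as standard greedy suppression over scored boxes. This is the textbook procedure, not the integrated, trainable formulation the paper proposes. The box layout (x1, y1, x2, y2) and the 0.5 IoU threshold are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def greedy_nms(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap any already-kept box by more
    than the IoU threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy detections: the second box heavily overlaps the first and is suppressed.
toy_boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
toy_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(toy_boxes, toy_scores)
```

Because each keep-or-discard decision here is a hard choice made after scoring, this stage is normally applied outside of training, which is the separation the abstract says the proposed structured loss removes.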
