Avishkar Saha

Learning Adaptive Neighborhoods for Graph Neural Networks

Jul 18, 2023

Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden

Abstract: Graph convolutional networks (GCNs) enable end-to-end learning on graph-structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. Our module can be readily integrated into existing pipelines involving graph convolution operations, replacing the predetermined or existing adjacency matrix with one that is learned, and optimized, as part of the general objective. As such, it is applicable to any GCN. We integrate our module into trajectory prediction, point cloud classification and node classification pipelines, resulting in improved accuracy over other structure-learning methods across a wide range of datasets and GCN backbones.
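
To make the idea of per-node neighborhood sizes concrete, here is a minimal sketch, not the authors' implementation. It assumes each node already has a score for every candidate neighbor and its own continuous degree `k`, and it turns these into soft edge weights with a smooth step over the ranked candidates. The function names, the `temperature` parameter, and the smooth-step form are all illustrative assumptions.

```python
import math

def soft_topk_row(scores, k, temperature=0.1):
    """Soft edge weights in [0, 1] for one node (illustrative sketch).

    scores: hypothetical edge scores to every candidate neighbor.
    k: this node's own (continuous) neighborhood size.
    """
    # Rank candidates so that rank 0 is the highest-scoring neighbor.
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    weights = [0.0] * len(scores)
    for rank, j in enumerate(order):
        x = (rank + 0.5 - k) / temperature
        # Smooth step equal to sigmoid(-x), written with tanh to avoid
        # overflow: close to 1 when rank < k, close to 0 when rank > k.
        weights[j] = 0.5 * (1.0 - math.tanh(x / 2.0))
    return weights

def adaptive_adjacency(score_matrix, degrees, temperature=0.1):
    """Build a soft adjacency matrix where each row uses its own degree."""
    return [soft_topk_row(row, k, temperature)
            for row, k in zip(score_matrix, degrees)]

# Node 0 keeps about two neighbors, node 1 keeps about one.
adjacency = adaptive_adjacency(
    [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]],
    degrees=[2.0, 1.0],
)
```

In this sketch the weights vary smoothly with `k`, which is what would let a per-node degree be tuned by gradient descent if the same computation were written in an autodiff framework.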

* ICCV 2023

"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping

Apr 06, 2022

Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden

Abstract: Estimating a semantically segmented bird's-eye-view (BEV) map from a single image has become a popular technique for autonomous control and navigation. However, existing methods show an increase in localization error with distance from the camera. While such an increase in error is entirely expected (localization is harder at distance), much of the drop in performance can be attributed to the cues used by current texture-based models. In particular, they make heavy use of object-ground intersections (such as shadows), which become increasingly sparse and uncertain for distant objects. In this work, we address these shortcomings in BEV mapping by learning the spatial relationship between objects in a scene. We propose a graph neural network which predicts BEV objects from a monocular image by spatially reasoning about an object within the context of other objects. Our approach sets a new state of the art in BEV estimation from monocular images across three large-scale datasets, including a 50% relative improvement for objects on nuScenes.

* Accepted to CVPR 2022

Translating Images into Maps

Oct 03, 2021

Avishkar Saha, Oscar Mendez Maldonado, Chris Russell, Richard Bowden

Abstract: We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird's-eye-view (BEV) of the world, in a single end-to-end network. We assume a 1-1 correspondence between a vertical scanline in the image and rays passing through the camera location in an overhead map. This lets us formulate map generation from an image as a set of sequence-to-sequence translations. Posing the problem as translation allows the network to use the context of the image when interpreting the role of each pixel. This constrained formulation, based upon a strong physical grounding of the problem, leads to a restricted transformer network that is convolutional in the horizontal direction only. The structure allows us to make efficient use of data when training, and obtains state-of-the-art results for instantaneous mapping on three large-scale datasets, including a 15% and 30% relative gain against existing best-performing methods on the nuScenes and Argoverse datasets, respectively. We make our code available at https://github.com/avishkarsaha/translating-images-into-maps.
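
The scanline-to-ray correspondence can be illustrated with basic pinhole geometry. The sketch below is not taken from the linked repository; it assumes an ideal pinhole camera with a hypothetical horizontal focal length `fx` and principal point `cx` (both in pixels), and shows how one image column maps to a single ray through the camera position in the overhead map.

```python
import math

def column_to_azimuth(u, fx, cx):
    """Azimuth in radians of the ray through image column u (pinhole model)."""
    return math.atan2(u - cx, fx)

def polar_ray(u, fx, cx, ranges):
    """Overhead-map points (x, z) along the ray for column u.

    x is lateral offset and z is forward distance from the camera;
    ranges is a list of distances at which to sample the ray.
    """
    theta = column_to_azimuth(u, fx, cx)
    return [(r * math.sin(theta), r * math.cos(theta)) for r in ranges]

# The central column looks straight ahead, so its ray lies on the z axis.
center_ray = polar_ray(u=320, fx=500.0, cx=320.0, ranges=[1.0, 2.0])
```

Under this assumption every pixel in a column falls somewhere along one such ray, which is why each column can be treated as a separate input sequence to be translated into a sequence of map cells.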

* 7 pages
