Dhanalaxmi Gaddam

CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection

Sep 13, 2022

Dhanalaxmi Gaddam, Jean Lahoud, Fahad Shahbaz Khan, Rao Muhammad Anwer, Hisham Cholakkal

Abstract: Existing deep learning-based 3D object detectors typically rely on the appearance of individual objects and do not explicitly attend to the rich contextual information of the scene. In this work, we propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework, which takes a 3D scene as input and explicitly integrates useful contextual information of the scene at multiple levels to predict a set of object bounding boxes along with their corresponding semantic labels. To this end, we use a context enhancement network that captures contextual information at different levels of granularity, followed by a multi-stage refinement module that progressively refines the box positions and class predictions. Extensive experiments on the large-scale ScanNetV2 benchmark show the benefits of our proposed method, which yields an absolute improvement of 2.0% over the baseline. In addition to 3D object detection, we investigate the effectiveness of our CMR3D framework for the problem of 3D object counting. Our source code will be publicly released.

* 5 figures, 10 pages including references

Self-Supervised Learning for Fine-Grained Visual Categorization

May 18, 2021

Muhammad Maaz, Hanoona Abdul Rasheed, Dhanalaxmi Gaddam

Abstract: Recent research in self-supervised learning (SSL) has shown its capability in learning useful semantic representations from images for classification tasks. In this work, we study the usefulness of SSL for Fine-Grained Visual Categorization (FGVC). FGVC aims to distinguish objects of visually similar subcategories within a general category. The small inter-class but large intra-class variations within the dataset make it a challenging task. The limited availability of annotated labels for such fine-grained data motivates the use of SSL, where additional supervision can boost learning without the cost of extra annotations. Our baseline achieves $86.36\%$ top-1 classification accuracy on the CUB-200-2011 dataset by using random crop augmentation during training and center crop augmentation during testing. We explore the usefulness of various pretext tasks for FGVC, specifically rotation, pretext invariant representation learning (PIRL), and deconstruction and construction learning (DCL). Rotation as an auxiliary task encourages the model to learn global features and diverts it from focusing on subtle details. PIRL, which uses jigsaw patches, attempts to focus on discriminative local regions but struggles to localize them accurately. DCL helps in learning local discriminative features and outperforms the baseline, achieving $87.41\%$ top-1 accuracy. The deconstruction learning forces the model to focus on local object parts, while reconstruction learning helps in learning the correlation between the parts. We perform extensive experiments to support our findings. Our code is available at https://github.com/mmaaz60/ssl_for_fgvc.

* 10 pages, 6 figures
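
The rotation pretext task mentioned in the abstract above can be illustrated with a small sketch. It assumes the common four-way formulation, in which each image is rotated by 0, 90, 180, and 270 degrees and the model is asked to predict which rotation was applied; the abstract does not state the exact setup used in the paper, and the names `rotate90` and `rotation_pretext_batch` are hypothetical, not taken from the linked repository.

```python
from typing import List, Tuple

# Toy grayscale image: a list of rows of pixel values (an assumption for illustration).
Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate a 2D grid by 90 degrees counter-clockwise."""
    # Transpose the grid, then reverse the order of the rows.
    return [list(row) for row in zip(*img)][::-1]


def rotation_pretext_batch(img: Image) -> List[Tuple[Image, int]]:
    """Return four rotated copies of img paired with labels 0..3.

    Label k means the image was rotated counter-clockwise by k * 90 degrees.
    These labels come for free, so no human annotation is needed.
    """
    samples = []
    current = [list(row) for row in img]  # copy so the input is not mutated
    for k in range(4):
        samples.append((current, k))
        current = rotate90(current)
    return samples


if __name__ == "__main__":
    for rotated, label in rotation_pretext_batch([[1, 2], [3, 4]]):
        print(label, rotated)
```

In an auxiliary-task setup, a classifier head trained on these labels would share a backbone with the main fine-grained classifier. Recognizing the rotation mostly requires the global layout of the object, which is consistent with the abstract's observation that this task steers the model toward global rather than subtle local features.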
