Samuel Rota Buló

Panoptic Lifting for 3D Scene Understanding with Neural Fields

Dec 19, 2022

Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Buló, Norman Müller, Matthias Nießner, Angela Dai, Peter Kontschieder

Abstract: We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches, which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme based on a neural field representation that generates a unified, multi-view consistent 3D panoptic representation of the scene. To account for inconsistencies of 2D instance identifiers across views, we solve a linear assignment with a cost based on the model's current predictions and the machine-generated segmentation masks, which lets us lift 2D instances to 3D in a consistent way. We further propose and ablate contributions that make our method more robust to noisy, machine-generated labels, including test-time augmentations for confidence estimates, a segment consistency loss, bounded segmentation fields, and gradient stopping. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets, improving scene-level PQ over the state of the art by 8.4, 13.8, and 10.6%, respectively.

* Project Page: https://nihalsid.github.io/panoptic-lifting/, Video: https://youtu.be/QtsiL-6rSuM
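
The cross-view matching of instance identifiers can be illustrated with a small linear-assignment sketch. It assumes, purely for illustration, that a view's rendered instance predictions and its machine-generated 2D masks are given as sets of pixel coordinates keyed by ID, and that the cost of pairing them is 1 minus their IoU; the paper's actual cost is defined on the model's predictions and may differ. The function names are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two sets of pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_instances(predicted, observed):
    """Assign each machine-generated 2D segment ID (observed) to a distinct
    predicted instance slot, minimizing the total cost sum(1 - IoU).

    Brute force over injective assignments: exact, but exponential in the
    number of instances, so only suitable for small examples.
    """
    if len(observed) > len(predicted):
        raise ValueError("need at least as many predicted slots as 2D segments")
    pred_ids, obs_ids = list(predicted), list(observed)
    best_cost, best = float("inf"), None
    for slots in permutations(pred_ids, len(obs_ids)):
        cost = sum(1.0 - iou(observed[o], predicted[s]) for o, s in zip(obs_ids, slots))
        if cost < best_cost:
            best_cost, best = cost, slots
    return dict(zip(obs_ids, best))


# Rendered instance masks for one view (slot -> pixels) and the 2D masks
# from a pre-trained network, whose IDs (7, 3) are arbitrary per view.
predicted = {0: {(0, 0), (0, 1)}, 1: {(1, 0), (1, 1)}, 2: {(2, 0)}}
observed = {7: {(1, 0), (1, 1)}, 3: {(0, 0)}}
print(match_instances(predicted, observed))  # {7: 1, 3: 0}
```

Because the 2D IDs are remapped onto persistent slots before computing the loss, the same object receives the same 3D identifier regardless of which view supervised it. At realistic instance counts, a Hungarian-algorithm solver such as SciPy's `linear_sum_assignment` would replace the brute-force search.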

Modeling the Background for Incremental and Weakly-Supervised Semantic Segmentation

Jan 31, 2022

Fabio Cermelli, Massimiliano Mancini, Samuel Rota Buló, Elisa Ricci, Barbara Caputo

Abstract: Deep neural networks have enabled major progress in semantic segmentation. However, even the most advanced neural architectures suffer from important limitations. First, they are vulnerable to catastrophic forgetting, i.e., they perform poorly when required to incrementally update their model as new classes become available. Second, they rely on large amounts of pixel-level annotations to produce accurate segmentation maps. To tackle these issues, we introduce a novel incremental class learning approach for semantic segmentation that takes into account a peculiar aspect of this task: since each training step provides annotations only for a subset of all possible classes, pixels of the background class exhibit a semantic shift. Therefore, we revisit the traditional distillation paradigm by designing novel loss terms that explicitly account for the background shift. Additionally, we introduce a novel strategy to initialize the classifier's parameters at each step in order to prevent biased predictions toward the background class. Finally, we demonstrate that our approach can be extended to point- and scribble-based weakly supervised segmentation, modeling the partial annotations to create priors for unlabeled pixels. We demonstrate the effectiveness of our approach with an extensive evaluation on the Pascal-VOC, ADE20K, and Cityscapes datasets, significantly outperforming state-of-the-art methods.

* Accepted by T-PAMI (https://ieeexplore.ieee.org/document/9645239/). arXiv admin note: substantial text overlap with arXiv:2002.00718
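
The background shift can be made concrete for a single pixel. The sketch below is one plausible way to account for it, written from our reading of the abstract rather than from the paper's equations: a pixel labeled background at the current step may really belong to an old, no-longer-annotated class, so the loss credits the merged probability of background plus old classes; symmetrically, the distillation term compares the old model's background probability with the new model's mass on background plus the new classes. The bias initialization shares the old background score among background and the new classes so that their combined probability starts out equal to the old background probability. The class layout (background at index 0, old classes next, new classes appended) and all function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def background_aware_ce(logits, label, old_classes, bg=0):
    """Cross-entropy for one pixel at an incremental step (illustrative).

    A pixel labeled background may really belong to an old class that is no
    longer annotated, so background and old-class probabilities are merged.
    """
    p = softmax(logits)
    if label == bg:
        prob = p[bg] + sum(p[c] for c in old_classes)
    else:
        prob = p[label]
    return -math.log(prob)


def background_aware_distillation(new_logits, old_logits, new_classes, bg=0):
    """Distillation for one pixel (illustrative).

    The old model never saw the new classes, so its background probability
    is matched against the new model's mass on background plus new classes.
    Assumes old classes keep the same indices in both models.
    """
    q_old = softmax(old_logits)  # background + old classes
    p_new = softmax(new_logits)  # background + old + new classes
    loss = 0.0
    for c, q in enumerate(q_old):
        if c == bg:
            p = p_new[bg] + sum(p_new[k] for k in new_classes)
        else:
            p = p_new[c]
        loss -= q * math.log(p)
    return loss


def init_split_background_bias(bg_bias, num_new):
    """Hypothetical initialization: new-class classifiers copy the background
    weights, and the background bias is shared among background and the new
    classes, so their summed exp-score equals the old background exp-score."""
    shared = bg_bias - math.log(num_new + 1)
    return [shared] * (num_new + 1)
```

With uniform logits over four classes (background, one old class, two new classes), a background-labeled pixel costs log 2 rather than log 4, because the old class no longer competes with background.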

Boosting Binary Masks for Multi-Domain Learning through Affine Transformations

Mar 25, 2021

Massimiliano Mancini, Elisa Ricci, Barbara Caputo, Samuel Rota Buló

Abstract: In this work, we present a new algorithm for multi-domain learning. Given a pretrained architecture and a set of visual domains received sequentially, the goal of multi-domain learning is to produce a single model that performs a task in all the domains together. Recent works showed that this problem can be addressed by masking the internal weights of a given original conv-net through learned binary variables. In this work, we provide a general formulation of binary-mask-based models for multi-domain learning via affine transformations of the original network parameters. Our formulation obtains significantly higher levels of adaptation to new domains, achieving performance comparable to domain-specific models while requiring slightly more than 1 bit per network parameter per additional domain. Experiments on two popular benchmarks showcase the power of our approach, achieving performance close to state-of-the-art methods on the Visual Decathlon Challenge.

* Accepted for publication by Machine Vision and Applications on May 21, 2020. arXiv admin note: substantial text overlap with arXiv:1805.11119
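
The idea of building domain-specific weights as an affine transformation of shared weights and a binary mask can be sketched element-wise. The concrete form below, k0 + k1*W + k2*M + k3*(W*M) with a few learned scalars per layer, is an assumption chosen for illustration and may not match the paper's exact parameterization; the function names and the 4-scalars-per-layer storage estimate are likewise hypothetical.

```python
def binarize(real_mask, threshold=0.0):
    """Threshold real-valued mask variables into a binary mask (0/1)."""
    return [1 if m >= threshold else 0 for m in real_mask]


def affine_masked_weights(weights, mask, k0, k1, k2, k3):
    """Domain-specific weights as an affine function of the shared weights W
    and a binary mask M (assumed form): k0 + k1*W + k2*M + k3*(W*M),
    applied element-wise. k0..k3 are per-layer scalars learned per domain."""
    return [k0 + k1 * w + k2 * m + k3 * w * m for w, m in zip(weights, mask)]


def bits_per_param(num_params, num_scalars=4, scalar_bits=32):
    """Storage cost per parameter for one new domain: a 1-bit mask entry per
    weight plus a few full-precision scalars per layer."""
    return (num_params * 1 + num_scalars * scalar_bits) / num_params


w = [0.5, -1.0, 2.0]
m = binarize([0.3, -0.2, 0.0])  # [1, 0, 1]
# k3 = 1 with the other scalars at 0 recovers plain binary masking W * M.
print(affine_masked_weights(w, m, 0, 0, 0, 1))
```

Plain masking of the original weights is just one point in this family; the extra scalars let each domain rescale, shift, and recombine the shared weights while adding only a handful of 32-bit values per layer, which is why the per-parameter cost stays only slightly above 1 bit.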
