Sinisa Stekovic

Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding

Apr 18, 2025

Yuchen Rao, Stefan Ainetter, Sinisa Stekovic, Vincent Lepetit, Friedrich Fraundorfer

Abstract: High-level 3D scene understanding is essential in many applications. However, the challenges of generating accurate 3D annotations make the development of deep learning models difficult. We turn to recent advancements in automatic retrieval of synthetic CAD models, and show that data generated by such methods can be used as high-quality ground truth for training supervised deep learning models. More exactly, we employ a pipeline akin to the one previously used to automatically annotate objects in ScanNet scenes with their 9D poses and CAD models. This time, we apply it to the recent ScanNet++ v1 dataset, which previously lacked such annotations. Our findings demonstrate that it is not only possible to train deep learning models on these automatically obtained annotations but that the resulting models outperform those trained on manually annotated data. We validate this on two distinct tasks: point cloud completion and single-view CAD model retrieval and alignment. Our results underscore the potential of automatic 3D annotations to enhance model performance while significantly reducing annotation costs. To support future research in 3D scene understanding, we will release our annotations, which we call SCANnotate++, along with our trained models.

* Github Page: https://github.com/stefan-ainetter/SCANnotatepp

PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

Apr 16, 2024

Sinisa Stekovic, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects from images using interpretable shape programs. In comparison to traditional CAD model retrieval methods, the use of shape programs for 3D reconstruction allows for reasoning about the semantic properties of reconstructed objects, editing, low memory footprint, etc. However, the utilization of shape programs for 3D scene understanding has been largely neglected in past works. As our main contribution, we enable gradient-based optimization by introducing a module that translates shape programs designed in Blender, for example, into efficient PyTorch code. We also provide a method that relies on PyTorchGeoNodes and is inspired by Monte Carlo Tree Search (MCTS) to jointly optimize discrete and continuous parameters of shape programs and reconstruct 3D objects for input scenes. In our experiments, we apply our algorithm to reconstruct 3D objects in the ScanNet dataset and evaluate our results against CAD model retrieval-based reconstructions. Our experiments indicate that our reconstructions match the input scenes well while enabling semantic reasoning about reconstructed objects.

* In Submission

HOC-Search: Efficient CAD Model and Pose Retrieval from RGB-D Scans

Sep 12, 2023

Stefan Ainetter, Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We present an automated and efficient approach for retrieving high-quality CAD models of objects and their poses in a scene captured by a moving RGB-D camera. We first investigate various objective functions to measure similarity between a candidate CAD object model and the available data, and the best objective function appears to be a "render-and-compare" method comparing depth and mask rendering. We thus introduce a fast-search method that approximates an exhaustive search based on this objective function for simultaneously retrieving the object category, a CAD model, and the pose of an object given an approximate 3D bounding box. This method involves a search tree that organizes the CAD models and object properties, including object category and pose, for fast retrieval, and an algorithm inspired by Monte Carlo Tree Search that efficiently searches this tree. We show that this method retrieves CAD models that fit the real objects very well, with a speed-up factor of 10x to 120x compared to exhaustive search.
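
As a rough illustration of what a "render-and-compare" objective over depth and mask renderings can look like, the sketch below scores a candidate by combining a mask overlap term with a depth difference term. It is not the objective used in the paper: the function name `render_and_compare_score`, the `depth_weight` parameter, and the particular choice of (1 - IoU) plus mean absolute depth error are all assumptions made for this example, and the rendered depth and mask are assumed to come from some external renderer.

```python
import numpy as np

def render_and_compare_score(obs_depth, obs_mask, ren_depth, ren_mask, depth_weight=1.0):
    """Hypothetical score between an observed and a rendered view (lower is better).

    obs_depth, ren_depth: (H, W) float arrays of depth values.
    obs_mask, ren_mask:   (H, W) boolean object masks.
    """
    obs_mask = obs_mask.astype(bool)
    ren_mask = ren_mask.astype(bool)
    inter = obs_mask & ren_mask
    union = obs_mask | ren_mask
    if not inter.any():
        # No overlap at all: treat the candidate as the worst possible fit.
        return float("inf")
    # Mask term: 1 - intersection-over-union of the two silhouettes.
    mask_term = 1.0 - inter.sum() / union.sum()
    # Depth term: mean absolute depth difference where both masks agree.
    depth_term = np.abs(obs_depth[inter] - ren_depth[inter]).mean()
    return float(mask_term + depth_weight * depth_term)
```

A search procedure would evaluate such a score for each candidate CAD model and pose and keep the lowest-scoring one; how candidates are enumerated and pruned is the subject of the paper and is not shown here.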

Automatically Annotating Indoor Images with CAD Models via RGB-D Scans

Dec 22, 2022

Stefan Ainetter, Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We present an automatic method for annotating images of indoor scenes with the CAD models of the objects by relying on RGB-D scans. Through a visual evaluation by 3D experts, we show that our method retrieves annotations that are at least as accurate as manual annotations, and can thus be used as ground truth without the burden of manually annotating 3D data. We do this using an analysis-by-synthesis approach, which compares renderings of the CAD models with the captured scene. We introduce a 'cloning procedure' that identifies objects that have the same geometry, to annotate these objects with the same CAD models. This allows us to obtain complete annotations for the ScanNet dataset and the recent ARKitScenes dataset.

MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud

Jul 28, 2022

Michaël Ramamonjisoa, Sinisa Stekovic, Vincent Lepetit

Abstract: We present MonteBoxFinder, a method that, given a noisy input point cloud, fits cuboids to the input scene. Our primary contribution is a discrete optimization algorithm that, from a dense set of initially detected cuboids, is able to efficiently filter good boxes from the noisy ones. Inspired by recent applications of MCTS to scene understanding problems, we develop a stochastic algorithm that is, by design, more efficient for our task. Indeed, the quality of a fit for a cuboid arrangement is invariant to the order in which the cuboids are added into the scene. We develop several search baselines for our problem and demonstrate, on the ScanNet dataset, that our approach is more efficient and precise. Finally, we strongly believe that our core algorithm is very general and that it could be extended to many other problems in 3D scene understanding.

* Accepted at ECCV 2022. Project page: https://michaelramamonjisoa.github.io/projects/MonteBoxFinder, Code: https://github.com/MichaelRamamonjisoa/MonteBoxFinder

MCTS with Refinement for Proposals Selection Games in Scene Understanding

Jul 07, 2022

Sinisa Stekovic, Mahdi Rad, Alireza Moradi, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm, originally designed to learn to play games of high-state complexity. From a generated pool of proposals, our method jointly selects and optimizes proposals that minimize the objective term. In our first application for floor plan reconstruction from point clouds, our method selects and refines the room proposals, modelled as 2D polygons, by optimizing an objective function combining the fitness as predicted by a deep network and regularizing terms on the room shapes. We also introduce a novel differentiable method for rendering the polygonal shapes of these proposals. Our evaluations on the recent and challenging Structured3D and Floor-SP datasets show significant improvements over the state-of-the-art, without imposing hard constraints or assumptions on the floor plan configurations. In our second application, we extend our approach to reconstruct general 3D room layouts from a color image and obtain accurate room layouts. We also show that our differentiable renderer can easily be extended for rendering 3D planar polygons and polygon embeddings. Our method shows high performance on the Matterport3D-Layout dataset, without introducing hard constraints on room layout configurations.

* Submitted to: TPAMI Special Section on the Best Papers of ICCV 2021. GitHub Repository: https://github.com/vevenom/MonteScene. arXiv admin note: substantial text overlap with arXiv:2103.11161

Monte Carlo Scene Search for 3D Scene Understanding

Mar 30, 2021

Shreyas Hampali, Sinisa Stekovic, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We explore how a general AI algorithm can be used for 3D scene understanding to reduce the need for training data. More exactly, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show it can also be used for complex perception problems. Our adapted MCTS algorithm has few easy-to-tune hyperparameters and can optimise general losses. We use it to optimise the posterior probability of objects and room layout hypotheses given the RGB-D data. This results in an analysis-by-synthesis approach that explores the solution space by rendering the current solution and comparing it to the RGB-D observations. To perform this exploration even more efficiently, we propose simple changes to the standard MCTS' tree construction and exploration policy. We demonstrate our approach on the ScanNet dataset. Our method often retrieves configurations that are better than some manual annotations, especially on layouts.

* To be presented at CVPR 2021
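
For readers unfamiliar with the exploration policy that MCTS variants usually start from, here is a minimal sketch of the textbook UCB1 child selection rule. It shows only the standard rule, not the modified tree construction or exploration policy proposed in the paper; the function name `ucb1_select` and the representation of children as `(reward_sum, visit_count)` pairs are assumptions made for this example.

```python
import math

def ucb1_select(children, exploration=math.sqrt(2)):
    """Pick a child index with the standard UCB1 rule.

    children: list of (reward_sum, visit_count) pairs, one per child node.
    Unvisited children are selected first, as in the usual formulation.
    """
    total_visits = sum(n for _, n in children)
    best_index, best_score = -1, -math.inf
    for i, (reward_sum, n) in enumerate(children):
        if n == 0:
            return i  # always try an unvisited child before exploiting
        # Mean reward (exploitation) plus a bonus for rarely visited children.
        score = reward_sum / n + exploration * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

In a scene understanding setting, each child would correspond to adding one object or layout hypothesis to the current partial solution, and the reward would come from comparing a rendering of that solution with the observations.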

MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans

Mar 20, 2021

Sinisa Stekovic, Mahdi Rad, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We propose a novel method for reconstructing floor plans from noisy 3D point clouds. Our main contribution is a principled approach that relies on the Monte Carlo Tree Search (MCTS) algorithm to maximize a suitable objective function efficiently despite the complexity of the problem. Like previous work, we first project the input point cloud to a top view to create a density map and extract room proposals from it. Our method selects and optimizes the polygonal shapes of these room proposals jointly to fit the density map and outputs an accurate vectorized floor map even for large complex scenes. To do this, we adapted MCTS, an algorithm originally designed to learn to play games, to select the room proposals by maximizing an objective function combining the fitness with the density map as predicted by a deep network and regularizing terms on the room shapes. We also introduce a refinement step to MCTS that adjusts the shape of the room proposals. For this step, we propose a novel differentiable method for rendering the polygonal shapes of these proposals. We evaluate our method on the recent and challenging Structured3D and Floor-SP datasets and show a significant improvement over the state-of-the-art, without imposing any hard constraints or assumptions on the floor plan configurations.
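
The first step mentioned in the abstract, projecting a point cloud to a top view to obtain a density map, can be sketched roughly as follows. This is an illustrative guess at one simple way to do it, not the paper's preprocessing: the function name `top_view_density_map`, the `resolution` parameter, the assumption that the z axis points up, and the normalisation to a maximum of 1 are all choices made for this example.

```python
import numpy as np

def top_view_density_map(points, resolution=256):
    """Project an (N, 3) point cloud onto the x-y plane and count points per cell.

    Assumes z is the vertical axis. Returns a (resolution, resolution) array
    scaled so that the densest cell has value 1.
    """
    xy = points[:, :2]
    mins = xy.min(axis=0)
    maxs = xy.max(axis=0)
    # Use one scale for both axes so the floor plan keeps its aspect ratio.
    extent = max((maxs - mins).max(), 1e-8)
    idx = np.floor((xy - mins) / extent * (resolution - 1)).astype(int)
    density = np.zeros((resolution, resolution), dtype=np.float64)
    # np.add.at accumulates correctly when several points fall in the same cell.
    np.add.at(density, (idx[:, 1], idx[:, 0]), 1.0)
    if density.max() > 0:
        density /= density.max()
    return density
```

Walls tend to show up as bright lines in such a map because many points at different heights project onto the same cells, which is what makes it a convenient input for extracting room proposals.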

General 3D Room Layout from a Single View by Render-and-Compare

Jan 07, 2020

Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We present a novel method to reconstruct the 3D layout of a room -- walls, floors, ceilings -- from a single perspective view, even for the case of general configurations. This input view can consist of a color image only, but considering a depth map will result in a more accurate reconstruction. Our approach is based on solving a constrained discrete optimization problem, which selects the polygons that are part of the layout from a large set of potential polygons. In order to deal with occlusions between components of the layout, which is a problem ignored by previous works, we introduce an analysis-by-synthesis method to iteratively refine the 3D layout estimate. To the best of our knowledge, our method is the first that can estimate a layout in such general conditions from a single view. We additionally introduce a new annotation dataset made of 91 images from the ScanNet dataset and several metrics, in order to evaluate our results quantitatively.

Casting Geometric Constraints in Semantic Segmentation as Semi-Supervised Learning

Apr 30, 2019

Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

Abstract: We propose a simple yet effective method to learn to segment new indoor scenes from an RGB-D sequence: state-of-the-art methods trained on one dataset, even one as large as the SUNRGB-D dataset, can perform poorly when applied to images that are not part of the dataset, because of dataset bias, a common phenomenon in computer vision. To make semantic segmentation more useful in practice, we learn to segment new indoor scenes from sequences without manual annotations by exploiting geometric constraints and readily available training data from SUNRGB-D. As a result, we can then robustly segment new images of these scenes from color information only. To efficiently exploit geometric constraints for our purpose, we propose to cast these constraints as semi-supervised terms, which enforce the fact that the same class should be predicted for the projections of the same 3D location in different images. We show that this approach results in a simple yet very powerful method, which can annotate sequences of ScanNet and our own sequences using only annotations from SUNRGB-D.

* 8 pages, 7 figures
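
The idea of penalising inconsistent class predictions for projections of the same 3D location can be illustrated with a small sketch. The loss below is an assumption chosen for clarity (a symmetric KL divergence between per-pixel class distributions), not the term used in the paper; the function name `consistency_loss` and the input layout are hypothetical, and the pixel correspondences are assumed to have been computed beforehand from depth and camera poses.

```python
import numpy as np

def consistency_loss(probs_a, probs_b, pix_a, pix_b, eps=1e-8):
    """Symmetric KL divergence between class predictions at corresponding pixels.

    probs_a, probs_b: (H, W, C) per-pixel class probabilities for two views.
    pix_a, pix_b:     (N, 2) integer (row, col) arrays; row k of each array is the
                      projection of the same 3D point into view A and view B.
    """
    pa = probs_a[pix_a[:, 0], pix_a[:, 1]]  # (N, C) predictions in view A
    pb = probs_b[pix_b[:, 0], pix_b[:, 1]]  # (N, C) predictions in view B
    log_ratio = np.log(pa + eps) - np.log(pb + eps)
    kl_ab = np.sum(pa * log_ratio, axis=1)
    kl_ba = np.sum(pb * -log_ratio, axis=1)
    # Zero when both views agree, positive otherwise.
    return float(np.mean(0.5 * (kl_ab + kl_ba)))
```

Because such a term needs no labels for the new sequence, it can be added to a supervised loss computed on an annotated dataset, which is the sense in which the geometric constraint acts as a semi-supervised term.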
