Diego Martin Arroyo

Mixed Diffusion for 3D Indoor Scene Synthesis

May 31, 2024

Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, Federico Tombari

Abstract: Realistic conditional 3D scene synthesis significantly enhances and accelerates the creation of virtual environments, which can also provide extensive training data for computer vision and robotics research, among other applications. Diffusion models have shown great performance in related applications, e.g., making precise arrangements of unordered sets. However, these models have not been fully explored in floor-conditioned scene synthesis problems. We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture, designed to synthesize plausible 3D indoor scenes from given room types, floor plans, and potentially pre-existing objects. We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation. Our approach uniquely implements structured corruption across the mixed discrete semantic and continuous geometric domains, resulting in a better-conditioned problem for the reverse denoising step. We evaluate our approach on the 3D-FRONT dataset. Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis. In addition, our models can handle partial object constraints via a corruption-and-masking strategy without task-specific training. We show that MiDiffusion maintains clear advantages over existing approaches in scene completion and furniture arrangement experiments.

* 19 pages, 14 figures. Under review. Code to be released at: https://github.com/MIT-SPARK/MiDiffusion
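
To make the idea of corrupting discrete and continuous attributes together more concrete, here is a minimal sketch of a mixed forward corruption step. It is not the authors' implementation: the cosine schedule, the `[MASK]` absorbing token, the DDPM-style Gaussian noising, and the names `alpha_bar`, `corrupt_scene`, and `fixed` are all assumptions made for illustration, since the abstract does not give the actual transition rules.

```python
import math
import random

MASK = "[MASK]"  # hypothetical absorbing token for corrupted categories


def alpha_bar(t, num_steps):
    """Signal level in [0, 1] at step t; a cosine-style schedule (assumed)."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def corrupt_scene(objects, t, num_steps, rng, fixed=()):
    """Sample a corrupted scene at step t directly from the clean scene.

    Each object is a dict with a discrete "category" and a continuous
    "geometry" list (e.g. location, size, orientation). Objects whose
    index is in `fixed` are copied unchanged, mimicking a partial
    object constraint.
    """
    a = alpha_bar(t, num_steps)
    noisy = []
    for i, obj in enumerate(objects):
        if i in fixed:
            noisy.append({"category": obj["category"],
                          "geometry": list(obj["geometry"])})
            continue
        # Discrete domain: keep the label with probability a, else mask it.
        category = obj["category"] if rng.random() < a else MASK
        # Continuous domain: scale the signal and add Gaussian noise.
        geometry = [math.sqrt(a) * g + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                    for g in obj["geometry"]]
        noisy.append({"category": category, "geometry": geometry})
    return noisy


scene = [
    {"category": "bed", "geometry": [1.0, 2.0, 0.5]},
    {"category": "lamp", "geometry": [0.2, 0.1, 0.0]},
]
rng = random.Random(0)
# Keep the bed fixed while the lamp is fully corrupted at the last step.
print(corrupt_scene(scene, t=100, num_steps=100, rng=rng, fixed={0}))
```

In this sketch, a denoiser would be trained to recover both the category and the geometry of the corrupted objects, while the fixed objects serve as conditioning, which is one plausible reading of how scene completion could reuse the same model.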

NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes

Mar 16, 2023

Marie-Julie Rakotosaona, Fabian Manhardt, Diego Martin Arroyo, Michael Niemeyer, Abhijit Kundu, Federico Tombari

Abstract: With the introduction of Neural Radiance Fields (NeRFs), novel view synthesis has recently made a big leap forward. At its core, NeRF proposes that each 3D point can emit radiance, which allows view synthesis to be carried out using differentiable volumetric rendering. While neural radiance fields can accurately represent 3D scenes for image rendering, 3D meshes are still the main scene representation supported by most computer graphics and simulation pipelines, enabling tasks such as real-time rendering and physics-based simulations. Obtaining 3D meshes from neural radiance fields remains an open challenge, since NeRFs are optimized for view synthesis and do not enforce an accurate underlying geometry on the radiance field. We thus propose a novel compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach. Once the radiance field has been trained, we distill the volumetric 3D representation into a Signed Surface Approximation Network, allowing easy extraction of the 3D mesh and appearance. Our final 3D mesh is physically accurate and can be rendered in real time on an array of devices.
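
The volumetric rendering step mentioned in the abstract can be illustrated with the standard NeRF quadrature rule, in which each sample along a ray contributes its color weighted by its opacity and by the transmittance accumulated in front of it. The sketch below covers only that generic rule, not the distillation network proposed in the paper; the function name `render_ray` and the toy inputs are assumptions for illustration.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: non-negative volume density at each sample
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    Returns the rendered RGB pixel and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    weights = []
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    pixel = [sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3)]
    return pixel, weights


# Empty space followed by a nearly opaque green surface.
pixel, weights = render_ray(
    densities=[0.0, 1e9],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    deltas=[1.0, 1.0],
)
print(pixel, weights)
```

Because the weights depend on density only through soft opacities, many different density fields can yield the same image, which is consistent with the abstract's point that view-synthesis training does not by itself enforce accurate geometry.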

ParGAN: Learning Real Parametrizable Transformations

Nov 09, 2022

Diego Martin Arroyo, Alessio Tonioni, Federico Tombari

Abstract: Current methods for image-to-image translation produce compelling results; however, the applied transformation is difficult to control, since existing mechanisms are often limited and non-intuitive. We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations with simple and intuitive controls. The proposed generator takes as input both an image and a parametrization of the transformation. We train this network to preserve the content of the input image while ensuring that the result is consistent with the given parametrization. Our approach does not require paired data and can learn transformations across several tasks and datasets. We show how, with disjoint image domains with no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.

Variational Transformer Networks for Layout Generation

Apr 06, 2021

Diego Martin Arroyo, Janis Postels, Federico Tombari

Abstract: Generative models able to synthesize layouts of different kinds (e.g. documents, user interfaces or furniture arrangements) are a useful tool to aid design processes and as a first step in the generation of synthetic data, among other tasks. We exploit the properties of self-attention layers to capture high-level relationships between elements in a layout, and use these as the building blocks of the well-known Variational Autoencoder (VAE) formulation. Our proposed Variational Transformer Network (VTN) is capable of learning margins, alignments and other global design rules without explicit supervision. Layouts sampled from our model have a high degree of resemblance to the training data, while demonstrating appealing diversity. In an extensive evaluation on publicly available benchmarks for different layout types, VTNs achieve state-of-the-art diversity and perceptual quality. Additionally, we show the capabilities of this method as part of a document layout detection pipeline.

* To be published in CVPR 2021
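
For readers unfamiliar with the VAE formulation the abstract builds on, the sketch below shows its two generic ingredients: the reparameterized latent sample and the KL divergence to a standard normal prior. It is a textbook illustration, not the VTN code; in the paper the encoder and decoder are built from self-attention layers, which are omitted here, and the function names are assumptions.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [0.3, -0.1], [0.0, -1.0]  # toy encoder outputs
z = reparameterize(mu, log_var, rng)    # latent code passed to the decoder
print(z, kl_to_standard_normal(mu, log_var))
```

At sampling time, new layouts would be generated by drawing the latent code from the prior and decoding it; the KL term is what keeps the encoder's codes close enough to that prior for this to work.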

Explaining the Ambiguity of Object Detection and 6D Pose from Visual Data

Dec 01, 2018

Fabian Manhardt, Diego Martin Arroyo, Christian Rupprecht, Benjamin Busam, Nassir Navab, Federico Tombari

Abstract: 3D object detection and pose estimation from a single image are two inherently ambiguous problems. Oftentimes, objects appear similar from different viewpoints due to shape symmetries, occlusion and repetitive textures. This ambiguity in both detection and pose estimation means that an object instance can be perfectly described by several different poses and even classes. In this work, we propose to deal with this uncertainty explicitly. For each object instance, we predict multiple pose and class outcomes to estimate the specific pose distribution generated by symmetries and repetitive textures. The distribution collapses to a single outcome when the visual appearance uniquely identifies just one valid pose. We show the benefits of our approach, which provides not only a better explanation for pose ambiguity but also higher accuracy in pose estimation.
