Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ashkan Mirzaei

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Dec 17, 2024

Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, Zan Gojcic

Figure 1 for 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Figure 2 for 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Figure 3 for 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Figure 4 for 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Abstract:3D Gaussian Splatting (3DGS) has shown great potential for efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware. However, due to its rasterization-based formulation, 3DGS is constrained to ideal pinhole cameras and lacks support for secondary lighting effects. Recent methods address these limitations by tracing volumetric particles instead, however, this comes at the cost of significantly slower rendering speeds. In this work, we propose 3D Gaussian Unscented Transform (3DGUT), replacing the EWA splatting formulation in 3DGS with the Unscented Transform that approximates the particles through sigma points, which can be projected exactly under any nonlinear projection function. This modification enables trivial support of distorted cameras with time dependent effects such as rolling shutter, while retaining the efficiency of rasterization. Additionally, we align our rendering formulation with that of tracing-based methods, enabling secondary ray tracing required to represent phenomena such as reflections and refraction within the same 3D representation.

Via

Access Paper or Ask Questions

EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering

Dec 10, 2024

Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski

Abstract:We introduce a method for using event camera data in novel view synthesis via Gaussian Splatting. Event cameras offer exceptional temporal resolution and a high dynamic range. Leveraging these capabilities allows us to effectively address the novel view synthesis challenge in the presence of fast camera motion. For initialization of the optimization process, our approach uses prior knowledge encoded in an event-to-video model. We also use spline interpolation for obtaining high quality poses along the event camera trajectory. This enhances the reconstruction quality from fast-moving cameras while overcoming the computational limitations traditionally associated with event-based Neural Radiance Field (NeRF) methods. Our experimental evaluation demonstrates that our results achieve higher visual fidelity and better performance than existing event-based NeRF approaches while being an order of magnitude faster to render.

Via

Access Paper or Ask Questions

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Dec 04, 2024

Hanxue Liang, Jiawei Ren, Ashkan Mirzaei, Antonio Torralba, Ziwei Liu, Igor Gilitschenski, Sanja Fidler, Cengiz Oztireli, Huan Ling, Zan Gojcic(+1 more)

Figure 1 for Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Figure 2 for Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Figure 3 for Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Figure 4 for Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Abstract:Recent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis. However, these models often struggle with generalizability across diverse environments and fail to effectively handle dynamic content. We present BTimer (short for BulletTimer), the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames. Such a formulation allows BTimer to gain scalability and generalization by leveraging both static and dynamic scene datasets. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets, even compared with optimization-based approaches.

* Project website: https://research.nvidia.com/labs/toronto-ai/bullet-timer/

Via

Access Paper or Ask Questions

GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Nov 12, 2024

Umangi Jain, Ashkan Mirzaei, Igor Gilitschenski

Figure 1 for GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Figure 2 for GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Figure 3 for GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Figure 4 for GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting

Abstract:We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows for selecting the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest which are considered to be a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function to effectively partition the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and further refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes. GaussianCut achieves competitive performance with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.

Via

Access Paper or Ask Questions

3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

Jul 10, 2024

Nicolas Moenne-Loccoz, Ashkan Mirzaei, Or Perel, Riccardo de Lutio, Janick Martinez Esturo, Gavriel State, Sanja Fidler, Nicholas Sharp, Zan Gojcic

Figure 1 for 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

Figure 2 for 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

Figure 3 for 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

Figure 4 for 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes

Abstract:Particle-based representations of radiance fields such as 3D Gaussian Splatting have found great success for reconstructing and re-rendering of complex scenes. Most existing methods render particles via rasterization, projecting them to screen space tiles for processing in a sorted order. This work instead considers ray tracing the particles, building a bounding volume hierarchy and casting a ray for each pixel using high-performance GPU ray tracing hardware. To efficiently handle large numbers of semi-transparent particles, we describe a specialized rendering algorithm which encapsulates particles with bounding meshes to leverage fast ray-triangle intersections, and shades batches of intersections in depth-order. The benefits of ray tracing are well-known in computer graphics: processing incoherent rays for secondary lighting effects such as shadows and reflections, rendering from highly-distorted cameras common in robotics, stochastically sampling rays, and more. With our renderer, this flexibility comes at little cost compared to rasterization. Experiments demonstrate the speed and accuracy of our approach, as well as several applications in computer graphics and vision. We further propose related improvements to the basic Gaussian representation, including a simple use of generalized kernel functions which significantly reduces particle hit counts.

* Project page: https://gaussiantracer.github.io/

Via

Access Paper or Ask Questions

L4GM: Large 4D Gaussian Reconstruction Model

Jun 14, 2024

Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim(+1 more)

Figure 1 for L4GM: Large 4D Gaussian Reconstruction Model

Figure 2 for L4GM: Large 4D Gaussian Reconstruction Model

Figure 3 for L4GM: Large 4D Gaussian Reconstruction Model

Figure 4 for L4GM: Large 4D Gaussian Reconstruction Model

Abstract:We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We showcase that L4GM that is only trained on synthetic data generalizes extremely well on in-the-wild videos, producing high quality animated 3D assets.

* Project page: https://research.nvidia.com/labs/toronto-ai/l4gm

Via

Access Paper or Ask Questions

RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

Apr 16, 2024

Ashkan Mirzaei, Riccardo De Lutio, Seung Wook Kim, David Acuna, Jonathan Kelly, Sanja Fidler, Igor Gilitschenski, Zan Gojcic

Abstract:Neural reconstruction approaches are rapidly emerging as the preferred representation for 3D scenes, but their limited editability is still posing a challenge. In this work, we propose an approach for 3D scene inpainting -- the task of coherently replacing parts of the reconstructed scene with desired content. Scene inpainting is an inherently ill-posed task as there exist many solutions that plausibly replace the missing content. A good inpainting method should therefore not only enable high-quality synthesis but also a high degree of control. Based on this observation, we focus on enabling explicit control over the inpainted content and leverage a reference image as an efficient means to achieve this goal. Specifically, we introduce RefFusion, a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view. The personalization effectively adapts the prior distribution to the target scene, resulting in a lower variance of score distillation objective and hence significantly sharper details. Our framework achieves state-of-the-art results for object removal while maintaining high controllability. We further demonstrate the generality of our formulation on other downstream tasks such as object insertion, scene outpainting, and sparse view reconstruction.

* Project page: https://reffusion.github.io

Via

Access Paper or Ask Questions

Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

Oct 27, 2023

Tristan Aumentado-Armstrong, Ashkan Mirzaei, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

Figure 1 for Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

Figure 2 for Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

Figure 3 for Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

Figure 4 for Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

Abstract:Neural Radiance Fields (NeRFs) have proven to be powerful 3D representations, capable of high quality novel view synthesis of complex scenes. While NeRFs have been applied to graphics, vision, and robotics, problems with slow rendering speed and characteristic visual artifacts prevent adoption in many use cases. In this work, we investigate combining an autoencoder (AE) with a NeRF, in which latent features (instead of colours) are rendered and then convolutionally decoded. The resulting latent-space NeRF can produce novel views with higher quality than standard colour-space NeRFs, as the AE can correct certain visual artifacts, while rendering over three times faster. Our work is orthogonal to other techniques for improving NeRF efficiency. Further, we can control the tradeoff between efficiency and image quality by shrinking the AE architecture, achieving over 13 times faster rendering with only a small drop in performance. We hope that our approach can form the basis of an efficient, yet high-fidelity, 3D scene representation for downstream tasks, especially when retaining differentiability is useful, as in many robotics scenarios requiring continual learning.

Via

Access Paper or Ask Questions

Watch Your Steps: Local Image and Scene Editing by Text Instructions

Aug 17, 2023

Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

Abstract:Denoising diffusion models have enabled high-quality image generation and editing. We present a method to localize the desired edit region implicit in a text instruction. We leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. This discrepancy is referred to as the relevance map. The relevance map conveys the importance of changing each pixel to achieve the edits, and is used to to guide the modifications. This guidance ensures that the irrelevant pixels remain unchanged. Relevance maps are further used to enhance the quality of text-guided editing of 3D scenes in the form of neural radiance fields. A field is trained on relevance maps of training views, denoted as the relevance field, defining the 3D region within which modifications should be made. We perform iterative updates on the training views guided by rendered relevance maps from the relevance field. Our method achieves state-of-the-art performance on both image and NeRF editing tasks. Project page: https://ashmrz.github.io/WatchYourSteps/

* Project page: https://ashmrz.github.io/WatchYourSteps/

Via

Access Paper or Ask Questions

Reference-guided Controllable Inpainting of Neural Radiance Fields

Apr 20, 2023

Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

Figure 1 for Reference-guided Controllable Inpainting of Neural Radiance Fields

Figure 2 for Reference-guided Controllable Inpainting of Neural Radiance Fields

Figure 3 for Reference-guided Controllable Inpainting of Neural Radiance Fields

Figure 4 for Reference-guided Controllable Inpainting of Neural Radiance Fields

Abstract:The popularity of Neural Radiance Fields (NeRFs) for view synthesis has led to a desire for NeRF editing tools. Here, we focus on inpainting regions in a view-consistent and controllable manner. In addition to the typical NeRF inputs and masks delineating the unwanted region in each view, we require only a single inpainted view of the scene, i.e., a reference view. We use monocular depth estimators to back-project the inpainted view to the correct 3D positions. Then, via a novel rendering technique, a bilateral solver can construct view-dependent effects in non-reference views, making the inpainted region appear consistent from any view. For non-reference disoccluded regions, which cannot be supervised by the single reference view, we devise a method based on image inpainters to guide both the geometry and appearance. Our approach shows superior performance to NeRF inpainting baselines, with the additional advantage that a user can control the generated scene via a single inpainted image. Project page: https://ashmrz.github.io/reference-guided-3d

* Project Page: https://ashmrz.github.io/reference-guided-3d

Via

Access Paper or Ask Questions