Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Christopher Metzler

Parametric Shadow Control for Portrait Generationin Text-to-Image Diffusion Models

Mar 27, 2025

Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher Metzler

Abstract:Text-to-image diffusion models excel at generating diverse portraits, but lack intuitive shadow control. Existing editing approaches, as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, we introduce Shadow Director, a method that extracts and manipulates hidden shadow attributes within well-trained diffusion models. Our approach uses a small estimation network that requires only a few thousand synthetic images and hours of training-no costly real-world light-stage data needed. Shadow Director enables parametric and intuitive control over shadow shape, placement, and intensity during portrait generation while preserving artistic integrity and identity across diverse styles. Despite training only on synthetic data built on real-world identities, it generalizes effectively to generated portraits with diverse styles, making it a more accessible and resource-friendly solution.

* ShadowDirector Arxiv Version

Via

Access Paper or Ask Questions

CodedVO: Coded Visual Odometry

Jul 25, 2024

Sachin Shah, Naitri Rajyaguru, Chahat Deep Singh, Christopher Metzler, Yiannis Aloimonos

Figure 1 for CodedVO: Coded Visual Odometry

Figure 2 for CodedVO: Coded Visual Odometry

Figure 3 for CodedVO: Coded Visual Odometry

Figure 4 for CodedVO: Coded Visual Odometry

Abstract:Autonomous robots often rely on monocular cameras for odometry estimation and navigation. However, the scale ambiguity problem presents a critical barrier to effective monocular visual odometry. In this paper, we present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem by employing custom optics to physically encode metric depth information into imagery. By incorporating this information into our odometry pipeline, we achieve state-of-the-art performance in monocular visual odometry with a known scale. We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability. We achieve a 0.08m average trajectory error in odometry evaluation on the ICL-NUIM indoor odometry dataset.

* IEEE ROBOTICS AND AUTOMATION LETTERS, 2024
* 7 pages, 4 figures, IEEE ROBOTICS AND AUTOMATION LETTERS

Via

Access Paper or Ask Questions

CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

Jun 13, 2024

Sachin Shah, Matthew Albert Chan, Haoming Cai, Jingxi Chen, Sakshum Kulshrestha, Chahat Deep Singh, Yiannis Aloimonos, Christopher Metzler

Abstract:Point-spread-function (PSF) engineering is a well-established computational imaging technique that uses phase masks and other optical elements to embed extra information (e.g., depth) into the images captured by conventional CMOS image sensors. To date, however, PSF-engineering has not been applied to neuromorphic event cameras; a powerful new image sensing technology that responds to changes in the log-intensity of light. This paper establishes theoretical limits (Cram\'er Rao bounds) on 3D point localization and tracking with PSF-engineered event cameras. Using these bounds, we first demonstrate that existing Fisher phase masks are already near-optimal for localizing static flashing point sources (e.g., blinking fluorescent molecules). We then demonstrate that existing designs are sub-optimal for tracking moving point sources and proceed to use our theory to design optimal phase masks and binary amplitude masks for this task. To overcome the non-convexity of the design problem, we leverage novel implicit neural representation based parameterizations of the phase and amplitude masks. We demonstrate the efficacy of our designs through extensive simulations. We also validate our method with a simple prototype.

Via

Access Paper or Ask Questions

Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

Apr 06, 2024

Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla

Figure 1 for Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

Figure 2 for Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

Figure 3 for Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

Figure 4 for Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

Abstract:Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view ($360^{\circ}$ viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).

Via

Access Paper or Ask Questions

TimeRewind: Rewinding Time with Image-and-Events Video Diffusion

Mar 20, 2024

Jingxi Chen, Brandon Y. Feng, Haoming Cai, Mingyang Xie, Christopher Metzler, Cornelia Fermuller, Yiannis Aloimonos

Figure 1 for TimeRewind: Rewinding Time with Image-and-Events Video Diffusion

Figure 2 for TimeRewind: Rewinding Time with Image-and-Events Video Diffusion

Figure 3 for TimeRewind: Rewinding Time with Image-and-Events Video Diffusion

Figure 4 for TimeRewind: Rewinding Time with Image-and-Events Video Diffusion

Abstract:This paper addresses the novel challenge of ``rewinding'' time from a single captured image to recover the fleeting moments missed just before the shutter button is pressed. This problem poses a significant challenge in computer vision and computational photography, as it requires predicting plausible pre-capture motion from a single static frame, an inherently ill-posed task due to the high degree of freedom in potential pixel movements. We overcome this challenge by leveraging the emerging technology of neuromorphic event cameras, which capture motion information with high temporal resolution, and integrating this data with advanced image-to-video diffusion models. Our proposed framework introduces an event motion adaptor conditioned on event camera data, guiding the diffusion model to generate videos that are visually coherent and physically grounded in the captured events. Through extensive experimentation, we demonstrate the capability of our approach to synthesize high-quality videos that effectively ``rewind'' time, showcasing the potential of combining event camera technology with generative models. Our work opens new avenues for research at the intersection of computer vision, computational photography, and generative modeling, offering a forward-thinking solution to capturing missed moments and enhancing future consumer cameras and smartphones. Please see the project page at https://timerewind.github.io/ for video results and code release.

Via

Access Paper or Ask Questions

ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Dec 07, 2023

Haoming Cai, Jingxi Chen, Brandon Y. Feng, Weiyun Jiang, Mingyang Xie, Kevin Zhang, Ashok Veeraraghavan, Christopher Metzler

Figure 1 for ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Figure 2 for ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Figure 3 for ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Figure 4 for ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Abstract:tmospheric turbulence presents a significant challenge in long-range imaging. Current restoration algorithms often struggle with temporal inconsistency, as well as limited generalization ability across varying turbulence levels and scene content different than the training data. To tackle these issues, we introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT) a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration. A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision, which steers the restoration towards sharp, photorealistic images in the CLIP latent space. We further develop a principled selection strategy of text prompts, based on their statistical correlation with a perceptual metric. ConVRT's test-time optimization allows it to adapt to a wide range of real-world turbulence conditions, effectively leveraging the insights gained from pre-trained models on simulated data. ConVRT offers a comprehensive and effective solution for mitigating real-world turbulence in dynamic videos.

* https://convrt-2024.github.io/

Via

Access Paper or Ask Questions

A Scalable Training Strategy for Blind Multi-Distribution Noise Removal

Oct 30, 2023

Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler

Abstract:Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades-off specialization at one task (e.g.,~removing Poisson noise) for performance at another (e.g.,~removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e.,~the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.

Via

Access Paper or Ask Questions

Snapshot High Dynamic Range Imaging with a Polarization Camera

Aug 16, 2023

Mingyang Xie, Matthew Chan, Christopher Metzler

Abstract:High dynamic range (HDR) images are important for a range of tasks, from navigation to consumer photography. Accordingly, a host of specialized HDR sensors have been developed, the most successful of which are based on capturing variable per-pixel exposures. In essence, these methods capture an entire exposure bracket sequence at once in a single shot. This paper presents a straightforward but highly effective approach for turning an off-the-shelf polarization camera into a high-performance HDR camera. By placing a linear polarizer in front of the polarization camera, we are able to simultaneously capture four images with varied exposures, which are determined by the orientation of the polarizer. We develop an outlier-robust and self-calibrating algorithm to reconstruct an HDR image (at a single polarity) from these measurements. Finally, we demonstrate the efficacy of our approach with extensive real-world experiments.

* 9 pages, 10 figures

Via

Access Paper or Ask Questions

Seeing the World through Your Eyes

Jun 15, 2023

Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang

Abstract:The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like. By imaging the eyes of a moving person, we can collect multiple views of a scene outside the camera's direct line of sight through the reflections in the eyes. In this paper, we reconstruct a 3D scene beyond the camera's line of sight using portrait images containing eye reflections. This task is challenging due to 1) the difficulty of accurately estimating eye poses and 2) the entangled appearance of the eye iris and the scene reflections. Our method jointly refines the cornea poses, the radiance field depicting the scene, and the observer's eye iris texture. We further propose a simple regularization prior on the iris texture pattern to improve reconstruction quality. Through various experiments on synthetic and real-world captures featuring people with varied eye colors, we demonstrate the feasibility of our approach to recover 3D scenes using eye reflections.

* Project page: https://world-from-eyes.github.io/

Via

Access Paper or Ask Questions