Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kim Jun-Seong

Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

Feb 23, 2025

Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh

Abstract:We introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key of our method is a language feature registration technique where CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel-ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our approach significantly outperforms existing approaches in 3D perception benchmarks, such as open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks. For video results, please visit : https://drsplat.github.io/

* 20 pages

Via

Access Paper or Ask Questions

SoundBrush: Sound as a Brush for Visual Scene Editing

Dec 31, 2024

Kim Sung-Bin, Kim Jun-Seong, Junseok Ko, Yewon Kim, Tae-Hyun Oh

Figure 1 for SoundBrush: Sound as a Brush for Visual Scene Editing

Figure 2 for SoundBrush: Sound as a Brush for Visual Scene Editing

Figure 3 for SoundBrush: Sound as a Brush for Visual Scene Editing

Figure 4 for SoundBrush: Sound as a Brush for Visual Scene Editing

Abstract:We propose SoundBrush, a model that uses sound as a brush to edit and manipulate visual scenes. We extend the generative capabilities of the Latent Diffusion Model (LDM) to incorporate audio information for editing visual scenes. Inspired by existing image-editing works, we frame this task as a supervised learning problem and leverage various off-the-shelf models to construct a sound-paired visual scene dataset for training. This richly generated dataset enables SoundBrush to learn to map audio features into the textual space of the LDM, allowing for visual scene editing guided by diverse in-the-wild sound. Unlike existing methods, SoundBrush can accurately manipulate the overall scenery or even insert sounding objects to best match the audio inputs while preserving the original content. Furthermore, by integrating with novel view synthesis techniques, our framework can be extended to edit 3D scenes, facilitating sound-driven 3D scene manipulation. Demos are available at https://soundbrush.github.io/.

* AAAI 2025

Via

Access Paper or Ask Questions

Revisiting Learning-based Video Motion Magnification for Real-time Processing

Mar 04, 2024

Hyunwoo Ha, Oh Hyun-Bin, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh

Figure 1 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Figure 2 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Figure 3 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Figure 4 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Abstract:Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specified network design of the prior art, i.e. inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are 1) Reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with4.2X fewer FLOPs and is 2.7X faster than the prior art while maintaining comparable quality.

* 19 pages

Via

Access Paper or Ask Questions

Learning-based Axial Motion Magnification

Dec 15, 2023

Kwon Byung-Ki, Oh Hyun-Bin, Kim Jun-Seong, Tae-Hyun Oh

Abstract:Video motion magnification amplifies invisible small motions to be perceptible, which provides humans with spatially dense and holistic understanding about small motions from the scene of interest. This is based on the premise that magnifying small motions enhances the legibility of the motion. In the real world, however, vibrating objects often possess complex systems, having complex natural frequencies, modes, and directions. Existing motion magnification often fails to improve the legibility since the intricate motions still retain complex characteristics even when magnified, which distracts us from analyzing them. In this work, we focus on improving the legibility by proposing a new concept, axial motion magnification, which magnifies decomposed motions along the user-specified direction. Axial motion magnification can be applied to various applications where motions of specific axes are critical, by providing simplified and easily readable motion information. We propose a novel learning-based axial motion magnification method with the Motion Separation Module that enables to disentangle and magnify the motion representation along axes of interest. Further, we build a new synthetic training dataset for the axial motion magnification task. Our proposed method improves the legibility of resulting motions along certain axes, while adding additional user controllability. Our method can be directly adopted to the generic motion magnification and achieves favorable performance against competing methods. Our project page is available at https://axial-momag.github.io/axial-momag/.

* main paper: 10 pages, supplementary: 4 pages, 17 figures, 1 table

Via

Access Paper or Ask Questions

HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

Aug 14, 2022

Kim Jun-Seong, Kim Yu-Ji, Moon Ye-Bin, Tae-Hyun Oh

Figure 1 for HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

Figure 2 for HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

Figure 3 for HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

Figure 4 for HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields

Abstract:We propose high dynamic range radiance (HDR) fields, HDR-Plenoxels, that learn a plenoptic function of 3D HDR radiance fields, geometry information, and varying camera settings inherent in 2D low dynamic range (LDR) images. Our voxel-based volume rendering pipeline reconstructs HDR radiance fields with only multi-view LDR images taken from varying camera settings in an end-to-end manner and has a fast convergence speed. To deal with various cameras in real-world scenarios, we introduce a tone mapping module that models the digital in-camera imaging pipeline (ISP) and disentangles radiometric settings. Our tone mapping module allows us to render by controlling the radiometric settings of each novel view. Finally, we build a multi-view dataset with varying camera conditions, which fits our problem setting. Our experiments show that HDR-Plenoxels can express detail and high-quality HDR novel views from only LDR images with various cameras.

* Accepted at ECCV 2022

Via

Access Paper or Ask Questions