Yoonwoo Jeong

RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos

Dec 04, 2024

Yoonwoo Jeong, Junmyeong Lee, Hoseung Choi, Minsu Cho

Abstract:Dynamic view synthesis (DVS) has advanced remarkably in recent years, achieving high-fidelity rendering while reducing computational costs. Despite the progress, optimizing dynamic neural fields from casual videos remains challenging, as these videos do not provide direct 3D information, such as camera trajectories or the underlying scene geometry. In this work, we present RoDyGS, an optimization pipeline for dynamic Gaussian Splatting from casual videos. It effectively learns motion and underlying geometry of scenes by separating dynamic and static primitives, and ensures that the learned motion and geometry are physically plausible by incorporating motion and geometric regularization terms. We also introduce a comprehensive benchmark, Kubric-MRig, that provides extensive camera and object motion along with simultaneous multi-view captures, features that are absent in previous benchmarks. Experimental results demonstrate that the proposed method significantly outperforms previous pose-free dynamic neural fields and achieves competitive rendering quality compared to existing pose-free static neural fields. The code and data are publicly available at https://rodygs.github.io/.

* Project Page: https://rodygs.github.io/

NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image

Dec 12, 2023

Yoonwoo Jeong, Jinwoo Lee, Chiheon Kim, Minsu Cho, Doyup Lee

Abstract:Transfer learning of large-scale Text-to-Image (T2I) models has recently shown impressive potential for Novel View Synthesis (NVS) of diverse objects from a single image. While previous methods typically train large models on multi-view datasets for NVS, fine-tuning all the parameters of T2I models not only demands a high cost but also reduces the generalization capacity of T2I models in generating diverse images in a new domain. In this study, we propose an effective method, dubbed NVS-Adapter, which is a plug-and-play module for a T2I model, to synthesize novel multi-views of visual objects while fully exploiting the generalization capacity of T2I models. NVS-Adapter consists of two main components: view-consistency cross-attention learns the visual correspondences to align the local details of view features, and global semantic conditioning aligns the semantic structure of generated views with the reference view. Experimental results demonstrate that the NVS-Adapter can effectively synthesize geometrically consistent multi-views and also achieve high performance on benchmarks without full fine-tuning of T2I models. The code and data are publicly available at https://postech-cvlab.github.io/nvsadapter/.

* Project Page: https://postech-cvlab.github.io/nvsadapter/

Stable and Consistent Prediction of 3D Characteristic Orientation via Invariant Residual Learning

Jun 20, 2023

Seungwook Kim, Chunghyun Park, Yoonwoo Jeong, Jaesik Park, Minsu Cho

Abstract:Learning to predict reliable characteristic orientations of 3D point clouds is an important yet challenging problem, as different point clouds of the same class may have largely varying appearances. In this work, we introduce a novel method to decouple the shape geometry and semantics of the input point cloud to achieve both stability and consistency. The proposed method integrates shape-geometry-based SO(3)-equivariant learning and shape-semantics-based SO(3)-invariant residual learning, where a final characteristic orientation is obtained by calibrating an SO(3)-equivariant orientation hypothesis using an SO(3)-invariant residual rotation. In experiments, the proposed method not only demonstrates superior stability and consistency but also exhibits state-of-the-art performances when applied to point cloud part segmentation, given randomly rotated inputs.

* Accepted to ICML 2023

PeRFception: Perception using Radiance Fields

Aug 24, 2022

Yoonwoo Jeong, Seungjoo Shin, Junha Lee, Christopher Choy, Animashree Anandkumar, Minsu Cho, Jaesik Park

Abstract:The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner. This new representation can effectively convey the information of hundreds of high-resolution images in one compact format and allows photorealistic synthesis of novel views. In this work, using a variant of NeRF called Plenoxels, we create the first large-scale implicit representation datasets for perception tasks, called PeRFception, which consists of two parts that incorporate both object-centric and scene-centric scans for classification and segmentation. It shows a significant memory compression rate (96.4%) from the original dataset, while containing both 2D and 3D information in a unified form. We construct classification and segmentation models that directly take this implicit format as input and also propose a novel augmentation technique to avoid overfitting on backgrounds of images. The code and data are publicly available at https://postech-cvlab.github.io/PeRFception.

* Project Page: https://postech-cvlab.github.io/PeRFception/

Self-Supervised Learning of Image Scale and Orientation

Jun 15, 2022

Jongmin Lee, Yoonwoo Jeong, Minsu Cho

Abstract:We study the problem of learning to assign a characteristic pose, i.e., scale and orientation, for an image region of interest. Despite its apparent simplicity, the problem is non-trivial; it is hard to obtain a large-scale set of image regions with explicit pose annotations that a model directly learns from. To tackle the issue, we propose a self-supervised learning framework with a histogram alignment technique. It generates pairs of image patches by random rescaling/rotating and then trains an estimator to predict their scale/orientation values so that their relative difference is consistent with the rescaling/rotating used. The estimator learns to predict a non-parametric histogram distribution of scale/orientation without any supervision. Experiments show that it significantly outperforms previous methods in scale/orientation estimation and also improves image matching and 6 DoF camera pose estimation by incorporating our patch poses into a matching process.

* Presented at BMVC 2021; code is available at https://github.com/bluedream1121/self-sca-ori
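
The histogram alignment idea in the abstract above can be illustrated with a small sketch. This is not the authors' code: the bin count, the use of a circular shift for orientation, the cross-entropy loss, and the names `softmax` and `alignment_loss` are all assumptions made for illustration. The sketch shifts the histogram predicted for one patch by the known rotation (in bins) and compares it with the histogram predicted for the rotated patch.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into a histogram that sums to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def alignment_loss(logits_a, logits_b, shift_bins):
    """Cross-entropy between patch B's histogram and patch A's histogram
    shifted by the known rotation (hypothetical loss, for illustration)."""
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    # Orientation wraps around, so the shift is circular.
    target = np.roll(p_a, shift_bins)
    return float(-(target * np.log(p_b + 1e-12)).sum())

n_bins = 36                      # assumed: 10 degrees per bin
logits_a = np.zeros(n_bins)
logits_a[5] = 8.0                # patch A peaks at bin 5
shift = 9                        # patch B is patch A rotated by 9 bins

logits_good = np.roll(logits_a, shift)      # prediction consistent with the rotation
logits_bad = np.roll(logits_a, shift + 4)   # prediction off by 4 bins

loss_good = alignment_loss(logits_a, logits_good, shift)
loss_bad = alignment_loss(logits_a, logits_bad, shift)
```

In this toy setup the consistent prediction receives the lower loss. Scale does not wrap around the way orientation does, so a scale histogram would need a non-circular shift; that case is left out here.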

Fast Point Transformer

Dec 09, 2021

Chunghyun Park, Yoonwoo Jeong, Minsu Cho, Jaesik Park

Abstract:The recent success of neural networks enables a better interpretation of 3D point clouds, but processing a large-scale 3D scene remains a challenging problem. Most current approaches divide a large-scale scene into small regions and combine the local predictions together. However, this scheme inevitably involves additional stages for pre- and post-processing and may also degrade the final output due to predictions in a local perspective. This paper introduces Fast Point Transformer, which consists of a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and the voxel hashing-based architecture boosts computational efficiency. The proposed method is demonstrated with 3D semantic segmentation and 3D detection. The accuracy of our approach is competitive with the best voxel-based method, and our network runs inference 136 times faster than the state-of-the-art, Point Transformer, with a reasonable accuracy trade-off.

* 16 pages, 8 figures
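
As a rough illustration of the voxel hashing mentioned in the abstract above, the sketch below buckets continuous 3D points into a hash table keyed by integer voxel index, then keeps a continuous centroid per voxel. It is a generic sketch, not the Fast Point Transformer implementation: the function names `voxelize` and `voxel_centroids`, the voxel size, and the choice of a Python dictionary as the hash table are assumptions.

```python
import numpy as np
from collections import defaultdict

def voxelize(points, voxel_size):
    """Map each point index to the integer voxel that contains the point."""
    table = defaultdict(list)
    keys = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    # Convert to plain Python int tuples so they can serve as dictionary keys.
    for i, key in enumerate(map(tuple, keys.tolist())):
        table[key].append(i)
    return table

def voxel_centroids(points, table):
    """Average the continuous coordinates of the points in each voxel."""
    pts = np.asarray(points, dtype=float)
    return {key: pts[idx].mean(axis=0) for key, idx in table.items()}

points = np.array([
    [0.01, 0.02, 0.03],
    [0.04, 0.01, 0.02],
    [0.26, 0.02, 0.03],
    [-0.01, 0.00, 0.00],
])
table = voxelize(points, voxel_size=0.1)
centroids = voxel_centroids(points, table)
```

Looking up a voxel is a constant-time dictionary access on average, which is the usual motivation for hashing over a dense grid when most of the volume is empty.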

Self-Calibrating Neural Radiance Fields

Sep 02, 2021

Yoonwoo Jeong, Seokjun Ahn, Christopher Choy, Animashree Anandkumar, Minsu Cho, Jaesik Park

Abstract:In this work, we propose a camera self-calibration algorithm for generic cameras with arbitrary non-linear distortions. We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects. Our camera model consists of a pinhole model, a fourth order radial distortion, and a generic noise model that can learn arbitrary non-linear camera distortions. While traditional self-calibration algorithms mostly rely on geometric constraints, we additionally incorporate photometric consistency. This requires learning the geometry of the scene, and we use Neural Radiance Fields (NeRF). We also propose a new geometric loss function, viz., projected ray distance loss, to incorporate geometric consistency for complex non-linear camera models. We validate our approach on standard real image datasets and demonstrate that our model can learn the camera intrinsics and extrinsics (pose) from scratch without COLMAP initialization. Also, we show that learning accurate camera models in a differentiable manner allows us to improve PSNR over baselines. Our module is an easy-to-use plugin that can be applied to NeRF variants to improve performance. The code and data are currently available at https://github.com/POSTECH-CVLab/SCNeRF.

* Accepted to ICCV 2021, Project Page: https://postech-cvlab.github.io/SCNeRF/
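
The pinhole and fourth-order radial distortion parts of the camera model described in the abstract above can be sketched as follows. The abstract does not state the exact parameterization, so this uses the common radial form with coefficients `k1` and `k2`, which is an assumption; the function name `project` and the numeric values are also illustrative, and the learned generic noise model is omitted.

```python
import numpy as np

def project(points_cam, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project N x 3 camera-frame points to pixels using a pinhole model
    with radial distortion up to fourth order in the radius."""
    pts = np.asarray(points_cam, dtype=float)
    # Normalized image coordinates.
    x = pts[:, 0] / pts[:, 2]
    y = pts[:, 1] / pts[:, 2]
    r2 = x * x + y * y
    # Radial factor: 1 + k1 * r^2 + k2 * r^4.
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    u = fx * x * radial + cx
    v = fy * y * radial + cy
    return np.stack([u, v], axis=1)

pts = np.array([[0.0, 0.0, 2.0],    # on the optical axis
                [1.0, 0.0, 2.0]])   # off-axis point
undistorted = project(pts, 500, 500, 320, 240)
distorted = project(pts, 500, 500, 320, 240, k1=-0.1, k2=0.01)
```

The point on the optical axis is unaffected by the distortion terms, while the off-axis point moves toward the principal point when `k1` is negative. Because every step is differentiable, the same expression could be optimized jointly with a scene representation in an autodiff framework.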
