Subin Jeon

Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

Mar 11, 2025

In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim

Abstract: Constructing a compressed latent space through a variational autoencoder (VAE) is key to efficient 3D diffusion models. This paper introduces COD-VAE, a VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing quality. COD-VAE introduces a two-stage autoencoder scheme to improve compression and decoding efficiency. First, our encoder block progressively compresses point clouds into compact latent vectors via intermediate point patches. Second, our triplane-based decoder reconstructs dense triplanes from latent vectors instead of directly decoding neural fields, significantly reducing the computational overhead of neural field decoding. Finally, we propose uncertainty-guided token pruning, which allocates resources adaptively by skipping computations in simpler regions and improves decoder efficiency. Experimental results demonstrate that COD-VAE achieves 16x compression compared to the baseline while maintaining quality. This enables a 20.8x speedup in generation, highlighting that a large number of latent vectors is not a prerequisite for high-quality reconstruction and generation.
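The uncertainty-guided token pruning mentioned in the abstract can be illustrated with a minimal sketch. This is not the COD-VAE implementation: the function name `prune_and_process`, the `keep_ratio` parameter, and the use of scalar tokens are all assumptions made for illustration. The idea shown is only that an expensive step runs on the most uncertain tokens while the rest pass through unchanged.

```python
import math
from typing import Callable, List, Sequence


def prune_and_process(
    tokens: Sequence[float],
    uncertainty: Sequence[float],
    keep_ratio: float,
    heavy_fn: Callable[[float], float],
) -> List[float]:
    """Run heavy_fn only on the most uncertain tokens (hypothetical helper).

    Tokens are scalars here for brevity; in a real decoder they would be
    feature vectors and heavy_fn would be a transformer block or similar.
    """
    if len(tokens) != len(uncertainty):
        raise ValueError("tokens and uncertainty must have the same length")
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in [0, 1]")

    # Number of tokens that receive the expensive computation.
    n_keep = math.ceil(len(tokens) * keep_ratio)

    # Rank token indices from most to least uncertain.
    order = sorted(range(len(tokens)), key=lambda i: uncertainty[i], reverse=True)
    kept = set(order[:n_keep])

    # Tokens outside the kept set skip the computation entirely.
    return [heavy_fn(t) if i in kept else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    out = prune_and_process(
        tokens=[1.0, 2.0, 3.0, 4.0],
        uncertainty=[0.1, 0.9, 0.2, 0.8],
        keep_ratio=0.5,
        heavy_fn=lambda x: x * 10,
    )
    print(out)
```

With `keep_ratio=0.5`, only the two tokens with the highest uncertainty scores are passed to `heavy_fn`, so the cost of the expensive step scales with the kept fraction rather than with the full token count.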

Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos

Aug 01, 2024

Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim

Abstract: We propose a new framework for creating and easily manipulating 3D models of arbitrary objects using casually captured videos. Our core ingredient is a novel hierarchical deformation model, which captures the motions of objects with tree-structured bones. Our hierarchy system decomposes motions based on their granularity and reveals the correlations between parts without exploiting any prior structural knowledge. We further propose to regularize the bones so that they are positioned at the basis of motions and the centers of parts, sufficiently covering the related surfaces of each part. This is achieved by our bone occupancy function, which identifies whether a given 3D point lies within the bone. Coupling the proposed components, our framework offers several clear advantages: (1) users can obtain animatable 3D models of arbitrary objects with improved quality from their casual videos, (2) users can manipulate 3D models in an intuitive manner at minimal cost, and (3) users can interactively add or delete control points as necessary. The experimental results demonstrate the efficacy of our framework on diverse instances in terms of reconstruction quality, interpretability, and ease of manipulation. Our code is available at https://github.com/subin6/HSNB.

* ECCV 2024 accepted
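As a rough illustration of the two ideas named in the abstract above, tree-structured bones and a bone occupancy function, the sketch below models each bone as an axis-aligned ellipsoid attached to a parent by a translation. Everything here is an assumption for illustration: the `Bone` class, the ellipsoid shape, the translation-only hierarchy, and the hard inside/outside test are not taken from the paper or from the HSNB repository.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Bone:
    """Hypothetical bone: an ellipsoid placed relative to its parent."""
    name: str
    offset: Vec3                      # translation relative to the parent bone
    radii: Vec3                       # ellipsoid semi-axes (assumed shape)
    parent: Optional["Bone"] = None   # None marks the root of the tree


def world_center(bone: Bone) -> Vec3:
    """Accumulate offsets from the bone up to the root of the tree."""
    x, y, z = bone.offset
    node = bone.parent
    while node is not None:
        x += node.offset[0]
        y += node.offset[1]
        z += node.offset[2]
        node = node.parent
    return (x, y, z)


def occupancy(bone: Bone, point: Vec3) -> bool:
    """Return True if the point lies inside the bone's ellipsoid."""
    center = world_center(bone)
    # Normalized squared distance: values <= 1 are inside the ellipsoid.
    d = sum(((p - c) / r) ** 2 for p, c, r in zip(point, center, bone.radii))
    return d <= 1.0


if __name__ == "__main__":
    root = Bone("root", (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    arm = Bone("arm", (2.0, 0.0, 0.0), (0.5, 0.5, 0.5), parent=root)
    print(occupancy(arm, (2.2, 0.0, 0.0)))  # point near the arm's center
    root.offset = (1.0, 0.0, 0.0)           # moving the root moves the arm too
    print(world_center(arm))
```

Because each child stores only an offset relative to its parent, editing a coarse bone near the root carries all of its descendants along, which is the kind of granularity-based control a hierarchy makes possible.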

Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling

Jul 17, 2020

Subin Jeon, Seonghyeon Nam, Seoung Wug Oh, Seon Joo Kim

Abstract: We propose attention-based networks for transferring motions between arbitrary objects. Given one or more source images and a driving video, our networks animate the subject in the source images according to the motion in the driving video. In our attention mechanism, dense similarities between the learned keypoints in the source and the driving images are computed in order to retrieve the appearance information from the source images. Taking a different approach from the well-studied warping-based models, our attention-based model has several advantages. By reassembling non-locally searched pieces from the source contents, our approach can produce more realistic outputs. Furthermore, our system can make use of multiple observations of the source appearance (e.g. front and sides of faces) to make the results more accurate. To reduce the training-testing discrepancy of the self-supervised learning, a novel cross-identity training scheme is additionally introduced. With this training scheme, our networks are trained to transfer motions between different subjects, as in the real testing scenario. Experimental results validate that our method produces visually pleasing results in various object domains, showing better performance compared to previous works.

* ECCV 2020
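The attention step described in the abstract above, where similarities between driving and source keypoint features are used to retrieve source appearance, can be sketched with plain dot-product attention. This is a generic illustration rather than the paper's network: the function names, the dot-product similarity, and the tiny two-dimensional features are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> List[float]:
    """Retrieve a blend of source values weighted by query-key similarity.

    query:  pose feature at one driving-frame location (hypothetical)
    keys:   pose features at source-image locations
    values: appearance features at the same source locations
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    keys = [[10.0, 0.0], [0.0, 10.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    # A query that matches the first key retrieves mostly the first value.
    print(attend([10.0, 0.0], keys, values))
```

Using several source images in this sketch only means concatenating their keys and values into longer lists, so the query can pull appearance from whichever view matches best.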
