Seungwook Kim

Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling

May 26, 2025

Junhong Lee, Seungwook Kim, Minsu Cho

Abstract: Recent studies show that simple training-free techniques, e.g., Classifier-Free Guidance (CFG) or FreeU, can dramatically improve the quality of text-to-2D generation outputs. However, these training-free techniques have been underexplored through the lens of Score Distillation Sampling (SDS), which is a popular and effective technique to leverage the power of pretrained text-to-2D diffusion models for various tasks. In this paper, we aim to shed light on the effect such training-free techniques have on SDS, via a particular application of text-to-3D generation via 2D lifting. We present our findings, which show that varying the scales of CFG presents a trade-off between object size and surface smoothness, while varying the scales of FreeU presents a trade-off between texture details and geometric errors. Based on these findings, we provide insights into how we can effectively harness training-free techniques for SDS, via a strategic scaling of such techniques in a dynamic manner with respect to the timestep or optimization iteration step. We show that using our proposed scheme strikes a favorable balance between texture details and surface smoothness in text-to-3D generations, while preserving the size of the output and mitigating the occurrence of geometric defects.
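As a rough illustration of scaling a training-free technique dynamically with respect to the optimization step, the sketch below applies the standard CFG combination with a guidance scale that changes linearly over iterations. The function names, the linear schedule, and the endpoint values are assumptions made for illustration only; the abstract does not state which schedule the authors use.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def linear_scale(step: int, total_steps: int, start: float, end: float) -> float:
    """Hypothetical schedule: interpolate the guidance scale linearly
    from `start` (first step) to `end` (last step)."""
    if total_steps <= 1:
        return end
    t = min(max(step / (total_steps - 1), 0.0), 1.0)  # progress clamped to [0, 1]
    return start + t * (end - start)


# Illustrative usage with toy noise predictions (values are made up).
eps_u, eps_c = [0.0, 1.0], [1.0, 3.0]
for step in (0, 5, 10):
    w = linear_scale(step, total_steps=11, start=100.0, end=50.0)
    guided = cfg_combine(eps_u, eps_c, w)
```

The same pattern could be applied to other scalar knobs (for example FreeU scaling factors), again as an assumption rather than a description of the paper's scheme.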

3D Geometric Shape Assembly via Efficient Point Cloud Matching

Jul 15, 2024

Nahyuk Lee, Juhong Min, Junha Lee, Seungwook Kim, Kanghee Lee, Jaesik Park, Minsu Cho

Abstract: Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes at both coarse and fine levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts while incurring low costs in memory and computation. Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task. We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad and demonstrate its superior performance and efficiency compared to state-of-the-art methods. Project page: https://nahyuklee.github.io/pmtr.

* Accepted to ICML 2024

Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

Apr 26, 2024

Seungwook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang

Abstract: Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, for images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt. Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation according to various quantitative evaluation metrics and qualitative assessments. This advancement is achieved without the necessity of fine-tuning the pre-trained ImageDream multi-view diffusion model.

* 5 pages including references, 2 figures, 2 tables

Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Apr 16, 2024

Seungwook Kim, Kejie Li, Xueqing Deng, Yichun Shi, Minsu Cho, Peng Wang

Abstract: Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue; although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavities. In this work, we propose CorrespondentDream, an effective method to leverage annotation-free, cross-view correspondences yielded from the diffusion U-Net to provide an additional 3D prior to the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design, we are able to produce NeRF models with geometries that are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through various comparative qualitative results and a solid user study.

* 25 pages, 22 figures, accepted to CVPR 2024

Efficient Semantic Matching with Hypercolumn Correlation

Nov 07, 2023

Seungwook Kim, Juhong Min, Minsu Cho

Abstract: Recent studies show that leveraging the match-wise relationships within the 4D correlation map yields significant improvements in establishing semantic correspondences, but at the cost of increased computation and latency. In this work, we focus on the aspect that the performance improvements of recent methods can also largely be attributed to the usage of multi-scale correlation maps, which hold various information ranging from low-level geometric cues to high-level semantic contexts. To this end, we propose HCCNet, an efficient yet effective semantic matching method which exploits the full potential of multi-scale correlation maps, while eschewing the reliance on expensive match-wise relationship mining on the 4D correlation map. Specifically, HCCNet performs feature slicing on the bottleneck features to yield a richer set of intermediate features, which are used to construct a hypercolumn correlation. HCCNet can consequently establish semantic correspondences in an effective manner by reducing the volume of conventional high-dimensional convolution or self-attention operations to efficient point-wise convolutions. HCCNet demonstrates state-of-the-art or competitive performances on the standard benchmarks of semantic matching, while incurring a notably lower latency and computation overhead compared to the existing SoTA methods.

* Accepted to WACV 2024. 17 pages including references and supplementary

Stable and Consistent Prediction of 3D Characteristic Orientation via Invariant Residual Learning

Jun 20, 2023

Seungwook Kim, Chunghyun Park, Yoonwoo Jeong, Jaesik Park, Minsu Cho

Abstract: Learning to predict reliable characteristic orientations of 3D point clouds is an important yet challenging problem, as different point clouds of the same class may have largely varying appearances. In this work, we introduce a novel method to decouple the shape geometry and semantics of the input point cloud to achieve both stability and consistency. The proposed method integrates shape-geometry-based SO(3)-equivariant learning and shape-semantics-based SO(3)-invariant residual learning, where a final characteristic orientation is obtained by calibrating an SO(3)-equivariant orientation hypothesis using an SO(3)-invariant residual rotation. In experiments, the proposed method not only demonstrates superior stability and consistency but also exhibits state-of-the-art performances when applied to point cloud part segmentation, given randomly rotated inputs.
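One way to read the calibration step is as a composition of two rotations. The notation below is an assumption introduced for illustration (the abstract gives no symbols or composition order): $R_{\mathrm{eq}}(X)$ is the equivariant hypothesis, $R_{\mathrm{res}}(X)$ the invariant residual, and $Q \in SO(3)$ an arbitrary rotation of the input point cloud $X$.

```latex
% Assumed properties of the two predicted rotations under an input rotation Q:
%   equivariant hypothesis:  R_eq(QX)  = Q R_eq(X)
%   invariant residual:      R_res(QX) = R_res(X)
\begin{align}
R_{\mathrm{char}}(X)  &= R_{\mathrm{eq}}(X)\, R_{\mathrm{res}}(X), \\
R_{\mathrm{char}}(QX) &= R_{\mathrm{eq}}(QX)\, R_{\mathrm{res}}(QX)
                       = Q\, R_{\mathrm{eq}}(X)\, R_{\mathrm{res}}(X)
                       = Q\, R_{\mathrm{char}}(X).
\end{align}
```

Under these assumptions, applying the residual on the right keeps the calibrated orientation equivariant, so canonicalizing with $R_{\mathrm{char}}(X)^{\top}$ would give the same pose regardless of how the input was rotated.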

* Accepted to ICML 2023

Learning Rotation-Equivariant Features for Visual Correspondence

Mar 25, 2023

Jongmin Lee, Byungjin Kim, Seungwook Kim, Minsu Cho

Abstract: Extracting discriminative local features that are invariant to imaging variations is an integral part of establishing correspondences between images. In this work, we introduce a self-supervised learning framework to extract discriminative rotation-invariant descriptors using group-equivariant CNNs. Thanks to employing group-equivariant CNNs, our method effectively learns to obtain rotation-equivariant features and their orientations explicitly, without having to perform sophisticated data augmentations. The resultant features and their orientations are further processed by group aligning, a novel invariant mapping technique that shifts the group-equivariant features by their orientations along the group dimension. Our group aligning technique achieves rotation-invariance without any collapse of the group dimension and thus eschews loss of discriminability. The proposed method is trained end-to-end in a self-supervised manner, where we use an orientation alignment loss for the orientation estimation and a contrastive descriptor loss for local descriptors that are robust to geometric/photometric variations. Our method demonstrates state-of-the-art matching accuracy among existing rotation-invariant descriptors under varying rotation and also shows competitive results when transferred to the task of keypoint matching and camera pose estimation.
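The idea of shifting features by their orientation along the group dimension can be sketched on a toy descriptor. This is a simplified illustration, not the authors' implementation: the feature layout (channels x group elements of a cyclic rotation group), the argmax-based orientation estimate, and all function names are assumptions; the paper learns orientations with a dedicated loss.

```python
from typing import List

Feature = List[List[float]]  # feature[c][g]: channel c, group (rotation) index g


def estimate_orientation(feat: Feature) -> int:
    """Stand-in orientation estimate: the group index with the largest
    response summed over channels (the paper learns this instead)."""
    group_size = len(feat[0])
    totals = [sum(ch[g] for ch in feat) for g in range(group_size)]
    return max(range(group_size), key=lambda g: totals[g])


def group_align(feat: Feature, orientation: int) -> Feature:
    """Cyclically shift every channel so index `orientation` moves to 0.
    The group dimension is kept at full size rather than pooled away."""
    return [ch[orientation:] + ch[:orientation] for ch in feat]


def rotate_input(feat: Feature, k: int) -> Feature:
    """Simulate rotating the input by k group steps, modeled here as a
    cyclic shift of the features along the group axis."""
    k %= len(feat[0])
    return [(ch[-k:] + ch[:-k]) if k else list(ch) for ch in feat]


feat = [[0.1, 0.9, 0.3, 0.2], [0.0, 0.5, 0.4, 0.1]]
aligned = group_align(feat, estimate_orientation(feat))
rotated = rotate_input(feat, 2)
aligned_rotated = group_align(rotated, estimate_orientation(rotated))
# aligned and aligned_rotated are identical, i.e. the descriptor is
# invariant to the simulated rotation while keeping all group entries.
```

Because the shift only reorders entries, no information along the group axis is discarded, which is the contrast with pooling over the group dimension.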

* Accepted to CVPR 2023, Project webpage at http://cvlab.postech.ac.kr/research/RELF

TransforMatcher: Match-to-Match Attention for Semantic Correspondence

May 23, 2022

Seungwook Kim, Juhong Min, Minsu Cho

Abstract: Establishing correspondences between images remains a challenging task, especially under large appearance changes due to different viewpoints or intra-class variations. In this work, we introduce a strong semantic image matching learner, dubbed TransforMatcher, which builds on the success of transformer networks in vision domains. Unlike existing convolution- or attention-based schemes for correspondence, TransforMatcher performs global match-to-match attention for precise match localization and dynamic refinement. To handle a large number of matches in a dense correlation map, we develop a light-weight attention architecture to consider the global match-to-match interactions. We also propose to utilize a multi-channel correlation map for refinement, treating the multi-level scores as features instead of a single score to fully exploit the richer layer-wise semantics. In experiments, TransforMatcher sets a new state of the art on SPair-71k while performing on par with existing SOTA methods on the PF-PASCAL dataset.
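A multi-channel correlation map can be pictured as one similarity score per feature layer for every source/target position pair, stacked into a vector. The sketch below is a minimal pure-Python illustration under assumptions not stated in the abstract: cosine similarity as the score, all layers already resampled to the same number of positions, and hypothetical function names.

```python
import math
from typing import List

Vector = List[float]
LayerFeatures = List[Vector]  # one feature vector per spatial position


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 for zero-norm inputs."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def multi_channel_correlation(
    src_layers: List[LayerFeatures], tgt_layers: List[LayerFeatures]
) -> List[List[List[float]]]:
    """Return corr[i][j], a list with one score per layer for the match
    between source position i and target position j."""
    n_src, n_tgt = len(src_layers[0]), len(tgt_layers[0])
    return [
        [
            [cosine(s[i], t[j]) for s, t in zip(src_layers, tgt_layers)]
            for j in range(n_tgt)
        ]
        for i in range(n_src)
    ]


# Two layers, two positions per image, 2-D features (toy values).
src = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]
tgt = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
corr = multi_channel_correlation(src, tgt)
```

Each `corr[i][j]` is then a feature vector describing one candidate match, rather than a single scalar score, which is what a downstream attention or refinement module could consume.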

* Accepted to CVPR 2022 (poster presentation)

Convolutional Hough Matching Networks for Robust and Efficient Visual Correspondence

Sep 11, 2021

Juhong Min, Seungwook Kim, Minsu Cho

Abstract: Despite advances in feature representation, leveraging geometric relations is crucial for establishing reliable visual correspondences under large variations of images. In this work, we introduce a Hough transform perspective on convolutional matching and propose an effective geometric matching algorithm, dubbed Convolutional Hough Matching (CHM). The method distributes similarities of candidate matches over a geometric transformation space and evaluates them in a convolutional manner. We cast it into a trainable neural layer with a semi-isotropic high-dimensional kernel, which learns non-rigid matching with a small number of interpretable parameters. To further improve the efficiency of high-dimensional voting, we also propose to use an efficient kernel decomposition with center-pivot neighbors, which significantly sparsifies the proposed semi-isotropic kernels without performance degradation. To validate the proposed techniques, we develop a neural network with CHM layers that perform convolutional matching in the space of translation and scaling. Our method sets a new state of the art on standard benchmarks for semantic visual correspondence, proving its strong robustness to challenging intra-class variations.

* Submitted to TPAMI. arXiv admin note: substantial text overlap with arXiv:2103.16831

Deep Hough Voting for Robust Global Registration

Sep 09, 2021

Junha Lee, Seungwook Kim, Minsu Cho, Jaesik Park

Abstract: Point cloud registration is the task of estimating the rigid transformation that aligns a pair of point cloud fragments. We present an efficient and robust framework for pairwise registration of real-world 3D scans, leveraging Hough voting in the 6D transformation parameter space. First, deep geometric features are extracted from a point cloud pair to compute putative correspondences. We then construct a set of triplets of correspondences to cast votes on the 6D Hough space, representing the transformation parameters in sparse tensors. Next, a fully convolutional refinement module is applied to refine the noisy votes. Finally, we identify the consensus among the correspondences from the Hough space, which we use to predict our final transformation parameters. Our method outperforms state-of-the-art methods on the 3DMatch and 3DLoMatch benchmarks while achieving comparable performance on the KITTI odometry dataset. We further demonstrate the generalizability of our approach by setting a new state of the art on the ICL-NUIM dataset, where we integrate our module into a multi-way registration pipeline.

* Accepted to ICCV 2021
