Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Joonki Paik

Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models

Jun 26, 2025

Donggoo Kang, Jangyeong Kim, Dasol Jeong, Junyoung Choi, Jeonga Wi, Hyunmin Lee, Joonho Gwon, Joonki Paik

Abstract:Current texture synthesis methods, which generate textures from fixed viewpoints, suffer from inconsistencies due to the lack of global context and geometric understanding. Meanwhile, recent advancements in video generation models have demonstrated remarkable success in achieving temporally consistent videos. In this paper, we introduce VideoTex, a novel framework for seamless texture synthesis that leverages video generation models to address both spatial and temporal inconsistencies in 3D textures. Our approach incorporates geometry-aware conditions, enabling precise utilization of 3D mesh structures. Additionally, we propose a structure-wise UV diffusion strategy, which enhances the generation of occluded areas by preserving semantic information, resulting in smoother and more coherent textures. VideoTex not only achieves smoother transitions across UV boundaries but also ensures high-quality, temporally stable textures across video frames. Extensive experiments demonstrate that VideoTex outperforms existing methods in texture fidelity, seam blending, and stability, paving the way for dynamic real-time applications that demand both visual quality and temporal coherence.

Via

Access Paper or Ask Questions

Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models

Apr 22, 2025

Dasol Jeong, Donggoo Kang, Jiwon Park, Hyebean Lee, Joonki Paik

Abstract:We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy-shape injection in early steps and attribute injection in later steps-we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.

Via

Access Paper or Ask Questions

VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

Nov 27, 2024

Donggoo Kang, Dasol Jeong, Hyunmin Lee, Sangwoo Park, Hasil Park, Sunkyu Kwon, Yeongjoon Kim, Joonki Paik

Figure 1 for VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

Figure 2 for VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

Figure 3 for VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

Figure 4 for VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

Abstract:The Large Vision Language Model (VLM) has recently addressed remarkable progress in bridging two fundamental modalities. VLM, trained by a sufficiently large dataset, exhibits a comprehensive understanding of both visual and linguistic to perform diverse tasks. To distill this knowledge accurately, in this paper, we introduce a novel approach that explicitly utilizes VLM as an objective function form for the Human-Object Interaction (HOI) detection task (\textbf{VLM-HOI}). Specifically, we propose a method that quantifies the similarity of the predicted HOI triplet using the Image-Text matching technique. We represent HOI triplets linguistically to fully utilize the language comprehension of VLMs, which are more suitable than CLIP models due to their localization and object-centric nature. This matching score is used as an objective for contrastive optimization. To our knowledge, this is the first utilization of VLM language abilities for HOI detection. Experiments demonstrate the effectiveness of our method, achieving state-of-the-art HOI detection accuracy on benchmarks. We believe integrating VLMs into HOI detection represents important progress towards more advanced and interpretable analysis of human-object interactions.

* 18 pages

Via

Access Paper or Ask Questions

LEAP:D -- A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection

Nov 14, 2024

Chanyeong Park, Heegwang Kim, Joonki Paik

Figure 1 for LEAP:D -- A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection

Figure 2 for LEAP:D -- A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection

Figure 3 for LEAP:D -- A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection

Figure 4 for LEAP:D -- A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection

Abstract:Drone-captured images present significant challenges in object detection due to varying shooting conditions, which can alter object appearance and shape. Factors such as drone altitude, angle, and weather cause these variations, influencing the performance of object detection algorithms. To tackle these challenges, we introduce an innovative vision-language approach using learnable prompts. This shift from conventional manual prompts aims to reduce domain-specific knowledge interference, ultimately improving object detection capabilities. Furthermore, we streamline the training process with a one-step approach, updating the learnable prompt concurrently with model training, enhancing efficiency without compromising performance. Our study contributes to domain-generalized object detection by leveraging learnable prompts and optimizing training processes. This enhances model robustness and adaptability across diverse environments, leading to more effective aerial object detection.

* ICIP 2024 Workshop accepted paper

Via

Access Paper or Ask Questions

PU-EdgeFormer: Edge Transformer for Dense Prediction in Point Cloud Upsampling

May 02, 2023

Dohoon Kim, Minwoo Shin, Joonki Paik

Abstract:Despite the recent development of deep learning-based point cloud upsampling, most MLP-based point cloud upsampling methods have limitations in that it is difficult to train the local and global structure of the point cloud at the same time. To solve this problem, we present a combined graph convolution and transformer for point cloud upsampling, denoted by PU-EdgeFormer. The proposed method constructs EdgeFormer unit that consists of graph convolution and multi-head self-attention modules. We employ graph convolution using EdgeConv, which learns the local geometry and global structure of point cloud better than existing point-to-feature method. Through in-depth experiments, we confirmed that the proposed method has better point cloud upsampling performance than the existing state-of-the-art method in both subjective and objective aspects. The code is available at https://github.com/dohoon2045/PU-EdgeFormer.

* Accepted to ICASSP 2023

Via

Access Paper or Ask Questions