Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jaehyeong Jo

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

Jun 08, 2025

Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Jaehong Yoon, Soo Ye Kim, Zhe Lin, Sung Ju Hwang

Abstract:Advancements in diffusion models have significantly improved video quality, directing attention to fine-grained controllability. However, many existing methods depend on fine-tuning large-scale video models for specific tasks, which becomes increasingly impractical as model sizes continue to grow. In this work, we present Frame Guidance, a training-free guidance for controllable video generation based on frame-level signals, such as keyframes, style reference images, sketches, or depth maps. For practical training-free guidance, we propose a simple latent processing method that dramatically reduces memory usage, and apply a novel latent optimization strategy designed for globally coherent video generation. Frame Guidance enables effective control across diverse tasks, including keyframe guidance, stylization, and looping, without any training, compatible with any video models. Experimental results show that Frame Guidance can produce high-quality controlled videos for a wide range of tasks and input signals.

* Project page: https://frame-guidance-video.github.io/

Via

Access Paper or Ask Questions

Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

Mar 12, 2025

Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang

Abstract:Text-to-image diffusion models have achieved remarkable success in generating high-quality contents from text prompts. However, their reliance on publicly available data and the growing trend of data sharing for fine-tuning make these models particularly vulnerable to data poisoning attacks. In this work, we introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models to generate images containing specific brand logos or symbols without any text triggers. We find that when certain visual patterns are repeatedly in the training data, the model learns to reproduce them naturally in its outputs, even without prompt mentions. Leveraging this, we develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected. Models trained on this poisoned dataset generate images containing logos without degrading image quality or text alignment. We experimentally validate our silent branding attack across two realistic settings on large-scale high-quality image datasets and style personalization datasets, achieving high success rates even without a specific text trigger. Human evaluation and quantitative metrics including logo detection show that our method can stealthily embed logos.

* CVPR 2025. Project page: https://silent-branding.github.io/

Via

Access Paper or Ask Questions

Continuous Diffusion Model for Language Modeling

Feb 17, 2025

Jaehyeong Jo, Sung Ju Hwang

Abstract:Diffusion models have emerged as a promising alternative to autoregressive models in modeling discrete categorical data. Yet diffusion models that directly work on discrete data space do not fully exploit the power of iterative refinement, as the signals are lost during the transition between discrete states. Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches, and the unclear link between them restricts the development of diffusion models for discrete data. In this work, we propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution. We establish a connection between the discrete diffusion and continuous flow on the statistical manifold, and building on the analogy, we introduce a simple design for the diffusion process that generalizes previous discrete diffusion models. We further propose a simulation-free training framework based on radial symmetry and a simple technique to address the high dimensionality of the manifold. Comprehensive experiments on language modeling benchmarks and other modalities show that our method outperforms existing discrete diffusion models and approaches the performance of autoregressive models. Codes available at \href{https://github.com/harryjo97/RDLM}{https://github.com/harryjo97/RDLM}.

Via

Access Paper or Ask Questions

Conditional Synthesis of 3D Molecules with Time Correction Sampler

Nov 01, 2024

Hojung Jung, Youngrok Park, Laura Schmid, Jaehyeong Jo, Dongkyu Lee, Bongsang Kim, Se-Young Yun, Jinwoo Shin

Abstract:Diffusion models have demonstrated remarkable success in various domains, including molecular generation. However, conditional molecular generation remains a fundamental challenge due to an intrinsic trade-off between targeting specific chemical properties and generating meaningful samples from the data distribution. In this work, we present Time-Aware Conditional Synthesis (TACS), a novel approach to conditional generation on diffusion models. It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties while maintaining validity and stability. A key component of our algorithm is our new type of diffusion sampler, Time Correction Sampler (TCS), which is used to control guidance and ensure that the generated molecules remain on the correct manifold at each reverse step of the diffusion process at the same time. Our proposed method demonstrates significant performance in conditional 3D molecular generation and offers a promising approach towards inverse molecular design, potentially facilitating advancements in drug discovery, materials science, and other related fields.

* NeurIPS 2024

Via

Access Paper or Ask Questions

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

Apr 05, 2024

Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang

Abstract:Text-to-image diffusion models have shown remarkable success in generating a personalized subject based on a few reference images. However, current methods struggle with handling multiple subjects simultaneously, often resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by the Segment Anything Model for both training and inference, as a form of data augmentation for training and initialization for the generation process. Our experiments demonstrate that MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects as shown in Figure 1. In human evaluation, MuDI shows twice as many successes for personalizing multiple subjects without identity mixing over existing baselines and is preferred over 70% compared to the strongest baseline. More results are available at https://mudi-t2i.github.io/.

* Preprint. Project page: https://mudi-t2i.github.io/

Via

Access Paper or Ask Questions

Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes

Oct 11, 2023

Jaehyeong Jo, Sung Ju Hwang

Abstract:Learning the distribution of data on Riemannian manifolds is crucial for modeling data from non-Euclidean space, which is required by many applications from diverse scientific fields. Yet, existing generative models on manifolds suffer from expensive divergence computation or rely on approximations of heat kernel. These limitations restrict their applicability to simple geometries and hinder scalability to high dimensions. In this work, we introduce the Riemannian Diffusion Mixture, a principled framework for building a generative process on manifolds as a mixture of endpoint-conditioned diffusion processes instead of relying on the denoising approach of previous diffusion models, for which the generative process is characterized by its drift guiding toward the most probable endpoint with respect to the geometry of the manifold. We further propose a simple yet efficient training objective for learning the mixture process, that is readily applicable to general manifolds. Our method outperforms previous generative models on various manifolds while scaling to high dimensions and requires a dramatically reduced number of in-training simulation steps for general manifolds.

Via

Access Paper or Ask Questions

DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

May 26, 2023

Sohyun An, Hayeon Lee, Jaehyeong Jo, Seanie Lee, Sung Ju Hwang

Figure 1 for DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

Figure 2 for DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

Figure 3 for DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

Figure 4 for DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

Abstract:Neural Architecture Search (NAS) has emerged as a powerful technique for automating neural architecture design. However, existing NAS methods either require an excessive amount of time for repetitive training or sampling of many task-irrelevant architectures. Moreover, they lack generalization across different tasks and usually require searching for optimal architectures for each task from scratch without reusing the knowledge from the previous NAS tasks. To tackle such limitations of existing NAS methods, we propose a novel transferable task-guided Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG. With the guidance of a surrogate model, such as a performance predictor for a given task, our DiffusionNAG can generate task-optimal architectures for diverse tasks, including unseen tasks. DiffusionNAG is highly efficient as it generates task-optimal neural architectures by leveraging the prior knowledge obtained from the previous tasks and neural architecture distribution. Furthermore, we introduce a score network to ensure the generation of valid architectures represented as directed acyclic graphs, unlike existing graph generative models that focus on generating undirected graphs. Extensive experiments demonstrate that DiffusionNAG significantly outperforms the state-of-the-art transferable NAG model in architecture generation quality, as well as previous NAS methods on four computer vision datasets with largely reduced computational cost.

Via

Access Paper or Ask Questions

Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

Apr 04, 2023

Jaewoong Lee, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Yunji Kim, Jin-Hwa Kim, Jung-Woo Ha, Sung Ju Hwang

Abstract:Token-based masked generative models are gaining popularity for their fast inference time with parallel decoding. While recent token-based approaches achieve competitive performance to diffusion-based models, their generation performance is still suboptimal as they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts. To further improve the image quality, we introduce a cohesive sampling strategy, Frequency Adaptive Sampling (FAS), to each group of tokens divided according to the self-attention maps. We validate the efficacy of TCTS combined with FAS with various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality. Our text-conditioned sampling framework further reduces the original inference time by more than 50% without modifying the original generative model.

Via

Access Paper or Ask Questions

Graph Generation with Destination-Driven Diffusion Mixture

Feb 07, 2023

Jaehyeong Jo, Dongki Kim, Sung Ju Hwang

Abstract:Generation of graphs is a major challenge for real-world tasks that require understanding the complex nature of their non-Euclidean structures. Although diffusion models have achieved notable success in graph generation recently, they are ill-suited for modeling the structural information of graphs since learning to denoise the noisy samples does not explicitly capture the graph topology. To tackle this limitation, we propose a novel generative process that models the topology of graphs by predicting the destination of the process. Specifically, we design the generative process as a mixture of diffusion processes conditioned on the endpoint in the data distribution, which drives the process toward the probable destination. Further, we introduce new training objectives for learning to predict the destination, and discuss the advantages of our generative framework that can explicitly model the graph topology and exploit the inductive bias of the data. Through extensive experimental validation on general graph and 2D/3D molecular graph generation tasks, we show that our method outperforms previous generative models, generating graphs with correct topology with both continuous and discrete features.

Via

Access Paper or Ask Questions

Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

Feb 16, 2022

Jaehyeong Jo, Seul Lee, Sung Ju Hwang

Figure 1 for Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

Figure 2 for Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

Figure 3 for Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

Figure 4 for Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

Abstract:Generating graph-structured data requires learning the underlying distribution of graphs. Yet, this is a challenging problem, and the previous graph generative methods either fail to capture the permutation-invariance property of graphs or cannot sufficiently model the complex dependency between nodes and edges, which is crucial for generating real-world graphs such as molecules. To overcome such limitations, we propose a novel score-based generative model for graphs with a continuous-time framework. Specifically, we propose a new graph diffusion process that models the joint distribution of the nodes and edges through a system of stochastic differential equations (SDEs). Then, we derive novel score matching objectives tailored for the proposed diffusion process to estimate the gradient of the joint log-density with respect to each component, and introduce a new solver for the system of SDEs to efficiently sample from the reverse diffusion process. We validate our graph generation method on diverse datasets, on which it either achieves significantly superior or competitive performance to the baselines. Further analysis shows that our method is able to generate molecules that lie close to the training distribution yet do not violate the chemical valency rule, demonstrating the effectiveness of the system of SDEs in modeling the node-edge relationships.

Via

Access Paper or Ask Questions