Ethan Smith

LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization

Dec 03, 2024

Ethan Smith, Rami Seid, Alberto Hojel, Paramita Mishra, Jianbo Wu

Abstract: Low-Rank Adaptation (LoRA) and other parameter-efficient fine-tuning (PEFT) methods provide low-memory, storage-efficient solutions for personalizing text-to-image models. However, these methods offer little to no improvement in wall-clock training time or the number of steps needed for convergence compared to full model fine-tuning. While PEFT methods assume that shifts in generated distributions (from base to fine-tuned models) can be effectively modeled through weight changes in a low-rank subspace, they fail to leverage knowledge of common use cases, which typically focus on capturing specific styles or identities. Observing that desired outputs often comprise only a small subset of the possible domain covered by LoRA training, we propose reducing the search space by incorporating a prior over regions of interest. We demonstrate that training a hypernetwork model to generate LoRA weights can achieve competitive quality for specific domains while enabling near-instantaneous conditioning on user input, in contrast to traditional training methods that require thousands of steps.
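The low-rank update the abstract refers to can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: `lora_forward` applies the usual LoRA form W + (alpha / r) * B A, and `toy_hypernetwork` is a hypothetical single linear map from a condition vector to the factors A and B. All names, shapes, and the projection matrices `P_A` and `P_B` are assumptions made for the example.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Linear layer with a low-rank weight update.

    x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r).
    """
    r = A.shape[0]
    delta_W = (alpha / r) * (B @ A)  # update has rank at most r
    return x @ (W + delta_W).T

def toy_hypernetwork(cond, P_A, P_B, r, d_in, d_out):
    """Hypothetical hypernetwork: one linear map from a condition vector to (A, B)."""
    A = (cond @ P_A).reshape(r, d_in)
    B = (cond @ P_B).reshape(d_out, r)
    return A, B

rng = np.random.default_rng(0)
d_in, d_out, r, d_cond = 64, 32, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen base weight
x = rng.normal(size=(2, d_in))          # a batch of two inputs
cond = rng.normal(size=(d_cond,))       # stand-in for an embedding of the user input
P_A = rng.normal(size=(d_cond, r * d_in)) * 0.01   # illustrative hypernetwork parameters
P_B = rng.normal(size=(d_cond, d_out * r)) * 0.01

# One forward pass of the hypernetwork yields the LoRA factors; no per-subject training loop.
A, B = toy_hypernetwork(cond, P_A, P_B, r, d_in, d_out)
y = lora_forward(x, W, A, B)

full_params = d_out * d_in          # parameters in a full update of this layer
lora_params = r * (d_in + d_out)    # parameters in the low-rank update
```

With these toy sizes the low-rank factors hold 384 values against 2048 for a full update of the same layer, which is the storage saving the abstract mentions. The speed-up comes from replacing the optimization loop with a single hypernetwork evaluation.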

* 9 pages, 6 figures

EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance

Sep 12, 2024

Zicheng Duan, Yuxuan Ding, Chenhui Gou, Ziqin Zhou, Ethan Smith, Lingqiao Liu

Abstract: Zero-shot subject-driven image generation aims to produce images that incorporate a subject from a given example image. The challenge lies in preserving the subject's identity while aligning with the text prompt, which often requires modifying certain aspects of the subject's appearance. Despite advancements in diffusion-model-based methods, existing approaches still struggle to balance identity preservation with text prompt alignment. In this study, we conducted an in-depth investigation into this issue and uncovered key insights for achieving effective identity preservation while maintaining a strong balance. Our key findings include: (1) the design of the subject image encoder significantly impacts identity preservation quality, and (2) generating an initial layout is crucial for both text alignment and identity preservation. Building on these insights, we introduce a new approach called EZIGen, which employs two main strategies: (1) a carefully crafted subject image encoder, based on the UNet architecture of the pretrained Stable Diffusion model, to ensure high-quality identity transfer, and (2) a process that decouples the guidance stages and iteratively refines the initial image layout. Through these strategies, EZIGen achieves state-of-the-art results on multiple subject-driven benchmarks with a unified model and 100 times less training data.

High Energy Density Radiative Transfer in the Diffusion Regime with Fourier Neural Operators

May 07, 2024

Joseph Farmer, Ethan Smith, William Bennett, Ryan McClarren

Abstract: Radiative heat transfer is a fundamental process in high energy density physics and inertial fusion. Accurately predicting the behavior of Marshak waves across a wide range of material properties and drive conditions is crucial for the design and analysis of these systems. Conventional numerical solvers and analytical approximations often face challenges in terms of accuracy and computational efficiency. In this work, we propose a novel approach to model Marshak waves using Fourier Neural Operators (FNO). We develop two FNO-based models: (1) a base model that learns the mapping from the drive condition and material properties to a solution approximation based on the widely used analytic model by Hammer & Rosen (2003), and (2) a model that corrects the inaccuracies of the analytic approximation by learning the mapping to a more accurate numerical solution. Our results demonstrate the strong generalization capabilities of the FNOs and show significant improvements in prediction accuracy compared to the base analytic model.
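The abstract does not describe the internals of the models, so the following is only a generic sketch of the spectral convolution that FNOs are commonly built from: transform the input to Fourier space, keep the lowest `modes` frequencies, mix channels with learned complex weights, and transform back. The function name, shapes, and the random weights are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """One 1D spectral convolution on a uniform grid.

    x: real array of shape (c_in, n).
    weights: complex array of shape (c_in, c_out, modes), with modes <= n // 2 + 1.
    """
    c_in, n = x.shape
    _, c_out, modes = weights.shape
    x_ft = np.fft.rfft(x, axis=-1)                       # (c_in, n // 2 + 1)
    out_ft = np.zeros((c_out, n // 2 + 1), dtype=complex)
    # Mix input channels into output channels, separately for each retained mode.
    out_ft[:, :modes] = np.einsum("im,iom->om", x_ft[:, :modes], weights)
    return np.fft.irfft(out_ft, n=n, axis=-1)            # back to real space, (c_out, n)

rng = np.random.default_rng(0)
n, c_in, c_out, modes = 64, 2, 2, 8

x = rng.normal(size=(c_in, n))   # stand-in for sampled input fields
weights = rng.normal(size=(c_in, c_out, modes)) + 1j * rng.normal(size=(c_in, c_out, modes))
y = spectral_conv_1d(x, weights)
```

Because the learned weights act on Fourier modes rather than on grid points, the same weights can be applied to inputs sampled at a different resolution, as long as the grid supports at least `modes` frequencies.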

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images

Feb 28, 2024

Ethan Smith, Nayan Saxena, Aninda Saha

Abstract: The attention mechanism has been crucial for image diffusion models; however, its quadratic computational complexity limits the size of images we can process within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms. We propose a novel training-free method, ToDo, that relies on token downsampling of key and value tokens to accelerate Stable Diffusion inference by up to 2x for common sizes and up to 4.5x or more for high resolutions like 2048x2048. We demonstrate that our approach outperforms previous methods in balancing efficient throughput and fidelity.
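The idea of downsampling only the key and value tokens can be sketched as follows. This is a minimal single-head illustration under stated assumptions: tokens lie on a row-major `h` by `w` grid, and the downsampling is plain strided subsampling, which is a choice made for this example and may differ from the paper's operator. The function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def downsample_tokens(tokens, h, w, factor):
    """Keep every `factor`-th token along each spatial axis.

    tokens: (h * w, d), laid out row-major on an h x w grid.
    """
    d = tokens.shape[-1]
    grid = tokens.reshape(h, w, d)
    return grid[::factor, ::factor].reshape(-1, d)

def attention_kv_downsampled(q, k, v, h, w, factor):
    """Single-head attention with full-resolution queries and reduced keys/values."""
    k_small = downsample_tokens(k, h, w, factor)
    v_small = downsample_tokens(v, h, w, factor)
    scores = q @ k_small.T / np.sqrt(q.shape[-1])   # (h * w, number of kept tokens)
    return softmax(scores) @ v_small                # one output per query token

rng = np.random.default_rng(0)
h, w, d = 8, 8, 16
q = rng.normal(size=(h * w, d))
k = rng.normal(size=(h * w, d))
v = rng.normal(size=(h * w, d))

out = attention_kv_downsampled(q, k, v, h, w, factor=2)
```

Every query still produces an output, so the output keeps its full resolution, while the score matrix shrinks from 64 x 64 to 64 x 16 in this toy case. No weights change, which is consistent with the method being training-free.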
