Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jaesung R. Park

Numerical Analysis of HiPPO-LegS ODE for Deep State Space Models

Dec 11, 2024

Jaesung R. Park, Jaewook J. Suh, Ernest K. Ryu

Abstract:In deep learning, the recently introduced state space models utilize HiPPO (High-order Polynomial Projection Operators) memory units to approximate continuous-time trajectories of input functions using ordinary differential equations (ODEs), and these techniques have shown empirical success in capturing long-range dependencies in long input sequences. However, the mathematical foundations of these ODEs, particularly the singular HiPPO-LegS (Legendre Scaled) ODE, and their corresponding numerical discretizations remain unexplored. In this work, we fill this gap by establishing that HiPPO-LegS ODE is well-posed despite its singularity, albeit without the freedom of arbitrary initial conditions, and by establishing convergence of the associated numerical discretization schemes for Riemann-integrable input functions.

Via

Access Paper or Ask Questions

Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

May 07, 2024

Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu

Figure 1 for Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Figure 2 for Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Figure 3 for Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Figure 4 for Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Abstract:Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.

Via

Access Paper or Ask Questions