Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:FiT: Flexible Vision Transformer for Diffusion Model

Feb 19, 2024

Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai

Figure 1 for FiT: Flexible Vision Transformer for Diffusion Model

Figure 2 for FiT: Flexible Vision Transformer for Diffusion Model

Figure 3 for FiT: Flexible Vision Transformer for Diffusion Model

Figure 4 for FiT: Flexible Vision Transformer for Diffusion Model

Share this with someone who'll enjoy it:

Abstract:Nature is infinitely resolution-free. In the context of this reality, existing diffusion models, such as Diffusion Transformers, often face challenges when processing image resolutions outside of their trained domain. To overcome this limitation, we present the Flexible Vision Transformer (FiT), a transformer architecture specifically designed for generating images with unrestricted resolutions and aspect ratios. Unlike traditional methods that perceive images as static-resolution grids, FiT conceptualizes images as sequences of dynamically-sized tokens. This perspective enables a flexible training strategy that effortlessly adapts to diverse aspect ratios during both training and inference phases, thus promoting resolution generalization and eliminating biases induced by image cropping. Enhanced by a meticulously adjusted network structure and the integration of training-free extrapolation techniques, FiT exhibits remarkable flexibility in resolution extrapolation generation. Comprehensive experiments demonstrate the exceptional performance of FiT across a broad range of resolutions, showcasing its effectiveness both within and beyond its training resolution distribution. Repository available at https://github.com/whlzy/FiT.

View paper on

Share this with someone who'll enjoy it:

Title:FiT: Flexible Vision Transformer for Diffusion Model

Paper and Code