Title: SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Dec 14, 2024

Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu, Emad Barsoum

Abstract: Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft categorical posteriors to aggregate multiple codewords into each latent token, substantially increasing the representation capacity of the latent space. When applied to Transformer-based architectures, our approach compresses 256x256 and 512x512 images using as few as 32 or 64 1-dimensional tokens. Not only does SoftVQ-VAE show consistent, high-quality reconstruction; more importantly, it also achieves state-of-the-art and significantly faster image generation results across different denoising-based generative models. Remarkably, SoftVQ-VAE improves inference throughput by up to 18x for generating 256x256 images and 55x for 512x512 images while achieving competitive FID scores of 1.78 and 2.21 for SiT-XL. It also improves the training efficiency of the generative models by reducing the number of training iterations by 2.3x while maintaining comparable performance. With its fully differentiable design and semantically rich latent space, our experiments demonstrate that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models. Code and model are released.
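The core idea in the abstract, aggregating multiple codewords into each latent token through a soft categorical posterior, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `soft_quantize`, the use of negative squared Euclidean distance as the logit, the `temperature` parameter, and the array sizes are all assumptions made for illustration. The released repository is the reference for the actual formulation.

```python
import numpy as np

def soft_quantize(z, codebook, temperature=1.0):
    """Illustrative soft quantization (hypothetical helper, not the paper's code).

    z:        (num_tokens, dim) array of encoder outputs
    codebook: (num_codes, dim) array of codewords
    Returns the aggregated tokens and the soft assignment probabilities.
    """
    # Assumed similarity: negative squared Euclidean distance to every codeword.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    # Numerically stable softmax over the codebook axis gives a
    # categorical posterior over codewords for each token.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    # Each token is a convex combination of codewords, so it stays
    # continuous instead of snapping to a single codebook entry.
    return probs @ codebook, probs

# Example sizes are assumptions: 32 tokens, 16 channels, 512 codewords.
rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))
codebook = rng.normal(size=(512, 16))
tokens, probs = soft_quantize(z, codebook)
print(tokens.shape)  # (32, 16)
```

In this sketch, lowering `temperature` toward zero concentrates each posterior on the nearest codeword, which recovers hard vector quantization; a larger value lets each token blend many codewords, and every step remains differentiable.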

* Code and model: https://github.com/Hhhhhhao/continuous_tokenizer
