Title: PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Nov 24, 2024

Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

Abstract: Panoramic image generation has emerged as an important task in image generation, driven by growing demands for large-scale visuals in creative and technical applications. While diffusion models have dominated this field, they face inherent limitations, including the multilevel-coherence challenge and implementation complexity, leading to suboptimal outcomes. In this paper, we introduce PanoLlama, a novel framework that redefines panoramic image generation as a next-token prediction task. Building on the pre-trained LlamaGen architecture, we generate images in an autoregressive manner and develop an expansion strategy to handle size limitations. This method aligns with the image token structure in a crop-wise and training-free manner, resulting in high-quality panoramas with minimal seams and maximum scalability. PanoLlama demonstrates its effectiveness and versatility in our experiments, achieving the best overall performance while offering flexibility for multi-scale, multi-layout, and multi-guidance generation. It overcomes the challenges that diffusion-based methods fail to address, setting a new paradigm for panoramic image generation tasks. Code is available at https://github.com/0606zt/PanoLlama.
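
The abstract does not spell out the expansion strategy, so the following is only a toy sketch of the general idea it describes: a token grid is extended past a fixed window width, crop by crop, with each new crop conditioned on columns already generated and no retraining. Everything here is an assumption made for illustration: `toy_next_token` is a deterministic stand-in for a real autoregressive model, and `expand_panorama`, `window`, and `overlap` are hypothetical names that do not come from the PanoLlama or LlamaGen code.

```python
from typing import Callable, List

Grid = List[List[int]]  # token ids, one inner list per image row


def toy_next_token(context: List[int], vocab_size: int = 16) -> int:
    """Hypothetical stand-in for a next-token model: a fixed function of the context."""
    return (sum(context) * 31 + len(context)) % vocab_size


def expand_panorama(
    height: int,
    window: int,
    total_width: int,
    overlap: int,
    next_token: Callable[[List[int]], int],
) -> Grid:
    """Grow a height x total_width token grid using crops at most `window` columns wide.

    Each crop reuses up to `overlap` trailing columns of the existing grid as a
    fixed prefix, then predicts the remaining columns in raster order.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    pano: Grid = [[] for _ in range(height)]
    while len(pano[0]) < total_width:
        have = len(pano[0])
        keep = min(overlap, have)  # columns reused from the existing grid
        new_cols = min(window - keep, total_width - have)
        # Copy the reused columns of every row before appending anything new.
        crop = [row[have - keep:] for row in pano]

        context: List[int] = []
        for r in range(height):
            context.extend(crop[r])  # reused tokens of this row join the context
            for _ in range(new_cols):
                tok = next_token(context)
                context.append(tok)
                pano[r].append(tok)
    return pano


pano = expand_panorama(height=2, window=4, total_width=10, overlap=2,
                       next_token=toy_next_token)
```

In this sketch the context never spans more than `window` columns per row, which is the constraint a fixed-size pretrained model would impose; a larger `overlap` gives each crop more shared context at the cost of more crops.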
