Title: PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Jun 05, 2024

Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen

Abstract: Layout generation is the keystone of automated graphic design: it requires arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic layout generation, leveraging a multi-modal large language model (MLLM) to accommodate diverse design tasks. In contrast to those approaches, our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts under specific visual and textual constraints, including user-defined natural language specifications. We conducted extensive experiments and achieved state-of-the-art (SOTA) performance on public multi-modal layout generation benchmarks, demonstrating the effectiveness of our method. Moreover, recognizing the limitations of existing datasets in capturing the complexity of real-world graphic designs, we propose two new datasets for much more challenging tasks (user-constrained generation and complicated posters), further validating our model's utility in real-life settings. Marked by its superior accessibility and adaptability, this approach further automates large-scale graphic design tasks. The code and datasets will be publicly available at https://github.com/posterllava/PosterLLaVA.
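
To make the "structured text (JSON format)" idea concrete, the following is a minimal sketch of how a layout might be serialized to JSON for a language model and parsed back with a basic validity check. It is an illustration only, not the authors' implementation: the `Element` fields, the category labels, the normalized-coordinate convention, and the function names `layout_to_json` and `json_to_layout` are all assumptions introduced here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    # Hypothetical schema: one design element with a normalized bounding box.
    category: str  # e.g. "title" or "logo" (label set is an assumption)
    x: float       # left edge, as a fraction of canvas width
    y: float       # top edge, as a fraction of canvas height
    w: float       # box width, as a fraction of canvas width
    h: float       # box height, as a fraction of canvas height


def layout_to_json(elements, canvas_w, canvas_h):
    """Serialize a layout into structured text a language model could read or emit."""
    return json.dumps({
        "canvas": {"width": canvas_w, "height": canvas_h},
        "elements": [asdict(e) for e in elements],
    })


def json_to_layout(text):
    """Parse generated JSON back into elements, rejecting boxes outside the canvas."""
    data = json.loads(text)
    eps = 1e-9  # tolerance for floating-point rounding at the canvas edge
    elements = []
    for item in data["elements"]:
        e = Element(**item)
        inside = (
            e.x >= 0 and e.y >= 0 and e.w > 0 and e.h > 0
            and e.x + e.w <= 1 + eps and e.y + e.h <= 1 + eps
        )
        if not inside:
            raise ValueError(f"element out of canvas bounds: {item}")
        elements.append(e)
    return elements


if __name__ == "__main__":
    poster = [
        Element("title", 0.1, 0.05, 0.8, 0.15),
        Element("logo", 0.4, 0.8, 0.2, 0.1),
    ]
    text = layout_to_json(poster, 1024, 1536)
    print(text)
    print(json_to_layout(text) == poster)
```

Representing the layout as plain JSON means the same text format can carry both the constraints given to the model and the layout it returns, and the parse step gives a cheap way to discard malformed or out-of-bounds generations.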
