Title: DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

Oct 18, 2023

Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal

Abstract: Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. A diagram is a symbolic/schematic representation that explains information using structurally rich and spatially complex visualizations (e.g., a dense combination of related objects, text labels, directional arrows, connection lines, etc.). Existing state-of-the-art T2I models often fail at diagram generation because they lack fine-grained object layout control when many objects are densely connected via complex relations such as arrows/lines and also often fail to render comprehensible text labels. To address this gap, we present DiagrammerGPT, a novel two-stage text-to-diagram generation framework that leverages the layout guidance capabilities of LLMs (e.g., GPT-4) to generate more accurate open-domain, open-platform diagrams. In the first stage, we use LLMs to generate and iteratively refine 'diagram plans' (in a planner-auditor feedback loop) which describe all the entities (objects and text labels), their relationships (arrows or lines), and their bounding box layouts. In the second stage, we use a diagram generator, DiagramGLIGEN, and a text label rendering module to generate diagrams following the diagram plans. To benchmark the text-to-diagram generation task, we introduce AI2D-Caption, a densely annotated diagram dataset built on top of the AI2D dataset. We show quantitatively and qualitatively that our DiagrammerGPT framework produces more accurate diagrams, outperforming existing T2I models. We also provide comprehensive analysis including open-domain diagram generation, vector graphic diagram generation in different platforms, human-in-the-loop diagram plan editing, and multimodal planner/auditor LLMs (e.g., GPT-4Vision). We hope our work can inspire further research on diagram generation via T2I models and LLMs.

* Project page: https://diagrammerGPT.github.io/
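To make the first stage more concrete, here is a minimal Python sketch of what a "diagram plan" and a planner-auditor refinement loop might look like. It is an illustration only, not the authors' implementation: every name in it (`Entity`, `Relationship`, `DiagramPlan`, `audit`, `revise`, `refine`), the 0-100 canvas, and the `(x, y, width, height)` box convention are assumptions made for this example. In the paper both the planner and the auditor are LLMs (e.g., GPT-4); here they are replaced by simple rule-based stand-ins so the loop can run on its own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed convention: (x, y, width, height) on a 0-100 canvas.
BBox = Tuple[float, float, float, float]


@dataclass
class Entity:
    name: str
    kind: str  # "object" or "text_label"
    bbox: BBox


@dataclass
class Relationship:
    source: str
    target: str
    kind: str  # "arrow" or "line"


@dataclass
class DiagramPlan:
    entities: List[Entity] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def audit(plan: DiagramPlan) -> List[str]:
    """Rule-based stand-in for the auditor LLM: list problems found in the plan."""
    problems = []
    names = {e.name for e in plan.entities}
    for e in plan.entities:
        x, y, w, h = e.bbox
        if x < 0 or y < 0 or x + w > 100 or y + h > 100:
            problems.append(f"bbox out of canvas: {e.name}")
    for r in plan.relationships:
        if r.source not in names or r.target not in names:
            problems.append(f"dangling relationship: {r.source}->{r.target}")
    return problems


def revise(plan: DiagramPlan, feedback: List[str]) -> DiagramPlan:
    """Rule-based stand-in for the planner LLM: clamp boxes into the canvas
    and drop relationships that point at unknown entities. A real planner
    would read the feedback text; this toy version ignores its wording."""
    names = {e.name for e in plan.entities}
    entities = []
    for e in plan.entities:
        x, y, w, h = e.bbox
        w, h = min(w, 100), min(h, 100)
        x = min(max(x, 0), 100 - w)
        y = min(max(y, 0), 100 - h)
        entities.append(Entity(e.name, e.kind, (x, y, w, h)))
    relationships = [
        r for r in plan.relationships if r.source in names and r.target in names
    ]
    return DiagramPlan(entities, relationships)


def refine(plan: DiagramPlan, max_rounds: int = 3) -> DiagramPlan:
    """Alternate auditing and revising until the auditor has no complaints
    or the round budget is used up."""
    for _ in range(max_rounds):
        feedback = audit(plan)
        if not feedback:
            break
        plan = revise(plan, feedback)
    return plan
```

Calling `refine` on a plan whose box overflows the canvas, or whose relationship names a missing entity, returns a plan that passes `audit`. The refined plan would then be handed to the second stage (the diagram generator and the text label renderer), which this sketch does not cover.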
