Title: Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

Jun 09, 2023

Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee

Abstract: Recently, large language models (LLMs) have made significant advancements in natural language understanding and generation. However, their potential in computer vision remains largely unexplored. In this paper, we introduce a new, exploratory approach that enables LLMs to process images using the Scalable Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components. Our method facilitates simple image classification, generation, and in-context learning using only LLM capabilities. We demonstrate the promise of our approach across discriminative and generative tasks, highlighting its (i) robustness against distribution shift, (ii) substantial improvements achieved by tapping into the in-context learning abilities of LLMs, and (iii) image understanding and generation capabilities with human guidance. Our code, data, and models can be found at https://github.com/mu-cai/svg-llm.
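To make the idea of treating an image as XML text concrete, here is a minimal sketch of how SVG markup might be placed into a few-shot classification prompt for an LLM. It is an illustration under stated assumptions, not the authors' implementation: the helper names `shapes_to_svg` and `build_prompt`, the toy shapes, the label set, and the prompt wording are all hypothetical, and no model is actually called.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def shapes_to_svg(shapes, width=32, height=32):
    """Serialize (tag, attributes) pairs into SVG XML text.

    Hypothetical helper for illustration: the images here are hand-built
    from primitive shapes rather than derived from raster input.
    """
    root = ET.Element("svg", {"xmlns": SVG_NS, "width": str(width), "height": str(height)})
    for tag, attrs in shapes:
        ET.SubElement(root, tag, {k: str(v) for k, v in attrs.items()})
    return ET.tostring(root, encoding="unicode")


def build_prompt(examples, query_svg, labels):
    """Build a few-shot text prompt: labeled SVG examples, then the query.

    The prompt wording is an assumption, not taken from the paper.
    """
    parts = ["Classify each SVG image as one of: " + ", ".join(labels) + "."]
    for svg, label in examples:
        parts.append(f"SVG: {svg}\nLabel: {label}")
    parts.append(f"SVG: {query_svg}\nLabel:")
    return "\n\n".join(parts)


# Two labeled in-context examples and one unlabeled query image.
circle = shapes_to_svg([("circle", {"cx": 16, "cy": 16, "r": 10, "fill": "black"})])
square = shapes_to_svg([("rect", {"x": 6, "y": 6, "width": 20, "height": 20, "fill": "black"})])
query = shapes_to_svg([("circle", {"cx": 12, "cy": 20, "r": 6, "fill": "black"})])

prompt = build_prompt([(circle, "circle"), (square, "square")], query, ["circle", "square"])
print(prompt)
```

The resulting string is ordinary text, so it could be sent to any text-only LLM; the model's completion after the final `Label:` would serve as the predicted class.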
