Title: VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Oct 25, 2024

Zixiao Zhao, Jing Sun, Zhiyuan Wei, Cheng-Hao Cai, Zhe Hou, Jin Song Dong

Abstract: In the field of automated programming, large language models (LLMs) have demonstrated foundational generative capabilities when given detailed task descriptions. However, their current functionalities are primarily limited to function-level development, restricting their effectiveness in complex project environments and specific application scenarios, such as complicated image-processing tasks. This paper presents a multi-agent framework that utilises a hybrid set of LLMs, including GPT-4o and locally deployed open-source models, which collaboratively complete auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation that works together to produce software products. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution for code generation. We evaluated our approach using benchmark datasets, and the experimental results demonstrate that VisionCoder significantly outperforms existing methods in image processing auto-programming tasks.
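The tree-structured decomposition across project, module, and function levels might be pictured roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `TaskNode`, `route_model`, `develop`, and `make_stub_model` are hypothetical, the models are stand-in functions rather than real LLM calls, and the routing rule (planning levels go to a hosted model, function-level coding goes to a local model) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "model" here is just a function from prompt to text (hypothetical stand-in
# for a hosted or locally deployed LLM; no real API is called).
ModelFn = Callable[[str], str]


@dataclass
class TaskNode:
    """One node of the task tree: level is 'project', 'module', or 'function'."""
    level: str
    description: str
    children: List["TaskNode"] = field(default_factory=list)
    output: str = ""  # plan text for inner nodes, generated code for leaves


def make_stub_model(name: str) -> ModelFn:
    """Return a fake model that echoes the prompt, tagged with its name."""
    def call(prompt: str) -> str:
        return f"# [{name}] {prompt}"
    return call


def route_model(level: str, models: Dict[str, ModelFn]) -> ModelFn:
    """Assumed routing rule: planning levels use the hosted model,
    function-level coding uses the local model."""
    return models["hosted"] if level in ("project", "module") else models["local"]


def develop(node: TaskNode, models: Dict[str, ModelFn]) -> List[str]:
    """Walk the tree depth-first and return the code produced at the leaves."""
    model = route_model(node.level, models)
    if node.level == "function":
        node.output = model(f"implement: {node.description}")
        return [node.output]
    node.output = model(f"plan {node.level}: {node.description}")
    results: List[str] = []
    for child in node.children:
        results.extend(develop(child, models))
    return results


def build_example_tree() -> TaskNode:
    """A small made-up image-processing project used only for illustration."""
    return TaskNode("project", "edge detection tool", [
        TaskNode("module", "preprocessing", [
            TaskNode("function", "convert image to grayscale"),
            TaskNode("function", "apply Gaussian blur"),
        ]),
        TaskNode("module", "detection", [
            TaskNode("function", "compute gradient magnitude"),
        ]),
    ])


if __name__ == "__main__":
    models = {"hosted": make_stub_model("hosted"), "local": make_stub_model("local")}
    for snippet in develop(build_example_tree(), models):
        print(snippet)
```

Under this assumed routing, the more expensive model is called once per project or module node, while the leaf-level work goes to the local model, which is one way a hybrid setup could keep costs down.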
