Title: LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models

Sep 18, 2024

Amaia Cardiel, Eloi Zablocki, Oriane Siméoni, Elias Ramzi, Matthieu Cord

Abstract: Vision Language Models (VLMs) have shown impressive performance on numerous tasks, but their zero-shot capabilities can be limited compared to dedicated or fine-tuned models. Yet fine-tuning VLMs comes with limitations: it requires 'white-box' access to the model's architecture and weights, as well as expertise to design the fine-tuning objectives and optimize the hyper-parameters, which are specific to each VLM and downstream task. In this work, we propose LLM-wrapper, a novel approach to adapt VLMs in a 'black-box' manner by leveraging large language models (LLMs) to reason on their outputs. We demonstrate the effectiveness of LLM-wrapper on Referring Expression Comprehension (REC), a challenging open-vocabulary task that requires spatial and semantic reasoning. Our approach significantly boosts the performance of off-the-shelf models, yielding results competitive with classic fine-tuning.
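
The abstract does not spell out how the LLM consumes the VLM outputs, so the following is only a hypothetical sketch of the general idea: candidate boxes from a black-box VLM are written out as text, and an LLM is asked to pick the one that matches the referring expression. The names `Detection`, `build_prompt`, `parse_choice`, `select_box`, the prompt wording, and the fallback rule are all assumptions made for illustration and are not taken from the paper. The LLM is passed in as a plain callable so that no particular API is implied.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    """One candidate returned by a black-box VLM (hypothetical format)."""
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    score: float


def build_prompt(expression: str, detections: List[Detection]) -> str:
    """Serialize the VLM outputs as text so an LLM can reason over them."""
    lines = [f'Referring expression: "{expression}"', "Candidate boxes:"]
    for i, d in enumerate(detections):
        x0, y0, x1, y1 = d.box
        lines.append(
            f"[{i}] label={d.label} box=({x0}, {y0}, {x1}, {y1}) score={d.score:.2f}"
        )
    lines.append("Answer with the index of the box that best matches.")
    return "\n".join(lines)


def parse_choice(reply: str, detections: List[Detection]) -> int:
    """Read the first integer in the reply; fall back to the top-scoring box."""
    fallback = max(range(len(detections)), key=lambda i: detections[i].score)
    match = re.search(r"\d+", reply)
    if match is None:
        return fallback
    index = int(match.group())
    return index if index < len(detections) else fallback


def select_box(
    expression: str,
    detections: List[Detection],
    llm: Callable[[str], str],
) -> Detection:
    """Ask the LLM (any text-in, text-out callable) to choose a candidate."""
    if not detections:
        raise ValueError("the VLM returned no candidate boxes")
    reply = llm(build_prompt(expression, detections))
    return detections[parse_choice(reply, detections)]


# Usage with a stub in place of a real LLM call (assumption: it replies "1").
detections = [
    Detection("person", (10, 20, 110, 220), 0.81),
    Detection("person", (300, 25, 390, 230), 0.78),
]
stub_llm = lambda prompt: "1"
chosen = select_box("the person on the right", detections, stub_llm)
print(chosen.box)  # (300, 25, 390, 230)
```

Because the VLM is only queried for its outputs and never modified, a wrapper of this shape needs no access to the model's architecture or weights, which is the 'black-box' property the abstract emphasizes.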

* EVAL-FoMo workshop, ECCV 2024
