Title: Visual Instruction Tuning with Polite Flamingo

Jul 03, 2023

Delong Chen, Jianfeng Liu, Wenliang Dai, Baoyuan Wang

Abstract: Recent research has demonstrated that multi-task fine-tuning of multi-modal Large Language Models (LLMs) on an assortment of annotated downstream vision-language datasets significantly enhances their performance. During this process, however, a side effect surfaces, which we term the "multi-modal alignment tax". Because raw annotations are overly succinct and unformatted, this side effect degrades the model's ability to format responses appropriately (for instance, its "politeness"), resulting in reduced human preference. In this paper, we introduce Polite Flamingo, a multi-modal response rewriter that transforms raw annotations into a more appealing, "polite" format. Polite Flamingo is trained to reconstruct high-quality responses from their automatically distorted counterparts and is subsequently applied to a vast array of vision-language datasets for response rewriting. After rigorous filtering, we generate the PF-1M dataset and further validate its value by fine-tuning a multi-modal LLM on it. Combined with novel methodologies including U-shaped multi-stage tuning and multi-turn augmentation, the resulting model, Clever Flamingo, demonstrates advantages in both multi-modal understanding and response politeness according to automated and human evaluations.
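
The reconstruction objective described in the abstract can be illustrated with a minimal sketch of how one (distorted input, high-quality target) training pair might be built. The abstract does not specify the distortion operations, so the `distort` and `make_pair` helpers below, and the rules they apply (keep the first sentence, drop final punctuation, lowercase), are hypothetical stand-ins rather than the authors' method.

```python
import re


def distort(response: str) -> str:
    """Hypothetical distortion: make a polished response terse and unformatted."""
    # Keep only the first sentence, mimicking overly succinct raw annotations.
    first = re.split(r"(?<=[.!?])\s+", response.strip())[0]
    # Drop final punctuation and lowercase, mimicking unformatted annotations.
    return first.rstrip(".!?").lower()


def make_pair(instruction: str, response: str) -> dict:
    """Build one rewriter training example: distorted input, original target."""
    return {
        "instruction": instruction,
        "distorted": distort(response),
        "target": response,
    }


pair = make_pair(
    "What is the man holding?",
    "The man is holding a red umbrella. It appears to be raining in the scene.",
)
print(pair["distorted"])  # the terse text a rewriter would learn to expand
```

A rewriter trained on such pairs learns to map the terse text back to the full response; applying it to raw dataset annotations then yields longer, better-formatted answers, which would still need the filtering step the abstract mentions.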
