Folco Bertini Baldassini

What Makes Multimodal In-Context Learning Work?

Apr 25, 2024

Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

[Figures 1–4 not reproduced]

Abstract: Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to acquire new skills quickly, for example through In-Context Learning (ICL) from a handful of demonstration examples. In this work, we present a comprehensive framework for investigating Multimodal ICL (M-ICL) in the context of Large Multimodal Models. We consider the best open-source multimodal models (e.g., IDEFICS, OpenFlamingo) and a wide range of multimodal tasks. Our study unveils several noteworthy findings: (1) M-ICL relies primarily on text-driven mechanisms, with little to no influence from the image modality. (2) When used with an advanced ICL strategy (such as RICES), M-ICL performs no better than a simple strategy based on majority voting over the context examples. Moreover, we identify several biases and limitations of M-ICL that warrant consideration prior to deployment. Code available at https://gitlab.com/folbaeni/multimodal-icl

* 20 pages, 16 figures. Accepted to CVPR 2024 Workshop on Prompting in Vision. Project page: https://folbaeni.gitlab.io/multimodal-icl
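
The abstract contrasts RICES (Retrieval-based In-Context Example Selection, which picks the demonstrations most similar to the query) with a majority-vote baseline over the context examples. Below is a minimal sketch of both, assuming precomputed embeddings compared by cosine similarity. The function names and the toy data are hypothetical, and this is not the authors' code (which is at the GitLab link above).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A context example is (embedding, label). Embeddings are assumed to be
# precomputed, e.g. by an image or text encoder; here they are toy vectors.
Example = Tuple[Sequence[float], str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: Sequence[float], pool: List[Example], k: int) -> List[Example]:
    """RICES-style selection: the k pool examples most similar to the query."""
    return sorted(pool, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]


def majority_vote(examples: List[Example]) -> str:
    """Baseline: predict the most frequent label among the context examples.
    Ties go to the label seen first."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


# Hypothetical toy pool and query.
pool = [
    ([1.0, 0.0], "cat"),
    ([0.9, 0.1], "cat"),
    ([0.0, 1.0], "dog"),
    ([0.8, 0.3], "dog"),
    ([0.1, 0.9], "dog"),
]
query = [1.0, 0.05]

context = retrieve_top_k(query, pool, k=3)
print([label for _, label in context])  # ['cat', 'cat', 'dog']
print(majority_vote(context))           # cat
```

With this toy data the three nearest examples are two "cat" and one "dog", so the vote predicts "cat" using only the labels of the retrieved demonstrations, never the query content itself. That is what makes it a useful sanity baseline for retrieval-based M-ICL.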

Cross-Attention Watermarking of Large Language Models

Jan 12, 2024

Folco Bertini Baldassini, Huy H. Nguyen, Ching-Chung Chang, Isao Echizen

[Figures 1–4 not reproduced]

Abstract: A new approach to linguistic watermarking of language models is presented in which information is imperceptibly inserted into the output text while preserving its readability and original meaning. A cross-attention mechanism is used to embed watermarks in the text during inference. Two methods using cross-attention are presented that minimize the effect of watermarking on the performance of a pretrained model. Exploring different training strategies for optimizing the watermarking, together with the challenges and implications of applying this approach in real-world scenarios, clarified the tradeoff between watermark robustness and text quality. Watermark selection substantially affects the generated output for high-entropy sentences. This proactive watermarking approach has potential applications in future model development.

* 5 pages, 3 figures. Accepted to ICASSP 2024
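
The abstract does not detail the architecture, so the sketch below shows only the generic operation it names: scaled dot-product cross-attention, in which text-token states (queries) attend over a separate watermark sequence (keys and values), with the result added back to the token states. The identity projections, the residual injection, and the name `inject_watermark` are assumptions for illustration, not the two methods presented in the paper.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention with identity projections:
    each query (a text token state) attends over keys/values taken from a
    second sequence (here, a watermark embedding)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))])
    return out


def inject_watermark(hidden: List[Vector], watermark: List[Vector]) -> List[Vector]:
    """Hypothetical residual injection: hidden + CrossAttn(hidden, watermark, watermark)."""
    attended = cross_attention(hidden, watermark, watermark)
    return [[h + a for h, a in zip(hs, at)] for hs, at in zip(hidden, attended)]


hidden = [[1.0, 0.0], [0.0, 1.0]]   # two toy token states
watermark = [[0.5, 0.5]]            # one toy watermark embedding
print(inject_watermark(hidden, watermark))  # [[1.5, 0.5], [0.5, 1.5]]
```

With a single watermark vector every attention weight is 1.0, so each token state is simply shifted by that vector. With several watermark vectors, the softmax weights let each token draw a different mixture of them.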
