Title: Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions

Aug 05, 2024

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao

Abstract: This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context. A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. A wide range of MLLMs are evaluated as GUI agents using our simulated dataset, following three working patterns with different levels of perception. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions. While recent studies predominantly focus on the helpfulness (i.e., action accuracy) of multimodal agents, our findings indicate that these agents are prone to environmental distractions, resulting in unfaithful behaviors. Furthermore, we switch to the adversarial perspective and implement environment injection, demonstrating that such unfaithfulness can be exploited, leading to unexpected risks.
