Boyuan Zheng

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Oct 07, 2024

Interpretable Robotic Manipulation from Language

May 27, 2024

Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning

May 20, 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

Feb 15, 2024

Dual-View Visual Contextualization for Web Navigation

Feb 06, 2024

The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts

Jan 23, 2024

GPT-4V is a Generalist Web Agent, if Grounded

Jan 03, 2024

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Nov 27, 2023

Mind2Web: Towards a Generalist Agent for the Web

Jun 15, 2023

Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency

May 18, 2023