Shuyan Zhou

Beyond Browsing: API-Based Web Agents

Oct 21, 2024

WebCanvas: Benchmarking Web Agents in Online Environments

Jun 18, 2024

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Apr 11, 2024

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Jan 24, 2024

WebArena: A Realistic Web Environment for Building Autonomous Agents

Jul 25, 2023

Hierarchical Prompting Assists Large Language Model on Web Navigation

May 23, 2023

Bridging the Gap: A Survey on Integrating Feedback for Natural Language Generation

May 01, 2023

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Feb 10, 2023

Causal Reasoning of Entities and Events in Procedural Texts

Jan 29, 2023

Execution-Based Evaluation for Open-Domain Code Generation

Dec 20, 2022