Shuyan Zhou

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

Feb 23, 2026

Modeling Distinct Human Interaction in Web Agents

Feb 19, 2026

Learning Personalized Agents from Human Feedback

Feb 18, 2026

Are Open-Weight LLMs Ready for Social Media Moderation? A Comparative Study on Bluesky

Feb 05, 2026

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Oct 29, 2025

EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents

May 16, 2025

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Jan 28, 2025

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Dec 18, 2024

Beyond Browsing: API-Based Web Agents

Oct 21, 2024