Haiyang Shen

M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games

Jan 13, 2026
Via arXiv

BabyVision: Visual Reasoning Beyond Language

Jan 10, 2026
Via arXiv

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

Jan 07, 2026
Via arXiv

RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization

May 16, 2025
Via arXiv

MASS: Multi-Agent Simulation Scaling for Portfolio Construction

May 15, 2025
Via arXiv

PixelWeb: The First Web GUI Dataset with Pixel-Wise Labels

Apr 23, 2025
Via arXiv

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Jun 28, 2024
Via arXiv