Radha Poovendran

Agents' Last Exam

Jun 03, 2026
Via arXiv

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Jun 03, 2026
Via arXiv

The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection

May 26, 2026
Via arXiv

JobBench: Aligning Agent Work With Human Will

May 25, 2026
Via arXiv

Polyhedral Instability Governs Regret in Online Learning

May 13, 2026
Via arXiv

The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks

May 13, 2026
Via arXiv

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

May 12, 2026
Via arXiv

VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL

May 29, 2025
Via arXiv

SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge

May 27, 2025
Via arXiv

Temporal Sampling for Forgotten Reasoning in LLMs

May 26, 2025
Via arXiv