Karolina Korgul

Agent Benchmarks Fail Public Sector Requirements

Jan 28, 2026
Via arXiv

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Dec 29, 2025
Via arXiv

LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation

Mar 04, 2025
Via arXiv

Exploring the Landscape of Large Language Models In Medical Question Answering: Observations and Open Questions

Oct 11, 2023
Via arXiv