Picture for Johannes Mols

Johannes Mols

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

Add code
Feb 13, 2025
Viaarxiv icon

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs

Add code
Jan 29, 2025
Viaarxiv icon