Olli Järviniemi

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Nov 07, 2024
Via arXiv

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

Apr 25, 2024
Via arXiv