Olli Järviniemi

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Nov 07, 2024
Via arXiv

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

Apr 25, 2024
Via arXiv