Wentian Wang

Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models

Jun 15, 2024
Via arXiv