Seungone Kim

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Mar 19, 2026
Via arXiv

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Aug 18, 2025
Via arXiv

Let's Predict Sentence by Sentence

May 28, 2025
Via arXiv

Measuring Sycophancy of Language Models in Multi-turn Dialogues

May 28, 2025
Via arXiv

FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS

May 22, 2025
Via arXiv

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

May 21, 2025
Via arXiv

Reasoning Models Better Express Their Confidence

May 20, 2025
Via arXiv

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

May 15, 2025
Via arXiv

M-Prometheus: A Suite of Open Multilingual LLM Judges

Apr 07, 2025
Via arXiv

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Mar 25, 2025
Via arXiv