Picture for Dayiheng Liu

Dayiheng Liu

additional authors not shown

START: Self-taught Reasoner with Tools

Add code
Mar 07, 2025
Viaarxiv icon

DataMan: Data Manager for Pre-training Large Language Models

Add code
Feb 26, 2025
Viaarxiv icon

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Add code
Feb 20, 2025
Viaarxiv icon

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Add code
Feb 17, 2025
Viaarxiv icon

Qwen2.5-1M Technical Report

Add code
Jan 26, 2025
Viaarxiv icon

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Add code
Jan 24, 2025
Viaarxiv icon

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Add code
Jan 21, 2025
Viaarxiv icon

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Add code
Jan 13, 2025
Figure 1 for The Lessons of Developing Process Reward Models in Mathematical Reasoning
Figure 2 for The Lessons of Developing Process Reward Models in Mathematical Reasoning
Figure 3 for The Lessons of Developing Process Reward Models in Mathematical Reasoning
Figure 4 for The Lessons of Developing Process Reward Models in Mathematical Reasoning
Viaarxiv icon

Enabling Scalable Oversight via Self-Evolving Critic

Add code
Jan 10, 2025
Viaarxiv icon

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Add code
Jan 03, 2025
Viaarxiv icon