Picture for Bowen Yu

Bowen Yu

additional authors not shown

Qwen2.5-1M Technical Report

Add code
Jan 26, 2025
Viaarxiv icon

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Add code
Jan 24, 2025
Viaarxiv icon

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Add code
Jan 13, 2025
Viaarxiv icon

Enabling Scalable Oversight via Self-Evolving Critic

Add code
Jan 10, 2025
Viaarxiv icon

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Add code
Jan 03, 2025
Viaarxiv icon

Qwen2.5 Technical Report

Add code
Dec 19, 2024
Viaarxiv icon

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Add code
Dec 10, 2024
Viaarxiv icon

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Add code
Nov 18, 2024
Figure 1 for Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Figure 2 for Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Figure 3 for Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Figure 4 for Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Viaarxiv icon

P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs

Add code
Nov 14, 2024
Figure 1 for P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
Figure 2 for P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
Figure 3 for P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
Figure 4 for P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
Viaarxiv icon

Language Models can Self-Lengthen to Generate Long Texts

Add code
Oct 31, 2024
Figure 1 for Language Models can Self-Lengthen to Generate Long Texts
Figure 2 for Language Models can Self-Lengthen to Generate Long Texts
Figure 3 for Language Models can Self-Lengthen to Generate Long Texts
Figure 4 for Language Models can Self-Lengthen to Generate Long Texts
Viaarxiv icon