Runji Lin

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Jan 13, 2025
Via arXiv

Qwen2.5 Technical Report

Dec 19, 2024
Via arXiv

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Dec 10, 2024
Via arXiv

Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Sep 18, 2024
Via arXiv

Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence

Sep 11, 2024
Via arXiv

Qwen2 Technical Report

Jul 16, 2024
Via arXiv

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Jun 30, 2024
Via arXiv

The Reason behind Good or Bad: Towards a Better Mathematical Verifier with Natural Language Feedback

Jun 20, 2024
Via arXiv

Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

May 28, 2024
Via arXiv

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Dec 19, 2023
Via arXiv