Runji Lin

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Dec 10, 2024
Via arXiv

Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Sep 18, 2024
Via arXiv

Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence

Sep 11, 2024
Via arXiv

Qwen2 Technical Report

Jul 16, 2024
Via arXiv

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Jun 30, 2024
Via arXiv

The Reason behind Good or Bad: Towards a Better Mathematical Verifier with Natural Language Feedback

Jun 20, 2024
Via arXiv

Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

May 28, 2024
Via arXiv

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Dec 19, 2023
Via arXiv

Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

Nov 15, 2023
Via arXiv

Qwen Technical Report

Sep 28, 2023
Via arXiv