Bingxiang He

May

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

Feb 03, 2026

Current Agents Fail to Leverage World Model as Tool for Foresight

Jan 08, 2026

CubeBench: Diagnosing Interactive, Long-Horizon Spatial Reasoning Under Partial Observations

Dec 30, 2025

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Dec 18, 2025

Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

Oct 02, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

Sep 10, 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

Apr 04, 2025

Process Reinforcement through Implicit Rewards

Feb 03, 2025

EscapeBench: Pushing Language Models to Think Outside the Box

Dec 18, 2024