Shaoning Sun

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Jan 14, 2026
Via arXiv

Reward Modeling from Natural Language Human Feedback

Jan 12, 2026
Via arXiv

Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models

Jan 11, 2026
Via arXiv

S2J: Bridging the Gap Between Solving and Judging Ability in Generative Reward Models

Sep 26, 2025
Via arXiv

Improve LLM-as-a-Judge Ability as a General Ability

Feb 17, 2025
Via arXiv