Tong Che

Learning with a Single Rollout via Monte Carlo Pass@k Critic

Jun 24, 2026

Greed Is Learned: Visible Incentives as Reward-Hacking Triggers

Jun 15, 2026

Constitutional Value Potentials: reading and steering internal priority margins in language models

Jun 13, 2026

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

Jun 04, 2026

Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning

May 26, 2026

Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety

Jan 12, 2026

Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS

Aug 19, 2025

Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning

Apr 14, 2025

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation

Feb 04, 2025

Learning Multiple Initial Solutions to Optimization Problems

Nov 04, 2024