Zhang-Wei Hong

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

Apr 27, 2026

Decocted Experience Improves Test-Time Inference in LLM Agents

Apr 06, 2026

Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization

Feb 11, 2026

BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

Dec 29, 2025

Tailored Primitive Initialization is the Secret Key to Reinforcement Learning

Nov 16, 2025

ReGen: Generative Robot Simulation via Inverse Design

Nov 06, 2025

Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS

Aug 19, 2025

Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

May 29, 2025

RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning

May 21, 2025

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Feb 04, 2025