Lunjun Zhang

Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs

Feb 16, 2026
Via arXiv

EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL

Feb 04, 2026
Via arXiv

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Jun 09, 2025
Via arXiv

Learning to Drive via Asymmetric Self-Play

Sep 26, 2024
Via arXiv

Generative Verifiers: Reward Modeling as Next-Token Prediction

Aug 27, 2024
Via arXiv

Towards Unsupervised Object Detection From LiDAR Point Clouds

Nov 03, 2023
Via arXiv

Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

Nov 02, 2023
Via arXiv

Learning Realistic Traffic Agents in Closed-loop

Nov 02, 2023
Via arXiv

Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization

Sep 26, 2022
Via arXiv

World Model as a Graph: Learning Latent Landmarks for Planning

Nov 25, 2020
Via arXiv