
Hao Peng

Beihang University

Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm

Feb 09, 2026
via arXiv

WildReward: Learning Reward Models from In-the-Wild Human Interactions

Feb 09, 2026
via arXiv

Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO

Feb 09, 2026
via arXiv

Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs

Feb 07, 2026
via arXiv

ReBeCA: Unveiling Interpretable Behavior Hierarchy behind the Iterative Self-Reflection of Language Models with Causal Analysis

Feb 06, 2026
via arXiv

Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions

Feb 05, 2026
via arXiv

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Feb 01, 2026
via arXiv

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Jan 29, 2026
via arXiv

On the Paradoxical Interference between Instruction-Following and Task Solving

Jan 29, 2026
via arXiv

MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues

Jan 28, 2026
via arXiv