Hao Peng

Beihang University

Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions

Feb 05, 2026

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Feb 01, 2026

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Jan 29, 2026

On the Paradoxical Interference between Instruction-Following and Task Solving

Jan 29, 2026

MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues

Jan 28, 2026

Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation

Jan 21, 2026

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Jan 11, 2026

Generalization of RLVR Using Causal Reasoning as a Testbed

Dec 23, 2025

MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning

Dec 22, 2025

Adaptation of Agentic AI

Dec 22, 2025