Qi Zhang

School of Information, North China University of Technology

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training

Feb 05, 2026
Via arXiv

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

Feb 04, 2026
Via arXiv

StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation

Feb 04, 2026
Via arXiv

Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language Models

Feb 04, 2026
Via arXiv

A computational account of dreaming: learning and memory consolidation

Feb 04, 2026
Via arXiv

Steering LLMs via Scalable Interactive Oversight

Feb 04, 2026
Via arXiv

CL-bench: A Benchmark for Context Learning

Feb 03, 2026
Via arXiv

Adaptive Visual Autoregressive Acceleration via Dual-Linkage Entropy Analysis

Feb 01, 2026
Via arXiv

ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing

Jan 29, 2026
Via arXiv

Towards Fair Large Language Model-based Recommender Systems without Costly Retraining

Jan 24, 2026
Via arXiv