Pengfei Wan

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Feb 09, 2026
Via arXiv

IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation

Feb 07, 2026
Via arXiv

CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation

Feb 06, 2026
Via arXiv

Stable Velocity: A Variance Perspective on Flow Matching

Feb 05, 2026
Via arXiv

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Feb 04, 2026
Via arXiv

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Feb 03, 2026
Via arXiv

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Feb 03, 2026
Via arXiv

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026
Via arXiv

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026
Via arXiv

SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

Jan 23, 2026
Via arXiv