Picture for Pengfei Wan

Pengfei Wan

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Add code
Sep 11, 2025
Viaarxiv icon

Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?

Add code
Sep 03, 2025
Viaarxiv icon

MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation

Add code
Aug 28, 2025
Viaarxiv icon

Score Augmentation for Diffusion Models

Add code
Aug 11, 2025
Viaarxiv icon

DiffCap: Diffusion-based Real-time Human Motion Capture using Sparse IMUs and a Monocular Camera

Add code
Aug 08, 2025
Viaarxiv icon

Imbalance in Balance: Online Concept Balancing in Generation Models

Add code
Jul 17, 2025
Viaarxiv icon

GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Add code
Jun 26, 2025
Viaarxiv icon

SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution

Add code
Jun 24, 2025
Viaarxiv icon

FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation

Add code
Jun 23, 2025
Viaarxiv icon

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Add code
Jun 10, 2025
Viaarxiv icon