Di Zhang

Owl-1: Omni World Model for Consistent Long Video Generation

Dec 12, 2024
Via arXiv

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Dec 10, 2024
Via arXiv

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Dec 10, 2024
Via arXiv

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

Dec 10, 2024
Via arXiv

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Dec 10, 2024
Via arXiv

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Dec 02, 2024
Via arXiv

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

Nov 25, 2024
Via arXiv

Towards Precise Scaling Laws for Video Diffusion Transformers

Nov 25, 2024
Via arXiv

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing

Nov 22, 2024
Via arXiv

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

Nov 22, 2024
Via arXiv