Xin Tao

Owl-1: Omni World Model for Consistent Long Video Generation

Dec 12, 2024

Towards Precise Scaling Laws for Video Diffusion Transformers

Nov 25, 2024

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

Oct 10, 2024

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

Aug 21, 2024

VideoTetris: Towards Compositional Text-to-Video Generation

Jun 06, 2024

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

May 24, 2024

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

Apr 15, 2024

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Apr 10, 2024

Motion Inversion for Video Customization

Mar 29, 2024

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

Dec 20, 2023