Zunnan Xu

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Apr 03, 2025

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Mar 25, 2025

HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

Mar 25, 2025

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Jan 15, 2025

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Dec 03, 2024

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Nov 27, 2024

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation

Aug 29, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jun 17, 2024

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

May 28, 2024

Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference

May 23, 2024