Zhenyu Tang

Open-Sora Plan: Open-Source Large Video Generation Model

Nov 28, 2024
Via arXiv

BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI

Oct 14, 2024
Via arXiv

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Jul 28, 2024
Via arXiv

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Jun 06, 2024
Via arXiv

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

Apr 11, 2024
Via arXiv

Envision3D: One Image to 3D with Anchor Views Interpolation

Mar 13, 2024
Via arXiv

LLMBind: A Unified Modality-Task Integration Framework

Mar 08, 2024
Via arXiv

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Feb 04, 2024
Via arXiv

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

Dec 27, 2023
Via arXiv

RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Jul 02, 2023
Via arXiv