Mingzhen Sun

MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation

Oct 02, 2024
Via arXiv

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation

Oct 02, 2024
Via arXiv

VL-Mamba: Exploring State Space Models for Multimodal Learning

Mar 20, 2024
Via arXiv

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

Sep 23, 2023
Via arXiv

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

May 29, 2023
Via arXiv

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Mar 16, 2023
Via arXiv

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

Jul 06, 2021
Via arXiv