Xinxin Zhu

MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation

Oct 02, 2024

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation

Oct 02, 2024

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Aug 28, 2024

Deep Optimal Timing Strategies for Time Series

Oct 09, 2023

Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction

Jun 16, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

May 29, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

May 25, 2023

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Apr 17, 2023

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

Mar 29, 2023

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Mar 16, 2023