Picture for Zehan Wang

Zehan Wang

Rate-Distortion Optimized Skip Coding of Region Adaptive Hierarchical Transform Coefficients for MPEG G-PCC

Add code
Dec 07, 2024
Viaarxiv icon

WavChat: A Survey of Spoken Dialogue Models

Add code
Nov 26, 2024
Viaarxiv icon

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Add code
Oct 28, 2024
Figure 1 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Figure 2 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Figure 3 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Figure 4 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Viaarxiv icon

Improving Long-Text Alignment for Text-to-Image Diffusion Models

Add code
Oct 15, 2024
Viaarxiv icon

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

Add code
Oct 09, 2024
Figure 1 for MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Figure 2 for MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Figure 3 for MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Figure 4 for MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Viaarxiv icon

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Add code
Jul 16, 2024
Viaarxiv icon

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Add code
Jun 01, 2024
Viaarxiv icon

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

Add code
May 10, 2024
Viaarxiv icon

Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion

Add code
May 08, 2024
Viaarxiv icon

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

Add code
Dec 23, 2023
Viaarxiv icon