Jialong Zuo

Speech Watermarking with Discrete Intermediate Representations

Dec 18, 2024
Via arXiv

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

Dec 09, 2024
Via arXiv

WavChat: A Survey of Spoken Dialogue Models

Nov 26, 2024
Via arXiv

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Oct 28, 2024
Via arXiv

Cross-video Identity Correlating for Person Re-identification Pre-training

Sep 27, 2024
Via arXiv

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024
Via arXiv

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

Jul 19, 2024
Via arXiv

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Jun 25, 2024
Via arXiv

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

Jun 18, 2024
Via arXiv

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024
Via arXiv