Siqi Zheng

Massachusetts Institute of Technology

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Oct 28, 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Oct 23, 2024

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

Oct 16, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Aug 22, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Jul 09, 2024

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody

Jul 02, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Jun 17, 2024

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Jun 17, 2024

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Jun 04, 2024