Xiaofei Wang

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

Dec 17, 2024
Via arXiv

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024
Via arXiv

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024
Via arXiv

Exploring Robust Face-Voice Matching in Multilingual Environments

Jul 29, 2024
Via arXiv

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

Jul 17, 2024
Via arXiv

Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

Jul 12, 2024
Via arXiv

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Jun 26, 2024
Via arXiv

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

Jun 22, 2024
Via arXiv

Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning

Jun 20, 2024
Via arXiv

Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

Jun 11, 2024
Via arXiv