Xiaoxue Gao

Transferable Adversarial Attacks against ASR

Nov 14, 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants

Oct 22, 2024

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

Sep 27, 2024

Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models

Sep 27, 2024

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization

Sep 16, 2024

MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

Aug 26, 2024

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Jul 02, 2024

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

Apr 01, 2024

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Feb 28, 2024

Self-Transcriber: Few-shot Lyrics Transcription with Self-training

Nov 18, 2022