Tianrui Wang

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Jan 06, 2026
Via arXiv

POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation

Nov 12, 2025
Via arXiv

ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning

Jul 03, 2025
Via arXiv

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025
Via arXiv

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Apr 22, 2025
Via arXiv

Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models

Jan 24, 2025
Via arXiv

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module

Jan 05, 2025
Via arXiv

Time-Graph Frequency Representation with Singular Value Decomposition for Neural Speech Enhancement

Dec 24, 2024
Via arXiv

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding

Dec 24, 2024
Via arXiv

Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement

Dec 21, 2024
Via arXiv