Erik Visser

Aligning Audio Captions with Human Preferences

Sep 18, 2025
Via arXiv

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation

Sep 18, 2025
Via arXiv

Spatial Audio Motion Understanding and Reasoning

Sep 18, 2025
Via arXiv

Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework

May 21, 2025
Via arXiv

Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding

Dec 05, 2024
Via arXiv

Confidence Calibration for Audio Captioning Models

Sep 13, 2024
Via arXiv

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Sep 10, 2024
Via arXiv

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Sep 10, 2024
Via arXiv

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Sep 12, 2023
Via arXiv

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

Sep 06, 2023
Via arXiv