Erik Visser

Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding

Dec 05, 2024

Confidence Calibration for Audio Captioning Models

Sep 13, 2024

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion

Sep 10, 2024

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Sep 10, 2024

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Sep 12, 2023

Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature

Sep 06, 2023

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

Sep 06, 2023

Improved Beam Search for Hallucination Mitigation in Abstractive Summarization

Dec 06, 2022

Application of Knowledge Distillation to Multi-task Speech Representation Learning

Oct 29, 2022

Activity report analysis with automatic single or multispan answer extraction

Sep 09, 2022