Oriol Nieto

SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation

Dec 13, 2024
Via arXiv

Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

Dec 11, 2024
Via arXiv

Video-Guided Foley Sound Generation with Multimodal Controls

Nov 26, 2024
Via arXiv

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Oct 24, 2024
Via arXiv

Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning

Sep 17, 2024
Via arXiv

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

Sep 13, 2024
Via arXiv

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Jun 17, 2024
Via arXiv

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

May 24, 2024
Via arXiv

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Oct 12, 2023
Via arXiv

Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

Aug 17, 2023
Via arXiv