Hugo Malard

TACO: Training-free Sound Prompted Segmentation via Deep Audio-visual CO-factorization

Dec 02, 2024
Via arXiv

An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment

Oct 08, 2024
Via arXiv

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Sep 22, 2023
Via arXiv