Arda Senocak

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Dec 09, 2024

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Oct 23, 2024

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Jul 18, 2024

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

Jul 11, 2024

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Jun 05, 2024

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

Jan 16, 2024

Can CLIP Help Sound Source Localization?

Nov 07, 2023

Sound Source Localization is All about Cross-Modal Alignment

Sep 19, 2023

FlexiAST: Flexibility is What AST Needs

Jul 18, 2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

Mar 30, 2023