Joon Son Chung

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Oct 23, 2024

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding

Oct 17, 2024

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

Oct 17, 2024

Text-To-Speech Synthesis In The Wild

Sep 13, 2024

The VoxCeleb Speaker Recognition Challenge: A Retrospective

Aug 27, 2024

Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting

Aug 07, 2024

VoxSim: A perceptual voice similarity dataset

Jul 26, 2024

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Jul 18, 2024

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

Jul 11, 2024

Lightweight Audio Segmentation for Long-form Speech Translation

Jun 15, 2024