
Tae-Hyun Oh

POSTECH

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Dec 09, 2024
Via arXiv

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Dec 02, 2024
Via arXiv

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Oct 23, 2024
Via arXiv

MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation

Aug 21, 2024
Via arXiv

MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models

Jul 24, 2024
Via arXiv

BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models

Jul 18, 2024
Via arXiv

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Jul 18, 2024
Via arXiv

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

Jul 01, 2024
Via arXiv

MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

Jun 20, 2024
Via arXiv

Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild

Mar 21, 2024
Via arXiv