Tae-Hyun Oh

POSTECH

Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior

Feb 10, 2025
Via arXiv

The Devil is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning

Jan 16, 2025

SoundBrush: Sound as a Brush for Visual Scene Editing

Dec 31, 2024

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

Dec 09, 2024

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Dec 02, 2024

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

Oct 23, 2024

MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation

Aug 21, 2024

MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models

Jul 24, 2024

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Jul 18, 2024

BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models

Jul 18, 2024