Zehan Wang

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

Oct 21, 2025

GenSpace: Benchmarking Spatially-Aware Image Generation

May 30, 2025

Depth Anything with Any Prior

May 15, 2025

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

May 15, 2025

Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision

Apr 30, 2025

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Apr 30, 2025

Unleashing the Power of Natural Audio Featuring Multiple Sound Sources

Apr 24, 2025

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration

Feb 20, 2025

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

Jan 02, 2025

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Dec 24, 2024