Yan Huang

FloorPlan-VLN: A New Paradigm for Floor Plan Guided Vision-Language Navigation

Mar 18, 2026

Towards Visual Query Segmentation in the Wild

Mar 09, 2026

Towards Long-Form Spatio-Temporal Video Grounding

Feb 26, 2026

Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization

Feb 10, 2026

PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG

Feb 05, 2026

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026

VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation

Dec 18, 2025

DP-CSGP: Differentially Private Stochastic Gradient Push with Compressed Communication

Dec 15, 2025

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

Dec 12, 2025