Xiaohan Yu

BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Feb 13, 2026

M$^3$Searcher: Modular Multimodal Information Seeking Agency with Retrieval-Oriented Reasoning

Jan 14, 2026

SCE-SLAM: Scale-Consistent Monocular SLAM via Scene Coordinate Embeddings

Jan 14, 2026

SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction

Nov 18, 2025

TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning

Jun 12, 2025

Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration

Jun 12, 2025

X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation

May 16, 2025

LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation

Apr 20, 2025

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning

Mar 29, 2025

Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning

Jan 26, 2025