XuDong Wang

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Jan 23, 2026

Human detectors are surprisingly powerful reward models

Jan 21, 2026

Visually Prompted Benchmarks Are Surprisingly Fragile

Dec 19, 2025

UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

Nov 17, 2025

Reconstruction Alignment Improves Unified Multimodal Models

Sep 08, 2025

TULIP: Towards Unified Language-Image Pretraining

Mar 19, 2025

Visual Lexicon: Rich Image Features in Language Space

Dec 09, 2024

SegLLM: Multi-round Reasoning Segmentation

Oct 24, 2024

Segment Anything without Supervision

Jun 28, 2024