Joongwon Chae

SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection

Dec 03, 2024
Via arXiv

Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents

Dec 03, 2024
Via arXiv