Jieyu Zhang

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Jun 03, 2026
Via arXiv

Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation

May 20, 2026
Via arXiv

You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

Apr 13, 2026
Via arXiv

WildDet3D: Scaling Promptable 3D Detection in the Wild

Apr 09, 2026
Via arXiv

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Mar 30, 2026
Via arXiv

URDF-Anything+: Autoregressive Articulated 3D Models Generation for Physical Simulation

Mar 14, 2026
Via arXiv

Video-Based Reward Modeling for Computer-Use Agents

Mar 10, 2026
Via arXiv

TrajTok: Learning Trajectory Tokens enables better Video Understanding

Feb 26, 2026
Via arXiv

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

Feb 04, 2026
Via arXiv

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Jan 15, 2026
Via arXiv