Jaemin Cho

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Jun 03, 2026
Via arXiv

SeeTraceAct: Visibility-Aware Latent Planning from Cross-Embodiment Demonstration Videos

Jun 01, 2026
Via arXiv

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

May 29, 2026
Via arXiv

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

May 14, 2026
Via arXiv

MolmoAct2: Action Reasoning Models for Real-world Deployment

May 04, 2026
Via arXiv

WildDet3D: Scaling Promptable 3D Detection in the Wild

Apr 09, 2026
Via arXiv

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Mar 25, 2026
Via arXiv

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Mar 17, 2026
Via arXiv

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Mar 04, 2026
Via arXiv

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

Feb 16, 2026
Via arXiv