Jin Ma

Pause or Fabricate? Training Language Models for Grounded Reasoning

Apr 21, 2026

StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression

Apr 16, 2026

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

Apr 15, 2026

Robust 4D Visual Geometry Transformer with Uncertainty-Aware Priors

Apr 10, 2026

HD-VGGT: High-Resolution Visual Geometry Transformer

Mar 28, 2026

Towards Automated Community Notes Generation with Large Vision Language Models for Combating Contextual Deception

Mar 23, 2026

A Large-Scale Remote Sensing Dataset and VLM-based Algorithm for Fine-Grained Road Hierarchy Classification

Mar 22, 2026

Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models

Mar 16, 2026

Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization

Feb 10, 2026

VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

Jan 27, 2026