Jinqiao Wang

Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, objecteye.Inc

Listening with the Eyes: Benchmarking Egocentric Co-Speech Grounding across Space and Time

Mar 09, 2026
Via arXiv

TRACE: Task-Adaptive Reasoning and Representation Learning for Universal Multimodal Retrieval

Mar 04, 2026
Via arXiv

WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval

Feb 26, 2026
Via arXiv

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding

Feb 24, 2026
Via arXiv

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training

Feb 16, 2026
Via arXiv

Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration

Feb 11, 2026
Via arXiv

ReCALL: Recalibrating Capability Degradation for MLLM-based Composed Image Retrieval

Feb 02, 2026
Via arXiv

Towards Governance-Oriented Low-Altitude Intelligence: A Management-Centric Multi-Modal Benchmark With Implicitly Coordinated Vision-Language Reasoning Framework

Jan 27, 2026
Via arXiv

PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning

Jan 19, 2026
Via arXiv

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models

Jan 08, 2026
Via arXiv