Jusheng Zhang

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems

Jan 26, 2026

ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation

Jan 23, 2026

3D-Agent: Tri-Modal Multi-Agent Collaboration for Scalable 3D Object Annotation

Jan 07, 2026

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Dec 23, 2025

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Dec 09, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

Nov 17, 2025

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction

Nov 17, 2025

Understanding Hardness of Vision-Language Compositionality from A Token-level Causal Lens

Oct 30, 2025

Guardian: Decoupling Exploration from Safety in Reinforcement Learning

Oct 26, 2025