Xin Zou

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation

Apr 14, 2026
Via arXiv

Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval

Apr 11, 2026
Via arXiv

Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation

Apr 04, 2026
Via arXiv

Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models

Mar 18, 2026
Via arXiv

Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations

Mar 02, 2026
Via arXiv

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

Feb 23, 2026
Via arXiv

Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework

Feb 23, 2026
Via arXiv

REMAC: Reference-Based Martian Asymmetrical Image Compression

Jan 26, 2026
Via arXiv

Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering

Jan 08, 2026
Via arXiv

Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning

Nov 11, 2025
Via arXiv