Zhenwei Shao

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Dec 29, 2025

VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding

Dec 13, 2025

MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Sep 17, 2025

Growing a Twig to Accelerate Large Vision-Language Models

Mar 18, 2025

Imp: Highly Capable Large Multimodal Models for Mobile Devices

May 20, 2024

Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

Mar 16, 2023