Zhihang Liu

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Mar 20, 2025
Via arXiv

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability

Mar 18, 2025
Via arXiv

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Mar 05, 2025
Via arXiv

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025
Via arXiv

Hallucination Mitigation Prompts Long-term Video Understanding

Jun 17, 2024
Via arXiv

Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

Dec 19, 2023
Via arXiv