Hang Hua

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 06, 2025
Via arXiv

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

May 26, 2025
Via arXiv

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

May 26, 2025
Via arXiv

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Apr 14, 2025
Via arXiv

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025
Via arXiv

Generative AI for Cel-Animation: A Survey

Jan 08, 2025
Via arXiv

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Nov 23, 2024
Via arXiv

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Nov 19, 2024
Via arXiv

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024
Via arXiv

PromptFix: You Prompt and We Fix the Photo

May 27, 2024
Via arXiv