Hang Hua

Generative AI for Cel-Animation: A Survey

Jan 08, 2025
Via arXiv

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Nov 23, 2024
Via arXiv

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Nov 19, 2024
Via arXiv

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024
Via arXiv

PromptFix: You Prompt and We Fix the Photo

May 27, 2024
Via arXiv

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

Apr 23, 2024
Via arXiv

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Apr 23, 2024
Via arXiv

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Apr 18, 2024
Via arXiv

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Feb 01, 2024
Via arXiv

VideoXum: Cross-modal Visual and Textural Summarization of Videos

Mar 21, 2023
Via arXiv