Yunlong Tang

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Mar 14, 2025

Generative AI for Cel-Animation: A Survey

Jan 08, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Dec 24, 2024

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Nov 19, 2024

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024

EAGLE: Egocentric AGgregated Language-video Engine

Sep 26, 2024

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

Aug 21, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

Jun 18, 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Apr 18, 2024