Yunlong Tang

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Nov 19, 2024

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024

EAGLE: Egocentric AGgregated Language-video Engine

Sep 26, 2024

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

Aug 21, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

Jun 18, 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Apr 18, 2024

DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

Mar 25, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

Mar 24, 2024

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Feb 01, 2024