Yunlong Tang

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024
Via arXiv

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024
Via arXiv

EAGLE: Egocentric AGgregated Language-video Engine

Sep 26, 2024
Via arXiv

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

Aug 21, 2024
Via arXiv

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

Jun 18, 2024
Via arXiv

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Apr 18, 2024
Via arXiv

DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

Mar 25, 2024
Via arXiv

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

Mar 24, 2024
Via arXiv

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Feb 01, 2024
Via arXiv

Video Understanding with Large Language Models: A Survey

Jan 04, 2024
Via arXiv