Chenliang Xu

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Mar 14, 2025
Via arXiv

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives

Feb 19, 2025
Via arXiv

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

Jan 31, 2025
Via arXiv

Generative AI for Cel-Animation: A Survey

Jan 08, 2025
Via arXiv

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Dec 24, 2024
Via arXiv

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Nov 19, 2024
Via arXiv

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024
Via arXiv

Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?

Oct 14, 2024
Via arXiv

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024
Via arXiv

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

Oct 09, 2024
Via arXiv