Luchuan Song

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025
Via arXiv

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Apr 04, 2025
Via arXiv

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Mar 14, 2025
Via arXiv

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

Jan 31, 2025
Via arXiv

Generative AI for Cel-Animation: A Survey

Jan 08, 2025
Via arXiv

Free-viewpoint Human Animation with Pose-correlated Reference Selection

Dec 23, 2024
Via arXiv

EAGLE: Egocentric AGgregated Language-video Engine

Sep 26, 2024
Via arXiv

Adaptive Super Resolution For One-Shot Talking-Head Generation

Mar 23, 2024
Via arXiv

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Feb 01, 2024
Via arXiv

Tri²-plane: Volumetric Avatar Reconstruction with Feature Pyramid

Jan 17, 2024
Via arXiv