Yuchi Wang

Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints

Sep 22, 2024
Via arXiv

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

Jun 12, 2024
Via arXiv

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

May 24, 2024
Via arXiv

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Apr 16, 2024
Via arXiv

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

Feb 24, 2024
Via arXiv

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Feb 21, 2024
Via arXiv

GAIA: Zero-shot Talking Avatar Generation

Nov 26, 2023
Via arXiv

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

Oct 16, 2023
Via arXiv