Guanglu Song

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Dec 15, 2024
Via arXiv

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024
Via arXiv

See Further When Clear: Curriculum Consistency Model

Dec 09, 2024
Via arXiv

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

Oct 02, 2024
Via arXiv

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Jun 17, 2024
Via arXiv

Phased Consistency Model

May 28, 2024
Via arXiv

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

May 01, 2024
Via arXiv

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024
Via arXiv

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Apr 08, 2024
Via arXiv

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024
Via arXiv