Guanglu Song

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Dec 12, 2024

See Further When Clear: Curriculum Consistency Model
Dec 09, 2024

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
Oct 02, 2024

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Jun 17, 2024

Phased Consistency Model
May 28, 2024

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
May 01, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Apr 19, 2024

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Apr 08, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Apr 04, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models
Mar 25, 2024