Picture for Guanglu Song

Guanglu Song

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

Add code
Oct 02, 2024
Viaarxiv icon

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Add code
Jun 17, 2024
Viaarxiv icon

Phased Consistency Model

Add code
May 28, 2024
Viaarxiv icon

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

Add code
May 01, 2024
Viaarxiv icon

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Add code
Apr 19, 2024
Viaarxiv icon

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Add code
Apr 08, 2024
Viaarxiv icon

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Add code
Apr 04, 2024
Viaarxiv icon

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Add code
Mar 25, 2024
Viaarxiv icon

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Add code
Mar 20, 2024
Figure 1 for Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Figure 2 for Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Figure 3 for Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Figure 4 for Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Viaarxiv icon

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Add code
Mar 19, 2024
Viaarxiv icon