Yibing Song

DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation

Apr 09, 2025
Via arXiv

Re-Aligning Language to Visual Objects with an Agentic Workflow

Mar 30, 2025
Via arXiv

AvatarArtist: Open-Domain 4D Avatarization

Mar 26, 2025
Via arXiv

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

Mar 19, 2025
Via arXiv

ATPrompt: Textual Prompt Learning with Embedded Attributes

Dec 12, 2024
Via arXiv

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Dec 05, 2024
Via arXiv

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Nov 25, 2024
Via arXiv

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Nov 15, 2024
Via arXiv

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Oct 31, 2024
Via arXiv

LFME: A Simple Framework for Learning from Multiple Experts in Domain Generalization

Oct 22, 2024
Via arXiv