Yibing Song

AvatarArtist: Open-Domain 4D Avatarization

Mar 26, 2025, via arXiv

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation

Mar 19, 2025, via arXiv

ATPrompt: Textual Prompt Learning with Embedded Attributes

Dec 12, 2024, via arXiv

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Dec 05, 2024, via arXiv

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Nov 25, 2024, via arXiv

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Nov 15, 2024, via arXiv

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Oct 31, 2024, via arXiv

LFME: A Simple Framework for Learning from Multiple Experts in Domain Generalization

Oct 22, 2024, via arXiv

Dynamic Diffusion Transformer

Oct 04, 2024, via arXiv

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

Mar 18, 2024, via arXiv