Zhili Liu

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024

Mixture of insighTful Experts: The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

May 01, 2024

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Mar 22, 2024

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Mar 12, 2024

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

Feb 08, 2024

PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

Jan 26, 2024

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Dec 19, 2023

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models

Dec 01, 2023

Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models

Oct 13, 2023

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

May 04, 2023