Zhiwu Lu

CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval

Feb 28, 2025
Via arXiv

Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval

Dec 15, 2024
Via arXiv

Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts

Nov 16, 2024
Via arXiv

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Aug 08, 2024
Via arXiv

CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning

Mar 07, 2024
Via arXiv

Improvable Gap Balancing for Multi-Task Learning

Jul 28, 2023
Via arXiv

VDT: An Empirical Study on Video Diffusion with Transformers

May 22, 2023
Via arXiv

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Feb 13, 2023
Via arXiv

TikTalk: A Multi-Modal Dialogue Dataset for Real-World Chitchat

Jan 14, 2023
Via arXiv

Text2Poster: Laying out Stylized Texts on Retrieved Images

Jan 06, 2023
Via arXiv