Peng Jin

Senior Member, IEEE

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Nov 25, 2024

Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection

Nov 23, 2024

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Nov 15, 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention

Oct 15, 2024

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Oct 09, 2024

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

Aug 20, 2024

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

Jul 15, 2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

Jun 26, 2024

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter

May 29, 2024

LLMBind: A Unified Modality-Task Integration Framework

Mar 08, 2024