Peng Jin

Senior Member, IEEE

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

Mar 18, 2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

Mar 10, 2025

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning

Dec 30, 2024

Next Patch Prediction for Autoregressive Visual Generation

Dec 19, 2024

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Nov 25, 2024

Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection

Nov 23, 2024

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Nov 15, 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention

Oct 15, 2024

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Oct 09, 2024

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

Aug 20, 2024