Pengpeng Zeng

Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation

Jan 27, 2026

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion

Jan 15, 2026

RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering

Jan 14, 2026

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction

May 26, 2025

Towards Generalized and Training-Free Text-Guided Semantic Manipulation

Apr 24, 2025

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Dec 16, 2024

GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Dec 13, 2024

SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors

Oct 10, 2024

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Sep 09, 2024

Text-Video Retrieval with Global-Local Semantic Consistent Learning

May 21, 2024