Pengpeng Zeng

Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning

Mar 04, 2026

TIMI: Training-Free Image-to-3D Multi-Instance Generation with Spatial Fidelity

Mar 02, 2026

Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation

Jan 27, 2026

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion

Jan 15, 2026

RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering

Jan 14, 2026

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction

May 26, 2025

Towards Generalized and Training-Free Text-Guided Semantic Manipulation

Apr 24, 2025

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Dec 16, 2024

GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Dec 13, 2024

SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors

Oct 10, 2024