Sipeng Zheng

Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models

Oct 04, 2024
Via arXiv

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Oct 03, 2024
Via arXiv

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Jun 24, 2024
Via arXiv

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

May 28, 2024
Via arXiv

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Mar 14, 2024
Via arXiv

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

Mar 09, 2024
Via arXiv

SPAFormer: Sequential 3D Part Assembly with Transformers

Mar 09, 2024
Via arXiv

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

Oct 20, 2023
Via arXiv

LLaMA Rider: Spurring Large Language Models to Explore the Open World

Oct 13, 2023
Via arXiv

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Jul 20, 2023
Via arXiv