Sipeng Zheng

VideoOrion: Tokenizing Object Dynamics in Videos

Nov 25, 2024
Via arXiv

Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models

Oct 04, 2024
Via arXiv

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Oct 03, 2024
Via arXiv

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Jun 24, 2024
Via arXiv

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

May 28, 2024
Via arXiv

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Mar 14, 2024
Via arXiv

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

Mar 09, 2024
Via arXiv

SPAFormer: Sequential 3D Part Assembly with Transformers

Mar 09, 2024
Via arXiv

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

Oct 20, 2023
Via arXiv

LLaMA Rider: Spurring Large Language Models to Explore the Open World

Oct 13, 2023
Via arXiv