Sipeng Zheng

EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining

Mar 19, 2025
Via arXiv

Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning

Mar 10, 2025
Via arXiv

VideoOrion: Tokenizing Object Dynamics in Videos

Nov 25, 2024
Via arXiv

Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models

Oct 04, 2024
Via arXiv

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Oct 03, 2024
Via arXiv

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Jun 24, 2024
Via arXiv

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

May 28, 2024
Via arXiv

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Mar 14, 2024
Via arXiv

SPAFormer: Sequential 3D Part Assembly with Transformers

Mar 09, 2024
Via arXiv

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

Mar 09, 2024
Via arXiv