Yizhuo Li

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Dec 05, 2024

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Dec 05, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Dec 03, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining

Oct 30, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Jul 13, 2023

VideoChat: Chat-Centric Video Understanding

May 10, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Mar 28, 2023

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

Dec 07, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

Nov 17, 2022