Yixiao Ge

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Dec 05, 2024

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Dec 05, 2024

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Dec 05, 2024

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024

Taming Scalable Visual Tokenizer for Autoregressive Image Generation

Dec 03, 2024

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Nov 30, 2024

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Nov 05, 2024

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Sep 06, 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model

Jul 11, 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Jun 18, 2024