Picture for Xihui Liu

Xihui Liu

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Add code
Dec 12, 2024
Viaarxiv icon

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Add code
Dec 05, 2024
Figure 1 for EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Figure 2 for EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Figure 3 for EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Figure 4 for EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Viaarxiv icon

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Add code
Dec 05, 2024
Figure 1 for GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Figure 2 for GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Figure 3 for GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Figure 4 for GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Viaarxiv icon

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Add code
Dec 05, 2024
Figure 1 for Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Figure 2 for Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Figure 3 for Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Figure 4 for Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Viaarxiv icon

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Add code
Dec 04, 2024
Viaarxiv icon

SAMPart3D: Segment Any Part in 3D Objects

Add code
Nov 11, 2024
Viaarxiv icon

WorldSimBench: Towards Video Generation Models as World Simulators

Add code
Oct 23, 2024
Figure 1 for WorldSimBench: Towards Video Generation Models as World Simulators
Figure 2 for WorldSimBench: Towards Video Generation Models as World Simulators
Figure 3 for WorldSimBench: Towards Video Generation Models as World Simulators
Figure 4 for WorldSimBench: Towards Video Generation Models as World Simulators
Viaarxiv icon

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Add code
Oct 17, 2024
Viaarxiv icon

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Add code
Oct 14, 2024
Viaarxiv icon

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Add code
Oct 03, 2024
Viaarxiv icon