Xihui Liu

Position: Interactive Generative Video as Next-Generation Game Engine

Mar 21, 2025
Via arXiv

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Mar 20, 2025
Via arXiv

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Mar 20, 2025
Via arXiv

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025
Via arXiv

GameFactory: Creating New Games with Generative Interactive Videos

Jan 14, 2025
Via arXiv

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Dec 27, 2024
Via arXiv

Parallelized Autoregressive Visual Generation

Dec 19, 2024
Via arXiv

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Dec 12, 2024
Via arXiv

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Dec 05, 2024
Via arXiv

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024
Via arXiv