Picture for Xiaojian Ma

Xiaojian Ma

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Add code
Oct 23, 2024
Viaarxiv icon

Multi-modal Situated Reasoning in 3D Scenes

Add code
Sep 04, 2024
Viaarxiv icon

Task-oriented Sequential Grounding in 3D Scenes

Add code
Aug 07, 2024
Figure 1 for Task-oriented Sequential Grounding in 3D Scenes
Figure 2 for Task-oriented Sequential Grounding in 3D Scenes
Figure 3 for Task-oriented Sequential Grounding in 3D Scenes
Figure 4 for Task-oriented Sequential Grounding in 3D Scenes
Viaarxiv icon

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Add code
Jul 07, 2024
Viaarxiv icon

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Add code
Jun 27, 2024
Viaarxiv icon

Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

Add code
May 27, 2024
Viaarxiv icon

Unifying 3D Vision-Language Understanding via Promptable Queries

Add code
May 19, 2024
Figure 1 for Unifying 3D Vision-Language Understanding via Promptable Queries
Figure 2 for Unifying 3D Vision-Language Understanding via Promptable Queries
Figure 3 for Unifying 3D Vision-Language Understanding via Promptable Queries
Figure 4 for Unifying 3D Vision-Language Understanding via Promptable Queries
Viaarxiv icon

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Add code
Mar 22, 2024
Viaarxiv icon

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Add code
Mar 18, 2024
Viaarxiv icon

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Add code
Mar 08, 2024
Viaarxiv icon