Yongming Rao

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Nov 21, 2024
Via arXiv

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Aug 01, 2024
Via arXiv

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Jul 25, 2024
Via arXiv

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

Apr 23, 2024
Via arXiv

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Mar 21, 2024
Via arXiv

Generative Multimodal Models are In-Context Learners

Dec 20, 2023
Via arXiv

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Dec 11, 2023
Via arXiv

TCOVIS: Temporally Consistent Online Video Instance Segmentation

Sep 21, 2023
Via arXiv

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

Jul 27, 2023
Via arXiv

Unleashing Text-to-Image Diffusion Models for Visual Perception

Mar 03, 2023
Via arXiv