Yongming Rao

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries

Mar 16, 2025

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Feb 06, 2025

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Nov 21, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Aug 01, 2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Jul 25, 2024

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

Apr 23, 2024

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Mar 21, 2024

Generative Multimodal Models are In-Context Learners

Dec 20, 2023

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Dec 11, 2023

TCOVIS: Temporally Consistent Online Video Instance Segmentation

Sep 21, 2023