Yuhao Dong

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Feb 06, 2025

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Jan 07, 2025

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Nov 21, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Aug 01, 2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Jul 25, 2024

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Mar 21, 2024

NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding

Oct 12, 2023

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Oct 12, 2023

Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning

Dec 20, 2022

ImageTBAD: A 3D Computed Tomography Angiography Image Dataset for Automatic Segmentation of Type-B Aortic Dissection

Sep 01, 2021