Yong Man Ro

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Sep 02, 2024

SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

Aug 23, 2024

TroL: Traversal of Layers for Large Language and Vision Models

Jun 18, 2024

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Jun 12, 2024

CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

Jun 04, 2024

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

May 27, 2024

Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank

Apr 30, 2024

MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection

Mar 22, 2024

What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models

Mar 20, 2024

MoAI: Mixture of All Intelligence for Large Language and Vision Models

Mar 12, 2024