Yibo Yan

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Dec 03, 2024

Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks
Nov 27, 2024

Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation
Nov 26, 2024

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
Nov 25, 2024

Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Nov 05, 2024

MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Oct 07, 2024

Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Oct 07, 2024

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Oct 06, 2024

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Oct 04, 2024

GeoReasoner: Reasoning On Geospatially Grounded Context For Natural Language Understanding
Aug 21, 2024