Peng Xia

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Oct 16, 2024
Via arXiv

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Oct 14, 2024
Via arXiv

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Jul 06, 2024
Via arXiv

TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM

Jun 22, 2024
Via arXiv

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

Jun 12, 2024
Via arXiv

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Jun 10, 2024
Via arXiv

Diffusion Model Driven Test-Time Image Adaptation for Robust Skin Lesion Classification

May 18, 2024
Via arXiv

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

Nov 23, 2023
Via arXiv

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

Oct 20, 2023
Via arXiv

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

May 08, 2023
Via arXiv