Yiyang Zhou

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

Oct 18, 2024
Via arXiv

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Oct 14, 2024
Via arXiv

VHELM: A Holistic Evaluation of Vision Language Models

Oct 09, 2024
Via arXiv

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024
Via arXiv

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Jun 10, 2024
Via arXiv

Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

May 29, 2024
Via arXiv

Calibrated Self-Rewarding Vision Language Models

May 23, 2024
Via arXiv

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

Feb 18, 2024
Via arXiv

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Nov 27, 2023
Via arXiv

Holistic Analysis of Hallucination in GPT-4V: Bias and Interference Challenges

Nov 07, 2023
Via arXiv