Dongfu Jiang

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Oct 14, 2024
via arXiv

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Jun 24, 2024
via arXiv

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Jun 16, 2024
via arXiv

GenAI Arena: An Open Evaluation Platform for Generative Models

Jun 06, 2024
via arXiv

MANTIS: Interleaved Multi-Image Instruction Tuning

May 02, 2024
via arXiv

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

Dec 22, 2023
via arXiv

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Nov 27, 2023
via arXiv

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

Oct 01, 2023
via arXiv

LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion

Jun 10, 2023
via arXiv

PairReranker: Pairwise Reranking for Natural Language Generation

Dec 20, 2022
via arXiv