Zhipin Wang

DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?

Apr 10, 2025

Assessment of Multimodal Large Language Models in Alignment with Human Values

Mar 26, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Jan 29, 2024

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

Nov 05, 2023