Qihui Zhang

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Oct 03, 2024
Via arXiv

UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

Jun 27, 2024
Via arXiv

GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

Jun 16, 2024
Via arXiv

The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

Jun 01, 2024
Via arXiv

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Feb 07, 2024
Via arXiv

TrustLLM: Trustworthiness in Large Language Models

Jan 25, 2024
Via arXiv

LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase

Jan 11, 2024
Via arXiv

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

Oct 12, 2023
Via arXiv

TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models

Jun 20, 2023
Via arXiv