Canyu Chen

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Nov 25, 2024

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Nov 10, 2024

Can Knowledge Editing Really Correct Hallucinations?

Oct 21, 2024

FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks

Oct 01, 2024

Model Attribution in Machine-Generated Disinformation: A Domain Generalization Approach with Supervised Contrastive Learning

Jul 31, 2024

Can Editing LLMs Inject Harm?

Jul 29, 2024

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Apr 18, 2024

Can Large Language Models Identify Authorship?

Mar 13, 2024

Can Large Language Model Agents Simulate Human Trust Behaviors?

Feb 07, 2024