Steffi Chern

Halu-J: Critique-Based Hallucination Judge

Jul 17, 2024 (via arXiv)

BeHonest: Benchmarking Honesty of Large Language Models

Jun 19, 2024 (via arXiv)

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Jun 18, 2024 (via arXiv)

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

Jan 30, 2024 (via arXiv)

Combating Adversarial Attacks with Multi-Agent Debate

Jan 11, 2024 (via arXiv)

Align on the Fly: Adapting Chatbot Behavior to Established Norms

Dec 26, 2023 (via arXiv)

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

Jul 26, 2023 (via arXiv)