Haochuan Wang

TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs

Oct 14, 2024
Via arXiv

Mitigating Gender Bias in Code Large Language Models via Model Editing

Oct 10, 2024
Via arXiv

UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models

Jun 24, 2024
Via arXiv