Xiaoyuan Li

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Feb 17, 2025
Via arXiv

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

Jun 02, 2024
Via arXiv

Self-Paced Neutral Expression-Disentangled Learning for Facial Expression Recognition

Mar 21, 2023
Via arXiv