Yujing Qiao

SysBench: Can Large Language Models Follow System Messages?

Aug 20, 2024
Via arXiv

MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

Aug 15, 2024
Via arXiv

CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

Aug 02, 2024
Via arXiv

PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

Jul 11, 2024
Via arXiv