Sergei Tilga

U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Dec 04, 2024
Via arXiv

Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop

Nov 07, 2024
Via arXiv

Beemo: Benchmark of Expert-edited Machine-generated Outputs

Nov 06, 2024
Via arXiv