Zerui Cheng

Michael Pokorny

TabularMath: Evaluating Computational Extrapolation in Tabular Learning via Program-Verified Synthesis

Jan 25, 2026
Via arXiv

FutureX-Pro: Extending Future Prediction to High-Value Vertical Domains

Jan 18, 2026
Via arXiv

FrontierCS: Evolving Challenges for Evolving Intelligence

Dec 17, 2025
Via arXiv

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Jun 13, 2025
Via arXiv

SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Mar 16, 2025
Via arXiv

Humanity's Last Exam

Jan 24, 2025
Via arXiv

OML: Open, Monetizable, and Loyal AI

Nov 01, 2024
Via arXiv