Jiashuo Liu

TabularMath: Evaluating Computational Extrapolation in Tabular Learning via Program-Verified Synthesis

Jan 25, 2026

FutureX-Pro: Extending Future Prediction to High-Value Vertical Domains

Jan 18, 2026

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Dec 24, 2025

AInsteinBench: Benchmarking Coding Agents on Scientific Repositories

Dec 24, 2025

Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning

Dec 22, 2025

DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains

Nov 14, 2025

LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

Nov 09, 2025

RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

Nov 06, 2025

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Sep 16, 2025

LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

Sep 03, 2025