Xuezhi Cao

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Apr 13, 2026

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

Apr 13, 2026

TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning

Apr 01, 2026

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Mar 22, 2026

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Mar 02, 2026

LongCat-Flash-Thinking-2601 Technical Report

Jan 23, 2026

UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale?

Dec 30, 2025

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

Oct 30, 2025

CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions

Oct 30, 2025