Yuxuan Zhu

Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs

Feb 02, 2026
Via arXiv

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026
Via arXiv

Pervasive Annotation Errors Break Text-to-SQL Benchmarks and Leaderboards

Jan 13, 2026
Via arXiv

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025
Via arXiv

Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?

Jun 24, 2025
Via arXiv

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Jun 10, 2025
Via arXiv

Embed Progressive Implicit Preference in Unified Space for Deep Collaborative Filtering

May 28, 2025
Via arXiv

ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines

Apr 07, 2025
Via arXiv

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Apr 01, 2025
Via arXiv