Austin Xu

MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems

Feb 03, 2026
Via arXiv

Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts

Jan 23, 2026
Via arXiv

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

Jan 21, 2026
Via arXiv

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

Oct 16, 2025
Via arXiv

MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision

May 26, 2025
Via arXiv

Meta-Design Matters: A Self-Design Multi-Agent System

May 21, 2025
Via arXiv

J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization

May 19, 2025
Via arXiv

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Apr 21, 2025
Via arXiv

A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

Apr 12, 2025
Via arXiv

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

Mar 19, 2025
Via arXiv