Zeyao Ma

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

Jun 24, 2026

Benchmarking the Limits of In-Context Reinforcement Learning for Ad-Hoc Teamwork

May 23, 2026

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Apr 28, 2026

Qwen3-Coder-Next Technical Report

Feb 28, 2026

Scaling Agentic Verifier for Competitive Coding

Feb 04, 2026

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Feb 02, 2026

From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning

Jan 19, 2026

Dynamic Scaling of Unit Tests for Code Reward Modeling

Jan 02, 2025

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

Jun 21, 2024

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

Apr 01, 2024