Picture for Cunxiang Wang

Cunxiang Wang

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

Add code
Mar 05, 2026
Viaarxiv icon

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

Add code
Feb 28, 2026
Viaarxiv icon

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis

Add code
Feb 28, 2026
Viaarxiv icon

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

Add code
Feb 28, 2026
Viaarxiv icon

GLM-5: from Vibe Coding to Agentic Engineering

Add code
Feb 17, 2026
Viaarxiv icon

MVSS: A Unified Framework for Multi-View Structured Survey Generation

Add code
Jan 14, 2026
Viaarxiv icon

Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation

Add code
Jan 12, 2026
Viaarxiv icon

DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation

Add code
Jan 08, 2026
Viaarxiv icon

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Add code
Oct 08, 2025
Figure 1 for AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Figure 2 for AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Figure 3 for AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Figure 4 for AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Viaarxiv icon

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Add code
Aug 08, 2025
Viaarxiv icon