Picture for Guojun Yin

Guojun Yin

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Add code
Apr 15, 2026
Viaarxiv icon

DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

Add code
Apr 12, 2026
Viaarxiv icon

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Add code
Mar 09, 2026
Viaarxiv icon

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

Add code
Mar 03, 2026
Viaarxiv icon

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Add code
Feb 09, 2026
Viaarxiv icon

Your Group-Relative Advantage Is Biased

Add code
Jan 13, 2026
Viaarxiv icon

Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents

Add code
Jan 12, 2026
Viaarxiv icon

AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards

Add code
Dec 23, 2025
Viaarxiv icon

ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs

Add code
Dec 18, 2025
Viaarxiv icon

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Add code
Dec 08, 2025
Viaarxiv icon