Yunpeng Zhai

Peking University

RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation

Jun 10, 2026
Via arXiv

Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning

May 06, 2026
Via arXiv

E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

Apr 13, 2026
Via arXiv

Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices

Jan 06, 2026
Via arXiv

Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism

Jan 06, 2026
Via arXiv

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Dec 10, 2025
Via arXiv

AgentEvolver: Towards Efficient Self-Evolving Agent System

Nov 13, 2025
Via arXiv

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models

Aug 12, 2025
Via arXiv

Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models

Aug 10, 2025
Via arXiv

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning

Jun 11, 2025
Via arXiv