Xingcheng Xu

Shanghai Artificial Intelligence Laboratory

Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations

Jun 13, 2026
via arXiv

AgentSchool: An LLM-Powered Multi-Agent Simulation for Education

May 28, 2026
via arXiv

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

May 12, 2026
via arXiv

TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

Mar 16, 2026
via arXiv

CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reasoning LLMs for Fake News Generation

Feb 05, 2026
via arXiv

RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning

Feb 04, 2026
via arXiv

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety

Feb 02, 2026
via arXiv

KALE: Enhancing Knowledge Manipulation in Large Language Models via Knowledge-aware Learning

Jan 12, 2026
via arXiv

The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs

Jan 04, 2026
via arXiv

The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in Large Language Models

Jul 27, 2025
via arXiv