Weixiang Zhao

Who Transfers Safety? Identifying and Targeting Cross-Lingual Shared Safety Neurons

Feb 01, 2026

Large Language Model Agents Are Not Always Faithful Self-Evolvers

Jan 30, 2026

TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent

Jan 26, 2026

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Jan 25, 2026

OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents

Jan 20, 2026

Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering

Jan 20, 2026

Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement

Jun 18, 2025

On Reasoning Strength Planning in Large Reasoning Models

Jun 10, 2025

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Jun 09, 2025

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Jun 08, 2025