Weixiang Zhao

TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent

Jan 26, 2026

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Jan 25, 2026

OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents

Jan 20, 2026

Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering

Jan 20, 2026

Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement

Jun 18, 2025

On Reasoning Strength Planning in Large Reasoning Models

Jun 10, 2025

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Jun 09, 2025

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Jun 08, 2025

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

May 23, 2025

MPO: Multilingual Safety Alignment via Reward Gap Optimization

May 22, 2025