Yixu Wang

A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports

Oct 02, 2025

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Jul 24, 2025

JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models

May 26, 2025

SafeVid: Toward Safety Aligned Video Large Multimodal Models

May 17, 2025

A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos

Feb 19, 2025

HoneypotNet: Backdoor Attacks Against Model Extraction

Jan 02, 2025

Reflection-Bench: probing AI intelligence with reflection

Oct 21, 2024

ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

Jun 24, 2024

RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

Jun 11, 2024

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

Jun 11, 2024