Haibo Tong

VESTA: A Fully Automated Scenario Generation and Safety Evaluation Framework for LLM Agents

Jun 07, 2026
Via arXiv

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

Jun 04, 2026
Via arXiv

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

May 06, 2026
Via arXiv

ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

Feb 15, 2026
Via arXiv

CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models

Jan 22, 2026
Via arXiv

Scaling Agent Learning via Experience Synthesis

Nov 10, 2025
Via arXiv

PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks

May 22, 2025
Via arXiv

Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society

Apr 24, 2025
Via arXiv

MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Feb 03, 2025
Via arXiv

Autonomous Alignment with Human Value on Altruism through Considerate Self-imagination and Theory of Mind

Jan 07, 2025
Via arXiv