
Jiaheng Zhang

MemPot: Defending Against Memory Extraction Attack with Optimized Honeypots

Feb 07, 2026
Via arXiv

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Feb 07, 2026
Via arXiv

Reliable and Responsible Foundation Models: A Comprehensive Survey

Feb 04, 2026
Via arXiv

Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models

Jan 12, 2026
Via arXiv

Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads

Nov 11, 2025
Via arXiv

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking

Nov 05, 2025
Via arXiv

TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

Jun 15, 2025
Via arXiv

Efficient Reasoning via Chain of Unconscious Thought

May 26, 2025
Via arXiv

Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries

May 21, 2025
Via arXiv

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

May 16, 2025
Via arXiv