Wei-Bin Lee

RAS: Measuring LLM Safety Through Refusal Alignment

Jun 24, 2026

Tracing Target Answers in Poisoned Retrieval Corpora via Token Influence Attribution

Jun 24, 2026

RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming

Jun 04, 2025

CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design

May 18, 2025

Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets

Feb 28, 2025

Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge

Feb 27, 2025

A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluations

Feb 06, 2025

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

Jun 17, 2024