Yingshui Tan

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Feb 23, 2025

HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

Feb 21, 2025

ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Feb 19, 2025

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models

Feb 17, 2025

RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting

Dec 25, 2024

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models

Dec 23, 2024

Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment

Nov 18, 2024

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Nov 13, 2024

Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment

Oct 23, 2024

Safety Alignment for Vision Language Models

May 22, 2024