Ziran Yang

From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding

Dec 09, 2024
Via arXiv

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain

Nov 23, 2024
Via arXiv

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Jun 20, 2024
Via arXiv

Panacea: Pareto Alignment via Preference Adaptation for LLMs

Feb 03, 2024
Via arXiv

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models

Oct 10, 2023
Via arXiv