Souradip Chakraborty

LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds

Dec 06, 2024

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Nov 27, 2024

Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction

Nov 01, 2024

On the Sample Complexity of a Policy Gradient Algorithm with Occupancy Approximation for General Utility Reinforcement Learning

Oct 05, 2024

AIME: AI System Optimization via Multiple LLM Evaluators

Oct 04, 2024

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Jul 24, 2024

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Jun 21, 2024

Is poisoning a real threat to LLM alignment? Maybe more so than you think

Jun 17, 2024

DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

Jun 16, 2024

Transfer Q*: Principled Decoding for LLM Alignment

May 30, 2024