Amrit Singh Bedi

Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm

Mar 24, 2025

BalancedDPO: Adaptive Multi-Metric Alignment

Mar 16, 2025

Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment

Jan 07, 2025

LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds

Dec 06, 2024

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Nov 27, 2024

Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction

Nov 01, 2024

EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering

Oct 26, 2024

On The Global Convergence Of Online RLHF With Neural Parametrization

Oct 21, 2024

On the Sample Complexity of a Policy Gradient Algorithm with Occupancy Approximation for General Utility Reinforcement Learning

Oct 05, 2024

AIME: AI System Optimization via Multiple LLM Evaluators

Oct 04, 2024