Goran Radanović

Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints

Jan 14, 2025
Via arXiv

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Mar 04, 2024
Via arXiv

Corruption-Robust Offline Two-Player Zero-Sum Markov Games

Mar 04, 2024
Via arXiv

Corruption Robust Offline Reinforcement Learning with Human Feedback

Feb 09, 2024
Via arXiv