Prithviraj Ammanabrolu

Critique-out-Loud Reward Models

Aug 21, 2024

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

Oct 17, 2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Jun 02, 2023

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

May 27, 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

May 24, 2023

Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling

Jan 28, 2023

An AI Dungeon Master's Guide: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons

Dec 20, 2022

Behavior Cloned Transformers are Neurosymbolic Reasoners

Oct 13, 2022

Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Oct 03, 2022

INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions

Jul 02, 2022