Peter Hase

Teaching Models to Balance Resisting and Accepting Persuasion

Oct 18, 2024, via arXiv

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Jul 19, 2024, via arXiv

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Jun 27, 2024, via arXiv

Are language models rational? The case of coherence norms and belief revision

Jun 05, 2024, via arXiv

LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

May 31, 2024, via arXiv

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024, via arXiv

Rethinking Machine Unlearning for Large Language Models

Feb 15, 2024, via arXiv

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

Jan 12, 2024, via arXiv

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

Sep 29, 2023, via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023, via arXiv