Jesse Mu

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach

Dec 03, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024

Characterizing tradeoffs between teaching via language and demonstrations in multi-agent systems

May 19, 2023

Learning to Compress Prompts with Gist Tokens

Apr 17, 2023

Improving Policy Learning via Language Dynamics Distillation

Sep 30, 2022

Active Learning Helps Pretrained Models Learn the Intended Task

Apr 18, 2022

Improving Intrinsic Exploration with Language Abstractions

Feb 17, 2022

Calibrate your listeners! Robust communication-based training for pragmatic speakers

Oct 11, 2021

Emergent Communication of Generalizations

Jun 04, 2021

Compositional Explanations of Neurons

Jun 24, 2020