Julian Michael

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

Nov 12, 2024

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Sep 25, 2024

Analyzing the Role of Semantic Representations in the Era of Large Language Models

May 02, 2024

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Mar 08, 2024

The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP

Dec 01, 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Nov 20, 2023

Debate Helps Supervise Unreliable Experts

Nov 15, 2023

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

May 07, 2023

We're Afraid Language Models Aren't Modeling Ambiguity

Apr 27, 2023

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

Aug 26, 2022