Owain Evans

Shammie

The Two-Hop Curse: LLMs trained on A->B, B->C fail to learn A->C

Nov 25, 2024
Via arXiv

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024
Via arXiv

Looking Inward: Language Models Can Learn About Themselves by Introspection

Oct 17, 2024
Via arXiv

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

Jul 05, 2024
Via arXiv

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Jun 20, 2024
Via arXiv

Can Language Models Explain Their Own Classification Behavior?

May 13, 2024
Via arXiv

Tell, don't show: Declarative facts influence how LLMs generalize

Dec 12, 2023
Via arXiv

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Sep 26, 2023
Via arXiv

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Sep 22, 2023
Via arXiv

Taken out of context: On measuring situational awareness in LLMs

Sep 01, 2023
Via arXiv