Christian Schroeder de Witt

MALT: Improving Reasoning with Multi-Agent LLM Training

Dec 02, 2024
Via arXiv

Delta-Influence: Unlearning Poisons via Influence Functions

Nov 20, 2024
Via arXiv

MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection

Oct 26, 2024
Via arXiv

Efficient Dictionary Learning with Switch Sparse Autoencoders

Oct 10, 2024
Via arXiv

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Oct 09, 2024
Via arXiv

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

Oct 09, 2024
Via arXiv

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Oct 02, 2024
Via arXiv

IDs for AI Systems

Jun 17, 2024
Via arXiv

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

Jun 03, 2024
Via arXiv

Near to Mid-term Risks and Opportunities of Open Source Generative AI

Apr 25, 2024
Via arXiv