Christian Schroeder de Witt

MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection

Oct 26, 2024
Via arXiv

Efficient Dictionary Learning with Switch Sparse Autoencoders

Oct 10, 2024
Via arXiv

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Oct 09, 2024
Via arXiv

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

Oct 09, 2024
Via arXiv

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Oct 02, 2024
Via arXiv

IDs for AI Systems

Jun 17, 2024
Via arXiv

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

Jun 03, 2024
Via arXiv

Near to Mid-term Risks and Opportunities of Open Source Generative AI

Apr 25, 2024
Via arXiv

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

Apr 10, 2024
Via arXiv

Secret Collusion Among Generative AI Agents

Feb 12, 2024
Via arXiv