Fazl Barez

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

Oct 11, 2024
Via arXiv

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

Oct 09, 2024
Via arXiv

Towards Interpreting Visual Information Processing in Vision-Language Models

Oct 09, 2024
Via arXiv

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Jun 17, 2024
Via arXiv

Risks and Opportunities of Open-Source Generative AI

May 14, 2024
Via arXiv

Visualizing Neural Network Imagination

May 10, 2024
Via arXiv

Near to Mid-term Risks and Opportunities of Open Source Generative AI

Apr 25, 2024
Via arXiv

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

Feb 23, 2024
Via arXiv

Increasing Trust in Language Models through the Reuse of Verified Circuits

Feb 06, 2024
Via arXiv

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Via arXiv