Nandi Schoots

The Propensity for Density in Feed-forward Models

Oct 18, 2024

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Oct 02, 2024

Extending Activation Steering to Broad Skills and Multiple Behaviours

Mar 09, 2024

Dissecting Language Models: Machine Unlearning via Selective Pruning

Mar 02, 2024

Improving Activation Steering in Language Models with Mean-Centring

Dec 06, 2023

Comparing Optimization Targets for Contrast-Consistent Search

Nov 01, 2023

Any Deep ReLU Network is Shallow

Jun 20, 2023

Low-Entropy Latent Variables Hurt Out-of-Distribution Performance

May 20, 2023

Learning to Communicate with Strangers via Channel Randomisation Methods

Apr 19, 2021