Nandi Schoots

Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks

Mar 03, 2025
Via arXiv

Modular Training of Neural Networks aids Interpretability

Feb 04, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv

The Propensity for Density in Feed-forward Models

Oct 18, 2024
Via arXiv

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Oct 02, 2024
Via arXiv

Extending Activation Steering to Broad Skills and Multiple Behaviours

Mar 09, 2024
Via arXiv

Dissecting Language Models: Machine Unlearning via Selective Pruning

Mar 02, 2024
Via arXiv

Improving Activation Steering in Language Models with Mean-Centring

Dec 06, 2023
Via arXiv

Comparing Optimization Targets for Contrast-Consistent Search

Nov 01, 2023
Via arXiv

Any Deep ReLU Network is Shallow

Jun 20, 2023
Via arXiv