Iván Arcuschin

Mind the Performance Gap: Capability-Behavior Trade-offs in Feature Steering

Feb 03, 2026
Via arXiv

Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity

Oct 31, 2025
Via arXiv

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025
Via arXiv

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Mar 13, 2025
Via arXiv

InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques

Jul 19, 2024
Via arXiv