Nikita Balagansky

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Jun 10, 2026
Via arXiv

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Jun 08, 2026
Via arXiv

Trust-Region Behavior Blending for On-Policy Distillation

May 29, 2026
Via arXiv

Next Embedding Prediction Makes World Models Stronger

Mar 03, 2026
Via arXiv

Teach Old SAEs New Domain Tricks with Boosting

Jul 17, 2025
Via arXiv

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

May 30, 2025
Via arXiv

Train Sparse Autoencoders Efficiently by Utilizing Features Correlation

May 28, 2025
Via arXiv

Steering LLM Reasoning Through Bias-Only Adaptation

May 24, 2025
Via arXiv

You Do Not Fully Utilize Transformer's Representation Capacity

Feb 13, 2025
Via arXiv

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Feb 06, 2025
Via arXiv