Anna Soligo

Convergent Linear Representations of Emergent Misalignment
Jun 13, 2025

Model Organisms for Emergent Misalignment
Jun 13, 2025

Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning
Jan 28, 2025