Michael S. A. Graziano

Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs

Feb 10, 2026
Via arXiv

Endogenous Resistance to Activation Steering in Language Models

Feb 06, 2026
Via arXiv

Improving How Agents Cooperate: Attention Schemas in Artificial Neural Networks

Nov 01, 2024
Via arXiv

Unexpected Benefits of Self-Modeling in Neural Systems

Jul 14, 2024
Via arXiv

Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines

Nov 17, 2023
Via arXiv