
Senthooran Rajamanoharan

Thought Branches: Interpreting LLM Reasoning Requires Resampling

Oct 31, 2025

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Jul 22, 2025

Dense SAE Latents Are Features, Not Bugs

Jun 18, 2025

Model Organisms for Emergent Misalignment

Jun 13, 2025

Convergent Linear Representations of Emergent Misalignment

Jun 13, 2025

Towards eliciting latent knowledge from LLMs with mechanistic interpretability

May 20, 2025

An Approach to Technical AGI Safety and Security

Apr 02, 2025

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Mar 13, 2025

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Feb 23, 2025

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Nov 21, 2024