
Mor Geva

Shammie

Preventing Rogue Agents Improves Multi-Agent Collaboration

Feb 09, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Jan 14, 2025
Via arXiv

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025
Via arXiv

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models

Dec 18, 2024
Via arXiv

Inferring Functionality of Attention Heads from their Parameters

Dec 16, 2024
Via arXiv

Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

Nov 25, 2024
Via arXiv

Eliciting Textual Descriptions from Representations of Continuous Prompts

Oct 15, 2024
Via arXiv

Language Models Encode Numbers Using Digit Representations in Base 10

Oct 15, 2024
Via arXiv

Towards Interpreting Visual Information Processing in Vision-Language Models

Oct 09, 2024
Via arXiv