Ashkan Khakzar

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

Oct 09, 2024

The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms

Aug 11, 2024

Learning Visual Prompts for Guiding the Attention of Vision Transformers

Jun 05, 2024

Latent Guard: a Safety Framework for Text-to-image Generation

Apr 11, 2024

On Discrepancies between Perturbation Evaluations of Graph Neural Network Attributions

Jan 01, 2024

A Survey on Transferability of Adversarial Examples across Deep Neural Networks

Oct 26, 2023

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Oct 10, 2023

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Aug 17, 2023

Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Mar 15, 2023

CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN

Jul 15, 2022