Amit Dhurandhar

Identifying Sub-networks in Neural Networks via Functionally Similar Representations

Oct 21, 2024

Programming Refusal with Conditional Activation Steering

Sep 06, 2024

CELL your Model: Contrastive Explanation Methods for Large Language Models

Jun 17, 2024

Large Language Model Confidence Estimation via Black-Box Access

Jun 01, 2024

Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI

Apr 10, 2024

Multi-Level Explanations for Generative Language Models

Mar 21, 2024

Trust Regions for Explanations via Black-Box Probabilistic Certification

Feb 21, 2024

Ranking Large Language Models without Ground Truth

Feb 21, 2024

Spectral Adversarial MixUp for Few-Shot Unsupervised Domain Adaptation

Sep 03, 2023

When Neural Networks Fail to Generalize? A Model Sensitivity Perspective

Dec 01, 2022