Amit Dhurandhar

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents

Feb 22, 2025

AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models

Jan 07, 2025

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

Dec 05, 2024

Identifying Sub-networks in Neural Networks via Functionally Similar Representations

Oct 21, 2024

Programming Refusal with Conditional Activation Steering

Sep 06, 2024

CELL your Model: Contrastive Explanation Methods for Large Language Models

Jun 17, 2024

Large Language Model Confidence Estimation via Black-Box Access

Jun 01, 2024

Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI

Apr 10, 2024

Multi-Level Explanations for Generative Language Models

Mar 21, 2024

Trust Regions for Explanations via Black-Box Probabilistic Certification

Feb 21, 2024