Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sanjay Kariyappa

Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations

May 25, 2025

Sanjay Kariyappa, G. Edward Suh

Abstract:Prompt injection attacks are a critical security vulnerability in large language models (LLMs), allowing attackers to hijack model behavior by injecting malicious instructions within the input context. Recent defense mechanisms have leveraged an Instruction Hierarchy (IH) Signal, often implemented through special delimiter tokens or additive embeddings to denote the privilege level of input tokens. However, these prior works typically inject the IH signal exclusively at the initial input layer, which we hypothesize limits its ability to effectively distinguish the privilege levels of tokens as it propagates through the different layers of the model. To overcome this limitation, we introduce a novel approach that injects the IH signal into the intermediate token representations within the network. Our method augments these representations with layer-specific trainable embeddings that encode the privilege information. Our evaluations across multiple models and training methods reveal that our proposal yields between $1.6\times$ and $9.2\times$ reduction in attack success rate on gradient-based prompt injection attacks compared to state-of-the-art methods, without significantly degrading the model's utility.

Via

Access Paper or Ask Questions

Interpretable LLM-based Table Question Answering

Dec 16, 2024

Giang, Nguyen, Ivan Brugere, Shubham Sharma, Sanjay Kariyappa, Anh Totti Nguyen, Freddy Lecue

Figure 1 for Interpretable LLM-based Table Question Answering

Figure 2 for Interpretable LLM-based Table Question Answering

Figure 3 for Interpretable LLM-based Table Question Answering

Figure 4 for Interpretable LLM-based Table Question Answering

Abstract:Interpretability for Table Question Answering (Table QA) is critical, particularly in high-stakes industries like finance or healthcare. Although recent approaches using Large Language Models (LLMs) have significantly improved Table QA performance, their explanations for how the answers are generated are ambiguous. To fill this gap, we introduce Plan-of-SQLs ( or POS), an interpretable, effective, and efficient approach to Table QA that answers an input query solely with SQL executions. Through qualitative and quantitative evaluations with human and LLM judges, we show that POS is most preferred among explanation methods, helps human users understand model decision boundaries, and facilitates model success and error identification. Furthermore, when evaluated in standard benchmarks (TabFact, WikiTQ, and FetaQA), POS achieves competitive or superior accuracy compared to existing methods, while maintaining greater efficiency by requiring significantly fewer LLM calls and database queries.

Via

Access Paper or Ask Questions

Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

Jun 03, 2024

Sanjay Kariyappa, Freddy Lécué, Saumitra Mishra, Christopher Pond, Daniele Magazzeni, Manuela Veloso

Figure 1 for Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

Figure 2 for Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

Figure 3 for Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

Figure 4 for Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

Abstract:This paper proposes Progressive Inference - a framework to compute input attributions to explain the predictions of decoder-only sequence classification models. Our work is based on the insight that the classification head of a decoder-only Transformer model can be used to make intermediate predictions by evaluating them at different points in the input sequence. Due to the causal attention mechanism, these intermediate predictions only depend on the tokens seen before the inference point, allowing us to obtain the model's prediction on a masked input sub-sequence, with negligible computational overheads. We develop two methods to provide sub-sequence level attributions using this insight. First, we propose Single Pass-Progressive Inference (SP-PI), which computes attributions by taking the difference between consecutive intermediate predictions. Second, we exploit a connection with Kernel SHAP to develop Multi Pass-Progressive Inference (MP-PI). MP-PI uses intermediate predictions from multiple masked versions of the input to compute higher quality attributions. Our studies on a diverse set of models trained on text classification tasks show that SP-PI and MP-PI provide significantly better attributions compared to prior work.

Via

Access Paper or Ask Questions

Privacy-Preserving Algorithmic Recourse

Nov 23, 2023

Sikha Pentyala, Shubham Sharma, Sanjay Kariyappa, Freddy Lecue, Daniele Magazzeni

Figure 1 for Privacy-Preserving Algorithmic Recourse

Figure 2 for Privacy-Preserving Algorithmic Recourse

Figure 3 for Privacy-Preserving Algorithmic Recourse

Figure 4 for Privacy-Preserving Algorithmic Recourse

Abstract:When individuals are subject to adverse outcomes from machine learning models, providing a recourse path to help achieve a positive outcome is desirable. Recent work has shown that counterfactual explanations - which can be used as a means of single-step recourse - are vulnerable to privacy issues, putting an individuals' privacy at risk. Providing a sequential multi-step path for recourse can amplify this risk. Furthermore, simply adding noise to recourse paths found from existing methods can impact the realism and actionability of the path for an end-user. In this work, we address privacy issues when generating realistic recourse paths based on instance-based counterfactual explanations, and provide PrivRecourse: an end-to-end privacy preserving pipeline that can provide realistic recourse paths. PrivRecourse uses differentially private (DP) clustering to represent non-overlapping subsets of the private dataset. These DP cluster centers are then used to generate recourse paths by forming a graph with cluster centers as the nodes, so that we can generate realistic - feasible and actionable - recourse paths. We empirically evaluate our approach on finance datasets and compare it to simply adding noise to data instances, and to using DP synthetic data, to generate the graph. We observe that PrivRecourse can provide paths that are private and realistic.

* Accepted at 3rd International Workshop on Explainable AI in Finance, ICAIF 2023

Via

Access Paper or Ask Questions

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

Jul 10, 2023

Sanjay Kariyappa, Leonidas Tsepenekas, Freddy Lécué, Daniele Magazzeni

Figure 1 for SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

Figure 2 for SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

Figure 3 for SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

Figure 4 for SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

Abstract:The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.

Via

Access Paper or Ask Questions

Information Flow Control in Machine Learning through Modular Model Architecture

Jun 05, 2023

Trishita Tiwari, Suchin Gururangan, Chuan Guo, Weizhe Hua, Sanjay Kariyappa, Udit Gupta, Wenjie Xiong, Kiwan Maeng, Hsien-Hsin S. Lee, G. Edward Suh

Abstract:In today's machine learning (ML) models, any part of the training data can affect its output. This lack of control for information flow from training data to model output is a major obstacle in training models on sensitive data when access control only allows individual users to access a subset of data. To enable secure machine learning for access controlled data, we propose the notion of information flow control for machine learning, and develop a secure Transformer-based language model based on the Mixture-of-Experts (MoE) architecture. The secure MoE architecture controls information flow by limiting the influence of training data from each security domain to a single expert module, and only enabling a subset of experts at inference time based on an access control policy. The evaluation using a large corpus of text data shows that the proposed MoE architecture has minimal (1.9%) performance overhead and can significantly improve model accuracy (up to 37%) by enabling training on access-controlled data.

Via

Access Paper or Ask Questions

Bounding the Invertibility of Privacy-preserving Instance Encoding using Fisher Information

May 06, 2023

Kiwan Maeng, Chuan Guo, Sanjay Kariyappa, G. Edward Suh

Abstract:Privacy-preserving instance encoding aims to encode raw data as feature vectors without revealing their privacy-sensitive information. When designed properly, these encodings can be used for downstream ML applications such as training and inference with limited privacy risk. However, the vast majority of existing instance encoding schemes are based on heuristics and their privacy-preserving properties are only validated empirically against a limited set of attacks. In this paper, we propose a theoretically-principled measure for the privacy of instance encoding based on Fisher information. We show that our privacy measure is intuitive, easily applicable, and can be used to bound the invertibility of encodings both theoretically and empirically.

Via

Access Paper or Ask Questions

Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information

Sep 21, 2022

Kiwan Maeng, Chuan Guo, Sanjay Kariyappa, Edward Suh

Figure 1 for Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information

Figure 2 for Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information

Figure 3 for Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information

Figure 4 for Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information

Abstract:Split learning and inference propose to run training/inference of a large model that is split across client devices and the cloud. However, such a model splitting imposes privacy concerns, because the activation flowing through the split layer may leak information about the clients' private input data. There is currently no good way to quantify how much private information is being leaked through the split layer, nor a good way to improve privacy up to the desired level. In this work, we propose to use Fisher information as a privacy metric to measure and control the information leakage. We show that Fisher information can provide an intuitive understanding of how much private information is leaking through the split layer, in the form of an error bound for an unbiased reconstruction attacker. We then propose a privacy-enhancing technique, ReFIL, that can enforce a user-desired level of Fisher information leakage at the split layer to achieve high privacy, while maintaining reasonable utility.

Via

Access Paper or Ask Questions

Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Sep 12, 2022

Sanjay Kariyappa, Chuan Guo, Kiwan Maeng, Wenjie Xiong, G. Edward Suh, Moinuddin K Qureshi, Hsien-Hsin S. Lee

Figure 1 for Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Figure 2 for Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Figure 3 for Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Figure 4 for Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Abstract:Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share the gradient updates (instead of the private inputs) with the central server, which are then securely aggregated over multiple data owners. Although aggregation by itself does not provably offer privacy protection, prior work showed that it may suffice if the batch size is sufficiently large. In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients aggregated over a very large batch size. CPA leverages the crucial insight that aggregate gradients from a fully connected layer is a linear combination of its inputs, which leads us to frame gradient inversion as a blind source separation (BSS) problem (informally called the cocktail party problem). We adapt independent component analysis (ICA)--a classic solution to the BSS problem--to recover private inputs for fully-connected and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works on large batch sizes of up to 1024.

Via

Access Paper or Ask Questions

Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning

Nov 25, 2021

Sanjay Kariyappa, Moinuddin K Qureshi

Figure 1 for Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning

Figure 2 for Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning

Figure 3 for Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning

Figure 4 for Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning

Abstract:Split learning is a popular technique used to perform vertical federated learning, where the goal is to jointly train a model on the private input and label data held by two parties. To preserve privacy of the input and label data, this technique uses a split model and only requires the exchange of intermediate representations (IR) of the inputs and gradients of the IR between the two parties during the learning process. In this paper, we propose Gradient Inversion Attack (GIA), a label leakage attack that allows an adversarial input owner to learn the label owner's private labels by exploiting the gradient information obtained during split learning. GIA frames the label leakage attack as a supervised learning problem by developing a novel loss function using certain key properties of the dataset and models. Our attack can uncover the private label data on several multi-class image classification problems and a binary conversion prediction task with near-perfect accuracy (97.01% - 99.96%), demonstrating that split learning provides negligible privacy benefits to the label owner. Furthermore, we evaluate the use of gradient noise to defend against GIA. While this technique is effective for simpler datasets, it significantly degrades utility for datasets with higher input dimensionality. Our findings underscore the need for better privacy-preserving training techniques for vertically split data.

Via

Access Paper or Ask Questions