Abstract: The emerging in-context learning (ICL) ability of large language models (LLMs) has prompted their use for predictive tasks across domains and data types, with serialization methods making non-textual data accessible to the models. However, as these applications reach high-stakes domains, it has been shown that LLMs can inherit social bias and discrimination from their pre-training data. In this work, we investigate this inherent bias in LLMs during in-context learning with tabular data. We focus on an optimal demonstration selection approach that utilizes latent concept variables for resource-efficient task adaptation. We design data augmentation strategies that reduce the correlation between predictive outcomes and sensitive variables, helping to promote fairness during latent concept learning. We then use the learned concept to select demonstrations from a training dataset and obtain fair predictions during inference while maintaining model utility. The latent concept variable is learned using a smaller internal LLM, and the selected demonstrations can be used for inference with larger external LLMs. We empirically verify that the fair latent variable approach improves fairness results on tabular datasets compared to multiple heuristic demonstration selection methods.
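To make the augmentation idea concrete, below is a minimal sketch (not the paper's implementation) of fairness-aware augmentation of serialized tabular demonstrations: the sensitive attribute is counterfactually flipped on a random subset of rows to weaken its correlation with the outcome before latent concept learning. The column names, sensitive attribute, and serialization template are illustrative assumptions.

```python
# Minimal sketch, assuming a simple "feature is value" serialization template
# and a binary sensitive attribute; not the paper's actual pipeline.
import random

def serialize_row(row: dict, label: str) -> str:
    """Serialize one tabular example into a natural-language demonstration."""
    features = ", ".join(f"{k} is {v}" for k, v in row.items())
    return f"{features}. Outcome: {label}"

def augment_for_fairness(rows, labels, sensitive_key="sex", flip_prob=0.5):
    """Counterfactually flip the sensitive attribute on a random subset of rows,
    weakening the correlation between the sensitive variable and the outcome."""
    augmented = []
    for row, label in zip(rows, labels):
        new_row = dict(row)
        if random.random() < flip_prob:
            new_row[sensitive_key] = "Female" if row[sensitive_key] == "Male" else "Male"
        augmented.append(serialize_row(new_row, label))
    return augmented

# Example usage with a single hypothetical row.
demos = augment_for_fairness(
    rows=[{"age": 37, "sex": "Male", "hours_per_week": 40}],
    labels=["<=50K"],
)
print(demos[0])
```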
Abstract: The widespread popularity of Large Language Models (LLMs), partly due to their unique ability to perform in-context learning, has also brought to light the importance of ethical and safety considerations when deploying these pre-trained models. In this work, we focus on investigating machine unlearning for LLMs, motivated by data protection regulations. In contrast to the growing literature on fine-tuning methods to achieve unlearning, we focus on a comparatively lightweight alternative called soft prompting to realize the unlearning of a subset of training data. With losses designed to enforce forgetting as well as utility preservation, our framework \textbf{S}oft \textbf{P}rompting for \textbf{U}n\textbf{l}earning (SPUL) learns prompt tokens that can be appended to an arbitrary query to induce unlearning of specific examples at inference time without updating LLM parameters. We conduct a rigorous evaluation of the proposed method, and our results indicate that SPUL can significantly improve the trade-off between utility and forgetting in the context of text classification with LLMs. We further validate our method using multiple LLMs to highlight the scalability of our framework and provide detailed insights into the choice of hyperparameters and the influence of the size of unlearning data. Our implementation is available at \url{https://github.com/karuna-bhaila/llm_unlearning}.
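The following is a minimal sketch of the soft-prompting setup described above: trainable prompt-token embeddings attached to the input of a frozen LLM, trained with a forgetting term on the forget set and a utility-preservation term on the retain set. The exact loss design is the paper's; the gradient-ascent-style forgetting term, the loss weight, and the tensor shapes here are illustrative assumptions rather than the released SPUL code.

```python
# Minimal sketch, assuming a classification head on a frozen LLM backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPrompt(nn.Module):
    """Trainable prompt-token embeddings attached to the input embeddings."""
    def __init__(self, num_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        expanded = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([expanded, input_embeds], dim=1)

def unlearning_loss(logits_forget, y_forget, logits_retain, y_retain, alpha=1.0):
    """Combine a forgetting objective (illustrated here as negated cross-entropy on
    the forget set) with a utility-preservation objective on the retain set."""
    forget_term = -F.cross_entropy(logits_forget, y_forget)
    utility_term = F.cross_entropy(logits_retain, y_retain)
    return forget_term + alpha * utility_term
```

Only the `SoftPrompt` parameters would be updated during unlearning; the LLM weights stay frozen, matching the "no parameter update at inference" setting in the abstract.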
Abstract: Vision language models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data. VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning. Additionally, the Segment Anything Model (SAM), a universal segmentation model by Meta AI, shows unprecedented performance at isolating objects from previously unseen images. Since medical experts, biologists, and materials scientists routinely examine microscopy or medical images in conjunction with textual information in the form of captions, literature, or reports, and draw conclusions of great importance from them, it is essential to test the performance of VLMs and foundation models such as SAM on these images. In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images. We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable at isolating artefacts in a general sense. However, their performance is not close to that of a domain expert: the models are readily encumbered by the impurities, defects, artefact overlaps, and diversity present in the images.
Abstract: Correctly classifying brain tumors is imperative to the prompt and accurate treatment of a patient. While several classification algorithms based on classical image processing or deep learning methods have been proposed to rapidly classify tumors in MR images, most assume the unrealistic setting of noise-free training data. In this work, we study a difficult but realistic setting of training a deep learning model on noisy MR images to classify brain tumors. We propose two training methods that are robust to noisy MRI training data, Influence-based Sample Reweighing (ISR) and Influence-based Sample Perturbation (ISP), both built on influence functions from robust statistics. Using influence functions, ISR adaptively reweighs training examples according to how helpful or harmful they are to the training process, while ISP crafts and injects helpful perturbations proportional to the influence score. Both ISR and ISP harden the classification model against noisy training data without significantly affecting the generalization ability of the model on test data. We conduct empirical evaluations on a common brain tumor dataset and compare ISR and ISP to three baselines. Our empirical results show that ISR and ISP can efficiently train deep learning models that are robust against noisy training data.
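As a rough illustration of the reweighing step, the sketch below applies per-example weights derived from influence scores to a standard cross-entropy loss, down-weighting harmful (negative-influence) examples and up-weighting helpful ones. The influence-score computation itself is abstracted away, and the softmax mapping from influence to weight and the temperature are illustrative assumptions, not the paper's exact ISR procedure.

```python
# Minimal sketch of influence-based sample reweighing, assuming influence scores
# for the batch are already estimated elsewhere.
import torch
import torch.nn.functional as F

def reweighted_loss(logits, targets, influence_scores, temperature=1.0):
    """Weight each example's loss by a normalized function of its influence score."""
    weights = torch.softmax(influence_scores / temperature, dim=0) * len(influence_scores)
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (weights.detach() * per_example).mean()
```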
Abstract: Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables users to explore emergent abilities with multimodal data. Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, such large models hold enormous potential for applications in the biomedical imaging field. However, there is little related work demonstrating the ability of large models to diagnose diseases. In this work, we study the zero-shot and few-shot robustness of VLMs on medical imaging analysis tasks. Our comprehensive experiments demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays.
Abstract: Large Language Models (LLMs) have demonstrated In-Context Learning (ICL) capabilities, which provide an opportunity to perform few-shot learning without any gradient updates. Despite its multiple benefits, ICL generalization performance is sensitive to the selected demonstrations, and selecting effective demonstrations remains an open research challenge. To address this challenge, we propose a demonstration selection method called InfICL, which analyzes the influence of training samples through influence functions. Identifying highly influential training samples can potentially aid in improving the ICL generalization performance. To limit the running cost of InfICL, we only employ the LLM to generate sample embeddings and do not perform any costly fine-tuning. We conduct an empirical study on multiple real-world datasets and show the merits of InfICL against state-of-the-art baselines.
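A minimal sketch of the demonstration-selection step follows: given per-example influence scores (assumed here to be estimated with influence functions over a lightweight classifier trained on the LLM sample embeddings), the most influential examples are chosen as ICL demonstrations and assembled into a prompt. The prompt template and the `influence_scores` input are illustrative assumptions rather than the InfICL implementation.

```python
# Minimal sketch, assuming influence scores are precomputed for the training set.
import numpy as np

def select_demonstrations(texts, labels, influence_scores, k=8):
    """Pick the k training examples with the highest influence scores."""
    top = np.argsort(influence_scores)[::-1][:k]
    return [(texts[i], labels[i]) for i in top]

def build_icl_prompt(demonstrations, query):
    """Assemble selected demonstrations and the test query into an ICL prompt."""
    demo_block = "\n".join(f"Input: {t}\nLabel: {y}" for t, y in demonstrations)
    return f"{demo_block}\nInput: {query}\nLabel:"
```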
Abstract: Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables users to explore emergent abilities with multimodal data. Visual language models (VLMs), such as LLaVA, Flamingo, or GPT-4, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, such large models hold enormous potential for applications on social media platforms. Despite this, there is a lack of related work on detecting or correcting hateful memes with VLMs. In this work, we study the ability of VLMs on hateful meme detection and hateful meme correction tasks with zero-shot prompting. From our empirical experiments, we show the effectiveness of the pretrained LLaVA model and discuss its strengths and weaknesses in these tasks.
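To illustrate the zero-shot setting, here is a minimal sketch of how the two tasks could be posed to a LLaVA-style model; the prompt wording and the generic `vlm_generate(image, prompt)` callable are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of zero-shot prompts for hateful meme detection and correction.
DETECTION_PROMPT = (
    "You are shown a meme image together with its overlaid text. "
    "Answer with a single word, 'hateful' or 'benign', then briefly justify the label."
)

CORRECTION_PROMPT = (
    "The following meme has been labeled hateful. "
    "Rewrite its overlaid text so that the meme is no longer hateful "
    "while preserving the original humor where possible."
)

def classify_meme(vlm_generate, image):
    """`vlm_generate` is any callable taking (image, prompt) and returning text."""
    answer = vlm_generate(image, DETECTION_PROMPT)
    return "hateful" if "hateful" in answer.lower() else "benign"
```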
Abstract: How to properly set the privacy parameter in differential privacy (DP) has been an open question in DP research since it was first proposed in 2006. In this work, we demonstrate the ability of influence functions to offer insight into how a specific privacy parameter value will affect a model's test loss in the randomized response-based local DP setting. Our proposed method allows a data curator to select the privacy parameter best aligned with their allowed privacy-utility trade-off without requiring heavy computation such as extensive model retraining and data privatization. We consider multiple common randomization scenarios, such as performing randomized response over the features and/or over the labels, as well as the more complex case of applying a class-dependent label noise correction method to offset the noise incurred by randomization. Further, we provide a detailed discussion of the computational complexity of our proposed approach, along with an empirical analysis. Through empirical evaluations, we show that for both binary and multi-class settings, influence functions are able to approximate, with small mean absolute error, the true change in test loss that occurs when randomized response is applied over features and/or labels, especially in cases where noise correction methods are applied.
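For reference, the sketch below shows the standard k-ary randomized response mechanism over labels that the abstract's local DP setting relies on: with probability e^ε / (e^ε + k - 1) the true label is reported, otherwise a uniformly random other label. This illustrates the randomization being analyzed, not the influence-function estimator itself.

```python
# Minimal sketch of randomized response over labels under epsilon-local DP.
import numpy as np

def randomized_response(labels, num_classes, epsilon, rng=None):
    """Privatize a list of integer labels in {0, ..., num_classes - 1}."""
    rng = rng or np.random.default_rng()
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    out = []
    for y in labels:
        if rng.random() < p_keep:
            out.append(y)  # report the true label
        else:
            others = [c for c in range(num_classes) if c != y]
            out.append(int(rng.choice(others)))  # report a uniformly random other label
    return out
```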
Abstract: While numerous defense methods have been proposed to prevent potential poisoning attacks from untrusted data sources, most research works only defend against specific attacks, which leaves many avenues for an adversary to exploit. In this work, we propose an efficient and robust training approach to defend against data poisoning attacks based on influence functions, named Healthy Influential-Noise based Training (HINT). Using influence functions, we craft healthy noise that helps to harden the classification model against poisoning attacks without significantly affecting the generalization ability on test data. In addition, our method can perform effectively when only a subset of the training data is modified, instead of adding noise to all examples as has been done in several previous works. We conduct comprehensive evaluations on two image datasets with state-of-the-art poisoning attacks under different realistic attack scenarios. Our empirical results show that HINT can efficiently protect deep learning models against the effect of both untargeted and targeted poisoning attacks.
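As a rough illustration of "healthy" noise, the sketch below takes a gradient step on the input in the direction that decreases the training loss (the opposite of an adversarial perturbation) and applies it only to a selected influential subset of an image batch. The subset-selection rule, step size, and perturbation form are illustrative assumptions, not the paper's exact HINT procedure.

```python
# Minimal sketch, assuming image batches of shape (B, C, H, W) and a boolean
# mask marking the influential subset chosen via influence functions.
import torch
import torch.nn.functional as F

def healthy_noise(model, x, y, step_size=2.0 / 255):
    """Craft a loss-decreasing perturbation for a batch of training images."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (-step_size * grad.sign()).detach()  # descend, rather than ascend, the loss

def apply_hint(model, x_batch, y_batch, influential_mask):
    """Add healthy noise only to the examples flagged as influential."""
    noise = healthy_noise(model, x_batch, y_batch)
    return x_batch + influential_mask.view(-1, 1, 1, 1).float() * noise
```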
Abstract: Both fair machine learning and adversarial learning have been extensively studied. However, attacking fair machine learning models has received less attention. In this paper, we present a framework that effectively generates poisoning samples to attack both model accuracy and algorithmic fairness. Our attacking framework can target fair machine learning models trained with a variety of group-based fairness notions such as demographic parity and equalized odds. We develop three online attacks: adversarial sampling, adversarial labeling, and adversarial feature modification. All three attacks effectively and efficiently produce poisoning samples via sampling, labeling, or modifying a fraction of training data in order to reduce the test accuracy. Our framework enables attackers to flexibly adjust the attack's focus between prediction accuracy and fairness and to accurately quantify the impact of each candidate point on both accuracy loss and fairness violation, thus producing effective poisoning samples. Experiments on two real datasets demonstrate the effectiveness and efficiency of our framework.
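The sketch below illustrates the point-selection idea behind the adjustable accuracy/fairness focus: candidate poisoning points are ranked by a weighted combination of their estimated impact on accuracy loss and on fairness violation, and the top fraction within the budget is kept. The impact estimates are assumed to be given, and the linear trade-off with `lam` is an illustrative assumption rather than the framework's exact objective.

```python
# Minimal sketch of ranking candidate poisoning points by a weighted accuracy/fairness impact.
import numpy as np

def select_poison_points(acc_impact, fair_impact, budget, lam=0.5):
    """acc_impact[i] / fair_impact[i]: estimated increase in accuracy loss and in
    fairness violation (e.g., demographic parity gap) if candidate i is injected.
    `lam` shifts the attack's focus between accuracy (lam -> 0) and fairness (lam -> 1)."""
    scores = (1 - lam) * np.asarray(acc_impact) + lam * np.asarray(fair_impact)
    return np.argsort(scores)[::-1][:budget]
```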