
Rachneet Sachdeva

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions

Jan 03, 2025

Rachneet Sachdeva, Rima Hazra, Iryna Gurevych


Abstract: Despite significant efforts to align large language models with human values and ethical guidelines, these models remain susceptible to sophisticated jailbreak attacks that exploit their reasoning capabilities. Traditional safety mechanisms often focus on detecting explicit malicious intent, leaving deeper vulnerabilities unaddressed. In this work, we introduce a jailbreak technique, POATE (Polar Opposite query generation, Adversarial Template construction, and Elaboration), which leverages contrastive reasoning to elicit unethical responses. POATE generates prompts with semantically opposite intents and combines them with adversarial templates to subtly direct models toward producing harmful responses. We conduct extensive evaluations across six diverse language model families of varying parameter sizes, including LLaMA3, Gemma2, Phi3, and GPT-4, to demonstrate the robustness of the attack, achieving significantly higher attack success rates (~44%) compared to existing methods. We evaluate our proposed attack against seven safety defenses, revealing their limitations in addressing reasoning-based vulnerabilities. To counteract this, we propose a defense strategy that improves reasoning robustness through chain-of-thought prompting and reverse thinking, mitigating reasoning-driven adversarial exploits.

* Our code is publicly available at https://github.com/UKPLab/POATE-attack


CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration

Sep 15, 2023

Rachneet Sachdeva, Martin Tutek, Iryna Gurevych


Abstract: In recent years, large language models (LLMs) have shown remarkable capabilities at scale, particularly at generating text conditioned on a prompt. In our work, we investigate the use of LLMs to augment training data of small language models (SLMs) with automatically generated counterfactual (CF) instances (i.e., minimally altered inputs) in order to improve out-of-domain (OOD) performance of SLMs in the extractive question answering (QA) setup. We show that, across various LLM generators, such data augmentation consistently enhances OOD performance and improves model calibration for both confidence-based and rationale-augmented calibrator models. Furthermore, these performance improvements correlate with higher diversity of CF instances in terms of their surface form and semantic content. Finally, we show that CF-augmented models which are easier to calibrate also exhibit much lower entropy when assigning importance, indicating that rationale-augmented calibrators prefer concise explanations.

* We make our code available at: https://github.com/UKPLab/CATfOOD
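The entropy finding in the CATfOOD abstract can be made concrete with a small calculation. If a model's token-importance scores are normalized into a probability distribution, Shannon entropy is low when importance concentrates on a few tokens (a concise explanation) and high when it is spread evenly. The sketch below assumes non-negative importance scores; the function name `attribution_entropy` and the example scores are hypothetical illustrations, not taken from the CATfOOD code.

```python
import math

def attribution_entropy(scores):
    """Shannon entropy (in nats) of non-negative importance scores after normalization.

    Hypothetical helper for illustration; assumes scores are >= 0.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    probs = [s / total for s in scores]
    # Tokens with zero importance contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0)

# Importance concentrated on one of four tokens vs. spread evenly across them.
concise = [0.97, 0.01, 0.01, 0.01]
diffuse = [0.25, 0.25, 0.25, 0.25]

print(round(attribution_entropy(concise), 3))
print(round(attribution_entropy(diffuse), 3))  # uniform over 4 tokens: log(4)
```

The uniform case reaches the maximum possible entropy for four tokens, so any more focused attribution scores strictly lower; this is the sense in which lower entropy corresponds to a more concise explanation.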


Are Emergent Abilities in Large Language Models just In-Context Learning?

Sep 04, 2023

Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych

Abstract: Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities. The emergence of such abilities carries profound implications for the future direction of research in NLP, especially as the deployment of such models becomes more prevalent. However, one key challenge is that the evaluation of these abilities is often confounded by competencies that arise in models through alternative prompting techniques, such as in-context learning and instruction following, which also emerge as the models are scaled up. In this study, we provide the first comprehensive examination of these emergent abilities while accounting for various potentially biasing factors that can influence the evaluation of models. We conduct rigorous tests on a set of 18 models, encompassing a parameter range from 60 million to 175 billion parameters, across a comprehensive set of 22 tasks. Through an extensive series of over 1,000 experiments, we provide compelling evidence that emergent abilities can primarily be ascribed to in-context learning. We find no evidence for the emergence of reasoning abilities, thus providing valuable insights into the underlying mechanisms driving the observed abilities and thus alleviating safety concerns regarding their use.

* Code available at https://github.com/UKPLab/on-emergence and data available at https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3931


UKP-SQuARE v3: A Platform for Multi-Agent QA Research

Mar 31, 2023

Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Hao Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych


Abstract: The continuous development of Question Answering (QA) datasets has drawn the research community's attention toward multi-domain models. A popular approach is to use multi-dataset models, which are models trained on multiple datasets to learn their regularities and prevent overfitting to a single dataset. However, with the proliferation of QA models in online repositories such as GitHub or Hugging Face, an alternative is becoming viable. Recent works have demonstrated that combining expert agents can yield large performance gains over multi-dataset models. To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents. We conduct experiments to evaluate their inference speed and discuss the performance vs. speed trade-off compared to multi-dataset models. UKP-SQuARE is open-source and publicly available at http://square.ukp-lab.de.
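The three multi-agent families differ mainly in where the combination happens. As a rough illustration of the last one, late fusion, the sketch below merges the answer candidates returned by several independent QA agents by averaging each answer's confidence across agents. The agent outputs, the mean-score rule, and the function name `late_fusion` are assumptions for illustration only; the abstract does not say how UKP-SQuARE fuses agent outputs.

```python
from collections import defaultdict

def late_fusion(agent_outputs):
    """Combine per-agent answer scores by averaging each answer's score over all agents.

    agent_outputs: list of dicts mapping answer string -> confidence in [0, 1].
    An agent that did not propose an answer contributes a score of 0 for it.
    Hypothetical fusion rule for illustration, not UKP-SQuARE's implementation.
    """
    totals = defaultdict(float)
    for output in agent_outputs:
        for answer, score in output.items():
            totals[answer] += score
    n_agents = len(agent_outputs)
    fused = {answer: total / n_agents for answer, total in totals.items()}
    # Rank answers by fused score, highest first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical outputs from three expert agents for the same question.
agents = [
    {"Paris": 0.9, "Lyon": 0.1},
    {"Paris": 0.6, "Marseille": 0.4},
    {"Lyon": 0.7, "Paris": 0.3},
]
ranking = late_fusion(agents)
print(ranking[0])
```

Because every agent runs on the full input before anything is combined, late fusion pays the inference cost of all agents, which is one reason the performance vs. speed trade-off matters. Averaging is only one possible rule; max-score or weighted voting would fit the same interface.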


Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5

Oct 31, 2022

Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Aline Villavicencio, Iryna Gurevych

Abstract: We compare sequential fine-tuning with a model for multi-task learning in the context where we are interested in boosting performance on two tasks, one of which depends on the other. We test these models on the FigLang2022 shared task, which requires participants to predict language inference labels on figurative language along with corresponding textual explanations of the inference predictions. Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting. Our findings show that simple sequential fine-tuning of text-to-text models is an extraordinarily powerful method for cross-task knowledge transfer while simultaneously predicting multiple interdependent targets, so much so that our best model achieved the (tied) highest score on the task.

* Accepted for publication in the Proceedings of the Second Workshop on Figurative Language Processing (colocated with EMNLP 2022). Code and models at https://github.com/Rachneet/cross-task-figurative-explanations


UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

Aug 23, 2022

Rachneet Sachdeva, Haritz Puerto, Tim Baumgärtner, Sewin Tariverdian, Hao Zhang, Kexin Wang, Hossain Shaikh Saadi, Leonardo F. R. Ribeiro, Iryna Gurevych


Abstract: Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable the users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available on https://square.ukp-lab.de.
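To show what "the importance of each input token" means in a saliency map, here is a minimal sketch of one simple technique, occlusion (leave-one-out) saliency: remove each token in turn and record how much the model's score drops. The toy keyword scorer stands in for a real QA model, and the names `occlusion_saliency`, `toy_score`, and `KEYWORD_WEIGHTS` are hypothetical; the abstract does not state which saliency methods SQuARE v2 implements (gradient-based methods are also common).

```python
def occlusion_saliency(tokens, score_fn):
    """Importance of each token = drop in score when that token is removed."""
    base = score_fn(tokens)
    saliency = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        saliency.append(base - score_fn(occluded))
    return saliency

# Hypothetical stand-in for a QA model: score = summed weights of keywords it relies on.
KEYWORD_WEIGHTS = {"capital": 0.5, "france": 0.4}

def toy_score(tokens):
    return sum(KEYWORD_WEIGHTS.get(t.lower(), 0.0) for t in tokens)

question = ["What", "is", "the", "capital", "of", "France"]
for token, weight in zip(question, occlusion_saliency(question, toy_score)):
    print(f"{token:>8}: {weight:.2f}")
```

Occlusion needs one extra forward pass per token, which is simple but slow for long inputs; gradient-based saliency avoids that cost at the price of requiring access to model gradients.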


UKP-SQUARE: An Online Platform for Question Answering Research

Mar 28, 2022

Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer (+3 more)


Abstract: Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and setups (e.g., with or without retrieval). Despite having a large number of powerful, specialized QA pipelines (which we refer to as Skills) that consider a single domain, model or setup, there exists no framework where users can easily explore and compare such pipelines and can extend them according to their needs. To address this issue, we present UKP-SQUARE, an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests. In addition, QA researchers can develop, manage, and share their custom Skills using our microservices that support a wide range of models (Transformers, Adapters, ONNX), datastores and retrieval techniques (e.g., sparse and dense). UKP-SQUARE is available on https://square.ukp-lab.de.

* Accepted at ACL 2022 Demo Track
