Rashmi Gangadharaiah

Constrained Decoding with Speculative Lookaheads

Dec 09, 2024

Nishanth Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah

Abstract: Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token make CDLH prohibitively expensive, resulting in low adoption in practice. In contrast, common decoding strategies such as greedy decoding are extremely efficient but achieve very low constraint satisfaction. We propose constrained decoding with speculative lookaheads (CDSL), a technique that significantly improves upon the inference efficiency of CDLH without the drastic performance reduction seen with greedy decoding. CDSL is motivated by the recently proposed idea of speculative decoding, which uses a much smaller draft LLM for generation and a larger target LLM for verification. In CDSL, the draft model is used to generate lookaheads, which are verified by a combination of the target LLM and task-specific reward functions. This process accelerates decoding by reducing the computational burden while maintaining strong performance. We evaluate CDSL on two constrained decoding tasks with three LLM families and achieve 2.2x to 12.15x speedups over CDLH without significant performance reduction.

* Under submission
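
The CDSL abstract describes a control loop: a cheap draft model proposes a lookahead, and a combination of the target LLM and a task reward decides whether to keep it. Below is a minimal toy sketch of that loop. It assumes that the target and reward scores are normalized to [0, 1] and mixed with a weight `alpha`, that an accepted lookahead is committed in full, and that a rejected one falls back to a slower CDLH-style step. The names `cdsl_decode`, `draft_rollout`, `target_score`, `reward`, and `fallback_step` are hypothetical stand-ins, not the paper's implementation, and real models would replace the lambdas.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def cdsl_decode(
    prefix: Tokens,
    draft_rollout: Callable[[Tokens, int], Tokens],  # cheap draft model proposes k tokens
    target_score: Callable[[Tokens], float],  # larger target model, score assumed in [0, 1]
    reward: Callable[[Tokens], float],  # task-specific constraint reward, assumed in [0, 1]
    fallback_step: Callable[[Tokens], Tokens],  # hypothetical slow CDLH-style step
    max_len: int,
    k: int = 3,
    alpha: float = 0.5,
    threshold: float = 0.5,
) -> Tuple[Tokens, int]:
    """Toy speculative-lookahead decoding loop.

    Returns the generated tokens and the number of times the slow path was used.
    """
    out = list(prefix)
    n_fallback = 0
    while len(out) < max_len:
        lookahead = draft_rollout(out, k)
        candidate = out + lookahead
        # Verify the draft lookahead with a weighted mix of target score and reward.
        score = alpha * target_score(candidate) + (1 - alpha) * reward(candidate)
        if lookahead and score >= threshold:
            out.extend(lookahead)  # accept the whole speculative lookahead
            continue
        step = fallback_step(out)  # rejected: pay for the expensive path
        n_fallback += 1
        if not step:
            break  # nothing more to generate
        out.extend(step)
    return out[:max_len], n_fallback


# Toy usage: the constraint is "the output must mention 'cat'".
tokens, fallbacks = cdsl_decode(
    ["the"],
    draft_rollout=lambda p, k: ["a"] * k,
    target_score=lambda c: 0.8,
    reward=lambda c: 1.0 if "cat" in c else 0.0,
    fallback_step=lambda p: ["cat"],
    max_len=5,
)
print(tokens, fallbacks)  # ['the', 'cat', 'a', 'a', 'a'] 1
```

The fallback counter is the quantity a speedup hinges on: every accepted lookahead is a step in which the expensive roll-outs of CDLH were skipped.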

User Persona Identification and New Service Adaptation Recommendation

Nov 15, 2023

Narges Tabari, Sandesh Swamy, Rashmi Gangadharaiah

Abstract: Providing a personalized user experience on information-dense webpages helps users reach their end goals sooner. We explore an automated approach to identifying user personas by leveraging high-dimensional trajectory information from user sessions on webpages. While neural collaborative filtering (NCF) approaches pay little attention to token semantics, our method introduces SessionBERT, a Transformer-backed language model trained from scratch on the masked language modeling (MLM) objective over user trajectories (pages, metadata, and billing in a session) to capture the semantics within them. Our results show that representations learned through SessionBERT consistently outperform a BERT-base model, providing 3% and 1% relative improvements in F1-score for predicting page links and next services, respectively. We extend SessionBERT to provide top-5 recommendations for the next most relevant services a user is likely to use, and our recommendation model achieves a HIT@5 of 58%.

* 6 pages
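
HIT@5, reported in the abstract, is the fraction of cases in which the true next service appears among the model's top five recommendations. A small sketch of that metric follows; the function name `hit_at_k` and the service IDs are hypothetical.

```python
from typing import Hashable, Sequence


def hit_at_k(
    recommendations: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int = 5,
) -> float:
    """Fraction of cases whose true next item appears in the top-k recommendations."""
    if len(recommendations) != len(targets):
        raise ValueError("need one ranked list per target")
    if not targets:
        return 0.0
    hits = sum(1 for ranked, t in zip(recommendations, targets) if t in ranked[:k])
    return hits / len(targets)


# Two sessions with hypothetical service IDs: the first target sits at rank 6, the second at rank 5.
recs = [["a", "b", "c", "d", "e", "f"], ["b", "c", "d", "e", "f", "a"]]
print(hit_at_k(recs, ["f", "f"]))  # 0.5
```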

Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA

Nov 14, 2023

Dhruv Agarwal, Rajarshi Das, Sopan Khosla, Rashmi Gangadharaiah

Abstract: We present BYOKG, a universal question-answering (QA) system that can operate on any knowledge graph (KG), requires no human-annotated training data, and can be ready to use within a day, attributes that are out of scope for current KGQA systems. BYOKG draws inspiration from the remarkable ability of humans to comprehend information present in an unseen KG through exploration: starting at random nodes, inspecting the labels of adjacent nodes and edges, and combining them with their prior world knowledge. In BYOKG, exploration leverages an LLM-backed symbolic agent that generates a diverse set of query-program exemplars, which are then used to ground a retrieval-augmented reasoning procedure that predicts programs for arbitrary questions. BYOKG is effective over both small- and large-scale graphs, showing dramatic gains in QA accuracy over a zero-shot baseline of 27.89 and 58.02 F1 on GrailQA and MetaQA, respectively. On GrailQA, we further show that our unsupervised BYOKG outperforms a supervised in-context learning method, demonstrating the effectiveness of exploration. Lastly, we find that the performance of BYOKG reliably improves with continued exploration as well as with improvements in the base LLM, notably outperforming a state-of-the-art fine-tuned model by 7.08 F1 on a sub-sampled zero-shot split of GrailQA.

Document-Level Supervision for Multi-Aspect Sentiment Analysis Without Fine-grained Labels

Oct 10, 2023

Kasturi Bhattacharjee, Rashmi Gangadharaiah

Abstract: Aspect-based sentiment analysis (ABSA) is a widely studied topic, and ABSA models are most often trained with supervision from human annotations of opinionated texts. These fine-grained annotations include identifying the aspects toward which a user expresses sentiment, together with their associated polarities (aspect-based sentiments). Such fine-grained annotations can be expensive and often infeasible to obtain in real-world settings. There is, however, an abundance of scenarios where user-generated text carries an overall sentiment, such as a 1-5 rating in user reviews or user-generated feedback, which may be leveraged for this task. In this paper, we propose a VAE-based topic modeling approach that performs ABSA using document-level supervision, without requiring fine-grained labels for either aspects or sentiments. Our approach allows for the detection of multiple aspects in a document, making it possible to reason about how the sentiments expressed toward multiple aspects combine to form the observable overall document-level sentiment. We demonstrate results on two benchmark datasets from two different domains, significantly outperforming a state-of-the-art baseline.

* 9 pages, 1 figure

Contextual Dynamic Prompting for Response Generation in Task-oriented Dialog Systems

Feb 10, 2023

Sandesh Swamy, Narges Tabari, Chacha Chen, Rashmi Gangadharaiah

Abstract: Response generation is one of the critical components of task-oriented dialog systems. Existing studies have shown that large pre-trained language models can be adapted to this task. The typical paradigm for adapting such extremely large language models is fine-tuning on the downstream tasks, which is not only time-consuming but also requires significant resources and access to fine-tuning data. Prompting (Schick and Schütze, 2020) has been an alternative to fine-tuning in many NLP tasks. In our work, we explore the idea of using prompting for response generation in task-oriented dialog systems. Specifically, we propose an approach that performs contextual dynamic prompting, where the prompts are learnt from dialog contexts. We aim to distill useful prompting signals from the dialog context. In experiments on the MultiWOZ 2.2 dataset (Zang et al., 2020), we show that contextual dynamic prompts improve response generation in terms of combined score (Mehri et al., 2019) by 3 absolute points, and by a massive 20 points when dialog states are incorporated. Furthermore, human annotation of these conversations found that agents which incorporate context were preferred over agents with vanilla prefix-tuning.

* Accepted at EACL 2023 main conference. (Camera-ready version)

Privacy Adhering Machine Un-learning in NLP

Dec 19, 2022

Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth

Abstract: Regulations such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US include provisions on the "right to be forgotten" that mandate industry applications to remove data related to an individual from their systems. In several real-world industry applications that use Machine Learning to build models on user data, such mandates require significant effort in terms of both data cleansing and model retraining, while ensuring that the models do not deteriorate in prediction quality due to the removal of data. As a result, continuous data removal and model retraining do not scale if these applications receive such requests at a very high frequency. Recently, a few researchers proposed the idea of Machine Unlearning to tackle this challenge. Despite the significant importance of this task, Machine Unlearning is under-explored in Natural Language Processing (NLP) tasks. In this paper, we explore the Unlearning framework on various GLUE tasks (Wang et al., 2018), such as QQP, SST, and MNLI. We propose computationally efficient approaches (SISA-FC and SISA-A) to perform guaranteed Unlearning that provide significant reductions in memory (90-95%), time (100x), and space consumption (99%) compared with the baselines while keeping model performance constant.
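
The abstract names SISA-FC and SISA-A but does not define them. The sketch below shows only the generic SISA idea they build on (Sharded, Isolated, Sliced, and Aggregated training): each model trains on a disjoint shard and predictions are aggregated by majority vote, so honoring a deletion request means retraining a single shard rather than the whole ensemble. It is not the paper's method. It omits slicing and checkpointing, assigns examples to shards round-robin, and uses a hypothetical majority-label learner (`train_majority`) as a stand-in for a real NLP model; `ShardedEnsemble` is likewise a hypothetical name.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (feature, label)
Model = Callable[[float], int]


def train_majority(shard: List[Example]) -> Model:
    """Hypothetical stand-in learner: always predict the shard's most common label."""
    label = Counter(y for _, y in shard).most_common(1)[0][0] if shard else 0
    return lambda x: label


class ShardedEnsemble:
    """SISA-style sharding: one model per disjoint shard, majority-vote aggregation."""

    def __init__(self, data: List[Example], num_shards: int,
                 train_fn: Callable[[List[Example]], Model] = train_majority):
        self.train_fn = train_fn
        self.shards: List[List[Tuple[int, Example]]] = [[] for _ in range(num_shards)]
        self.owner: Dict[int, int] = {}  # example id -> shard index
        for i, ex in enumerate(data):
            s = i % num_shards  # round-robin assignment (an assumption)
            self.shards[s].append((i, ex))
            self.owner[i] = s
        self.models = [self._train(s) for s in range(num_shards)]
        self.retrained_shards: List[int] = []

    def _train(self, s: int) -> Model:
        return self.train_fn([ex for _, ex in self.shards[s]])

    def predict(self, x: float) -> int:
        votes = Counter(model(x) for model in self.models)
        return votes.most_common(1)[0][0]

    def unlearn(self, example_id: int) -> None:
        """Delete one example and retrain only the shard that saw it."""
        s = self.owner.pop(example_id)
        self.shards[s] = [(i, ex) for i, ex in self.shards[s] if i != example_id]
        self.models[s] = self._train(s)
        self.retrained_shards.append(s)


labels = [1, 1, 0, 1, 1, 0, 1, 1, 0]
ens = ShardedEnsemble([(float(i), y) for i, y in enumerate(labels)], num_shards=3)
ens.unlearn(4)  # a "right to be forgotten" request for example 4
print(ens.predict(0.0), ens.retrained_shards)  # 1 [1]
```

Because shards are isolated, the deletion touches one shard's data and one model; the other two models are reused unchanged.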

Zero-Shot Learning for Joint Intent and Slot Labeling

Nov 29, 2022

Rashmi Gangadharaiah, Balakrishnan Narayanaswamy

Abstract: It is expensive and difficult to obtain the large number of sentence-level intent and token-level slot label annotations required to train neural network (NN)-based Natural Language Understanding (NLU) components of task-oriented dialog systems, especially for the many real-world tasks that have a large and growing number of intents and slot types. While zero-shot learning approaches that require no labeled examples (only features and auxiliary information) have been proposed only for slot labeling, we show that one can profitably perform joint zero-shot intent classification and slot labeling. We demonstrate the value of capturing dependencies between intents and slots, and between different slots in an utterance, in the zero-shot setting. We describe NN architectures that translate between word and sentence embedding spaces, and demonstrate that these modifications are required to enable zero-shot learning for this task. We show a substantial improvement over strong baselines and explain the intuition behind each architectural modification through visualizations and ablation studies.

Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters

Oct 07, 2022

Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth

Abstract: Research has shown that personality is a key driver of engagement and user experience in conversational systems. Conversational agents should also maintain a consistent persona to have an engaging conversation with a user. However, text generation datasets are often crowdsourced and therefore have an averaging effect, where the style of the generation model is an average of the styles of all the crowd workers who contributed to the dataset. While one could collect persona-specific datasets for each task, doing so would be an expensive and time-consuming annotation effort. In this work, we propose a novel transfer learning framework that updates only 0.3% of model parameters to learn style-specific attributes for response generation. For the purpose of this study, we tackle the problem of stylistic story ending generation using the ROC stories Corpus. We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset. Through extensive experiments and evaluation metrics, we show that our novel training procedure can improve the style generation by 200 over Encoder-Decoder baselines while maintaining on-par content relevance metrics with

Achieving Fluency and Coherency in Task-oriented Dialog

Apr 11, 2018

Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, Charles Elkan

Abstract: We consider real-world task-oriented dialog settings, where agents need to generate both fluent natural language responses and correct external actions like database queries and updates. We demonstrate that, when applied to customer support chat transcripts, Sequence-to-Sequence (Seq2Seq) models often generate short, incoherent, and ungrammatical natural language responses that are dominated by words that occur with high frequency in the training data. These phenomena do not arise in synthetic datasets such as bAbI, where we show Seq2Seq models are nearly perfect. We develop techniques to learn embeddings that succinctly capture relevant information from the dialog history, and demonstrate that nearest neighbor based approaches in this learned neural embedding space generate more fluent responses. However, we see that these methods are not able to accurately predict when to execute an external action. We show how to combine nearest neighbor and Seq2Seq methods in a hybrid model, where nearest neighbor is used to generate fluent responses and Seq2Seq type models ensure dialog coherency and generate accurate external actions. We show that this approach is well suited for customer support scenarios, where agents' responses are typically script-driven and correct external actions are critically important. The hybrid model on the customer support data achieves a 78% relative improvement in fluency scores and a 130% improvement in the accuracy of external calls.

* Workshop on Conversational AI, NIPS 2017, Long Beach, CA, USA
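
The hybrid model in this abstract splits the work: nearest-neighbor retrieval in an embedding space supplies the response text, while a separate model decides on external actions. A minimal sketch of that split follows. It assumes the dialog-history embeddings are already computed (the paper learns them; here they are hand-written 2-D vectors), uses cosine similarity as the similarity measure, and stands in a hypothetical `predict_action` callable for the Seq2Seq component. The stored responses and the `lookup_order` action are made-up toy data.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Memory = Sequence[Tuple[Sequence[float], str]]  # (context embedding, response text)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_response(query: Sequence[float], memory: Memory) -> str:
    """Return the stored response whose context embedding is most similar to the query."""
    if not memory:
        raise ValueError("empty response memory")
    _, response = max(memory, key=lambda pair: cosine(query, pair[0]))
    return response


def hybrid_turn(
    query: Sequence[float],
    memory: Memory,
    predict_action: Callable[[Sequence[float]], Optional[str]],  # hypothetical action model
) -> Tuple[str, Optional[str]]:
    """Fluent text from retrieval; the external action (if any) from a separate model."""
    return nearest_response(query, memory), predict_action(query)


memory = [([1.0, 0.0], "Let me check that order for you."),
          ([0.0, 1.0], "Your refund has been issued.")]
print(hybrid_turn([0.9, 0.1], memory, lambda q: "lookup_order" if q[0] > q[1] else None))
# ('Let me check that order for you.', 'lookup_order')
```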

Exploring the Role of Logically Related Non-Question Phrases for Answering Why-Questions

Mar 29, 2013

Niraj Kumar, Rashmi Gangadharaiah, Kannan Srinathan, Vasudeva Varma

Abstract: In this paper, we show that certain phrases, although not present in a given question/query, play a very important role in answering the question. Exploring the role of such phrases in answering questions not only reduces the dependency on matching question phrases for extracting answers, but also improves the quality of the extracted answers. Here, "matching question phrases" means phrases that co-occur in the given question and the candidate answers. To achieve this goal, we introduce a bigram-based word graph model populated with the semantic and topical relatedness of terms in the given document. Next, we apply an improved version of ranking with a prior-based approach, which ranks all words in the candidate document with respect to a set of root words (i.e., non-stopwords present in both the question and the candidate document). As a result, terms logically related to the root words are scored higher than terms that are not related to them. Experimental results show that our system outperforms the state of the art on the task of answering Why-questions.

* Accepted at NLDB 2013 (Paper ID: 23; title: "Exploring the Role of Logically Related Non-Question Phrases for Answering Why-Questions"); withdrawn
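
The abstract names the ranking step only loosely ("an improved version of ranking with a prior-based approach"). One standard way to rank graph nodes against a set of root words is personalized PageRank, and the sketch below uses it as a stand-in; it may differ from the paper's improved variant. It also builds the bigram graph with uniform edge weights instead of the paper's semantic and topical relatedness scores, and it uses a hypothetical three-word stopword list. The functions `bigram_graph`, `root_words`, and `prior_rank` are illustrative names.

```python
from typing import Dict, Iterable, List, Set


def bigram_graph(tokens: List[str]) -> Dict[str, Set[str]]:
    """Undirected word graph linking each pair of adjacent tokens (uniform weights)."""
    graph: Dict[str, Set[str]] = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def root_words(question: Iterable[str], doc: Iterable[str], stopwords: Set[str]) -> Set[str]:
    """Non-stopwords present in both the question and the candidate document."""
    return (set(question) & set(doc)) - stopwords


def prior_rank(graph: Dict[str, Set[str]], roots: Set[str],
               damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Personalized PageRank whose restart distribution is uniform over the root words."""
    active = {r for r in roots if r in graph}
    if not active:
        raise ValueError("no root word occurs in the document graph")
    prior = {n: (1.0 / len(active) if n in active else 0.0) for n in graph}
    score = dict(prior)
    for _ in range(iters):
        new = {n: (1 - damping) * prior[n] for n in graph}
        for n, nbrs in graph.items():
            if nbrs:
                share = damping * score[n] / len(nbrs)
                for m in nbrs:
                    new[m] += share
            else:  # isolated word: send its mass back to the roots
                for m in active:
                    new[m] += damping * score[n] / len(active)
        score = new
    return score


doc = "heavy rain causes floods floods damage crops the sun shines".split()
question = "why do floods damage crops".split()
roots = root_words(question, doc, stopwords={"why", "do", "the"})
scores = prior_rank(bigram_graph(doc), roots)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Words adjacent to the root words (here "causes") end up scoring higher than words several hops away (here "shines"), which is the behavior the abstract describes for logically related terms.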
