Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mukund Sridhar

Low-Resource Compositional Semantic Parsing with Concept Pretraining

Jan 30, 2023

Subendhu Rongali, Mukund Sridhar, Haidar Khan, Konstantine Arkoudas, Wael Hamza, Andrew McCallum

Abstract:Semantic parsing plays a key role in digital voice assistants such as Alexa, Siri, and Google Assistant by mapping natural language to structured meaning representations. When we want to improve the capabilities of a voice assistant by adding a new domain, the underlying semantic parsing model needs to be retrained using thousands of annotated examples from the new domain, which is time-consuming and expensive. In this work, we present an architecture to perform such domain adaptation automatically, with only a small amount of metadata about the new domain and without any new training data (zero-shot) or with very few examples (few-shot). We use a base seq2seq (sequence-to-sequence) architecture and augment it with a concept encoder that encodes intent and slot tags from the new domain. We also introduce a novel decoder-focused approach to pretrain seq2seq models to be concept aware using Wikidata and use it to help our model learn important concepts and perform well in low-resource settings. We report few-shot and zero-shot results for compositional semantic parsing on the TOPv2 dataset and show that our model outperforms prior approaches in few-shot settings for the TOPv2 and SNIPS datasets.

* EACL 2023

Via

Access Paper or Ask Questions

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Aug 03, 2022

Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky(+6 more)

Figure 1 for AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Figure 2 for AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Figure 3 for AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Figure 4 for AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Abstract:In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming a much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on Flores-101 dataset. We also show in zero-shot setting, AlexaTM 20B outperforms GPT3 (175B) on SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.

Via

Access Paper or Ask Questions

Instilling Type Knowledge in Language Models via Multi-Task QA

Apr 28, 2022

Shuyang Li, Mukund Sridhar, Chandana Satya Prakash, Jin Cao, Wael Hamza, Julian McAuley

Figure 1 for Instilling Type Knowledge in Language Models via Multi-Task QA

Figure 2 for Instilling Type Knowledge in Language Models via Multi-Task QA

Figure 3 for Instilling Type Knowledge in Language Models via Multi-Task QA

Figure 4 for Instilling Type Knowledge in Language Models via Multi-Task QA

Abstract:Understanding human language often necessitates understanding entities and their place in a taxonomy of knowledge -- their types. Previous methods to learn entity types rely on training classifiers on datasets with coarse, noisy, and incomplete labels. We introduce a method to instill fine-grained type knowledge in language models with text-to-text pre-training on type-centric questions leveraging knowledge base documents and knowledge graphs. We create the WikiWiki dataset: entities and passages from 10M Wikipedia articles linked to the Wikidata knowledge graph with 41K types. Models trained on WikiWiki achieve state-of-the-art performance in zero-shot dialog state tracking benchmarks, accurately infer entity types in Wikipedia articles, and can discover new types deemed useful by human judges.

* Findings of NAACL 2022; dataset link: https://github.com/amazon-research/wikiwiki-dataset

Via

Access Paper or Ask Questions

Towards Realistic Single-Task Continuous Learning Research for NER

Oct 27, 2021

Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, Rahul Gupta

Figure 1 for Towards Realistic Single-Task Continuous Learning Research for NER

Figure 2 for Towards Realistic Single-Task Continuous Learning Research for NER

Figure 3 for Towards Realistic Single-Task Continuous Learning Research for NER

Figure 4 for Towards Realistic Single-Task Continuous Learning Research for NER

Abstract:There is an increasing interest in continuous learning (CL), as data privacy is becoming a priority for real-world machine learning applications. Meanwhile, there is still a lack of academic NLP benchmarks that are applicable for realistic CL settings, which is a major challenge for the advancement of the field. In this paper we discuss some of the unrealistic data characteristics of public datasets, study the challenges of realistic single-task continuous learning as well as the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it along with the code to the research community.

* 11 pages, 2 figures, Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) (short paper), November 2021

Via

Access Paper or Ask Questions

Zero-shot Generalization in Dialog State Tracking through Generative Question Answering

Jan 20, 2021

Shuyang Li, Jin Cao, Mukund Sridhar, Henghui Zhu, Shang-Wen Li, Wael Hamza, Julian McAuley

Figure 1 for Zero-shot Generalization in Dialog State Tracking through Generative Question Answering

Figure 2 for Zero-shot Generalization in Dialog State Tracking through Generative Question Answering

Figure 3 for Zero-shot Generalization in Dialog State Tracking through Generative Question Answering

Figure 4 for Zero-shot Generalization in Dialog State Tracking through Generative Question Answering

Abstract:Dialog State Tracking (DST), an integral part of modern dialog systems, aims to track user preferences and constraints (slots) in task-oriented dialogs. In real-world settings with constantly changing services, DST systems must generalize to new domains and unseen slot types. Existing methods for DST do not generalize well to new slot names and many require known ontologies of slot types and values for inference. We introduce a novel ontology-free framework that supports natural language queries for unseen constraints and slots in multi-domain task-oriented dialogs. Our approach is based on generative question-answering using a conditional language model pre-trained on substantive English sentences. Our model improves joint goal accuracy in zero-shot domain adaptation settings by up to 9% (absolute) over the previous state-of-the-art on the MultiWOZ 2.1 dataset.

* Accepted as a Long Paper at EACL 2021

Via

Access Paper or Ask Questions

Automatic Discovery of Novel Intents & Domains from Text Utterances

May 22, 2020

Nikhita Vedula, Rahul Gupta, Aman Alok, Mukund Sridhar

Figure 1 for Automatic Discovery of Novel Intents & Domains from Text Utterances

Figure 2 for Automatic Discovery of Novel Intents & Domains from Text Utterances

Figure 3 for Automatic Discovery of Novel Intents & Domains from Text Utterances

Figure 4 for Automatic Discovery of Novel Intents & Domains from Text Utterances

Abstract:One of the primary tasks in Natural Language Understanding (NLU) is to recognize the intents as well as domains of users' spoken and written language utterances. Most existing research formulates this as a supervised classification problem with a closed-world assumption, i.e. the domains or intents to be identified are pre-defined or known beforehand. Real-world applications however increasingly encounter dynamic, rapidly evolving environments with newly emerging intents and domains, about which no information is known during model training. We propose a novel framework, ADVIN, to automatically discover novel domains and intents from large volumes of unlabeled data. We first employ an open classification model to identify all utterances potentially consisting of a novel intent. Next, we build a knowledge transfer component with a pairwise margin loss function. It learns discriminative deep features to group together utterances and discover multiple latent intent categories within them in an unsupervised manner. We finally hierarchically link mutually related intents into domains, forming an intent-domain taxonomy. ADVIN significantly outperforms baselines on three benchmark datasets, and real user utterances from a commercial voice-powered agent.

* 11 pages, 5 figures

Via

Access Paper or Ask Questions