Rajendra Ugrani

Iterative Data Programming for Expanding Text Classification Corpora

Feb 04, 2020

Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta

Abstract: Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data sets quickly via a general framework for building weak models, also known as labeling functions, and denoising them through ensemble learning techniques. We present a fast, simple data programming method for augmenting text data sets by generating neighborhood-based weak models with minimal supervision. Furthermore, our method employs an iterative procedure to identify sparsely distributed examples from large volumes of unlabeled data. The iterative data programming techniques improve newer weak models as more labeled data is confirmed with a human in the loop. We show empirical results on sentence classification tasks, including those from a task of improving intent recognition in conversational agents.

* 6 pages, 2 figures, In Proceedings of the AAAI Conference on Artificial Intelligence 2020 (IAAI Technical Track: Emerging Papers)
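
The loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard token overlap used as the neighborhood measure, the `threshold` value, the plain majority vote used in place of the ensemble denoising step, and the auto-accepting `confirm` callback standing in for the human reviewer are all assumptions made for illustration.

```python
from collections import Counter

ABSTAIN = None  # a weak model returns this when it has no opinion


def tokens(text):
    """Lowercased whitespace tokens as a set (assumed text representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_neighborhood_lf(seed_text, seed_label, threshold=0.4):
    """Build one weak model: vote seed_label for texts near the seed, else abstain."""
    seed_tokens = tokens(seed_text)

    def lf(text):
        if jaccard(seed_tokens, tokens(text)) >= threshold:
            return seed_label
        return ABSTAIN

    return lf


def majority_vote(lfs, text):
    """Combine weak votes by simple majority; abstain on no votes or a tie."""
    votes = Counter(v for v in (lf(text) for lf in lfs) if v is not ABSTAIN)
    ranked = votes.most_common(2)
    if not ranked:
        return ABSTAIN
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def expand(seeds, unlabeled, confirm, rounds=3, threshold=0.4):
    """Iteratively grow the labeled set; confirmed examples seed new weak models."""
    labeled = dict(seeds)
    pool = list(unlabeled)
    for _ in range(rounds):
        lfs = [make_neighborhood_lf(t, y, threshold) for t, y in labeled.items()]
        newly = {}
        for text in pool:
            guess = majority_vote(lfs, text)
            if guess is not ABSTAIN and confirm(text, guess):
                newly[text] = guess
        if not newly:
            break  # nothing new was confirmed, so further rounds cannot help
        labeled.update(newly)
        pool = [t for t in pool if t not in newly]
    return labeled


# Hypothetical toy data; confirm always accepts, simulating a human reviewer.
seeds = {"reset my password": "password", "check my account balance": "balance"}
unlabeled = [
    "please reset my password",
    "reset the password please now",
    "what is my account balance",
    "order a pizza",
]
result = expand(seeds, unlabeled, confirm=lambda text, label: True)
one_round = expand(seeds, unlabeled, confirm=lambda text, label: True, rounds=1)
```

In this toy run, "reset the password please now" is too far from either seed to be labeled in the first round, but it falls within the neighborhood of "please reset my password" once that example has been confirmed. This is the sense in which later weak models can reach sparsely distributed examples that the initial seeds miss.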

Bootstrapping Conversational Agents With Weak Supervision

Dec 14, 2018

Neil Mallinar, Abhishek Shah, Rajendra Ugrani, Ayush Gupta, Manikandan Gurusankar, Tin Kam Ho, Q. Vera Liao, Yunfeng Zhang, Rachel K. E. Bellamy, Robert Yates (+2 more)

Abstract: Many conversational agents in the market today follow a standard bot development framework which requires training intent classifiers to recognize user input. The need to create a proper set of training examples is often the bottleneck in the development process. On many occasions agent developers have access to historical chat logs that can provide a good quantity as well as coverage of training examples. However, the cost of labeling them with tens to hundreds of intents often prohibits taking full advantage of these chat logs. In this paper, we present a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision. The framework reduces hours to days of labeling effort down to minutes of work by using a search engine to find examples, then relies on a data programming approach to automatically expand the labels. We report on a user study that shows positive user feedback for this new approach to building conversational agents, and demonstrates the effectiveness of using data programming for auto-labeling. While the system is developed for training conversational agents, the framework has broader application in significantly reducing labeling effort for training text classifiers.

* 6 pages, 3 figures, 1 table, Accepted for publication in IAAI 2019
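
The three stages named in this abstract can be sketched on a toy chat log. This is an illustrative sketch only, not the SLP system: the inverted index with token-count ranking stands in for the search engine, the developer's labeling step is simulated by accepting every hit, and a nearest-labeled-neighbor rule with a hypothetical `min_shared` cutoff stands in for the data programming step.

```python
from collections import defaultdict


def build_index(logs):
    """Search stage, part 1: map each lowercase token to the logs containing it."""
    index = defaultdict(set)
    for i, utterance in enumerate(logs):
        for tok in utterance.lower().split():
            index[tok].add(i)
    return index


def search(index, query, top_k=5):
    """Search stage, part 2: rank logs by how many distinct query tokens they contain."""
    hits = defaultdict(int)
    for tok in set(query.lower().split()):
        for i in index.get(tok, ()):
            hits[i] += 1
    ranked = sorted(hits, key=lambda i: (-hits[i], i))  # ties broken by position
    return ranked[:top_k]


def propagate(logs, labeled, min_shared=2):
    """Propagate stage: copy the intent of the labeled log sharing the most tokens.

    A log is labeled only if it shares at least min_shared tokens with
    some labeled log; otherwise it is left unlabeled.
    """
    out = dict(labeled)
    for i, utterance in enumerate(logs):
        if i in labeled:
            continue
        toks = set(utterance.lower().split())
        best_intent, best_overlap = None, 0
        for j, intent in labeled.items():
            overlap = len(toks & set(logs[j].lower().split()))
            if overlap > best_overlap:
                best_intent, best_overlap = intent, overlap
        if best_overlap >= min_shared:
            out[i] = best_intent
    return out


# Hypothetical chat log excerpts.
logs = [
    "i want to cancel my order",
    "cancel my order please",
    "where is my refund",
    "how do i cancel an order i placed",
    "hello there",
]
index = build_index(logs)
hits = search(index, "cancel order", top_k=2)
# Label stage: here every hit is accepted, simulating the developer's review.
labeled = {i: "cancel_order" for i in hits}
intents = propagate(logs, labeled)
```

Here the search returns the first two utterances, the simulated developer labels both as `cancel_order`, and propagation extends that intent to the fourth utterance while leaving the refund question and the greeting unlabeled.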
