Yi-Ting Yeh

Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

May 25, 2022

Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, Maxine Eskenazi, Jeffrey P. Bigham

Abstract: Instruction tuning is an emergent paradigm in NLP wherein natural language instructions are leveraged with language models to induce zero-shot performance on unseen tasks. Instructions have been shown to enable good performance on unseen tasks and datasets in both large and small language models. Dialogue is an especially interesting area in which to explore instruction tuning because dialogue systems perform multiple kinds of language-related tasks (e.g., natural language understanding and generation, domain-specific interaction), yet instruction tuning has not been systematically explored for dialogue-related tasks. We introduce InstructDial, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets. Next, we explore the cross-task generalization ability of models tuned on InstructDial across diverse dialogue tasks. Our analysis reveals that InstructDial enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting. To ensure that models adhere to instructions, we introduce novel meta-tasks. We establish benchmark zero-shot and few-shot performance of models trained using the proposed framework on multiple dialogue tasks.
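
As a rough illustration of what a "unified text-to-text format" can look like, the sketch below casts a dialogue task as an instruction-prefixed input string paired with a target string. The field labels (`Instruction:`, `Dialogue:`, `Options:`), the `[EOT]` turn separator, and the function name are hypothetical choices made for this example; they are not taken from InstructDial.

```python
def build_text_to_text_example(instruction, turns, target, options=None):
    """Cast one dialogue task instance as a (source, target) string pair.

    All field labels and the "[EOT]" turn separator are illustrative
    assumptions, not the format used by any particular framework.
    """
    parts = ["Instruction: " + instruction,
             "Dialogue: " + " [EOT] ".join(turns)]
    if options:
        # Classification-style tasks can list the candidate labels in the input.
        parts.append("Options: " + " || ".join(options))
    source = "\n".join(parts)
    return source, target


# Intent detection phrased as text-to-text (made-up example data).
source, target = build_text_to_text_example(
    instruction="Select the intent of the last user turn.",
    turns=["Hi, I need a table for two tonight.", "Sure, what time?"],
    target="book_restaurant",
    options=["book_restaurant", "book_taxi"],
)
print(source)
print("Target:", target)
```

Because every task shares the same string-in, string-out shape, a single sequence-to-sequence model can in principle be tuned on many tasks at once and prompted with an unseen instruction at test time.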

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

Mar 18, 2022

Shikib Mehri, Jinho Choi, Luis Fernando D'Haro, Jan Deriu, Maxine Eskenazi, Milica Gasic, Kallirroi Georgila, Dilek Hakkani-Tur, Zekang Li, Verena Rieser(+6 more)

Abstract: This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog. The workshop explored the current state of the art along with its limitations and suggested promising directions for future work in this important and very rapidly changing area of research.

* Report from the NSF AED Workshop (http://dialrc.org/AED/)

Breaking Down Multilingual Machine Translation

Oct 15, 2021

Ting-Rui Chiang, Yi-Pei Chen, Yi-Ting Yeh, Graham Neubig

Abstract: While multilingual training is now an essential ingredient in machine translation (MT) systems, recent work has demonstrated that it has different effects in different multilingual settings, such as many-to-one, one-to-many, and many-to-many learning. These training settings expose the encoder and the decoder in a machine translation model to different data distributions. In this paper, we examine how different varieties of multilingual training contribute to learning these two components of the MT model. Specifically, we compare bilingual models with encoders and/or decoders initialized by multilingual training. We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs). We further find the important attention heads for each language pair and compare their correlations during inference. Our analysis sheds light on how multilingual translation models work and also enables us to propose methods to improve performance by training with highly related languages. Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al. (2019).

Are you doing what I say? On modalities alignment in ALFRED

Oct 12, 2021

Ting-Rui Chiang, Yi-Ting Yeh, Ta-Chung Chi, Yau-Shian Wang

Abstract: ALFRED is a recently proposed benchmark that requires a model to complete tasks in simulated house environments specified by instructions in natural language. We hypothesize that the key to success is accurately aligning the text modality with visual inputs. Motivated by this, we inspect how well existing models can align these modalities using our proposed intrinsic metric, boundary adherence score (BAS). The results show that previous models indeed fail to perform proper alignment. To address this issue, we introduce approaches aimed at improving model alignment and demonstrate how improved alignment improves end task performance.

* Accepted by Novel Ideas in Learning-to-Learn through Interaction at EMNLP 2021

Improving Dialogue State Tracking by Joint Slot Modeling

Sep 29, 2021

Ting-Rui Chiang, Yi-Ting Yeh

Abstract: Dialogue state tracking models play an important role in a task-oriented dialogue system. However, most of them model the slot types as conditionally independent given the input. We discover that this may cause the model to be confused by slot types that share the same data type. To mitigate this issue, we propose TripPy-MRF and TripPy-LSTM, which model the slots jointly. Our results show that they are able to alleviate the confusion mentioned above, and they push the state of the art on the MultiWoZ 2.1 dataset from 58.7 to 61.3. Our implementation is available at https://github.com/CTinRay/Trippy-Joint.

* Accepted to the 3rd Workshop on NLP for ConvAI in EMNLP 2021

A Comprehensive Assessment of Dialog Evaluation Metrics

Jun 30, 2021

Yi-Ting Yeh, Maxine Eskenazi, Shikib Mehri

Abstract: Automatic evaluation metrics are a crucial component of dialog systems research. Standard language evaluation metrics are known to be ineffective for evaluating dialog. As such, recent research has proposed a number of novel, dialog-specific metrics that correlate better with human judgements. Due to the fast pace of research, many of these metrics have been assessed on different datasets and there has as yet been no time for a systematic comparison between them. To this end, this paper provides a comprehensive assessment of recently proposed dialog evaluation metrics on a number of datasets. In this paper, 17 different automatic evaluation metrics are evaluated on 10 different datasets. Furthermore, the metrics are assessed in different settings, to better qualify their respective strengths and weaknesses. Metrics are assessed (1) on both the turn level and the dialog level, (2) for different dialog lengths, (3) for different dialog qualities (e.g., coherence, engagingness), (4) for different types of response generation models (i.e., generative, retrieval, simple models and state-of-the-art models), (5) taking into account the similarity of different metrics and (6) exploring combinations of different metrics. This comprehensive assessment offers several takeaways pertaining to dialog evaluation metrics in general. It also suggests how to best assess evaluation metrics and indicates promising directions for future work.
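
One common way to check how well an automatic metric "correlates with human judgements" is a rank correlation between metric scores and human ratings over the same responses. The sketch below computes Spearman's rho in plain Python. The choice of Spearman, the function names, and all scores are assumptions made for illustration; the abstract does not specify which correlation measure the paper uses.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up human ratings and scores from two hypothetical metrics.
human = [1, 2, 3, 4, 5]
metric_a = [0.1, 0.3, 0.2, 0.8, 0.9]
metric_b = [0.9, 0.1, 0.5, 0.2, 0.4]

print(round(spearman(human, metric_a), 3))  # 0.9
print(round(spearman(human, metric_b), 3))  # -0.3
```

On this toy data, the first hypothetical metric ranks responses almost the same way the human ratings do, while the second does not; a comparison across datasets repeats this kind of measurement for each metric and each annotated quality.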

QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization

Aug 31, 2019

Yi-Ting Yeh, Yun-Nung Chen

Abstract: Standard accuracy metrics indicate that modern reading comprehension systems have achieved strong performance on many question answering datasets. However, the extent to which these systems truly understand language remains unknown, and existing systems are not good at distinguishing distractor sentences, which look related but do not actually answer the question. To address this problem, we propose QAInfomax as a regularizer in reading comprehension systems by maximizing mutual information among passages, a question, and its answer. QAInfomax helps regularize the model to not simply learn superficial correlations for answering questions. The experiments show that our proposed QAInfomax achieves state-of-the-art performance on the benchmark Adversarial-SQuAD dataset.

* EMNLP 2019 short paper

FlowDelta: Modeling Flow Information Gain in Reasoning for Conversational Machine Comprehension

Aug 14, 2019

Yi-Ting Yeh, Yun-Nung Chen

Abstract: Conversational machine comprehension requires deep understanding of the dialogue flow, and prior work proposed FlowQA to implicitly model the context representations in reasoning for better understanding. This paper proposes to explicitly model the information gain through dialogue reasoning in order to allow the model to focus on more informative cues. The proposed model achieves state-of-the-art performance on the conversational QA dataset QuAC and the sequential instruction understanding dataset SCONE, which shows the effectiveness of the proposed mechanism and demonstrates its capability of generalization to different QA models and tasks.

* Accepted by 1st Workshop NLP for Conversational AI, ACL 2019 Workshop (Cross-submission)

Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling

Aug 14, 2019

Yi-Ting Yeh, Tzu-Chuan Lin, Hsiao-Hua Cheng, Yu-Hsuan Deng, Shang-Yu Su, Yun-Nung Chen

Abstract: Visual question answering and visual dialogue tasks have been increasingly studied in the multimodal field towards more practical real-world scenarios. A more challenging task, audio visual scene-aware dialogue (AVSD), has been proposed to further advance the technologies that connect audio, vision, and language; it introduces temporal video information and dialogue interactions between a questioner and an answerer. This paper proposes an intuitive mechanism that fuses features and attention in multiple stages in order to integrate multimodal features effectively, and experimental results demonstrate its capability. Also, we apply several state-of-the-art models from other tasks to the AVSD task, and further analyze their generalization across different tasks.

* Accepted for a poster session at the DSTC7 workshop at AAAI 2019

Natural Language Generation by Hierarchical Decoding with Linguistic Patterns

Aug 09, 2018

Shang-Yu Su, Kai-Ling Lo, Yi-Ting Yeh, Yun-Nung Chen

Abstract: Natural language generation (NLG) is a critical component in spoken dialogue systems. Classic NLG can be divided into two phases: (1) sentence planning: deciding on the overall sentence structure, (2) surface realization: determining specific word forms and flattening the sentence structure into a string. Many simple NLG models are based on recurrent neural networks (RNNs) and sequence-to-sequence (seq2seq) models, which basically contain an encoder-decoder structure; these NLG models generate sentences from scratch by jointly optimizing sentence planning and surface realization using a simple cross-entropy loss training criterion. However, the simple encoder-decoder architecture usually struggles to generate complex and long sentences, because the decoder has to learn all grammar and diction knowledge. This paper introduces a hierarchical decoding NLG model based on linguistic patterns at different levels, and shows that the proposed method outperforms the traditional one with a smaller model size. Furthermore, the design of the hierarchical decoding is flexible and easily extensible to various NLG systems.

* Published in NAACL-HLT 2018; the first two authors contributed equally
