Steve DeNeefe

Domain adapted machine translation: What does catastrophic forgetting forget and why?

Dec 23, 2024

Danielle Saunders, Steve DeNeefe

Abstract: Neural Machine Translation (NMT) models can be specialized by domain adaptation, often involving fine-tuning on a dataset of interest. This process risks catastrophic forgetting: rapid loss of generic translation quality. Forgetting has been widely observed, with many mitigation methods proposed. However, the causes of forgetting and the relationship between forgetting and adaptation data are under-explored. This paper takes a novel approach to understanding catastrophic forgetting during NMT adaptation by investigating the impact of the data. We provide a first investigation of what is forgotten, and why. We examine the relationship between forgetting and the in-domain data, and show that the amount and type of forgetting is linked to that data's target vocabulary coverage. Our findings pave the way toward better informed NMT domain adaptation.

* EMNLP 2024
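
The abstract links forgetting to the in-domain data's target vocabulary coverage but does not define the measure here. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of the generic target-side vocabulary that also appears in the in-domain target data. The function names, the whitespace tokenization, and the toy sentences are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set


def vocabulary(sentences: Iterable[str]) -> Set[str]:
    """Collect the set of lowercased, whitespace-separated tokens."""
    return {tok for s in sentences for tok in s.lower().split()}


def target_vocab_coverage(
    generic_targets: Iterable[str], in_domain_targets: Iterable[str]
) -> float:
    """Assumed definition: share of generic target types seen in the in-domain targets."""
    generic_vocab = vocabulary(generic_targets)
    if not generic_vocab:
        return 0.0  # avoid division by zero on empty input
    in_domain_vocab = vocabulary(in_domain_targets)
    return len(generic_vocab & in_domain_vocab) / len(generic_vocab)


# Toy target-side sentences (invented for illustration).
generic = ["the cat sat on the mat", "a dog ran"]
in_domain = ["the patient sat", "a dose"]

coverage = target_vocab_coverage(generic, in_domain)
print(f"target vocabulary coverage: {coverage:.3f}")
```

Under this assumed definition, a low value means most generic target words never occur in the adaptation data. A real system would likely measure coverage over subword units rather than whitespace tokens.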

AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature

Feb 13, 2023

Melissa Roemmele, Kyle Shaffer, Katrina Olsen, Yiyi Wang, Steve DeNeefe

Abstract: Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.

* Accepted at EACL 2023

AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents

Mar 05, 2021

Melissa Roemmele, Deep Sidhpura, Steve DeNeefe, Ling Tsou

Abstract: One strategy for facilitating reading comprehension is to present information in a question-and-answer format. We demo a system that integrates the tasks of question answering (QA) and question generation (QG) in order to produce Q&A items that convey the content of multi-paragraph documents. We report some experiments for QA and QG that yield improvements on both tasks, and assess how they interact to produce a list of Q&A items for a text. The demo is accessible at qna.sdl.com.

* Accepted at demo track of EACL 2021
