
Sebastian Martschat

Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction

Aug 05, 2023

Yueling Li, Sebastian Martschat, Simone Paolo Ponzetto

Abstract: We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models. We construct a multi-source, multi-domain corpus and train an end-to-end extraction pipeline. We then apply multi-source task-adaptive pre-training and fine-tuning to benchmark the cross-domain generalization capability of our model. Further, we conceptualize and apply a task-specific error analysis and derive insights for future work. Our results suggest that multi-source training leads to the best overall results, while single-source training yields the best results for the respective individual domain. While our setup is successful at extracting quantity values and units, more research is needed to improve the extraction of contextual entities. We make the cross-domain corpus used in this work available online.

* Published as a workshop paper at BioNLP 2023

A Temporally Sensitive Submodularity Framework for Timeline Summarization

Oct 18, 2018

Sebastian Martschat, Katja Markert

Abstract: Timeline summarization (TLS) creates an overview of long-running events via dated daily summaries for the most important dates. TLS differs from standard multi-document summarization (MDS) in the importance of date selection, in the interdependencies between summaries of different dates, and in having very short summaries compared to the number of corpus documents. However, we show that MDS optimization models using submodular functions can be adapted to yield well-performing TLS models by designing objective functions and constraints that model the temporal dimension inherent in TLS. Importantly, these adaptations retain the elegance and advantages of the original MDS models (clear separation of features and inference, performance guarantees and scalability, little need for supervision) that current TLS-specific models lack. An open-source implementation of the framework and all models described in this paper is available online.

* To appear at CoNLL 2018
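The abstract describes adapting submodular MDS optimization to TLS through temporal objectives and constraints. The Python sketch that follows illustrates only the generic ingredient: greedy maximization of a monotone submodular word-coverage objective under two hypothetical temporal constraints (at most `max_dates` distinct dates, at most `per_date` sentences per date). The objective, the names `coverage` and `greedy_timeline`, the constraint parameters, and the toy sentences are all illustrative assumptions, not the objective functions or constraints defined in the paper or its open-source implementation.

```python
from collections import Counter


def coverage(selected, weights):
    """Monotone submodular objective (illustrative): total weight of the
    distinct words covered by the selected (date, sentence) pairs."""
    covered = set()
    for _, text in selected:
        covered.update(text.lower().split())
    return sum(weights.get(word, 0) for word in covered)


def greedy_timeline(candidates, max_dates, per_date):
    """Greedily add the (date, sentence) pair with the largest positive
    marginal gain, subject to hypothetical temporal constraints:
    at most `max_dates` distinct dates, at most `per_date` sentences per date."""
    # Word weight = frequency of the word across all candidate sentences.
    weights = Counter(w for _, text in candidates for w in text.lower().split())
    selected, remaining = [], list(candidates)
    while remaining:
        current = coverage(selected, weights)
        dates_used = Counter(date for date, _ in selected)
        best, best_gain = None, 0
        for cand in remaining:
            date = cand[0]
            # A new date is only allowed while the date budget is not exhausted.
            if date not in dates_used and len(dates_used) >= max_dates:
                continue
            # Each date holds at most `per_date` sentences.
            if dates_used[date] >= per_date:
                continue
            gain = coverage(selected + [cand], weights) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no feasible candidate improves the objective
            break
        selected.append(best)
        remaining.remove(best)
    # Chronological order; ISO date strings sort correctly as text.
    return sorted(selected)


# Toy, made-up candidates for illustration only.
candidates = [
    ("2011-01-25", "protests begin in cairo"),
    ("2011-01-25", "protests begin in cairo square"),
    ("2011-02-11", "mubarak resigns"),
    ("2011-02-12", "celebrations in cairo"),
]
print(greedy_timeline(candidates, max_dates=2, per_date=1))
# [('2011-01-25', 'protests begin in cairo square'), ('2011-02-11', 'mubarak resigns')]
```

Because each word counts only once toward the objective, a second sentence about already-covered content adds little gain; this diminishing-returns behavior is what makes the coverage function submodular and suits greedy selection.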

Dynamic Entity Representations in Neural Language Models

Aug 02, 2017

Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, Noah A. Smith

Abstract: Understanding a long document requires tracking how entities are introduced and evolve over time. We present a new type of language model, EntityNLM, that can explicitly model entities, dynamically update their representations, and contextually generate their mentions. Our model is generative and flexible; it can model an arbitrary number of entities in context while generating each entity mention at an arbitrary length. In addition, it can be used for several different tasks such as language modeling, coreference resolution, and entity prediction. Experimental results with all these tasks demonstrate that our model consistently outperforms strong baselines and prior work.

* EMNLP 2017 camera-ready version
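EntityNLM keeps a representation per entity and updates it as the entity is mentioned. As a rough illustration of what "dynamically update their representations" can mean, the sketch that follows applies a gated interpolation between a stored entity vector and the language model's current hidden state, followed by L2 normalization. The gate form, the bilinear matrix `W`, and the names `update_entity` and `entities` are assumptions made for this example; consult the paper for the model's actual parameterization.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_entity(entity, hidden, W):
    """Hypothetical gated update of one entity vector after a mention.

    delta = sigmoid(hidden^T W entity) decides how much of the old entity
    vector to keep; the remainder comes from the current hidden state.
    The result is L2-normalized so entity vectors stay unit length.
    """
    # W @ entity, computed with plain lists to stay dependency-free.
    w_e = [sum(w_ij * e_j for w_ij, e_j in zip(row, entity)) for row in W]
    delta = sigmoid(sum(h_i * v_i for h_i, v_i in zip(hidden, w_e)))
    u = [delta * e_i + (1.0 - delta) * h_i for e_i, h_i in zip(entity, hidden)]
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:  # degenerate case: entity and hidden state cancel out
        return u
    return [x / norm for x in u]


# Toy memory of entity vectors keyed by a hypothetical entity id.
entities = {0: [1.0, 0.0]}
hidden_state = [0.0, 1.0]  # hidden state at a mention of entity 0
identity = [[1.0, 0.0], [0.0, 1.0]]
entities[0] = update_entity(entities[0], hidden_state, identity)
print(entities[0])
```

In this sketch, normalizing after every update keeps frequently mentioned entities from growing in magnitude, so their vectors stay comparable with those of rarely mentioned entities.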
