Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minh Phan

Knowledge-Aware Language Model Pretraining

Jun 29, 2020

Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary

Figure 1 for Knowledge-Aware Language Model Pretraining

Figure 2 for Knowledge-Aware Language Model Pretraining

Figure 3 for Knowledge-Aware Language Model Pretraining

Figure 4 for Knowledge-Aware Language Model Pretraining

Abstract:How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task. Our experiments show that solely by adding these entity signals in pretraining, significantly more knowledge is packed into the transformer parameters: we observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.We also show that our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models, significantly improving downstream tasks like zero-shot question-answering with no task-related training.

* NeurIPS 2020, somehow the Arxiv latex processing made it over 8 pages -- it wasn't

Via

Access Paper or Ask Questions