Paweł Krzysztof Nowak

Improving Compositional Generalization with Latent Structure and Data Augmentation

Dec 14, 2021

Linlu Qiu, Peter Shaw, Panupong Pasupat, Paweł Krzysztof Nowak, Tal Linzen, Fei Sha, Kristina Toutanova

Abstract: Generic unstructured neural networks have been shown to struggle on out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this often required task-specific engineering or provided limited gains. We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks, and results in a model even stronger than a T5-CSL ensemble on two real-world compositional generalization tasks. This yields new state-of-the-art performance on these challenging semantic parsing tasks, which require generalization to both natural language variation and novel compositions of elements.
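The recombination idea can be illustrated with a toy synchronous grammar. This is a minimal sketch under stated assumptions, not CSL: CSL induces a quasi-synchronous grammar from the training data and defines a generative model over it, whereas the grammar below is hand-written and its rules are sampled uniformly. The rule set, the `REL_1`/`ENT_1` linking convention and the `sample` function are all hypothetical. Each rule pairs a source template with a target template; a linked nonterminal is expanded once and the same expansion is substituted on both sides, so every sampled (question, logical form) pair stays aligned, including combinations that no single rule spells out.

```python
import random
import re

# Hypothetical toy synchronous grammar. Each nonterminal maps to a list of
# (source template, target template) rules; linked nonterminals are written
# as NAME_index and appear on both sides of a rule.
GRAMMAR = {
    "S": [("what is the REL_1 of ENT_1", "answer ( REL_1 ( ENT_1 ) )")],
    "REL": [("capital", "capital"), ("population", "population")],
    "ENT": [
        ("france", "france"),
        ("spain", "spain"),
        ("the REL_1 of ENT_1", "REL_1 ( ENT_1 )"),  # recursive rule
    ],
}

NT_PATTERN = re.compile(r"\b([A-Z]+)_(\d+)\b")


def sample(symbol, rng, depth=0, max_depth=3):
    """Expand `symbol` into an aligned (source, target) string pair."""
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, keep only rules without nonterminals so recursion stops.
        rules = [rule for rule in rules if not NT_PATTERN.search(rule[0])]
    src, tgt = rng.choice(rules)
    # Expand each linked nonterminal once and substitute the same expansion on both sides.
    for name, index in sorted(set(NT_PATTERN.findall(src))):
        sub_src, sub_tgt = sample(name, rng, depth + 1, max_depth)
        link = re.compile(rf"\b{name}_{index}\b")
        src = link.sub(lambda _m: sub_src, src)
        tgt = link.sub(lambda _m: sub_tgt, tgt)
    return src, tgt


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample("S", rng))
```

Each call returns a question and a logical form whose nesting depth is bounded by `max_depth`; in the setup the abstract describes, pairs like these would be appended to the fine-tuning data of the sequence-to-sequence model.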

Structured Context and High-Coverage Grammar for Conversational Question Answering over Knowledge Graphs

Sep 01, 2021

Pierre Marion, Paweł Krzysztof Nowak, Francesco Piccinno

Abstract: We tackle the problem of weakly-supervised conversational Question Answering over large Knowledge Graphs using a neural semantic parsing approach. We introduce a new Logical Form (LF) grammar that can model a wide range of queries on the graph while remaining simple enough to generate supervision data efficiently. Our Transformer-based model takes a JSON-like structure as input, allowing us to easily incorporate both Knowledge Graph and conversational contexts. This structured input is transformed into lists of embeddings and then fed to standard attention layers. We validate our approach, both in terms of grammar coverage and LF execution accuracy, on two publicly available datasets, CSQA and ConvQuestions, both grounded in Wikidata. On CSQA, our approach increases the coverage from 80% to 96.2%, and the LF execution accuracy from 70.6% to 75.6%, relative to previous state-of-the-art results. On ConvQuestions, we achieve results competitive with the state of the art.

* 16 pages, 1 figure. Accepted to EMNLP 2021
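One plausible way to turn a JSON-like input into a list of embeddings is to flatten it into tokens tagged with their field path, then give each token the sum of a token vector and a field vector. This is a minimal sketch under assumptions, not the paper's model: the field names, the placeholder entity id, the additive token-plus-field scheme, the dimension and the random lookup tables (standing in for learned parameters) are all hypothetical.

```python
import random

# Hypothetical JSON-like input mixing conversational and knowledge-graph context.
EXAMPLE = {
    "utterance": "who directed it",
    "previous_question": "what films star tom hanks",
    "entities": [{"id": "Q_PLACEHOLDER", "label": "tom hanks"}],  # placeholder id
}


def flatten(node, path=()):
    """Yield (token, field_path) pairs from nested dicts, lists and strings."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, path)
    else:
        for token in str(node).split():
            yield token, "/".join(path)


class ToyEmbedder:
    """Sums a token vector and a field-path vector; random tables stand in for learned weights."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def lookup(self, key):
        # Create a random vector the first time a key is seen, then reuse it.
        if key not in self.table:
            self.table[key] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.table[key]

    def embed(self, structure):
        pairs = list(flatten(structure))
        vectors = [
            [t + f for t, f in zip(self.lookup(("token", tok)), self.lookup(("field", field)))]
            for tok, field in pairs
        ]
        return pairs, vectors


if __name__ == "__main__":
    pairs, vectors = ToyEmbedder().embed(EXAMPLE)
    print(len(pairs), "tokens, each a vector of size", len(vectors[0]))
```

The resulting `vectors` list has one row per flattened token and can be treated as an ordinary input sequence for attention layers. The field vector is what lets the same word ("tom" in the previous question vs. in an entity label) carry different context.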

TAPAS: Weakly Supervised Table Parsing via Pre-training

Apr 21, 2020

Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Martin Eisenschlos

Abstract: Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and, in addition, the generated logical forms are only used as an intermediate step prior to retrieving the denotation. In this paper, we present TAPAS, an approach to question answering over tables without generating logical forms. TAPAS trains from weak supervision and predicts the denotation by selecting table cells and optionally applying a corresponding aggregation operator to the selection. TAPAS extends BERT's architecture to encode tables as input, is initialized from an effective joint pre-training of text segments and tables crawled from Wikipedia, and is trained end-to-end. We experiment with three different semantic parsing datasets and find that TAPAS outperforms or rivals semantic parsing models, improving state-of-the-art accuracy on SQA from 55.1 to 67.2 and performing on par with the state of the art on WikiSQL and WikiTQ, but with a simpler model architecture. We additionally find that transfer learning from WikiSQL to WikiTQ, which is trivial in our setting, yields 48.7 accuracy, 4.2 points above the state of the art.

* Accepted to ACL 2020
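The output step, cell selection plus an optional aggregation operator, can be sketched as a small function that turns a hard selection and an operator into a denotation. This is a simplified, deterministic sketch: in TAPAS the model predicts probabilities for cell selection and for the aggregation operator, and weakly supervised training works through those probabilities, which this code does not model. The operator set (NONE, COUNT, SUM, AVERAGE) follows the TAPAS paper; the table contents and the `denotation` function are hypothetical.

```python
from enum import Enum


class Aggregation(Enum):
    """Aggregation operators used by TAPAS."""
    NONE = "none"
    COUNT = "count"
    SUM = "sum"
    AVERAGE = "average"


def denotation(table, selected, op):
    """Turn selected (row, column) cell coordinates plus an operator into an answer."""
    cells = [table[row][col] for row, col in selected]
    if op is Aggregation.NONE:
        return cells  # the answer is the selected cells themselves
    if op is Aggregation.COUNT:
        return len(cells)
    values = [float(cell) for cell in cells]  # SUM and AVERAGE need numeric cells
    if op is Aggregation.SUM:
        return sum(values)
    if op is Aggregation.AVERAGE:
        if not values:
            raise ValueError("AVERAGE needs at least one selected cell")
        return sum(values) / len(values)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical table: rows of (team, wins) cells, header omitted.
TABLE = [
    ["Lions", "12"],
    ["Tigers", "7"],
    ["Bears", "9"],
]

if __name__ == "__main__":
    print(denotation(TABLE, [(0, 1), (2, 1)], Aggregation.SUM))  # 21.0
    print(denotation(TABLE, [(1, 0)], Aggregation.NONE))  # ['Tigers']
```

Keeping the operator separate from the selection means one selection mechanism covers lookup questions (NONE) as well as counting and arithmetic ones, with no logical form in between.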

Stiffness: A New Perspective on Generalization in Neural Networks

Jan 28, 2019

Stanislav Fort, Paweł Krzysztof Nowak, Srini Narayanan

Abstract: We investigate neural network training and generalization using the concept of stiffness. We measure how stiff a network is by looking at how a small gradient step on one example affects the loss on another example. In particular, we study how stiffness varies with 1) class membership, 2) distance between data points (in the input space as well as in latent spaces), 3) training iteration, and 4) learning rate. We empirically study the evolution of stiffness on MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100 using fully-connected and convolutional neural networks. Our results demonstrate that stiffness is a useful concept for diagnosing and characterizing generalization. We observe that small learning rates lead to initial learning of more specific features that do not translate well to improvements on inputs from all classes, whereas high learning rates initially benefit all classes at once. We measure stiffness as a function of distance between data points and observe that higher learning rates induce positive correlations between loss changes for data points further apart, pointing towards a regularization effect of the learning rate. When training on CIFAR-100, the stiffness matrix exhibits coarse-grained behavior suggestive of the model's awareness of super-class membership.
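The quantity described above can be made concrete for a tiny model. To first order, a gradient step of size η on example 1 changes the loss on example 2 by about -η g1·g2, where g1 and g2 are the two examples' loss gradients, so the alignment of those gradients decides whether the step helps or hurts example 2. The sketch below assumes a logistic-regression model and uses the cosine of the two gradients as the stiffness measure (a sign-based variant is another option); the weights and data points are hypothetical, and the paper itself studies fully-connected and convolutional networks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Binary cross-entropy of a logistic-regression model on one example (y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def grad(w, x, y):
    """Gradient of `loss` with respect to w, which is (p - y) * x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def stiffness(w, ex1, ex2):
    """Cosine stiffness: alignment of the two examples' loss gradients."""
    return cosine(grad(w, *ex1), grad(w, *ex2))


def loss_change_after_step(w, ex1, ex2, lr=1e-3):
    """Change in ex2's loss after one gradient step of size lr taken on ex1 only."""
    g1 = grad(w, *ex1)
    w_new = [wi - lr * gi for wi, gi in zip(w, g1)]
    return loss(w_new, *ex2) - loss(w, *ex2)


if __name__ == "__main__":
    w = [0.1, -0.2]
    a = ([1.0, 0.5], 1)
    b = ([0.9, 0.6], 1)  # same class, similar input
    c = ([0.9, 0.6], 0)  # same input as b, opposite label
    print(stiffness(w, a, b), loss_change_after_step(w, a, b))
    print(stiffness(w, a, c), loss_change_after_step(w, a, c))
```

With these points, the first line shows stiffness near +1 together with a loss decrease, and the second shows stiffness near -1 together with a loss increase, matching the sign of -η g1·g2.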
