Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniel Andor

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth(+930 more)

Abstract:This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.

Via

Access Paper or Ask Questions

Learning to Reject with a Fixed Predictor: Application to Decontextualization

Jan 31, 2023

Christopher Mohri, Daniel Andor, Eunsol Choi, Michael Collins

Figure 1 for Learning to Reject with a Fixed Predictor: Application to Decontextualization

Figure 2 for Learning to Reject with a Fixed Predictor: Application to Decontextualization

Figure 3 for Learning to Reject with a Fixed Predictor: Application to Decontextualization

Figure 4 for Learning to Reject with a Fixed Predictor: Application to Decontextualization

Abstract:We study the problem of classification with a reject option for a fixed predictor, applicable in natural language processing. We introduce a new problem formulation for this scenario, and an algorithm minimizing a new surrogate loss function. We provide a complete theoretical analysis of the surrogate loss function with a strong $H$-consistency guarantee. For evaluation, we choose the decontextualization task, and provide a manually-labelled dataset of $2\mathord,000$ examples. Our algorithm significantly outperforms the baselines considered, with a $\sim\!\!25\%$ improvement in coverage when halving the error rate, which is only $\sim\!\! 3 \%$ away from the theoretical limit.

Via

Access Paper or Ask Questions

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Dec 15, 2022

Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui(+10 more)

Figure 1 for Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Figure 2 for Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Figure 3 for Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Figure 4 for Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Abstract:Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducable evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).

Via

Access Paper or Ask Questions

Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Oct 05, 2022

Jacob Eisenstein, Daniel Andor, Bernd Bohnet, Michael Collins, David Mimno

Figure 1 for Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Figure 2 for Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Figure 3 for Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Figure 4 for Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Abstract:Explainable question answering systems should produce not only accurate answers but also rationales that justify their reasoning and allow humans to check their work. But what sorts of rationales are useful and how can we train systems to produce them? We propose a new style of rationale for open-book question answering, called \emph{markup-and-mask}, which combines aspects of extractive and free-text explanations. In the markup phase, the passage is augmented with free-text markup that enables each sentence to stand on its own outside the discourse context. In the masking phase, a sub-span of the marked-up passage is selected. To train a system to produce markup-and-mask rationales without annotations, we leverage in-context learning. Specifically, we generate silver annotated data by sending a series of prompts to a frozen pretrained language model, which acts as a teacher. We then fine-tune a smaller student model by training on the subset of rationales that led to correct answers. The student is "honest" in the sense that it is a pipeline: the rationale acts as a bottleneck between the passage and the answer, while the "untrusted" teacher operates under no such constraints. Thus, we offer a new way to build trustworthy pipeline systems from a combination of end-task annotations and frozen pretrained language models.

Via

Access Paper or Ask Questions

Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

Mar 31, 2022

Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin(+33 more)

$Figure 1 for Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$$

$Figure 2 for Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$$

Abstract:Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. $\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.

Via

Access Paper or Ask Questions

QED: A Framework and Dataset for Explanations in Question Answering

Sep 08, 2020

Matthew Lamm, Jennimaria Palomaki, Chris Alberti, Daniel Andor, Eunsol Choi, Livio Baldini Soares, Michael Collins

Figure 1 for QED: A Framework and Dataset for Explanations in Question Answering

Figure 2 for QED: A Framework and Dataset for Explanations in Question Answering

Figure 3 for QED: A Framework and Dataset for Explanations in Question Answering

Figure 4 for QED: A Framework and Dataset for Explanations in Question Answering

Abstract:A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and answer according to formal semantic notions such as referential equality, sentencehood, and entailment. We describe and publicly release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset, and report baseline models on two tasks -- post-hoc explanation generation given an answer, and joint question answering and explanation generation. In the joint setting, a promising result suggests that training on a relatively small amount of QED data can improve question answering. In addition to describing the formal, language-theoretic motivations for the QED approach, we describe a large user study showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.

Via

Access Paper or Ask Questions

Measuring Domain Portability and ErrorPropagation in Biomedical QA

Sep 24, 2019

Stefan Hosein, Daniel Andor, Ryan McDonald

Figure 1 for Measuring Domain Portability and ErrorPropagation in Biomedical QA

Figure 2 for Measuring Domain Portability and ErrorPropagation in Biomedical QA

Figure 3 for Measuring Domain Portability and ErrorPropagation in Biomedical QA

Figure 4 for Measuring Domain Portability and ErrorPropagation in Biomedical QA

Abstract:In this work we present Google's submission to the BioASQ 7 biomedical question answering (QA) task (specifically Task 7b, Phase B). The core of our systems are based on BERT QA models, specifically the model of \cite{alberti2019bert}. In this report, and via our submissions, we aimed to investigate two research questions. We start by studying how domain portable are QA systems that have been pre-trained and fine-tuned on general texts, e.g., Wikipedia. We measure this via two submissions. The first is a non-adapted model that uses a public pre-trained BERT model and is fine-tuned on the Natural Questions data set \cite{kwiatkowski2019natural}. The second system takes this non-adapted model and fine-tunes it with the BioASQ training data. Next, we study the impact of error propagation in end-to-end retrieval and QA systems. Again we test this via two submissions. The first uses human annotated relevant documents and snippets as input to the model and the second predicted documents and snippets. Our main findings are that domain specific fine-tuning can benefit Biomedical QA. However, the biggest quality bottleneck is at the retrieval stage, where we see large drops in metrics -- over 10pts absolute -- when using non gold inputs to the QA model.

Via

Access Paper or Ask Questions

Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Sep 12, 2019

Daniel Andor, Luheng He, Kenton Lee, Emily Pitler

Figure 1 for Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Figure 2 for Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Figure 3 for Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Figure 4 for Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Abstract:Reading comprehension models have been successfully applied to extractive text answers, but it is unclear how best to generalize these models to abstractive numerical answers. We enable a BERT-based reading comprehension model to perform lightweight numerical reasoning. We augment the model with a predefined set of executable 'programs' which encompass simple arithmetic as well as extraction. Rather than having to learn to manipulate numbers directly, the model can pick a program and execute it. On the recent Discrete Reasoning Over Passages (DROP) dataset, designed to challenge reading comprehension models, we show a 33% absolute improvement by adding shallow programs. The model can learn to predict new operations when appropriate in a math word problem setting (Roy and Roth, 2015) with very few training examples.

Via

Access Paper or Ask Questions

Synthetic QA Corpora Generation with Roundtrip Consistency

Jun 12, 2019

Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins

Figure 1 for Synthetic QA Corpora Generation with Roundtrip Consistency

Figure 2 for Synthetic QA Corpora Generation with Roundtrip Consistency

Figure 3 for Synthetic QA Corpora Generation with Roundtrip Consistency

Figure 4 for Synthetic QA Corpora Generation with Roundtrip Consistency

Abstract:We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. By pretraining on the resulting corpora we obtain significant improvements on SQuAD2 and NQ, establishing a new state-of-the-art on the latter. Our synthetic data generation models, for both question generation and answer extraction, can be fully reproduced by finetuning a publicly available BERT model on the extractive subsets of SQuAD2 and NQ. We also describe a more powerful variant that does full sequence-to-sequence pretraining for question generation, obtaining exact match and F1 at less than 0.1% and 0.4% from human performance on SQuAD2.

Via

Access Paper or Ask Questions

Linguistically-Informed Self-Attention for Semantic Role Labeling

Aug 28, 2018

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum

Figure 1 for Linguistically-Informed Self-Attention for Semantic Role Labeling

Figure 2 for Linguistically-Informed Self-Attention for Semantic Role Labeling

Figure 3 for Linguistically-Informed Self-Attention for Semantic Role Labeling

Figure 4 for Linguistically-Informed Self-Attention for Semantic Role Labeling

Abstract:Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models which require significant pre-processing to prepare linguistic features, LISA can incorporate syntax using merely raw tokens as input, encoding the sequence only once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated by training one attention head to attend to syntactic parents for each token. Moreover, if a high-quality syntactic parse is already available, it can be beneficially injected at test time without re-training our SRL model. In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art performance for a model using predicted predicates and standard word embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in error. On ConLL-2012 English SRL we also show an improvement of more than 2.5 F1. LISA also out-performs the state-of-the-art with contextually-encoded (ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.

* In Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels, Belgium. October 2018

Via

Access Paper or Ask Questions