Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mikolaj Binkowski

Flamingo: a Visual Language Model for Few-Shot Learning

Apr 29, 2022

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds(+17 more)

Figure 1 for Flamingo: a Visual Language Model for Few-Shot Learning

Figure 2 for Flamingo: a Visual Language Model for Few-Shot Learning

Figure 3 for Flamingo: a Visual Language Model for Few-Shot Learning

Figure 4 for Flamingo: a Visual Language Model for Few-Shot Learning

Abstract:Building models that can be rapidly adapted to numerous tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. Flamingo models include key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of the proposed Flamingo models, exploring and measuring their ability to rapidly adapt to a variety of image and video understanding benchmarks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer, captioning tasks, which evaluate the ability to describe a scene or an event, and close-ended tasks such as multiple choice visual question-answering. For tasks lying anywhere on this spectrum, we demonstrate that a single Flamingo model can achieve a new state of the art for few-shot learning, simply by prompting the model with task-specific examples. On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data.

Via

Access Paper or Ask Questions

Step-unrolled Denoising Autoencoders for Text Generation

Dec 13, 2021

Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

Figure 1 for Step-unrolled Denoising Autoencoders for Text Generation

Figure 2 for Step-unrolled Denoising Autoencoders for Text Generation

Figure 3 for Step-unrolled Denoising Autoencoders for Text Generation

Figure 4 for Step-unrolled Denoising Autoencoders for Text Generation

Abstract:In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is repeatedly applied on a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iterations than diffusion methods, while qualitatively producing better samples on natural language datasets. SUNDAE achieves state-of-the-art results (among non-autoregressive methods) on the WMT'14 English-to-German translation task and good qualitative results on unconditional language modeling on the Colossal Cleaned Common Crawl dataset and a dataset of Python code from GitHub. The non-autoregressive nature of SUNDAE opens up possibilities beyond left-to-right prompted generation, by filling in arbitrary blank patterns in a template.

Via

Access Paper or Ask Questions

A Deep Learning Approach for Characterizing Major Galaxy Mergers

Feb 09, 2021

Skanda Koppula, Victor Bapst, Marc Huertas-Company, Sam Blackwell, Agnieszka Grabska-Barwinska, Sander Dieleman, Andrea Huber, Natasha Antropova, Mikolaj Binkowski, Hannah Openshaw(+8 more)

Figure 1 for A Deep Learning Approach for Characterizing Major Galaxy Mergers

Figure 2 for A Deep Learning Approach for Characterizing Major Galaxy Mergers

Figure 3 for A Deep Learning Approach for Characterizing Major Galaxy Mergers

Abstract:Fine-grained estimation of galaxy merger stages from observations is a key problem useful for validation of our current theoretical understanding of galaxy formation. To this end, we demonstrate a CNN-based regression model that is able to predict, for the first time, using a single image, the merger stage relative to the first perigee passage with a median error of 38.3 million years (Myrs) over a period of 400 Myrs. This model uses no specific dynamical modeling and learns only from simulated merger events. We show that our model provides reasonable estimates on real observations, approximately matching prior estimates provided by detailed dynamical modeling. We provide a preliminary interpretability analysis of our models, and demonstrate first steps toward calibrated uncertainty estimation.

* Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada

Via

Access Paper or Ask Questions