Abstract: We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM, and we find that LLMs equipped with LENS perform highly competitively with much larger and more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.
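A minimal sketch of the pipeline this abstract describes: several vision modules emit text descriptions of an image, the descriptions are serialized into a prompt, and a frozen off-the-shelf LLM reasons over them. The module choices, function names, and prompt format below are illustrative assumptions, not the released implementation.

```python
# Hypothetical LENS-style pipeline: frozen vision modules -> text prompt -> frozen LLM.

def tags(image):          # assumption: e.g. an open-vocabulary tagger such as CLIP
    return ["dog", "frisbee", "grass"]            # stubbed output

def attributes(image):    # assumption: attribute descriptions of detected objects
    return ["brown dog", "red frisbee"]           # stubbed output

def captions(image):      # assumption: e.g. an image-captioning model such as BLIP
    return ["a dog jumping to catch a frisbee"]   # stubbed output

def build_prompt(image, question):
    """Serialize the vision-module outputs into text the LLM can reason over."""
    return (
        "Tags: " + ", ".join(tags(image)) + "\n"
        "Attributes: " + ", ".join(attributes(image)) + "\n"
        "Captions: " + " | ".join(captions(image)) + "\n"
        f"Question: {question}\nShort answer:"
    )

def lens_answer(llm, image, question):
    # Any off-the-shelf LLM can be plugged in here; no multimodal training is needed.
    return llm(build_prompt(image, question))

if __name__ == "__main__":
    dummy_llm = lambda prompt: "catching a frisbee"  # stand-in for a real LLM call
    print(lens_answer(dummy_llm, image=None, question="What is the dog doing?"))
```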
Abstract: Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Balsa first learns basic knowledge from a simple, environment-agnostic simulator, followed by safe learning in real execution. On the Join Order Benchmark, Balsa matches the performance of two expert query optimizers, both open-source and commercial, with two hours of learning, and outperforms them by up to 2.8$\times$ in workload runtime after a few more hours. Balsa thus opens the possibility of automatically learning to optimize in future compute environments where expert-designed optimizers do not exist.
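A minimal sketch of the two-stage recipe the abstract outlines: pretrain a plan-cost model against a cheap simulator, then fine-tune it on real execution with timeouts to keep exploration safe. The simulator, value model, plan enumerator, and timeout rule below are stand-ins I am assuming for illustration; they are not Balsa's actual code.

```python
# Hypothetical simulation-to-reality loop for a learned query optimizer.
import random

def simulated_cost(plan):
    """Cheap, environment-agnostic cost estimate (stub: sum of intermediate sizes)."""
    return sum(plan)

def real_runtime(plan, timeout):
    """Execute on the real engine, capped by a timeout so bad plans stay safe."""
    return min(simulated_cost(plan) * random.uniform(0.5, 1.5), timeout)  # stub

class ValueModel:
    """Predicts the cost of a plan; a real system would use a neural network."""
    def fit(self, plans, labels):
        self.data = list(zip(plans, labels))       # stub: memorize training pairs
    def predict(self, plan):
        return simulated_cost(plan)                # stub: echo the simulator

def enumerate_plans(num_joins, k=8):
    """Candidate join orders for a query (stub: random plans of length num_joins)."""
    return [[random.randint(1, 10) for _ in range(num_joins)] for _ in range(k)]

model = ValueModel()

# Stage 1: learn basic knowledge from the simulator.
sim_plans = [p for q in range(4, 8) for p in enumerate_plans(q)]
model.fit(sim_plans, [simulated_cost(p) for p in sim_plans])

# Stage 2: safe learning in real execution, retraining on observed runtimes.
for num_joins in range(4, 8):
    best = min(enumerate_plans(num_joins), key=model.predict)
    latency = real_runtime(best, timeout=60.0)
    model.fit([best], [latency])
```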
Abstract: Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio. However, due to their Langevin-inspired sampling mechanisms, their application to discrete and sequential data has been limited. In this work, we present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder. Our method is non-autoregressive, learning to generate sequences of latent embeddings through the reverse process, and offers parallel generation with a constant number of iterative refinement steps. We apply this technique to modeling symbolic music and show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.
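A minimal sketch of sampling under this setup: a standard DDPM-style reverse process refines a whole sequence of continuous latents in parallel for a fixed number of steps, and a pre-trained VAE decoder maps the latents back to discrete tokens. The noise schedule, denoiser, and decoder below are stand-ins assumed for illustration, not the paper's trained models.

```python
# Hypothetical latent-diffusion sampler over VAE embeddings of a discrete sequence.
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, LATENT_DIM, STEPS = 32, 16, 50
betas = np.linspace(1e-4, 0.02, STEPS)            # linear noise schedule (assumption)
alphas_bar = np.cumprod(1.0 - betas)

def vae_decode(z):
    """Map continuous latents back to discrete tokens (stub: 4 dummy symbols)."""
    return z.sum(axis=-1).round().astype(int) % 4

def denoiser(z_t, t):
    """Predict the noise at step t (stub; the real system trains a network)."""
    return z_t * 0.1

def sample():
    # Non-autoregressive: all SEQ_LEN positions are refined in parallel,
    # with a constant number of reverse-diffusion steps.
    z = rng.standard_normal((SEQ_LEN, LATENT_DIM))
    for t in reversed(range(STEPS)):
        eps = denoiser(z, t)
        z = (z - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return vae_decode(z)   # decode latents into a symbolic-music token sequence

print(sample())
```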