Yaroslav Ganin

Continuous diffusion for categorical data

Dec 15, 2022

Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan (+4 more)

Abstract: Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.

* 26 pages, 8 figures; corrections and additional information about hyperparameters

Self-conditioned Embedding Diffusion for Text Generation

Nov 08, 2022

Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre (+1 more)

Abstract: Can continuous diffusion models bring the same performance breakthrough to natural language that they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows us to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models, while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.

* 15 pages

A Generalist Neural Algorithmic Learner

Sep 22, 2022

Borja Ibarz, Vitaly Kurin, George Papamakarios, Kyriacos Nikiforou, Mehdi Bennani, Róbert Csordás, Andrew Dudzik, Matko Bošnjak, Alex Vitvitskyi, Yulia Rubanova (+5 more)

Abstract: The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.

* 20 pages, 10 figures

Computer-Aided Design as Language

May 06, 2021

Yaroslav Ganin, Sergey Bartunov, Yujia Li, Ethan Keller, Stefano Saliceti

Abstract: Computer-Aided Design (CAD) applications are used in manufacturing to model everything from coffee mugs to sports cars. These programs are complex and require years of training and experience to master. One component of all CAD models that is particularly difficult to make is the highly structured 2D sketch that lies at the heart of every 3D construction. In this work, we propose a machine learning model capable of automatically generating such sketches. Through this, we pave the way for developing intelligent tools that would help engineers create better designs with less effort. Our method combines a general-purpose language modeling technique with an off-the-shelf data serialization protocol. We show that our approach has enough flexibility to accommodate the complexity of the domain and performs well for both unconditional synthesis and image-to-sketch translation.

* 24 pages, 11 figures, 3 tables

PolyGen: An Autoregressive Generative Model of 3D Meshes

Feb 23, 2020

Charlie Nash, Yaroslav Ganin, S. M. Ali Eslami, Peter W. Battaglia

Abstract: Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.

Unsupervised Doodling and Painting with Improved SPIRAL

Oct 02, 2019

John F. J. Mellor, Eunbyung Park, Yaroslav Ganin, Igor Babuschkin, Tejas Kulkarni, Dan Rosenbaum, Andy Ballard, Theophane Weber, Oriol Vinyals, S. M. Ali Eslami

Abstract: We investigate using reinforcement learning agents as generative models of images (extending arXiv:1804.01118). A generative agent controls a simulated painting environment, and is trained with rewards provided by a discriminator network simultaneously trained to assess the realism of the agent's samples, either unconditional or reconstructions. Compared to prior work, we make a number of improvements to the architectures of the agents and discriminators that lead to intriguing and at times surprising results. We find that when sufficiently constrained, generative agents can learn to produce images with a degree of visual abstraction, despite having only ever seen real photographs (no human brush strokes). And given enough time with the painting environment, they can produce images with considerable realism. These results show that, under the right circumstances, some aspects of human drawing can emerge from simulated embodiment, without the need for external supervision, imitation or social cues. Finally, we note the framework's potential for use in creative applications.

* See https://learning-to-paint.github.io for an interactive version of this paper, with videos

Synthesizing Programs for Images using Reinforced Adversarial Learning

Apr 03, 2018

Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S. M. Ali Eslami, Oriol Vinyals

Abstract: Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator's output as a reward signal is key to allowing the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real-world (MNIST, Omniglot, CelebA) and synthetic 3D datasets.

* 12 pages, 13 figures

GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

Dec 12, 2017

Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

Abstract: Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbs-sampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps.

* NIPS 2017
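
Illustrative note (not from the paper): the "fast and exact sampling" of a directed model $p(x,z) = p(z) p(x \mid z)$ mentioned in the abstract can be sketched as ancestral sampling. The toy Gaussian prior, the linear-Gaussian likelihood, and the names `sample_directed`, `weight` and `noise_std` are all assumptions chosen for illustration; this is not GibbsNet's adversarially learned procedure.

```python
import random


def sample_directed(n, weight=2.0, noise_std=0.5, seed=0):
    """Draw n pairs (x, z) from p(x, z) = p(z) p(x | z) by ancestral sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: z ~ p(z). Assumed here to be a fixed standard normal prior.
        z = rng.gauss(0.0, 1.0)
        # Step 2: x ~ p(x | z). Assumed here to be N(weight * z, noise_std^2).
        x = rng.gauss(weight * z, noise_std)
        samples.append((x, z))
    return samples


pairs = sample_directed(5)
```

Each sample needs exactly one pass through the two steps, which is the speed advantage the abstract contrasts with iterative blocked Gibbs sampling; the cost is that `p(z)` has to be fixed in advance.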

Multiregion Bilinear Convolutional Neural Networks for Person Re-Identification

Sep 06, 2017

Evgeniya Ustinova, Yaroslav Ganin, Victor Lempitsky

Abstract: In this work we propose a new architecture for person re-identification. As the task of re-identification is inherently associated with embedding learning and non-rigid appearance description, our architecture is based on the deep bilinear convolutional network (Bilinear-CNN) that has been proposed recently for fine-grained classification of highly non-rigid objects. While the last stages of the original Bilinear-CNN architecture completely remove the geometric information from consideration by performing orderless pooling, we observe that a better embedding can be learned by performing bilinear pooling in a more local way, where each pooling is confined to a predefined region. Our architecture thus represents a compromise between traditional convolutional networks and bilinear CNNs and strikes a balance between rigid matching and completely ignoring spatial information. We perform the experimental validation of the new architecture on three popular benchmark datasets (Market-1501, CUHK01, CUHK03), comparing it to baselines that include Bilinear-CNN as well as prior art. The new architecture outperforms the baseline on all three datasets, while performing better than the state of the art on two out of three. The code and the pretrained models of the approach can be found at https://github.com/madkn/MultiregionBilinearCNN-ReId.

* in AVSS 2017
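
Illustrative note (not from the paper): the idea of confining bilinear pooling to predefined regions can be sketched in plain Python as below. The horizontal-stripe regions, the use of a single feature stream (the outer product of each feature vector with itself), and the names `bilinear_pool` and `multiregion_bilinear` are assumptions made for this sketch; the authors' actual implementation is in the repository linked in the abstract.

```python
def bilinear_pool(features):
    """Average the outer product f f^T over a list of C-dimensional vectors."""
    c = len(features[0])
    pooled = [[0.0] * c for _ in range(c)]
    for f in features:
        for i in range(c):
            for j in range(c):
                pooled[i][j] += f[i] * f[j]
    n = len(features)
    # Flatten the C x C matrix into a C*C descriptor (orderless within the region).
    return [pooled[i][j] / n for i in range(c) for j in range(c)]


def multiregion_bilinear(feature_map, num_regions):
    """Pool an H x W x C nested-list feature map separately per horizontal stripe."""
    height = len(feature_map)
    stripe = height // num_regions  # assumes height is divisible by num_regions
    descriptor = []
    for r in range(num_regions):
        rows = feature_map[r * stripe:(r + 1) * stripe]
        locations = [f for row in rows for f in row]
        descriptor.extend(bilinear_pool(locations))
    return descriptor


# A 2 x 1 map with 2 channels: one region per row keeps the rows distinguishable.
toy_map = [[[1.0, 0.0]], [[0.0, 2.0]]]
local_descriptor = multiregion_bilinear(toy_map, num_regions=2)
global_descriptor = multiregion_bilinear(toy_map, num_regions=1)
```

With `num_regions=1` the sketch reduces to fully orderless pooling, which discards where each feature occurred; more regions retain coarse spatial layout at the cost of a longer descriptor.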

DeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation

Jul 26, 2016

Yaroslav Ganin, Daniil Kononenko, Diana Sungatullina, Victor Lempitsky

Abstract: In this work, we consider the task of generating highly realistic images of a given face with a redirected gaze. We treat this problem as a specific instance of conditional image generation and suggest a new deep architecture that can handle this task very well, as revealed by numerical comparison with prior art and a user study. Our deep architecture performs coarse-to-fine warping with an additional intensity correction of individual pixels. All these operations are performed in a feed-forward manner, and the parameters associated with different operations are learned jointly in an end-to-end fashion. After learning, the resulting neural network can synthesize images with manipulated gaze, while the redirection angle can be selected arbitrarily from a certain range and provided as an input to the network.

* Fixed typos, 14 + 2 + 2 pages, ECCV 2016
