Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Berthelot

Fiona

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Jun 06, 2025

Jiatao Gu, Tianrong Chen, David Berthelot, Huangjie Zheng, Yuyang Wang, Ruixiang Zhang, Laurent Dinh, Miguel Angel Bautista, Josh Susskind, Shuangfei Zhai

Abstract:We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.

* TLDR: We show for the first time that normalizing flows can be scaled for high-resolution and text-conditioned image synthesis

Via

Access Paper or Ask Questions

Mechanisms of Projective Composition of Diffusion Models

Feb 06, 2025

Arwen Bradley, Preetum Nakkiran, David Berthelot, James Thornton, Joshua M. Susskind

Abstract:We study the theoretical foundations of composition in diffusion models, with a particular focus on out-of-distribution extrapolation and length-generalization. Prior work has shown that composing distributions via linear score combination can achieve promising results, including length-generalization in some cases (Du et al., 2023; Liu et al., 2022). However, our theoretical understanding of how and why such compositions work remains incomplete. In fact, it is not even entirely clear what it means for composition to "work". This paper starts to address these fundamental gaps. We begin by precisely defining one possible desired result of composition, which we call projective composition. Then, we investigate: (1) when linear score combinations provably achieve projective composition, (2) whether reverse-diffusion sampling can generate the desired composition, and (3) the conditions under which composition fails. Finally, we connect our theoretical analysis to prior empirical observations where composition has either worked or failed, for reasons that were unclear at the time.

* 9 pages, 7 figures. The first two authors contributed equally

Via

Access Paper or Ask Questions

Normalizing Flows are Capable Generative Models

Dec 10, 2024

Shuangfei Zhai, Ruixiang Zhang, Preetum Nakkiran, David Berthelot, Jiatao Gu, Huangjie Zheng, Tianrong Chen, Miguel Angel Bautista, Navdeep Jaitly, Josh Susskind

Abstract:Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches, alternating the autoregression direction between layers. TarFlow is straightforward to train end-to-end, and capable of directly modeling and generating pixels. We also propose three key techniques to improve sample quality: Gaussian noise augmentation during training, a post training denoising procedure, and an effective guidance method for both class-conditional and unconditional settings. Putting these together, TarFlow sets new state-of-the-art results on likelihood estimation for images, beating the previous best methods by a large margin, and generates samples with quality and diversity comparable to diffusion models, for the first time with a stand-alone NF model. We make our code available at https://github.com/apple/ml-tarflow.

Via

Access Paper or Ask Questions

TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Mar 07, 2023

David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu

Figure 1 for TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Figure 2 for TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Figure 3 for TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Figure 4 for TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Abstract:Denoising Diffusion models have demonstrated their proficiency for generative sampling. However, generating good samples often requires many iterations. Consequently, techniques such as binary time-distillation (BTD) have been proposed to reduce the number of network calls for a fixed architecture. In this paper, we introduce TRAnsitive Closure Time-distillation (TRACT), a new method that extends BTD. For single step diffusion,TRACT improves FID by up to 2.4x on the same architecture, and achieves new single-step Denoising Diffusion Implicit Models (DDIM) state-of-the-art FID (7.4 for ImageNet64, 3.8 for CIFAR10). Finally we tease apart the method through extended ablations. The PyTorch implementation will be released soon.

Via

Access Paper or Ask Questions

AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

Jun 08, 2021

David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

Figure 1 for AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

Figure 2 for AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

Figure 3 for AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

Figure 4 for AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation

Abstract:We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one. With the goal of generality, we introduce AdaMatch, a method that unifies the tasks of unsupervised domain adaptation (UDA), semi-supervised learning (SSL), and semi-supervised domain adaptation (SSDA). In an extensive experimental study, we compare its behavior with respective state-of-the-art techniques from SSL, SSDA, and UDA on vision classification tasks. We find AdaMatch either matches or significantly exceeds the state-of-the-art in each case using the same hyper-parameters regardless of the dataset or task. For example, AdaMatch nearly doubles the accuracy compared to that of the prior state-of-the-art on the UDA task for DomainNet and even exceeds the accuracy of the prior state-of-the-art obtained with pre-training by 6.4% when AdaMatch is trained completely from scratch. Furthermore, by providing AdaMatch with just one labeled example per class from the target domain (i.e., the SSDA setting), we increase the target accuracy by an additional 6.1%, and with 5 labeled examples, by 13.6%.

Via

Access Paper or Ask Questions

Assessing Post-Disaster Damage from Satellite Imagery using Semi-Supervised Learning Techniques

Nov 24, 2020

Jihyeon Lee, Joseph Z. Xu, Kihyuk Sohn, Wenhan Lu, David Berthelot, Izzeddin Gur, Pranav Khaitan, Ke-Wei, Huang, Kyriacos Koupparis(+1 more)

Figure 1 for Assessing Post-Disaster Damage from Satellite Imagery using Semi-Supervised Learning Techniques

Figure 2 for Assessing Post-Disaster Damage from Satellite Imagery using Semi-Supervised Learning Techniques

Figure 3 for Assessing Post-Disaster Damage from Satellite Imagery using Semi-Supervised Learning Techniques

Figure 4 for Assessing Post-Disaster Damage from Satellite Imagery using Semi-Supervised Learning Techniques

Abstract:To respond to disasters such as earthquakes, wildfires, and armed conflicts, humanitarian organizations require accurate and timely data in the form of damage assessments, which indicate what buildings and population centers have been most affected. Recent research combines machine learning with remote sensing to automatically extract such information from satellite imagery, reducing manual labor and turn-around time. A major impediment to using machine learning methods in real disaster response scenarios is the difficulty of obtaining a sufficient amount of labeled data to train a model for an unfolding disaster. This paper shows a novel application of semi-supervised learning (SSL) to train models for damage assessment with a minimal amount of labeled data and large amount of unlabeled data. We compare the performance of state-of-the-art SSL methods, including MixMatch and FixMatch, to a supervised baseline for the 2010 Haiti earthquake, 2017 Santa Rosa wildfire, and 2016 armed conflict in Syria. We show how models trained with SSL methods can reach fully supervised performance despite using only a fraction of labeled data and identify areas for further improvements.

* NeurIPS 2020 Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop

Via

Access Paper or Ask Questions

Creating High Resolution Images with a Latent Adversarial Generator

Mar 04, 2020

David Berthelot, Peyman Milanfar, Ian Goodfellow

Figure 1 for Creating High Resolution Images with a Latent Adversarial Generator

Figure 2 for Creating High Resolution Images with a Latent Adversarial Generator

Figure 3 for Creating High Resolution Images with a Latent Adversarial Generator

Figure 4 for Creating High Resolution Images with a Latent Adversarial Generator

Abstract:Generating realistic images is difficult, and many formulations for this task have been proposed recently. If we restrict the task to that of generating a particular class of images, however, the task becomes more tractable. That is to say, instead of generating an arbitrary image as a sample from the manifold of natural images, we propose to sample images from a particular "subspace" of natural images, directed by a low-resolution image from the same subspace. The problem we address, while close to the formulation of the single-image super-resolution problem, is in fact rather different. Single image super-resolution is the task of predicting the image closest to the ground truth from a relatively low resolution image. We propose to produce samples of high resolution images given extremely small inputs with a new method called Latent Adversarial Generator (LAG). In our generative sampling framework, we only use the input (possibly of very low-resolution) to direct what class of samples the network should produce. As such, the output of our algorithm is not a unique image that relates to the input, but rather a possible se} of related images sampled from the manifold of natural images. Our method learns exclusively in the latent space of the adversary using perceptual loss -- it does not have a pixel loss.

Via

Access Paper or Ask Questions

Semi-Supervised Class Discovery

Feb 22, 2020

Jeremy Nixon, Jeremiah Liu, David Berthelot

Figure 1 for Semi-Supervised Class Discovery

Figure 2 for Semi-Supervised Class Discovery

Figure 3 for Semi-Supervised Class Discovery

Figure 4 for Semi-Supervised Class Discovery

Abstract:One promising approach to dealing with datapoints that are outside of the initial training distribution (OOD) is to create new classes that capture similarities in the datapoints previously rejected as uncategorizable. Systems that generate labels can be deployed against an arbitrary amount of data, discovering classification schemes that through training create a higher quality representation of data. We introduce the Dataset Reconstruction Accuracy, a new and important measure of the effectiveness of a model's ability to create labels. We introduce benchmarks against this Dataset Reconstruction metric. We apply a new heuristic, class learnability, for deciding whether a class is worthy of addition to the training dataset. We show that our class discovery system can be successfully applied to vision and language, and we demonstrate the value of semi-supervised learning in automatically discovering novel classes.

Via

Access Paper or Ask Questions

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Jan 21, 2020

Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel

Figure 1 for FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Figure 2 for FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Figure 3 for FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Figure 4 for FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Abstract:Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 -- just 4 labels per class. Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. We make our code available at https://github.com/google-research/fixmatch.

Via

Access Paper or Ask Questions

Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

Dec 03, 2019

Shuang Song, David Berthelot, Afshin Rostamizadeh

Figure 1 for Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

Figure 2 for Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

Figure 3 for Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

Figure 4 for Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

Abstract:We propose using active learning based techniques to further improve the state-of-the-art semi-supervised learning MixMatch algorithm. We provide a thorough empirical evaluation of several active-learning and baseline methods, which successfully demonstrate a significant improvement on the benchmark CIFAR-10, CIFAR-100, and SVHN datasets (as much as 1.5% in absolute accuracy). We also provide an empirical analysis of the cost trade-off between incrementally gathering more labeled versus unlabeled data. This analysis can be used to measure the relative value of labeled/unlabeled data at different points of the learning curve, where we find that although the incremental value of labeled data can be as much as 20x that of unlabeled, it quickly diminishes to less than 3x once more than 2,000 labeled example are observed. Code can be found at https://github.com/google-research/mma.

Via

Access Paper or Ask Questions