J. Jon Ryu

Score-of-Mixture Training: Training One-Step Generative Models Made Simple

Feb 13, 2025

Tejas Jayashankar, J. Jon Ryu, Gregory Wornell

Abstract: We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the $\alpha$-skew Jensen-Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64x64 show that SMT/SMD are competitive with and can even outperform existing methods.

* 27 pages, 9 figures

A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation

Sep 26, 2024

J. Jon Ryu, Abhin Shah, Gregory W. Wornell

Abstract: This paper studies a family of estimators based on noise-contrastive estimation (NCE) for learning unnormalized distributions. The main contribution of this work is to provide a unified perspective on various methods for learning unnormalized distributions, which have been independently proposed and studied in separate research communities, through the lens of NCE. This unified view offers new insights into existing estimators. Specifically, for exponential families, we establish the finite-sample convergence rates of the proposed estimators under a set of regularity assumptions, most of which are new.

* 35 pages

Improved Evidential Deep Learning via a Mixture of Dirichlet Distributions

Feb 09, 2024

J. Jon Ryu, Maohao Shen, Soumya Ghosh, Yuheng Bu, Prasanna Sattigeri, Subhro Das, Gregory W. Wornell

Abstract: This paper explores a modern predictive uncertainty estimation approach, called evidential deep learning (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their strong empirical performance, recent studies by Bengs et al. identify a fundamental pitfall of the existing methods: the learned epistemic uncertainty may not vanish even in the infinite-sample limit. We corroborate the observation by providing a unifying view of a class of widely used objectives from the literature. Our analysis reveals that the EDL methods essentially train a meta distribution by minimizing a certain divergence measure between the distribution and a sample-size-independent target distribution, resulting in spurious epistemic uncertainty. Grounded in theoretical principles, we propose learning a consistent target distribution by modeling it with a mixture of Dirichlet distributions and learning via variational inference. Afterward, a final meta distribution model distills the learned uncertainty from the target model. Experimental results across various uncertainty-based downstream tasks demonstrate the superiority of our proposed method, and illustrate the practical implications arising from the consistency and inconsistency of learned epistemic uncertainty.

* 18 pages, 5 figures

Operator SVD with Neural Networks via Nested Low-Rank Approximation

Feb 06, 2024

J. Jon Ryu, Xiangxiang Xu, H. S. Melihcan Erol, Yuheng Bu, Lizhong Zheng, Gregory W. Wornell

Abstract: Computing the eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific computing problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered a promising alternative to classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by new techniques called nesting for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.

* 44 pages, 7 figures

One-Nearest-Neighbor Search is All You Need for Minimax Optimal Regression and Classification

Feb 05, 2022

J. Jon Ryu, Young-Han Kim

Abstract: Recently, Qiao, Duan, and Cheng (2019) proposed a distributed nearest-neighbor classification method, in which a massive dataset is split into smaller groups, each processed with a $k$-nearest-neighbor classifier, and the final class label is predicted by a majority vote among these groupwise class labels. This paper shows that the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions, for both regression and classification problems. Roughly speaking, distributed 1-nearest-neighbor rules with $M$ groups have performance comparable to standard $\Theta(M)$-nearest-neighbor rules. In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain exact minimax optimal rates.

* 25 pages, 2 figures
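
The split-and-vote rule described in this abstract can be sketched as follows. This is a minimal illustration for classification only, not the authors' implementation: the function name `distributed_1nn_predict`, the random round-robin split, the Euclidean metric, and the tie-breaking behavior are all assumptions made here, and the refined aggregation method mentioned at the end of the abstract is not shown.

```python
import math
import random
from collections import Counter


def distributed_1nn_predict(points, labels, query, num_groups, seed=0):
    """Predict a label by majority vote over per-group 1-nearest neighbors.

    Hypothetical helper: the name, signature, and splitting scheme are
    illustrative assumptions, not taken from the paper.
    """
    # Randomly assign the data indices to `num_groups` disjoint groups.
    rng = random.Random(seed)
    indices = list(range(len(points)))
    rng.shuffle(indices)
    groups = [indices[g::num_groups] for g in range(num_groups)]

    votes = []
    for group in groups:
        if not group:
            continue  # skip empty groups when there are more groups than points
        # 1-NN inside the group: the closest point under Euclidean distance.
        nearest = min(group, key=lambda i: math.dist(points[i], query))
        votes.append(labels[nearest])

    # Aggregate the groupwise labels by majority vote.
    return Counter(votes).most_common(1)[0][0]


# Two well-separated clusters with labels 0 and 1.
points = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0) for i in range(10)]
labels = [0] * 10 + [1] * 10
prediction = distributed_1nn_predict(points, labels, (9.0, 9.0), num_groups=5)
```

Each group only needs a single nearest-neighbor search, so the per-group work can run independently; the number of groups $M$ plays the role that $k$ plays in a standard $k$-nearest-neighbor rule.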

Parameter-free Online Linear Optimization with Side Information via Universal Coin Betting

Feb 04, 2022

J. Jon Ryu, Alankrita Bhatt, Young-Han Kim

Abstract: A class of parameter-free online linear optimization algorithms is proposed that harnesses the structure of an adversarial sequence by adapting to some side information. These algorithms combine the reduction technique of Orabona and Pál (2016) for adapting coin betting algorithms for online linear optimization with universal compression techniques in information theory for incorporating sequential side information into coin betting. Concrete examples are studied in which the side information has a tree structure and consists of quantized values of the previous symbols of the adversarial sequence, including fixed-order and variable-order Markov cases. By modifying the context-tree weighting technique of Willems, Shtarkov, and Tjalkens (1995), the proposed algorithm is further refined to achieve the best performance over all adaptive algorithms with tree-structured side information of a given maximum order in a computationally efficient manner.

* 23 pages, 5 figures, to appear at AISTATS 2022
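
For orientation, the coin-betting reduction that this abstract builds on can be sketched in one dimension. The snippet below is only the baseline without side information, using a Krichevsky-Trofimov (KT) betting fraction; it is not the side-information algorithm proposed in the paper. The function name `kt_coin_betting`, the default initial wealth, and the assumption that gradients lie in $[-1, 1]$ are illustrative choices made here.

```python
def kt_coin_betting(gradients, initial_wealth=1.0):
    """One-dimensional online linear optimization via KT coin betting.

    Hypothetical helper: a baseline sketch with no side information.
    Assumes every gradient g_t lies in [-1, 1].
    """
    wealth = initial_wealth
    outcome_sum = 0.0  # running sum of past coin outcomes c_i = -g_i
    iterates = []
    for t, g in enumerate(gradients, start=1):
        # KT betting fraction: average of past outcomes, shrunk by 1/t.
        beta = outcome_sum / t
        # The iterate is the signed amount of wealth placed on the next coin.
        w = beta * wealth
        iterates.append(w)
        # The coin outcome is the negative gradient; winnings update the wealth.
        c = -g
        wealth += c * w
        outcome_sum += c
    return iterates, wealth


# A constant negative gradient makes the bettor move steadily to the right.
iterates, final_wealth = kt_coin_betting([-1.0, -1.0, -1.0])
```

No learning rate appears anywhere in the loop, which is the sense in which such algorithms are parameter-free: the step size is implied by the current wealth and betting fraction.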

Feedback Recurrent AutoEncoder

Nov 11, 2019

Yang Yang, Guillaume Sautière, J. Jon Ryu, Taco S Cohen

Abstract: In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in speech spectrogram compression. Specifically, we show that the FRAE, paired with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior for the latent space and using an entropy coder, we can achieve an even lower variable bitrate.

Wyner VAE: Joint and Conditional Generation with Succinct Common Representation Learning

May 27, 2019

J. Jon Ryu, Yoojin Choi, Young-Han Kim, Mostafa El-Khamy, Jungwon Lee

Abstract: A new variational autoencoder (VAE) model is proposed that learns a succinct common representation of two correlated data variables for conditional and joint generation tasks. The proposed Wyner VAE model is based on two information-theoretic problems (distributed simulation and channel synthesis) in which Wyner's common information arises as the fundamental limit of the succinctness of the common representation. The Wyner VAE decomposes a pair of correlated data variables into their common representation (e.g., a shared concept) and local representations that capture the remaining randomness (e.g., texture and style) in respective data variables by imposing the mutual information between the data variables and the common representation as a regularization term. The utility of the proposed approach is demonstrated through experiments for joint and conditional generation with and without style control using synthetic data and real images. Experimental results show that learning a succinct common representation achieves better generative performance and that the proposed model outperforms existing VAE variants and the variational information bottleneck method.

* 24 pages, 18 figures
