James G. Scott

Conditional diffusions for neural posterior estimation

Oct 24, 2024

Tianyu Chen, Vansh Bansal, James G. Scott

Abstract:Neural posterior estimation (NPE), a simulation-based computational approach for Bayesian inference, has shown great success in situations where posteriors are intractable or likelihood functions are treated as "black boxes." Existing NPE methods typically rely on normalizing flows, which transform a base distribution into a complex posterior by composing many simple, invertible transformations. But flow-based models, while state of the art for NPE, are known to suffer from several limitations, including training instability and sharp trade-offs between representational power and computational cost. In this work, we demonstrate the effectiveness of conditional diffusions as an alternative to normalizing flows for NPE. Conditional diffusions address many of the challenges faced by flow-based methods. Our results show that, across a highly varied suite of benchmarking problems for NPE architectures, diffusions offer improved stability, superior accuracy, and faster training times, even with simpler, shallower models. These gains persist across a variety of different encoder or "summary network" architectures, as well as in situations where no summary network is required. The code will be publicly available at https://github.com/TianyuCodings/cDiff.

Interpretable Low-Dimensional Regression via Data-Adaptive Smoothing

Aug 06, 2017

Wesley Tansey, Jesse Thomason, James G. Scott

Abstract:We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present Maximum Variance Total Variation denoising (MVTV), an approach that is conceptually related both to CART and to the more recent CRISP algorithm, a state-of-the-art alternative method for interpretable nonlinear regression. MVTV divides the feature space into blocks of constant value and fits the value of all blocks jointly via a convex optimization routine. Our method is fully data-adaptive, in that it incorporates highly robust routines for tuning all hyperparameters automatically. We compare our approach against CART and CRISP via both a complexity-accuracy tradeoff metric and a human study, demonstrating that MVTV is a more powerful and interpretable method.

* 4 pages, 1 figure; presented at the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning

Feb 28, 2017

Wesley Tansey, Karl Pichotta, James G. Scott

Abstract:We present an approach to deep estimation of discrete conditional probability distributions. Such models have several applications, including generative modeling of audio, image, and video data. Our approach combines two main techniques: dyadic partitioning and graph-based smoothing of the discrete space. By recursively decomposing each dimension into a series of binary splits and smoothing over the resulting distribution using graph-based trend filtering, we impose a strict structure on the model and achieve much higher sample efficiency. We demonstrate the advantages of our model through a series of benchmarks on both synthetic and real-world datasets, in some cases reducing the error by nearly half in comparison to other popular methods in the literature. All of our models are implemented in TensorFlow and publicly available at https://github.com/tansey/sdp.
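
As a rough illustration of the dyadic-partitioning idea in this abstract (not the authors' implementation, which lives in the linked repository), the sketch below represents a distribution over 2**depth discrete bins by one "go left" probability per internal node of a binary tree. A bin's probability is then the product of the branch probabilities along its path. The function name `dyadic_bin_probs`, the heap-style node layout, and the example numbers are assumptions made for this example, and the graph-based smoothing step is omitted.

```python
def dyadic_bin_probs(left_probs, depth):
    """Turn per-node 'go left' probabilities into a distribution over 2**depth bins.

    left_probs is laid out like a binary heap (an assumption of this sketch):
    node 0 is the root, and the children of node i are 2*i + 1 and 2*i + 2.
    """
    n_internal = 2 ** depth - 1
    if len(left_probs) != n_internal:
        raise ValueError(f"expected {n_internal} node probabilities")
    probs = []
    for b in range(2 ** depth):
        node, p = 0, 1.0
        # Read the bits of the bin index from most to least significant:
        # a 0 bit takes the left branch, a 1 bit takes the right branch.
        for level in reversed(range(depth)):
            go_right = (b >> level) & 1
            p *= (1.0 - left_probs[node]) if go_right else left_probs[node]
            node = 2 * node + 1 + go_right
        probs.append(p)
    return probs


# Hypothetical split probabilities for a depth-2 tree (4 bins).
example = dyadic_bin_probs([0.5, 0.8, 0.25], depth=2)
```

Because every internal node splits its mass between two children, the bin probabilities sum to one by construction, so a network only needs to output one binary probability per split.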

GapTV: Accurate and Interpretable Low-Dimensional Regression and Classification

Feb 23, 2017

Wesley Tansey, James G. Scott

Abstract:We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present GapTV, an approach that is conceptually related both to CART and to the more recent CRISP algorithm, a state-of-the-art alternative method for interpretable nonlinear regression. GapTV divides the feature space into blocks of constant value and fits the value of all blocks jointly via a convex optimization routine. Our method is fully data-adaptive, in that it incorporates highly robust routines for tuning all hyperparameters automatically. We compare our approach against CART and CRISP and demonstrate that GapTV finds a much better trade-off between accuracy and interpretability.

Diet2Vec: Multi-scale analysis of massive dietary data

Dec 01, 2016

Wesley Tansey, Edward W. Lowe Jr., James G. Scott

Abstract:Smartphone apps that enable users to easily track their diets have become widespread in the last decade. This has created an opportunity to discover new insights into obesity and weight loss by analyzing the eating habits of the users of such apps. In this paper, we present diet2vec: an approach to modeling latent structure in a massive database of electronic diet journals. Through an iterative contract-and-expand process, our model learns real-valued embeddings of users' diets, as well as embeddings for individual foods and meals. We demonstrate the effectiveness of our approach on a real dataset of 55K users of the popular diet-tracking app LoseIt (http://www.loseit.com/). To the best of our knowledge, this is the largest fine-grained diet tracking study in the history of nutrition and obesity research. Our results suggest that diet2vec finds interpretable results at all levels, discovering intuitive representations of foods, meals, and diets.

* Accepted to the NIPS 2016 Workshop on Machine Learning for Health

Better Conditional Density Estimation for Neural Networks

Jun 07, 2016

Wesley Tansey, Karl Pichotta, James G. Scott

Abstract:The vast majority of the neural network literature focuses on predicting point values for a given set of response variables, conditioned on a feature vector. In many cases we need to model the full joint conditional distribution over the response variables rather than simply making point predictions. In this paper, we present two novel approaches to such conditional density estimation (CDE): Multiscale Nets (MSNs) and CDE Trend Filtering. Multiscale nets transform the CDE regression task into a hierarchical classification task by decomposing the density into a series of half-spaces and learning boolean probabilities of each split. CDE Trend Filtering applies a k-th order graph trend filtering penalty to the unnormalized logits of a multinomial classifier network, with each edge in the graph corresponding to a neighboring point on a discretized version of the density. We compare both methods against plain multinomial classifier networks and mixture density networks (MDNs) on a simulated dataset and three real-world datasets. The results suggest the two methods are complementary: MSNs work well in a high-data-per-feature regime and CDE-TF is well suited for few-samples-per-feature scenarios where overfitting is a primary concern.

* 12 pages, 3 figures, code available soon

Tensor decomposition with generalized lasso penalties

May 13, 2016

Oscar Hernan Madrid Padilla, James G. Scott

Abstract:We present an approach for penalized tensor decomposition (PTD) that estimates smoothly varying latent factors in multi-way data. This generalizes existing work on sparse tensor decomposition and penalized matrix decompositions, in a manner parallel to the generalized lasso for regression and smoothing problems. Our approach presents many nontrivial challenges at the intersection of modeling and computation, which are studied in detail. An efficient coordinate-wise optimization algorithm for PTD is presented, and its convergence properties are characterized. The method is applied both to simulated data and real data on flu hospitalizations in Texas. These results show that our penalized tensor decomposition can offer major improvements on existing methods for analyzing multi-way data that exhibit smooth spatial or temporal features.

Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes

Jul 13, 2015

Mingyuan Zhou, Oscar Hernan Madrid Padilla, James G. Scott

Abstract:We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns. The three distributions we consider are derived from the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes. Because the models lead to closed-form Gibbs sampling update equations, they are natural candidates for nonparametric Bayesian priors over count matrices. A key aspect of our analysis is the recognition that, although the random count matrices within the family are defined by a row-wise construction, their columns can be shown to be i.i.d. This fact is used to derive explicit formulas for drawing all the columns at once. Moreover, by analyzing these matrices' combinatorial structure, we describe how to sequentially construct a column-i.i.d. random count matrix one row at a time, and derive the predictive distribution of a new row count vector with previously unseen features. We describe the similarities and differences between the three priors, and argue that the greater flexibility of the gamma- and beta-negative binomial processes, especially their ability to model over-dispersed, heavy-tailed count data, makes these well suited to a wide variety of real-world applications. As an example of our framework, we construct a naive-Bayes text classifier to assign a count vector to one of several existing random count matrices of different categories. The classifier supports an unbounded number of features, and unlike most existing methods, it does not require a predefined finite vocabulary to be shared by all the categories, and needs neither feature selection nor parameter tuning. Both the gamma- and beta-negative binomial processes are shown to significantly outperform the gamma-Poisson process for document categorization, with comparable performance to other state-of-the-art supervised text classification algorithms.

* To appear in Journal of the American Statistical Association (Theory and Methods). 31 pages + 11 page supplement, 5 figures

A Fast and Flexible Algorithm for the Graph-Fused Lasso

Jun 01, 2015

Wesley Tansey, James G. Scott

Abstract:We propose a new algorithm for solving the graph-fused lasso (GFL), a method for parameter estimation that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. Our key insight is to decompose the graph into a set of trails which can then each be solved efficiently using techniques for the ordinary (1D) fused lasso. We leverage these trails in a proximal algorithm that alternates between closed form primal updates and fast dual trail updates. The resulting technique is both faster than previous GFL methods and more flexible in the choice of loss function and graph structure. Furthermore, we present two algorithms for constructing trail sets and show empirically that they offer a tradeoff between preprocessing time and convergence rate.

* 16 pages, 6 figures
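
For readers unfamiliar with the graph-fused lasso, the following sketch evaluates the kind of objective the abstract refers to, assuming a squared-error loss: a data-fit term plus an L1 penalty on differences across graph edges. It does not implement the trail decomposition or the proximal algorithm from the paper. The function name `gfl_objective` and the toy chain graph are assumptions made purely for illustration.

```python
def gfl_objective(y, beta, edges, lam):
    """Squared-error loss plus lam * sum of |beta_r - beta_s| over graph edges."""
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    penalty = sum(abs(beta[r] - beta[s]) for r, s in edges)
    return fit + lam * penalty


# Toy example (hypothetical data): a chain graph 0-1-2-3 with a noisy two-level signal.
y = [1.0, 1.2, 3.0, 2.8]
edges = [(0, 1), (1, 2), (2, 3)]

# Fitting the data exactly gives zero loss but pays the penalty on every edge.
rough = gfl_objective(y, y, edges, lam=1.0)
# A piecewise-constant fit pays a small loss but only one edge contributes.
flat = gfl_objective(y, [1.1, 1.1, 2.9, 2.9], edges, lam=1.0)
```

In this toy case the piecewise-constant candidate attains the lower objective value, which is the "locally constant over the graph" behavior the penalty is meant to encourage.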

Proximal Algorithms in Statistics and Machine Learning

May 30, 2015

Nicholas G. Polson, James G. Scott, Brandon T. Willard

Abstract:In this paper we develop proximal methods for statistical learning. Proximal point algorithms are useful in statistics and machine learning for obtaining optimization solutions for composite functions. Our approach exploits closed-form solutions of proximal operators and envelope representations based on the Moreau, Forward-Backward, Douglas-Rachford and Half-Quadratic envelopes. Envelope representations lead to novel proximal algorithms for statistical optimization of composite objective functions which include both non-smooth and non-convex objectives. We illustrate our methodology with regularized logistic and Poisson regression and non-convex bridge penalties with a fused lasso norm. We provide a discussion of convergence of non-descent algorithms with acceleration and for non-convex functions. Finally, we provide directions for future research.
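
As a generic illustration of a closed-form proximal operator (a standard textbook example, not code from the paper), the proximal operator of the scaled L1 norm is elementwise soft-thresholding. Alternating it with a gradient step gives the forward-backward (proximal gradient) iteration. The sketch below applies this to the simple problem of minimizing 0.5 * ||x - y||^2 + lam * ||x||_1; the function names, step size, and data are assumptions chosen for the example.

```python
def soft_threshold(v, thresh):
    """Proximal operator of thresh * |.|, applied elementwise."""
    out = []
    for vi in v:
        if vi > thresh:
            out.append(vi - thresh)
        elif vi < -thresh:
            out.append(vi + thresh)
        else:
            out.append(0.0)
    return out


def proximal_gradient(y, lam, step=0.5, n_iter=100):
    """Minimize 0.5 * ||x - y||^2 + lam * ||x||_1 by forward-backward splitting."""
    x = [0.0] * len(y)
    for _ in range(n_iter):
        # Forward (gradient) step on the smooth term 0.5 * ||x - y||^2.
        z = [xi - step * (xi - yi) for xi, yi in zip(x, y)]
        # Backward (proximal) step on the non-smooth term lam * ||x||_1.
        x = soft_threshold(z, step * lam)
    return x


y = [3.0, -0.5, -2.0]
x_hat = proximal_gradient(y, lam=1.0)
# For this particular problem the minimizer is known in closed form.
closed_form = soft_threshold(y, 1.0)
```

For this problem the iterates converge to the closed-form minimizer, which makes it a convenient sanity check before moving to losses such as logistic or Poisson regression, where only the gradient step changes.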
