Nicholas G. Polson

Bayesian Deep ICE

Jun 24, 2024

Jyotishka Datta, Nicholas G. Polson

Abstract: Deep Independent Component Estimation (DICE) has many applications in modern-day machine learning as a feature engineering extraction method. We provide a novel latent variable representation of independent component analysis that enables both point estimates via expectation-maximization (EM) and full posterior sampling via Markov Chain Monte Carlo (MCMC) algorithms. Our methodology also applies to flow-based methods for nonlinear feature extraction. We discuss how to implement conditional posteriors and envelope-based methods for optimization. Through this representation hierarchy, we unify a number of hitherto disjoint estimation procedures. We illustrate our methodology and algorithms on a numerical example. Finally, we conclude with directions for future research.

Deep Partial Least Squares for Empirical Asset Pricing

Jun 20, 2022

Matthew F. Dixon, Nicholas G. Polson, Kemen Goicoechea

Abstract: We use deep partial least squares (DPLS) to estimate an asset pricing model for individual stock returns that exploits conditioning information in a flexible and dynamic way while attributing excess returns to a small set of statistical risk factors. The novel contribution is to resolve the non-linear factor structure, thus advancing the current paradigm of deep learning in empirical asset pricing, which uses linear stochastic discount factors under an assumption of Gaussian asset returns and factors. This non-linear factor structure is extracted by using projected least squares to jointly project firm characteristics and asset returns onto a subspace of latent factors and using deep learning to learn the non-linear map from the factor loadings to the asset returns. The result of capturing this non-linear risk factor structure is to characterize anomalies in asset returns by both linear risk factor exposure and interaction effects. Thus the well-known ability of deep learning to capture outliers sheds light on the role of convexity and higher-order terms in the latent factor structure on the factor risk premia. On the empirical side, we implement our DPLS factor models and exhibit superior performance to LASSO and plain vanilla deep learning models. Furthermore, our network training times are significantly reduced due to the more parsimonious architecture of DPLS. Specifically, using 3290 assets in the Russell 1000 index over the period from December 1989 to January 2018, we assess our DPLS factor model and generate information ratios that are approximately 1.2x greater than those of deep learning. DPLS explains variation and pricing errors and identifies the most prominent latent factors and firm characteristics.

Bayesian Inference for Polya Inverse Gamma Models

May 29, 2019

Christopher Glynn, Jingyu He, Nicholas G. Polson, Jianeng Xu

Abstract: Probability density functions that include the gamma function are widely used in statistics and machine learning. The normalizing constants of gamma, inverse gamma, beta, and Dirichlet distributions all include model parameters as arguments in the gamma function; however, the gamma function does not naturally admit a conjugate prior distribution in a Bayesian analysis, and statistical inference of these parameters is a significant challenge. In this paper, we construct the Polya-inverse Gamma (P-IG) distribution as an infinite convolution of Generalized inverse Gaussian (GIG) distributions, and we represent the reciprocal gamma function as a scale mixture of normal distributions. As a result, the P-IG distribution yields an efficient data augmentation strategy for fully Bayesian inference on model parameters in gamma, inverse gamma, beta, and Dirichlet distributions. To illustrate the applied utility of our data augmentation strategy, we infer the proportion of overdose deaths in the United States attributed to different opioid and prescription drugs with a Dirichlet allocation model.

Horseshoe Regularization for Machine Learning in Complex and Deep Models

Apr 24, 2019

Anindya Bhadra, Jyotishka Datta, Yunfan Li, Nicholas G. Polson

Abstract: Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian methodology in machine learning, specifically for high-dimensional regression and classification problems. They have achieved remarkable success in computation, and enjoy strong theoretical support. Most of the existing literature has focused on the linear Gaussian case; see Bhadra et al. (2019) for a systematic survey. The purpose of the current article is to demonstrate that the horseshoe regularization is useful far more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.

Scalable Data Augmentation for Deep Learning

Mar 22, 2019

Yuexi Wang, Nicholas G. Polson, Vadim O. Sokolov

Abstract: Scalable Data Augmentation (SDA) provides a framework for training deep learning models using auxiliary hidden layers. Scalable MCMC is available for network training and inference. SDA provides a number of computational advantages over traditional algorithms, such as avoiding backtracking and local modes, and it can perform optimization with stochastic gradient descent (SGD) in TensorFlow. Standard deep neural networks with logit, ReLU and SVM activation functions are straightforward to implement. To illustrate our architectures and methodology, we use Pólya-Gamma logit data augmentation for a number of standard datasets. Finally, we conclude with directions for future research.

Deep Fundamental Factor Models

Mar 18, 2019

Matthew F. Dixon, Nicholas G. Polson

Abstract: Deep fundamental factor models are developed to interpret and capture non-linearity, interaction effects and non-parametric shocks in financial econometrics. Uncertainty quantification provides interpretability with interval estimation, ranking of factor importances and estimation of interaction effects. Estimating factor realizations under either homoscedastic or heteroscedastic error is also available. With no hidden layers, we recover a linear factor model, and for one or more hidden layers, uncertainty bands for the sensitivity to each input naturally arise from the network weights. To illustrate our methodology, we construct a six-factor model of assets in the S&P 500 index and generate information ratios that are three times greater than those of generalized linear regression. We show that the factor importances are materially different from the linear factor model when accounting for non-linearity. Finally, we conclude with directions for future research.

Deep Learning

Aug 03, 2018

Nicholas G. Polson, Vadim O. Sokolov

Abstract: Deep learning (DL) is a high-dimensional data reduction technique for constructing high-dimensional predictors in input-output models. DL is a form of machine learning that uses hierarchical layers of latent features. In this article, we review the state of the art in deep learning from a modeling and algorithmic perspective. We provide a list of successful areas of applications in Artificial Intelligence (AI), Image Processing, Robotics and Automation. Deep learning is predictive in its nature rather than inferential and can be viewed as a black-box methodology for high-dimensional function estimation.

* arXiv admin note: text overlap with arXiv:1602.06561

Deep Learning for Spatio-Temporal Modeling: Dynamic Traffic Flows and High Frequency Trading

May 07, 2018

Matthew F. Dixon, Nicholas G. Polson, Vadim O. Sokolov

Abstract: Deep learning applies hierarchical layers of hidden variables to construct nonlinear high-dimensional predictors. Our goal is to develop and train deep learning architectures for spatio-temporal modeling. Training a deep architecture is achieved by stochastic gradient descent (SGD) and drop-out (DO) for parameter regularization with a goal of minimizing out-of-sample predictive mean squared error. To illustrate our methodology, we first predict the sharp discontinuities in traffic flow data, and second, we develop a classification rule to predict short-term futures market prices as a function of the order book depth. Finally, we conclude with directions for future research.

Deep Learning for Predicting Asset Returns

Apr 26, 2018

Guanhao Feng, Jingyu He, Nicholas G. Polson

Abstract: Deep learning searches for nonlinear factors for predicting asset returns. Predictability is achieved via multiple layers of composite factors as opposed to additive ones. Viewed in this way, asset pricing studies can be revisited using multi-layer deep learners, such as rectified linear units (ReLU) or long short-term memory (LSTM) for time-series effects. State-of-the-art algorithms including stochastic gradient descent (SGD), TensorFlow and dropout design provide implementation and efficient factor exploration. To illustrate our methodology, we revisit the equity market risk premium dataset of Welch and Goyal (2008). We find the existence of nonlinear factors which explain predictability of returns, in particular at the extremes of the characteristic space. Finally, we conclude with directions for future research.

Horseshoe Regularization for Feature Subset Selection

Jun 22, 2017

Anindya Bhadra, Jyotishka Datta, Nicholas G. Polson, Brandon Willard

Abstract: Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being that it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of the time of state-of-the-art non-convex solvers.
