Martin T. Wells

Boltzmann convolutions and Welford mean-variance layers with an application to time series forecasting and classification

Mar 06, 2025

Daniel Andrew Coulson, Martin T. Wells

Abstract: In this paper we propose a novel problem called the ForeClassing problem, where the loss of a classification decision is only observed at a future time point after the classification decision has to be made. To solve this problem, we propose an approximately Bayesian deep neural network architecture called ForeClassNet for time series forecasting and classification. This network architecture forces the network to consider possible future realizations of the time series, by forecasting future time points and their likelihood of occurring, before making its final classification decision. To facilitate this, we introduce two novel neural network layers, Welford mean-variance layers and Boltzmann convolutional layers. Welford mean-variance layers allow networks to iteratively update their estimates of the mean and variance of the forecasted time points for each time series input to the network through successive forward passes, which the model can then consider in combination with a learned representation of the observed realizations of the time series for its classification decision. Boltzmann convolutional layers are linear combinations of approximately Bayesian convolutional layers with different filter lengths, allowing the model to learn multitemporal resolution representations of the input time series and, through a Boltzmann distribution, which resolutions to focus on within a given Boltzmann convolutional layer. Through several simulation scenarios and two real-world applications, we demonstrate that ForeClassNet achieves superior performance compared with current state-of-the-art methods, including a nearly 30% improvement in test set accuracy in our financial example compared to the second-best-performing model.

* 40 pages, 7 figures, 11 tables
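
The abstract above describes layers that iteratively update mean and variance estimates across successive forward passes. As a rough illustration of that idea only, the following is a minimal sketch of the classic scalar Welford online algorithm. It is not the paper's layer: the class name `WelfordMeanVariance`, the use of the sample (n - 1) variance, and the toy forecast values are all assumptions made for this example.

```python
class WelfordMeanVariance:
    """Running mean and variance via Welford's online algorithm (scalar case)."""

    def __init__(self):
        self.count = 0    # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the *updated* mean, which keeps the recursion numerically stable.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical forecasts of one future time point from repeated forward passes.
forecasts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = WelfordMeanVariance()
for value in forecasts:
    stats.update(value)
print(stats.mean, stats.variance())
```

Each update touches only three stored numbers, so the estimates can be refreshed after every pass without keeping earlier forecasts in memory.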

Supervised Similarity for High-Yield Corporate Bonds with Quantum Cognition Machine Learning

Feb 03, 2025

Joshua Rosaler, Luca Candelori, Vahagn Kirakosyan, Kharen Musaelian, Ryan Samson, Martin T. Wells, Dhagash Mehta, Stefano Pasquali

Abstract: We investigate the application of quantum cognition machine learning (QCML), a novel paradigm for both supervised and unsupervised learning tasks rooted in the mathematical formalism of quantum theory, to distance metric learning in corporate bond markets. Compared to equities, corporate bonds are relatively illiquid and both trade and quote data in these securities are relatively sparse. Thus, a measure of distance/similarity among corporate bonds is particularly useful for a variety of practical applications in the trading of illiquid bonds, including the identification of similar tradable alternatives, pricing securities with relatively few recent quotes or trades, and explaining the predictions and performance of ML models based on their training data. Previous research has explored supervised similarity learning based on classical tree-based models in this context; here, we explore the application of the QCML paradigm for supervised distance metric learning in the same context, showing that it outperforms classical tree-based models in high-yield (HY) markets, while giving comparable or better performance (depending on the evaluation metric) in investment grade (IG) markets.

How Aligned are Generative Models to Humans in High-Stakes Decision-Making?

Oct 20, 2024

Sarah Tan, Keri Mallari, Julius Adebayo, Albert Gordo, Martin T. Wells, Kori Inkpen

Abstract: Large generative models (LMs) are increasingly being considered for high-stakes decision-making. This work considers how such models compare to humans and predictive AI models on a specific case of recidivism prediction. We combine three datasets -- COMPAS predictive AI risk scores, human recidivism judgements, and photos -- into a dataset on which we study the properties of several state-of-the-art, multimodal LMs. Beyond accuracy and bias, we focus on studying human-LM alignment on the task of recidivism prediction. We investigate whether these models can be steered towards human decisions, the impact of adding photos, and whether anti-discrimination prompting is effective. We find that LMs can be steered to outperform humans and COMPAS using in-context learning. We find anti-discrimination prompting to have unintended effects, causing some models to inhibit themselves and significantly reduce their number of positive predictions.

Bellwether Trades: Characteristics of Trades influential in Predicting Future Price Movements in Markets

Sep 08, 2024

Tejas Ramdas, Martin T. Wells

Abstract: In this study, we leverage powerful non-linear machine learning methods to identify the characteristics of trades that contain valuable information. First, we demonstrate the effectiveness of our optimized neural network predictor in accurately predicting future market movements. Then, we utilize the information from this successful neural network predictor to pinpoint the individual trades within each data point (trading window) that had the most impact on the optimized neural network's prediction of future price movements. This approach helps us uncover important insights about the heterogeneity in information content provided by trades of different sizes, venues, trading contexts, and over time.

* 49 Pages

K-ARMA Models for Clustering Time Series Data

Jun 30, 2022

Derek O. Hoare, David S. Matteson, Martin T. Wells

Abstract: We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm which we call K-Models. We prove the convergence of this general algorithm and relate it to the hard-EM algorithm for mixture modeling. We first apply our method to an AR($p$) clustering example and show how the clustering algorithm can be made robust to outliers using a least absolute deviations criterion. We then build our clustering algorithm up for ARMA($p,q$) models and extend this to ARIMA($p,d,q$) models. We develop a goodness of fit statistic for the models fitted to clusters based on the Ljung-Box statistic. We perform experiments with simulated data to show how the algorithm can be used for outlier detection and for detecting distributional drift, and we discuss the impact of the initialization method on empty clusters. We also perform experiments on real data which show that our method is competitive with other existing methods for similar time series clustering tasks.

* 24 pages, 8 figures
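
To make the K-Models idea in the abstract above concrete, here is a minimal sketch under strong simplifying assumptions: each cluster is an AR(1) model without intercept, fitted by pooled least squares, and a series is assigned to the cluster whose model gives it the smallest residual sum of squares. The function names, the fixed initial coefficients, and the choice to leave an empty cluster's coefficient unchanged are all hypothetical choices for illustration; the paper itself covers AR($p$), ARMA, and ARIMA models and a robust criterion that this sketch does not implement.

```python
import random


def simulate_ar1(phi, n, rng):
    """Simulate n points of x_t = phi * x_{t-1} + e_t with standard normal noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def rss(series, phi):
    """Residual sum of squares of one series under an AR(1) coefficient phi."""
    return sum((series[t] - phi * series[t - 1]) ** 2 for t in range(1, len(series)))


def fit_phi(group):
    """Pooled least-squares AR(1) coefficient for all series in one cluster."""
    num = sum(s[t] * s[t - 1] for s in group for t in range(1, len(s)))
    den = sum(s[t - 1] ** 2 for s in group for t in range(1, len(s)))
    return num / den if den > 0 else 0.0


def k_models_ar1(series_list, init_phis, max_iter=50):
    """Alternate between assigning series to models and refitting the models."""
    phis = list(init_phis)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each series goes to its best-fitting model.
        new_labels = [
            min(range(len(phis)), key=lambda j: rss(s, phis[j]))
            for s in series_list
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: refit each model on its members (empty clusters keep phi).
        for j in range(len(phis)):
            members = [s for s, lab in zip(series_list, labels) if lab == j]
            if members:
                phis[j] = fit_phi(members)
    return labels, phis


rng = random.Random(0)
data = [simulate_ar1(0.8, 200, rng) for _ in range(3)]
data += [simulate_ar1(-0.8, 200, rng) for _ in range(3)]
labels, phis = k_models_ar1(data, init_phis=[-0.5, 0.5])
print(labels, phis)
```

The loop mirrors K-Means with "fit a model" in place of "compute a centroid" and "residual error" in place of "distance to centroid", which is the generalization the abstract describes.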

Interpretable Latent Variables in Deep State Space Models

Mar 03, 2022

Haoxuan Wu, David S. Matteson, Martin T. Wells

Abstract: We introduce a new version of deep state-space models (DSSMs) that combines a recurrent neural network with a state-space framework to forecast time series data. The model estimates the observed series as functions of latent variables that evolve non-linearly through time. Due to the complexity and non-linearity inherent in DSSMs, previous works on DSSMs typically produced latent variables that are very difficult to interpret. Our paper focuses on producing interpretable latent parameters with two key modifications. First, we simplify the predictive decoder by restricting the response variables to be a linear transformation of the latent variables plus some noise. Second, we utilize shrinkage priors on the latent variables to reduce redundancy and improve robustness. These changes make the latent variables much easier to understand and allow us to interpret the resulting latent variables as random effects in a linear mixed model. We show on two public benchmark datasets that the resulting model improves forecasting performance.

Clustering Structure of Microstructure Measures

Jul 05, 2021

Liao Zhu, Ningning Sun, Martin T. Wells

Abstract: This paper builds a clustering model of market microstructure measures that are popular for predicting stock returns. At a 10-second frequency, we study the clustering structure of different measures to identify the best ones for prediction. In this way, we can predict more accurately with a limited number of predictors, which removes noise and makes the model more interpretable.

A News-based Machine Learning Model for Adaptive Asset Pricing

Jun 13, 2021

Liao Zhu, Haoxuan Wu, Martin T. Wells

Abstract: The paper proposes a new asset pricing model, the News Embedding UMAP Selection (NEUS) model, to explain and predict stock returns based on financial news. Using a combination of various machine learning algorithms, we first derive a company embedding vector for each basis asset from the financial news. Then we obtain a collection of the basis assets based on their company embedding. After that, for each stock, we select the basis assets to explain and predict the stock return with high-dimensional statistical methods. The new model is shown to have significantly better fit and predictive power than the Fama-French 5-factor model.

Time-Invariance Coefficients Tests with the Adaptive Multi-Factor Model

Nov 09, 2020

Liao Zhu, Robert A. Jarrow, Martin T. Wells

Abstract: The purpose of this paper is to test the multi-factor beta model implied by the generalized arbitrage pricing theory (APT) and the Adaptive Multi-Factor (AMF) model with the Groupwise Interpretable Basis Selection (GIBS) algorithm, without imposing the exogenous assumption of constant betas. The intercept (arbitrage) tests validate both the AMF and the Fama-French 5-factor (FF5) models. We perform time-invariance tests on the betas of both the AMF model and the FF5 model over various time periods. We show that for nearly all time periods shorter than 6 years, the beta coefficients are time-invariant for the AMF model, but not for the FF5 model. The beta coefficients are time-varying for both the AMF and FF5 models over longer time periods. Therefore, using the dynamic AMF model with a suitable rolling window (such as 5 years) is more powerful and stable than the FF5 model.

Hierarchical Adaptive Lasso: Learning Sparse Neural Networks with Shrinkage via Single Stage Training

Aug 24, 2020

Skyler Seto, Martin T. Wells, Wenyu Zhang

Abstract: Deep neural networks achieve state-of-the-art performance in a variety of tasks; however, this performance is closely tied to model size. Sparsity is one approach to limiting model size. Modern techniques for inducing sparsity in neural networks are (1) network pruning, a procedure involving iteratively training a model initialized with a previous run's weights and hard thresholding, (2) training in one stage with a sparsity-inducing penalty (usually based on the Lasso), and (3) training a binary mask jointly with the weights of the network. In this work, we study different sparsity-inducing penalties from the perspective of Bayesian hierarchical models with the goal of designing penalties which perform well without retraining subnetworks in isolation. With this motivation, we present a novel penalty called Hierarchical Adaptive Lasso (HALO) which learns to adaptively sparsify weights of a given network via trainable parameters without learning a mask. When used to train over-parametrized networks, our penalty yields small subnetworks with high accuracy (winning tickets) even when the subnetworks are not trained in isolation. Empirically, on the CIFAR-100 dataset, we find that HALO is able to learn a highly sparse network (only $5\%$ of the parameters) with approximately a $2\%$ and $4\%$ gain in performance over state-of-the-art magnitude pruning methods at the same level of sparsity.
