Brandon M. Greenwell

Explainable Boosting Machines with Sparsity -- Maintaining Explainability in High-Dimensional Settings

Nov 13, 2023

Brandon M. Greenwell, Annika Dahlmann, Saurabh Dhoble

Figures 1 to 3 for Explainable Boosting Machines with Sparsity -- Maintaining Explainability in High-Dimensional Settings (images not reproduced)

Abstract: Compared to "black-box" models, like random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models that can be competitively accurate while also maintaining a higher degree of transparency and explainability. However, EBMs quickly become less transparent and harder to interpret in high-dimensional settings with many predictor variables; they also become more difficult to use in production because scoring time increases. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that introduces sparsity by reweighting the individual model terms and removing the less relevant ones, thereby allowing these models to maintain their transparency and relatively fast scoring times in higher-dimensional settings. In short, post-processing a fitted EBM that has many terms (possibly hundreds or thousands) with the LASSO can reduce the model's complexity and drastically improve scoring time. We illustrate the basic idea using two real-world examples with code.
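One way to picture the post-processing step is as a LASSO regression of the response on the per-term contributions of a fitted EBM: terms whose coefficient is shrunk to exactly zero are dropped, and the survivors are rescaled by their coefficients. The following is a minimal pure-Python sketch under stated assumptions, not the authors' code or any EBM library's API: the term-contribution matrix `F`, the helpers `soft_threshold` and `lasso_reweight`, the coordinate-descent solver, and the toy data are all hypothetical.

```python
"""Sketch: prune and reweight EBM terms with a LASSO fit on their contributions.

Assumptions (not from the paper): F[i][j] is the contribution of term j to the
prediction for row i, and the LASSO is solved by plain coordinate descent.
"""
from typing import List, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_reweight(
    F: List[List[float]], y: List[float], lam: float, n_iter: int = 500
) -> Tuple[float, List[float]]:
    """Minimize (1/(2n)) * ||y - b0 - F w||^2 + lam * ||w||_1.

    Returns the intercept b0 and one weight per term; a zero weight means the
    term can be dropped from the model.
    """
    n, p = len(F), len(F[0])
    w = [0.0] * p
    b0 = sum(y) / n
    r = [yi - b0 for yi in y]  # residuals y - b0 - F w (w starts at zero)
    col_sq = [sum(F[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # The intercept is unpenalized: absorb the mean residual into b0.
        shift = sum(r) / n
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a term that is identically zero carries no signal
            # Correlation of term j with the residual that excludes term j.
            rho = sum(F[i][j] * (r[i] + F[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= F[i][j] * delta
                w[j] = new_wj
    return b0, w


# Toy data: five hypothetical term columns built from orthogonal +/-1 patterns.
patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, -1, -1, 1, 1],
    [1, -1, 1, -1, -1, 1, -1, 1],
]
scales = [2.0, 1.0, 0.5, 0.05, 0.05]
F = [[scales[j] * patterns[j][i] for j in range(5)] for i in range(8)]
# Terms 0-2 drive y strongly, term 3 only weakly, and term 4 not at all.
y = [3.0 + F[i][0] + F[i][1] + F[i][2] + F[i][3] for i in range(8)]

b0, w = lasso_reweight(F, y, lam=0.01)
kept = [j for j, wj in enumerate(w) if wj != 0.0]
print(kept)  # [0, 1, 2]
```

Here the weak term 3 and the irrelevant term 4 are zeroed out, so only three term functions would need to be evaluated at scoring time. A larger `lam` prunes more aggressively; in practice a tuned library LASSO solver would replace the hand-written loop, which is kept only so the sketch has no dependencies.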

* 14 pages, 3 figures

A Simple and Effective Model-Based Variable Importance Measure

May 12, 2018

Brandon M. Greenwell, Bradley C. Boehmke, Andrew J. McCarthy

Figures 1 to 4 for A Simple and Effective Model-Based Variable Importance Measure (images not reproduced)

Abstract: In the era of "big data", it is becoming more of a challenge not only to build state-of-the-art predictive models, but also to gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, like random forests and gradient boosted decision trees, have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, like naive Bayes classifiers and support vector machines, are not capable of doing so, and model-free approaches are generally used to measure each predictor's importance. In this paper, we propose a standardized, model-based approach to measuring predictor importance across the growing spectrum of supervised learning algorithms. Our proposed method is illustrated through both simulated and real data examples. The R code to reproduce all of the figures in this paper is available in the supplementary materials.
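The abstract does not state the measure itself. The sketch below assumes a partial-dependence-based score: a feature's importance is taken to be the sample standard deviation of its partial dependence curve, so a flat curve (one the prediction does not respond to) scores zero. Because it only needs a prediction function, the same code applies to any fitted model. The names `model`, `partial_dependence`, and `pd_importance` are hypothetical, and this is an illustration, not the authors' R code.

```python
"""Sketch: rank features by how much their partial dependence curve varies.

Assumption: importance = sample standard deviation of the partial dependence
values over a grid; a flat curve (zero spread) marks an unimportant feature.
"""
from statistics import stdev
from typing import Callable, List, Optional, Sequence

Predict = Callable[[Sequence[float]], float]


def partial_dependence(
    predict: Predict, X: List[List[float]], feature: int, grid: Sequence[float]
) -> List[float]:
    """Average prediction over all rows with `feature` fixed at each grid value."""
    values = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = v
            total += predict(modified)
        values.append(total / len(X))
    return values


def pd_importance(
    predict: Predict,
    X: List[List[float]],
    feature: int,
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Spread of the partial dependence curve for one feature."""
    if grid is None:
        grid = sorted({row[feature] for row in X})  # observed unique values
    if len(grid) < 2:
        return 0.0  # a single grid point gives no variation to measure
    return stdev(partial_dependence(predict, X, feature, grid))


# Hypothetical fitted model: uses x0 strongly, x1 weakly, ignores x2.
def model(x: Sequence[float]) -> float:
    return 3.0 * x[0] + 0.5 * x[1]


X = [[0.0, 0.0, 5.0], [1.0, 1.0, 6.0], [2.0, 2.0, 7.0], [3.0, 3.0, 8.0]]
scores = [pd_importance(model, X, j) for j in range(3)]
ranking = sorted(range(3), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 1, 2]
```

The ignored feature `x2` gets a score of exactly zero because fixing it at any grid value leaves every prediction unchanged.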