Title: Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees

Apr 30, 2022

Jonathan Brophy, Zayd Hammoudeh, Daniel Lowd

Abstract: Influence estimation analyzes how changes to the training data can lead to different model predictions; this analysis can help us better understand these predictions, the models making those predictions, and the data sets they're trained on. However, most influence-estimation techniques are designed for deep learning models with continuous parameters. Gradient-boosted decision trees (GBDTs) are a powerful and widely used class of models; however, these models are black boxes with opaque decision-making processes. In the pursuit of better understanding GBDT predictions and generally improving these models, we adapt recent and popular influence-estimation methods designed for deep learning models to GBDTs. Specifically, we adapt representer-point methods and TracIn, denoting our new methods TREX and BoostIn, respectively; source code is available at https://github.com/jjbrophy47/tree_influence. We compare these methods to LeafInfluence and other baselines using 5 different evaluation measures on 22 real-world data sets with 4 popular GBDT implementations. These experiments give us a comprehensive overview of how different approaches to influence estimation work in GBDT models. We find BoostIn is an efficient influence-estimation method for GBDTs that performs as well as or better than existing work while being four orders of magnitude faster. Our evaluation also suggests the gold-standard approach of leave-one-out (LOO) retraining consistently identifies the single most influential training example but performs poorly at finding the most influential set of training examples for a given target prediction.

* 47 pages, 15 figures, and 5 tables. Submitted to JMLR
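
The leave-one-out (LOO) retraining baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code and does not use the tree_influence package: it assumes a toy one-feature GBDT built from depth-1 regression trees with squared-error loss, and every name here (`fit_stump`, `fit_gbdt`, `loo_influence`) is hypothetical. The sign convention is also an assumption: a negative value means that removing the training example lowers the loss on the target example.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]          # (threshold, left value, right value)
Model = Tuple[float, List[Stump], float]    # (base prediction, stumps, learning rate)


def stump_predict(stump: Stump, x: float) -> float:
    threshold, left, right = stump
    return left if x <= threshold else right


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a depth-1 regression tree by minimizing squared error (toy version)."""
    uniq = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive unique feature values.
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    if not candidates:  # all feature values identical: predict the mean everywhere
        mean = sum(residuals) / len(residuals)
        return (uniq[0], mean, mean)
    best_sse, best_stump = None, None
    for t in candidates:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_stump = sse, (t, lm, rm)
    return best_stump


def fit_gbdt(xs: List[float], ys: List[float],
             n_trees: int = 20, lr: float = 0.3) -> Model:
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, lr


def predict(model: Model, x: float) -> float:
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


def loo_influence(xs: List[float], ys: List[float],
                  x_target: float, y_target: float) -> List[float]:
    """Influence of example i = target loss without i minus target loss with all data."""
    full_loss = (predict(fit_gbdt(xs, ys), x_target) - y_target) ** 2
    influences = []
    for i in range(len(xs)):
        # Retrain from scratch with training example i held out.
        model_i = fit_gbdt(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        loss_i = (predict(model_i, x_target) - y_target) ** 2
        influences.append(loss_i - full_loss)
    return influences


if __name__ == "__main__":
    # Toy data: the last training label (5.0) is an outlier near the target point.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 5.0]
    for i, value in enumerate(loo_influence(xs, ys, x_target=7.2, y_target=1.0)):
        print(i, round(value, 4))
```

The sketch retrains one model per training example, so its cost grows linearly with the size of the training set. That cost is the practical reason faster estimators such as those compared in the paper are of interest.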
