Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Oct 18, 2021

Yan Shuo Tan, Abhineet Agarwal, Bin Yu

Figure 1 for A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Figure 2 for A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Figure 3 for A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Share this with someone who'll enjoy it:

Abstract:Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach, and advocate studying the generalization performance of decision trees with respect to different generative regression models. This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data, thereby guiding practitioners on when and how to apply these methods. In this paper, we focus on sparse additive generative models, which have both low statistical complexity and some nonparametric flexibility. We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions. This bound is surprisingly much worse than the minimax rate for estimating such sparse additive models. The inefficiency is due not to greediness, but to the loss in power for detecting global structure when we average responses solely over each leaf, an observation that suggests opportunities to improve tree-based algorithms, for example, by hierarchical shrinkage. To prove these bounds, we develop new technical machinery, establishing a novel connection between decision tree estimation and rate-distortion theory, a sub-field of information theory.

View paper on

Share this with someone who'll enjoy it:

Title:A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Paper and Code