Abstract:We introduce a multimodal deep learning framework, Prescriptive Neural Networks (PNNs), that combines ideas from optimization and machine learning, and is, to the best of our knowledge, the first prescriptive method to handle multimodal data. The PNN is a feedforward neural network trained on embeddings to output an outcome-optimizing prescription. In two real-world multimodal datasets, we demonstrate that PNNs prescribe treatments that are able to significantly improve estimated outcomes in transcatheter aortic valve replacement (TAVR) procedures by reducing estimated postoperative complication rates by 32% and in liver trauma injuries by reducing estimated mortality rates by over 40%. In four real-world, unimodal tabular datasets, we demonstrate that PNNs outperform or perform comparably to other well-known, state-of-the-art prescriptive models; importantly, on tabular datasets, we also recover interpretability through knowledge distillation, fitting interpretable Optimal Classification Tree models onto the PNN prescriptions as classification targets, which is critical for many real-world applications. Finally, we demonstrate that our multimodal PNN models achieve stability across randomized data splits comparable to other prescriptive methods and produce realistic prescriptions across the different datasets.
Abstract:Decision Trees have remained a popular machine learning method for tabular datasets, mainly due to their interpretability. However, they lack the expressiveness needed to handle highly nonlinear or unstructured datasets. Motivated by recent advances in tree-based machine learning (ML) techniques and first-order optimization methods, we introduce Generalized Soft Trees (GSTs), which extend soft decision trees (STs) and are capable of processing images directly. We demonstrate their advantages with respect to tractability, performance, and interpretability. We develop a tractable approach to growing GSTs, given by the DeepTree algorithm, which, in addition to new regularization terms, produces high-quality models with far fewer nodes and greater interpretability than traditional soft trees. We test the performance of our GSTs on benchmark tabular and image datasets, including MIMIC-IV, MNIST, Fashion MNIST, CIFAR-10 and Celeb-A. We show that our approach outperforms other popular tree methods (CART, Random Forests, XGBoost) in almost all of the datasets, with Convolutional Trees having a significant edge in the hardest CIFAR-10 and Fashion MNIST datasets. Finally, we explore the interpretability of our GSTs and find that even the most complex GSTs are considerably more interpretable than deep neural networks. Overall, our approach of Generalized Soft Trees provides a tractable method that is high-performing on (un)structured datasets and preserves interpretability more than traditional deep learning methods.
Abstract:Random Forests have been one of the most popular bagging methods in the past few decades, especially due to their success at handling tabular datasets. They have been extensively studied and compared to boosting models, like XGBoost, which are generally considered more performant. Random Forests adopt several simplistic assumptions, such that all samples and all trees that form the forest are equally important for building the final model. We introduce Enhanced Random Forests, an extension of vanilla Random Forests with extra functionalities and adaptive sample and model weighting. We develop an iterative algorithm for adapting the training sample weights, by favoring the hardest examples, and an approach for finding personalized tree weighting schemes for each new sample. Our method significantly improves upon regular Random Forests across 15 different binary classification datasets and considerably outperforms other tree methods, including XGBoost, when run with default hyperparameters, which indicates the robustness of our approach across datasets, without the need for extensive hyperparameter tuning. Our tree-weighting methodology results in enhanced or comparable performance to the uniformly weighted ensemble, and is, more importantly, leveraged to define importance scores for trees based on their contributions to classifying each new sample. This enables us to only focus on a small number of trees as the main models that define the outcome of a new sample and, thus, to partially recover interpretability, which is critically missing from both bagging and boosting methods. In binary classification problems, the proposed extensions and the corresponding results suggest the equivalence of bagging and boosting methods in performance, and the edge of bagging in interpretability by leveraging a few learners of the ensemble, which is not an option in the less explainable boosting methods.