Anupam Chaudhuri

Minimising changes to audit when updating decision trees

Aug 29, 2024

Anj Simmons, Scott Barnett, Anupam Chaudhuri, Sankhya Singh, Shangeetha Sivasothy

Abstract: Interpretable models are important, but what happens when the model is updated on new training data? We propose an algorithm for updating a decision tree while minimising the number of changes to the tree that a human would need to audit. We achieve this via a greedy approach that incorporates the number of changes to the tree as part of the objective function. We compare our algorithm to existing methods and show that it sits in a sweet spot between final accuracy and number of changes to audit.

* 12 pages
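
The abstract does not spell out the objective, so the following is only a minimal sketch of the general idea: a greedy split choice whose score subtracts a penalty whenever the chosen split differs from the one already in the tree. The names `Split`, `accuracy_after_split`, `choose_split`, and `change_penalty` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    feature: int      # index of the feature tested at this node
    threshold: float  # rows with value <= threshold go left


def accuracy_after_split(rows, labels, split):
    """Accuracy when each side of the split predicts its majority label."""
    left = [y for x, y in zip(rows, labels) if x[split.feature] <= split.threshold]
    right = [y for x, y in zip(rows, labels) if x[split.feature] > split.threshold]
    correct = 0
    for side in (left, right):
        if side:
            correct += max(side.count(c) for c in set(side))
    return correct / len(labels)


def choose_split(rows, labels, candidates, old_split, change_penalty):
    """Greedy choice: accuracy minus a penalty if the split would change.

    Assumption: one changed split counts as one change to audit.
    """
    def score(split):
        changed = 0 if split == old_split else 1
        return accuracy_after_split(rows, labels, split) - change_penalty * changed

    return max(candidates, key=score)


# Toy data: the old threshold (2.5) is slightly worse than a new one (3.5).
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = [0, 0, 0, 1]
old = Split(0, 2.5)
new = Split(0, 3.5)

kept = choose_split(rows, labels, [old, new], old, change_penalty=0.5)
replaced = choose_split(rows, labels, [old, new], old, change_penalty=0.1)
```

With a large penalty the existing split is kept even though it is less accurate; with a small penalty the more accurate split replaces it. This is the trade-off between accuracy and changes to audit that the abstract describes.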

Quantifying Manifolds: Do the manifolds learned by Generative Adversarial Networks converge to the real data manifold

Mar 08, 2024

Anupam Chaudhuri, Anj Simmons, Mohamed Abdelrazek

Abstract: This paper presents our experiments to quantify the manifolds learned by ML models (in our experiment, we use a GAN model) as they train. We compare the manifolds learned at each epoch to the real manifolds representing the real data. To quantify a manifold, we study the intrinsic dimensions and topological features of the manifold learned by the ML model, how these metrics change as we continue to train the model, and whether these metrics converge over the course of training to the metrics of the real data manifold.

* arXiv admin note: text overlap with arXiv:2311.13102
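
The abstract does not say which intrinsic-dimension estimator is used. As one illustration of what such a metric looks like, here is a rough sketch of the TwoNN maximum-likelihood estimator, which uses the ratio of each point's second to first nearest-neighbour distance. The function name `two_nn_dimension` and the toy data are assumptions made for this example.

```python
import math
import random


def two_nn_dimension(points):
    """TwoNN estimate: d = N / sum(log(r2 / r1)) over all points.

    r1 and r2 are the distances to a point's first and second nearest
    neighbours. Brute-force O(N^2) search, fine for small samples.
    """
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_ratios.append(math.log(r2 / r1))
    return len(points) / sum(log_ratios)


# Toy check: a flat 2-D sheet embedded in 5-D space should score close to 2.
random.seed(0)
plane = [(random.random(), random.random(), 0.0, 0.0, 0.0) for _ in range(400)]
estimate = two_nn_dimension(plane)
```

Under this sketch, one would compute the estimate on generated samples at each epoch and on the real data, then compare the two values over training.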

Detecting out-of-distribution text using topological features of transformer-based language models

Nov 22, 2023

Andres Pollano, Anupam Chaudhuri, Anj Simmons

Abstract: We attempt to detect out-of-distribution (OOD) text samples through applying Topological Data Analysis (TDA) to attention maps in transformer-based language models. We evaluate our proposed TDA-based approach for out-of-distribution detection on BERT, a transformer-based language model, and compare it to a more traditional OOD approach based on BERT CLS embeddings. We found that our TDA approach outperforms the CLS embedding approach at distinguishing in-distribution data (politics and entertainment news articles from HuffPost) from far out-of-domain samples (IMDB reviews), but its effectiveness deteriorates with near out-of-domain (CNN/Dailymail) or same-domain (business news articles from HuffPost) datasets.

* 12 pages, 6 figures, 3 tables
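
The abstract does not list which topological features are extracted. As a hedged illustration, one of the simplest features of an attention map is the number of connected components (the zeroth Betti number) of the graph obtained by keeping only token pairs whose attention weight reaches a threshold. The function `betti0` and the toy matrix below are assumptions for this sketch, not the paper's pipeline.

```python
def betti0(attention, threshold):
    """Count connected components of the thresholded attention graph.

    Tokens i and j are linked when max(a[i][j], a[j][i]) >= threshold.
    Components are tracked with a small union-find structure.
    """
    n = len(attention)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if max(attention[i][j], attention[j][i]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


# Toy 3-token attention map: tokens 0 and 1 attend to each other strongly,
# token 2 attends mostly to itself.
attention = [
    [0.90, 0.10, 0.00],
    [0.60, 0.40, 0.00],
    [0.00, 0.05, 0.95],
]
features = [betti0(attention, t) for t in (0.05, 0.5, 0.7)]
```

Sweeping the threshold gives a short feature vector per attention head, which could then feed an OOD classifier or distance score.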
