Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Delyar Tabatabai

Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Jun 07, 2022

Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, Alan Fern

Figure 1 for Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Figure 2 for Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Figure 3 for Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Figure 4 for Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Abstract:Reinforcement learning (RL) agents are commonly evaluated via their expected value over a distribution of test scenarios. Unfortunately, this evaluation approach provides limited evidence for post-deployment generalization beyond the test distribution. In this paper, we address this limitation by extending the recent CheckList testing methodology from natural language processing to planning-based RL. Specifically, we consider testing RL agents that make decisions via online tree search using a learned transition model and value function. The key idea is to improve the assessment of future performance via a CheckList approach for exploring and assessing the agent's inferences during tree search. The approach provides the user with an interface and general query-rule mechanism for identifying potential inference flaws and validating expected inference invariances. We present a user study involving knowledgeable AI researchers using the approach to evaluate an agent trained to play a complex real-time strategy game. The results show the approach is effective in allowing users to identify previously-unknown flaws in the agent's reasoning. In addition, our analysis provides insight into how AI experts use this type of testing approach, which may help improve future instantiations.

* This work will appear in the Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS2022) https://icaps22.icaps-conference.org/papers

Via

Access Paper or Ask Questions

Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

May 14, 2022

Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, Minsuk Kahng

Figure 1 for Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Figure 2 for Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Figure 3 for Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Figure 4 for Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Abstract:In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning. Machine learning practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interests at multiple levels of abstraction. Our case studies with widely-used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap over the compared method.

Via

Access Paper or Ask Questions