Abstract:Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment. This is often a false assumption. When ML models are deployed on real devices, data distributions often shift over time due to changes in environmental factors, sensor characteristics, and task-of-interest. While it is possible to have a human-in-the-loop to monitor for distribution shifts and engineer new architectures in response to these shifts, such a setup is not cost-effective. Instead, non-stationary automated ML (AutoML) models are needed. This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts. The EAR framework uses a fixed deep neural network (DNN) feature encoder and trains shallow networks on top of the encoder to handle novel data. The EAR framework is capable of 1) detecting when new data is out-of-distribution (OOD) by combining DNNs with hyperdimensional computing (HDC), 2) identifying low-parameter neural adaptors to adapt the model to the OOD data using zero-shot neural architecture search (ZS-NAS), and 3) minimizing catastrophic forgetting on previous tasks by progressively growing the neural architecture as needed and dynamically routing data through the appropriate adaptors and reconfigurators for handling domain-incremental and class-incremental continual learning. We systematically evaluate our approach on several benchmark datasets for domain adaptation and demonstrate strong performance compared to state-of-the-art algorithms for OOD detection and few-/zero-shot NAS.
Abstract:The ability for computational agents to reason about the high-level content of real world scene images is important for many applications. Existing attempts at addressing the problem of complex scene understanding lack representational power, efficiency, and the ability to create robust meta-knowledge about scenes. In this paper, we introduce scenarios as a new way of representing scenes. The scenario is a simple, low-dimensional, data-driven representation consisting of sets of frequently co-occurring objects and is useful for a wide range of scene understanding tasks. We learn scenarios from data using a novel matrix factorization method which we integrate into a new neural network architecture, the ScenarioNet. Using ScenarioNet, we can recover semantic information about real world scene images at three levels of granularity: 1) scene categories, 2) scenarios, and 3) objects. Training a single ScenarioNet model enables us to perform scene classification, scenario recognition, multi-object recognition, content-based scene image retrieval, and content-based image comparison. In addition to solving many tasks in a single, unified framework, ScenarioNet is more computationally efficient than other CNNs because it requires significantly fewer parameters while achieving similar performance on benchmark tasks and is more interpretable because it produces explanations when making decisions. We validate the utility of scenarios and ScenarioNet on a diverse set of scene understanding tasks on several benchmark datasets.