Abstract: We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.
Abstract: In this report, we provide a comparative analysis of different techniques for user intent classification for the task of app recommendation. We analyse the performance of different models and architectures for multi-label classification over a dataset with a relatively large number of classes and only a handful of examples of each class. We focus, in particular, on memory network architectures, and compare how well the different versions perform under the task constraints. Since the classifier is meant to serve as a module in a practical dialog system, it needs to be able to work with limited training data and incorporate new data on the fly. We devise a 1-shot learning task to test the models under the above constraint. We conclude that relatively simple versions of memory networks perform better than other approaches. However, for tasks with very limited data, simple non-parametric methods perform comparably without needing the extra training data.
Abstract: Developments in semantic web technologies have promoted ontological encoding of knowledge from diverse domains. However, modelling many practical domains requires more expressive representation schemes than the standard description logics (DLs) support. We extend the DL SROIQ with constraint networks and grounded circumscription. Applications of constraint modelling include embedding ontologies with temporal or spatial information, while grounded circumscription allows defeasible inference and closed world reasoning. This paper overcomes restrictions on existing constraint modelling approaches by introducing expressive constructs. Grounded circumscription allows concept and role minimization and is decidable for DL. We provide a general and intuitive algorithm for the framework of grounded circumscription that can be applied to a whole range of logics. We present the resulting logic, GC-SROIQ(C), and describe a tableau decision procedure for it.
Abstract: Developments in semantic web technologies have promoted ontological encoding of knowledge from diverse domains. However, modelling many practical domains requires more expressiveness than the standard description logics (most prominently SROIQ) support. In this paper, we extend the expressive DL SROIQ with constraint networks (resulting in the logic SROIQc) and grounded circumscription (resulting in the logic GC-SROIQ). Applications of constraint modelling include embedding ontologies with temporal or spatial information, while those of grounded circumscription include defeasible inference and closed world reasoning. We describe the syntax and semantics of the logic formed by including constraint modelling constructs in SROIQ, and provide a sound, complete and terminating tableau algorithm for it. We further provide an intuitive algorithm for grounded circumscription in SROIQc, which adheres to the general framework of grounded circumscription, and which can be applied to a whole range of expressive logics for which no such specific algorithm presently exists.