Abstract: Reinforcement learning algorithms typically require extensive exploration of the state space to find optimal policies. However, in safety-critical applications, the risks associated with such exploration can lead to catastrophic consequences. Existing safe exploration methods attempt to mitigate this by imposing constraints, which often results in overly conservative behaviours and inefficient learning. Heavy penalties for early constraint violations can trap agents in local optima, deterring exploration of risky yet high-reward regions of the state space. To address this, we introduce a method that explicitly learns state-conditioned safety representations. By augmenting the state features with these safety representations, our approach naturally encourages safer exploration without being excessively cautious, resulting in more efficient and safer policy learning in safety-critical scenarios. Empirical evaluations across diverse environments show that our method significantly improves task performance while reducing constraint violations during training, underscoring its effectiveness in balancing exploration with safety.
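The abstract does not specify an architecture, but the core idea of augmenting state features with a learned, state-conditioned safety representation can be illustrated with a minimal sketch. The sketch below is a hypothetical PyTorch implementation, not the paper's method: the module names, layer sizes, and the assumption that the safety encoder is trained to predict constraint violations are all illustrative choices.

```python
# Illustrative sketch only: a safety encoder produces a state-conditioned
# safety representation, which is concatenated onto the raw state features
# before the policy network. How the encoder is trained (e.g., to predict
# future constraint violations) is omitted; all names/sizes are hypothetical.
import torch
import torch.nn as nn

class SafetyAugmentedPolicy(nn.Module):
    def __init__(self, state_dim: int, safety_dim: int, action_dim: int):
        super().__init__()
        # Learns a safety representation conditioned on the current state.
        self.safety_encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, safety_dim),
        )
        # The policy conditions on the state augmented with safety features.
        self.policy = nn.Sequential(
            nn.Linear(state_dim + safety_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        safety_repr = self.safety_encoder(state)
        augmented = torch.cat([state, safety_repr], dim=-1)
        return self.policy(augmented)  # action logits or pre-squash means

# Usage: the policy sees both the raw state and its safety representation.
policy = SafetyAugmentedPolicy(state_dim=8, safety_dim=4, action_dim=2)
action = policy(torch.randn(1, 8))
```

Because the safety representation is an input rather than a penalty on the reward, the agent can in principle modulate its behaviour near unsafe regions without the blanket conservatism that heavy constraint penalties induce.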
Abstract: Autonomous service mobile robots need to consistently, accurately, and robustly localize in human environments despite changes to such environments over time. Episodic non-Markov Localization (EnML) addresses the challenge of localization in such changing environments by classifying observations as arising from Long-Term, Short-Term, or Dynamic Features. However, in order to do so, EnML relies on an estimate of the Long-Term Vector Map (LTVM) that does not change over time. In this paper, we introduce a recursive algorithm to build and update the LTVM over time by reasoning about visibility constraints of objects observed over multiple robot deployments. We use a signed distance function (SDF) to filter out observations of short-term and dynamic features from multiple deployments of the robot. The remaining long-term observations are used to build a vector map by robust local linear regression. The uncertainty in the resulting LTVM is computed via Monte Carlo resampling of the observations arising from long-term features. By combining occupancy-grid-based SDF filtering of observations with continuous-space regression of the filtered observations, our proposed approach builds, updates, and amends LTVMs over time, reasoning about all observations from all robot deployments in an environment. We present experimental results demonstrating the accuracy, robustness, and compact nature of the extracted LTVMs from several long-term robot datasets.
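The pipeline described here, filtering multi-deployment observations for long-term features and then fitting line segments by local linear regression, can be sketched in a few lines. The code below is a simplified illustration, not the paper's implementation: it replaces the SDF with a grid-based stability test over deployments, and the grid cell size, stability threshold, and function names are all hypothetical.

```python
# Simplified sketch (hypothetical parameters; stands in for the paper's
# SDF-based filter): keep only points observed consistently across
# deployments, then fit a line segment to a surviving local cluster.
import numpy as np

def filter_long_term(points_per_deployment, cell=0.1, min_frac=0.8):
    """Keep 2D points whose grid cell is occupied in at least `min_frac`
    of the deployments; transient (short-term/dynamic) hits are dropped."""
    counts = {}
    for points in points_per_deployment:
        seen = {tuple(np.floor(p / cell).astype(int)) for p in points}
        for c in seen:
            counts[c] = counts.get(c, 0) + 1
    n = len(points_per_deployment)
    stable = {c for c, k in counts.items() if k / n >= min_frac}
    all_pts = np.vstack(points_per_deployment)
    keep = [tuple(np.floor(p / cell).astype(int)) in stable for p in all_pts]
    return all_pts[keep]

def fit_segment(cluster):
    """Fit y = a*x + b to a local cluster of long-term points and return
    the segment endpoints (continuous-space vector map element)."""
    a, b = np.polyfit(cluster[:, 0], cluster[:, 1], deg=1)
    lo, hi = cluster[:, 0].min(), cluster[:, 0].max()
    return (lo, a * lo + b), (hi, a * hi + b)
```

Under this sketch, the uncertainty estimation described in the abstract would correspond to repeatedly resampling the filtered points with replacement and refitting each segment, yielding a Monte Carlo distribution over segment endpoints.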