Arko Banerjee

Dynamic Model Predictive Shielding for Provably Safe Reinforcement Learning

May 22, 2024

Arko Banerjee, Kia Rahmani, Joydeep Biswas, Isil Dillig

Abstract: Among approaches for provably safe reinforcement learning, Model Predictive Shielding (MPS) has proven effective at complex tasks in continuous, high-dimensional state spaces, by leveraging a backup policy to ensure safety when the learned policy attempts to take risky actions. However, while MPS can ensure safety both during and after training, it often hinders task progress due to the conservative and task-oblivious nature of backup policies. This paper introduces Dynamic Model Predictive Shielding (DMPS), which optimizes reinforcement learning objectives while maintaining provable safety. DMPS employs a local planner to dynamically select safe recovery actions that maximize both short-term progress and long-term rewards. Crucially, the planner and the neural policy play a synergistic role in DMPS. When planning recovery actions to ensure safety, the planner uses the neural policy to estimate long-term rewards, allowing it to look beyond its short-term planning horizon. Conversely, the neural policy under training learns from the recovery plans proposed by the planner, converging to policies that are both high-performing and safe in practice. This approach guarantees safety during and after training, with bounded recovery regret that decreases exponentially with the depth of the planning horizon. Experimental results demonstrate that DMPS converges to policies that rarely require shield interventions after training and achieve higher rewards than several state-of-the-art baselines.
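The baseline shielding idea that the abstract builds on can be sketched on a toy problem. The following is a minimal illustration of a generic model predictive shield, not the DMPS algorithm and not the authors' code: the 1D dynamics, the constants `DT`, `MAX_BRAKE`, and `WALL`, and all function names are assumptions made for this example. The shield lets the learned action through only if the backup policy (here, full braking) can still bring the system to rest safely from the resulting state.

```python
from dataclasses import dataclass

# All constants below are hypothetical values chosen for this toy example.
DT = 0.1          # integration time step
MAX_BRAKE = 1.0   # largest acceleration magnitude the agent may apply
WALL = 10.0       # positions at or beyond WALL count as unsafe


@dataclass(frozen=True)
class State:
    x: float  # position
    v: float  # velocity (kept non-negative by the dynamics)


def step(s: State, a: float) -> State:
    """Deterministic toy dynamics: clip the action, then integrate one step."""
    a = max(-MAX_BRAKE, min(MAX_BRAKE, a))
    v = max(0.0, s.v + a * DT)
    return State(s.x + v * DT, v)


def is_safe(s: State) -> bool:
    return s.x < WALL


def backup_policy(s: State) -> float:
    """Task-oblivious backup: always brake as hard as possible."""
    return -MAX_BRAKE


def is_recoverable(s: State, horizon: int = 200) -> bool:
    """Roll out the backup policy; succeed if the system stops without ever being unsafe."""
    for _ in range(horizon):
        if not is_safe(s):
            return False
        if s.v == 0.0:
            return True
        s = step(s, backup_policy(s))
    return False


def shielded_action(s: State, learned_action: float) -> float:
    """Pass the learned action through only if the next state is still recoverable."""
    if is_recoverable(step(s, learned_action)):
        return learned_action
    return backup_policy(s)
```

In this sketch the fallback is always the same braking action, which illustrates the conservatism the abstract attributes to backup policies; according to the abstract, DMPS instead uses a local planner to choose the recovery actions.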


ESDF: Ensemble Selection using Diversity and Frequency

Aug 18, 2015

Shouvick Mondal, Arko Banerjee

Abstract: Recently, ensemble selection for consensus clustering has emerged as a research problem in Machine Intelligence. Consensus clustering algorithms normally take into account the entire ensemble of clusterings, and there is a tendency to generate a very large ensemble before computing its consensus. One can avoid considering the entire ensemble by judiciously selecting a few of its partitions without compromising the quality of the consensus. This can make consensus computation more efficient and avoid unnecessary computational overhead. The ensemble selection problem addresses this issue in consensus clustering. In this paper, we propose an efficient method of ensemble selection for a large ensemble. We prioritize the partitions in the ensemble based on diversity and frequency. Our method selects the top K partitions in order of priority, where K is chosen by the user. We observe that considering diversity and frequency jointly helps identify a few representative partitions whose consensus is qualitatively better than the consensus of the entire ensemble. Experimental analysis on a large number of datasets shows that our method gives better results than earlier ensemble selection methods.

* Conference: National Conference on Research Trends in Computer Science and Application (NCRTCSA-2014). Date: 8th February 2014. Organized by: Dept. of Computer Application, Siliguri Institute of Technology, India. In association with: Computer Society of India, Siliguri Chapter. Technically sponsored by: IEEE, Kolkata Section. Paper ID: NCRTCSA118. Citation: Shouvick Mondal et al., "ESDF: Ensemble Selection using Diversity and Frequency," Proceedings of NCRTCSA 2014, pp. 28-33, 2014.
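As a rough illustration of the selection step described in the ESDF abstract (rank partitions by diversity and frequency, then keep the top K), the sketch below may help. The abstract does not define the scoring, so everything here is an assumption: frequency is taken as the share of the ensemble made up of identical partitions, diversity as the mean of one minus the Rand index against the other distinct partitions, and the two are simply added. The function names are hypothetical and this is not the paper's method.

```python
from collections import Counter
from itertools import combinations


def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same cluster vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_top_k(ensemble, k):
    """Return up to k distinct partitions ranked by an assumed frequency + diversity score."""
    counts = Counter(canonical(p) for p in ensemble)
    unique = list(counts)
    total = len(ensemble)

    def score(p):
        frequency = counts[p] / total
        others = [q for q in unique if q != p]
        # Assumed diversity measure: mean disagreement with the other distinct partitions.
        diversity = (
            sum(1.0 - rand_index(p, q) for q in others) / len(others) if others else 0.0
        )
        return frequency + diversity  # equal weighting is an assumption

    return sorted(unique, key=score, reverse=True)[:k]


# Five partitions of four points; the first three are the same up to label renaming.
ensemble = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
selected = select_top_k(ensemble, k=2)
```

The selected partitions would then be handed to whatever consensus function is in use, in place of the full ensemble.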


Consensus Sequence Segmentation

Dec 30, 2013

Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

Abstract: In this paper we introduce a method to detect words or phrases in a given sequence of letters without knowing the lexicon. Our linear-time unsupervised algorithm relies entirely on statistical relationships among letters in the input sequence to detect the locations of word boundaries. We compare our algorithm to previous approaches from the unsupervised sequence segmentation literature and obtain superior segmentation on a number of benchmarks.

* This paper has been withdrawn by the authors due to a data input error in Table 1.
