Vicenc Rubies-Royo

Expert Selection in High-Dimensional Markov Decision Processes

Oct 26, 2020

Vicenc Rubies-Royo, Eric Mazumdar, Roy Dong, Claire Tomlin, S. Shankar Sastry

Abstract: In this work, we present a multi-armed bandit framework for online expert selection in Markov decision processes and demonstrate its use in high-dimensional settings. Our method takes a set of candidate expert policies and switches between them to rapidly identify the best-performing expert using a variant of the classical upper confidence bound algorithm, thus ensuring low regret in the overall performance of the system. This is useful in applications where several expert policies may be available and one must be selected at run-time for the underlying environment.

* In proceedings of the 59th IEEE Conference on Decision and Control 2020. arXiv admin note: text overlap with arXiv:1707.05714
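
As a rough illustration of the idea in this abstract, the sketch below applies the textbook UCB1 rule to choose among a fixed set of experts. It is not the variant proposed in the paper: the function `ucb1_select_expert`, the callback `run_episode`, and the exploration constant `c` are hypothetical names introduced here, and each episode's return is assumed to lie in [0, 1].

```python
import math


def ucb1_select_expert(experts, run_episode, rounds, c=2.0):
    """Textbook UCB1 over a list of experts (illustrative sketch only).

    experts     -- list of candidate policies (any objects)
    run_episode -- hypothetical callback: expert -> return in [0, 1]
    rounds      -- total number of episodes to run
    c           -- exploration constant (2.0 gives the classical UCB1 bonus)
    """
    n = len(experts)
    counts = [0] * n      # how many times each expert has been run
    means = [0.0] * n     # running average return of each expert

    for t in range(1, rounds + 1):
        if t <= n:
            # Run every expert once before using confidence bounds.
            k = t - 1
        else:
            # Pick the expert with the highest upper confidence bound.
            k = max(
                range(n),
                key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]),
            )
        reward = run_episode(experts[k])
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # incremental mean

    return counts, means


# Toy usage: each "expert" is just its (deterministic) episode return.
toy_experts = [0.2, 0.5, 0.9]
counts, means = ucb1_select_expert(toy_experts, lambda e: e, rounds=500)
best = counts.index(max(counts))
```

In the toy run, the expert with the highest return ends up being selected most often, while the weaker experts are tried only as often as the confidence bonus requires.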

An Iterative Quadratic Method for General-Sum Differential Games with Feedback Linearizable Dynamics

Oct 01, 2019

David Fridovich-Keil, Vicenc Rubies-Royo, Claire J. Tomlin

Abstract: Iterative linear-quadratic (ILQ) methods are widely used in the nonlinear optimal control community. Recent work has applied similar methodology in the setting of multi-player general-sum differential games. Here, ILQ methods are capable of finding local Nash equilibria in interactive motion planning problems in real time. As in most iterative procedures, however, this approach can be sensitive to initial conditions and hyperparameter choices, which can result in poor computational performance or even unsafe trajectories. In this paper, we focus our attention on a broad class of dynamical systems which are feedback linearizable, and exploit this structure to improve both algorithmic reliability and runtime. We showcase our new algorithm in three distinct traffic scenarios, and observe that in practice our method converges significantly more often and more quickly than was possible without exploiting the feedback linearizable structure.

* 7 pages, 5 figures, submitted to International Conference on Robotics and Automation (2020)

Fast Neural Network Verification via Shadow Prices

Mar 08, 2019

Vicenc Rubies-Royo, Roberto Calandra, Dusan M. Stipanovic, Claire Tomlin

Abstract: To use neural networks in safety-critical settings, it is paramount to provide assurances on their runtime operation. Recent work on ReLU networks has sought to verify whether inputs belonging to a bounded box can ever yield some undesirable output. Input-splitting procedures, a particular type of verification mechanism, do so by recursively partitioning the input set into smaller sets. The efficiency of these methods is largely determined by the number of splits the box must undergo before the property can be verified. In this work, we propose a new technique based on shadow prices that fully exploits the information of the problem, yielding a more efficient generation of splits than the state of the art. Results on the Airborne Collision Avoidance System (ACAS) benchmark verification tasks show a considerable reduction in the number of partitions generated, which substantially reduces computation times. These results open the door to improved verification methods for a wide variety of machine learning applications, including vision and control.
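
To make the input-splitting procedure described above concrete, here is a minimal sketch under stated assumptions. It bounds a small ReLU network with plain interval arithmetic and bisects the widest input dimension whenever the bound is inconclusive. The widest-dimension rule is a generic stand-in, not the shadow-price criterion proposed in the paper, and all function names (`affine_bounds`, `network_upper_bound`, `forward`, `verify`) are hypothetical.

```python
def affine_bounds(W, b, lo, hi):
    """Interval bounds of W x + b for x in the box [lo, hi]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row)))
        out_hi.append(bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row)))
    return out_lo, out_hi


def network_upper_bound(layers, lo, hi):
    """Sound (possibly loose) upper bound on the scalar output over the box."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return hi[0]


def forward(layers, x):
    """Evaluate the network at a single point."""
    for i, (W, b) in enumerate(layers):
        x = [bias + sum(w * v for w, v in zip(row, x)) for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]


def verify(layers, lo, hi, threshold, max_depth=20):
    """Check 'output <= threshold' on the box by recursive bisection.

    Returns (result, boxes_visited); result is True (verified),
    False (counterexample found) or None (depth limit reached).
    """
    stack = [(list(lo), list(hi), 0)]
    boxes = 0
    while stack:
        lo, hi, depth = stack.pop()
        boxes += 1
        if network_upper_bound(layers, lo, hi) <= threshold:
            continue  # property holds on this sub-box
        mid = [(l + h) / 2 for l, h in zip(lo, hi)]
        if forward(layers, mid) > threshold:
            return False, boxes  # concrete violating input
        if depth >= max_depth:
            return None, boxes
        # Stand-in heuristic: split the widest input dimension.
        j = max(range(len(lo)), key=lambda k: hi[k] - lo[k])
        left_hi, right_lo = hi[:], lo[:]
        left_hi[j] = mid[j]
        right_lo[j] = mid[j]
        stack.append((lo, left_hi, depth + 1))
        stack.append((right_lo, hi, depth + 1))
    return True, boxes


# Toy network computing |x| = relu(x) + relu(-x) on the box [-1, 1].
abs_net = [([[1.0], [-1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
ok_result = verify(abs_net, [-1.0], [1.0], threshold=1.5)
bad_result = verify(abs_net, [-1.0], [1.0], threshold=0.5)
```

On the toy network, the interval bound over the whole box is too loose to prove the first property, but a single split suffices. The second property is false, and the search stops at a concrete violating input. The number of boxes visited is the quantity a better split rule aims to reduce.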

Classification-based Approximate Reachability with Guarantees Applied to Safe Trajectory Tracking

Mar 08, 2018

Vicenc Rubies-Royo, David Fridovich-Keil, Sylvia Herbert, Claire J. Tomlin

Abstract: Hamilton-Jacobi (HJ) reachability analysis has been developed over the past decades into a widely applicable tool for determining goal satisfaction and safety verification in nonlinear systems. While HJ reachability can be formulated very generally, computational complexity can be a serious impediment for many systems of practical interest. Much prior work has been devoted to computing approximate solutions to large reachability problems, yet many of these methods apply to only restricted problem classes, do not generate controllers, and/or are extremely conservative. In this paper, we present a novel approach to approximate HJ reachability in which computing an optimal controller is viewed as a sequential classification problem. Even though we employ neural networks for this classification task, our method still provides safety guarantees in many cases. We demonstrate the utility of our approach in the context of safe trajectory following with specific application to quadrotor navigation. Offline computation and online evaluation confirm that our method preserves safety.
