Abstract: Complicated first-principles modelling and controller synthesis can be prohibitively slow and expensive for high-mix, low-volume products such as hydraulic excavators. Instead, in a data-driven approach, recorded trajectories from the real system can be used to train local model networks (LMNs), for which feedforward controllers are derived via feedback linearization. However, previous works required LMNs without zero dynamics for feedback linearization, which restricts the model structure and thus the modelling capacity of LMNs. In this paper, we overcome this restriction by providing a criterion for when feedback linearization of LMNs with zero dynamics yields a valid controller. As this criterion, we propose bounded-input bounded-output (BIBO) stability of the resulting controller. In two additional contributions, we extend this approach to consider measured disturbance signals and multiple inputs and outputs. We illustrate the effectiveness of our contributions in a hydraulic excavator control application with hardware experiments. To this end, we train LMNs from recorded, noisy data and derive feedforward controllers used as part of a leveling assistance system on the excavator. In our experiments, incorporating disturbance signals and multiple inputs and outputs enhances the tracking performance of the learned controller. A video of our experiments is available at https://youtu.be/lrrWBx2ASaE.
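To make the BIBO criterion tangible, the following minimal Python sketch inverts a toy model with a zero and checks the resulting feedforward controller for boundedness empirically. The linear second-order model, its coefficients, and the bound used in the check are illustrative assumptions standing in for a trained LMN, not values from the paper.

```python
import numpy as np

# Toy linear stand-in for one local model of an LMN:
# y[k+1] = a1*y[k] + a2*y[k-1] + b0*u[k] + b1*u[k-1]
a1, a2, b0, b1 = 1.2, -0.35, 1.0, 0.9  # |b1/b0| < 1 -> stable inversion

def feedforward(y_ref):
    """Invert the model along the reference: solve for u[k] such that
    the model output tracks y_ref one step ahead. The delayed input
    u[k-1] is the controller's internal (zero-dynamics) state."""
    u = np.zeros(len(y_ref) - 1)
    for k in range(1, len(y_ref) - 1):
        u[k] = (y_ref[k + 1] - a1 * y_ref[k] - a2 * y_ref[k - 1]
                - b1 * u[k - 1]) / b0
    return u

# Empirical BIBO check: a bounded reference must yield a bounded input.
y_ref = np.sin(0.05 * np.arange(500))
u = feedforward(y_ref)
print("bounded input:", np.max(np.abs(u)) < 1e3)  # True iff |b1/b0| < 1
```

Setting b1 = 1.1 instead makes the controller's internal dynamics unstable, and the same bounded reference produces an unbounded input, which is exactly the case the criterion rules out.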
Abstract: The manipulation of deformable linear objects (DLOs) via model-based control requires an accurate and computationally efficient dynamics model. Yet, data-driven DLO dynamics models require large training data sets, while their predictions often do not generalize, whereas physics-based models rely on good approximations of physical phenomena and often lack accuracy. To address these challenges, we propose a physics-informed neural ODE capable of predicting agile movements with significantly less data and hyperparameter tuning. In particular, we model DLOs as serial chains of rigid bodies interconnected by passive elastic joints, whose interaction forces are predicted by neural networks. The proposed model accurately predicts the motion of a robotically actuated aluminium rod and an elastic foam cylinder after being trained on only thirty seconds of data. The project code and data are available at: \url{https://tinyurl.com/neuralprba}
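A minimal PyTorch sketch of the hybrid structure described above: rigid bodies in a serial chain with an MLP predicting the passive joint torques. The unit inertia, the absence of inertial coupling, and the explicit-Euler integrator are simplifications for brevity, not the paper's actual formulation (see the linked project code for that).

```python
import torch
import torch.nn as nn

n_joints, dt = 5, 1e-3  # discretize the DLO into 5 elastic joints

# The NN replaces a hand-crafted elastic/damping joint torque model.
torque_net = nn.Sequential(
    nn.Linear(2 * n_joints, 64), nn.Tanh(), nn.Linear(64, n_joints))

def dynamics(q, qd):
    """Joint accelerations with NN-predicted passive joint torques.
    Unit inertia and no inertial coupling, purely for brevity."""
    tau = torque_net(torch.cat([q, qd], dim=-1))
    return tau  # qdd = I^{-1} * tau with I = identity

def rollout(q, qd, steps):
    """Explicit-Euler rollout; a real neural ODE would use a proper
    ODE solver and backpropagate through it during training."""
    traj = []
    for _ in range(steps):
        qdd = dynamics(q, qd)
        qd = qd + dt * qdd
        q = q + dt * qd
        traj.append(q)
    return torch.stack(traj)

traj = rollout(torch.zeros(n_joints), torch.zeros(n_joints), steps=100)
print(traj.shape)  # (100, 5)
```

During training, the parameters of torque_net would be fit by backpropagating a trajectory loss through the integrator; everything else in the sketch is fixed physics structure.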
Abstract: Combining Reinforcement Learning (RL) with a prior controller can yield the best of both worlds: RL can solve complex nonlinear problems, while the control prior ensures safer exploration and speeds up training. Prior work largely blends both components with a fixed weight, neglecting that the RL agent's performance varies with the training progress and across regions in the state space. Therefore, we advocate for an adaptive strategy that dynamically adjusts the weighting based on the RL agent's current capabilities. We propose a new adaptive hybrid RL algorithm, Contextualized Hybrid Ensemble Q-learning (CHEQ). CHEQ combines three key ingredients: (i) a time-invariant formulation of the adaptive hybrid RL problem treating the adaptive weight as a context variable, (ii) a weight adaptation mechanism based on the parametric uncertainty of a critic ensemble, and (iii) ensemble-based acceleration for data-efficient RL. Evaluating CHEQ on a car racing task reveals substantially stronger data efficiency, exploration safety, and transferability to unknown scenarios than state-of-the-art adaptive hybrid RL methods.
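Ingredient (ii) can be illustrated with a short sketch: the disagreement of a critic ensemble serves as a proxy for parametric uncertainty and is mapped to a blending weight between the control prior and the RL policy. The bounds W_MIN/W_MAX, the thresholds U_LOW/U_HIGH, and the linear mapping are hypothetical choices for illustration, not CHEQ's exact mechanism.

```python
import numpy as np

W_MIN, W_MAX = 0.1, 1.0      # hypothetical bounds on the RL weight
U_LOW, U_HIGH = 0.03, 0.15   # hypothetical uncertainty thresholds

def adaptive_weight(q_values):
    """Map critic-ensemble disagreement to a blending weight:
    a confident ensemble hands more control to the RL agent."""
    u = np.std(q_values)
    frac = np.clip((U_HIGH - u) / (U_HIGH - U_LOW), 0.0, 1.0)
    return W_MIN + frac * (W_MAX - W_MIN)

def hybrid_action(a_prior, a_rl, q_values):
    """Weighted blend of the control prior and the RL action."""
    w = adaptive_weight(q_values)
    return (1.0 - w) * a_prior + w * a_rl, w

a, w = hybrid_action(np.array([0.2]), np.array([0.8]),
                     q_values=np.array([1.0, 1.1, 0.9, 1.05, 0.95]))
print(a, w)
```

High critic disagreement thus pulls the action toward the prior, while a confident ensemble hands control to the RL agent.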
Abstract: There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies on these design choices. Also, most RL research is conducted on known benchmarks where knowledge about these choices already exists. However, novel practical applications often pose complex tasks for which no prior knowledge about good hyperparameters and reward functions is available, thus necessitating their derivation from scratch. Prior work has examined automatically tuning either hyperparameters or reward functions individually. We demonstrate empirically that an RL algorithm's hyperparameter configurations and reward function are often mutually dependent, meaning neither can be fully optimised without appropriate values for the other. We then propose a methodology for the combined optimisation of hyperparameters and the reward function. Furthermore, we include a variance penalty as an optimisation objective to improve the stability of learned policies. We conducted extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic on four environments. Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others, with only a minor increase in computational costs. This suggests that combined optimisation should be best practice.
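As an illustration of the combined search space, the following sketch jointly samples hyperparameters and reward weights and scores each configuration by mean return minus a variance penalty. The parameter ranges, the penalty coefficient, and the plain random-search strategy are illustrative stand-ins; the paper's actual optimiser and search spaces may differ.

```python
import random

def train_and_evaluate(hparams, reward_weights):
    """Placeholder: train an agent (e.g. PPO or SAC) with the given
    hyperparameters and reward weights, return per-seed returns."""
    return [random.gauss(100.0, 10.0) for _ in range(5)]

def objective(returns, var_penalty=0.5):
    """Mean return minus a variance penalty, favouring stable policies."""
    mean = sum(returns) / len(returns)
    std = (sum((r - mean) ** 2 for r in returns) / len(returns)) ** 0.5
    return mean - var_penalty * std

best = None
for _ in range(20):  # joint random search over both spaces at once
    hparams = {"lr": 10 ** random.uniform(-5, -3),
               "gamma": random.uniform(0.9, 0.999)}
    reward_weights = {"progress": random.uniform(0.1, 10.0),
                      "energy": random.uniform(0.0, 1.0)}
    score = objective(train_and_evaluate(hparams, reward_weights))
    if best is None or score > best[0]:
        best = (score, hparams, reward_weights)
print(best)
```

Here train_and_evaluate is a placeholder for a full training run; in practice each evaluation is expensive, which is why the paper's computational-cost comparison matters.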
Abstract: The consistency of a learning method is usually established under the assumption that the observations are a realization of an independent and identically distributed (i.i.d.) or mixing process. Yet, kernel methods such as support vector machines (SVMs), Gaussian processes, or conditional kernel mean embeddings (CKMEs) all give excellent performance under sampling schemes that are obviously non-i.i.d., such as when data comes from a dynamical system. We propose the new notion of empirical weak convergence (EWC) as a general assumption explaining such phenomena for kernel methods. It assumes the existence of a random asymptotic data distribution and is a strict weakening of previous assumptions in the field. Our main results then establish consistency of SVMs, kernel mean embeddings, and general Hilbert-space-valued empirical expectations with EWC data. Our analysis holds for both finite- and infinite-dimensional outputs, as we extend classical results of statistical learning to the latter case. In particular, it is also applicable to CKMEs. Overall, our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing.
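In symbols, a plausible reading of EWC, inferred from the description above; the notation is ours and this is an assumption about the definition, not a quotation of the paper's:

```latex
% Plausible formalization of empirical weak convergence (EWC), stated
% as an assumption on the data process (Z_n); inferred from the
% abstract, not quoted from the paper.
\[
  \frac{1}{N} \sum_{n=1}^{N} \delta_{Z_n} \;\xrightarrow{\,w\,}\; \mu
  \qquad \text{almost surely as } N \to \infty,
\]
% where \delta_z denotes the Dirac measure at z and \mu is a possibly
% random probability measure, the asymptotic data distribution.
% For i.i.d. data, \mu is deterministic and equals the sampling
% distribution, so EWC strictly weakens the classical assumption.
```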
Abstract: Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?'; i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout the training. While theoretically tempting, the assumption of uniform model accuracy is a fallacy that collapses, at the latest, when the model extrapolates. Instead, we propose asking the question 'Where to trust your model?'. Using inherent model uncertainty to consider local accuracy, we obtain the Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (MACURA) algorithm. We propose an easy-to-tune rollout mechanism and demonstrate substantial improvements in data efficiency and performance compared to state-of-the-art deep MBRL methods on the MuJoCo benchmark.
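A sketch of the uncertainty-aware rollout idea, assuming an ensemble of learned transition models: the rollout is extended only while the ensemble members agree, so rollout length adapts to local model accuracy. The disagreement measure, the threshold kappa, and the rollout cap are illustrative choices, not MACURA's exact mechanism.

```python
import numpy as np

def ensemble_disagreement(preds):
    """Spread of next-state predictions across the model ensemble,
    used as a local (per-state) measure of model uncertainty."""
    return np.linalg.norm(np.std(preds, axis=0))

def uncertainty_aware_rollout(models, policy, s, kappa, max_len=20):
    """Extend the rollout only where the model trusts itself: stop
    as soon as ensemble disagreement exceeds the threshold kappa."""
    rollout = []
    for _ in range(max_len):
        a = policy(s)
        preds = np.stack([m(s, a) for m in models])
        if ensemble_disagreement(preds) > kappa:
            break  # 'where to trust your model' ends here
        s = preds[np.random.randint(len(preds))]  # sample one member
        rollout.append((s, a))
    return rollout

# Toy usage with three near-identical linear 'models':
models = [lambda s, a, w=w: s + 0.1 * a + w for w in (0.0, 0.01, -0.01)]
print(len(uncertainty_aware_rollout(models, lambda s: -s,
                                    s=np.ones(2), kappa=0.5)))
```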
Abstract: We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) it substantially reduces the amount of communication by triggering communication only when necessary, and (ii) it is agnostic to the data distribution among the different agents. We can therefore guarantee convergence even if the local data distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm and derive accelerated convergence rates in a convex setting. We also characterize the effect of communication drops and demonstrate that our algorithm is robust to communication failures. The article concludes by presenting numerical results from a distributed LASSO problem and distributed learning tasks on the MNIST and CIFAR-10 datasets. The experiments underline communication savings of 50% or more due to the event-based communication strategy, show resilience towards heterogeneous data distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, and FedADMM.
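A minimal sketch of the event-based communication rule, assuming each agent broadcasts its local parameters only when they have drifted beyond a threshold since the last broadcast; the trigger norm and threshold are illustrative, not the article's exact trigger condition.

```python
import numpy as np

class EventTriggeredAgent:
    """Agent that broadcasts its parameters only when they have
    drifted sufficiently since the last communication round."""

    def __init__(self, dim, threshold):
        self.x = np.zeros(dim)          # local parameters
        self.last_sent = np.zeros(dim)  # last broadcast copy
        self.threshold = threshold

    def local_step(self, grad, lr=0.1):
        """One local gradient step on the agent's own data."""
        self.x -= lr * grad(self.x)

    def maybe_communicate(self):
        """Event trigger: send only if the drift exceeds the threshold."""
        if np.linalg.norm(self.x - self.last_sent) > self.threshold:
            self.last_sent = self.x.copy()
            return self.x  # send over the network
        return None  # stay silent, saving communication

agent = EventTriggeredAgent(dim=3, threshold=0.5)
for _ in range(10):
    agent.local_step(grad=lambda x: x - np.ones(3))
    msg = agent.maybe_communicate()
    if msg is not None:
        print("broadcast:", msg)
```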
Abstract: Model Predictive Control (MPC) is a method to control nonlinear systems with guaranteed stability and constraint satisfaction, but it suffers from high computation times. Approximate MPC (AMPC) with neural networks (NNs) has emerged to address this limitation, enabling deployment on resource-constrained embedded systems. However, when tuning AMPCs for real-world systems, large datasets need to be regenerated and the NN needs to be retrained at every tuning step. This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining. By incorporating local sensitivities of nonlinear programs, the proposed method not only mimics optimal MPC inputs but also adjusts to changes in the physical parameters of the model using linear predictions, while still guaranteeing stability. We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU), using the same NN across both system instances, which have different parameters. To the best of our knowledge, this work represents the first experimental demonstration of AMPC for fast-moving systems on low-cost MCUs; it also showcases generalization across system instances and variations through our parameter-adaptation method. Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
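The parameter-adaptation idea can be sketched as a first-order correction around nominal parameters, with the network supplying both the nominal input and its parameter sensitivity. The stand-in policy, sensitivity, and nominal parameters below are made up for illustration; in the paper both quantities come from NLP sensitivities computed offline.

```python
import numpy as np

P_NOM = np.array([0.5, 1.0])  # hypothetical nominal model parameters

def ampc_network(x):
    """Placeholder for the trained NN: returns the nominal MPC input
    and its sensitivity (Jacobian) w.r.t. the model parameters."""
    u_nom = -np.array([1.0, 0.2]) @ x            # stand-in policy
    du_dp = np.array([0.3 * x[0], -0.1 * x[1]])  # stand-in sensitivity
    return u_nom, du_dp

def parameter_adaptive_ampc(x, p):
    """First-order correction: adapt the nominal input to the actual
    physical parameters p without retraining the network."""
    u_nom, du_dp = ampc_network(x)
    return u_nom + du_dp @ (p - P_NOM)

print(parameter_adaptive_ampc(x=np.array([0.1, -0.2]),
                              p=np.array([0.55, 0.9])))
```

Because the correction is a single dot product, it adds essentially no cost on the MCU compared to the plain NN evaluation.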
Abstract: Optimizing an unknown function under safety constraints is a central task in robotics, biomedical engineering, and many other disciplines, and safe Bayesian Optimization (BO) is increasingly used for it. Due to the safety-critical nature of these applications, it is of utmost importance that theoretical safety guarantees for these algorithms translate into the real world. In this work, we investigate three safety-related issues of the popular class of SafeOpt-type algorithms. First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression, but concrete implementations typically utilize heuristics that invalidate all safety guarantees. We provide a detailed analysis of this problem and introduce Real-β-SafeOpt, a variant of the SafeOpt algorithm that leverages recent GP bounds and thus retains all theoretical guarantees. Second, we identify the assumption of an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function, a key technical assumption in SafeOpt-like algorithms, as a central obstacle to real-world usage. To overcome this challenge, we introduce the Lipschitz-only Safe Bayesian Optimization (LoSBO) algorithm, which guarantees safety without an assumption on the RKHS bound, and empirically show that this algorithm is not only safe but also exhibits performance superior to the state-of-the-art on several function classes. Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems. To widen the applicability of these algorithms, we introduce Lipschitz-only GP-UCB (LoS-GP-UCB), a variant of LoSBO applicable to moderately high-dimensional problems while retaining safety.
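The safety mechanism behind LoSBO can be sketched with only a Lipschitz constant and a noise bound: a candidate point is certified safe if some past observation forces the Lipschitz lower bound on the function value above the safety threshold. The constants below are illustrative assumptions, not values from the paper.

```python
import numpy as np

L = 2.0        # assumed Lipschitz constant of the target function
NOISE = 0.05   # assumed bound on the observation noise
H_MIN = 0.0    # safety threshold: f(x) >= H_MIN must hold

def is_safe(x_cand, X_obs, y_obs):
    """A candidate is certified safe if some observation pushes the
    Lipschitz lower bound on f(x_cand) above the safety threshold:
    f(x_cand) >= y_i - NOISE - L * ||x_cand - x_i||."""
    lower = y_obs - NOISE - L * np.linalg.norm(X_obs - x_cand, axis=1)
    return np.max(lower) >= H_MIN

X_obs = np.array([[0.0, 0.0], [0.2, 0.1]])
y_obs = np.array([0.8, 0.6])
print(is_safe(np.array([0.1, 0.0]), X_obs, y_obs))  # True
```

No RKHS-norm bound appears anywhere in this check, which is exactly the point of the Lipschitz-only formulation.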
Abstract: In this paper, we address the problem of automatically approximating nonlinear model predictive control (MPC) schemes with closed-loop guarantees. First, we discuss how this problem can be reduced to a function approximation problem, which we then tackle by proposing ALKIA-X, the Adaptive and Localized Kernel Interpolation Algorithm with eXtrapolated reproducing kernel Hilbert space norm. ALKIA-X is a non-iterative algorithm that ensures numerically well-conditioned computations, a fast-to-evaluate approximating function, and the guaranteed satisfaction of any desired bound on the approximation error. Hence, ALKIA-X automatically computes an explicit function that approximates the MPC, yielding a controller suitable for safety-critical systems and high sampling rates. In a numerical experiment, we apply ALKIA-X to a nonlinear MPC scheme, demonstrating reduced offline computation and online evaluation time compared to a state-of-the-art method.
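A rough sketch of the underlying kernel-interpolation idea, with two loud caveats: ALKIA-X bounds the approximation error rigorously via an extrapolated RKHS norm and localizes the interpolation on adaptively refined subdomains, whereas this sketch substitutes a random test grid and global interpolation; the kernel, lengthscale, tolerance, and stand-in control law are all illustrative.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel matrix between point sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def interpolate(X, y):
    """Kernel interpolant s(x) = k(x, X) K^{-1} y."""
    alpha = np.linalg.solve(rbf(X, X) + 1e-10 * np.eye(len(X)), y)
    return lambda x: rbf(x, X) @ alpha

mpc = lambda x: np.tanh(x[:, 0]) * x[:, 1]  # stand-in for the MPC law
tol, n = 1e-2, 3
while True:  # refine the sample grid until the error estimate passes
    g = np.linspace(-1, 1, n)
    X = np.array([[a, b] for a in g for b in g])
    s = interpolate(X, mpc(X))
    X_test = np.random.uniform(-1, 1, (500, 2))
    if np.max(np.abs(s(X_test) - mpc(X_test))) < tol:
        break
    n += 2
print(f"{n}x{n} samples suffice for tolerance {tol}")
```

The resulting explicit function s(x) is cheap to evaluate online, which is what makes this family of approximations attractive for high sampling rates.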