Abstract:Extensive research has shown that deep neural networks (DNNs) are vulnerable to slight adversarial perturbations$-$small changes to the input data that appear insignificant but cause the model to produce drastically different outputs. In addition to augmenting training data with adversarial examples generated from a specific attack method, most of the current defense strategies necessitate modifying the original model architecture components to improve robustness or performing test-time data purification to handle adversarial attacks. In this work, we demonstrate that strong feature representation learning during training can significantly enhance the original model's robustness. We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations. Our training method involves an embedding space where cosine similarity loss and multi-positive contrastive loss are used to align natural and adversarial features from the model encoder and ensure tight clustering. Concurrently, the classifier is motivated to achieve accurate predictions. Through extensive experiments, we demonstrate that our approach significantly enhances the robustness of DNNs against white-box and black-box adversarial attacks, outperforming other methods that similarly require no architectural changes or test-time data purification. Our code is available at https://github.com/salomonhotegni/MOREL
Abstract:Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of DL.
Abstract:We utilize extreme learning machines for the prediction of partial differential equations (PDEs). Our method splits the state space into multiple windows that are predicted individually using a single model. Despite requiring only few data points (in some cases, our method can learn from a single full-state snapshot), it still achieves high accuracy and can predict the flow of PDEs over long time horizons. Moreover, we show how additional symmetries can be exploited to increase sample efficiency and to enforce equivariance.
Abstract:The value function plays a crucial role as a measure for the cumulative future reward an agent receives in both reinforcement learning and optimal control. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always H\"older continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
Abstract:Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
Abstract:The Koopman operator has become an essential tool for data-driven analysis, prediction and control of complex systems, the main reason being the enormous potential of identifying linear function space representations of nonlinear dynamics from measurements. Until now, the situation where for large-scale systems, we (i) only have access to partial observations (i.e., measurements, as is very common for experimental data) or (ii) deliberately perform coarse graining (for efficiency reasons) has not been treated to its full extent. In this paper, we address the pitfall associated with this situation, that the classical EDMD algorithm does not automatically provide a Koopman operator approximation for the underlying system if we do not carefully select the number of observables. Moreover, we show that symmetries in the system dynamics can be carried over to the Koopman operator, which allows us to massively increase the model efficiency. We also briefly draw a connection to domain decomposition techniques for partial differential equations and present numerical evidence using the Kuramoto--Sivashinsky equation.
Abstract:The goal of this paper is to make a strong point for the usage of dynamical models when using reinforcement learning (RL) for feedback control of dynamical systems governed by partial differential equations (PDEs). To breach the gap between the immense promises we see in RL and the applicability in complex engineering systems, the main challenges are the massive requirements in terms of the training data, as well as the lack of performance guarantees. We present a solution for the first issue using a data-driven surrogate model in the form of a convolutional LSTM with actuation. We demonstrate that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system. Furthermore, we show that iteratively updating the model is of major importance to avoid biases in the RL training. Detailed ablation studies reveal the most important ingredients of the modeling process. We use the chaotic Kuramoto-Sivashinsky equation do demonstarte our findings.
Abstract:We present a convolutional framework which significantly reduces the complexity and thus, the computational effort for distributed reinforcement learning control of dynamical systems governed by partial differential equations (PDEs). Exploiting translational invariances, the high-dimensional distributed control problem can be transformed into a multi-agent control problem with many identical, uncoupled agents. Furthermore, using the fact that information is transported with finite velocity in many cases, the dimension of the agents' environment can be drastically reduced using a convolution operation over the state space of the PDE. In this setting, the complexity can be flexibly adjusted via the kernel width or by using a stride greater than one. Moreover, scaling from smaller to larger systems -- or the transfer between different domains -- becomes a straightforward task requiring little effort. We demonstrate the performance of the proposed framework using several PDE examples with increasing complexity, where stabilization is achieved by training a low-dimensional deep deterministic policy gradient agent using minimal computing resources.
Abstract:Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated; and there is currently no universal way to choose them for systems that are not measure preserving. Secondly, if we do not observe the full state, we may not gain access to a sufficiently rich collection of such functions to describe the dynamics. This is because the commonly used method of forming time-delayed observables fails when there is actuation. To remedy these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model, and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples, including recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.
Abstract:Many problems in science and engineering require the efficient numerical approximation of integrals, a particularly important application being the numerical solution of initial value problems for differential equations. For complex systems, an equidistant discretization is often inadvisable, as it either results in prohibitively large errors or computational effort. To this end, adaptive schemes have been developed that rely on error estimators based on Taylor series expansions. While these estimators a) rely on strong smoothness assumptions and b) may still result in erroneous steps for complex systems (and thus require step rejection mechanisms), we here propose a data-driven time stepping scheme based on machine learning, and more specifically on reinforcement learning (RL) and meta-learning. First, one or several (in the case of non-smooth or hybrid systems) base learners are trained using RL. Then, a meta-learner is trained which (depending on the system state) selects the base learner that appears to be optimal for the current situation. Several examples including both smooth and non-smooth problems demonstrate the superior performance of our approach over state-of-the-art numerical schemes. The code is available under https://github.com/lueckem/quadrature-ML.