Abstract:Sparse model identification enables nonlinear dynamical system discovery from data. However, the control of false discoveries for sparse model identification is challenging, especially in the low-data and high-noise limit. In this paper, we perform a theoretical study on ensemble sparse model discovery, which shows empirical success in terms of accuracy and robustness to noise. In particular, we analyse the bootstrapping-based sequential thresholding least-squares estimator. We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate. In addition, we show that the ensemble sparse model discovery method can perform computationally efficient uncertainty estimation, compared to expensive Bayesian uncertainty quantification methods via MCMC. We demonstrate the convergence properties and connection to uncertainty quantification in various numerical studies on synthetic sparse linear regression and sparse model discovery. The experiments on sparse linear regression support that the bootstrapping-based sequential thresholding least-squares method has better performance for sparse variable selection compared to LASSO, thresholding least-squares, and bootstrapping-based LASSO. In the sparse model discovery experiment, we show that the bootstrapping-based sequential thresholding least-squares method can provide valid uncertainty quantification, converging to a delta measure centered around the true value with increased sample sizes. Finally, we highlight the improved robustness to hyperparameter selection under shifting noise and sparsity levels of the bootstrapping-based sequential thresholding least-squares method compared to other sparse regression methods.
Abstract:Recent progress in autoencoder-based sparse identification of nonlinear dynamics (SINDy) under $\ell_1$ constraints allows joint discoveries of governing equations and latent coordinate systems from spatio-temporal data, including simulated video frames. However, it is challenging for $\ell_1$-based sparse inference to perform correct identification for real data due to the noisy measurements and often limited sample sizes. To address the data-driven discovery of physics in the low-data and high-noise regimes, we propose Bayesian SINDy autoencoders, which incorporate a hierarchical Bayesian sparsifying prior: Spike-and-slab Gaussian Lasso. Bayesian SINDy autoencoder enables the joint discovery of governing equations and coordinate systems with a theoretically guaranteed uncertainty estimate. To resolve the challenging computational tractability of the Bayesian hierarchical setting, we adapt an adaptive empirical Bayesian method with Stochatic gradient Langevin dynamics (SGLD) which gives a computationally tractable way of Bayesian posterior sampling within our framework. Bayesian SINDy autoencoder achieves better physics discovery with lower data and fewer training epochs, along with valid uncertainty quantification suggested by the experimental studies. The Bayesian SINDy autoencoder can be applied to real video data, with accurate physics discovery which correctly identifies the governing equation and provides a close estimate for standard physics constants like gravity $g$, for example, in videos of a pendulum.