Recent progress in autoencoder-based sparse identification of nonlinear dynamics (SINDy) under $\ell_1$ constraints allows the joint discovery of governing equations and latent coordinate systems from spatio-temporal data, including simulated video frames. However, $\ell_1$-based sparse inference struggles to identify the correct model from real data because measurements are noisy and sample sizes are often limited. To address data-driven discovery of physics in the low-data and high-noise regimes, we propose Bayesian SINDy autoencoders, which incorporate a hierarchical Bayesian sparsifying prior: the spike-and-slab Gaussian Lasso. The Bayesian SINDy autoencoder enables the joint discovery of governing equations and coordinate systems with a theoretically guaranteed uncertainty estimate. To make the Bayesian hierarchical setting computationally tractable, we adapt an adaptive empirical Bayesian method with stochastic gradient Langevin dynamics (SGLD), which provides a tractable way to perform Bayesian posterior sampling within our framework. In our experimental studies, the Bayesian SINDy autoencoder achieves better physics discovery with less data and fewer training epochs, along with valid uncertainty quantification. It can also be applied to real video data: for example, in videos of a pendulum, it correctly identifies the governing equation and provides a close estimate of standard physical constants such as the gravitational acceleration $g$.
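For readers unfamiliar with SGLD, the following is a minimal sketch of the generic update rule on a toy problem, not the adaptive empirical Bayesian scheme or the spike-and-slab prior used in this work. It assumes Gaussian data with unit variance and a zero-mean Gaussian prior on the unknown mean; the function name `sgld_gaussian_mean` and all parameter values are hypothetical choices made for illustration. Each step takes half a step along a minibatch estimate of the log-posterior gradient and adds Gaussian noise whose variance equals the step size.

```python
import numpy as np


def sgld_gaussian_mean(data, n_steps=5000, batch_size=32,
                       step_size=1e-3, prior_var=100.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean with SGLD.

    Toy model (an assumption for illustration): x_i ~ N(mu, 1),
    prior mu ~ N(0, prior_var).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    mu = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Minibatch drawn without replacement from the full data set.
        batch = rng.choice(data, size=batch_size, replace=False)
        # Gradient of the log prior, d/dmu of -mu^2 / (2 * prior_var).
        grad_log_prior = -mu / prior_var
        # Minibatch gradient of the log likelihood, rescaled by n / batch_size
        # so that it is an unbiased estimate of the full-data gradient.
        grad_log_lik = (n / batch_size) * np.sum(batch - mu)
        # Langevin step: half a gradient step plus N(0, step_size) noise.
        noise = rng.normal(0.0, np.sqrt(step_size))
        mu = mu + 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        samples[t] = mu
    return samples
```

With a fixed step size the chain only approximates the posterior; in this toy setting the sample mean after burn-in can be compared against the closed-form posterior mean, `data.sum() / (len(data) + 1 / prior_var)`.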