The data-based discovery of effective, coarse-grained (CG) models of high-dimensional dynamical systems presents a unique challenge in computational physics and particularly in the context of multiscale problems. The present paper offers a probabilistic perspective that simultaneously identifies predictive, lower-dimensional coarse-grained (CG) variables as well as their dynamics. We make use of the expressive ability of deep neural networks in order to represent the right-hand side of the CG evolution law. Furthermore, we demonstrate how domain knowledge that is very often available in the form of physical constraints (e.g. conservation laws) can be incorporated with the novel concept of virtual observables. Such constraints, apart from leading to physically realistic predictions, can significantly reduce the requisite amount of training data which enables reducing the amount of required, computationally expensive multiscale simulations (Small Data regime). The proposed state-space model is trained using probabilistic inference tools and, in contrast to several other techniques, does not require the prescription of a fine-to-coarse (restriction) projection nor time-derivatives of the state variables. The formulation adopted is capable of quantifying the predictive uncertainty as well as of reconstructing the evolution of the full, fine-scale system which allows to select the quantities of interest a posteriori. We demonstrate the efficacy of the proposed framework in a high-dimensional system of moving particles.