Gaussian Processes (GPs) are powerful kernelized methods for non-parameteric regression used in many applications. However, their plain usage is limited to a few thousand of training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. The majority of previous work has focused on the batch setting, whereas in this work we focusing on the training with mini-batches. In particular, we investigate the connection between a general class of sparse inducing point GP regression methods and Bayesian recursive estimation which enables Kalman Filter and Information Filter like updating for online learning. Moreover, exploiting ideas from distributed estimation, we show how our approach can be distributed. For unknown parameters, we propose a novel approach that relies on recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state of the art methods, we have analytic updates for the mean and covariance of the posterior, thus reducing drastically the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state of the art sequential Gaussian Process regression on synthetic GP as well as real-world data with up to a million of data samples.