The mass, or binding energy, is the basic property of the atomic nucleus: it determines nuclear stability as well as reaction and decay rates. Quantifying nuclear binding is important for understanding the origin of the elements in the universe. The astrophysical processes responsible for nucleosynthesis in stars often take place far from the valley of stability, where experimental masses are not known. In such cases, the missing nuclear information must be provided by theoretical predictions that involve extreme extrapolations. Bayesian machine learning techniques can be applied to improve such predictions by taking full advantage of the information contained in the deviations between experimental and calculated masses. We consider 10 global models based on nuclear Density Functional Theory as well as two more phenomenological mass models. Emulators of the two-neutron separation energy (S2n) residuals, together with credibility intervals defining theoretical error bars, are constructed using Bayesian Gaussian processes (GP) and Bayesian neural networks (BNN). The training dataset comprises nuclei whose masses were measured before 2003; the testing datasets contain exotic nuclei whose masses were determined after 2003. We then carry out extrapolations toward the two-neutron drip line. While both GP and BNN significantly reduce the root-mean-square (rms) deviation from experiment, GP offers better and much more stable performance. The increase in predictive power is striking: the resulting rms deviations from experiment on the testing datasets are comparable to those of the more phenomenological models. The empirical coverage probability curves match the reference values closely, which is highly desirable to ensure the honesty of uncertainty quantification, and the estimated credibility intervals on predictions make it possible to evaluate the predictive power of individual models.
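
To make the residual-emulation idea concrete, the following is a minimal sketch, not the authors' implementation: it uses scikit-learn's maximum-likelihood GP rather than the fully Bayesian GP described above, and the (Z, N) locations and residual values are hypothetical placeholders. The GP is trained on the deviations between experimental and calculated S2n values, and its predictive standard deviation plays the role of a theoretical error bar.

```python
# Sketch: emulate mass-model S2n residuals with a Gaussian process.
# All data values below are hypothetical; a real application would use
# measured S2n minus model S2n for nuclei in the training dataset.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical training inputs (proton number Z, neutron number N)
# and the corresponding S2n residuals in MeV.
X_train = np.array([[20, 20], [20, 22], [28, 28], [28, 30], [50, 62]], dtype=float)
delta_train = np.array([0.45, -0.12, 0.30, 0.05, -0.20])  # illustrative values

# Squared-exponential kernel over (Z, N) plus a white-noise term;
# hyperparameters are fit by maximizing the marginal likelihood here,
# in place of the Bayesian treatment used in the paper.
kernel = 1.0 * RBF(length_scale=[2.0, 2.0]) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, delta_train)

# Predicted residual correction and 1-sigma error bar for an
# unmeasured nucleus, e.g. (Z, N) = (28, 32). The corrected model
# prediction would be S2n_model(28, 32) + mean, with error bar ~ std.
mean, std = gp.predict(np.array([[28.0, 32.0]]), return_std=True)
```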
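
The honesty check mentioned at the end, comparing empirical coverage probabilities against their nominal levels, can likewise be sketched in a few lines. This is a generic illustration under assumed inputs: hypothetical arrays y_true (measured S2n on the testing set) and mu, sigma (predictive mean and standard deviation of the emulator), with Gaussian credibility intervals.

```python
# Sketch: empirical coverage probability of central credibility intervals.
import numpy as np
from scipy.stats import norm

def empirical_coverage(y_true, mu, sigma, levels):
    """Fraction of test nuclei whose measured value falls inside the
    central Gaussian credibility interval at each nominal level."""
    coverage = []
    for p in levels:
        z = norm.ppf(0.5 + p / 2.0)        # interval half-width in sigmas
        inside = np.abs(y_true - mu) <= z * sigma
        coverage.append(inside.mean())
    return np.array(coverage)

# Nominal credibility levels to scan; for honest uncertainty
# quantification the returned empirical coverages should track
# these values closely (points near the diagonal of the plot).
levels = np.linspace(0.05, 0.95, 19)
```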