We present a diffeomorphic image registration algorithm to learn spatial transformations between pairs of images to be registered using fully convolutional networks (FCNs) under a self-supervised learning setting. The network is trained to estimate diffeomorphic spatial transformations between pairs of images by maximizing an image-wise similarity metric between fixed and warped moving images, similar to conventional image registration algorithms. It is implemented in a multi-resolution image registration framework to optimize and learn spatial transformations at different image resolutions jointly and incrementally with deep self-supervision in order to better handle large deformation between images. A spatial Gaussian smoothing kernel is integrated with the FCNs to yield sufficiently smooth deformation fields to achieve diffeomorphic image registration. Particularly, spatial transformations learned at coarser resolutions are utilized to warp the moving image, which is subsequently used for learning incremental transformations at finer resolutions. This procedure proceeds recursively to the full image resolution and the accumulated transformations serve as the final transformation to warp the moving image at the finest resolution. Experimental results for registering high resolution 3D structural brain magnetic resonance (MR) images have demonstrated that image registration networks trained by our method obtain robust, diffeomorphic image registration results within seconds with improved accuracy compared with state-of-the-art image registration algorithms.