Diminished reality is a technology that aims to remove objects from video images and fills in the missing region with plausible pixels. Most conventional methods utilize the different cameras that capture the same scene from different viewpoints to allow regions to be removed and restored. In this paper, we propose an RGB-D image inpainting method using generative adversarial network, which does not require multiple cameras. Recently, an RGB image inpainting method has achieved outstanding results by employing a generative adversarial network. However, RGB inpainting methods aim to restore only the texture of the missing region and, therefore, does not recover geometric information (i.e, 3D structure of the scene). We expand conventional image inpainting method to RGB-D image inpainting to jointly restore the texture and geometry of missing regions from a pair of RGB and depth images. Inspired by other tasks that use RGB and depth images (e.g., semantic segmentation and object detection), we propose late fusion approach that exploits the advantage of RGB and depth information each other. The experimental results verify the effectiveness of our proposed method.