Augmented Reality (AR) applications require methods for inserting objects into camera-captured scenes in a way that is coherent with the surroundings. Common AR applications insert predefined 3D objects with known properties and shape. This simplifies the problem, since it reduces to extracting an illumination model for the object in that scene by understanding the surrounding light sources. However, information about an object's properties is often unavailable, especially when the object is a fragment taken from a single source image. Our method renders such source fragments coherently with the target surroundings using only the source and target images. Our pipeline uses a Deep Image Prior (DIP) network based on a U-Net architecture as the main renderer, alongside robust feature-extraction networks that are used to compute the required losses. Our method requires neither pair-labeled data nor extensive training on a dataset. We compare our method, using qualitative metrics, to baseline methods such as Cut and Paste, Cut and Paste Neural Rendering, and Image Harmonization.