This paper addresses the problem of visible (RGB) to Near-Infrared (NIR) image fusion. Multispectral imaging is an important task relevant to image processing and computer vision, even more, since the development of the RGBT sensor. While the visible image sees color and suffers from noise, haze, and clouds, the NIR channel captures a clearer picture and it is significantly required by applications such as dehazing or object detection. The proposed approach fuses these two aligned channels by training a Convolutional-Neural-Network (CNN) by a Self-Supervised-Learning (SSL) on a single example. For each such pair, RGB and IR, the network is trained for seconds to deduce the final fusion. The SSL is based on Sturcture-of-Similarity (SSIM) loss combined with Edge-Preservation (EP) loss. The labels for the SSL are the input channels themselves. This fusion preserves the relevant detail of each spectral channel while not based on a heavy training process. In the experiments section, the proposed approach achieves better qualitative and quantitative multispectral fusion results with respect to other recent methods, that are not based on large dataset training.