Multispectral stereoscopy is an emerging field. A lot of work has been done in classical stereoscopy, but multispectral stereoscopy is not studied as frequently. This type of stereoscopy can be used in autonomous vehicles to complete the information given by RGB cameras. It helps to identify objects in the surroundings when the conditions are more difficult, such as in night scenes. This paper focuses on the RGB-LWIR spectrum. RGB-LWIR stereoscopy has the same challenges as classical stereoscopy, that is occlusions, textureless surfaces and repetitive patterns, plus specific ones related to the different modalities. Finding matches between two spectrums adds another layer of complexity. Color, texture and shapes are more likely to vary from a spectrum to another. To address this additional challenge, this paper focuses on estimating the disparity of people present in a scene. Given the fact that people's shape is captured in both RGB and LWIR, we propose a novel method that uses segmentation masks of the human in both spectrum and than concatenate them to the original images before the first layer of a Siamese Network. This method helps to improve the accuracy, particularly within the one pixel error range.